~/blog/nemoclaw-install-gx10-from-scratch

NemoClaw · part 2

[AI Agent] How to Install NemoClaw on DGX Spark (4 Undocumented Fixes)

cat --toc

TL;DR

NemoClaw on GX10 (GB10/SM121) needs four fixes that aren't in the official docs: upgrade Node.js to v22, manually run sudo npm link after the installer, install OpenShell from the tar.gz release (not the binary URL in the docs — that 404s), and add cgroupns=host to Docker daemon.json before onboard. After those, nemoclaw setup-spark and NEMOCLAW_NON_INTERACTIVE=1 nemoclaw onboard complete cleanly.

Plain-Language Version: Installing AI Agent Software on Unusual Hardware

Installing software on a computer usually means clicking "Download" and following the prompts. On specialized AI hardware like the NVIDIA DGX Spark (or the ASUS GX10, which uses the same chip), it is not that simple. The machine runs Linux, uses an ARM processor instead of the Intel/AMD chip in most PCs, and ships with older versions of some tools.

NemoClaw is NVIDIA's AI agent framework — software that lets an AI assistant run on your machine, read files, and execute tasks. The official install guide assumes you are on a standard setup. On the GX10, four things go wrong that the guide does not mention: the pre-installed Node.js is too old, a file permission step fails silently, a download link in the documentation is broken, and a Linux container setting needs to be changed.

None of these are hard to fix once you know about them. This article walks through each one in order, with the exact commands. Total time from a blank GX10 to a running AI agent sandbox: about 30 minutes.


Step 1: Upgrade Node.js to v22

NemoClaw requires Node.js 20+. The GX10 ships with v18.19.1. The installer checks this early:

Error: Node.js version 18.19.1 is not supported. Please upgrade to 20 or later.

Upgraded to v22 via NodeSource:

curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs
node --version
# v22.22.1

v22 is fine even though the error only mentions 20+.
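The installer's gate can be scripted as a pre-flight check. The 20+ floor comes from the error message above; the helper names are mine:

```shell
# Extract the major version from a string like "v18.19.1".
node_major() {
  v="${1#v}"        # strip the leading "v"
  echo "${v%%.*}"   # keep everything before the first dot
}

# NemoClaw's installer requires Node.js 20 or later.
needs_upgrade() {
  [ "$(node_major "$1")" -lt 20 ]
}

if command -v node >/dev/null 2>&1 && needs_upgrade "$(node --version)"; then
  echo "Node.js too old: run the NodeSource setup_22.x script first"
fi
```

Running this before the NemoClaw installer avoids a wasted download on machines still carrying the stock v18.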

Step 2: Run the NemoClaw Installer

curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash

The installer exited with code 243 and printed:

nemoclaw.sh: line 47: BASH_SOURCE[0]: unbound variable

This is a set -euo pipefail issue, specifically the -u (nounset) flag: BASH_SOURCE[0] is unset when bash reads the script from a pipe, which is exactly what curl | bash does. The error fires in the script's cleanup block, after the actual installation is already complete. Running nemoclaw --version immediately after:

nemoclaw v0.1.0

The CLI is installed at ~/.nemoclaw/source. The error is cosmetic but it is loud.
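The failure mode is easy to reproduce without the installer. Under set -u, bash treats BASH_SOURCE[0] as unbound whenever the script arrives on stdin instead of from a file; this snippet is a minimal illustration, not the installer's actual code:

```shell
# BASH_SOURCE[0] is empty when bash reads the script from a pipe,
# so `set -u` turns any reference to it into a hard error.
echo 'set -u; echo "running from: ${BASH_SOURCE[0]}"' | bash 2>&1 \
  | grep 'unbound variable' || true

# The defensive spelling: a default expansion survives both modes.
echo 'set -u; echo "running from: ${BASH_SOURCE[0]:-stdin}"' | bash
```

The `:-stdin` default is the usual fix for install scripts that are expected to be piped.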

Step 3: Fix the Global Symlink

The installer tries to create a global symlink. That step failed:

npm error code EACCES
npm error syscall symlink
npm error path /usr/local/lib/node_modules/nemoclaw
npm error dest /usr/local/bin/nemoclaw
npm error errno -13

Standard npm global install permission issue on a system-managed Node.js. Fixed manually:

cd ~/.nemoclaw/source
sudo npm link

After this, nemoclaw is callable from any directory.

Step 4: Install OpenShell CLI

NemoClaw's onboard command requires the openshell CLI to be present. The official spark-install.md says to install it like this:

# This URL 404s — do NOT use
ARCH=$(uname -m)
curl -fsSL "https://github.com/NVIDIA/OpenShell/releases/latest/download/openshell-linux-${ARCH}" \
  -o /usr/local/bin/openshell

That URL returned HTTP 404. The actual release format is a tar.gz, not a bare binary. The correct install for aarch64 Linux:

curl -fsSL \
  "https://github.com/NVIDIA/OpenShell/releases/download/v0.0.13/openshell-aarch64-unknown-linux-musl.tar.gz" \
  -o /tmp/openshell.tar.gz
tar xzf /tmp/openshell.tar.gz -C /tmp
sudo mv /tmp/openshell /usr/local/bin/openshell
openshell --version
# openshell 0.0.13

For x86_64, replace aarch64-unknown-linux-musl with x86_64-unknown-linux-musl.

To get the current latest version programmatically:

curl -s https://api.github.com/repos/NVIDIA/OpenShell/releases/latest \
  | python3 -c "import sys,json; r=json.load(sys.stdin); print(r['tag_name'])"
# v0.0.13
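The two download commands above differ only in the architecture triple, so the asset name can be derived from uname -m. A small helper, using the naming convention observed on the v0.0.13 release (which may change in later releases):

```shell
# Map the machine architecture to the OpenShell release asset name.
# Names copied from the v0.0.13 release; verify against the release
# page before relying on them.
asset_for_arch() {
  case "$1" in
    aarch64|arm64) echo "openshell-aarch64-unknown-linux-musl.tar.gz" ;;
    x86_64|amd64)  echo "openshell-x86_64-unknown-linux-musl.tar.gz" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

asset=$(asset_for_arch "$(uname -m)") && echo "asset: $asset" || true
```

Combining this with the tag_name lookup above gives a download URL that tracks the latest release instead of hard-coding v0.0.13.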

Step 5: Fix Docker for cgroup v2

Ubuntu 24.04 ships with cgroup v2 (cgroup2fs). OpenShell's gateway container starts k3s internally, and k3s tries to create cgroup v1-style paths that don't exist under cgroup v2. Without a fix, onboard fails with:

K8s namespace not ready
openat2 /sys/fs/cgroup/kubepods/pids.max: no such file or directory
Failed to start ContainerManager: failed to initialize top level QOS containers

Verify cgroup version first:

stat -fc %T /sys/fs/cgroup/
# cgroup2fs   ← affected
# tmpfs       ← cgroup v1, no fix needed

If cgroup2fs, add cgroupns=host to Docker's daemon config:

sudo python3 -c "
import json, os
path = '/etc/docker/daemon.json'
d = json.load(open(path)) if os.path.exists(path) else {}
d['default-cgroupns-mode'] = 'host'
json.dump(d, open(path, 'w'), indent=2)
"
sudo systemctl restart docker

This makes all containers use the host cgroup namespace, which is what k3s expects. The existing containers (open-webui, pipelines, etc.) survived the restart without issues.
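Before restarting Docker it is worth confirming the key actually landed in daemon.json. A sketch, reusing python3 the same way the write above does (the helper name is mine):

```shell
# Return success if the given daemon.json sets default-cgroupns-mode=host.
has_host_cgroupns() {
  python3 - "$1" <<'EOF'
import json, sys
try:
    d = json.load(open(sys.argv[1]))
except (OSError, ValueError):
    sys.exit(1)  # missing or malformed file counts as "not set"
sys.exit(0 if d.get("default-cgroupns-mode") == "host" else 1)
EOF
}

has_host_cgroupns /etc/docker/daemon.json \
  && echo "daemon.json OK, safe to restart docker" \
  || echo "setting missing, add it before restarting"
```

This catches the common failure of editing the wrong file (or writing invalid JSON, which would stop the Docker daemon from starting at all).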

Step 6: Get an NVIDIA API Key

nemoclaw setup-spark and nemoclaw onboard both require a key from build.nvidia.com/settings/api-keys. The key starts with nvapi-. The free tier is valid for 30 days.

The key can be passed as an environment variable to skip the interactive prompt:

export NVIDIA_API_KEY=nvapi-...

Step 7: Run setup-spark

NVIDIA_API_KEY=nvapi-... nemoclaw setup-spark

Output:

>>> User 'coolthor' already in docker group
>>> Docker daemon already configured for cgroupns=host

>>> DGX Spark Docker configuration complete.
>>> Next step: run 'nemoclaw onboard' to set up your sandbox.

setup-spark checks Docker group membership and the daemon config, then exits. On GX10 — after the manual fixes above — it completes in under a second with nothing to do. On a fresh DGX Spark straight from the box it would apply the fixes automatically.

Step 8: Run onboard

The default nemoclaw onboard is interactive — it prompts for sandbox name, model selection, and other configuration. Since this was running over SSH without a TTY, the interactive mode stalled. Passing NEMOCLAW_NON_INTERACTIVE=1 uses defaults throughout:

NVIDIA_API_KEY=nvapi-... NEMOCLAW_NON_INTERACTIVE=1 nemoclaw onboard

The output scrolled through 7 stages:

[1/7] Preflight checks
  ✓ Docker is running
  ✓ openshell CLI: openshell 0.0.13
  ✓ Port 8080 available (OpenShell gateway)
  ✓ NVIDIA GPU detected: 1 GPU(s), 122502 MB VRAM

[2/7] Starting OpenShell gateway
  Gateway nemoclaw ready.
  ✓ Active gateway set to 'nemoclaw'

[3/7] Creating sandbox
...

[7/7] Done

The k3s startup logs printed during stage 2 are verbose — hundreds of lines of Kubernetes reconciler output. They're not errors. The process completed cleanly.

Verification:

nemoclaw status
# Sandboxes:
#   my-assistant * (nvidia/nemotron-3-super-120b-a12b)

nemoclaw my-assistant status
# Phase: Ready

The sandbox is up, policy enforcement is active, and the agent is connected to nvidia/nemotron-3-super-120b-a12b via NVIDIA's cloud inference API.

What Was Gained

What cost the most time: the OpenShell binary URL. spark-install.md documents a URL that 404s on the current release. The actual asset is a tar.gz with a different naming convention. The failure is easy to miss: -s suppresses curl's progress output, and if the script never checks the exit code it marches on; without -f, curl even saves the HTML error page as the binary, which then fails with a confusing "Exec format error" on execution. Running curl -v (or simply checking the exit code) would have shown the 404 immediately.

Transferable diagnostic: When a binary fails with Exec format error on Linux, the first thing to check is whether the download actually succeeded. file <binary> will show ASCII text instead of ELF if you saved an HTML page. GitHub's release asset naming is not consistent across versions — always look at the actual release page, not docs that reference a specific URL.
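That diagnostic reduces to the file's magic bytes: every ELF binary starts with 0x7F followed by "ELF". A minimal detector (the function name and wording are mine):

```shell
# Distinguish a real ELF binary from a saved HTML error page by
# reading the first four bytes: ELF files start with \177ELF.
check_download() {
  if [ "$(head -c 4 "$1" | tail -c 3)" = "ELF" ]; then
    echo "$1: ELF binary, download looks good"
  else
    echo "$1: not an ELF binary, likely a saved error page; re-download"
  fi
}

check_download /usr/local/bin/openshell
```

Unlike file(1), this works on minimal images where the file utility is not installed.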

The pattern that applies everywhere: curl | bash installers on non-x86 hardware with strict bash mode have two independent failure points: the download step (architecture mismatch, URL rot) and the script's own bash compatibility. Both can fail silently in ways that look like a complete install failure when only part of the process failed. Check artifacts after every step.

Installation Checklist

For NemoClaw on GX10 / DGX Spark (aarch64, Ubuntu 24.04, cgroup v2):

  1. node --version → must be 20+. If not: install via NodeSource v22
  2. curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash → ignore exit code 243
  3. cd ~/.nemoclaw/source && sudo npm link
  4. Download OpenShell from GitHub releases (tar.gz), not the URL in the docs
  5. Check stat -fc %T /sys/fs/cgroup/ → if cgroup2fs, add cgroupns=host to daemon.json
  6. Get NVIDIA API key from build.nvidia.com (30-day free)
  7. NVIDIA_API_KEY=nvapi-... nemoclaw setup-spark
  8. NVIDIA_API_KEY=nvapi-... NEMOCLAW_NON_INTERACTIVE=1 nemoclaw onboard
  9. nemoclaw status → should show sandbox in Ready phase

Also in this series: Part 1 — NemoClaw: What It Is, Why It Exists, and How It Works

FAQ

Why does the NemoClaw installer fail on DGX Spark / GX10?
The installer fails for four reasons not covered in official docs: Node.js must be v20+ (GX10 ships with v18), npm link needs sudo on system-managed Node, the OpenShell binary URL in the docs returns 404 (the actual release is a tar.gz), and Ubuntu 24.04's cgroup v2 requires adding cgroupns=host to Docker daemon.json before onboard.
How do I install OpenShell on aarch64 Linux for NemoClaw?
Download the tar.gz from GitHub releases (not the binary URL in the docs, which 404s). For aarch64: curl the openshell-aarch64-unknown-linux-musl.tar.gz from the latest release, extract it, and move the binary to /usr/local/bin/openshell. For x86_64, use the x86_64-unknown-linux-musl variant.
What is the 'Exec format error' when running openshell on Linux?
This usually means the download failed: curl saved an HTML error page instead of the actual binary (which happens when -f is omitted, so the 404 response body is written to disk). Run 'file openshell' to check: it will show 'ASCII text' or 'HTML document' instead of 'ELF'. Re-download from the correct GitHub release URL (tar.gz format).
How do I fix the cgroup v2 error during NemoClaw onboard?
Ubuntu 24.04 uses cgroup v2 by default, but OpenShell's internal k3s expects cgroup v1 paths. Add '"default-cgroupns-mode": "host"' to /etc/docker/daemon.json and restart Docker. Verify with 'stat -fc %T /sys/fs/cgroup/' — if it shows 'cgroup2fs', the fix is needed.