Model Triage with Nimakai
Nimakai is an NVIDIA NIM model latency benchmarker written in Nim. Use it to find which models are responsive before configuring opencode and oh-my-opencode (OMO).
Installation
1. Install the Nim toolchain
curl https://nim-lang.org/choosenim/init.sh -sSf | bash -s -- -y
export PATH="$HOME/.nimble/bin:$PATH"
Requires Nim >= 2.0.0 (tested with 2.2.8).
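To confirm the toolchain meets the version requirement before building, a small check like the one below may help. It is a sketch, not part of nimakai: the helper names version_ge and installed_nim_version are hypothetical, it assumes GNU sort (for the -V flag), and it assumes the first line of nim --version contains an x.y.z version number.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the first x.y.z number found on the first line of `nim --version`.
# Prints nothing if nim is not on PATH.
installed_nim_version() {
  nim --version 2>/dev/null | head -n1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1
}

# Example:
#   version_ge "$(installed_nim_version)" 2.0.0 || echo "Nim >= 2.0.0 is required" >&2
```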
2. Install system dependencies
apt-get install -y libssl-dev
3. Build nimakai
git clone https://github.com/dirmacs/nimakai.git /opt/nimakai
cd /opt/nimakai
nimble build
4. Set your API key
export NVIDIA_API_KEY=$(grep NVIDIA_API_KEY ~/.config/opencode/.env | cut -d= -f2-)
Usage
Quick benchmark
Run a single round against all models:
./nimakai list
This pings every model in the catalog and displays a table sorted by average latency, showing each model's health status (UP, TIMEOUT, ERROR, NOT_FOUND) and verdict (Perfect, Normal, Slow, Very Slow, Unstable).
Continuous monitoring
./nimakai roulette
Interactive TUI with live-updating metrics. Sort with keyboard: A (avg), P (p95), S (stability), T (tier), N (name), U (uptime).
Model discovery
./nimakai discover
Compares live NVIDIA API models against the built-in catalog to find new models.
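Conceptually, this comparison is a set difference. The sketch below illustrates the idea only and is not nimakai's implementation: it assumes you already have two text files with one model ID per line. The function name new_models and the file names are hypothetical.

```shell
# Prints model IDs that appear in the live list ($1) but not in the catalog list ($2).
# Both arguments are text files with one model ID per line.
# grep flags: -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
new_models() {
  grep -vxF -f "$2" "$1" | sort -u
}

# Example:
#   new_models live.txt catalog.txt
```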
Agent recommendations
./nimakai recommend
Suggests optimal model→agent assignments based on latency and capability.
Interpreting results
| Verdict | Latency | Suitability |
|---|---|---|
| Perfect | <500ms | Any agent role |
| Normal | 500ms-1s | Most agent roles |
| Slow | 1-3s | Heavy tasks only (hephaestus) |
| Very Slow | 3-10s | Avoid for agents |
| Unstable | >10s | Do not use |
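The latency bands in the table can be expressed as a small helper, for example when post-processing your own measurements. This is a sketch that mirrors the table only; nimakai's own verdict logic may weigh other factors. The function name latency_verdict is hypothetical, and because the table does not say which band owns the boundary values, the sketch assumes each upper bound is inclusive.

```shell
# Maps an average latency in whole milliseconds to a verdict label from the table.
# Assumption: upper bounds are inclusive (1000 ms is Normal, 3000 ms is Slow,
# 10000 ms is Very Slow); only values above 10000 ms count as Unstable.
latency_verdict() {
  local ms=$1
  if   [ "$ms" -lt 500 ];   then echo "Perfect"
  elif [ "$ms" -le 1000 ];  then echo "Normal"
  elif [ "$ms" -le 3000 ];  then echo "Slow"
  elif [ "$ms" -le 10000 ]; then echo "Very Slow"
  else                           echo "Unstable"
  fi
}

# Example:
#   latency_verdict 208
```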
Key insight: Ping latency does not predict agent task completion time. A model at 300ms ping might complete an agent task in 2s, while a 370ms model takes 20s. The difference is tool-use capability.
Direct tool-use test
After nimakai identifies responsive models, verify each model's tool-use capability with a direct API call before adding it to OMO:
curl -sS --max-time 20 "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MODEL_ID_HERE",
    "messages": [{"role": "user", "content": "Read the file at /etc/hostname"}],
    "tools": [{"type": "function", "function": {"name": "read_file",
      "description": "Read file contents",
      "parameters": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}}}],
    "max_tokens": 256, "stream": false
  }'
If the response contains a non-empty tool_calls array (under choices[0].message in the response JSON), the model can do agent work. If it returns plain text instead, it cannot.
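Checking the response by eye gets tedious across many models. The helper below is a minimal sketch of an automated check, assuming jq is installed and that the response follows the OpenAI-style chat-completions layout (choices[0].message.tool_calls). The function name has_tool_calls is hypothetical.

```shell
# Reads a chat-completions JSON response on stdin.
# Exits 0 only when the first choice's message carries a non-empty tool_calls array.
# Error payloads, plain-text answers, and non-JSON input all exit non-zero.
has_tool_calls() {
  jq -e '(.choices[0].message.tool_calls // []) | length > 0' >/dev/null 2>&1
}

# Example: pipe the output of the curl command above into the helper.
#   curl ... | has_tool_calls && echo "tool-use OK" || echo "no tool call returned"
```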
Known model quirks (2026-03-13)
| Model | Issue |
|---|---|
| MiniMax M2 | 410 Gone — decommissioned from NIM |
| MiniMax M2.1 | Pings OK but hangs on agent tool-use tasks |
| Kimi K2.5 | Intermittent timeouts — NIM server-side |
| Mistral Medium 3 | Fast ping (208ms) but cannot do tool-use — returns text |
| Nemotron Super 49B | Tool-use works via curl but too slow for OMO timeouts |
| Nemotron 3 Super | 1M context, agentic-optimized. Use temperature=1.0, top_p=0.95 |
| Qwen 3.5 VLM | Correct model ID is qwen/qwen3.5-397b-a17b, not qwen3.5-400b |
Workflow: selecting models for aegis
- Run ./nimakai list to identify responsive models
- Run the direct tool-use curl test (above) to verify agent capability
- Update /opt/aegis/example/modules/ai-tools/opencode.toml with verified models
- Regenerate: aegis opencode generate --input example/modules/ai-tools/opencode.toml
- Test agents: npx oh-my-opencode run --port 6000 --agent <name> --directory <dir> "<prompt>"
- Always use --port 6000 or higher; default ports 4096-4100 get stuck from zombie servers
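If port 6000 itself might already be taken, a helper like the following can pick the next free one. It is a sketch under two assumptions: the shell is bash (it uses bash's /dev/tcp redirection), and a port counts as busy only if something accepts connections on 127.0.0.1. The function names port_in_use and first_free_port are hypothetical.

```shell
# Succeeds when something accepts TCP connections on 127.0.0.1:$1.
# The connection is opened in a subshell, so it is closed again immediately.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints the first port at or above $1 (default 6000) with no local listener.
# Returns non-zero if every port up to 65535 is busy.
first_free_port() {
  local port=${1:-6000}
  while [ "$port" -le 65535 ]; do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}

# Example:
#   npx oh-my-opencode run --port "$(first_free_port 6000)" --agent <name> --directory <dir> "<prompt>"
```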