Introduction

ARES is a multi-provider LLM platform that gives you a single API for routing requests across Groq, Anthropic, NVIDIA (DeepSeek), and Ollama. It handles tool calling, retrieval-augmented generation (RAG), multi-step workflows, streaming, usage metering, and multi-tenant isolation out of the box, so you can focus on building your AI application instead of stitching together provider SDKs.

Key capabilities

  • Multi-provider LLM routing — Send requests to Groq, Anthropic, NVIDIA, or Ollama through one API. Switch models without changing your integration.
  • Tool calling — Define tools your agents can invoke. ARES manages the tool-call loop, execution, and response assembly.
  • Retrieval-augmented generation (RAG) — Ground LLM responses in your own data with built-in retrieval pipelines.
  • Workflows — Chain multiple agents and processing steps into deterministic, multi-step workflows.
  • Multi-tenant enterprise support — Tenant isolation, per-tenant agent configuration, API key scoping, and usage tracking at the tenant level.
  • Streaming — Server-Sent Events (SSE) streaming for real-time, token-by-token responses.
  • Usage metering — Track tokens, requests, and costs per tenant with built-in rate limiting and quota enforcement.

Who is ARES for?

  • Platform teams building internal AI infrastructure who need a reliable, multi-provider abstraction layer.
  • Enterprise clients who want managed AI agents with tenant isolation, usage visibility, and SLA guarantees.
  • Developers building AI applications who want a clean API without managing provider credentials, rate limits, and failover logic themselves.

Base URL

All API requests are made to:

https://api.ares.dirmacs.com

| Resource | Description |
| --- | --- |
| Quickstart | Zero to first API call in 5 minutes |
| Authentication | API keys, JWT tokens, and admin auth |
| Models & Providers | Available models, tiers, and provider configuration |
| Changelog | Release history and breaking changes |

Quickstart

Get from zero to your first ARES API call in under 5 minutes.

Prerequisites

  • An ARES API key (format: ares_xxx). Contact your administrator or use the Dirmacs Admin provisioning UI to generate one.

1. Make your first chat request

Send a message to an ARES agent using the chat endpoint.

curl

curl -X POST https://api.ares.dirmacs.com/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What can you help me with?",
    "agent_type": "product"
  }'

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/v1/chat",
    headers={
        "Authorization": "Bearer ares_xxx",
        "Content-Type": "application/json",
    },
    json={
        "message": "What can you help me with?",
        "agent_type": "product",
    },
)

data = response.json()
print(data["response"])

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/v1/chat", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    message: "What can you help me with?",
    agent_type: "product",
  }),
});

const data = await response.json();
console.log(data.response);

Response

{
  "response": "I can help you with product information, recommendations, and questions...",
  "agent": "product",
  "context_id": "ctx_a1b2c3d4"
}

The context_id is returned with every response. Pass it back in subsequent requests to maintain conversation context.

2. Try streaming

For real-time, token-by-token output, use the streaming endpoint. ARES streams responses using Server-Sent Events (SSE).

curl

curl -N -X POST https://api.ares.dirmacs.com/v1/chat/stream \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Explain how LLM routing works",
    "agent_type": "product"
  }'

The -N flag disables output buffering so you see tokens as they arrive.

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/v1/chat/stream",
    headers={
        "Authorization": "Bearer ares_xxx",
        "Content-Type": "application/json",
    },
    json={
        "message": "Explain how LLM routing works",
        "agent_type": "product",
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            print(decoded[6:], end="", flush=True)

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/v1/chat/stream", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_xxx",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    message: "Explain how LLM routing works",
    agent_type: "product",
  }),
});

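const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line until the rest arrives

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // the last piece may be an incomplete line

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      process.stdout.write(line.slice(6)); // Node.js; append to the DOM in browsers
    }
  }
}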

3. Continue a conversation

Use the context_id from a previous response to maintain conversation history:

curl -X POST https://api.ares.dirmacs.com/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Tell me more about that",
    "agent_type": "product",
    "context_id": "ctx_a1b2c3d4"
  }'
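
The same flow can be wrapped in a small Python helper that remembers the context_id between turns. This is a minimal sketch rather than an official client: the Conversation class and its send callable are hypothetical names introduced here, and send stands in for whatever function actually POSTs the JSON body to /v1/chat and returns the decoded response.

```python
from typing import Callable, Optional


class Conversation:
    """Tracks the context_id returned by ARES across turns (illustrative helper)."""

    def __init__(self, send: Callable[[dict], dict], agent_type: str = "product"):
        self._send = send  # hypothetical transport: payload dict -> response dict
        self.agent_type = agent_type
        self.context_id: Optional[str] = None

    def build_payload(self, message: str) -> dict:
        payload = {"message": message, "agent_type": self.agent_type}
        if self.context_id is not None:
            # Only include context_id once a previous response has supplied one.
            payload["context_id"] = self.context_id
        return payload

    def ask(self, message: str) -> str:
        data = self._send(self.build_payload(message))
        # Remember the context for the next turn.
        self.context_id = data.get("context_id", self.context_id)
        return data["response"]
```

Keeping the transport separate from the bookkeeping means the context handling can be exercised with a stub before any real request is made.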

Next steps

  • Authentication — Learn about API keys, JWT tokens, and admin authentication.
  • Models & Providers — Understand which models are available and how to choose the right one.

Authentication

ARES supports three authentication methods, each designed for a different use case.

| Method | Header | Routes | Use case |
| --- | --- | --- | --- |
| API Key | Authorization: Bearer ares_xxx | /v1/* | Client applications, backend services |
| JWT | Authorization: Bearer <access_token> | /api/* | End-user sessions, frontend apps |
| Admin Secret | X-Admin-Secret: <secret> | /api/admin/* | Internal administration |
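
As a rough illustration of how the three methods map onto route prefixes, the sketch below picks the header for a given path. The function name auth_header_for and its keyword arguments are hypothetical, not part of ARES. It also ignores routes that are documented without credentials, such as /api/auth/login and GET /api/agents.

```python
def auth_header_for(path: str, *, api_key: str = "", jwt: str = "",
                    admin_secret: str = "") -> dict:
    """Return the auth header for an ARES route, following the table above."""
    # Check the most specific prefix first: /api/admin/* also starts with /api/.
    if path.startswith("/api/admin/"):
        return {"X-Admin-Secret": admin_secret}
    if path.startswith("/api/"):
        return {"Authorization": f"Bearer {jwt}"}
    if path.startswith("/v1/"):
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"No authentication method documented for {path!r}")
```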

API Key authentication

API keys are the simplest way to authenticate with ARES. Each key is scoped to a single tenant and carries that tenant’s permissions and rate limits.

Format: ares_ followed by a random string (e.g., ares_k7Gx9mPqR2vLwN4s).

How to get one: API keys are generated during tenant provisioning via the Dirmacs Admin dashboard, or through the admin API.

Usage

Pass the API key in the Authorization header on any /v1/* endpoint:

curl -X POST https://api.ares.dirmacs.com/v1/chat \
  -H "Authorization: Bearer ares_k7Gx9mPqR2vLwN4s" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello", "agent_type": "product"}'

Python

import requests

headers = {
    "Authorization": "Bearer ares_k7Gx9mPqR2vLwN4s",
    "Content-Type": "application/json",
}

response = requests.post(
    "https://api.ares.dirmacs.com/v1/chat",
    headers=headers,
    json={"message": "Hello", "agent_type": "product"},
)

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/v1/chat", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_k7Gx9mPqR2vLwN4s",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ message: "Hello", agent_type: "product" }),
});

Security: Treat API keys like passwords. Do not embed them in client-side code, commit them to version control, or expose them in logs. Use environment variables or a secrets manager.
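
One way to follow this advice in Python is to read the key from the environment when building headers. This is a sketch under assumptions: ARES_API_KEY is a variable name chosen here for illustration, not one that ARES defines, and api_key_headers is a hypothetical helper.

```python
import os
from typing import Mapping


def api_key_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build request headers from an API key held in the environment."""
    key = env.get("ARES_API_KEY", "")  # ARES_API_KEY is an assumed variable name
    if not key.startswith("ares_"):
        raise RuntimeError("ARES_API_KEY is missing or does not look like an ARES key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early when the variable is unset avoids sending a request with an empty bearer token.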


JWT authentication

JWT authentication is designed for end-user sessions. Users register and log in to receive short-lived access tokens and long-lived refresh tokens.

  • Access tokens expire after 15 minutes.
  • Refresh tokens are used to obtain new access tokens without re-entering credentials.

Register a new user

curl -X POST https://api.ares.dirmacs.com/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "email": "developer@example.com",
    "password": "your-secure-password",
    "name": "Jane Developer"
  }'

Response:

{
  "message": "Registration successful",
  "user_id": "usr_abc123"
}

Log in

curl -X POST https://api.ares.dirmacs.com/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "email": "developer@example.com",
    "password": "your-secure-password"
  }'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "refresh_token": "rt_x9Kp2mQvL8wN3rTs...",
  "expires_in": 900
}

Use the access token

Pass the access token in the Authorization header on any /api/* endpoint:

curl -X POST https://api.ares.dirmacs.com/api/chat \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello", "agent_type": "product"}'

Refresh an expired token

When your access token expires, use the refresh token to get a new one:

curl -X POST https://api.ares.dirmacs.com/api/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{
    "refresh_token": "rt_x9Kp2mQvL8wN3rTs..."
  }'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "expires_in": 900
}

Log out

Invalidate a refresh token when the user logs out:

curl -X POST https://api.ares.dirmacs.com/api/auth/logout \
  -H "Content-Type: application/json" \
  -d '{
    "refresh_token": "rt_x9Kp2mQvL8wN3rTs..."
  }'

Token management in Python

import requests
import time


class AresClient:
    def __init__(self, base_url="https://api.ares.dirmacs.com"):
        self.base_url = base_url
        self.access_token = None
        self.refresh_token = None
        self.token_expiry = 0

    def login(self, email, password):
        response = requests.post(
            f"{self.base_url}/api/auth/login",
            json={"email": email, "password": password},
        )
        response.raise_for_status()  # fail on 401/403 instead of a KeyError below
        data = response.json()
        self.access_token = data["access_token"]
        self.refresh_token = data["refresh_token"]
        self.token_expiry = time.time() + data["expires_in"]

    def _ensure_valid_token(self):
        if time.time() >= self.token_expiry - 30:  # Refresh 30s before expiry
            response = requests.post(
                f"{self.base_url}/api/auth/refresh",
                json={"refresh_token": self.refresh_token},
            )
            response.raise_for_status()  # an invalid refresh token requires a new login
            data = response.json()
            self.access_token = data["access_token"]
            self.token_expiry = time.time() + data["expires_in"]

    def chat(self, message, agent_type="product"):
        self._ensure_valid_token()
        response = requests.post(
            f"{self.base_url}/api/chat",
            headers={"Authorization": f"Bearer {self.access_token}"},
            json={"message": message, "agent_type": agent_type},
        )
        response.raise_for_status()
        return response.json()

Token management in JavaScript

class AresClient {
  constructor(baseUrl = "https://api.ares.dirmacs.com") {
    this.baseUrl = baseUrl;
    this.accessToken = null;
    this.refreshToken = null;
    this.tokenExpiry = 0;
  }

  async login(email, password) {
    const response = await fetch(`${this.baseUrl}/api/auth/login`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ email, password }),
    });
    const data = await response.json();
    this.accessToken = data.access_token;
    this.refreshToken = data.refresh_token;
    this.tokenExpiry = Date.now() + data.expires_in * 1000;
  }

  async ensureValidToken() {
    if (Date.now() >= this.tokenExpiry - 30000) {
      const response = await fetch(`${this.baseUrl}/api/auth/refresh`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ refresh_token: this.refreshToken }),
      });
      const data = await response.json();
      this.accessToken = data.access_token;
      this.tokenExpiry = Date.now() + data.expires_in * 1000;
    }
  }

  async chat(message, agentType = "product") {
    await this.ensureValidToken();
    const response = await fetch(`${this.baseUrl}/api/chat`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${this.accessToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ message, agent_type: agentType }),
    });
    return response.json();
  }
}

Admin Secret authentication

The admin secret provides full access to ARES administration endpoints. It is intended for internal tools and the Dirmacs Admin dashboard only.

Pass the secret in the X-Admin-Secret header:

curl https://api.ares.dirmacs.com/api/admin/tenants \
  -H "X-Admin-Secret: your-admin-secret"

Warning: The admin secret grants unrestricted access to all tenants, agents, and configuration. Never expose it outside your infrastructure. It should only be used in server-to-server calls from trusted internal services.


Error responses

Authentication failures return standard HTTP status codes:

| Status | Meaning |
| --- | --- |
| 401 Unauthorized | Missing or invalid credentials |
| 403 Forbidden | Valid credentials but insufficient permissions |
| 429 Too Many Requests | Rate limit exceeded for this API key or tenant |

Example error response:

{
  "error": "Invalid or expired token",
  "code": "AUTH_INVALID_TOKEN"
}
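
A client might branch on these status codes roughly as follows. This is only a sketch: describe_auth_failure is a hypothetical helper, and the suggested actions are common client-side conventions rather than behavior that ARES prescribes.

```python
def describe_auth_failure(status: int, body: dict) -> str:
    """Map an authentication-related failure to a suggested client action."""
    code = body.get("code", "UNKNOWN")
    if status == 401:
        action = "refresh the access token or check the API key, then retry"
    elif status == 403:
        action = "do not retry; the credentials lack permission for this route"
    elif status == 429:
        action = "back off and retry later"
    else:
        action = "not an authentication failure; handle separately"
    return f"{status} {code}: {action}"
```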

Models & Providers

ARES routes LLM requests across multiple providers through a single API. You do not call providers directly — ARES selects the appropriate model based on the agent configuration and handles credentials, rate limits, and failover transparently.

Available models

| Tier | Provider | Model | Best for |
| --- | --- | --- | --- |
| fast | Groq | llama-3.1-8b-instant | Quick responses, classification, simple Q&A |
| balanced | Groq | llama-3.3-70b-versatile | General-purpose tasks, GPT-4 class quality |
| powerful | Anthropic | claude-sonnet-4-6 | Complex reasoning, long-form analysis, nuanced tasks |
| deepseek | NVIDIA | deepseek-v3.2 | Code generation, technical documentation, structured output |
| local | Ollama | ministral-3:3b | Development, testing, offline use |

How model selection works

You do not specify a model directly in your API calls. Instead, you specify an agent_type, and each agent is configured with a model tier.

# This request is routed to whichever model the "product" agent is configured to use
curl -X POST https://api.ares.dirmacs.com/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{"message": "Compare these two options", "agent_type": "product"}'

The mapping between agents and models is configured by your tenant administrator. A typical setup might look like:

| Agent | Model tier | Rationale |
| --- | --- | --- |
| classifier | fast | Needs speed, not depth |
| product | balanced | General-purpose, good quality |
| analyst | powerful | Complex reasoning required |
| code-review | deepseek | Specialized for code tasks |

This design means you can upgrade an agent’s underlying model without changing any client code.
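
The indirection can be pictured as two lookups: agent to tier, then tier to model. The sketch below copies the values from the two tables in this section. The dictionaries and resolve_model are illustrative only and say nothing about how ARES stores this configuration internally.

```python
# Tier -> (provider, model), copied from the "Available models" table.
TIERS = {
    "fast": ("Groq", "llama-3.1-8b-instant"),
    "balanced": ("Groq", "llama-3.3-70b-versatile"),
    "powerful": ("Anthropic", "claude-sonnet-4-6"),
    "deepseek": ("NVIDIA", "deepseek-v3.2"),
    "local": ("Ollama", "ministral-3:3b"),
}

# Agent -> tier, copied from the "typical setup" table; real mappings are per tenant.
AGENT_TIERS = {
    "classifier": "fast",
    "product": "balanced",
    "analyst": "powerful",
    "code-review": "deepseek",
}


def resolve_model(agent_type: str) -> tuple:
    """Look up the provider and model behind an agent (illustration only)."""
    tier = AGENT_TIERS[agent_type]
    return TIERS[tier]
```

In this picture, moving the product agent to a stronger model is a one-line change to the agent-to-tier mapping, while the client keeps sending agent_type "product".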

Provider architecture

ARES uses a named-provider system. Each provider is configured with its API endpoint, credentials, and rate limits. Models reference their provider by name.

┌─────────────┐
│  Your App   │
│  agent_type │
└──────┬──────┘
       │
       ▼
┌─────────────┐     ┌──────────┐
│    ARES     │────▶│   Groq   │  fast, balanced
│   Router    │     └──────────┘
│             │     ┌──────────┐
│             │────▶│Anthropic │  powerful
│             │     └──────────┘
│             │     ┌──────────┐
│             │────▶│  NVIDIA  │  deepseek
│             │     └──────────┘
│             │     ┌──────────┐
│             │────▶│  Ollama  │  local
└─────────────┘     └──────────┘

Provider details

Groq — High-throughput inference on custom LPUs. Extremely fast response times. Hosts open-source models (Llama, Mixtral). Free tier available with rate limits.

Anthropic — Claude models. Best-in-class for complex reasoning, instruction following, and safety. Requires a paid API key.

NVIDIA (DeepSeek) — NVIDIA-hosted DeepSeek models via the NVIDIA AI API. Strong at code generation and structured technical output.

Ollama — Self-hosted, local inference. No external API calls. Useful for development, air-gapped environments, or when you need to keep data on-premises.

Rate limits

Rate limits are enforced per provider and per tenant. The following are default limits for the Groq free tier:

| Model tier | Requests per day | Tokens per minute |
| --- | --- | --- |
| fast (llama-3.1-8b) | 14,400 | 20,000 |
| balanced (llama-3.3-70b) | 6,000 | 6,000 |

Anthropic and NVIDIA rate limits depend on your API plan with those providers. ARES surfaces rate limit errors transparently:

{
  "error": "Rate limit exceeded for provider 'groq'",
  "code": "RATE_LIMIT_EXCEEDED",
  "retry_after": 60
}

Tenant-level rate limits and quotas are configured separately by your administrator and enforced by ARES regardless of provider limits.
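
A client can use the retry_after field to decide how long to wait before retrying. The sketch below assumes retry_after is expressed in seconds, which this page does not state. call_with_retry and its send callable, which returns a (status_code, decoded_body) pair, are hypothetical names.

```python
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], tuple],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() and wait retry_after seconds whenever it reports a 429.

    send is a hypothetical callable returning (status_code, decoded_json_body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return body
        if attempt == max_attempts:
            break
        # Fall back to exponential backoff if the body has no retry_after field.
        sleep(body.get("retry_after", 2 ** attempt))
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")
```

The sleep function is passed in so the waiting behavior can be checked without actually pausing.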

Adding your own providers

If you are self-hosting ARES, you can add providers in your ares.toml configuration:

[[providers]]
name = "my-openai"
kind = "openai"
api_base = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"

[[models]]
name = "gpt-4o"
provider = "my-openai"
model_id = "gpt-4o"
tier = "powerful"

Any provider that exposes an OpenAI-compatible API (vLLM, Together AI, Fireworks, etc.) can be added using the openai provider kind.
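
Because models refer to providers by name, a typo in the provider field is an easy mistake to make. The following sketch checks a parsed configuration for such dangling references. It assumes the file has already been loaded into a dict (for example with tomllib in Python 3.11 or later), and check_model_providers is a hypothetical helper, not an ARES tool.

```python
def check_model_providers(config: dict) -> list:
    """Return names of models whose provider is not declared under [[providers]]."""
    declared = {p["name"] for p in config.get("providers", [])}
    return [m["name"] for m in config.get("models", []) if m["provider"] not in declared]


# The ares.toml example above, as the dict a TOML parser would produce.
config = {
    "providers": [
        {
            "name": "my-openai",
            "kind": "openai",
            "api_base": "https://api.openai.com/v1",
            "api_key_env": "OPENAI_API_KEY",
        }
    ],
    "models": [
        {"name": "gpt-4o", "provider": "my-openai", "model_id": "gpt-4o", "tier": "powerful"}
    ],
}

print(check_model_providers(config))  # prints []
```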

Choosing the right tier

| If you need… | Use tier |
| --- | --- |
| Fastest possible response | fast |
| Good quality at reasonable speed | balanced |
| Maximum reasoning capability | powerful |
| Code generation or technical tasks | deepseek |
| Offline or local development | local |
Offline or local developmentlocal

When in doubt, start with balanced. It provides the best trade-off between quality, speed, and cost for most use cases.

Chat & Conversations

Send messages to ARES agents and manage multi-turn conversations.


Send a message

POST /api/chat

Send a message to an agent and receive a response. ARES routes the message to the appropriate agent based on the agent_type parameter, or uses the default router agent if none is specified.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| message | string | Yes | The user’s message or prompt. |
| agent_type | string | No | Which agent handles the request (e.g., "product", "research", "router"). Defaults to the router agent. |
| context_id | string | No | Conversation context ID. Pass this value back on subsequent requests to continue a multi-turn conversation. |

Response

{
  "response": "Here's what I found about your question...",
  "agent": "product",
  "context_id": "ctx_a1b2c3d4",
  "sources": null
}

| Field | Type | Description |
| --- | --- | --- |
| response | string | The agent’s response text. |
| agent | string | The agent that handled the request. |
| context_id | string | Context identifier. Pass this back to continue the conversation. |
| sources | array or null | Source references, if the agent performed retrieval. Otherwise null. |
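
In typed client code, the response fields can be captured in a small data class. This is a sketch: ChatResponse is a name chosen here, not something ARES provides. The sources field is read leniently because the /v1/chat example in the Quickstart omits it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResponse:
    response: str
    agent: str
    context_id: str
    sources: Optional[list] = None  # None when the agent performed no retrieval

    @classmethod
    def from_json(cls, data: dict) -> "ChatResponse":
        # Use .get() for sources so a body without the field still parses.
        return cls(
            response=data["response"],
            agent=data["agent"],
            context_id=data["context_id"],
            sources=data.get("sources"),
        )
```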

Examples

curl

curl -X POST https://api.ares.dirmacs.com/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "message": "What pricing plans do you offer?",
    "agent_type": "product"
  }'

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/api/chat",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "message": "What pricing plans do you offer?",
        "agent_type": "product"
    }
)

data = response.json()
print(data["response"])

# Continue the conversation using the returned context_id
follow_up = requests.post(
    "https://api.ares.dirmacs.com/api/chat",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "message": "How does the Pro plan compare to Enterprise?",
        "context_id": data["context_id"]
    }
)

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/api/chat", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    message: "What pricing plans do you offer?",
    agent_type: "product"
  })
});

const data = await response.json();
console.log(data.response);

// Continue the conversation
const followUp = await fetch("https://api.ares.dirmacs.com/api/chat", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    message: "How does the Pro plan compare to Enterprise?",
    context_id: data.context_id
  })
});

Stream a response

POST /api/chat/stream

Send a message and receive the response as a stream of Server-Sent Events (SSE). Each event contains a text chunk. This is the recommended approach for user-facing applications where you want to display the response as it is generated.

The request body is identical to POST /api/chat.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Response format

The response uses the text/event-stream content type. Each SSE event contains a chunk of the agent’s response:

data: Here's
data:  what I
data:  found about
data:  your question...

Collect all chunks to form the complete response. The connection closes automatically when the response is complete.
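
Collecting the chunks amounts to stripping the data: prefix from each line and concatenating what remains. The sketch below assumes, as the sample above suggests, that every data: line carries one text chunk. collect_sse_text is a hypothetical helper, and it ignores other SSE fields and multi-line events.

```python
from typing import Iterable


def collect_sse_text(lines: Iterable[str]) -> str:
    """Concatenate the payloads of data: lines into the full response text."""
    chunks = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):]
        # In the SSE format, one space after the colon is a separator, not data.
        if payload.startswith(" "):
            payload = payload[1:]
        chunks.append(payload)
    return "".join(chunks)


stream = ["data: Here's", "data:  what I", "data:  found about", "data:  your question..."]
print(collect_sse_text(stream))  # prints: Here's what I found about your question...
```

Only a single leading space is removed, so the second space in a line such as "data:  what I" survives as the word separator.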

Examples

curl

curl -N -X POST https://api.ares.dirmacs.com/api/chat/stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -H "Accept: text/event-stream" \
  -d '{
    "message": "Explain quantum computing",
    "agent_type": "research"
  }'

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/api/chat/stream",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi...",
        "Accept": "text/event-stream"
    },
    json={
        "message": "Explain quantum computing",
        "agent_type": "research"
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[6:]  # Strip "data: " prefix
            print(chunk, end="", flush=True)

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/api/chat/stream", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi...",
    "Accept": "text/event-stream"
  },
  body: JSON.stringify({
    message: "Explain quantum computing",
    agent_type: "research"
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // carries a partial line over to the next read

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // the last piece may be an incomplete line

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const chunk = line.slice(6);
      process.stdout.write(chunk); // Node.js
      // Or append to DOM in browsers
    }
  }
}

Conversations

Manage stored conversations and their message history.

List conversations

GET /api/conversations

Returns all conversations for the authenticated user.

Authentication: JWT required.

curl https://api.ares.dirmacs.com/api/conversations \
  -H "Authorization: Bearer eyJhbGciOi..."

Get a conversation

GET /api/conversations/{id}

Returns a single conversation along with its full message history.

Authentication: JWT required.

| Parameter | Type | In | Description |
| --- | --- | --- | --- |
| id | string | path | The conversation ID |

curl https://api.ares.dirmacs.com/api/conversations/conv_abc123 \
  -H "Authorization: Bearer eyJhbGciOi..."

Update a conversation

PUT /api/conversations/{id}

Update the title of a conversation.

Authentication: JWT required.

Request body:

{
  "title": "Pricing discussion"
}

curl -X PUT https://api.ares.dirmacs.com/api/conversations/conv_abc123 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{"title": "Pricing discussion"}'

Delete a conversation

DELETE /api/conversations/{id}

Permanently delete a conversation and all its messages.

Authentication: JWT required.

curl -X DELETE https://api.ares.dirmacs.com/api/conversations/conv_abc123 \
  -H "Authorization: Bearer eyJhbGciOi..."
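
The four conversation endpoints differ only in HTTP method, path suffix, and whether a body is sent. The sketch below builds the corresponding requests with the Python standard library without sending them. conversation_request is a hypothetical helper, and the token value is whatever JWT access token you hold.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.ares.dirmacs.com"


def conversation_request(method: str, token: str, conv_id: Optional[str] = None,
                         title: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request for the conversation endpoints above."""
    url = f"{BASE_URL}/api/conversations"
    if conv_id is not None:
        url += f"/{conv_id}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if method == "PUT":
        # Only the update endpoint takes a body: the new title.
        data = json.dumps({"title": title}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Passing the returned object to urllib.request.urlopen would perform the call.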

User memory

GET /api/memory

Retrieve memory and preferences that ARES has learned from your conversations. This includes user preferences, context, and behavioral patterns the system has observed.

Authentication: JWT required.

curl https://api.ares.dirmacs.com/api/memory \
  -H "Authorization: Bearer eyJhbGciOi..."

Agents

ARES agents are autonomous units that process requests using a configured LLM model, a system prompt, and a set of tools. Each agent is specialized for a particular domain or task — routing, research, product knowledge, risk analysis, and more.

Agents are defined by four properties:

  • Model — The LLM that powers the agent (e.g., llama-3.3-70b, claude-3-5-sonnet, deepseek-r1).
  • System prompt — Instructions that shape the agent’s behavior, personality, and domain knowledge.
  • Tools — Capabilities the agent can invoke during processing (e.g., calculator, web_search, code_interpreter).
  • Name — A unique identifier used to route requests to this agent.

Agents can be platform-provided (available to all users) or user-defined (private, created via API or TOON config).


List all agents

GET /api/agents

Returns all available agents on the platform. This endpoint does not require authentication.

Response

[
  {
    "name": "router",
    "description": "Routes incoming requests to the most appropriate specialist agent.",
    "model": "llama-3.3-70b-versatile",
    "tools": []
  },
  {
    "name": "research",
    "description": "Conducts deep multi-step research with source synthesis.",
    "model": "deepseek-r1-distill-llama-70b",
    "tools": ["web_search", "calculator"]
  },
  {
    "name": "product",
    "description": "Answers product-related questions with detailed knowledge.",
    "model": "llama-3.3-70b-versatile",
    "tools": []
  }
]

Examples

curl

curl https://api.ares.dirmacs.com/api/agents

Python

import requests

response = requests.get("https://api.ares.dirmacs.com/api/agents")
agents = response.json()

for agent in agents:
    print(f"{agent['name']}: {agent['description']}")

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/api/agents");
const agents = await response.json();

agents.forEach(agent => {
  console.log(`${agent.name}: ${agent.description}`);
});

User agents

Create and manage your own custom agents. User agents are private to your account and can be configured with any available model, custom system prompts, and tool selections.

All user agent endpoints require JWT authentication: Authorization: Bearer <jwt_access_token>

List your agents

GET /api/user/agents

Returns all custom agents owned by the authenticated user.

curl https://api.ares.dirmacs.com/api/user/agents \
  -H "Authorization: Bearer eyJhbGciOi..."

Create an agent

POST /api/user/agents

Create a new custom agent.

Request body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Unique agent name (alphanumeric, hyphens). |
| model | string | Yes | LLM model identifier. |
| system_prompt | string | Yes | Instructions that define agent behavior. |
| tools | string[] | No | List of tool names the agent can use. |
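
Validating the body on the client side can catch mistakes before the request is sent. The sketch below reads "alphanumeric, hyphens" as letters, digits, and hyphens only, which is an assumption; the server's exact rule is not given here. build_agent_payload is a hypothetical helper.

```python
import re

# Assumed reading of "alphanumeric, hyphens": letters, digits, and hyphens only.
AGENT_NAME = re.compile(r"[A-Za-z0-9-]+")


def build_agent_payload(name: str, model: str, system_prompt: str, tools=None) -> dict:
    """Validate the required fields and return a body for POST /api/user/agents."""
    if not AGENT_NAME.fullmatch(name):
        raise ValueError(f"Invalid agent name: {name!r}")
    if not model or not system_prompt:
        raise ValueError("model and system_prompt are required")
    payload = {"name": name, "model": model, "system_prompt": system_prompt}
    if tools:
        payload["tools"] = list(tools)  # optional field, omitted when empty
    return payload
```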

Example

curl -X POST https://api.ares.dirmacs.com/api/user/agents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "name": "code-reviewer",
    "model": "llama-3.3-70b-versatile",
    "system_prompt": "You are an expert code reviewer. Analyze code for bugs, security issues, and style problems. Be concise and actionable.",
    "tools": ["calculator"]
  }'

Python

import requests

requests.post(
    "https://api.ares.dirmacs.com/api/user/agents",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "name": "code-reviewer",
        "model": "llama-3.3-70b-versatile",
        "system_prompt": "You are an expert code reviewer. Analyze code for bugs, security issues, and style problems. Be concise and actionable.",
        "tools": ["calculator"]
    }
)

Get agent details

GET /api/user/agents/{name}

Retrieve the full configuration of a specific user agent.

| Parameter | Type | In | Description |
| --- | --- | --- | --- |
| name | string | path | The agent’s name |

curl https://api.ares.dirmacs.com/api/user/agents/code-reviewer \
  -H "Authorization: Bearer eyJhbGciOi..."

Update an agent

PUT /api/user/agents/{name}

Update an existing agent’s configuration. You can modify the model, system prompt, or tools.

curl -X PUT https://api.ares.dirmacs.com/api/user/agents/code-reviewer \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "system_prompt": "You are a senior code reviewer specializing in Rust and TypeScript.",
    "tools": ["calculator", "web_search"]
  }'

Delete an agent

DELETE /api/user/agents/{name}

Permanently delete a user agent.

curl -X DELETE https://api.ares.dirmacs.com/api/user/agents/code-reviewer \
  -H "Authorization: Bearer eyJhbGciOi..."

TOON import/export

TOON is ARES’s agent configuration format. You can import and export agent configs as TOON to share agent definitions, back up configurations, or migrate agents between environments.

Import a TOON config

POST /api/user/agents/import

Import an agent definition from a TOON configuration file.

curl -X POST https://api.ares.dirmacs.com/api/user/agents/import \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  --data-binary @agent-config.toon

Export as TOON

GET /api/user/agents/{name}/export

Export an agent’s configuration in TOON format. Useful for sharing agent definitions or version-controlling them alongside your codebase.

curl https://api.ares.dirmacs.com/api/user/agents/code-reviewer/export \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -o code-reviewer.toon

Workflows

Workflows are multi-agent orchestration pipelines. A workflow defines an entry point agent (typically a router) that analyzes the incoming query and delegates to specialist agents in sequence. The result is a coordinated, multi-step response that leverages the strengths of different agents.

How workflows operate:

  1. The query enters through an entry agent (usually a router).
  2. The router analyzes intent and selects the most appropriate specialist agent.
  3. The specialist processes the query, optionally delegating further.
  4. Each step is recorded in the reasoning path, providing full transparency into the decision chain.
  5. The final response is returned along with metadata about the execution.
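
The steps above can be sketched as a loop that follows delegations and records each hop. This is a toy model for intuition only, not the ARES implementation: run_workflow, the Agent type, and the stub agents are all hypothetical, and real agents would call an LLM rather than return fixed strings.

```python
from typing import Callable, Dict, Optional, Tuple

# Each agent returns (action description, next agent name or None, answer or None).
Agent = Callable[[str], Tuple[str, Optional[str], Optional[str]]]


def run_workflow(query: str, agents: Dict[str, Agent], entry: str = "router",
                 max_steps: int = 10) -> dict:
    """Follow delegations from the entry agent and record each step."""
    reasoning_path, current, answer = [], entry, None
    while current is not None and len(reasoning_path) < max_steps:
        action, next_agent, answer = agents[current](query)
        reasoning_path.append({"agent": current, "action": action})
        current = next_agent
    return {
        "final_response": answer,
        "steps_executed": len(reasoning_path),
        "agents_used": [step["agent"] for step in reasoning_path],
        "reasoning_path": reasoning_path,
    }


agents = {
    "router": lambda q: ("Classified as pricing inquiry. Routing to sales agent.", "sales", None),
    "sales": lambda q: ("Answered from pricing data.", None, "stub answer"),
}
result = run_workflow("Compare Pro and Enterprise", agents)
print(result["agents_used"])  # prints ['router', 'sales']
```

The max_steps guard is a design choice of this sketch so that agents delegating to each other in a cycle cannot loop forever.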

List workflows

GET /api/workflows

Returns the names of all available workflows.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Response

["default", "research", "support"]

Example

curl https://api.ares.dirmacs.com/api/workflows \
  -H "Authorization: Bearer eyJhbGciOi..."

Execute a workflow

POST /api/workflows/{workflow_name}

Execute a named workflow. The query is routed through the workflow’s agent chain, and the final synthesized response is returned along with execution metadata.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Path parameters

| Parameter | Type | Description |
| --- | --- | --- |
| workflow_name | string | Name of the workflow to execute |

Request body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The input query or task for the workflow. |
| context | object | No | Additional context passed to agents during execution. |

Response

{
  "final_response": "Based on our analysis, the Pro plan at $49/month offers the best value for your use case. It includes 100K API calls, priority support, and access to all models. The Enterprise plan adds dedicated infrastructure and SLA guarantees, which may be worth considering if you expect to exceed 500K calls/month.",
  "steps_executed": 3,
  "agents_used": ["router", "sales", "product"],
  "reasoning_path": [
    {
      "agent": "router",
      "action": "Classified as pricing inquiry. Routing to sales agent."
    },
    {
      "agent": "sales",
      "action": "Retrieved pricing tiers. Consulting product agent for feature comparison."
    },
    {
      "agent": "product",
      "action": "Compared Pro vs Enterprise feature sets. Synthesized final recommendation."
    }
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| final_response | string | The synthesized response from the workflow. |
| steps_executed | integer | Total number of agent steps in the execution. |
| agents_used | string[] | Ordered list of agents that participated. |
| reasoning_path | array | Step-by-step trace of each agent’s reasoning and actions. |

Examples

curl

curl -X POST https://api.ares.dirmacs.com/api/workflows/default \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "query": "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
    "context": {
      "company_size": "50-200 employees",
      "expected_volume": "200K calls/month"
    }
  }'

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/api/workflows/default",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "query": "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
        "context": {
            "company_size": "50-200 employees",
            "expected_volume": "200K calls/month"
        }
    }
)

result = response.json()
print(result["final_response"])

# Inspect the reasoning chain
for step in result["reasoning_path"]:
    print(f"  [{step['agent']}] {step['action']}")

JavaScript

const response = await fetch(
  "https://api.ares.dirmacs.com/api/workflows/default",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer eyJhbGciOi..."
    },
    body: JSON.stringify({
      query: "Compare your Pro and Enterprise pricing plans for a mid-size SaaS company",
      context: {
        company_size: "50-200 employees",
        expected_volume: "200K calls/month"
      }
    })
  }
);

const result = await response.json();
console.log(result.final_response);

// Inspect the reasoning chain
result.reasoning_path.forEach(step => {
  console.log(`  [${step.agent}] ${step.action}`);
});

Workflow behavior

Agent selection. The entry agent examines the query and routes to the specialist best suited to handle it. If a specialist determines it needs input from another agent, it can delegate further, creating a multi-hop chain.

Context propagation. The optional context object is available to every agent in the chain. Use it to pass structured information (user tier, session metadata, domain-specific parameters) that agents can reference during processing.

Determinism. Workflow routing is not strictly deterministic. It is driven by the entry agent’s LLM reasoning, so the same query may route differently depending on phrasing. The reasoning_path in the response shows which agent handled each step, so you can see how a given request was routed.
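Because routing can vary between runs, it can be useful to compare the agent chains of two executions. The following is a minimal sketch that relies only on the documented reasoning_path field; the helper names routing_signature and same_route are hypothetical and not part of ARES.

```python
from typing import Any


def routing_signature(result: dict[str, Any]) -> tuple[str, ...]:
    """Return the ordered agent names recorded in reasoning_path."""
    return tuple(step["agent"] for step in result.get("reasoning_path", []))


def same_route(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """True when two workflow responses went through the same agent chain."""
    return routing_signature(a) == routing_signature(b)


# Two abbreviated responses shaped like the documented example.
run_1 = {"reasoning_path": [{"agent": "router", "action": "..."},
                            {"agent": "sales", "action": "..."}]}
run_2 = {"reasoning_path": [{"agent": "router", "action": "..."},
                            {"agent": "support", "action": "..."}]}

print(routing_signature(run_1))
print(same_route(run_1, run_2))
```

Logging the signature alongside each query is one way to spot phrasings that route inconsistently.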

Research

The Research API performs deep, multi-step research on a topic using parallel sub-agents. Unlike a single chat request, a research query spawns multiple agents that independently explore facets of the question, synthesize findings, and produce a comprehensive result with source attribution.


Execute a research query

POST /api/research

Submit a research query for deep, multi-step investigation.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | Yes | | The research question or topic. |
| depth | integer | No | 3 | How many levels deep the research goes. Higher values explore sub-topics more thoroughly. |
| max_iterations | integer | No | 5 | Maximum total agent calls. Acts as a cost/time ceiling. |

Understanding depth: At depth 1, the research agent answers the query directly. At depth 2, it identifies sub-questions, spawns agents to answer each, then synthesizes. At depth 3+, sub-agents can spawn their own sub-agents, creating a tree of investigation.

Understanding max_iterations: This is a hard cap on total agent invocations across all depth levels. If the research tree would require more calls than max_iterations, it stops expanding and synthesizes what it has. Use this to control cost and response time.
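To build intuition for how depth and max_iterations interact, the sketch below models the research tree as a simple budgeted expansion. It is an illustrative model only, not the ARES implementation: the breadth-first order, the branching factor of 3, and the function name plan_research are all assumptions, and the real service runs sub-agents in parallel.

```python
from collections import deque


def plan_research(depth: int, max_iterations: int, branching: int = 3) -> list[int]:
    """Return the level (1-based) of each agent call, in breadth-first order.

    Assumption: every agent above the deepest level spawns `branching`
    sub-agents. Expansion stops once max_iterations calls have been made.
    """
    calls: list[int] = []
    queue = deque([1])  # start with the root agent at level 1
    while queue and len(calls) < max_iterations:
        level = queue.popleft()
        calls.append(level)
        if level < depth:
            # Schedule this agent's sub-agents one level deeper.
            queue.extend([level + 1] * branching)
    return calls


print(plan_research(depth=1, max_iterations=5))
print(plan_research(depth=3, max_iterations=5))
```

Under these assumptions, a depth of 3 with only 5 iterations barely reaches the third level, which suggests raising max_iterations together with depth when deeper coverage matters.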

Response

{
  "findings": "## Market Analysis: Edge Computing in Healthcare\n\nEdge computing adoption in healthcare is accelerating, driven by three primary factors...\n\n### Key Findings\n1. **Latency requirements** — Real-time patient monitoring demands sub-10ms response times...\n2. **Data sovereignty** — HIPAA compliance increasingly favors on-premise processing...\n3. **Cost dynamics** — Edge deployment reduces cloud egress costs by 40-60% for imaging workloads...\n\n### Sources\n- Gartner Healthcare IT Report 2025\n- IEEE Edge Computing Survey\n- HHS HIPAA Guidance Update",
  "sources": [
    "Gartner Healthcare IT Report 2025",
    "IEEE Edge Computing Survey",
    "HHS HIPAA Guidance Update"
  ],
  "duration_ms": 8432
}

| Field | Type | Description |
| --- | --- | --- |
| findings | string | The synthesized research output, typically in Markdown. |
| sources | string[] | References and sources discovered during research. |
| duration_ms | integer | Total time taken for the research in milliseconds. |

Examples

curl

curl -X POST https://api.ares.dirmacs.com/api/research \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "query": "What are the current trends in edge computing for healthcare?",
    "depth": 3,
    "max_iterations": 5
  }'

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/api/research",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "query": "What are the current trends in edge computing for healthcare?",
        "depth": 3,
        "max_iterations": 5
    }
)

result = response.json()
print(result["findings"])
print(f"\nCompleted in {result['duration_ms']}ms")
print(f"Sources: {', '.join(result['sources'])}")

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/api/research", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    query: "What are the current trends in edge computing for healthcare?",
    depth: 3,
    max_iterations: 5
  })
});

const result = await response.json();
console.log(result.findings);
console.log(`\nCompleted in ${result.duration_ms}ms`);
console.log(`Sources: ${result.sources.join(", ")}`);

Tuning research parameters

| Scenario | Recommended depth | Recommended max_iterations |
| --- | --- | --- |
| Quick factual lookup | 1 | 2 |
| Standard research question | 2 | 5 |
| Deep competitive analysis | 3 | 10 |
| Exhaustive literature review | 4+ | 15+ |

Higher depth and iteration values produce more comprehensive results but take longer and consume more API quota. For most use cases, the defaults (depth: 3, max_iterations: 5) provide a good balance of thoroughness and speed.

RAG (Retrieval-Augmented Generation)

The RAG API lets you ingest documents, search them using multiple retrieval strategies, and manage document collections. RAG powers knowledge-grounded responses by retrieving relevant context from your documents before generating answers.

Feature flag: The RAG API requires ARES to be built with the ares-vector feature. If your deployment does not include this feature, these endpoints will return 404.


Ingest documents

POST /api/rag/ingest

Ingest content into a named collection. The content is automatically chunked and indexed for retrieval.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| collection | string | Yes | | Name of the collection to ingest into. Created automatically if it doesn’t exist. |
| content | string | Yes | | The text content to ingest. |
| metadata | object | No | {} | Arbitrary key-value metadata attached to the document. |
| chunking_strategy | string | No | "word" | How to split the content into chunks. Options: "word", "sentence", "paragraph". |
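To illustrate how the three chunking_strategy options differ, here is a rough local approximation. It is a sketch under stated assumptions: the chunk function, the 50-word window, and the naive sentence splitting are hypothetical, and the chunk sizes and boundaries ARES actually uses are not documented here.

```python
import re


def chunk(content: str, strategy: str = "word", words_per_chunk: int = 50) -> list[str]:
    """Split content using an illustrative version of each strategy."""
    if strategy == "paragraph":
        # Paragraphs are separated by one or more blank lines.
        parts = re.split(r"\n\s*\n", content)
    elif strategy == "sentence":
        # Naive split after ., ! or ? followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", content)
    elif strategy == "word":
        # Fixed-size windows of whitespace-separated words.
        words = content.split()
        parts = [" ".join(words[i:i + words_per_chunk])
                 for i in range(0, len(words), words_per_chunk)]
    else:
        raise ValueError(f"unknown chunking strategy: {strategy!r}")
    return [p.strip() for p in parts if p.strip()]


text = "ARES routes requests. It also streams.\n\nRAG grounds answers."
print(chunk(text, "paragraph"))
print(chunk(text, "sentence"))
print(chunk(text, "word", words_per_chunk=4))
```

In general, paragraph chunks keep more surrounding context per result, while word or sentence chunks give finer-grained matches.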

Response

{
  "chunks_created": 5,
  "document_ids": [
    "doc_a1b2c3d4",
    "doc_e5f6g7h8",
    "doc_i9j0k1l2",
    "doc_m3n4o5p6",
    "doc_q7r8s9t0"
  ],
  "collection": "docs"
}

| Field | Type | Description |
| --- | --- | --- |
| chunks_created | integer | Number of chunks produced from the content. |
| document_ids | string[] | IDs assigned to each chunk. |
| collection | string | The collection the content was ingested into. |

Examples

curl

curl -X POST https://api.ares.dirmacs.com/api/rag/ingest \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "collection": "product-docs",
    "content": "ARES is a multi-agent AI platform that orchestrates specialized agents to handle complex queries. It supports multiple LLM providers including Groq, Anthropic, and NVIDIA...",
    "metadata": {
      "source": "documentation",
      "version": "2.0",
      "author": "engineering"
    },
    "chunking_strategy": "paragraph"
  }'

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/api/rag/ingest",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "collection": "product-docs",
        "content": "ARES is a multi-agent AI platform...",
        "metadata": {"source": "documentation", "version": "2.0"},
        "chunking_strategy": "paragraph"
    }
)

result = response.json()
print(f"Created {result['chunks_created']} chunks in '{result['collection']}'")

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/api/rag/ingest", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    collection: "product-docs",
    content: "ARES is a multi-agent AI platform...",
    metadata: { source: "documentation", version: "2.0" },
    chunking_strategy: "paragraph"
  })
});

const result = await response.json();
console.log(`Created ${result.chunks_created} chunks in '${result.collection}'`);

Search documents

POST /api/rag/search

Search a collection using one of several retrieval strategies. Returns the most relevant document chunks.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| collection | string | Yes | | Collection to search. |
| query | string | Yes | | The search query. |
| strategy | string | No | "hybrid" | Retrieval strategy (see below). |
| top_k | integer | No | 5 | Maximum number of results to return. |
| rerank | boolean | No | false | Whether to rerank results for improved relevance ordering. |

Search strategies

| Strategy | Description |
| --- | --- |
| semantic | Vector similarity search. Best for conceptual or meaning-based queries. |
| bm25 | Classic keyword-based ranking (BM25 algorithm). Best for exact term matching. |
| fuzzy | Tolerates typos and approximate matches. Useful for user-facing search with imprecise input. |
| hybrid | Combines semantic and keyword search, then merges results. Best overall performance for most use cases. |
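The hybrid strategy merges a semantic ranking with a keyword ranking, but this page does not say how the merge is done. As an illustration of one common technique, the sketch below uses reciprocal rank fusion (RRF). This is an assumption chosen for the example, not a description of what ARES does; the function name, the constant k = 60, and the document IDs are all hypothetical.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_k: int = 5
) -> list[str]:
    """Merge several ranked ID lists; each hit scores 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]


semantic = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector-search ranking
keyword = ["doc_b", "doc_d", "doc_a"]   # hypothetical BM25 ranking
print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents that rank well in both lists rise to the top, which is the intuition behind preferring hybrid for general-purpose queries.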

Response

The response contains an array of matching document chunks, each with its content, relevance score, and metadata.

Examples

curl

curl -X POST https://api.ares.dirmacs.com/api/rag/search \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{
    "collection": "product-docs",
    "query": "how does agent routing work",
    "strategy": "hybrid",
    "top_k": 5,
    "rerank": true
  }'

Python

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/api/rag/search",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi..."
    },
    json={
        "collection": "product-docs",
        "query": "how does agent routing work",
        "strategy": "hybrid",
        "top_k": 5,
        "rerank": True
    }
)

results = response.json()
for result in results:
    print(result)

JavaScript

const response = await fetch("https://api.ares.dirmacs.com/api/rag/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJhbGciOi..."
  },
  body: JSON.stringify({
    collection: "product-docs",
    query: "how does agent routing work",
    strategy: "hybrid",
    top_k: 5,
    rerank: true
  })
});

const results = await response.json();
results.forEach(result => console.log(result));

List collections

GET /api/rag/collections

Returns all document collections for the authenticated user.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

curl https://api.ares.dirmacs.com/api/rag/collections \
  -H "Authorization: Bearer eyJhbGciOi..."

Delete a collection

DELETE /api/rag/collection

Permanently delete a collection and all its indexed documents.

Authentication

Requires a JWT access token: Authorization: Bearer <jwt_access_token>

Request body

{
  "collection": "product-docs"
}

Example

curl -X DELETE https://api.ares.dirmacs.com/api/rag/collection \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -d '{"collection": "product-docs"}'

Streaming

ARES supports real-time streaming responses via Server-Sent Events (SSE). Instead of waiting for the full response to be generated, you receive text chunks as they are produced. This enables responsive UIs that display text as it appears.


Endpoint

POST /api/chat/stream

JWT authentication: Authorization: Bearer <jwt_access_token>

POST /v1/chat/stream

API key authentication: Authorization: Bearer ares_xxx

Both endpoints accept the same request body as POST /api/chat and return the same SSE format.


SSE format

The response uses Content-Type: text/event-stream. Each event contains a data: field with a text chunk:

data: The
data:  answer
data:  to your
data:  question is
data:  as follows...

Each data: line represents one chunk of the response. Concatenate all chunks in order to reconstruct the complete response. The server closes the connection when generation is complete.
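The format above can be parsed independently of any HTTP client. The following is a minimal sketch, assuming each chunk arrives on its own data: line as described; the helper name iter_chunks and the keep-alive comment line in the sample are illustrative assumptions. Only the single separator space after the colon is removed, so whitespace that belongs to the chunk is preserved.

```python
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of every `data:` line, ignoring everything else."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators, comment lines (":"), other fields
        payload = line[len("data:"):]
        # Drop only the single separator space so that a chunk's own
        # leading whitespace (" answer") survives.
        if payload.startswith(" "):
            payload = payload[1:]
        yield payload


raw = ["data: The", "", "data:  answer", "", ": keep-alive", "data:  to your", ""]
print("".join(iter_chunks(raw)))
```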


Examples

curl

The -N flag disables output buffering so chunks appear immediately:

curl -N -X POST https://api.ares.dirmacs.com/api/chat/stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer eyJhbGciOi..." \
  -H "Accept: text/event-stream" \
  -d '{
    "message": "Explain how neural networks learn",
    "agent_type": "research"
  }'

Python

Using the requests library with stream=True:

import requests

response = requests.post(
    "https://api.ares.dirmacs.com/api/chat/stream",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer eyJhbGciOi...",
        "Accept": "text/event-stream"
    },
    json={
        "message": "Explain how neural networks learn",
        "agent_type": "research"
    },
    stream=True
)

full_response = []

for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: "):
            chunk = decoded[6:]
            print(chunk, end="", flush=True)
            full_response.append(chunk)

complete_text = "".join(full_response)

For production use, consider using httpx with async streaming:

import httpx
import asyncio

async def stream_chat(message: str, token: str) -> str:
    chunks = []

    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "https://api.ares.dirmacs.com/api/chat/stream",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {token}",
                "Accept": "text/event-stream"
            },
            json={"message": message}
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    chunk = line[6:]
                    print(chunk, end="", flush=True)
                    chunks.append(chunk)

    return "".join(chunks)

result = asyncio.run(stream_chat("Explain how neural networks learn", "eyJhbGciOi..."))

JavaScript (Browser)

Using the Fetch API with ReadableStream:

async function streamChat(message, token) {
  const response = await fetch("https://api.ares.dirmacs.com/api/chat/stream", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${token}`,
      "Accept": "text/event-stream"
    },
    body: JSON.stringify({
      message: message,
      agent_type: "research"
    })
  });

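  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullResponse = "";
  let buffer = ""; // holds a partial line until the rest of it arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A network read can end mid-line, so only process complete lines
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // last element is the incomplete remainder

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const chunk = line.slice(6);
        fullResponse += chunk;

        // Update your UI here
        document.getElementById("output").textContent = fullResponse;
      }
    }
  }

  // Flush a final line that arrived without a trailing newline
  if (buffer.startsWith("data: ")) fullResponse += buffer.slice(6);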

  return fullResponse;
}

JavaScript (Node.js)

async function streamChat(message, token) {
  const response = await fetch("https://api.ares.dirmacs.com/api/chat/stream", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${token}`,
      "Accept": "text/event-stream"
    },
    body: JSON.stringify({ message })
  });

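  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let fullResponse = "";
  let buffer = ""; // holds a partial line until the rest of it arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A network read can end mid-line, so only process complete lines
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // last element is the incomplete remainder

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const chunk = line.slice(6);
        fullResponse += chunk;
        process.stdout.write(chunk);
      }
    }
  }

  // Flush a final line that arrived without a trailing newline
  if (buffer.startsWith("data: ")) fullResponse += buffer.slice(6);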

  return fullResponse;
}

Go

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

func streamChat(message, token string) (string, error) {
	body, _ := json.Marshal(map[string]string{
		"message":    message,
		"agent_type": "research",
	})

	req, err := http.NewRequest("POST",
		"https://api.ares.dirmacs.com/api/chat/stream",
		bytes.NewReader(body))
	if err != nil {
		return "", err
	}

	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "text/event-stream")

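	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Errors are returned as regular HTTP responses, not as SSE
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}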

	var fullResponse strings.Builder
	scanner := bufio.NewScanner(resp.Body)

	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "data: ") {
			chunk := line[6:]
			fmt.Print(chunk)
			fullResponse.WriteString(chunk)
		}
	}

	return fullResponse.String(), scanner.Err()
}

func main() {
	result, err := streamChat("Explain how neural networks learn", "eyJhbGciOi...")
	if err != nil {
		panic(err)
	}
	fmt.Printf("\n\nFull response length: %d characters\n", len(result))
}

Error handling

If the request is invalid or authentication fails, the server returns a standard HTTP error response (not SSE). Always check the response status before attempting to read the stream:

Python

response = requests.post(url, headers=headers, json=body, stream=True)

if response.status_code != 200:
    print(f"Error {response.status_code}: {response.text}")
else:
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))  # process SSE events here

JavaScript

const response = await fetch(url, { method: "POST", headers, body });

if (!response.ok) {
  throw new Error(`Error ${response.status}: ${await response.text()}`);
}

// proceed with stream reading

Best practices

  • Always set Accept: text/event-stream to signal that you expect a streaming response.
  • Disable client-side buffering where possible (e.g., -N in curl, stream=True in Python requests).
  • Handle connection drops gracefully. The stream may close unexpectedly due to network issues. Implement retry logic for production applications.
  • Set reasonable timeouts. Long research queries may stream for 30+ seconds. Configure your HTTP client timeout accordingly.
  • Concatenate chunks for the final result. Individual chunks may split mid-word. Only process the complete response for downstream use.
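The retry advice above can be sketched as a small wrapper with exponential backoff. This is a sketch under assumptions rather than a recommended implementation: stream_with_retry and open_stream are hypothetical names, and the code catches Python's built-in ConnectionError, so adapt the exception type to whatever your HTTP client raises on a dropped stream. Note that a retry re-issues the request, so the response is generated again from the start.

```python
import time
from typing import Callable, Iterable


def stream_with_retry(
    open_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Collect all chunks; restart from scratch if the connection drops."""
    for attempt in range(1, max_attempts + 1):
        chunks: list[str] = []
        try:
            for chunk in open_stream():
                chunks.append(chunk)
            return "".join(chunks)
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")


# Simulated stream that drops once, then succeeds.
attempts = {"count": 0}


def flaky_stream():
    attempts["count"] += 1
    yield "Hello"
    if attempts["count"] == 1:
        raise ConnectionError("simulated drop")
    yield ", world"


print(stream_with_retry(flaky_stream, sleep=lambda s: None))
```

Partial output from a failed attempt is discarded here, so a UI that renders chunks live would need to clear what it has shown before retrying.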

V1 Client API

The V1 API is the primary interface for enterprise clients integrating ARES into their applications. All endpoints are scoped to the authenticated tenant — you only see your own agents, runs, and usage.

Base URL: https://api.ares.dirmacs.com

Authentication

Every request to /v1/* must include your API key in the Authorization header:

Authorization: Bearer ares_xxx

API keys are issued during tenant provisioning. You can create additional keys via the API or request them from your platform administrator.


Agents

List Agents

GET /v1/agents?page=1&per_page=20

Returns a paginated list of agents configured for your tenant.

Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page | integer | 1 | Page number |
| per_page | integer | 20 | Results per page |

Response:

{
  "agents": [
    {
      "id": "uuid",
      "name": "risk-analyzer",
      "agent_type": "classifier",
      "status": "active",
      "config": { "model": "llama-3.3-70b", "tools": ["calculator"] },
      "created_at": "2026-03-01T00:00:00Z",
      "last_run": "2026-03-13T14:22:00Z",
      "total_runs": 1547,
      "success_rate": 0.982
    }
  ],
  "total": 4,
  "page": 1,
  "per_page": 20
}
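To walk every page of the list, a client can keep requesting pages until it has seen total agents. The sketch below assumes that total is the count across all pages, which the example response suggests but this page does not state. iter_agents and fetch_page are hypothetical names; in practice fetch_page would issue GET /v1/agents with the given page and per_page values.

```python
from typing import Any, Callable, Iterator


def iter_agents(
    fetch_page: Callable[[int, int], dict[str, Any]],
    per_page: int = 20,
) -> Iterator[dict[str, Any]]:
    """Yield agents from successive pages until all have been seen."""
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, per_page)
        agents = body.get("agents", [])
        yield from agents
        seen += len(agents)
        # Stop when every agent has been seen or the page came back empty.
        if not agents or seen >= body.get("total", 0):
            return
        page += 1


# Stand-in for the HTTP call, serving five fake agents.
ALL = [{"name": f"agent-{i}"} for i in range(5)]


def fake_fetch(page: int, per_page: int) -> dict[str, Any]:
    start = (page - 1) * per_page
    return {"agents": ALL[start:start + per_page], "total": len(ALL),
            "page": page, "per_page": per_page}


print([a["name"] for a in iter_agents(fake_fetch, per_page=2)])
```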

Get Agent Details

GET /v1/agents/{name}

Returns full details for a single agent.

Response:

{
  "id": "uuid",
  "name": "risk-analyzer",
  "agent_type": "classifier",
  "status": "active",
  "config": {
    "model": "llama-3.3-70b",
    "system_prompt": "You are a risk analysis agent...",
    "tools": ["calculator"],
    "max_tokens": 2048
  },
  "created_at": "2026-03-01T00:00:00Z",
  "last_run": "2026-03-13T14:22:00Z",
  "total_runs": 1547,
  "success_rate": 0.982
}

Run an Agent

POST /v1/agents/{name}/run

Execute an agent with the provided input. This is the core endpoint for triggering agent work.

Request Body:

{
  "input": {
    "message": "Analyze the risk profile for transaction TX-9921",
    "context": {
      "amount": 15000,
      "currency": "USD",
      "merchant_category": "electronics"
    }
  }
}

Response:

{
  "id": "run-uuid",
  "agent_id": "agent-uuid",
  "status": "completed",
  "input": { "message": "Analyze the risk profile..." },
  "output": {
    "risk_score": 0.73,
    "risk_level": "medium",
    "reasoning": "Elevated amount for merchant category..."
  },
  "error": null,
  "started_at": "2026-03-13T14:22:00Z",
  "finished_at": "2026-03-13T14:22:01Z",
  "duration_ms": 1243,
  "tokens_used": 847
}

If the agent fails, status will be "failed" and error will contain a description.
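Client code can turn a failed run into an exception so that failures are not silently treated as results. A minimal sketch based on the status, error, and output fields above; AgentRunFailed, unwrap_run, and the "model timeout" message are invented for the example.

```python
from typing import Any


class AgentRunFailed(RuntimeError):
    """Raised when a run response reports status "failed"."""


def unwrap_run(result: dict[str, Any]) -> Any:
    """Return the run output, or raise if the run failed."""
    if result.get("status") == "failed":
        raise AgentRunFailed(result.get("error") or "agent run failed")
    return result.get("output")


ok = {"status": "completed", "output": {"risk_level": "medium"}, "error": None}
bad = {"status": "failed", "output": None, "error": "model timeout"}

print(unwrap_run(ok))
try:
    unwrap_run(bad)
except AgentRunFailed as exc:
    print(f"run failed: {exc}")
```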

List Agent Runs

GET /v1/agents/{name}/runs?page=1&per_page=20

Returns the run history for a specific agent, newest first.


Chat

Send a Chat Message

POST /v1/chat

Send a message to a model or agent and receive a complete response.

Request Body:

{
  "messages": [
    { "role": "user", "content": "Summarize Q1 revenue trends." }
  ],
  "model": "llama-3.3-70b",
  "agent_type": "analyst"
}

Response:

{
  "id": "msg-uuid",
  "content": "Based on the data, Q1 revenue showed...",
  "model": "llama-3.3-70b",
  "tokens_used": 312,
  "finish_reason": "stop"
}

Stream a Chat Response

POST /v1/chat/stream

Same request body as /v1/chat, but returns a Server-Sent Events (SSE) stream.

data: {"delta": "Based on", "finish_reason": null}
data: {"delta": " the data,", "finish_reason": null}
data: {"delta": " Q1 revenue", "finish_reason": null}
...
data: {"delta": "", "finish_reason": "stop", "tokens_used": 312}
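Each event in this stream is a JSON object, so a client accumulates the delta values and stops at the first event whose finish_reason is not null. The sketch below works on already-received lines and makes no network call; collect_deltas is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_deltas(lines: Iterable[str]) -> tuple[str, dict]:
    """Return (full_text, final_event) from `data: {...}` lines."""
    parts: list[str] = []
    final: dict = {}
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        event = json.loads(line[len("data: "):])
        parts.append(event.get("delta", ""))
        if event.get("finish_reason") is not None:
            final = event  # last event carries finish_reason and tokens_used
            break
    return "".join(parts), final


sample = [
    'data: {"delta": "Based on", "finish_reason": null}',
    'data: {"delta": " the data,", "finish_reason": null}',
    'data: {"delta": "", "finish_reason": "stop", "tokens_used": 312}',
]
text, final = collect_deltas(sample)
print(text)
print(final["finish_reason"], final["tokens_used"])
```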

Usage

Get Usage Summary

GET /v1/usage

Returns your tenant’s usage for the current billing period.

Response:

{
  "period_start": "2026-03-01T00:00:00Z",
  "period_end": "2026-03-31T23:59:59Z",
  "total_runs": 4821,
  "total_tokens": 2847193,
  "total_api_calls": 5290,
  "quota_runs": 100000,
  "quota_tokens": 10000000,
  "daily_usage": [
    { "date": "2026-03-13", "runs": 312, "tokens": 184920, "api_calls": 340 },
    { "date": "2026-03-12", "runs": 287, "tokens": 171003, "api_calls": 315 }
  ]
}
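A common use of this response is to check how much of each quota has been consumed. The sketch below uses only fields from the example above; quota_report is a hypothetical helper, and it assumes both quotas are positive numbers.

```python
def quota_report(usage: dict) -> dict[str, float]:
    """Fraction of each quota consumed so far in the billing period."""
    return {
        "runs": usage["total_runs"] / usage["quota_runs"],
        "tokens": usage["total_tokens"] / usage["quota_tokens"],
    }


usage = {"total_runs": 4821, "quota_runs": 100000,
         "total_tokens": 2847193, "quota_tokens": 10000000}
for name, fraction in quota_report(usage).items():
    print(f"{name}: {fraction:.1%} of quota used")
```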

API Keys

List API Keys

GET /v1/api-keys

Returns all API keys for your tenant. The full key secret is never returned after creation.

Response:

{
  "keys": [
    {
      "id": "key-uuid",
      "name": "android-production",
      "prefix": "ares_a1b2",
      "created_at": "2026-03-01T00:00:00Z",
      "expires_at": "2027-03-01T00:00:00Z",
      "last_used": "2026-03-13T14:00:00Z"
    }
  ]
}

Create API Key

POST /v1/api-keys

Request Body:

{
  "name": "mobile-app-key",
  "expires_in_days": 365
}

expires_in_days is optional. If omitted, the key does not expire.

Response:

{
  "key": "key-uuid",
  "secret": "ares_x7k9m2p4q8r1s5t3..."
}

Important: The secret field is only returned once at creation time. Store it securely — it cannot be retrieved again.

Revoke API Key

DELETE /v1/api-keys/{id}

Immediately invalidates the key. Returns 204 No Content on success.


Examples

Run an Agent (curl)

curl -X POST https://api.ares.dirmacs.com/v1/agents/risk-analyzer/run \
  -H "Authorization: Bearer ares_x7k9m2p4q8r1s5t3" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "message": "Evaluate this transaction",
      "context": {"amount": 15000, "currency": "USD"}
    }
  }'

Run an Agent (Python)

import requests

API_KEY = "ares_x7k9m2p4q8r1s5t3"
BASE_URL = "https://api.ares.dirmacs.com"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Run an agent
response = requests.post(
    f"{BASE_URL}/v1/agents/risk-analyzer/run",
    headers=headers,
    json={
        "input": {
            "message": "Evaluate this transaction",
            "context": {"amount": 15000, "currency": "USD"},
        }
    },
)

result = response.json()
print(f"Status: {result['status']}")
print(f"Output: {result['output']}")
print(f"Duration: {result['duration_ms']}ms")
print(f"Tokens: {result['tokens_used']}")

Check Usage (curl)

curl https://api.ares.dirmacs.com/v1/usage \
  -H "Authorization: Bearer ares_x7k9m2p4q8r1s5t3"

Check Usage (Python)

response = requests.get(f"{BASE_URL}/v1/usage", headers=headers)
usage = response.json()

print(f"Runs this month: {usage['total_runs']} / {usage['quota_runs']}")
print(f"Tokens this month: {usage['total_tokens']} / {usage['quota_tokens']}")

Chat with Streaming (Python)

import requests
import json

response = requests.post(
    f"{BASE_URL}/v1/chat/stream",
    headers=headers,
    json={
        "messages": [{"role": "user", "content": "Explain quantum computing."}],
        "model": "llama-3.3-70b",
    },
    stream=True,
)

for line in response.iter_lines():
    if line:
        text = line.decode("utf-8")
        if text.startswith("data: "):
            data = json.loads(text[6:])
            print(data.get("delta", ""), end="", flush=True)

Chat with Streaming (JavaScript)

const response = await fetch("https://api.ares.dirmacs.com/v1/chat/stream", {
  method: "POST",
  headers: {
    "Authorization": "Bearer ares_x7k9m2p4q8r1s5t3",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Explain quantum computing." }],
    model: "llama-3.3-70b",
  }),
});

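const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line until the rest of it arrives

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A read can end mid-line, so buffer and parse only complete lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();

  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const data = JSON.parse(line.slice(6));
      process.stdout.write(data.delta || "");
    }
  }
}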

Admin API

The Admin API provides full platform management capabilities for ARES operators. Use it to provision tenants, manage agents, monitor usage, and operate the platform.

Base URL: https://api.ares.dirmacs.com

Authentication

Every request to /api/admin/* must include the admin secret:

X-Admin-Secret: <secret>

This secret is set in your ares.toml configuration. Guard it carefully — it grants full platform access.


Tenants

Create Tenant

POST /api/admin/tenants

Request Body:

{
  "name": "acme-corp",
  "tier": "pro"
}

Valid tiers: free, dev, pro, enterprise.

Response:

{
  "id": "tenant-uuid",
  "name": "acme-corp",
  "tier": "pro",
  "created_at": "2026-03-13T00:00:00Z"
}

List Tenants

GET /api/admin/tenants

Response:

{
  "tenants": [
    {
      "id": "tenant-uuid",
      "name": "acme-corp",
      "tier": "pro",
      "agent_count": 4,
      "created_at": "2026-03-13T00:00:00Z"
    }
  ]
}

Get Tenant Details

GET /api/admin/tenants/{id}

Response:

{
  "id": "tenant-uuid",
  "name": "acme-corp",
  "tier": "pro",
  "agent_count": 4,
  "api_key_count": 2,
  "total_runs": 12849,
  "total_tokens": 7291034,
  "created_at": "2026-03-13T00:00:00Z"
}

Update Tenant Tier

PUT /api/admin/tenants/{id}/quota

Request Body:

{
  "tier": "enterprise"
}

Response: Updated tenant object.


Provisioning

Provision a Client

POST /api/admin/provision-client

This is the recommended way to onboard a new enterprise client. It atomically creates a tenant, clones the appropriate agent templates, and generates an API key — all in a single transaction. If any step fails, everything is rolled back.

Request Body:

{
  "name": "acme-corp",
  "tier": "pro",
  "product_type": "kasino",
  "api_key_name": "production"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Unique tenant name (lowercase, alphanumeric + hyphens) |
| tier | string | Yes | One of: free, dev, pro, enterprise |
| product_type | string | Yes | Template set to clone: generic, kasino, ehb |
| api_key_name | string | Yes | Label for the initial API key |

Response:

{
  "tenant_id": "tenant-uuid",
  "tenant_name": "acme-corp",
  "tier": "pro",
  "product_type": "kasino",
  "api_key_id": "key-uuid",
  "api_key_prefix": "ares_a1b2",
  "raw_api_key": "ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
  "agents_created": [
    "kasino-classifier",
    "kasino-risk",
    "kasino-transaction",
    "kasino-report"
  ]
}

Important: The raw_api_key is only returned once. Store it securely and deliver it to the client through a secure channel.

curl Example:

curl -X POST https://api.ares.dirmacs.com/api/admin/provision-client \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "acme-corp",
    "tier": "pro",
    "product_type": "kasino",
    "api_key_name": "production"
  }'
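Since provisioning is a single all-or-nothing call, it can help to check the request body locally before sending it. The sketch below encodes the field rules from the table above as a client-side pre-check. The function and constant names are hypothetical, and the server's exact validation rules (for example, whether a name may start with a hyphen) are not documented here.

```python
import re

VALID_TIERS = {"free", "dev", "pro", "enterprise"}
VALID_PRODUCT_TYPES = {"generic", "kasino", "ehb"}
NAME_PATTERN = re.compile(r"[a-z0-9-]+")  # lowercase, alphanumeric + hyphens


def validate_provision_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    for field in ("name", "tier", "product_type", "api_key_name"):
        if not body.get(field):
            problems.append(f"missing required field: {field}")
    name = body.get("name") or ""
    if name and not NAME_PATTERN.fullmatch(name):
        problems.append("name must be lowercase alphanumeric plus hyphens")
    if body.get("tier") and body["tier"] not in VALID_TIERS:
        problems.append(f"unknown tier: {body['tier']}")
    if body.get("product_type") and body["product_type"] not in VALID_PRODUCT_TYPES:
        problems.append(f"unknown product_type: {body['product_type']}")
    return problems


good = {"name": "acme-corp", "tier": "pro",
        "product_type": "kasino", "api_key_name": "production"}
bad = {"name": "Acme Corp", "tier": "gold", "product_type": "kasino"}

print(validate_provision_request(good))
print(validate_provision_request(bad))
```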

API Keys

Create API Key for Tenant

POST /api/admin/tenants/{id}/api-keys

Request Body:

{
  "name": "staging-key"
}

Response:

{
  "id": "key-uuid",
  "prefix": "ares_x7k9",
  "raw_key": "ares_x7k9m2p4q8r1s5t3...",
  "created_at": "2026-03-13T00:00:00Z"
}

List API Keys for Tenant

GET /api/admin/tenants/{id}/api-keys

Response:

{
  "keys": [
    {
      "id": "key-uuid",
      "name": "production",
      "prefix": "ares_a1b2",
      "created_at": "2026-03-13T00:00:00Z",
      "last_used": "2026-03-13T14:00:00Z"
    }
  ]
}

Tenant Agents

List Tenant Agents

GET /api/admin/tenants/{id}/agents

Response:

{
  "agents": [
    {
      "id": "agent-uuid",
      "name": "kasino-classifier",
      "agent_type": "classifier",
      "status": "active",
      "model": "llama-3.3-70b",
      "total_runs": 2841,
      "success_rate": 0.991
    }
  ]
}

Create Tenant Agent

POST /api/admin/tenants/{id}/agents

Request Body:

{
  "name": "custom-analyzer",
  "agent_type": "analyzer",
  "config": {
    "model": "llama-3.3-70b",
    "system_prompt": "You are a financial data analyzer...",
    "tools": ["calculator"],
    "max_tokens": 4096
  }
}

Update Tenant Agent

PUT /api/admin/tenants/{id}/agents/{name}

Request Body: Same structure as create. Only the fields you provide are updated.

Delete Tenant Agent

DELETE /api/admin/tenants/{id}/agents/{name}

Returns 204 No Content on success.


Templates and Models

List Agent Templates

GET /api/admin/agent-templates?product_type=kasino

Returns the pre-configured agent templates available for a given product type. These are cloned during provisioning.

Response:

{
  "templates": [
    {
      "name": "kasino-classifier",
      "agent_type": "classifier",
      "product_type": "kasino",
      "config": {
        "model": "llama-3.3-70b",
        "system_prompt": "You are a transaction classifier...",
        "tools": []
      }
    }
  ]
}

List Available Models

GET /api/admin/models

Returns all models configured across all providers.

Response:

{
  "models": [
    {
      "id": "llama-3.3-70b",
      "provider": "groq",
      "context_length": 131072,
      "supports_tools": true
    },
    {
      "id": "deepseek-r1",
      "provider": "nvidia-deepseek",
      "context_length": 65536,
      "supports_tools": false
    },
    {
      "id": "claude-3.5-sonnet",
      "provider": "anthropic",
      "context_length": 200000,
      "supports_tools": true
    }
  ]
}

Usage and Analytics

Tenant Usage Summary

GET /api/admin/tenants/{id}/usage

Response:

{
  "tenant_id": "tenant-uuid",
  "tenant_name": "acme-corp",
  "tier": "pro",
  "period_start": "2026-03-01T00:00:00Z",
  "period_end": "2026-03-31T23:59:59Z",
  "total_runs": 4821,
  "total_tokens": 2847193,
  "quota_runs": 100000,
  "quota_tokens": 10000000
}

Daily Usage Breakdown

GET /api/admin/tenants/{id}/usage/daily?days=30

Response:

{
  "daily": [
    { "date": "2026-03-13", "runs": 312, "tokens": 184920 },
    { "date": "2026-03-12", "runs": 287, "tokens": 171003 }
  ]
}

Agent Run History

GET /api/admin/tenants/{id}/agents/{name}/runs?limit=50

Response:

{
  "runs": [
    {
      "id": "run-uuid",
      "status": "completed",
      "started_at": "2026-03-13T14:22:00Z",
      "duration_ms": 1243,
      "tokens_used": 847
    }
  ]
}

Agent Stats

GET /api/admin/tenants/{id}/agents/{name}/stats

Response:

{
  "agent_name": "kasino-classifier",
  "total_runs": 2841,
  "successful_runs": 2815,
  "failed_runs": 26,
  "success_rate": 0.991,
  "avg_duration_ms": 1102,
  "avg_tokens": 723,
  "last_run": "2026-03-13T14:22:00Z"
}

Cross-Tenant Agent List

GET /api/admin/agents

Returns agents across all tenants. Useful for platform-wide visibility.

Platform Stats

GET /api/admin/stats

Response:

{
  "total_tenants": 12,
  "total_agents": 47,
  "total_runs_today": 3291,
  "total_tokens_today": 1948271,
  "active_alerts": 2
}

Alerts and Audit

List Alerts

GET /api/admin/alerts?severity=critical&resolved=false&limit=100

Query Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| severity | string | all | Filter by: info, warning, critical |
| resolved | boolean | all | Filter by resolution status |
| limit | integer | 100 | Maximum results to return |
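
As a rough sketch, the query string for this endpoint can be assembled with Python's standard library. The helper name `build_alerts_url` is hypothetical; the parameter names and base URL come from this page. Sending `resolved` as lowercase `true`/`false` is an assumption based on the example request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.ares.dirmacs.com"

def build_alerts_url(severity=None, resolved=None, limit=100):
    """Build the List Alerts URL, omitting filters left at their 'all' default."""
    params = {}
    if severity is not None:
        params["severity"] = severity
    if resolved is not None:
        # Assumption: booleans are sent lowercase, as in the example request.
        params["resolved"] = "true" if resolved else "false"
    params["limit"] = limit
    return f"{BASE_URL}/api/admin/alerts?{urlencode(params)}"

print(build_alerts_url(severity="critical", resolved=False))
```

Leaving `severity` and `resolved` unset keeps the documented default of returning all alerts.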

Response:

{
  "alerts": [
    {
      "id": "alert-uuid",
      "severity": "critical",
      "message": "Tenant acme-corp approaching token quota (92%)",
      "tenant_id": "tenant-uuid",
      "created_at": "2026-03-13T10:00:00Z",
      "resolved": false
    }
  ]
}

Resolve Alert

POST /api/admin/alerts/{id}/resolve

Returns 200 OK with the updated alert object.

Audit Log

GET /api/admin/audit-log?limit=50

Response:

{
  "entries": [
    {
      "id": "entry-uuid",
      "action": "tenant.created",
      "actor": "admin",
      "details": { "tenant_name": "acme-corp", "tier": "pro" },
      "timestamp": "2026-03-13T00:00:00Z"
    },
    {
      "id": "entry-uuid",
      "action": "agent.deleted",
      "actor": "admin",
      "details": { "tenant_id": "...", "agent_name": "old-agent" },
      "timestamp": "2026-03-12T23:00:00Z"
    }
  ]
}

Deployment API

The Deployment API allows you to trigger, monitor, and inspect deployments of ARES platform services. Deployments run server-side on the VPS and stream build output for observability.

Base URL: https://api.ares.dirmacs.com

Authentication

All deployment endpoints require the admin secret:

X-Admin-Secret: <secret>

Trigger a Deployment

POST /api/admin/deploy

Starts a deployment for the specified target service. The deployment runs asynchronously — you receive a deployment ID immediately and poll for completion.

Request Body:

{
  "target": "ares"
}

| Target | Description |
|--------|-------------|
| ares | ARES backend: pulls latest code, rebuilds, and restarts |
| admin | dirmacs-admin dashboard: rebuilds Leptos frontend |
| eruka | Eruka backend: pulls, rebuilds, and restarts |

Response:

{
  "id": "deploy-uuid",
  "status": "running",
  "message": "Deployment started for ares"
}

curl Example:

curl -X POST https://api.ares.dirmacs.com/api/admin/deploy \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"target": "ares"}'

Poll Deployment Status

GET /api/admin/deploy/{id}

Returns the current status of a deployment. Poll this endpoint until status is no longer "running".

Response:

{
  "id": "deploy-uuid",
  "target": "ares",
  "status": "success",
  "started_at": "2026-03-13T14:00:00Z",
  "finished_at": "2026-03-13T14:03:42Z",
  "output": "Pulling latest changes...\nCompiling ares-server v0.1.0...\nFinished release target(s) in 3m 41s\nRestarting ares.service...\nService started successfully."
}

Status Values:

| Status | Meaning |
|--------|---------|
| running | Deployment is in progress |
| success | Deployment completed successfully |
| failed | Deployment failed; check output for details |

Polling Pattern

The recommended approach is to trigger a deployment, then poll every 3 seconds until it completes:

# 1. Trigger deployment
DEPLOY_ID=$(curl -s -X POST https://api.ares.dirmacs.com/api/admin/deploy \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"target": "ares"}' | jq -r '.id')

echo "Deployment started: $DEPLOY_ID"

# 2. Poll until complete
while true; do
  RESULT=$(curl -s https://api.ares.dirmacs.com/api/admin/deploy/$DEPLOY_ID \
    -H "X-Admin-Secret: your-admin-secret")

  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"

  if [ "$STATUS" != "running" ]; then
    echo "$RESULT" | jq -r '.output'
    break
  fi

  sleep 3
done

Python Example:

import requests
import time

ADMIN_SECRET = "your-admin-secret"
BASE_URL = "https://api.ares.dirmacs.com"
headers = {
    "X-Admin-Secret": ADMIN_SECRET,
    "Content-Type": "application/json",
}

# Trigger
resp = requests.post(
    f"{BASE_URL}/api/admin/deploy",
    headers=headers,
    json={"target": "ares"},
    timeout=30,
)
resp.raise_for_status()  # stop early on auth or validation errors
deploy_id = resp.json()["id"]
print(f"Deployment started: {deploy_id}")

# Poll every 3 seconds until the status is no longer "running"
while True:
    resp = requests.get(
        f"{BASE_URL}/api/admin/deploy/{deploy_id}",
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()
    print(f"Status: {result['status']}")

    if result["status"] != "running":
        print(result["output"])
        break

    time.sleep(3)

List Recent Deployments

GET /api/admin/deploys

Returns the 20 most recent deployments, newest first.

Response:

{
  "deploys": [
    {
      "id": "deploy-uuid",
      "target": "ares",
      "status": "success",
      "started_at": "2026-03-13T14:00:00Z",
      "finished_at": "2026-03-13T14:03:42Z"
    },
    {
      "id": "deploy-uuid-2",
      "target": "admin",
      "status": "failed",
      "started_at": "2026-03-12T10:00:00Z",
      "finished_at": "2026-03-12T10:02:15Z"
    }
  ]
}

curl Example:

curl https://api.ares.dirmacs.com/api/admin/deploys \
  -H "X-Admin-Secret: your-admin-secret"

Service Health

List All Services

GET /api/admin/services

Returns the runtime status of all managed services.

Response:

{
  "ares": {
    "status": "running",
    "pid": 12847,
    "port": 3000
  },
  "eruka": {
    "status": "running",
    "pid": 12901,
    "port": 8081
  },
  "admin": {
    "status": "running",
    "pid": null,
    "port": null
  }
}

| Status | Meaning |
|--------|---------|
| running | Service is up and healthy |
| stopped | Service is not running |
| degraded | Service is running but unhealthy |

curl Example:

curl https://api.ares.dirmacs.com/api/admin/services \
  -H "X-Admin-Secret: your-admin-secret"

Get Service Logs

GET /api/admin/services/{name}/logs

Returns recent log output from the service’s systemd journal.

Response:

{
  "service": "ares",
  "lines": [
    "Mar 13 14:03:42 vps ares-server[12847]: Listening on 0.0.0.0:3000",
    "Mar 13 14:03:42 vps ares-server[12847]: Connected to PostgreSQL",
    "Mar 13 14:03:43 vps ares-server[12847]: Loaded 29 agents, 4 providers, 11 models",
    "Mar 13 14:04:01 vps ares-server[12847]: POST /v1/agents/risk-analyzer/run 200 1243ms"
  ]
}

curl Example:

curl https://api.ares.dirmacs.com/api/admin/services/ares/logs \
  -H "X-Admin-Secret: your-admin-secret"

Multi-Tenant Architecture

ARES is a multi-tenant platform. Each enterprise client operates within an isolated tenant, with their own agents, API keys, usage quotas, and data boundaries. This page explains the tenancy model and how to provision new clients.


Core Concepts

Tenants

A tenant is an isolated namespace on the ARES platform. Each tenant has:

  • A unique name and ID
  • A tier that determines rate limits and quotas
  • Its own set of agents (cloned from templates or created manually)
  • One or more API keys for authentication
  • Independent usage tracking and billing data

Tenants cannot see or interact with each other’s resources. A request authenticated with Tenant A’s API key will never return Tenant B’s agents, runs, or usage data.

Tiers

Every tenant is assigned a tier that governs their resource limits:

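| Tier | Monthly Requests | Monthly Tokens | Daily Rate Limit | Use Case |
|------|------------------|----------------|------------------|----------|
| Free | 1,000 | 100,000 | 100/day | Evaluation and testing |
| Dev | 10,000 | 1,000,000 | 1,000/day | Development and staging |
| Pro | 100,000 | 10,000,000 | 10,000/day | Production workloads |
| Enterprise | Unlimited | Unlimited | Unlimited | High-volume clients |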

Tiers can be changed at any time via the Admin API without disrupting the tenant’s service.

Agent Templates

When a tenant is provisioned, ARES clones a set of pre-configured agent templates based on the specified product_type. Templates provide a working starting point that can be customized after creation.

Available product types:

| Product Type | Templates Included | Description |
|--------------|--------------------|-------------|
| generic | General-purpose agents | Default chat and analysis agents |
| kasino | kasino-classifier, kasino-risk, kasino-transaction, kasino-report | Transaction analysis and reporting |
| ehb | Health-oriented agents | eHealthBuddy clinical agents |

Each template defines the agent’s model, system prompt, tool access, and default configuration. After provisioning, agents can be freely modified or new ones added.

API Key Scoping

Every API key is bound to exactly one tenant. When a request arrives with an API key:

  1. ARES looks up the key and identifies the associated tenant
  2. All operations execute within that tenant’s scope
  3. Usage is tracked against that tenant’s quotas
  4. The response only includes that tenant’s data

A tenant can have multiple API keys (e.g., separate keys for production, staging, and mobile). Each key’s usage is tracked individually but counts toward the shared tenant quota.

Data Isolation

Tenant isolation is enforced at the database query level. Every data-accessing query includes the tenant ID as a filter condition. This means:

  • Agent listings only return the requesting tenant’s agents
  • Run history only shows runs from the requesting tenant
  • Usage data only reflects the requesting tenant’s consumption
  • There is no API surface to query across tenant boundaries (except via the Admin API)
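
To make the idea of query-level isolation concrete, here is a minimal sketch using an in-memory SQLite database. ARES itself uses PostgreSQL, and the table layout, column names, and the `list_agents` helper are illustrative assumptions rather than the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO agents (tenant_id, name) VALUES (?, ?)",
    [
        ("tenant-a", "kasino-classifier"),
        ("tenant-a", "kasino-risk"),
        ("tenant-b", "custom-analyzer"),
    ],
)

def list_agents(conn, tenant_id):
    """Return agent names for one tenant; the tenant filter is never optional."""
    rows = conn.execute(
        "SELECT name FROM agents WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]

print(list_agents(conn, "tenant-a"))
print(list_agents(conn, "tenant-b"))
```

Because the tenant ID is a required argument of every data-access function, there is no code path that returns rows without a tenant filter.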

Provisioning Flow

The recommended way to onboard a new client is the atomic provisioning endpoint. It creates all required resources in a single database transaction.

Step 1: Provision the Client

curl -X POST https://api.ares.dirmacs.com/api/admin/provision-client \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "acme-corp",
    "tier": "pro",
    "product_type": "kasino",
    "api_key_name": "production"
  }'

Response:

{
  "tenant_id": "550e8400-e29b-41d4-a716-446655440000",
  "tenant_name": "acme-corp",
  "tier": "pro",
  "product_type": "kasino",
  "api_key_id": "key-uuid",
  "api_key_prefix": "ares_a1b2",
  "raw_api_key": "ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
  "agents_created": [
    "kasino-classifier",
    "kasino-risk",
    "kasino-transaction",
    "kasino-report"
  ]
}

This single call:

  1. Creates the tenant with the specified tier
  2. Looks up the agent templates for the given product_type
  3. Clones each template as a tenant-specific agent
  4. Generates an API key bound to the new tenant
  5. Returns the raw API key (shown only once)

If any step fails, the entire operation is rolled back. You will never end up with a half-provisioned tenant.
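
The all-or-nothing behavior can be illustrated with a small transaction sketch. This uses SQLite from the Python standard library purely for demonstration; the `provision` function and both tables are hypothetical and do not reflect the real ARES schema or code.

```python
import sqlite3

def provision(conn, tenant_name, templates):
    """Create a tenant and clone its agents in one transaction."""
    # `with conn` commits on success and rolls back if any statement raises.
    with conn:
        conn.execute("INSERT INTO tenants (name) VALUES (?)", (tenant_name,))
        for agent_name in templates:
            conn.execute(
                "INSERT INTO agents (tenant_name, name) VALUES (?, ?)",
                (tenant_name, agent_name),
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenants (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE agents (tenant_name TEXT NOT NULL, name TEXT NOT NULL, "
    "UNIQUE (tenant_name, name))"
)
conn.commit()

try:
    # The duplicate template name makes the second agent insert fail.
    provision(conn, "acme-corp", ["kasino-classifier", "kasino-classifier"])
except sqlite3.IntegrityError:
    pass

# The tenant row inserted before the failure has been rolled back.
tenant_count = conn.execute("SELECT COUNT(*) FROM tenants").fetchone()[0]
print(tenant_count)
```

After the failed attempt the tenant name is still free, so the same call can be retried with corrected input.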

Step 2: Deliver the API Key

Securely deliver the raw_api_key to your client. This is the only time the full key is visible — ARES stores only a hashed version internally.
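
The following sketch shows one common way a service can store only a hash of an API key and still verify requests. The hash algorithm (SHA-256), the key length, and both function names are assumptions for illustration; this page does not specify how ARES hashes keys.

```python
import hashlib
import hmac
import secrets

def generate_api_key():
    """Return (raw_key, prefix, digest); only prefix and digest would be stored."""
    raw_key = "ares_" + secrets.token_hex(16)
    prefix = raw_key[:9]  # same length as the "ares_a1b2" prefixes shown in responses
    digest = hashlib.sha256(raw_key.encode()).hexdigest()
    return raw_key, prefix, digest

def verify_api_key(presented_key, stored_digest):
    """Hash the presented key and compare it to the stored digest in constant time."""
    candidate = hashlib.sha256(presented_key.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)

raw_key, prefix, digest = generate_api_key()
print(prefix, verify_api_key(raw_key, digest))
```

Since the digest cannot be reversed into the raw key, a lost key cannot be recovered and has to be replaced.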

Step 3: Verify the Setup

Confirm the tenant’s agents are accessible using their new API key:

curl https://api.ares.dirmacs.com/v1/agents \
  -H "Authorization: Bearer ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5"

The client should see their four provisioned agents.

Step 4: Test an Agent Run

curl -X POST https://api.ares.dirmacs.com/v1/agents/kasino-classifier/run \
  -H "Authorization: Bearer ares_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "message": "Classify this transaction: $500 at electronics store"
    }
  }'

Managing Tenants After Provisioning

Add More Agents

curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/agents \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "custom-summarizer",
    "agent_type": "summarizer",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You summarize financial reports concisely.",
      "tools": [],
      "max_tokens": 2048
    }
  }'

Issue Additional API Keys

curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/api-keys \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"name": "staging-key"}'

Upgrade a Tenant’s Tier

curl -X PUT https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/quota \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{"tier": "enterprise"}'

Monitor Usage

# Current period summary
curl https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/usage \
  -H "X-Admin-Secret: your-admin-secret"

# Daily breakdown for the last 30 days
curl "https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/usage/daily?days=30" \
  -H "X-Admin-Secret: your-admin-secret"

Architecture Notes

  • Shared infrastructure: All tenants run on the same ARES instance and database. Isolation is logical, not physical. This keeps operational costs low for the MVP phase.
  • Atomic provisioning: The provisioning endpoint uses a database transaction. If agent template cloning fails halfway through, the tenant and any partially created resources are rolled back.
  • Key hashing: API keys are hashed before storage. The raw key is returned exactly once during creation. Lost keys must be revoked and replaced.
  • Auto-migration: ARES runs database migrations on startup (sqlx::migrate!()). New tenant-related schema changes are applied automatically when the server restarts.

Rate Limits and Quotas

ARES enforces two independent layers of rate limiting to protect the platform and ensure fair resource allocation across tenants.


Layer 1: IP-Based Rate Limiting

Every incoming request is subject to per-IP rate limiting via tower_governor. This layer protects against abuse, brute-force attacks, and accidental request floods regardless of authentication status.

IP-based limits apply to all routes, including unauthenticated endpoints like /health. The specific thresholds are configured server-side and are intentionally generous for normal usage patterns.

If you hit the IP rate limit, you will receive a 429 Too Many Requests response. Back off and retry after a short delay.


Layer 2: Tenant Quotas

Authenticated requests to /v1/* are additionally subject to tenant-level quotas based on the tenant’s tier. Monthly quotas reset at the beginning of each calendar month; the daily limit resets at the start of each UTC day.

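| Tier | Monthly Requests | Monthly Tokens | Daily Rate Limit |
|------|------------------|----------------|------------------|
| Free | 1,000 | 100,000 | 100/day |
| Dev | 10,000 | 1,000,000 | 1,000/day |
| Pro | 100,000 | 10,000,000 | 10,000/day |
| Enterprise | Unlimited | Unlimited | Unlimited |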

What Counts as a Request

Each API call to a metered endpoint counts as one request:

  • POST /v1/agents/{name}/run — 1 request
  • POST /v1/chat — 1 request
  • POST /v1/chat/stream — 1 request
  • GET /v1/agents — 1 request

Read-only endpoints like GET /v1/usage and GET /v1/api-keys are also metered and count toward the request total.

What Counts as Tokens

Token usage is tracked per request based on the combined input and output token count from the LLM provider. Both the prompt tokens and completion tokens are summed.


Response Headers

When you make a request to a metered endpoint, ARES includes rate limit information in the response headers:

| Header | Description |
|--------|-------------|
| X-RateLimit-Limit | Maximum requests allowed in the current period |
| X-RateLimit-Remaining | Requests remaining in the current period |
| X-RateLimit-Reset | UTC timestamp when the current period resets |
| X-Quota-Tokens-Remaining | Tokens remaining in the current monthly period |

Example headers:

X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 7482
X-RateLimit-Reset: 2026-04-01T00:00:00Z
X-Quota-Tokens-Remaining: 8241037
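
A client might turn these headers into typed values along the following lines. This is a sketch that assumes all four headers are present and that the reset timestamp always uses the format shown in the example; `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class RateLimitInfo:
    limit: int
    remaining: int
    reset: datetime
    tokens_remaining: int

def parse_rate_limit_headers(headers):
    """Parse the four rate limit headers from a mapping of header name to value."""
    # Normalise names because HTTP header names are case-insensitive.
    h = {name.lower(): value for name, value in headers.items()}
    reset = datetime.strptime(h["x-ratelimit-reset"], "%Y-%m-%dT%H:%M:%SZ")
    return RateLimitInfo(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset=reset.replace(tzinfo=timezone.utc),
        tokens_remaining=int(h["x-quota-tokens-remaining"]),
    )

info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "10000",
    "X-RateLimit-Remaining": "7482",
    "X-RateLimit-Reset": "2026-04-01T00:00:00Z",
    "X-Quota-Tokens-Remaining": "8241037",
})
print(info.limit - info.remaining)  # requests used so far in this period
```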

Exceeding Limits

When you exceed either rate limit layer, ARES returns:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "Rate limit exceeded. Daily request limit reached for your tier."
}

The error message indicates which limit was hit:

| Error Message | Cause | Resolution |
|---------------|-------|------------|
| Rate limit exceeded | IP-based rate limit | Wait and retry. Reduce request frequency. |
| Daily request limit reached for your tier | Tenant daily cap | Wait until the next UTC day, or upgrade your tier. |
| Monthly request quota exceeded | Tenant monthly cap | Wait until the next billing period, or upgrade. |
| Monthly token quota exceeded | Tenant token cap | Wait until the next billing period, or upgrade. |
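
Client code can map the error string to a cause by matching on the message fragments listed for each limit. The sketch below is one possible approach; the cause labels are hypothetical, and matching on message text is an assumption, since no machine-readable error code is documented here.

```python
# Ordered from most to least specific, because the example body above
# combines "Rate limit exceeded" with the daily-limit message.
CAUSES = [
    ("Daily request limit reached", "tenant_daily_cap"),
    ("Monthly request quota exceeded", "tenant_monthly_requests"),
    ("Monthly token quota exceeded", "tenant_monthly_tokens"),
    ("Rate limit exceeded", "ip_rate_limit"),
]

def classify_429(error_message):
    """Return a short label for the limit that produced a 429 response."""
    lowered = error_message.lower()
    for fragment, cause in CAUSES:
        if fragment.lower() in lowered:
            return cause
    return "unknown"

print(classify_429("Rate limit exceeded. Daily request limit reached for your tier."))
```

Only the IP-based limit is worth retrying after a short delay; the tenant caps require waiting for the reset or a tier upgrade.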

Checking Your Usage

You can proactively monitor your consumption to avoid hitting limits:

curl https://api.ares.dirmacs.com/v1/usage \
  -H "Authorization: Bearer ares_xxx"

Response:

{
  "period_start": "2026-03-01T00:00:00Z",
  "period_end": "2026-03-31T23:59:59Z",
  "total_runs": 4821,
  "total_tokens": 2847193,
  "total_api_calls": 5290,
  "quota_runs": 100000,
  "quota_tokens": 10000000,
  "daily_usage": [
    { "date": "2026-03-13", "runs": 312, "tokens": 184920, "api_calls": 340 }
  ]
}

Compare total_runs against quota_runs and total_tokens against quota_tokens to see how much headroom you have.
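
The comparison can be expressed as a small helper. This sketch uses the figures from the response above and assumes both quotas are positive numbers; how an unlimited Enterprise quota is represented in the response is not documented here, so that case is not handled. The name `quota_headroom` is hypothetical.

```python
def quota_headroom(usage):
    """Return the unused fraction of each quota from a /v1/usage response."""
    def remaining(total, quota):
        # Clamp at zero so an overrun never reports negative headroom.
        return max(0.0, 1.0 - total / quota)

    return {
        "runs": remaining(usage["total_runs"], usage["quota_runs"]),
        "tokens": remaining(usage["total_tokens"], usage["quota_tokens"]),
    }

usage = {
    "total_runs": 4821,
    "total_tokens": 2847193,
    "quota_runs": 100000,
    "quota_tokens": 10000000,
}
headroom = quota_headroom(usage)
print(f"runs: {headroom['runs']:.1%}, tokens: {headroom['tokens']:.1%}")
```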


Best Practices

  1. Monitor usage proactively. Poll GET /v1/usage periodically rather than waiting for 429 errors.

  2. Implement exponential backoff. When you receive a 429, wait before retrying. A simple strategy: wait 1s, then 2s, then 4s, up to a maximum of 30s.

  3. Cache where possible. Agent listings and model metadata change infrequently. Cache these responses to reduce unnecessary API calls.

  4. Use streaming for chat. POST /v1/chat/stream counts as a single request regardless of response length, same as the non-streaming variant.

  5. Request a tier upgrade early. If you anticipate hitting your quota before month-end, contact your platform administrator to upgrade your tier. Tier changes take effect immediately.
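
The backoff schedule from item 2 (1s, 2s, 4s, capped at 30s) can be sketched as follows. The function names and the `send` callable, which is expected to return a `(status, body)` tuple, are hypothetical and stand in for whatever HTTP client you use.

```python
import time

def backoff_delays(max_retries, base=1.0, cap=30.0):
    """Yield 1s, 2s, 4s, ... doubling each time and capped at `cap` seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt)

def call_with_backoff(send, max_retries=5, sleep=time.sleep):
    """Call `send()` until it returns a status other than 429 or retries run out."""
    status, body = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)
        status, body = send()
    return status, body

print(list(backoff_delays(7)))
```

Passing `sleep` as a parameter keeps the retry loop easy to test without real waiting.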

Error Handling

ARES uses conventional HTTP status codes and a consistent JSON error format across all endpoints. This page documents the error response structure, status code meanings, and common errors with their solutions.


Error Response Format

All errors return a JSON object with an error field containing a human-readable message:

{
  "error": "Human-readable error message"
}

The HTTP status code indicates the category of error. The error string provides specific details about what went wrong.


HTTP Status Codes

Success Codes

| Code | Meaning | When Used |
|------|---------|-----------|
| 200 | OK | Successful read or update operation |
| 201 | Created | Resource successfully created (tenant, agent, API key) |
| 204 | No Content | Successful delete with no response body |

Client Error Codes

| Code | Meaning | When Used |
|------|---------|-----------|
| 400 | Bad Request | Malformed JSON, missing required fields, invalid parameter types |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Valid credentials but insufficient permissions for this operation |
| 404 | Not Found | Resource does not exist, or does not belong to your tenant |
| 409 | Conflict | Resource already exists (e.g., duplicate tenant name or agent name) |
| 422 | Unprocessable Entity | Request is well-formed but contains invalid values (e.g., unknown tier, invalid model name) |
| 429 | Too Many Requests | Rate limit or quota exceeded |

Server Error Codes

| Code | Meaning | When Used |
|------|---------|-----------|
| 500 | Internal Server Error | Unexpected server-side failure |

Common Errors and Solutions

Authentication Errors

Missing API key:

HTTP 401
{"error": "Missing authorization header"}

Add the Authorization: Bearer ares_xxx header to your request.

Invalid API key:

HTTP 401
{"error": "Invalid API key"}

Verify that the API key is correct and has not been revoked. API keys start with ares_.

Missing admin secret:

HTTP 401
{"error": "Missing X-Admin-Secret header"}

Admin endpoints require the X-Admin-Secret header, not the Authorization header.

Invalid admin secret:

HTTP 401
{"error": "Invalid admin secret"}

Verify the admin secret matches the value configured in ares.toml.

Resource Errors

Agent not found:

HTTP 404
{"error": "Agent not found: risk-analyzer"}

The agent does not exist for your tenant. Check the agent name with GET /v1/agents. Agent names are case-sensitive.

Tenant not found:

HTTP 404
{"error": "Tenant not found"}

The tenant ID does not exist. List tenants with GET /api/admin/tenants to find the correct ID.

Duplicate resource:

HTTP 409
{"error": "Agent with name 'risk-analyzer' already exists for this tenant"}

An agent with this name already exists. Use a different name or update the existing agent.

Validation Errors

Invalid tier:

HTTP 422
{"error": "Invalid tier: 'gold'. Valid tiers: free, dev, pro, enterprise"}

Use one of the supported tier values.

Missing required field:

HTTP 400
{"error": "Missing required field: name"}

Include all required fields in your request body. Refer to the API documentation for the specific endpoint.

Invalid JSON:

HTTP 400
{"error": "Invalid JSON in request body"}

Ensure your request body is valid JSON. Check for trailing commas, unquoted keys, or mismatched brackets. Verify the Content-Type: application/json header is set.

Rate Limit Errors

Quota exceeded:

HTTP 429
{"error": "Monthly request quota exceeded"}

Your tenant has used all allocated requests for the current billing period. Wait until the period resets or contact your administrator to upgrade your tier.

Daily limit:

HTTP 429
{"error": "Daily request limit reached for your tier"}

Your tenant has hit the daily rate cap. Wait until the next UTC day or upgrade your tier.

See Rate Limits and Quotas for details on limits by tier.

Server Errors

Internal server error:

HTTP 500
{"error": "Internal server error"}

An unexpected error occurred on the server. These are not caused by your request. If the error persists, check service health via GET /api/admin/services or inspect server logs.


Error Handling Best Practices

  1. Always check the HTTP status code first. The status code tells you the error category before you parse the response body.

  2. Parse the error message for user display. The error field is written to be human-readable and safe to show to end users.

  3. Retry on 429 and 500. Rate limit errors (429) should be retried with exponential backoff. Server errors (500) may be transient — retry once or twice before treating as a permanent failure.

  4. Do not retry on 400, 401, 403, 404, 409, or 422. These indicate problems with the request itself. Fix the request before retrying.

  5. Log the full response. When debugging, log both the HTTP status code and the response body. The error message often contains the specific field or value that caused the problem.
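
Items 3 and 4 amount to a small retry policy, which could be sketched like this. The function name and the retry cap of 5 for 429 responses are assumptions; the limit of 2 for server errors reflects the "once or twice" guidance above.

```python
def max_retries_for(status_code):
    """Return how many times a failed request is worth retrying."""
    if status_code == 429:
        return 5  # assumed cap; combine with exponential backoff
    if status_code >= 500:
        return 2  # possibly transient, so retry once or twice
    return 0      # 400, 401, 403, 404, 409, 422: fix the request instead

for code in (404, 429, 500):
    print(code, max_retries_for(code))
```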

Example: Robust Error Handling (Python)

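import requests

class APIError(Exception):
    """Base class for errors returned by the ARES API."""

class AuthenticationError(APIError):
    """401: missing or invalid credentials."""

class AgentNotFoundError(APIError):
    """404: the agent does not exist for this tenant."""

class RateLimitError(APIError):
    """429: rate limit or quota exceeded."""

class ServerError(APIError):
    """5xx: unexpected server-side failure."""

def run_agent(api_key, agent_name, input_data):
    response = requests.post(
        f"https://api.ares.dirmacs.com/v1/agents/{agent_name}/run",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json={"input": input_data},
        timeout=60,
    )

    if response.status_code == 200:
        return response.json()

    # Error bodies are normally JSON, but fall back to raw text if parsing fails.
    try:
        error = response.json().get("error", "Unknown error")
    except ValueError:
        error = response.text or "Unknown error"

    if response.status_code == 401:
        raise AuthenticationError(f"Authentication failed: {error}")
    elif response.status_code == 404:
        raise AgentNotFoundError(f"Agent '{agent_name}' not found: {error}")
    elif response.status_code == 429:
        raise RateLimitError(f"Rate limited: {error}")
    elif response.status_code >= 500:
        raise ServerError(f"Server error: {error}")
    else:
        raise APIError(f"API error ({response.status_code}): {error}")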

Example: Robust Error Handling (JavaScript)

async function runAgent(apiKey, agentName, inputData) {
  const response = await fetch(
    `https://api.ares.dirmacs.com/v1/agents/${agentName}/run`,
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ input: inputData }),
    }
  );

  if (response.ok) {
    return await response.json();
  }

  // Fall back to an empty object if the error body is not valid JSON.
  const { error = "Unknown error" } = await response.json().catch(() => ({}));

  switch (response.status) {
    case 401: throw new Error(`Authentication failed: ${error}`);
    case 404: throw new Error(`Agent '${agentName}' not found: ${error}`);
    case 429: throw new Error(`Rate limited: ${error}`);
    default:  throw new Error(`API error (${response.status}): ${error}`);
  }
}

Self-Hosting

Run your own ARES instance on your infrastructure. This guide covers local development setup, production deployment, and configuration options.


Prerequisites

| Requirement | Minimum Version | Notes |
|-------------|-----------------|-------|
| Rust | 1.91+ | Install via rustup |
| PostgreSQL | 15+ | Used for tenants, agents, usage tracking |
| Git | 2.x | For cloning the repository |

Optional, depending on your provider configuration:

| Requirement | When Needed |
|-------------|-------------|
| Groq API key | Using Groq as an LLM provider |
| Anthropic API key | Using Anthropic as an LLM provider |
| NVIDIA API key | Using NVIDIA-hosted DeepSeek models |
| Ollama | Running local models |

Quick Start

1. Clone the Repository

git clone https://github.com/dirmacs/ares
cd ares

2. Set Up the Database

Create a PostgreSQL database for ARES:

createdb ares

ARES runs migrations automatically on startup. No manual schema setup is required.

3. Create Configuration

Copy the example config and customize it:

cp ares.example.toml ares.toml

Edit ares.toml to configure your providers and models. At minimum, you need one LLM provider:

[server]
port = 3000

[database]
url = "postgres://localhost/ares"

[[providers]]
name = "groq"
type = "openai"
base_url = "https://api.groq.com/openai/v1"
api_key_env = "GROQ_API_KEY"

[[providers.models]]
id = "llama-3.3-70b-versatile"
name = "llama-3.3-70b"
context_length = 131072

4. Set Environment Variables

export DATABASE_URL="postgres://localhost/ares"
export JWT_SECRET="your-secret-key-at-least-32-characters-long"
export API_KEY="your-admin-api-secret"
export GROQ_API_KEY="gsk_..."

| Variable | Required | Description |
|----------|----------|-------------|
| DATABASE_URL | Yes | PostgreSQL connection string |
| JWT_SECRET | Yes | Secret for signing JWT tokens (32+ characters) |
| API_KEY | Yes | Admin secret for /api/admin/* endpoints |
| GROQ_API_KEY | If using Groq | Groq API key |
| ANTHROPIC_API_KEY | If using Anthropic | Anthropic API key |
| NVIDIA_API_KEY | If using NVIDIA | NVIDIA API key |

5. Build

cargo build --release --features openai,postgres,mcp

See Feature Flags for all available options.

6. Run

./target/release/ares-server

7. Verify

curl http://localhost:3000/health

You should receive a 200 OK response. ARES is running.


Feature Flags

ARES uses Cargo feature flags to control which capabilities are compiled into the binary. This keeps the binary lean — only include what you need.

| Feature | Default | Description |
|---------|---------|-------------|
| openai | Yes | OpenAI-compatible provider support (also used for Groq, NVIDIA) |
| anthropic | No | Anthropic Claude provider support |
| ollama | No | Local Ollama model support |
| postgres | Yes | PostgreSQL database backend |
| mcp | No | Model Context Protocol support for external tool servers |
| ares-vector | No | Vector storage and semantic search |

Build Examples

Minimal build (Groq only):

cargo build --release --no-default-features --features openai,postgres

Full build (all providers):

cargo build --release --features openai,anthropic,ollama,postgres,mcp,ares-vector

Production build (recommended for VPS deployment):

cargo build --release --no-default-features --features openai,postgres,mcp

Production Deployment

systemd Service

Create a systemd unit file at /etc/systemd/system/ares.service:

[Unit]
Description=ARES AI Agent Platform
After=network.target postgresql.service
Wants=postgresql.service

[Service]
Type=simple
User=ares
Group=ares
WorkingDirectory=/opt/ares
ExecStart=/opt/ares/target/release/ares-server
Restart=on-failure
RestartSec=5
Environment=DATABASE_URL=postgres://ares:password@localhost/ares
Environment=JWT_SECRET=your-production-jwt-secret
Environment=API_KEY=your-admin-secret
Environment=GROQ_API_KEY=gsk_...
Environment=RUST_LOG=info

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable ares
sudo systemctl start ares
sudo systemctl status ares

View logs:

journalctl -u ares -f

Caddy Reverse Proxy

Caddy provides automatic HTTPS with Let’s Encrypt. Create a Caddyfile:

api.ares.yourdomain.com {
    reverse_proxy localhost:3000
}

Start Caddy:

sudo systemctl enable caddy
sudo systemctl start caddy

Caddy automatically provisions and renews TLS certificates. No manual certificate management is needed.

PostgreSQL Setup

For production, create a dedicated database user:

CREATE USER ares WITH PASSWORD 'strong-password-here';
CREATE DATABASE ares OWNER ares;

Update your DATABASE_URL accordingly:

DATABASE_URL=postgres://ares:strong-password-here@localhost/ares

Configuration Reference

The ares.toml file is the primary configuration file. It controls server settings, providers, models, and agent definitions.

Server Section

[server]
port = 3000          # HTTP port (overrides PORT env var)
host = "0.0.0.0"     # Bind address

Database Section

[database]
url = "postgres://ares:password@localhost/ares"
max_connections = 10

Provider Section

Each provider is defined as a [[providers]] entry:

[[providers]]
name = "groq"
type = "openai"
base_url = "https://api.groq.com/openai/v1"
api_key_env = "GROQ_API_KEY"

[[providers.models]]
id = "llama-3.3-70b-versatile"
name = "llama-3.3-70b"
context_length = 131072

[[providers.models]]
id = "llama-3.1-8b-instant"
name = "llama-3.1-8b"
context_length = 131072

[[providers]]
name = "anthropic"
type = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"

[[providers.models]]
id = "claude-3-5-sonnet-20241022"
name = "claude-3.5-sonnet"
context_length = 200000

[[providers]]
name = "local"
type = "ollama"
base_url = "http://localhost:11434"

[[providers.models]]
id = "mistral"
name = "mistral-7b"
context_length = 32768

Agent Section

Static agents can be defined in the config file:

[[agents]]
name = "general-assistant"
model = "llama-3.3-70b"
system_prompt = "You are a helpful assistant."
tools = ["calculator", "web_search"]
max_tokens = 4096

For tenant-specific agents, use the Admin API instead of config file definitions.


Updating

To update a running ARES instance:

cd /opt/ares
git pull origin main
cargo build --release --no-default-features --features openai,postgres,mcp
sudo systemctl restart ares

Database migrations run automatically on startup. No manual migration steps are needed.


Troubleshooting

Port already in use:

Error: Address already in use (os error 98)

Another process is using port 3000. Either stop it or change the port in ares.toml.

Database connection failed:

Error: error communicating with database

Verify PostgreSQL is running and your DATABASE_URL is correct. Check that the database user has permissions on the database.

Provider API key missing:

Error: Environment variable GROQ_API_KEY not set

Set the required API key environment variable, or remove the provider from ares.toml if you do not need it.

JWT secret too short:

Error: JWT_SECRET must be at least 32 characters

Use a longer secret. Generate one with: openssl rand -hex 32
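
The last two checks can be run before starting the server. This is a minimal sketch, not an ARES utility: the preflight function is hypothetical, and it only mirrors the error messages shown above. secrets.token_hex(32) is the Python counterpart of openssl rand -hex 32.

```python
import secrets


def preflight(env: dict[str, str], api_key_vars: list[str]) -> list[str]:
    """Return human-readable problems found in the given environment mapping."""
    problems = []
    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET must be at least 32 characters")
    for name in api_key_vars:
        if not env.get(name):
            problems.append(f"Environment variable {name} not set")
    return problems


# token_hex(32) yields 64 hexadecimal characters, well above the 32 minimum.
example_env = {"JWT_SECRET": secrets.token_hex(32)}
print(preflight(example_env, ["GROQ_API_KEY"]))
# → ['Environment variable GROQ_API_KEY not set']
```

To check the real environment, pass dict(os.environ) and the api_key_env values from your ares.toml.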

Guide: Build a Chat Agent

This guide walks you through creating a custom chat agent on ARES — from defining its behavior to testing it in production.


What is an Agent?

An ARES agent is a configured LLM endpoint with a specific personality, instructions, and tool access. Each agent has:

  • A name — unique identifier used in API calls
  • A model — which LLM powers it (e.g., llama-3.3-70b, claude-3.5-sonnet)
  • A system prompt — instructions that define the agent’s behavior
  • Tools — optional capabilities like calculator or web_search
  • Configuration — max tokens, temperature, and other parameters

You can create agents in two ways: via the configuration file or via the API.


Option 1: Define in ares.toml

For agents that are part of your core platform, define them in the ares.toml configuration file:

[[agents]]
name = "financial-analyst"
model = "llama-3.3-70b"
system_prompt = """
You are a senior financial analyst. You help users understand financial data,
calculate metrics, and provide clear explanations of financial concepts.

Guidelines:
- Always show your calculations step by step
- Use the calculator tool for arithmetic to ensure accuracy
- Present numbers with appropriate formatting (commas, decimal places)
- When uncertain, clearly state your assumptions
"""
tools = ["calculator"]
max_tokens = 4096

Restart ARES to load the new agent. After the restart, it is available at /api/chat with agent_type: "financial-analyst".

TOON Config Format

ARES also supports the TOON configuration format for more structured agent definitions:

[[agents]]
name = "support-agent"
model = "llama-3.3-70b"

[agents.toon]
role = "Customer Support Specialist"
personality = "Professional, empathetic, solution-oriented"
knowledge = ["product documentation", "pricing plans", "common issues"]
constraints = [
    "Never make up information about products",
    "Escalate billing disputes to human agents",
    "Always confirm the customer's issue before proposing a solution",
]
tools = ["web_search"]

The TOON format structures the system prompt into semantic fields that ARES assembles into a coherent prompt. This makes agent behavior easier to reason about and modify.
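
To make the idea concrete, here is one way such fields could be joined into a single prompt. The layout is purely illustrative: how ARES actually assembles TOON fields is not documented here, and assemble_system_prompt is a hypothetical helper.

```python
def assemble_system_prompt(toon: dict) -> str:
    """Join TOON-style fields into one system prompt (illustrative layout)."""
    lines = [f"You are a {toon['role']}."]
    if toon.get("personality"):
        lines.append(f"Personality: {toon['personality']}")
    if toon.get("knowledge"):
        lines.append("Knowledge areas: " + ", ".join(toon["knowledge"]))
    if toon.get("constraints"):
        lines.append("Constraints:")
        lines.extend(f"- {rule}" for rule in toon["constraints"])
    return "\n".join(lines)


support_agent = {
    "role": "Customer Support Specialist",
    "personality": "Professional, empathetic, solution-oriented",
    "knowledge": ["product documentation", "pricing plans", "common issues"],
    "constraints": [
        "Never make up information about products",
        "Escalate billing disputes to human agents",
        "Always confirm the customer's issue before proposing a solution",
    ],
}
print(assemble_system_prompt(support_agent))
```

Keeping each concern in its own field means a constraint can be added or removed without rewriting the rest of the prompt.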


Option 2: Create via API

For tenant-specific agents or agents you want to manage programmatically, use the API.

As a Platform Admin

curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{tenant_id}/agents \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "financial-analyst",
    "agent_type": "analyst",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You are a senior financial analyst...",
      "tools": ["calculator"],
      "max_tokens": 4096
    }
  }'

As an Authenticated User

curl -X POST https://api.ares.dirmacs.com/api/user/agents \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-analyst",
    "agent_type": "analyst",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You are a senior financial analyst...",
      "tools": ["calculator"],
      "max_tokens": 4096
    }
  }'
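
The same user-level request can be built from Python with only the standard library. This sketch constructs the request without sending it; build_create_agent_request is a hypothetical helper, and the payload fields are copied from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://api.ares.dirmacs.com"


def build_create_agent_request(jwt_token: str, name: str, agent_type: str,
                               config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /api/user/agents request."""
    body = json.dumps({"name": name, "agent_type": agent_type, "config": config})
    return urllib.request.Request(
        f"{BASE_URL}/api/user/agents",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_agent_request(
    "<jwt_token>", "my-analyst", "analyst",
    {
        "model": "llama-3.3-70b",
        "system_prompt": "You are a senior financial analyst...",
        "tools": ["calculator"],
        "max_tokens": 4096,
    },
)
print(request.get_method(), request.full_url)
# → POST https://api.ares.dirmacs.com/api/user/agents
```

Pass the request to urllib.request.urlopen to send it once you have a real token.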

Testing Your Agent

Basic Chat

Send a message to your agent:

curl -X POST https://api.ares.dirmacs.com/api/chat \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the compound annual growth rate if revenue went from $1M to $1.8M over 3 years?"}
    ],
    "agent_type": "financial-analyst"
  }'

Expected response:

{
  "content": "To calculate the Compound Annual Growth Rate (CAGR):\n\nCAGR = (Ending Value / Beginning Value)^(1/n) - 1\nCAGR = ($1,800,000 / $1,000,000)^(1/3) - 1\nCAGR = (1.8)^(0.3333) - 1\nCAGR = 1.2164 - 1\nCAGR = 0.2164\n\n**The CAGR is 21.64%.**\n\nThis means revenue grew at an average annual rate of approximately 21.6% over the 3-year period.",
  "model": "llama-3.3-70b",
  "tokens_used": 287
}
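
Because the content field is model output, it is worth checking the arithmetic independently when you test an agent like this one. A minimal sketch (the cagr helper is hypothetical and not part of ARES):

```python
def cagr(beginning: float, ending: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    return (ending / beginning) ** (1 / years) - 1


print(f"{cagr(1_000_000, 1_800_000, 3):.2%}")  # → 21.64%
```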

Multi-Turn Conversation

Include the conversation history in the messages array:

curl -X POST https://api.ares.dirmacs.com/api/chat \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the CAGR from $1M to $1.8M over 3 years?"},
      {"role": "assistant", "content": "The CAGR is 21.64%..."},
      {"role": "user", "content": "What if the period was 5 years instead?"}
    ],
    "agent_type": "financial-analyst"
  }'
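
Since the full history travels in the messages array of every request, the client has to keep it. The sketch below shows one way to do that; add_turn is a hypothetical helper, and it accepts only the two roles that appear in the examples above.

```python
def add_turn(messages: list[dict], role: str, content: str) -> list[dict]:
    """Return a new history with one more message; the input is not mutated."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role}")
    return messages + [{"role": role, "content": content}]


history: list[dict] = []
history = add_turn(history, "user", "What is the CAGR from $1M to $1.8M over 3 years?")
history = add_turn(history, "assistant", "The CAGR is 21.64%...")
history = add_turn(history, "user", "What if the period was 5 years instead?")

# This dictionary is the JSON body for the next POST to /api/chat.
payload = {"messages": history, "agent_type": "financial-analyst"}
print(len(payload["messages"]))  # → 3
```

After each response, append the returned content as an assistant turn before adding the next user message.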

With Tool Usage

If your agent has tools enabled, ARES handles the tool calling loop automatically. You send a normal chat message, and the agent uses tools as needed:

curl -X POST https://api.ares.dirmacs.com/api/chat \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Calculate 15% annual compound interest on $50,000 over 10 years"}
    ],
    "agent_type": "financial-analyst"
  }'

The agent calls the calculator tool internally to compute 50000 * (1.15)^10 (about 202,277.89) and returns the formatted result.

Streaming

For real-time responses, use the streaming endpoint:

curl -X POST https://api.ares.dirmacs.com/api/chat/stream \
  -H "Authorization: Bearer <jwt_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain the difference between NPV and IRR"}
    ],
    "agent_type": "financial-analyst"
  }'

This returns a Server-Sent Events stream. See the V1 API docs for client-side streaming examples.
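
On the client side, an SSE stream is a sequence of text lines in which each event ends with a blank line. The sketch below extracts the data payload of each event from any iterable of lines. The payloads in the sample are invented for illustration; the actual event format that ARES emits is not specified in this guide.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line ends the current event
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith(":"):          # comment or keep-alive line
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


sample = ["data: To\n", "\n", ": keep-alive\n", "data: calculate\n", "\n"]
print(list(iter_sse_data(sample)))  # → ['To', 'calculate']
```

The function works on any line iterator, so it can consume a streaming HTTP response body decoded line by line.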


Iterating on the System Prompt

The system prompt is the most important part of your agent. Here are practical guidelines:

Be Specific About Format

Bad:

You are a helpful assistant.

Good:

You are a financial analyst. When presenting calculations:
- Show each step on its own line
- Use the calculator tool for all arithmetic
- Format currency with $ and commas
- Round percentages to 2 decimal places
- End with a bold summary line

Define Boundaries

Tell the agent what it should not do:

Constraints:
- Never provide specific investment advice or recommend buying/selling securities
- If asked about tax implications, recommend consulting a tax professional
- Do not speculate about future market movements
- If you don't have enough data to answer accurately, say so

Include Examples

For complex formatting requirements, show the agent what you want:

When comparing metrics, use this format:

| Metric | 2024 | 2025 | Change |
|--------|------|------|--------|
| Revenue | $1.2M | $1.8M | +50% |
| EBITDA | $300K | $480K | +60% |

Test Edge Cases

After writing your system prompt, test these scenarios:

  1. Off-topic requests — Does the agent stay in character or helpfully redirect?
  2. Ambiguous inputs — Does the agent ask for clarification?
  3. Tool failures — Does the agent handle tool errors gracefully?
  4. Long conversations — Does the agent maintain context over multiple turns?

Adding Tool Access

Agents can use built-in tools to extend their capabilities:

[[agents]]
name = "research-agent"
model = "llama-3.3-70b"
system_prompt = "You are a research agent with access to web search and calculation tools."
tools = ["calculator", "web_search"]

Available built-in tools:

| Tool | Description |
|------|-------------|
| calculator | Evaluate mathematical expressions |
| web_search | Search the web for current information |

See the Tool Calling guide for details on how tool execution works.


Choosing a Model

Different models have different strengths. Consider these factors when choosing:

| Model | Provider | Best For |
|-------|----------|----------|
| llama-3.3-70b | Groq | General-purpose, fast, good reasoning |
| llama-3.1-8b | Groq | Simple tasks, lowest latency |
| deepseek-r1 | NVIDIA | Complex reasoning, chain-of-thought |
| claude-3.5-sonnet | Anthropic | Nuanced writing, careful analysis |

Start with llama-3.3-70b for most use cases. It offers a strong balance of capability, speed, and cost. Move to a specialized model only if you have a specific need.

Check available models with:

curl https://api.ares.dirmacs.com/api/admin/models \
  -H "X-Admin-Secret: your-admin-secret"

Guide: Tool Calling

ARES supports tool calling (also known as function calling), allowing agents to use external tools during a conversation. When an agent needs to perform a calculation, search the web, or interact with an external system, it requests a tool call. ARES executes the tool and feeds the result back to the agent, which then incorporates it into its response.


How It Works

Tool calling in ARES follows a multi-turn loop managed by the ToolCoordinator:

User message
    |
    v
Agent (LLM) generates response
    |
    ├── If response is final text → return to user
    |
    └── If response contains tool_calls →
            |
            v
        ARES executes each tool
            |
            v
        Results sent back to agent
            |
            v
        Agent generates next response (may call more tools or return final text)

This loop continues until the agent produces a final text response or the maximum iteration limit is reached. The entire process is transparent to the caller — you send a chat message and receive a complete response.
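
The loop can be sketched in a few lines. This is a simplified illustration, not the ToolCoordinator implementation: the model is any callable that returns either final content or a list of tool_calls, and the message shapes for tool results are assumptions.

```python
from typing import Callable

Tool = Callable[[dict], dict]


def run_tool_loop(model: Callable[[list[dict]], dict], tools: dict[str, Tool],
                  messages: list[dict], max_iterations: int = 10) -> str:
    """Call the model until it returns final text or the round limit is hit."""
    history = list(messages)
    for _ in range(max_iterations):
        reply = model(history)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]          # final text goes back to the user
        for call in calls:
            result = tools[call["name"]](call["arguments"])
            history.append({"role": "tool", "name": call["name"], "content": result})
    # Limit reached: request a final answer from what has been gathered so far.
    history.append({"role": "user", "content": "Answer now using the results so far."})
    return model(history).get("content", "")


def fake_model(history: list[dict]) -> dict:
    """Stub LLM: asks for the calculator once, then answers with its result."""
    if any(message["role"] == "tool" for message in history):
        return {"content": f"Result: {history[-1]['content']['result']}"}
    return {"tool_calls": [{"name": "calculator", "arguments": {"expression": "2 + 3"}}]}


tools = {"calculator": lambda arguments: {"result": 5}}
print(run_tool_loop(fake_model, tools, [{"role": "user", "content": "What is 2 + 3?"}]))
# → Result: 5
```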


Built-in Tools

ARES ships with two built-in tools:

calculator

Evaluates mathematical expressions and returns the result.

Capabilities:

  • Basic arithmetic: +, -, *, /
  • Exponents: ^ or **
  • Parentheses for grouping
  • Common functions: sqrt, sin, cos, log, ln, abs
  • Constants: pi, e

Example tool call from agent:

{
  "name": "calculator",
  "arguments": {
    "expression": "50000 * (1.15 ^ 10)"
  }
}

Result returned to agent:

{
  "result": 202277.89
}
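
For intuition, an evaluator with these capabilities can be written without eval() by walking a parsed syntax tree. This is a sketch of the general technique, not the ARES calculator. Two points are assumptions: ^ is rewritten to ** so that it binds like exponentiation, and log is taken to mean base 10 because ln is listed separately.

```python
import ast
import math
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
              "log": math.log10, "ln": math.log, "abs": abs}
_CONSTANTS = {"pi": math.pi, "e": math.e}


def evaluate(expression: str) -> float:
    """Evaluate an arithmetic expression without calling eval()."""
    # Rewrite ^ to ** so exponentiation binds tighter than * and /.
    tree = ast.parse(expression.replace("^", "**"), mode="eval")
    return _eval(tree.body)


def _eval(node: ast.AST) -> float:
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Name) and node.id in _CONSTANTS:
        return _CONSTANTS[node.id]
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCTIONS and not node.keywords):
        return _FUNCTIONS[node.func.id](*(_eval(arg) for arg in node.args))
    raise ValueError("unsupported expression")


print(round(evaluate("50000 * (1.15 ^ 10)"), 2))  # → 202277.89
```

Anything outside the whitelisted node types, such as attribute access or unknown function names, raises ValueError instead of being executed.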

web_search

Searches the web and returns relevant results.

Example tool call from agent:

{
  "name": "web_search",
  "arguments": {
    "query": "current US federal interest rate 2026"
  }
}

Result returned to agent:

{
  "results": [
    {
      "title": "Federal Reserve holds rate at 4.25%",
      "url": "https://...",
      "snippet": "The Federal Reserve maintained its benchmark rate..."
    }
  ]
}

Configuring Tool Access

Per-Agent Tool Filtering

Each agent specifies which tools it can use. An agent without tools configured cannot make tool calls, even if the underlying model supports them.

In ares.toml:

[[agents]]
name = "research-assistant"
model = "llama-3.3-70b"
system_prompt = "You are a research assistant with access to web search and calculation tools."
tools = ["calculator", "web_search"]

[[agents]]
name = "math-tutor"
model = "llama-3.3-70b"
system_prompt = "You are a math tutor. Use the calculator to verify your work."
tools = ["calculator"]

[[agents]]
name = "simple-chat"
model = "llama-3.3-70b"
system_prompt = "You are a conversational assistant."
tools = []

Via the API:

curl -X POST https://api.ares.dirmacs.com/api/admin/tenants/{id}/agents \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analyst",
    "agent_type": "analyst",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You are a data analyst.",
      "tools": ["calculator", "web_search"],
      "max_tokens": 4096
    }
  }'

ToolCoordinator

The ToolCoordinator is the internal component that manages the tool calling loop. It handles:

  • Multi-turn orchestration — Sending tool results back to the model and processing follow-up tool calls
  • Parallel execution — When the model requests multiple tools in a single turn, they execute concurrently
  • Timeout enforcement — Individual tool calls are bounded by a configurable timeout
  • Iteration limits — Prevents infinite tool-calling loops

Configuration

Tool calling behavior is configured at the server level:

| Setting | Default | Description |
|---------|---------|-------------|
| max_iterations | 10 | Maximum tool-calling rounds before forcing a text response |
| parallel_execution | true | Execute multiple tool calls concurrently within a single turn |
| tool_timeout | 30s | Maximum time for a single tool execution |

If an agent hits the iteration limit, ARES instructs the model to produce a final response using the information gathered so far.


Provider Compatibility

Tool calling requires model support. Not all providers and models support function calling:

| Provider | Models | Tool Calling |
|----------|--------|--------------|
| Groq | llama-3.3-70b, llama-3.1-8b | Supported |
| Anthropic | claude-3.5-sonnet | Supported |
| NVIDIA | deepseek-r1 | Not supported |
| Ollama | Varies by model | Model-dependent |

If you assign tools to an agent using a model that does not support tool calling, the tools will be ignored and the agent will respond with text only.
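
Because ignored tools fail silently, it can help to flag such agents when reviewing a configuration. The sketch below hardcodes the support flags from the compatibility table. The function and the deep-thinker agent are hypothetical, and models missing from the map (for example, Ollama models) are treated as unknown rather than unsupported.

```python
# Support flags copied from the compatibility table in this section.
TOOL_CALLING: dict[str, bool] = {
    "llama-3.3-70b": True,
    "llama-3.1-8b": True,
    "claude-3.5-sonnet": True,
    "deepseek-r1": False,
}


def agents_with_ignored_tools(agents: list[dict]) -> list[str]:
    """Return names of agents whose tools would be ignored by their model."""
    return [
        agent["name"]
        for agent in agents
        if agent.get("tools") and TOOL_CALLING.get(agent["model"]) is False
    ]


agents = [
    {"name": "math-tutor", "model": "llama-3.3-70b", "tools": ["calculator"]},
    {"name": "deep-thinker", "model": "deepseek-r1", "tools": ["calculator"]},
    {"name": "simple-chat", "model": "deepseek-r1", "tools": []},
]
print(agents_with_ignored_tools(agents))  # → ['deep-thinker']
```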


Example: Conversation with Tool Calls

Here is what happens internally when a user asks a question that requires tool use.

User sends:

curl -X POST https://api.ares.dirmacs.com/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the monthly payment on a $400,000 mortgage at 6.5% for 30 years?"}
    ],
    "agent_type": "financial-analyst"
  }'

Internal flow:

  1. ARES sends the message to the LLM with the calculator tool definition
  2. The LLM responds with a tool call:
    {
      "tool_calls": [{
        "name": "calculator",
        "arguments": {"expression": "(400000 * (0.065/12) * (1 + 0.065/12)^360) / ((1 + 0.065/12)^360 - 1)"}
      }]
    }
    
  3. ARES executes the calculator and gets 2528.27
  4. ARES sends the result back to the LLM
  5. The LLM produces a final text response incorporating the calculated value

User receives:

{
  "content": "The monthly payment on a $400,000 mortgage at 6.5% APR over 30 years would be **$2,528.27**.\n\nThis is calculated using the standard amortization formula...",
  "model": "llama-3.3-70b",
  "tokens_used": 412
}

The tool-calling steps are invisible to the caller. You send a question and receive a complete answer.


Example: Multiple Tool Calls in One Turn

Models can request multiple tools simultaneously. For example, a research agent asked to “Compare the population of Tokyo and New York” might request two web searches in parallel:

{
  "tool_calls": [
    {"name": "web_search", "arguments": {"query": "Tokyo population 2026"}},
    {"name": "web_search", "arguments": {"query": "New York population 2026"}}
  ]
}

With parallel_execution enabled (the default), both searches execute concurrently. The results are sent back to the model together, and it produces a response comparing both cities.
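
Concurrent execution with a per-call timeout can be sketched with asyncio. This illustrates the behavior described above rather than the ARES implementation: web_search here is a stand-in that sleeps instead of performing a search, and the shape of the timeout error is an assumption.

```python
import asyncio


async def web_search(arguments: dict) -> dict:
    """Stand-in for a real search tool; sleeps briefly instead of doing I/O."""
    await asyncio.sleep(0.05)
    return {"results": [{"title": f"Result for {arguments['query']}"}]}


async def execute_all(tool_calls: list[dict], tools: dict,
                      timeout: float = 30.0) -> list[dict]:
    """Run every tool call of one turn concurrently, each bounded by timeout."""
    async def run_one(call: dict) -> dict:
        try:
            return await asyncio.wait_for(tools[call["name"]](call["arguments"]), timeout)
        except asyncio.TimeoutError:
            return {"error": f"{call['name']} timed out after {timeout:g} seconds"}

    # gather() preserves the order of the calls in its result list.
    return await asyncio.gather(*(run_one(call) for call in tool_calls))


calls = [
    {"name": "web_search", "arguments": {"query": "Tokyo population 2026"}},
    {"name": "web_search", "arguments": {"query": "New York population 2026"}},
]
results = asyncio.run(execute_all(calls, {"web_search": web_search}))
print(len(results))  # → 2
```

Both stand-in searches sleep at the same time, so the turn takes roughly as long as the slowest call rather than the sum of all calls.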


Example: Multi-Turn Tool Usage

Some questions require multiple rounds of tool use. For example:

User: “What is 15% of the GDP of France?”

Turn 1 — Agent calls web_search:

{"name": "web_search", "arguments": {"query": "France GDP 2026 USD"}}

Result: France’s GDP is approximately $3.1 trillion.

Turn 2 — Agent calls calculator:

{"name": "calculator", "arguments": {"expression": "3100000000000 * 0.15"}}

Result: 465,000,000,000

Turn 3 — Agent produces final response: “15% of France’s GDP (approximately $3.1 trillion) is $465 billion.”

Each round counts toward the max_iterations limit.


Error Handling

If a tool call fails (timeout, invalid input, etc.), ARES returns an error result to the model:

{
  "tool_result": {
    "name": "web_search",
    "error": "Search timed out after 30 seconds"
  }
}

The model can then decide to:

  • Retry the tool call with different parameters
  • Use a different tool
  • Respond with what it knows, noting the tool failure

Well-designed system prompts should instruct the agent on how to handle tool failures gracefully.
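
On the execution side, the key idea is that a failing tool becomes data for the model instead of an exception for the caller. A minimal sketch follows: the error shape is the one shown above, while the result key used for the success case and the run_tool helper are assumptions.

```python
from typing import Callable


def run_tool(name: str, arguments: dict, tools: dict[str, Callable[[dict], object]]) -> dict:
    """Execute one tool and convert any failure into an error result."""
    if name not in tools:
        return {"tool_result": {"name": name, "error": f"Unknown tool: {name}"}}
    try:
        return {"tool_result": {"name": name, "result": tools[name](arguments)}}
    except Exception as exc:  # the model sees the message and decides what to do
        return {"tool_result": {"name": name, "error": str(exc)}}


def failing_search(arguments: dict) -> object:
    """Stand-in tool that always times out."""
    raise TimeoutError("Search timed out after 30 seconds")


print(run_tool("web_search", {"query": "France GDP"}, {"web_search": failing_search}))
# → {'tool_result': {'name': 'web_search', 'error': 'Search timed out after 30 seconds'}}
```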

Changelog

All notable changes to ARES are documented here. This project follows Semantic Versioning.


0.6.3

Multi-provider LLM, tenant agents, and enterprise metering.

This release transforms ARES from a single-provider system into a full multi-provider LLM platform with enterprise-grade tenant management.

Added

  • Multi-provider LLM routing — Support for 4 providers (Groq, Anthropic, NVIDIA DeepSeek, Ollama) and 11 models through a unified API.
  • Model tier system — fast, balanced, powerful, deepseek, and local tiers with automatic provider routing.
  • Tenant agent system — Agents stored in the database per tenant. Template-based provisioning with full CRUD via admin API.
  • Agent templates — Seed templates applied automatically on startup. New tenants receive a default agent set.
  • Usage metering — usage_events table, monthly_usage_cache, and daily_rate_limits for tracking tokens, requests, and costs per tenant.
  • API key authentication — Authorization: Bearer ares_xxx on /v1/* routes with tenant scoping.
  • Kasino enterprise agents — 4 specialized agent templates (kasino-classifier, kasino-risk, kasino-transaction, kasino-report) for the first enterprise client.
  • Kasino API routes — Both JWT-protected (/api/kasino/*) and API-key (/v1/kasino/*) endpoints.
  • Admin provisioning API — Atomic tenant creation: schema + agents + API key in a single operation.

Changed

  • Chat handler now resolves tenant_id from authentication context instead of hardcoded values.
  • Provider configuration moved from code to ares.toml for runtime flexibility.
  • Rate limit enforcement now operates at both the provider and tenant level.

Fixed

  • Chat handler tenant_id resolution for multi-tenant requests.

0.6.2

Streaming and SSE support.

Added

  • Server-Sent Events streaming — POST /v1/chat/stream endpoint for real-time, token-by-token responses.
  • Stream handler — Unified streaming across all providers with consistent SSE format.
  • Context continuation — context_id parameter for maintaining conversation history across requests.

Changed

  • Response format standardized to {"response", "agent", "context_id"} across all endpoints.

0.6.1

Tool calling and RAG foundations.

Added

  • Tool calling framework — Define tools per agent. ARES manages the tool-call loop, execution, and response assembly.
  • RAG pipeline — Retrieval-augmented generation with pluggable document stores.
  • Workflow engine — Chain multiple agents into multi-step workflows with deterministic execution.

Changed

  • Agent configuration schema extended to support tool definitions and RAG settings.

0.5.0

JWT authentication and user management.

Added

  • User registration and login — POST /api/auth/register, POST /api/auth/login.
  • JWT token lifecycle — 15-minute access tokens, refresh token rotation, logout/invalidation.
  • Role-based access — User roles with permission checks on protected routes.
  • Admin authentication — X-Admin-Secret header for internal administration endpoints.

Changed

  • All /api/* routes now require JWT authentication.
  • Error responses standardized with error and code fields.

0.4.0

PostgreSQL backend and multi-tenant schema.

Added

  • PostgreSQL integration — Full migration from in-memory storage to PostgreSQL with sqlx.
  • Auto-migration — sqlx::migrate!() runs on startup. No manual SQL required.
  • Tenant schema — tenants, tenant_agents, and api_keys tables with foreign key relationships.
  • Tenant tiers — Free, Dev, Pro, and Enterprise tiers with configurable limits.

Changed

  • All state persistence moved from in-memory structures to PostgreSQL.
  • Connection pooling via sqlx::PgPool with configurable pool size.

For the complete commit history, see the ARES repository on GitHub.