Guide: Tool Calling

ARES supports tool calling (also known as function calling), allowing agents to use external tools during a conversation. When an agent needs to perform a calculation, search the web, or interact with an external system, it requests a tool call. ARES executes the tool and feeds the result back to the agent, which then incorporates it into its response.

How It Works

Tool calling in ARES follows a multi-turn loop managed by the ToolCoordinator:

User message
    |
    v
Agent (LLM) generates response
    |
    ├── If response is final text → return to user
    |
    └── If response contains tool_calls →
            |
            v
        ARES executes each tool
            |
            v
        Results sent back to agent
            |
            v
        Agent generates next response (may call more tools or return final text)

This loop continues until the agent produces a final text response or the iteration limit (max_iterations, 10 by default) is reached. The entire process is transparent to the caller: you send a chat message and receive a complete response.
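The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the ToolCoordinator's actual implementation: `fake_model`, `TOOLS`, `run_tool_loop`, and the message shapes are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., object]] = {
    "add": lambda a, b: a + b,
}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM: requests one tool call, then answers in text."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool_calls": [{"name": "add", "arguments": {"a": 2, "b": 3}}]}
    return {"content": f"The answer is {tool_results[-1]['content']}."}

def run_tool_loop(model, messages: list[dict], max_iterations: int = 10) -> str:
    """Repeat model -> tools -> model until final text or the limit is hit."""
    for _ in range(max_iterations):
        response = model(messages)
        calls = response.get("tool_calls")
        if not calls:
            return response["content"]  # final text: return to the caller
        for call in calls:
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    # Limit reached: ask for a final answer from what has been gathered so far.
    messages.append({"role": "system", "content": "Answer now without calling tools."})
    return model(messages).get("content", "")

answer = run_tool_loop(fake_model, [{"role": "user", "content": "What is 2 + 3?"}])
print(answer)
```

The caller only ever sees the return value of `run_tool_loop`, which is what makes the intermediate tool rounds invisible from the outside.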

Built-in Tools

ARES ships with two built-in tools:

calculator

Evaluates mathematical expressions and returns the result.

Capabilities:

Basic arithmetic: +, -, *, /
Exponents: ^ or **
Parentheses for grouping
Common functions: sqrt, sin, cos, log, ln, abs
Constants: pi, e

Example tool call from agent:

{
  "name": "calculator",
  "arguments": {
    "expression": "50000 * (1.15 ^ 10)"
  }
}

Result returned to agent:

{
  "result": 202277.89
}
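To make the listed capabilities concrete, here is one way such an evaluator could be written in Python. It is a sketch under assumptions, not the ARES calculator itself: the function names are invented for the example, `log` is assumed to mean base 10 (since `ln` is listed separately), and `^` is rewritten to `**` because Python treats `^` as bitwise XOR.

```python
import ast
import math
import operator

# Assumed mapping of the documented operators, functions, and constants.
BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
FUNCTIONS = {
    "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
    "log": math.log10,  # assumption: log is base 10, ln is natural log
    "ln": math.log, "abs": abs,
}
CONSTANTS = {"pi": math.pi, "e": math.e}

def evaluate(expression: str) -> float:
    """Evaluate an arithmetic expression without calling eval()."""
    # Python reads ^ as XOR, so treat it as exponentiation like **.
    tree = ast.parse(expression.replace("^", "**"), mode="eval")
    return _eval_node(tree.body)

def _eval_node(node: ast.AST) -> float:
    """Walk the syntax tree, allowing only the whitelisted constructs."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval_node(node.operand)
    if isinstance(node, ast.Name) and node.id in CONSTANTS:
        return CONSTANTS[node.id]
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS and not node.keywords):
        return FUNCTIONS[node.func.id](*[_eval_node(arg) for arg in node.args])
    raise ValueError("unsupported expression")

print(round(evaluate("50000 * (1.15 ^ 10)"), 2))
```

Walking the syntax tree with a whitelist, rather than passing model-generated text to `eval`, keeps arbitrary code out of the evaluation path.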

web_search

Searches the web and returns relevant results.

Example tool call from agent:

{
  "name": "web_search",
  "arguments": {
    "query": "current US federal interest rate 2026"
  }
}

Result returned to agent:

{
  "results": [
    {
      "title": "Federal Reserve holds rate at 4.25%",
      "url": "https://...",
      "snippet": "The Federal Reserve maintained its benchmark rate..."
    }
  ]
}

Configuring Tool Access

Per-Agent Tool Filtering

Each agent specifies which tools it can use. An agent without tools configured cannot make tool calls, even if the underlying model supports them.

In ares.toml:

[[agents]]
name = "research-assistant"
model = "llama-3.3-70b"
system_prompt = "You are a research assistant with access to web search and calculation tools."
tools = ["calculator", "web_search"]

[[agents]]
name = "math-tutor"
model = "llama-3.3-70b"
system_prompt = "You are a math tutor. Use the calculator to verify your work."
tools = ["calculator"]

[[agents]]
name = "simple-chat"
model = "llama-3.3-70b"
system_prompt = "You are a conversational assistant."
tools = []

Via the API:

curl -X POST http://localhost:3000/api/admin/tenants/{id}/agents \
  -H "X-Admin-Secret: your-admin-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analyst",
    "agent_type": "analyst",
    "config": {
      "model": "llama-3.3-70b",
      "system_prompt": "You are a data analyst.",
      "tools": ["calculator", "web_search"],
      "max_tokens": 4096
    }
  }'
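Conceptually, per-agent filtering means that only the tools named in an agent's configuration are ever offered to the model or executed on its behalf. The following Python sketch illustrates that idea; `TOOL_REGISTRY`, `AGENTS`, `tools_for_agent`, and `is_call_allowed` are hypothetical names, and the real enforcement logic in ARES may differ.

```python
# Hypothetical registry of tool definitions keyed by tool name.
TOOL_REGISTRY = {
    "calculator": {"name": "calculator", "description": "Evaluate a math expression"},
    "web_search": {"name": "web_search", "description": "Search the web"},
}

# Agent configs mirroring the ares.toml example above.
AGENTS = {
    "research-assistant": {"tools": ["calculator", "web_search"]},
    "math-tutor": {"tools": ["calculator"]},
    "simple-chat": {"tools": []},
}

def tools_for_agent(agent_name: str) -> list[dict]:
    """Return only the tool definitions this agent is allowed to use."""
    allowed = AGENTS[agent_name].get("tools", [])
    unknown = [name for name in allowed if name not in TOOL_REGISTRY]
    if unknown:
        raise ValueError(f"unknown tools for {agent_name}: {unknown}")
    return [TOOL_REGISTRY[name] for name in allowed]

def is_call_allowed(agent_name: str, tool_name: str) -> bool:
    """Reject tool calls for tools the agent was not configured with."""
    return tool_name in AGENTS[agent_name].get("tools", [])

print([t["name"] for t in tools_for_agent("math-tutor")])
print(is_call_allowed("simple-chat", "calculator"))
```

Checking both when building the request and when a tool call comes back guards against a model that asks for a tool it was never offered.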

ToolCoordinator

The ToolCoordinator is the internal component that manages the tool calling loop. It handles:

Multi-turn orchestration — Sending tool results back to the model and processing follow-up tool calls
Parallel execution — When the model requests multiple tools in a single turn, they execute concurrently
Timeout enforcement — Individual tool calls are bounded by a configurable timeout
Iteration limits — Prevents infinite tool-calling loops

Configuration

Tool calling behavior is configured at the server level:

Setting	Default	Description
`max_iterations`	`10`	Maximum tool-calling rounds before forcing a text response
`parallel_execution`	`true`	Execute multiple tool calls concurrently within a single turn
`tool_timeout`	`30s`	Maximum time for a single tool execution

If an agent hits the iteration limit, ARES instructs the model to produce a final response using the information gathered so far.
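As an illustration of timeout enforcement, the sketch below bounds a single tool call with `asyncio.wait_for` from the Python standard library. The tool, the helper name `call_with_timeout`, and the shape of the returned dictionary are assumptions made for the example; they are not taken from ARES.

```python
import asyncio

TOOL_TIMEOUT_SECONDS = 30.0  # mirrors the documented tool_timeout default

async def slow_search(query: str) -> dict:
    """Hypothetical tool that takes longer than the caller allows."""
    await asyncio.sleep(1.0)
    return {"results": [query]}

async def call_with_timeout(name, tool, arguments, timeout=TOOL_TIMEOUT_SECONDS):
    """Run one tool call, converting a timeout into an error result."""
    try:
        result = await asyncio.wait_for(tool(**arguments), timeout)
        return {"name": name, "result": result}
    except asyncio.TimeoutError:
        return {"name": name, "error": f"{name} timed out after {timeout} seconds"}

# A very short timeout is used here so the example finishes quickly.
outcome = asyncio.run(
    call_with_timeout("web_search", slow_search, {"query": "rates"}, timeout=0.05)
)
print(outcome)
```

Returning an error value instead of raising lets the surrounding loop continue and hand the failure to the model as ordinary tool output.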

Provider Compatibility

Tool calling requires model support. Not all providers and models support function calling:

Provider	Models	Tool Calling
Groq	llama-3.3-70b, llama-3.1-8b	Supported
Anthropic	claude-3.5-sonnet	Supported
NVIDIA	deepseek-r1	Not supported
Ollama	Varies by model	Model-dependent

If you assign tools to an agent using a model that does not support tool calling, the tools will be ignored and the agent will respond with text only.

Example: Conversation with Tool Calls

Here is what happens internally when a user asks a question that requires tool use.

User sends:

curl -X POST http://localhost:3000/v1/chat \
  -H "Authorization: Bearer ares_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the monthly payment on a $400,000 mortgage at 6.5% for 30 years?"}
    ],
    "agent_type": "financial-analyst"
  }'

Internal flow:

ARES sends the message to the LLM with the calculator tool definition

The LLM responds with a tool call:

{
  "tool_calls": [{
    "name": "calculator",
    "arguments": {"expression": "(400000 * (0.065/12) * (1 + 0.065/12)^360) / ((1 + 0.065/12)^360 - 1)"}
  }]
}

ARES executes the calculator and gets 2528.27
ARES sends the result back to the LLM
The LLM produces a final text response incorporating the calculated value

User receives:

{
  "content": "The monthly payment on a $400,000 mortgage at 6.5% APR over 30 years would be **$2,528.27**.\n\nThis is calculated using the standard amortization formula...",
  "model": "llama-3.3-70b",
  "tokens_used": 412
}

The tool-calling steps are invisible to the caller. You send a question and receive a complete answer.
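The expression the model sent to the calculator is the standard fixed-rate amortization formula. As a cross-check of the 2528.27 figure, the same arithmetic can be written directly in Python; the function name `monthly_payment` is only illustrative.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Standard fixed-rate amortization: P * r * (1+r)^n / ((1+r)^n - 1)."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # number of monthly payments
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)

payment = monthly_payment(400_000, 0.065, 30)
print(f"{payment:.2f}")  # 2528.27
```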

Example: Multiple Tool Calls in One Turn

Models can request several tool calls in a single turn. For example, a research agent asked to "Compare the population of Tokyo and New York" might request two web searches in parallel:

{
  "tool_calls": [
    {"name": "web_search", "arguments": {"query": "Tokyo population 2026"}},
    {"name": "web_search", "arguments": {"query": "New York population 2026"}}
  ]
}

With parallel_execution enabled (the default), both searches execute concurrently. The results are sent back to the model together, and it produces a response comparing both cities.
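The difference between concurrent and sequential execution can be sketched with `asyncio.gather`. The `web_search` coroutine here is a hypothetical stand-in that only sleeps to simulate network latency, and `execute_tool_calls` is an invented helper, not an ARES function.

```python
import asyncio
import time

async def web_search(query: str) -> dict:
    """Hypothetical search tool; the sleep stands in for network latency."""
    await asyncio.sleep(0.2)
    return {"query": query, "results": []}

async def execute_tool_calls(tool_calls: list[dict], parallel: bool = True) -> list[dict]:
    """Run every tool call from one turn, concurrently or one by one."""
    pending = [web_search(**call["arguments"]) for call in tool_calls]
    if parallel:
        # gather() returns results in the same order as the requests.
        return list(await asyncio.gather(*pending))
    return [await coro for coro in pending]

tool_calls = [
    {"name": "web_search", "arguments": {"query": "Tokyo population 2026"}},
    {"name": "web_search", "arguments": {"query": "New York population 2026"}},
]

start = time.perf_counter()
results = asyncio.run(execute_tool_calls(tool_calls))
elapsed = time.perf_counter() - start
print([r["query"] for r in results], f"{elapsed:.2f}s")
```

With two simulated 0.2-second searches, the concurrent path takes roughly as long as one search, while passing `parallel=False` takes roughly as long as both combined.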

Example: Multi-Turn Tool Usage

Some questions require multiple rounds of tool use. For example:

User: "What is 15% of the GDP of France?"

Turn 1 — Agent calls web_search:

{"name": "web_search", "arguments": {"query": "France GDP 2026 USD"}}

Result: France's GDP is approximately $3.1 trillion.

Turn 2 — Agent calls calculator:

{"name": "calculator", "arguments": {"expression": "3100000000000 * 0.15"}}

Result: 465,000,000,000

Turn 3 — Agent produces final response: "15% of France's GDP (approximately $3.1 trillion) is $465 billion."

Each round counts toward the max_iterations limit.

Error Handling

If a tool call fails (timeout, invalid input, etc.), ARES returns an error result to the model:

{
  "tool_result": {
    "name": "web_search",
    "error": "Search timed out after 30 seconds"
  }
}

The model can then decide to:

Retry the tool call with different parameters
Use a different tool
Respond with what it knows, noting the tool failure

Well-designed system prompts should instruct the agent on how to handle tool failures gracefully.
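One way to picture this behavior is a wrapper that never lets a tool exception escape and instead reports it in the same shape as the error result shown above. The sketch below is illustrative only: `execute_tool` and the toy `calculator` are hypothetical, and the `result` key used for the success case is an assumption, since only the error shape is documented here.

```python
def calculator(expression: str) -> float:
    """Hypothetical tool that rejects input it cannot handle."""
    if not expression.strip():
        raise ValueError("expression is empty")
    return float(expression)

def execute_tool(name: str, tool, arguments: dict) -> dict:
    """Run a tool and always return a result the model can read."""
    try:
        return {"tool_result": {"name": name, "result": tool(**arguments)}}
    except Exception as exc:  # report the failure instead of aborting the loop
        return {"tool_result": {"name": name, "error": str(exc)}}

ok = execute_tool("calculator", calculator, {"expression": "42"})
failed = execute_tool("calculator", calculator, {"expression": " "})
print(ok)
print(failed)
```

Because the failure arrives as ordinary tool output, the model can choose among the options listed above rather than the whole request failing.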
