01 — Function-Calling Formats: A Convergence

OpenAI, Anthropic, and "raw JSON-Schema" share the same skeleton: declare the tools, receive a call request, execute it, return the result. The differences are cosmetic. Once you know the pattern, reading any new SDK takes five minutes.

This page surveys the major function-calling formats so Borja can read any provider's docs without translation overhead. The thesis: under the cosmetic differences, they are the same protocol.

Every tool-calling system in 2026 has these four elements:

Tool declarations. A list of {name, description, input_schema} triples sent to the model with the prompt.
A tool-call message. When the model decides to call a tool, it emits a structured response containing {tool_name, arguments} (or the equivalent fields under different names).
A tool-result message. The host executes the tool and feeds the result back to the model — usually as a special role-tagged message.
A continuation step. The model, now seeing the tool result, produces its next response (which may be another tool call or a final answer).

The differences between providers are confined to field names, argument encoding (string versus object), and message-role conventions.
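The four elements above can be sketched as a provider-neutral host loop. This is a minimal illustration, not any provider's SDK: `fake_model`, `DECLARATIONS`, `IMPLEMENTATIONS`, `run_turn`, and the `tool_result` role name are all hypothetical stand-ins, and the tool returns a canned answer.

```python
from typing import Any, Callable

# Step 1: tool declarations (hypothetical schema, trimmed for brevity).
DECLARATIONS = [{
    "name": "conjugate",
    "description": "Return the conjugated form of an English verb.",
    "input_schema": {"type": "object"},
}]

# Hypothetical implementations, keyed by tool name.
IMPLEMENTATIONS: dict[str, Callable[..., Any]] = {
    "conjugate": lambda verb, tense, person: "ate",  # canned answer
}

def fake_model(declarations: list[dict], messages: list[dict]) -> dict:
    """Stand-in for a provider call: requests one tool call, then answers."""
    last = messages[-1]
    if last["role"] == "tool_result":
        return {"kind": "final", "text": f"The form is '{last['content']}'."}
    return {
        "kind": "tool_call",
        "tool_name": "conjugate",
        "arguments": {"verb": "eat", "tense": "past_simple", "person": "3sg"},
    }

def run_turn(user_text: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        reply = fake_model(DECLARATIONS, messages)   # step 2, or step 4 on later passes
        if reply["kind"] == "final":
            return reply["text"]
        fn = IMPLEMENTATIONS[reply["tool_name"]]
        result = fn(**reply["arguments"])            # the host executes the tool
        messages.append({"role": "tool_result", "content": str(result)})  # step 3
    raise RuntimeError("tool loop did not terminate")
```

Swapping `fake_model` for a real provider call changes the field names inside the loop, but not its shape.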

OpenAI format

// In the request:
{
  "model": "gpt-4o",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "conjugate",
        "description": "Return the conjugated form of an English verb.",
        "parameters": { "type": "object", "properties": {...}, "required": [...] }
      }
    }
  ]
}

// Model's response when calling a tool:
{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": { "name": "conjugate", "arguments": "{\"verb\":\"eat\",...}" }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

// Follow-up message from host:
{
  "role": "tool",
  "tool_call_id": "call_abc",
  "content": "ate"
}

Note: function.arguments is a string (JSON-encoded), not an object. This is a wart — the model emits JSON, then the wire format wraps it in a string. Phase 31 unwraps and validates.
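A minimal sketch of the unwrapping step, assuming the response shape shown above. `parse_openai_tool_call` is a hypothetical helper, not Phase 31's actual code, and it stops at JSON decoding; validation against the tool's schema would follow.

```python
import json

def parse_openai_tool_call(tool_call: dict) -> tuple[str, str, dict]:
    """Return (call id, tool name, decoded arguments) for one tool_calls entry."""
    fn = tool_call["function"]
    try:
        args = json.loads(fn["arguments"])  # arguments arrives as a JSON string
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments is not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    return tool_call["id"], fn["name"], args

# Example entry; the argument values are illustrative.
example_call = {
    "id": "call_abc",
    "type": "function",
    "function": {
        "name": "conjugate",
        "arguments": '{"verb": "eat", "tense": "past_simple", "person": "3sg"}',
    },
}
```

The returned call id is what the host echoes back as `tool_call_id` in the follow-up `role: "tool"` message.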

Anthropic format

// In the request:
{
  "model": "claude-opus-4-7",
  "messages": [...],
  "tools": [
    {
      "name": "conjugate",
      "description": "Return the conjugated form of an English verb.",
      "input_schema": { "type": "object", "properties": {...}, "required": [...] }
    }
  ]
}

// Model's response when calling a tool:
{
  "stop_reason": "tool_use",
  "content": [
    { "type": "text", "text": "Let me check that conjugation." },
    {
      "type": "tool_use",
      "id": "toolu_abc",
      "name": "conjugate",
      "input": { "verb": "eat", "tense": "past_simple", "person": "3sg" }
    }
  ]
}

// Follow-up message from host:
{
  "role": "user",
  "content": [{
    "type": "tool_result",
    "tool_use_id": "toolu_abc",
    "content": "ate"
  }]
}

Note: input is an object, not a string. Cleaner than OpenAI's. Also note tool results come back under role: "user" with a tool_result content block — Anthropic models the tool result as "more input from the environment".
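For comparison, a sketch of the same two host-side steps against the Anthropic shapes shown above: pulling the `tool_use` blocks out of a mixed content list and building the follow-up message. Both helper names are hypothetical, and the code works on plain dicts rather than any SDK object.

```python
def extract_tool_uses(response: dict) -> list[dict]:
    """Return the tool_use blocks, skipping text and other block types."""
    return [b for b in response.get("content", []) if b.get("type") == "tool_use"]

def tool_result_message(tool_use_id: str, text: str) -> dict:
    """Build the follow-up message: a tool_result block under role 'user'."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": text,
        }],
    }

# The response from the example above, as a plain dict.
example_response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check that conjugation."},
        {"type": "tool_use", "id": "toolu_abc", "name": "conjugate",
         "input": {"verb": "eat", "tense": "past_simple", "person": "3sg"}},
    ],
}
```

No `json.loads` is needed here, because `input` already arrives as an object.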

Raw JSON-Schema (the lowest common denominator)

Strip the provider conventions:

{
  "name": "conjugate",
  "description": "...",
  "input_schema": { JSON-Schema for the arguments }
}

This is the shape MCP uses, although MCP spells the schema field inputSchema. Provider-specific message-role conventions are abstracted away. Phase 31's Tool dataclass adopts this:

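from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    input_schema: dict   # JSON-Schema for the arguments
    fn: Callable[..., Any]   # the Python callable that implements the tool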

Wrapping for OpenAI, Anthropic, or MCP is then a thin adapter — wrap each Tool in the provider's preferred envelope.
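To make "thin adapter" concrete, here is a sketch of what such wrappers might look like. The three function names and the example schema (including its enum values) are assumptions for illustration; the envelopes follow the request shapes shown earlier on this page, plus MCP's camelCase `inputSchema` field.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    input_schema: dict
    fn: Callable[..., Any]

def to_openai(tool: Tool) -> dict:
    """OpenAI nests the declaration under 'function' and calls the schema 'parameters'."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.input_schema,
        },
    }

def to_anthropic(tool: Tool) -> dict:
    """Anthropic uses the flat triple directly."""
    return {"name": tool.name, "description": tool.description,
            "input_schema": tool.input_schema}

def to_mcp(tool: Tool) -> dict:
    """MCP uses the same triple with a camelCase schema key."""
    return {"name": tool.name, "description": tool.description,
            "inputSchema": tool.input_schema}

# Hypothetical example tool; the enum values are illustrative only.
conjugate = Tool(
    name="conjugate",
    description="Return the conjugated form of an English verb.",
    input_schema={
        "type": "object",
        "properties": {"verb": {"type": "string", "enum": ["eat", "go"]}},
        "required": ["verb"],
    },
    fn=lambda verb: {"eat": "ate", "go": "went"}[verb],
)
```

None of the envelopes carries `fn`: the callable stays on the host, and only the declaration goes over the wire.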

The argument-format question

Across providers, the model's job is to emit JSON matching the input_schema. The argument JSON is generated by the LM the same way as any other output. This is where Phase 30's JSONSchemaMask saves the day.

Without masking:

- The model sometimes emits {verb: eat, ...} (missing quotes around the key, so invalid JSON).
- The model sometimes emits {"verb": "ate", ...} (a conjugated form, violating the enum).
- The model sometimes emits {"verb": "eat", "tense": "past", "person": "3sg"} ("past" is outside the enum).

With masking:

- The mask forbids unquoted keys → invalid JSON impossible.
- The mask forbids string-values outside the verb enum → "ate" impossible at that position.
- The mask forbids string-values outside the tense enum → "past" impossible.

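With the mask in place, the whole class of "tool-call corruption" failures (malformed JSON, values outside an enum) is eliminated by construction; this is the central operational claim of structured-decoding-plus-tool-use. What the mask cannot prevent is a schema-valid call with the wrong arguments.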
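The three unmasked failures can be reproduced with a small after-the-fact checker. This is not Phase 30's JSONSchemaMask, which constrains decoding token by token; it is a hypothetical post-hoc validator that only shows what the mask rules out. The enum values in `ENUMS` are assumptions.

```python
import json
from typing import Optional

# Hypothetical enums for the conjugate tool's three arguments.
ENUMS = {
    "verb": ("eat", "go", "be"),
    "tense": ("past_simple", "present_simple"),
    "person": ("1sg", "3sg"),
}

def check_arguments(raw: str) -> Optional[str]:
    """Return None if raw is acceptable, else a short description of the failure."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return "invalid JSON"
    if not isinstance(args, dict):
        return "not a JSON object"
    for key, allowed in ENUMS.items():
        if args.get(key) not in allowed:
            return f"{key}={args.get(key)!r} is outside the enum"
    return None

failures = [
    '{verb: eat, "tense": "past_simple", "person": "3sg"}',    # unquoted key
    '{"verb": "ate", "tense": "past_simple", "person": "3sg"}',  # conjugated form
    '{"verb": "eat", "tense": "past", "person": "3sg"}',         # tense not in enum
]
```

A checker like this can only reject a bad call after the tokens have been spent; masking prevents the bad tokens from being sampled at all.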

The tool-result format

Less convergent than the call format. OpenAI uses role: "tool". Anthropic uses a tool_result content block under role: "user". MCP uses a separate JSON-RPC response message.

For Phase 31 we adopt MCP-style: the tool result is a JSON object returned by tools/call:

{
  "jsonrpc": "2.0",
  "id": "request-1",
  "result": {
    "content": [{ "type": "text", "text": "ate" }],
    "isError": false
  }
}

content is an array because some tools return multimodal results (text + images). For Phase 31 every tool returns text.
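A sketch of building and reading this envelope for a text-only tool. `make_result` and `result_text` are hypothetical helper names; the dict layout follows the JSON above.

```python
def make_result(request_id: str, text: str, is_error: bool = False) -> dict:
    """Wrap a tool's text output in a JSON-RPC response with an MCP-style result."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": text}],
            "isError": is_error,
        },
    }

def result_text(response: dict) -> str:
    """Concatenate the text blocks of a result, ignoring any non-text blocks."""
    blocks = response["result"]["content"]
    return "".join(b["text"] for b in blocks if b.get("type") == "text")
```

Keeping `content` as a list even for single-string results means the reader side needs no change if a later tool returns more than one block.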

Errors

Errors are first-class. A tool that fails returns:

{
  "jsonrpc": "2.0",
  "id": "request-1",
  "result": {
    "content": [{ "type": "text", "text": "verb 'run' is out of scope" }],
    "isError": true
  }
}

Crucially, isError: true does not translate to JSON-RPC's error field. The protocol distinguishes:

Protocol errors (JSON-RPC error): malformed message, unknown method, schema-invalid args caught at the server boundary. These prevent the tool from running.
Tool errors (result.isError: true): the tool ran but returned a logical failure. The agent receives this as a normal result and can reason about it.

This distinction matters in Phase 32: the agent can recover from tool errors (re-plan, try a different tool) but cannot recover from protocol errors, because the request never reached a tool and there is no result to reason about.
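On the client side, the two error kinds can be told apart by which top-level key the response carries. The sketch below assumes plain-dict responses; `classify_response` and its three labels are hypothetical, and the example error code is JSON-RPC 2.0's standard "Method not found".

```python
def classify_response(response: dict) -> tuple[str, str]:
    """Return (kind, message), where kind is 'protocol_error', 'tool_error', or 'ok'."""
    if "error" in response:
        # JSON-RPC error object: the tool never ran.
        return "protocol_error", response["error"].get("message", "")
    result = response["result"]
    text = "".join(b["text"] for b in result["content"] if b.get("type") == "text")
    # The tool ran; isError says whether it reported a logical failure.
    return ("tool_error" if result.get("isError") else "ok"), text

protocol_failure = {
    "jsonrpc": "2.0",
    "id": "request-1",
    "error": {"code": -32601, "message": "Method not found"},
}
tool_failure = {
    "jsonrpc": "2.0",
    "id": "request-1",
    "result": {
        "content": [{"type": "text", "text": "verb 'run' is out of scope"}],
        "isError": True,
    },
}
```

An agent loop would hand the `tool_error` text back to the model as an ordinary observation and surface `protocol_error` to the caller instead.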

Number-of-tool-calls-per-turn

OpenAI and Anthropic allow multiple tool calls in a single model response. Phase 31's mini server supports one call per tools/call request; multi-call requests are decomposed into sequential single calls by the client. This is a simplification for pedagogical clarity, not a protocol limit.
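The client-side decomposition might look like the following sketch. `call_sequentially` and the `send` transport callable are hypothetical, as is the request-id scheme; the `tools/call` request carries the tool name and arguments under `params`.

```python
from typing import Callable

def call_sequentially(tool_calls: list[dict],
                      send: Callable[[dict], dict]) -> list[tuple[str, dict]]:
    """Issue one tools/call request per tool call, in order.

    Each entry in tool_calls is assumed to be {"id", "name", "arguments"}.
    Returns (tool-call id, response) pairs so results can be matched back.
    """
    results = []
    for i, call in enumerate(tool_calls, start=1):
        request = {
            "jsonrpc": "2.0",
            "id": f"request-{i}",
            "method": "tools/call",
            "params": {"name": call["name"], "arguments": call["arguments"]},
        }
        results.append((call["id"], send(request)))  # blocks until this call returns
    return results

def echo_send(request: dict) -> dict:
    """Fake transport for demonstration: echoes the requested verb back as text."""
    verb = request["params"]["arguments"]["verb"]
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"content": [{"type": "text", "text": verb}], "isError": False}}
```

Pairing each response with the original tool-call id keeps the follow-up messages matchable even though the calls were issued one at a time.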

What this means for Phase 31's design

src/miniagent/tools/base.py defines a single Tool dataclass with the raw JSON-Schema shape. mcp_server.py exposes it under MCP conventions. If a future phase wants OpenAI- or Anthropic-format adapters, they are 20-line wrappers around the same Tool registry. We do not write those adapters now.

What this page does NOT cover

Streaming tool calls. Some providers stream arguments token-by-token. We do not.
Parallel tool calls. Some providers support firing multiple tools concurrently. We do not.
Tool-result caching. Phase 33 may add this.
Multi-modal tool results. Our tools all return text.

Next: theory/02-mcp-architecture.md — the actual MCP protocol structure.