Writing Effective MCP Tool Definitions

Executive summary

Well-designed tools are the single highest-leverage factor in building reliable agents.

  • Tools must be ergonomic for agents—intuitive, context-efficient, and resilient to error.

  • Poor tool design leads directly to low accuracy, hallucination, latency, and brittle workflows.

As of Feb 2026, context budget is the most valuable resource and a key limitation, so it must be managed thoughtfully. Tools with good ergonomics for agents are:

  • Intuitive – obvious when and why to use them

  • Context‑efficient – minimal tokens, high signal

  • Outcome‑oriented – reflect user goals, not raw APIs

  • Fool‑proofed – resilient to partial, incorrect, or ambiguous calls

Tools should not simply wrap existing API endpoints. They should be designed interfaces for reasoning systems.


Tool Definition Standards

The quality of the tool definition is the single highest-leverage factor in agent reliability. Treat tool definitions as strictly typed contracts.

Naming & Descriptions

  • Use active verbs

  • Apply domain namespacing (github_get_user)

  • Explicitly state negative capabilities to prevent hallucinated capabilities, e.g. "Searches internal documentation only. Does not access the public internet."

Schema & Payloads

  • Enforce strict typing: Use enum for parameters with a finite set of valid values (e.g., status: ["open", "closed"]) instead of string. This constrains the model's output generation to valid inputs only.   

  • Keep payloads small: Large return payloads increase latency and waste context, because every returned token has to be processed by the model again on the next turn (the "re-tokenization penalty"). Tools must not return massive JSON blobs. Implement default truncation (e.g., top 50 rows) and require pagination.


Tool Description "Prompt Engineering"

Tool descriptions act as decision-making prompts, which makes them one of the most important factors in tool performance. Use:

  1. Detailed descriptions: At least 3-4 sentences per tool. Explicitly define specialized query formats, niche terminology, and relationships between resources. Descriptions should explain in detail

    1. what the tool does (and does not do to reduce hallucinations)

    2. when it should (and shouldn't) be used,

    3. what each parameter means and how it affects the tool's behavior, and

    4. any important caveats or limitations, such as what information the tool does not return if the tool name is unclear.

  2. Unambiguous Naming: Use precise names for input parameters. For example, use user_id instead of just user to ensure the agent understands exactly what data is required. Clear naming reduces invalid tool calls.

  3. Active Verbs: Use strong, distinct verbs in tool names (e.g., fetch_, update_, archive_) and open each description with the same verb, rather than vague terms like handle_ or manage_.

  4. Explicit Constraints: Clearly describe and enforce data models for inputs and outputs to reduce hallucinations and improper tool calls, in particular allowed values, required formats, field relationships. Strict schemas dramatically reduce hallucinated inputs.

  5. Input Examples (if available): Clear descriptions are most important, but for tools with complex inputs, nested objects, or format-sensitive parameters, provide schema‑validated examples using input_examples (if supported).

  6. Proximity to Action: Instructions that apply to only one tool must live in that tool's description (and docstring), not in the system prompt. This ensures they are injected into the context window only when the model is considering that tool.

The following JSON schema definition for an MCP tool illustrates all six principles above. The tool searches SAP CRM for business accounts — a common scenario where multiple search tools coexist (e.g. for contacts, opportunities, employees).

json
{
  "name": "sap_crm_search_accounts",
  "description": "Search SAP CRM for business account records — companies, clients, and prospects. Returns one or more accounts matching the query, each with: CRM account ID, legal name, industry classification, lifecycle status (active / prospect / inactive), assigned relationship manager email, and last activity date. Scope is limited to organisational accounts only; individual contacts, deals, and opportunities are separate record types not returned here. Use for requests like 'find Allianz in CRM', 'which accounts does Anna Müller own?', or 'show me all active insurance clients'. Does not return full interaction history or financial data — use the returned account_id for subsequent detail calls.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Free-text search matched against account name, alias, and ticker symbol. Supports partial matches — 'Zurich' matches 'Zurich Insurance Group AG'. Pass the user's company name here exactly as stated; do not reformat or abbreviate."
      },
      "account_status": {
        "type": "string",
        "enum": ["active", "prospect", "inactive", "all"],
        "default": "active",
        "description": "Lifecycle filter. Defaults to 'active'. Set to 'all' only when the user explicitly asks to include former or inactive accounts."
      }
    },
    "required": []
  }
}

How the six principles are applied:

  1. Detailed description: The top-level description field is five sentences long and covers what the tool does ("Search SAP CRM for business account records"), what it returns (account ID, legal name, industry, status, RM email, last activity), the scope boundary ("organisational accounts only"), what it does not return ("full interaction history or financial data"), and concrete example trigger phrases. This eliminates the need for the model to guess any of these things.

  2. Unambiguous naming: The status filter is named account_status, not status. This matters when the model is holding multiple tools in context: a generic status parameter is ambiguous; account_status is not.

  3. Active verb: The description opens with "Search SAP CRM for…", a strong, distinct verb that immediately signals the tool's action. Contrast this with a vague opener like "This tool handles CRM account management…", which forces the model to read further to understand intent.

  4. Explicit constraints: account_status uses an enum (active, prospect, inactive, all) rather than a free string. A value like "churned" or "archived" is rejected by schema validation, and with strict or constrained decoding the model cannot generate it at all.

  5. Input examples: Not included here. With only two simple parameters, the parameter descriptions are sufficient. Examples add value for tools with nested objects, compound query syntax, or format-sensitive fields (e.g. a date range parameter or a structured filter expression). Adding examples to simple tools wastes context budget.

  6. Proximity to action: The rule "Set to 'all' only when the user explicitly asks to include former or inactive accounts" lives inside the account_status parameter description, not in the system prompt. It enters the model's context only together with this tool's definition, right where the model decides how to fill that field.


Designing Tools for Agent Limitations

Agents are powerful—but constrained. Effective tool design starts by acknowledging these limitations.

A. Limited Context Window

  1. Navigation Speed: Agents cannot efficiently scan large payloads to find what they need. For example, a search_contacts(query="john") tool is better than a list_contacts tool that returns thousands of entries.

  2. High-impact Workflows: Consolidate common multi-step tasks into a single tool call. Combine several operations (or API calls) under the hood to enrich tool responses or to execute frequent multi-step tasks at once, e.g. a schedule_event tool that performs user lookup, availability checks, and event creation instead of separate list_users, list_events, and create_event calls. Fewer tool calls mean lower latency, fewer failures, and less context usage.
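
A minimal sketch of such a consolidated tool is shown here, assuming an in-memory user directory and calendar. The names USERS, BUSY, and Calendar are hypothetical stand-ins for real backends; only schedule_event comes from the text above.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for a user directory and a calendar backend.
USERS = {"anna": "u1", "ben": "u2"}
BUSY = {"u1": {"2026-03-02T10:00"}, "u2": set()}


@dataclass
class Calendar:
    events: list = field(default_factory=list)


def schedule_event(calendar, attendee_names, start, title):
    """Look up users, check availability, and create the event in one call."""
    # Step 1: user lookup (otherwise a separate list_users call).
    unknown = [name for name in attendee_names if name not in USERS]
    if unknown:
        return {"ok": False,
                "error": f"Unknown attendees: {', '.join(unknown)}. Check spelling and retry."}
    ids = [USERS[name] for name in attendee_names]
    # Step 2: availability check (otherwise a separate list_events call).
    busy = [name for name, uid in zip(attendee_names, ids) if start in BUSY[uid]]
    if busy:
        return {"ok": False,
                "error": f"{', '.join(busy)} busy at {start}. Pick another start time."}
    # Step 3: event creation (otherwise a separate create_event call).
    event = {"title": title, "start": start, "attendees": list(attendee_names)}
    calendar.events.append(event)
    return {"ok": True, "event": event}
```

The agent makes one call and receives either the created event or an error that says what to change, instead of coordinating three calls itself.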

B. Ease of Errors (Poka‑Yoke Design)

Poka-yoke (fool-proof) your tools. Agents might call the wrong tools, call the right tools with the wrong parameters, call too few tools (i.e. omit required steps), or misinterpret tool responses. Design tools to prevent mistakes by default.

To mitigate, use

  1. Disambiguation: Create a minimal set of tools

    1. with distinct purposes, clear boundaries and no overlapping responsibilities to prevent the agent from becoming confused about which tool to use for a specific task.

    2. that reflect natural subdivisions of tasks, which minimizes the information needed in the tool descriptions. Natural task subdivisions reduce description length and decision complexity.

  2. Namespacing: Namespace tools to prevent collisions and confusion, by service/domain (e.g., asana_search, jira_search) and by resource (e.g., asana_projects_search, asana_users_search). Prefix- vs suffix-based naming can produce meaningfully different model behavior. Validate using tool-use evaluations.

  3. Attention Dilution: Avoid overwhelming the model with irrelevant instructions.

    1. Fetch only what is needed

    2. Keep system prompts lean

    3. Keep tool‑specific instructions inside tool definitions

Respect the rule of Proximity to Action:

The more specific an instruction is, the closer it should live to the tool that uses it.


Tool Response Design

What tools return is as important as how they are called. Design tool responses to be meaningful for agents by using the following:

A. Semantic Identifiers

Prefer:

  1. Human‑readable names

  2. Natural language labels

  3. Short numeric IDs (0‑indexed)

Avoid:

  1. UUIDs

  2. Opaque hashes

Agents perform retrieval and referencing tasks far more accurately with semantic identifiers.
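
One way to apply this is to swap backend UUIDs for short 0-indexed IDs before a response reaches the agent and to keep a lookup table on the tool side. The sketch below assumes records shaped as dicts with "uuid" and "name" keys; both function names are hypothetical.

```python
def to_semantic_ids(records):
    """Replace opaque UUIDs with short 0-indexed IDs and keep a lookup table."""
    id_map = {}
    slim = []
    for index, record in enumerate(records):
        id_map[index] = record["uuid"]  # kept server-side, never sent to the agent
        slim.append({"id": index, "name": record["name"]})
    return slim, id_map


def resolve(short_id, id_map):
    """Translate a short ID chosen by the agent back to the backend UUID."""
    return id_map[short_id]
```

The agent only ever sees pairs such as id 0 with a readable name, and the tool maps the chosen ID back before calling the backend.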

B. High-signal Responses

Return only what the agent needs.

  • Exclude low‑level metadata (uuid, mime_type, internal flags)

  • Provide focused summaries

Support parameters such as:

  • response_format = "concise"

  • response_format = "detailed"

This allows the agent to conserve tokens for exploratory tasks and request depth only when required.
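
A small sketch of a response_format parameter follows, assuming a contact record stored as a dict. The function name format_contact and the set of hidden fields are illustrative assumptions.

```python
from enum import Enum


class ResponseFormat(str, Enum):
    CONCISE = "concise"
    DETAILED = "detailed"


# Low-level metadata that is never useful to the agent (assumed field names).
HIDDEN_FIELDS = {"uuid", "mime_type", "internal_flags"}


def format_contact(contact, response_format="concise"):
    """Return a contact in the requested level of detail."""
    fmt = ResponseFormat(response_format)  # raises ValueError for unknown values
    if fmt is ResponseFormat.CONCISE:
        return {"name": contact["name"], "email": contact["email"]}
    # Detailed: every field except low-level metadata.
    return {key: value for key, value in contact.items() if key not in HIDDEN_FIELDS}
```

Defaulting to "concise" keeps exploratory calls cheap; the agent opts into "detailed" only when it needs the extra fields.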

C. Helpful Error Messages

Instead of returning opaque tracebacks or error codes, prompt-engineer error responses to provide actionable instructions on how the agent can fix the input.

Errors should be instructional, not diagnostic.

Bad:

400 Bad Request

Good:

Invalid date format. Use YYYY-MM-DD and retry the tool call.

Error messages are part of the agent’s prompt.
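
The difference can be sketched with a hypothetical get_daily_report tool that validates its date parameter and, on failure, returns a message telling the agent how to correct the call. The tool name and response shape are assumptions for illustration.

```python
from datetime import datetime


def get_daily_report(report_date):
    """Hypothetical tool that expects report_date as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(report_date, "%Y-%m-%d").date()
    except ValueError:
        # Instructional error: state what was wrong and how to fix it.
        return {"error": f"Invalid date format: '{report_date}'. "
                         "Use YYYY-MM-DD and retry the tool call."}
    return {"report_date": parsed.isoformat(), "rows": []}
```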

D. Manage Context Limits

For potentially large responses that could use up a lot of context:

  • Implement pagination

  • Support filtering

  • Apply sensible defaults (e.g., top 20–50 rows)

  • Upload the full result as a file to the knowledge base (if possible) and return the file ID for the LLM to use. This can be useful for code execution use cases, for example.

If truncation occurs, return guidance steering the agent to use more targeted searches, e.g. "Results truncated. Use a narrower query or increase the limit parameter."
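
Default truncation, pagination, and the steering notice can be combined in one helper. This is a sketch under simple assumptions: results are an in-memory list, and the parameter names limit and offset are illustrative.

```python
def paginate(rows, limit=50, offset=0):
    """Return one page of rows plus guidance when more rows remain."""
    page = rows[offset:offset + limit]
    next_offset = offset + len(page)
    result = {"rows": page, "total": len(rows)}
    if next_offset < len(rows):
        # Tell the agent that data is missing and how to get it.
        result["next_offset"] = next_offset
        result["notice"] = (
            f"Results truncated: showing {len(page)} of {len(rows)}. "
            f"Use a narrower query or call again with offset={next_offset}."
        )
    return result
```

The notice appears only when rows were actually cut off, so complete responses carry no extra tokens.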


Instruction Distribution Patterns

Two failure modes need to be managed:

  • Attention Dilution: don't bury tool-specific instructions in the system prompt

  • Confusion: provide sufficient detail in the tool definition

Poor instruction placement causes:

  • Context dilution

  • Ignored rules

  • Tool confusion

Use progressive disclosure.

Key principles:

  • Type as Contract (source: Pydantic AI documentation, ai.pydantic.dev)

    • Use Python Docstrings for tool-specific logic and Pydantic Fields for data validation.

    • Putting instructions in the docstring ensures it is only injected into the context window when the model is considering that tool.

  • Progressive Disclosure (source: Claude 4.x Prompt Engineering Guide)

    • Avoid "The Kitchen Sink" system prompt.

    • LLMs perform better when they "discover" tools and instructions as needed.
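
The docstring idea can be sketched with the standard library alone. The helper below is not how Pydantic AI builds tool definitions; it is a simplified illustration that derives a definition from a function's docstring and type hints. The names tool_definition and orders_get_status are hypothetical.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_definition(func):
    """Build a tool definition from a function's docstring and type hints."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    parameters = inspect.signature(func).parameters
    properties = {name: {"type": JSON_TYPES.get(tp, "string")} for name, tp in hints.items()}
    # Parameters without a default value are required.
    required = [name for name, p in parameters.items() if p.default is inspect.Parameter.empty]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def orders_get_status(order_id: str, include_items: bool = False) -> dict:
    """Fetch the shipping status of one order.

    Use this tool only if the user has already provided an Order ID.
    """
    return {"order_id": order_id, "status": "shipped"}
```

Because the trigger rule lives in the docstring, it travels with the tool definition and stays out of the system prompt.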

Distribution Hierarchy

Architectural patterns for distributing instructions about tools

Instruction Layer             | Responsibility
System Prompt                 | Global identity, safety rules, persona, orchestration
Tool Definitions / Docstrings | When and why to use each tool
Schema / Types                | Hard technical constraints

System Prompt — “The Constitution”

Use only for rules that apply to every turn:

  • Agent identity

  • Global safety constraints

  • Language and tone

  • Cross‑tool workflows and orchestration for many (10+) tools, e.g. "If search_user returns multiple results, you MUST call verify_identity before proceeding with reset_password."

  • Formatting instructions

Keep the system prompt lean.

System Prompt Best Practices summarized from OpenAI System Message Best Practices:

In general, a system prompt contains the following sections, usually in this order (though the optimal content and order may vary by model):

  • Identity: Describe the purpose, communication style, and high-level goals of the assistant.

  • Instructions: Provide guidance to the model on how to generate the response you want. What rules should it follow? What should the model do, and what should the model never do? This section could contain many subsections as relevant for your use case, like how the model should call custom functions.

  • Examples: Provide examples of possible inputs, along with the desired output from the model.

  • Context: Give the model any additional information it might need to generate a response, like private/proprietary data outside its training data, or any other data you know will be particularly relevant. This content is usually best positioned near the end of your prompt, as you may include different context for different generation requests.

Tool Definitions — “The Manual”

The model uses the docstring to decide which tool to pick. Tool docstrings should focus on when and how to use the tool:

  • Outcome‑oriented: focus on what the user wants to achieve

  • Trigger‑focused: "Use this tool only if the user has already provided an Order ID."

  • Local in scope

Put tool-specific formatting rules here; global presentation rules belong in the system prompt. If the output is complex, include a "Response Guide" in the tool result itself, which the model then reads together with the data.

Example: If a tool returns a list of files, the tool should return: {"files": [...], "instruction": "Present these as a bulleted list sorted by size."}

Schema / Types — “The Guardrails”

Use strict typing:

  • Enum instead of string

  • Literal for fixed values

  • Length and range validation

Types act as executable contracts.
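
The three guardrails can be sketched with the standard library. Pydantic would express the same constraints more compactly, but is left out here to keep the example dependency-free. TicketQuery and its fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Literal, get_args


class TicketStatus(str, Enum):
    OPEN = "open"
    CLOSED = "closed"


Priority = Literal["low", "high"]


@dataclass
class TicketQuery:
    status: TicketStatus
    priority: Priority = "low"
    title_contains: str = ""
    limit: int = 20

    def __post_init__(self):
        # Enum instead of string: unknown values raise ValueError.
        self.status = TicketStatus(self.status)
        # Literal for fixed values (checked at runtime here).
        if self.priority not in get_args(Priority):
            raise ValueError(f"priority must be one of {get_args(Priority)}")
        # Length and range validation.
        if len(self.title_contains) > 100:
            raise ValueError("title_contains must be at most 100 characters")
        if not 1 <= self.limit <= 50:
            raise ValueError("limit must be between 1 and 50")
```

Invalid input fails at construction time, before any backend call is made.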


Tool Output Formatting Guidance

Instructions for how the model should process or present data from a tool should be split based on whether the data is for the model or the user.

  • For the model: inside tool output if the model must interpret complex output, e.g.

    json
    {
      "data": [...],
      "instruction": "Only inspect the 'status' field to confirm success."
    }

  • For the user: final presentation rules that apply globally (e.g. Markdown, tables, decimal precision) belong in the system prompt, e.g. "All financial outputs must be summarized as a Markdown table with two decimal places."


MCP (Model Context Protocol) Tool Considerations

Model Context Protocol (MCP) tools often come with generic descriptions from the server and may be opaque or poorly described. To make them "Deep Agent" ready, you may need to wrap them or supplement them.

  1. Instruction Overrides: If an MCP tool description is too vague, use a wrapper or a "Skill" file (in Pydantic AI) to provide better context. Alternatively, wrap generic tools with contextual guidance. E.g. "Use this tool to read files from the local filesystem only."

  2. Specific MCP Context: If an MCP tool (like a database or filesystem) requires external context (e.g., "The root directory is /app/data"), put this in the System Prompt as a "Knowledge Base" section.

  3. Error-as-Instruction: If an MCP tool fails, ensure the error message returned to the LLM contains instructions on how to fix it.

    • Bad Error: 400 Bad Request

    • Good Error: Invalid Date Format. Please use YYYY-MM-DD and try the tool call again.
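
An instruction override (item 1 above) can be as simple as patching the tool definition before it is handed to the agent. The sketch below treats a tool definition as a plain dict with name and description keys; override_description is a hypothetical helper, not part of any MCP SDK.

```python
import copy


def override_description(tool, extra_guidance, new_name=None):
    """Return a copy of a tool definition with a more specific description."""
    patched = copy.deepcopy(tool)  # leave the server's original untouched
    original = tool.get("description", "").strip()
    patched["description"] = f"{original} {extra_guidance}".strip()
    if new_name:
        patched["name"] = new_name
    return patched
```

For instance, a vague "read" tool could be exposed as fs_read_file with the guidance "Use this tool to read files from the local filesystem only." appended to its description.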


Architectural Patterns

A. Aggregator Gateway

Problem: Connecting a client directly to 10+ distinct MCP servers creates connection overhead and auth fragmentation. 

Solution: Deploy an MCP Gateway (Aggregator).

  • Function: Acts as a single endpoint for the agent. It manages persistent connections to downstream servers (GitHub, Postgres, S3).

  • Features: Handles authentication, namespacing (renaming tools on the fly), and allowing/denying specific tools.   
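
The namespacing and allow/deny features can be sketched as a pure function over tool lists. Connection handling and authentication are omitted; the input shape (server name mapped to a list of tool dicts) and the name aggregate_tools are assumptions.

```python
def aggregate_tools(servers, denied=frozenset()):
    """Merge tool lists from several servers into one namespaced catalogue."""
    catalogue = {}
    for server_name, tools in servers.items():
        for tool in tools:
            # Rename on the fly: prefix each tool with its server name.
            exposed_name = f"{server_name}_{tool['name']}"
            if exposed_name in denied:
                continue  # denied tools are never shown to the agent
            catalogue[exposed_name] = {
                "server": server_name,
                "original_name": tool["name"],
                "description": tool["description"],
            }
    return catalogue
```

Two servers that both expose a tool called search end up as github_search and jira_search, and the gateway can route a call back using the stored server and original_name.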

B. Hierarchical Routing (Router–Worker Pattern)

Problem: When an LLM is presented with too many tools (100+), tool selection degrades, much like the "Lost in the Middle" phenomenon observed in long contexts.

Solution: Hierarchical routing using a Router-Worker pattern.

  • Router Agent: performs intent classification. Has no execution tools. Has access to a classifier or "handoff" tools. It analyzes intent and delegates to a specialized sub-agent.

  • Worker Agents: hold specialized toolsets and execute the delegated task.

  • Semantic Router: For extreme scale, use vector embeddings to match user queries to the correct toolset (e.g., semantic-router), avoiding an LLM call entirely for the routing step.   
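
The routing step can be illustrated without an LLM call. The sketch below uses bag-of-words cosine similarity as a toy stand-in for real vector embeddings; it does not use the semantic-router library, and the toolset names and keyword descriptions are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toolsets, each described by a few representative words.
TOOLSETS = {
    "billing": "invoice payment refund charge subscription",
    "calendar": "meeting schedule event availability invite",
    "support": "ticket bug incident outage error",
}


def embed(text):
    """Toy embedding: word counts (a real system would use a vector model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(query):
    """Return the best-matching toolset name, or None if nothing matches."""
    scores = {name: cosine(embed(query), embed(desc)) for name, desc in TOOLSETS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A query with no overlap returns None, which a router could treat as a signal to fall back to LLM-based intent classification.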


Performance Optimization

Programmatic Tool Calling (Code Execution)

Best for: Complex data tasks, multi-step logic. Instead of the standard LLM -> Tool -> LLM -> Tool loop (which incurs network latency for every step), enable the agent to write a Python script that executes in a sandbox.

  • Mechanism: The agent imports MCP tools as Python functions: data = github.get_issues(); summary = process(data).

  • Impact: Can reduce token usage substantially for data-heavy tasks (reported figures range from 85% to 98%) and lowers latency by batching operations. Execution is also more deterministic.
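
The saving comes from keeping the raw payload inside the sandbox and returning only the result. In the sketch below, get_issues is a hypothetical stub standing in for an MCP tool exposed as a Python function, and summarize_issues is the kind of script an agent might write.

```python
from collections import Counter


def get_issues():
    """Hypothetical stand-in for an MCP tool exposed as a Python function."""
    return [
        {"id": i, "state": "open" if i % 3 else "closed", "body": "x" * 500}
        for i in range(300)
    ]


def summarize_issues():
    """Script body: process the data locally and return only a summary."""
    issues = get_issues()  # the full payload stays in the sandbox
    counts = Counter(issue["state"] for issue in issues)
    return {"total": len(issues), "by_state": dict(counts)}
```

Only the small summary dict enters the model's context; the 300 issue bodies never do.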

Lazy Loading (Dynamic Discovery)

Best for: Large toolsets (>50 tools). Do not load all tool definitions into the system prompt at start.

  • Workflow:

    1. Agent starts with one tool: search_tools(query).

    2. User expresses intent, e.g. asks for "weather".

    3. Agent calls search_tools("weather").

    4. System retrieves and injects the weather_get tool definition into the context dynamically.

  • Impact: Reduces initial context load from ~11k tokens to ~200 tokens.   
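
The workflow above can be sketched with an in-memory registry. Keyword matching stands in for whatever retrieval the real system uses; TOOL_REGISTRY, AgentContext, and calendar_create_event are hypothetical, while search_tools and weather_get come from the steps above.

```python
# Hypothetical registry of tool definitions that are NOT loaded at start.
TOOL_REGISTRY = {
    "weather_get": {
        "description": "Get the current weather for a city.",
        "keywords": {"weather", "forecast", "temperature"},
    },
    "calendar_create_event": {
        "description": "Create a calendar event.",
        "keywords": {"calendar", "meeting", "event"},
    },
}


def search_tools(query):
    """Return definitions of tools whose keywords overlap with the query."""
    words = set(query.lower().split())
    return [
        {"name": name, "description": spec["description"]}
        for name, spec in TOOL_REGISTRY.items()
        if words & spec["keywords"]
    ]


class AgentContext:
    """Tracks which tool definitions are currently in the context window."""

    def __init__(self):
        # Step 1: the agent starts with a single tool.
        self.loaded_tools = {"search_tools": "Find tools by keyword."}

    def discover(self, query):
        # Steps 3 and 4: search, then inject matching definitions.
        for tool in search_tools(query):
            self.loaded_tools[tool["name"]] = tool["description"]
        return sorted(self.loaded_tools)
```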


Key Takeaways

  • Tool definitions are prompts, not documentation

  • Context budget is the most valuable resource

  • Proximity to Action prevents hallucination

  • Strict schemas outperform verbose instructions

  • Fewer, better tools beat many generic ones

Design tools the way you would design APIs for another senior engineer—only this engineer reasons in tokens instead of code.


Key References

https://www.anthropic.com/engineering/writing-tools-for-agents
https://modelcontextprotocol.info/docs/tutorials/writing-effective-tools/
https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide
