Writing Effective MCP Tool Definitions
Executive summary
Well-designed tools are the single highest-leverage factor in building reliable agents.
Tools must be ergonomic for agents—intuitive, context-efficient, and resilient to error.
Poor tool design leads directly to low accuracy, hallucination, latency, and brittle workflows.
As of February 2026, context budget is the most valuable resource and a key limitation, so it must be managed thoughtfully. Tools with good ergonomics for agents are:
Intuitive – obvious when and why to use them
Context‑efficient – minimal tokens, high signal
Outcome‑oriented – reflect user goals, not raw APIs
Fool‑proofed – resilient to partial, incorrect, or ambiguous calls
Tools should not simply wrap existing API endpoints. They should be designed interfaces for reasoning systems.
Tool Definition Standards
The quality of the tool definition is the single highest-leverage factor in agent reliability. Treat tool definitions as strictly typed contracts.
Naming & Descriptions
Use active verbs
Apply domain namespacing (github_get_user)
Explicitly state negative capabilities to prevent hallucinated capabilities, e.g. "Searches internal documentation only. Does not access the public internet."
Schema & Payloads
Enforce strict typing: Use enum for parameters with a finite set of valid values (e.g., status: ["open", "closed"]) instead of string. This constrains the model's output to valid inputs.
Keep payloads small: Tools must not return massive JSON blobs. Large return payloads incur a "re-tokenization penalty" that increases latency and wastes context. Implement default truncation (e.g., top 50 rows) and require pagination.
Tool Description "Prompt Engineering"
Tool descriptions act as decision-making prompts, which makes them the most important factor in tool performance. Use:
Detailed descriptions: At least 3-4 sentences per tool. Explicitly define specialized query formats, niche terminology, and relationships between resources. Descriptions should explain in detail
what the tool does (and does not do, to reduce hallucinations),
when it should (and shouldn't) be used,
what each parameter means and how it affects the tool's behavior, and
any important caveats or limitations, such as what information the tool does not return when the tool name does not make this clear.
Unambiguous Naming: Use precise names for input parameters. For example, use user_id instead of just user to ensure the agent understands exactly what data is required. Clear naming reduces invalid tool calls.
Active Verbs: Begin tool names and descriptions with strong, distinct verbs (e.g., fetch_, update_, archive_) rather than vague terms like handle_ or manage_.
Explicit Constraints: Clearly describe and enforce data models for inputs and outputs, in particular allowed values, required formats, and field relationships. Strict schemas dramatically reduce hallucinated inputs and improper tool calls.
Input Examples (if available): Clear descriptions are most important, but for tools with complex inputs, nested objects, or format-sensitive parameters, provide schema-validated examples using input_examples (if supported).
Proximity to Action: Instructions that apply only to one tool must live in that tool's description and docstring, not in the system prompt. This ensures they are injected into the context window only when the model is considering that tool.
Worked Example: SAP CRM Account Search
The following JSON schema definition for an MCP tool illustrates all six principles above. The tool searches SAP CRM for business accounts — a common scenario where multiple search tools coexist (e.g. for contacts, opportunities, employees).
{
"name": "sap_crm_search_accounts",
"description": "Search SAP CRM for business account records — companies, clients, and prospects. Returns one or more accounts matching the query, each with: CRM account ID, legal name, industry classification, lifecycle status (active / prospect / inactive), assigned relationship manager email, and last activity date. Scope is limited to organisational accounts only; individual contacts, deals, and opportunities are separate record types not returned here. Use for requests like 'find Allianz in CRM', 'which accounts does Anna Müller own?', or 'show me all active insurance clients'. Does not return full interaction history or financial data — use the returned account_id for subsequent detail calls.",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Free-text search matched against account name, alias, and ticker symbol. Supports partial matches — 'Zurich' matches 'Zurich Insurance Group AG'. Pass the user's company name here exactly as stated; do not reformat or abbreviate."
},
"account_status": {
"type": "string",
"enum": ["active", "prospect", "inactive", "all"],
"default": "active",
"description": "Lifecycle filter. Defaults to 'active'. Set to 'all' only when the user explicitly asks to include former or inactive accounts."
}
},
"required": []
}
}
How the six principles are applied:
Detailed description: The top-level description field is five sentences long and covers what the tool does ("Search SAP CRM for business account records"), what it returns (account ID, legal name, industry, status, RM email, last activity), the scope boundary ("organisational accounts only"), what it does not return (full interaction history or financial data), and concrete example trigger phrases. The model does not have to guess any of these things.
Unambiguous naming: The status filter is named account_status, not status. This matters when the model is holding multiple tools in context: a generic status parameter is ambiguous; account_status is not.
Active verb: The description opens with "Search SAP CRM for…", a strong, distinct verb that immediately signals the tool's action. Contrast this with a vague opener like "This tool handles CRM account management…", which forces the model to read further to understand intent.
Explicit constraints: account_status uses an enum (active, prospect, inactive, all) rather than a free string. Where the client enforces the schema during generation, the model cannot produce a value like "churned" or "archived"; otherwise such a value fails validation before it reaches the CRM.
Input examples: Not included here. With only two simple parameters, the parameter descriptions are sufficient. Examples add value for tools with nested objects, compound query syntax, or format-sensitive fields (e.g. a date range parameter or a structured filter expression). Adding examples to simple tools wastes context budget.
Proximity to action: The rule "Set to 'all' only when the user explicitly asks to include former or inactive accounts" lives inside the account_status parameter description, not in the system prompt. It is injected into the model's context only when the model is deciding how to fill that specific field, which is exactly when the instruction is needed.
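A server can back the enum with its own check, so that an invalid value is rejected with a corrective message even when the client does not constrain generation. The following is a minimal sketch under that assumption; the helper name validate_search_accounts_args is hypothetical, and a production server would more likely use a JSON Schema validator than this hand-written check.

```python
# Allowed values copied from the account_status enum in the schema above.
ACCOUNT_STATUS_VALUES = ["active", "prospect", "inactive", "all"]


def validate_search_accounts_args(args: dict) -> dict:
    """Return normalized arguments or raise ValueError with a corrective message.

    Hypothetical helper: a hand-written stand-in for real schema validation.
    """
    unknown = set(args) - {"query", "account_status"}
    if unknown:
        raise ValueError(
            f"Unknown parameter(s): {sorted(unknown)}. "
            "Allowed parameters: query, account_status."
        )
    status = args.get("account_status", "active")  # apply the schema default
    if status not in ACCOUNT_STATUS_VALUES:
        raise ValueError(
            f"Invalid account_status '{status}'. "
            f"Use one of: {', '.join(ACCOUNT_STATUS_VALUES)}."
        )
    query = args.get("query", "")
    if not isinstance(query, str):
        raise ValueError("query must be a string.")
    return {"query": query, "account_status": status}
```

Calling it with {"account_status": "churned"} raises an error that names the allowed values, which gives the agent enough information to retry.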
Designing Tools for Agent Limitations
Agents are powerful—but constrained. Effective tool design starts by acknowledging these limitations.
A. Limited Context Window
Navigation Speed: Agents cannot efficiently scan large payloads to find what they need. E.g. a search_contacts(query="john") tool is better than a list_contacts tool returning thousands of entries.
High-impact Workflows: Consolidate common multi-step tasks into a single tool call. Combine several operations (or API calls) under the hood to enrich tool responses or execute frequent multi-step tasks, e.g. a schedule_event tool that performs user lookup, availability checks, and event creation instead of separate list_users, list_events, and create_event calls. Fewer tool calls mean lower latency, fewer failures, and less context usage.
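A consolidated schedule_event tool could look roughly like the sketch below. The in-memory USERS and EVENTS structures and the parameter names are assumptions made for illustration; a real implementation would call the underlying user and calendar APIs instead.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-ins for the underlying user and calendar APIs.
USERS = {"anna": "u1", "ben": "u2"}
EVENTS = [
    {"user_id": "u1", "start": datetime(2026, 2, 10, 9, 0), "end": datetime(2026, 2, 10, 10, 0)},
]


def schedule_event(attendee_names: list[str], start: datetime, duration_minutes: int = 30) -> dict:
    """Look up attendees, check availability, and create the event in one call."""
    # Step 1: user lookup (replaces a separate list_users call).
    missing = [name for name in attendee_names if name.lower() not in USERS]
    if missing:
        return {"ok": False, "error": f"Unknown attendee(s): {', '.join(missing)}. Check spelling and retry."}
    user_ids = [USERS[name.lower()] for name in attendee_names]

    # Step 2: availability check (replaces a separate list_events call).
    end = start + timedelta(minutes=duration_minutes)
    busy = sorted({
        event["user_id"] for event in EVENTS
        if event["user_id"] in user_ids and event["start"] < end and start < event["end"]
    })
    if busy:
        return {"ok": False, "error": f"Conflict for user id(s): {', '.join(busy)}. Choose another start time."}

    # Step 3: event creation (replaces a separate create_event call).
    for user_id in user_ids:
        EVENTS.append({"user_id": user_id, "start": start, "end": end})
    return {"ok": True, "attendees": attendee_names, "start": start.isoformat(), "end": end.isoformat()}
```

The agent makes one call and receives either a confirmed event or an instruction on what to change, rather than stitching three responses together in context.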
B. Ease of Errors (Poka‑Yoke Design)
Poka-yoke (fool-proof) your tools. Agents might call the wrong tools, call the right tools with the wrong parameters, call too few tools (i.e. omit required steps), or misinterpret tool responses. Design tools to prevent mistakes by default.
To mitigate, use:
Disambiguation: Create a minimal set of tools with distinct purposes, clear boundaries, and no overlapping responsibilities, so the agent is not confused about which tool to use for a specific task. Tools should reflect natural subdivisions of tasks, which reduces description length and decision complexity.
Namespacing: Namespace tools to prevent collisions and confusion, e.g. by service or domain (asana_search, jira_search) and by resource (asana_projects_search, asana_users_search). Prefix- vs suffix-based naming can produce meaningfully different model behavior; validate the choice using tool-use evaluations.
Attention Dilution: Avoid overwhelming the model with irrelevant instructions.
Fetch only what is needed
Keep system prompts lean
Keep tool‑specific instructions inside tool definitions
Respect the rule of Proximity to Action:
The more specific an instruction is, the closer it should live to the tool that uses it.
Tool Response Design
What tools return is as important as how they are called. Tool responses also need to be designed to be meaningful for agents, by using the following:
A. Semantic Identifiers
Prefer:
Human‑readable names
Natural language labels
Short numeric IDs (0‑indexed)
Avoid:
UUIDs
Opaque hashes
Agents perform retrieval and referencing tasks far more accurately with semantic identifiers.
B. High-signal Responses
Return only what the agent needs.
Exclude low-level metadata (uuid, mime_type, internal flags)
Provide focused summaries
Support parameters such as:
response_format = "concise"
response_format = "detailed"
This allows the agent to conserve tokens for exploratory tasks and request depth only when required.
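One way to implement a response_format parameter is sketched below. The CONTACT record, the field lists, and the get_contact function are hypothetical; which fields count as "concise" is a design decision for each tool.

```python
# Hypothetical record as an upstream API might return it.
CONTACT = {
    "uuid": "3f2b9c1e-0000-0000-0000-000000000000",
    "mime_type": "application/json",
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "company": "Example GmbH",
    "notes": "Met at the 2025 trade fair; prefers email contact.",
}

CONCISE_FIELDS = ("name", "email")
DETAILED_FIELDS = ("name", "email", "company", "notes")


def get_contact(response_format: str = "concise") -> dict:
    """Return only the fields the agent needs for the requested level of detail."""
    if response_format not in ("concise", "detailed"):
        return {"error": "Invalid response_format. Use 'concise' or 'detailed' and retry."}
    fields = CONCISE_FIELDS if response_format == "concise" else DETAILED_FIELDS
    # Low-level metadata (uuid, mime_type) is left out of both formats.
    return {key: CONTACT[key] for key in fields}
```

Defaulting to "concise" keeps exploratory calls cheap; the agent opts in to the larger payload only when it needs the extra fields.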
C. Helpful Error Messages
Instead of returning opaque tracebacks or error codes, prompt-engineer error responses to provide actionable instructions on how the agent can fix the input.
Errors should be instructional, not diagnostic.
Bad:
400 Bad Request
Good:
Invalid date format. Use YYYY-MM-DD and retry the tool call.
Error messages are part of the agent’s prompt.
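As a small illustration of an instructional error, the sketch below validates a date parameter and tells the agent how to correct it. The function name parse_due_date and the response shape are assumptions, not part of any specific API.

```python
import re
from datetime import date

# Strict YYYY-MM-DD: four-digit year, two-digit month, two-digit day.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_due_date(value: str) -> dict:
    """Parse a YYYY-MM-DD date, returning an instructional error on failure."""
    match = DATE_PATTERN.fullmatch(value) if isinstance(value, str) else None
    if match is None:
        return {
            "ok": False,
            "error": f"Invalid date format: {value!r}. "
                     "Use YYYY-MM-DD (for example 2026-02-10) and retry the tool call.",
        }
    year, month, day = (int(part) for part in match.groups())
    try:
        parsed = date(year, month, day)
    except ValueError:
        # The shape is right but the date does not exist (e.g. February 30).
        return {
            "ok": False,
            "error": f"{value!r} is not a real calendar date. "
                     "Check the month and day, then retry the tool call.",
        }
    return {"ok": True, "due_date": parsed.isoformat()}
```

Both failure paths state what was wrong and what to send instead, so the agent can repair the call without extra reasoning turns.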
D. Manage Context Limits
For potentially large responses that could use up a lot of context:
Implement pagination
Support filtering
Apply sensible defaults (e.g., top 20–50 rows)
Where possible, upload large results as a file to the knowledge base (KB) and return the file ID for the LLM to use. This can be useful for e.g. code execution use cases.
If truncation occurs, return guidance steering the agent to use more targeted searches, e.g. "Results truncated. Use a narrower query or increase the limit parameter."
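The pagination, default-limit, and truncation-guidance points above can be combined in a single helper. This is a sketch only; the paginate function, its parameter names, and the response keys are assumptions.

```python
def paginate(rows: list, limit: int = 20, offset: int = 0, max_limit: int = 50) -> dict:
    """Return one page of rows plus guidance when results are cut off."""
    limit = max(1, min(limit, max_limit))  # clamp to a sensible range
    offset = max(0, offset)
    page = rows[offset:offset + limit]
    response = {"rows": page, "total": len(rows), "offset": offset, "limit": limit}
    next_offset = offset + len(page)
    if next_offset < len(rows):
        # Tell the agent that data is missing and how to get the rest.
        response["truncated"] = True
        response["guidance"] = (
            f"Results truncated: showing {len(page)} of {len(rows)}. "
            f"Use a narrower query, or call again with offset={next_offset}."
        )
    else:
        response["truncated"] = False
    return response
```

Clamping the limit on the server side means an agent that asks for 500 rows still receives at most 50, together with a pointer to the next page.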
Instruction Distribution Patterns
Manage
Attention Dilution: don't bury tool-specific instructions in the system prompt
Confusion: provide sufficient detail in the tool definition
Poor instruction placement causes:
Context dilution
Ignored rules
Tool confusion
Use progressive disclosure.
Key principles:
Type as Contract (source: Pydantic AI documentation, ai.pydantic.dev)
Use Python docstrings for tool-specific logic and Pydantic Fields for data validation.
Putting instructions in the docstring ensures they are injected into the context window only when the model is considering that tool.
Progressive Disclosure (source: Claude 4.x Prompt Engineering Guide)
Avoid "The Kitchen Sink" system prompt.
LLMs perform better when they "discover" tools and instructions as needed.
Distribution Hierarchy
Architectural patterns for distributing instructions about tools
| Instruction Layer | Responsibility |
|---|---|
| System Prompt | Global identity, safety rules, persona, orchestration |
| Tool Definitions / Docstrings | When and why to use each tool |
| Schema / Types | Hard technical constraints |
System Prompt — “The Constitution”
Use only for rules that apply to every turn:
Agent identity
Global safety constraints
Language and tone
Cross-tool workflows and orchestration for many (10+) tools, e.g. "If search_user returns multiple results, you MUST call verify_identity before proceeding with reset_password."
Formatting instructions
Keep the system prompt lean.
Tool Definitions — “The Manual”
The model uses the docstring to decide which tool to pick. Tool docstrings should focus on when and how to use the tool:
Outcome‑oriented: focus on what the user wants to achieve
Trigger‑focused: "Use this tool only if the user has already provided an Order ID."
Local in scope
Put only tool-specific formatting rules here; general formatting rules belong in the system prompt. If the output is complex, include a "Response Guide" in the tool result itself, so the model reads the guidance together with the data.
Example: If a tool returns a list of files, the tool should return: {"files": [...], "instruction": "Present these as a bulleted list sorted by size."}
Schema / Types — “The Guardrails”
Use strict typing:
Enum instead of string
Literal for fixed values
Length and range validation
Types act as executable contracts.
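To make "executable contract" concrete, here is a minimal sketch that uses only the Python standard library. The SearchTicketsInput model and its limits are hypothetical; with Pydantic, field constraints would play the role of the manual checks in __post_init__.

```python
from dataclasses import dataclass
from enum import Enum


class TicketStatus(str, Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass(frozen=True)
class SearchTicketsInput:
    """Hypothetical input model for a ticket search tool."""

    query: str
    status: TicketStatus = TicketStatus.OPEN
    limit: int = 20

    def __post_init__(self):
        # Coerce raw strings such as "open"; unknown values raise ValueError.
        object.__setattr__(self, "status", TicketStatus(self.status))
        # Length validation on the free-text field.
        if not 1 <= len(self.query) <= 200:
            raise ValueError("query must be 1 to 200 characters long.")
        # Range validation on the numeric field.
        if not 1 <= self.limit <= 50:
            raise ValueError("limit must be between 1 and 50.")
```

Constructing SearchTicketsInput(query="login bug", status="closed") succeeds, while status="churned", an empty query, or limit=500 each raise ValueError before any downstream call is made.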
Tool Output Formatting Guidance
Instructions for how the model should process or present data from a tool should be split based on whether the data is for the model or the user.
For the model: put instructions inside the tool output if the model must interpret complex output, e.g.
{ "data": [...], "instruction": "Only inspect the 'status' field to confirm success." }
For the user: final presentation rules that apply globally (e.g. markdown, tables, decimal precision) belong in the system prompt, e.g. "All financial outputs must be summarized as a Markdown table with two decimal places."
MCP (Model Context Protocol) Tool Considerations
Model Context Protocol (MCP) tools often come with generic descriptions from the server and may be opaque or poorly described. To make them "Deep Agent" ready, you may need to wrap them or supplement them.
Instruction Overrides: If an MCP tool description is too vague, use a wrapper or a "Skill" file (in Pydantic AI) to provide better context. Alternatively, wrap generic tools with contextual guidance. E.g. "Use this tool to read files from the local filesystem only."
Specific MCP Context: If an MCP tool (like a database or filesystem) requires external context (e.g., "The root directory is /app/data"), put this in the System Prompt as a "Knowledge Base" section.
Error-as-Instruction: If an MCP tool fails, ensure the error message returned to the LLM contains instructions on how to fix it.
Bad Error: 400 Bad Request
Good Error: Invalid Date Format. Please use YYYY-MM-DD and try the tool call again.
Architectural Patterns
A. Aggregator Gateway
Problem: Connecting a client directly to 10+ distinct MCP servers creates connection overhead and auth fragmentation.
Solution: Deploy an MCP Gateway (Aggregator).
Function: Acts as a single endpoint for the agent. It manages persistent connections to downstream servers (GitHub, Postgres, S3).
Features: Handles authentication, namespacing (renaming tools on the fly), and allowing/denying specific tools.
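The namespacing and allow/deny features of a gateway can be sketched as a small registry. This toy ToolGateway class is an assumption for illustration: it omits authentication and persistent connections entirely, and the downstream tools are plain Python callables rather than real MCP servers.

```python
from typing import Callable, Dict, Optional, Set


class ToolGateway:
    """Toy aggregator: one registry in front of several downstream servers."""

    def __init__(self, denied: Optional[Set[str]] = None):
        self._tools: Dict[str, Callable[..., object]] = {}
        self._denied = denied or set()

    def register_server(self, namespace: str, tools: Dict[str, Callable[..., object]]) -> None:
        """Expose a server's tools under '<namespace>_<tool>' names."""
        for name, func in tools.items():
            self._tools[f"{namespace}_{name}"] = func

    def list_tools(self) -> list:
        """Return only the tool names the agent is allowed to see."""
        return sorted(name for name in self._tools if name not in self._denied)

    def call(self, name: str, **kwargs):
        """Dispatch a call, refusing denied or unknown tools."""
        if name in self._denied or name not in self._tools:
            raise KeyError(
                f"Tool '{name}' is not available. "
                f"Available tools: {', '.join(self.list_tools())}."
            )
        return self._tools[name](**kwargs)


# Hypothetical downstream servers, represented here by lambdas.
gateway = ToolGateway(denied={"postgres_drop_table"})
gateway.register_server("github", {"get_user": lambda login: {"login": login}})
gateway.register_server("postgres", {"run_query": lambda sql: [], "drop_table": lambda table: None})
```

Here the agent sees github_get_user and postgres_run_query through a single endpoint, while postgres_drop_table stays hidden and cannot be called.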
B. Hierarchical Routing (Router–Worker Pattern)
Problem: The "Lost in the Middle" phenomenon occurs when an LLM is presented with too many tools (100+).
Solution: Hierarchical routing using a Router-Worker pattern.
Router Agent: performs intent classification. It has no execution tools, only a classifier or "handoff" tools. It analyzes intent and delegates to a specialized sub-agent.
Worker Agents: hold specialized toolsets.
Semantic Router: For extreme scale, use vector embeddings to match user queries to the correct toolset (e.g., semantic-router), avoiding an LLM call entirely for the routing step.
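The routing step without an LLM call can be illustrated with a deliberately simple stand-in. The sketch below scores toolsets by token overlap (Jaccard similarity) instead of vector embeddings, so it shows the control flow rather than a production-quality matcher; the TOOLSETS descriptions and the threshold are assumptions.

```python
import re

# Hypothetical worker toolsets, each described by a few representative terms.
TOOLSETS = {
    "calendar": "schedule meeting event invite availability calendar",
    "crm": "account customer client prospect contact crm",
    "code": "repository pull request issue commit branch code",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route(query: str, min_score: float = 0.05):
    """Pick the toolset whose description overlaps most with the query."""
    query_tokens = tokenize(query)
    best_name, best_score = None, 0.0
    for name, description in TOOLSETS.items():
        desc_tokens = tokenize(description)
        union = query_tokens | desc_tokens
        # Jaccard similarity: shared tokens divided by all distinct tokens.
        score = len(query_tokens & desc_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    # Below the threshold, return None so the caller can fall back to an LLM router.
    return best_name if best_score >= min_score else None
```

Swapping the overlap score for cosine similarity over embeddings keeps the same structure: score every toolset, pick the best, and fall back when nothing clears the threshold.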
Performance Optimization
Programmatic Tool Calling (Code Execution)
Best for: Complex data tasks, multi-step logic. Instead of the standard LLM -> Tool -> LLM -> Tool loop (which incurs network latency for every step), enable the agent to write a Python script that executes in a sandbox.
Mechanism: The agent imports MCP tools as Python functions: data = github.get_issues(); summary = process(data).
Impact: Reduces token usage by 85-98% for data-heavy tasks and significantly lowers latency by batching operations. Execution is also more deterministic.
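The token saving comes from keeping raw data inside the sandbox and returning only a summary. The sketch below imitates that with a fake github object; FakeGitHub, its synthetic issues, and agent_script are all hypothetical, and no real MCP client is involved.

```python
from collections import Counter


class FakeGitHub:
    """Hypothetical stand-in for an MCP GitHub server exposed as Python functions."""

    def get_issues(self) -> list:
        # Synthetic data: a real call could return thousands of rows,
        # none of which need to pass through the model's context.
        return [
            {"id": i, "state": "open" if i % 3 else "closed", "label": "bug" if i % 2 else "feature"}
            for i in range(1, 1001)
        ]


github = FakeGitHub()


def agent_script() -> dict:
    """What an agent-written script might do inside the sandbox."""
    data = github.get_issues()
    open_issues = [issue for issue in data if issue["state"] == "open"]
    by_label = Counter(issue["label"] for issue in open_issues)
    # Only this small summary is returned to the model.
    return {"total": len(data), "open": len(open_issues), "open_by_label": dict(by_label)}


summary = agent_script()
```

The model receives a three-key dictionary instead of 1,000 issue records, and the filtering and counting run as ordinary code rather than as token-by-token reasoning.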
Lazy Loading (Dynamic Discovery)
Best for: Large toolsets (>50 tools). Do not load all tool definitions into the system prompt at start.
Workflow:
Agent starts with one tool: search_tools(query).
User expresses intent, e.g. asks for "weather".
Agent calls search_tools("weather").
System retrieves and injects the weather_get tool definition into the context dynamically.
Impact: Reduces initial context load from ~11k tokens to ~200 tokens.
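The workflow above can be sketched with a keyword-indexed catalog. TOOL_CATALOG, the keyword sets, and the AgentContext class are assumptions for illustration; a real system would inject full tool schemas and might use embeddings for the search.

```python
# Hypothetical catalog; in practice this might hold hundreds of definitions.
TOOL_CATALOG = {
    "weather_get": {
        "description": "Get the current weather for a city.",
        "keywords": {"weather", "forecast", "temperature"},
    },
    "calendar_schedule_event": {
        "description": "Schedule a calendar event for one or more attendees.",
        "keywords": {"calendar", "meeting", "event", "schedule"},
    },
    "crm_search_accounts": {
        "description": "Search CRM for business accounts.",
        "keywords": {"crm", "account", "customer"},
    },
}


def search_tools(query: str, limit: int = 3) -> list:
    """Return definitions of tools whose name parts or keywords match the query."""
    terms = set(query.lower().split())
    scored = []
    for name, spec in TOOL_CATALOG.items():
        searchable = spec["keywords"] | set(name.split("_"))
        score = len(terms & searchable)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [{"name": name, "description": TOOL_CATALOG[name]["description"]} for _, name in scored[:limit]]


class AgentContext:
    """Tracks which tool definitions are currently loaded for the model."""

    def __init__(self):
        # The agent starts with the discovery tool only.
        self.loaded = {"search_tools": {"description": "Find tools by keyword."}}

    def discover(self, query: str) -> list:
        """Search the catalog and inject matching definitions into the context."""
        found = search_tools(query)
        for tool in found:
            self.loaded[tool["name"]] = {"description": tool["description"]}
        return [tool["name"] for tool in found]
```

After AgentContext().discover("weather"), only search_tools and weather_get are loaded; the other catalog entries never consume context.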
Key Takeaways
Tool definitions are prompts, not documentation
Context budget is the most valuable resource
Proximity to Action prevents hallucination
Strict schemas outperform verbose instructions
Fewer, better tools beat many generic ones
Design tools the way you would design APIs for another senior engineer—only this engineer reasons in tokens instead of code.
Key References
https://www.anthropic.com/engineering/writing-tools-for-agents
https://modelcontextprotocol.info/docs/tutorials/writing-effective-tools/
https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide