Model Context Protocol (MCP)
LocalAI now supports the Model Context Protocol (MCP), enabling powerful agentic capabilities by connecting AI models to external tools and services. This feature allows your LocalAI models to interact with various MCP servers, providing access to real-time data, APIs, and specialized tools.
What is MCP?
The Model Context Protocol is a standard for connecting AI models to external tools and data sources. It enables AI agents to:
- Access real-time information from external APIs
- Execute commands and interact with external systems
- Use specialized tools for specific tasks
- Maintain context across multiple tool interactions
Key Features
- Real-time Tool Access: Connect to external MCP servers for live data
- Multiple Server Support: Configure both remote HTTP and local stdio servers
- Cached Connections: Efficient tool caching for better performance
- Secure Authentication: Support for bearer token authentication
- Multi-endpoint Support: Works with OpenAI Chat, Anthropic Messages, and Open Responses APIs
- Selective Server Activation: Use `metadata.mcp_servers` to enable only specific servers per request
- Server-side Tool Execution: Tools are executed on the server and results are fed back to the model automatically
- Agent Configuration: Customizable execution limits and retry behavior
- MCP Prompts: Discover and expand reusable prompt templates from MCP servers
- MCP Resources: Browse and inject resource content (files, data) from MCP servers into conversations
Configuration
MCP support is configured in your model’s YAML configuration file using the mcp section:
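As an illustration only, a model configuration with an `mcp` section might look like the sketch below. The server names are hypothetical and the exact nesting may differ between LocalAI versions, so check the options described next:

```yaml
name: my-agent-model              # your existing model definition
mcp:
  remote:
    weather-api:                  # hypothetical server name
      url: https://example.com/mcp
      token: ${WEATHER_API_TOKEN} # optional bearer token
  stdio:
    search-engine:                # hypothetical server name
      command: docker
      args: ["run", "-i", "--rm", "example/mcp-search"]
      env:
        SEARCH_BACKEND: duckduckgo
  agent:
    max_iterations: 10            # default
```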
Configuration Options
Remote Servers (remote)
Configure HTTP-based MCP servers:
- `url`: The MCP server endpoint URL
- `token`: Bearer token for authentication (optional)
STDIO Servers (stdio)
Configure local command-based MCP servers:
- `command`: The executable command to run
- `args`: Array of command-line arguments
- `env`: Environment variables (optional)
Agent Configuration (agent)
max_iterations: Maximum number of MCP tool execution loop iterations (default: 10). Each iteration allows the model to call tools and receive results before generating the next response.
Usage
Selecting MCP Servers via metadata
All API endpoints support MCP server selection through the standard metadata field. Pass a comma-separated list of server names in metadata.mcp_servers:
- When present: Only the named MCP servers are activated for this request. Server names must match the keys in the model’s MCP config YAML (e.g., `"weather-api"`, `"search-engine"`).
- When absent: Behavior depends on the endpoint:
  - OpenAI Chat Completions and Anthropic Messages: No MCP tools are injected (standard behavior).
  - Open Responses: If the model has MCP config and no user-provided tools, all MCP servers are auto-activated (backward compatible).
The mcp_servers metadata key is consumed by the MCP engine and stripped before reaching the backend. Clients that support the standard metadata field can use this without custom schema extensions.
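As a sketch (the model name and server names are placeholders), a Chat Completions request that activates only two servers could look like:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-agent-model",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "metadata": {"mcp_servers": "weather-api,search-engine"}
  }'
```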
API Endpoints
MCP tools work across all three API endpoints:
- OpenAI Chat Completions (`/v1/chat/completions`)
- Anthropic Messages (`/v1/messages`)
- Open Responses (`/v1/responses`)
Server Listing Endpoint
You can list available MCP servers and their tools for a given model:
Returns:
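The request and response bodies for this endpoint are not reproduced here; as an assumption about the shape, the listing might return something like:

```json
{
  "servers": [
    {
      "name": "weather-api",
      "tools": [
        {"name": "get_forecast", "description": "Get the weather forecast for a location"}
      ]
    }
  ]
}
```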
MCP Prompts
MCP servers can provide reusable prompt templates. LocalAI supports discovering and expanding prompts from MCP servers.
List Prompts
Returns:
Expand a Prompt
Returns:
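The exact routes and payloads are not reproduced here. As a hedged illustration based on the MCP specification's prompt structures (names and fields below are examples), an expanded prompt resolves to a list of messages such as:

```json
{
  "description": "Summarize the given text",
  "messages": [
    {
      "role": "user",
      "content": {"type": "text", "text": "Summarize the following document in three bullet points."}
    }
  ]
}
```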
Inject Prompts via Metadata
You can inject MCP prompts into any chat request using metadata.mcp_prompt and metadata.mcp_prompt_args:
The prompt messages are prepended to the conversation before inference.
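A sketch of such a request (the prompt name and the JSON-string encoding of `mcp_prompt_args` are assumptions):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-agent-model",
    "messages": [{"role": "user", "content": "Here is the document to work on."}],
    "metadata": {
      "mcp_prompt": "summarize",
      "mcp_prompt_args": "{\"style\": \"brief\"}"
    }
  }'
```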
MCP Resources
MCP servers can expose data/content (files, database records, etc.) as resources identified by URI.
List Resources
Returns:
Read a Resource
Returns:
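The routes and payloads are not reproduced here. Following the MCP specification's resource structures (the URI and fields below are examples), reading a resource yields content blocks like:

```json
{
  "contents": [
    {
      "uri": "file:///logs/app.log",
      "mimeType": "text/plain",
      "text": "2024-01-01 12:00:00 INFO server started"
    }
  ]
}
```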
Inject Resources via Metadata
You can inject MCP resources into chat requests using metadata.mcp_resources (comma-separated URIs):
Resource contents are appended to the last user message as text blocks (following the same approach as llama.cpp’s WebUI).
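For example (the URIs are placeholders):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-agent-model",
    "messages": [{"role": "user", "content": "Summarize these files."}],
    "metadata": {"mcp_resources": "file:///docs/readme.md,file:///logs/app.log"}
  }'
```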
Legacy Endpoint
The /mcp/v1/chat/completions endpoint is still supported for backward compatibility. It automatically enables all configured MCP servers (equivalent to not specifying mcp_servers).
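A minimal request against the legacy endpoint (the model name is a placeholder):

```bash
curl http://localhost:8080/mcp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-agent-model",
    "messages": [{"role": "user", "content": "What time is it in Tokyo?"}]
  }'
```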
Example Response
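Tool calls are resolved server-side, so the client receives an ordinary OpenAI-style completion; the values below are illustrative:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "my-agent-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "It is currently 21:30 in Tokyo."
      },
      "finish_reason": "stop"
    }
  ]
}
```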
Example Configurations
Docker-based Tools
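A sketch of a stdio server that runs a tool inside a container (the image and server names are hypothetical):

```yaml
mcp:
  stdio:
    searxng:                    # hypothetical server name
      command: docker
      args: ["run", "-i", "--rm", "example/mcp-searxng"]
      env:
        SEARXNG_URL: http://searxng:8080
```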
How It Works
- Tool Discovery: LocalAI connects to configured MCP servers and discovers available tools
- Tool Injection: Discovered tools are injected into the model’s tool/function list alongside any user-provided tools
- Inference Loop: The model generates a response. If it calls MCP tools, LocalAI executes them server-side, appends results to the conversation, and re-runs inference
- Response Generation: When the model produces a final response (no more MCP tool calls), it is returned to the client
The execution loop is bounded by agent.max_iterations (default 10) to prevent infinite loops.
Session Lifecycle
MCP sessions are automatically managed by LocalAI:
- Lazy initialization: Sessions are created the first time a model’s MCP tools are used
- Cached per model: Sessions are reused across requests for the same model
- Cleanup on model unload: When a model is unloaded (idle watchdog eviction, manual stop, or shutdown), all associated MCP sessions are closed and resources freed
- Graceful shutdown: All MCP sessions are closed when LocalAI shuts down
This means you don’t need to manually manage MCP connections — they follow the model’s lifecycle automatically.
Supported MCP Servers
LocalAI is compatible with any MCP-compliant server.
Best Practices
Security
- Use environment variables for sensitive tokens
- Validate MCP server endpoints before deployment
- Implement proper authentication for remote servers
Performance
- Cache frequently used tools
- Use appropriate timeout values for external APIs
- Monitor resource usage for stdio servers
Error Handling
- Implement fallback mechanisms for tool failures
- Log tool execution for debugging
- Handle network timeouts gracefully
With External Applications
Use MCP-enabled models in your applications:
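Because the endpoints are OpenAI-compatible, any standard client works. A sketch using the OpenAI Python SDK (model and server names are placeholders; `extra_body` is used here to pass the metadata field):

```python
# Hypothetical usage: point the OpenAI SDK at a running LocalAI instance.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-agent-model",  # placeholder model name
    messages=[{"role": "user", "content": "Find recent news about MCP"}],
    extra_body={"metadata": {"mcp_servers": "search-engine"}},
)
print(resp.choices[0].message.content)
```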
MCP and adding packages
It might be handy to install packages before starting the container to set up the environment. This is an example of how you can do that with docker-compose (installing and configuring Docker):
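A docker-compose sketch; the entrypoint override and package names are assumptions and will depend on the image you use:

```yaml
services:
  local-ai:
    image: localai/localai:latest
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # let stdio servers launch containers
    entrypoint: ["/bin/bash", "-c"]
    command:
      # hypothetical: install Docker CLI, then hand off to the image's original entrypoint
      - apt-get update && apt-get install -y docker.io && /entrypoint.sh
```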
An example model config (to append to any existing model you have) can be:
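For instance, appended to an existing model definition (server and image names are hypothetical):

```yaml
mcp:
  stdio:
    docker-tools:               # hypothetical server name
      command: docker
      args: ["run", "-i", "--rm", "example/mcp-tool"]
```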
Links
Client-Side MCP (Browser)
In addition to server-side MCP (where the backend connects to MCP servers), LocalAI supports client-side MCP where the browser connects directly to MCP servers. This is inspired by llama.cpp’s WebUI and works alongside server-side MCP.
How It Works
- Add servers in the UI: Click the “Client MCP” button in the chat header and add MCP server URLs
- Browser connects directly: The browser uses the MCP TypeScript SDK (`StreamableHTTPClientTransport` or `SSEClientTransport`) to connect to MCP servers
- Tool discovery: Connected servers’ tools are sent as `tools` in the chat request body
- Browser-side execution: When the LLM calls a client-side tool, the browser executes it against the MCP server and sends the result back in a follow-up request
- Agentic loop: This continues (up to 10 turns) until the LLM produces a final response
CORS Proxy
Since browsers enforce CORS restrictions, LocalAI provides a built-in proxy at `/api/cors-proxy`. When “Use CORS proxy” is enabled (the default), requests to external MCP servers are routed through this proxy.
The proxy forwards the request method, headers, and body to the target URL and streams the response back with appropriate CORS headers.
MCP Apps (Interactive Tool UIs)
LocalAI supports the MCP Apps extension, which allows MCP tools to declare interactive HTML UIs. When a tool has _meta.ui.resourceUri in its definition, calling that tool renders the app’s HTML inline in the chat as a sandboxed iframe.
How it works:
- When the LLM calls a tool with `_meta.ui.resourceUri`, the browser fetches the HTML resource from the MCP server and renders it in an iframe
- The iframe is sandboxed (`allow-scripts allow-forms`, no `allow-same-origin`) for security
- The app can call server tools, send messages, and update context via the `AppBridge` protocol (JSON-RPC over `postMessage`)
- Tools marked as app-only (`_meta.ui.visibility: "app-only"`) are hidden from the LLM and only callable by the app iframe
- On page reload, apps render statically until the MCP connection is re-established
Requirements:
- Only works with client-side MCP connections (the browser must be connected to the MCP server)
- The MCP server must implement the Apps extension (`_meta.ui.resourceUri` on tools, resource serving)
Coexistence with Server-Side MCP
Both modes work simultaneously in the same chat:
- Server-side MCP tools are configured in model YAML files and executed by the backend. The backend handles these in its own agentic loop.
- Client-side MCP tools are configured per-user in the browser and sent as `tools` in the request. When the LLM calls them, the browser executes them.
If both sides have a tool with the same name, the server-side tool takes priority.
Security Considerations
- The CORS proxy can forward requests to any HTTP/HTTPS URL. It is only available when MCP is enabled (`LOCALAI_DISABLE_MCP` is not set).
- Client-side MCP server configurations are stored in the browser’s localStorage and are not shared with the server.
- Custom headers (e.g., API keys) for MCP servers are stored in localStorage. Use with caution on shared machines.
Disabling MCP Support
You can completely disable MCP functionality in LocalAI by setting the LOCALAI_DISABLE_MCP environment variable to true, 1, or yes:
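For example:

```bash
# Disable all MCP features for this run
LOCALAI_DISABLE_MCP=true local-ai run
```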
When this environment variable is set, all MCP-related features will be disabled, including:
- MCP server connections (both remote and stdio)
- Agent tool execution
- The `/mcp/v1/chat/completions` endpoint
This is useful when you want to:
- Run LocalAI without MCP capabilities for security reasons
- Reduce the attack surface by disabling unnecessary features
- Troubleshoot MCP-related issues
Example
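Running the container with MCP disabled (use the image tag appropriate to your deployment):

```bash
docker run -p 8080:8080 -e LOCALAI_DISABLE_MCP=true localai/localai:latest
```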
When MCP is disabled, any model configuration with mcp sections will be ignored, and attempts to use the MCP endpoint will return an error indicating that MCP support is disabled.