🔗 Model Context Protocol (MCP)

LocalAI now supports the Model Context Protocol (MCP), enabling powerful agentic capabilities by connecting AI models to external tools and services. This feature allows your LocalAI models to interact with various MCP servers, providing access to real-time data, APIs, and specialized tools.

What is MCP?

The Model Context Protocol is a standard for connecting AI models to external tools and data sources. It enables AI agents to:

  • Access real-time information from external APIs
  • Execute commands and interact with external systems
  • Use specialized tools for specific tasks
  • Maintain context across multiple tool interactions

Key Features

  • Real-time Tool Access: Connect to external MCP servers for live data
  • Multiple Server Support: Configure both remote HTTP and local stdio servers
  • Cached Connections: MCP sessions are cached per model and reused across requests
  • Secure Authentication: Support for bearer token authentication
  • Multi-endpoint Support: Works with OpenAI Chat, Anthropic Messages, and Open Responses APIs
  • Selective Server Activation: Use metadata.mcp_servers to enable only specific servers per request
  • Server-side Tool Execution: Tools are executed on the server and results fed back to the model automatically
  • Agent Configuration: Customizable execution limits and retry behavior
  • MCP Prompts: Discover and expand reusable prompt templates from MCP servers
  • MCP Resources: Browse and inject resource content (files, data) from MCP servers into conversations

Configuration

MCP support is configured in your model’s YAML configuration file using the mcp section:

name: my-mcp-model
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  remote: |
    {
      "mcpServers": {
        "weather-api": {
          "url": "https://api.weather.com/v1",
          "token": "your-api-token"
        },
        "search-engine": {
          "url": "https://search.example.com/mcp",
          "token": "your-search-token"
        }
      }
    }

  stdio: |
    {
      "mcpServers": {
        "file-manager": {
          "command": "python",
          "args": ["-m", "mcp_file_manager"],
          "env": {
            "API_KEY": "your-key"
          }
        },
        "database-tools": {
          "command": "node",
          "args": ["database-mcp-server.js"],
          "env": {
            "DB_URL": "postgresql://localhost/mydb"
          }
        }
      }
    }

agent:
  max_iterations: 10  # Maximum MCP tool execution loop iterations

Configuration Options

Remote Servers (remote)

Configure HTTP-based MCP servers:

  • url: The MCP server endpoint URL
  • token: Bearer token for authentication (optional)

STDIO Servers (stdio)

Configure local command-based MCP servers:

  • command: The executable command to run
  • args: Array of command-line arguments
  • env: Environment variables (optional)
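
Because the remote and stdio values are JSON documents embedded in YAML strings, a malformed entry is easy to miss until the model loads. The following is a minimal sketch of a pre-flight check, assuming only the fields listed above. The helper name check_mcp_servers is hypothetical and not part of LocalAI.

```python
import json

# Required field per server kind, taken from the option lists above.
REQUIRED_FIELD = {"remote": "url", "stdio": "command"}


def check_mcp_servers(raw_json, kind):
    """Return a list of problems found in one embedded MCP JSON block.

    kind is "remote" or "stdio". Hypothetical helper, not a LocalAI API.
    """
    required = REQUIRED_FIELD[kind]
    try:
        doc = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict) or not isinstance(doc.get("mcpServers"), dict):
        return ["top level must be an object with an 'mcpServers' object"]

    problems = []
    for name, spec in doc["mcpServers"].items():
        if not isinstance(spec, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        value = spec.get(required)
        if not isinstance(value, str) or not value:
            problems.append(f"{name}: missing '{required}'")
        # args is only documented for stdio servers.
        if kind == "stdio" and not isinstance(spec.get("args", []), list):
            problems.append(f"{name}: 'args' must be an array")
    return problems
```

An empty list means the block passed these checks. It does not confirm that the server is reachable or that the command exists.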

Agent Configuration (agent)

  • max_iterations: Maximum number of MCP tool execution loop iterations (default: 10). Each iteration allows the model to call tools and receive results before generating the next response.

Usage

Selecting MCP Servers via metadata

All API endpoints support MCP server selection through the standard metadata field. Pass a comma-separated list of server names in metadata.mcp_servers:

  • When present: Only the named MCP servers are activated for this request. Server names must match the keys under mcpServers in the model’s mcp configuration (e.g., "weather-api", "search-engine").
  • When absent: Behavior depends on the endpoint:
    • OpenAI Chat Completions and Anthropic Messages: No MCP tools are injected (standard behavior).
    • Open Responses: If the model has an MCP config and the request includes no user-provided tools, all MCP servers are auto-activated (backward compatible).

The mcp_servers metadata key is consumed by the MCP engine and stripped before reaching the backend. Clients that support the standard metadata field can use this without custom schema extensions.
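
Since the value is a single comma-separated string rather than an array, clients usually build it from a list. A minimal sketch follows. The function name mcp_servers_metadata is hypothetical, and returning an empty object for an empty list is a choice made here so that the endpoint's "absent" behavior applies.

```python
def mcp_servers_metadata(server_names):
    """Build the metadata object that selects MCP servers for one request."""
    names = [name.strip() for name in server_names]
    for name in names:
        # A comma inside a name would be read as a separator.
        if not name or "," in name:
            raise ValueError(f"invalid MCP server name: {name!r}")
    if not names:
        # Omit the key entirely so the endpoint default applies.
        return {}
    return {"mcp_servers": ",".join(names)}
```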

API Endpoints

MCP tools work across all three API endpoints:

OpenAI Chat Completions (/v1/chat/completions)

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"},
    "stream": true
  }'

Anthropic Messages (/v1/messages)

curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is the weather in New York?"}],
    "metadata": {"mcp_servers": "weather-api"}
  }'

Open Responses (/v1/responses)

curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "input": "What is the weather in New York?",
    "metadata": {"mcp_servers": "weather-api"}
  }'
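
The three curl calls above differ only in path and body shape, while the metadata field is identical. A sketch of the same requests built with Python's standard library is shown here. build_request is a hypothetical helper, BASE_URL mirrors the host and port of the curl examples, and the max_tokens value is copied from the Anthropic Messages example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl examples


def build_request(api, model, prompt, servers):
    """Return a urllib Request for one of the three endpoints (not sent)."""
    messages = [{"role": "user", "content": prompt}]
    if api == "chat":
        path = "/v1/chat/completions"
        body = {"model": model, "messages": messages}
    elif api == "messages":
        path = "/v1/messages"
        body = {"model": model, "max_tokens": 1024, "messages": messages}
    elif api == "responses":
        path = "/v1/responses"
        body = {"model": model, "input": prompt}
    else:
        raise ValueError(f"unknown api: {api}")
    # The server selection is the same on every endpoint.
    body["metadata"] = {"mcp_servers": ",".join(servers)}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it. That step needs a running LocalAI instance, so it is left out of the sketch.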

Server Listing Endpoint

You can list available MCP servers and their tools for a given model:

curl http://localhost:8080/v1/mcp/servers/my-mcp-model

Returns:

[
  {
    "name": "weather-api",
    "type": "remote",
    "tools": ["get_weather", "get_forecast"]
  },
  {
    "name": "search-engine",
    "type": "remote",
    "tools": ["web_search", "image_search"]
  }
]
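
A client may want to know which server provides a given tool, for example to decide what to pass in metadata.mcp_servers. A small sketch that indexes a listing of the shape shown above follows. The function name index_tools is hypothetical, and the sample data is the example response itself.

```python
import json


def index_tools(listing):
    """Map each tool name to the list of servers that provide it."""
    index = {}
    for server in listing:
        for tool in server.get("tools", []):
            index.setdefault(tool, []).append(server["name"])
    return index


# The example response from the listing endpoint, parsed from JSON.
sample = json.loads("""
[
  {"name": "weather-api", "type": "remote", "tools": ["get_weather", "get_forecast"]},
  {"name": "search-engine", "type": "remote", "tools": ["web_search", "image_search"]}
]
""")
tool_index = index_tools(sample)
```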

MCP Prompts

MCP servers can provide reusable prompt templates. LocalAI supports discovering and expanding prompts from MCP servers.

List Prompts

curl http://localhost:8080/v1/mcp/prompts/my-mcp-model

Returns:

[
  {
    "name": "code-review",
    "description": "Review code for best practices",
    "title": "Code Review",
    "arguments": [
      {"name": "language", "description": "Programming language", "required": true}
    ],
    "server": "dev-tools"
  }
]

Expand a Prompt

curl -X POST http://localhost:8080/v1/mcp/prompts/my-mcp-model/code-review \
  -H "Content-Type: application/json" \
  -d '{"arguments": {"language": "go"}}'

Returns:

{
  "messages": [
    {"role": "user", "content": "Please review the following Go code for best practices..."}
  ]
}
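
The prompt listing marks each argument as required or optional, so a client can check its arguments before calling the expand endpoint. A minimal sketch, assuming the listing shape shown under List Prompts, is given here. missing_prompt_arguments is a hypothetical name.

```python
def missing_prompt_arguments(prompt, supplied):
    """Return the names of required arguments that were not supplied.

    prompt is one entry from the prompt listing; supplied is the dict
    that would be sent as "arguments" in the expand request.
    """
    return [
        arg["name"]
        for arg in prompt.get("arguments", [])
        if arg.get("required") and arg["name"] not in supplied
    ]


code_review = {
    "name": "code-review",
    "arguments": [
        {"name": "language", "description": "Programming language", "required": True}
    ],
    "server": "dev-tools",
}
```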

Inject Prompts via Metadata

You can inject MCP prompts into any chat request using metadata.mcp_prompt and metadata.mcp_prompt_args:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Review this function: func add(a, b int) int { return a + b }"}],
    "metadata": {
      "mcp_servers": "dev-tools",
      "mcp_prompt": "code-review",
      "mcp_prompt_args": "{\"language\": \"go\"}"
    }
  }'

The prompt messages are prepended to the conversation before inference.
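
Note that in the curl example mcp_prompt_args is a JSON document encoded as a string, not a nested object. A sketch that produces this encoding with json.dumps follows. The function name prompt_metadata is hypothetical.

```python
import json


def prompt_metadata(servers, prompt_name, prompt_args):
    """Build metadata that injects one MCP prompt into a chat request."""
    return {
        "mcp_servers": ",".join(servers),
        "mcp_prompt": prompt_name,
        # Encoded as a JSON string, matching the curl example above.
        "mcp_prompt_args": json.dumps(prompt_args),
    }


meta = prompt_metadata(["dev-tools"], "code-review", {"language": "go"})
```

Every value in the resulting object is a string, which keeps the request compatible with clients that restrict metadata to string values.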

MCP Resources

MCP servers can expose data/content (files, database records, etc.) as resources identified by URI.

List Resources

curl http://localhost:8080/v1/mcp/resources/my-mcp-model

Returns:

[
  {
    "name": "project-readme",
    "uri": "file:///README.md",
    "description": "Project documentation",
    "mimeType": "text/markdown",
    "server": "file-manager"
  }
]

Read a Resource

curl -X POST http://localhost:8080/v1/mcp/resources/my-mcp-model/read \
  -H "Content-Type: application/json" \
  -d '{"uri": "file:///README.md"}'

Returns:

{
  "uri": "file:///README.md",
  "content": "# My Project\n...",
  "mimeType": "text/markdown"
}

Inject Resources via Metadata

You can inject MCP resources into chat requests using metadata.mcp_resources (comma-separated URIs):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [{"role": "user", "content": "Summarize this project"}],
    "metadata": {
      "mcp_servers": "file-manager",
      "mcp_resources": "file:///README.md,file:///CHANGELOG.md"
    }
  }'

Resource contents are appended to the last user message as text blocks (following the same approach as llama.cpp’s WebUI).
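
To illustrate what "appended to the last user message" means, here is a sketch of the transformation on a plain message list. It is not LocalAI's implementation: the "Resource:" label, the blank-line separator, and the function name append_resources are assumptions made for illustration, and message content is assumed to be a plain string.

```python
def append_resources(messages, resources):
    """Return a copy of messages with resource text added to the last user turn.

    resources is a list of (uri, text) pairs. The label and separator
    used here are illustrative assumptions.
    """
    result = [dict(message) for message in messages]
    for message in reversed(result):
        if message["role"] == "user":
            blocks = [f"Resource: {uri}\n{text}" for uri, text in resources]
            message["content"] = "\n\n".join([message["content"], *blocks])
            break
    return result
```

Earlier user messages and all assistant messages are left unchanged, and the input list is not modified.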

Legacy Endpoint

The /mcp/v1/chat/completions endpoint is still supported for backward compatibility. It automatically enables all configured MCP servers, so no metadata.mcp_servers value is needed.

curl http://localhost:8080/mcp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-mcp-model",
    "messages": [
      {"role": "user", "content": "What is the current weather in New York?"}
    ]
  }'

Example Response

{
  "id": "chatcmpl-123",
  "created": 1699123456,
  "model": "my-mcp-model",
  "choices": [
    {
      "text": "The current weather in New York is 72°F (22°C) with partly cloudy skies."
    }
  ],
  "object": "text_completion"
}

Example Configurations

Docker-based Tools

name: docker-agent
backend: llama-cpp
parameters:
  model: qwen3-4b.gguf

mcp:
  stdio: |
    {
      "mcpServers": {
        "searxng": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "quay.io/mudler/tests:duckduckgo-localai"
          ]
        }
      }
    }

agent:
  max_iterations: 10

How It Works

  1. Tool Discovery: LocalAI connects to configured MCP servers and discovers available tools
  2. Tool Injection: Discovered tools are injected into the model’s tool/function list alongside any user-provided tools
  3. Inference Loop: The model generates a response. If it calls MCP tools, LocalAI executes them server-side, appends results to the conversation, and re-runs inference
  4. Response Generation: When the model produces a final response (no more MCP tool calls), it is returned to the client

The execution loop is bounded by agent.max_iterations (default 10) to prevent infinite loops.
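
The four steps above can be sketched as a bounded loop. This is an illustration of the control flow only, not LocalAI's code. generate and execute_tool are stand-ins for inference and MCP tool execution, the message shapes are simplified, and raising an error when the limit is reached is an assumption (the text above does not say what LocalAI returns in that case).

```python
def run_agent_loop(generate, execute_tool, messages, max_iterations=10):
    """Run inference until the model stops calling tools or the limit is hit.

    generate(messages) returns {"content": str, "tool_calls": [...]};
    execute_tool(call) returns a result string. Both are stand-ins.
    """
    messages = list(messages)
    for _ in range(max_iterations):
        reply = generate(messages)
        calls = reply.get("tool_calls") or []
        if not calls:
            # Final response: no more tool calls.
            return reply["content"], messages
        messages.append({"role": "assistant", "tool_calls": calls})
        for call in calls:
            result = execute_tool(call)
            messages.append({"role": "tool", "name": call["name"], "content": result})
    raise RuntimeError("max_iterations reached without a final response")


def fake_generate(messages):
    """Stand-in model: call get_weather once, then answer."""
    if any(m["role"] == "tool" for m in messages):
        return {"content": "It is 72F in New York.", "tool_calls": []}
    return {"content": "", "tool_calls": [{"name": "get_weather"}]}


answer, history = run_agent_loop(
    fake_generate,
    lambda call: "72F",
    [{"role": "user", "content": "What is the weather in New York?"}],
)
```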

Session Lifecycle

MCP sessions are automatically managed by LocalAI:

  • Lazy initialization: Sessions are created the first time a model’s MCP tools are used
  • Cached per model: Sessions are reused across requests for the same model
  • Cleanup on model unload: When a model is unloaded (idle watchdog eviction, manual stop, or shutdown), all associated MCP sessions are closed and resources freed
  • Graceful shutdown: All MCP sessions are closed when LocalAI shuts down

This means you don’t need to manually manage MCP connections — they follow the model’s lifecycle automatically.
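
The lifecycle rules above amount to a per-model cache with lazy creation and explicit cleanup. A sketch of that pattern is shown here, with a stand-in FakeSession class. Both class names are hypothetical, the sketch is not thread-safe, and it does not reflect how LocalAI stores sessions internally.

```python
class SessionCache:
    """Per-model cache with lazy creation and explicit cleanup."""

    def __init__(self, connect):
        self._connect = connect  # callable: model name -> session object
        self._sessions = {}

    def get(self, model):
        # Lazy initialization: connect only on first use for this model.
        if model not in self._sessions:
            self._sessions[model] = self._connect(model)
        return self._sessions[model]

    def unload(self, model):
        # Cleanup on model unload: close and forget the session, if any.
        session = self._sessions.pop(model, None)
        if session is not None:
            session.close()

    def shutdown(self):
        # Graceful shutdown: close every remaining session.
        for model in list(self._sessions):
            self.unload(model)


class FakeSession:
    """Stand-in for an MCP session; records whether it was closed."""

    def __init__(self, model):
        self.model = model
        self.closed = False

    def close(self):
        self.closed = True
```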

Supported MCP Servers

LocalAI is compatible with any MCP-compliant server.

Best Practices

Security

  • Use environment variables for sensitive tokens
  • Validate MCP server endpoints before deployment
  • Implement proper authentication for remote servers

Performance

  • Cache frequently used tools
  • Use appropriate timeout values for external APIs
  • Monitor resource usage for stdio servers

Error Handling

  • Implement fallback mechanisms for tool failures
  • Log tool execution for debugging
  • Handle network timeouts gracefully

With External Applications

Use MCP-enabled models in your applications:

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="my-mcp-model",
    messages=[
        {"role": "user", "content": "Analyze the latest research papers on AI"}
    ],
    extra_body={"metadata": {"mcp_servers": "search-engine"}}
)

MCP and adding packages

It can be handy to install packages before LocalAI starts, so that the container has the tools your MCP servers need. The following docker-compose example installs the Docker CLI in the LocalAI container and points it at a Docker-in-Docker service:

services:
  local-ai:
    image: localai/localai:latest
    #image: localai/localai:latest-gpu-nvidia-cuda-13
    #image: localai/localai:latest-gpu-nvidia-cuda-12
    container_name: local-ai
    restart: always
    entrypoint: [ "/bin/bash" ]
    command: >
     -c "apt-get update &&
         apt-get install -y docker.io &&
         /entrypoint.sh"
    environment:
      - DEBUG=true
      - LOCALAI_WATCHDOG_IDLE=true
      - LOCALAI_WATCHDOG_BUSY=true
      - LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m
      - LOCALAI_WATCHDOG_BUSY_TIMEOUT=15m
      - LOCALAI_API_KEY=my-beautiful-api-key
      - DOCKER_HOST=tcp://docker:2376
      - DOCKER_TLS_VERIFY=1
      - DOCKER_CERT_PATH=/certs/client
    ports:
      - "8080:8080"
    volumes:
      - /data/models:/models
      - /data/backends:/backends
      - certs:/certs:ro
    # uncomment for nvidia
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
    #           device_ids: ['7']
    # runtime: nvidia

  docker:
    image: docker:dind
    privileged: true
    container_name: docker
    volumes:
      - certs:/certs
    healthcheck:
      test: ["CMD", "docker", "info"]
      interval: 10s
      timeout: 5s
volumes:
  certs:

The following mcp section can be appended to any existing model configuration:

mcp:
  stdio: |
     {
      "mcpServers": {
        "weather": {
          "command": "docker",
          "args": [
            "run", "-i", "--rm",
            "ghcr.io/mudler/mcps/weather:master"
          ]
        },
        "memory": {
          "command": "docker",
          "env": {
            "MEMORY_INDEX_PATH": "/data/memory.bleve"
          },
          "args": [
            "run", "-i", "--rm", "-v", "/host/data:/data",
            "ghcr.io/mudler/mcps/memory:master"
          ]
        },
        "ddg": {
          "command": "docker",
          "env": {
            "MAX_RESULTS": "10"
          },
          "args": [
            "run", "-i", "--rm", "-e", "MAX_RESULTS",
            "ghcr.io/mudler/mcps/duckduckgo:master"
          ]
        }
      }
     }

Client-Side MCP (Browser)

In addition to server-side MCP (where the backend connects to MCP servers), LocalAI supports client-side MCP where the browser connects directly to MCP servers. This is inspired by llama.cpp’s WebUI and works alongside server-side MCP.

How It Works

  1. Add servers in the UI: Click the “Client MCP” button in the chat header and add MCP server URLs
  2. Browser connects directly: The browser uses the MCP TypeScript SDK (StreamableHTTPClientTransport or SSEClientTransport) to connect to MCP servers
  3. Tool discovery: Connected servers’ tools are sent as tools in the chat request body
  4. Browser-side execution: When the LLM calls a client-side tool, the browser executes it against the MCP server and sends the result back in a follow-up request
  5. Agentic loop: This continues (up to 10 turns) until the LLM produces a final response

CORS Proxy

Since browsers enforce CORS restrictions, LocalAI provides a built-in proxy at /api/cors-proxy. When “Use CORS proxy” is enabled (default), requests to external MCP servers are routed through:

/api/cors-proxy?url=https://remote-mcp-server.example.com/sse

The proxy forwards the request method, headers, and body to the target URL and streams the response back with appropriate CORS headers.
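
When the target URL carries its own query string, it needs to be percent-encoded so that it survives as a single url parameter. A sketch using the standard library follows. cors_proxy_url is a hypothetical helper, and it assumes the proxy decodes the url parameter in the usual way.

```python
from urllib.parse import urlencode


def cors_proxy_url(localai_base, target_url):
    """Return the proxy URL for a remote MCP server endpoint."""
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    query = urlencode({"url": target_url})
    return f"{localai_base.rstrip('/')}/api/cors-proxy?{query}"
```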

MCP Apps (Interactive Tool UIs)

LocalAI supports the MCP Apps extension, which allows MCP tools to declare interactive HTML UIs. When a tool has _meta.ui.resourceUri in its definition, calling that tool renders the app’s HTML inline in the chat as a sandboxed iframe.

How it works:

  • When the LLM calls a tool with _meta.ui.resourceUri, the browser fetches the HTML resource from the MCP server and renders it in an iframe
  • The iframe is sandboxed (allow-scripts allow-forms, no allow-same-origin) for security
  • The app can call server tools, send messages, and update context via the AppBridge protocol (JSON-RPC over postMessage)
  • Tools marked as app-only (_meta.ui.visibility: "app-only") are hidden from the LLM and only callable by the app iframe
  • On page reload, apps render statically until the MCP connection is re-established

Requirements:

  • Only works with client-side MCP connections (the browser must be connected to the MCP server)
  • The MCP server must implement the Apps extension (_meta.ui.resourceUri on tools, resource serving)
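
The two metadata fields mentioned above drive two decisions: whether a tool call should render an app, and whether the tool is offered to the LLM at all. A sketch of both checks on plain tool definitions follows. The function names and the ui://charts/main URI are hypothetical, and the field layout follows the description in this section rather than any particular SDK.

```python
def ui_info(tool):
    """Extract the Apps-related fields from one tool definition."""
    ui = (tool.get("_meta") or {}).get("ui") or {}
    return {
        "has_app": bool(ui.get("resourceUri")),
        "resource_uri": ui.get("resourceUri"),
        "visible_to_llm": ui.get("visibility") != "app-only",
    }


def tools_for_llm(tools):
    """Drop app-only tools before sending the tool list to the model."""
    return [tool for tool in tools if ui_info(tool)["visible_to_llm"]]


example_tools = [
    {"name": "show_chart", "_meta": {"ui": {"resourceUri": "ui://charts/main"}}},
    {"name": "refresh", "_meta": {"ui": {"visibility": "app-only"}}},
    {"name": "plain"},
]
```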

Coexistence with Server-Side MCP

Both modes work simultaneously in the same chat:

  • Server-side MCP tools are configured in model YAML files and executed by the backend. The backend handles these in its own agentic loop.
  • Client-side MCP tools are configured per-user in the browser and sent as tools in the request. When the LLM calls them, the browser executes them.

If both sides have a tool with the same name, the server-side tool takes priority.
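
The name-collision rule can be expressed as a merge keyed by tool name in which server-side entries overwrite client-side ones. A minimal sketch follows. merge_tools and the added "side" field are hypothetical and only illustrate the priority rule.

```python
def merge_tools(server_tools, client_tools):
    """Combine tool lists by name; server-side entries win on collisions."""
    merged = {tool["name"]: {**tool, "side": "client"} for tool in client_tools}
    # Applied second, so a server-side tool replaces a client-side one.
    merged.update({tool["name"]: {**tool, "side": "server"} for tool in server_tools})
    return list(merged.values())
```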

Security Considerations

  • The CORS proxy can forward requests to any HTTP/HTTPS URL. It is only available when MCP is enabled (LOCALAI_DISABLE_MCP is not set).
  • Client-side MCP server configurations are stored in the browser’s localStorage and are not shared with the server.
  • Custom headers (e.g., API keys) for MCP servers are stored in localStorage. Use with caution on shared machines.

Disabling MCP Support

You can completely disable MCP functionality in LocalAI by setting the LOCALAI_DISABLE_MCP environment variable to true, 1, or yes:

export LOCALAI_DISABLE_MCP=true

When this environment variable is set, all MCP-related features will be disabled, including:

  • MCP server connections (both remote and stdio)
  • Agent tool execution
  • The /mcp/v1/chat/completions endpoint

This is useful when you want to:

  • Run LocalAI without MCP capabilities for security reasons
  • Reduce the attack surface by disabling unnecessary features
  • Troubleshoot MCP-related issues

Example

# Disable MCP completely
LOCALAI_DISABLE_MCP=true localai run

# Or in Docker
docker run -e LOCALAI_DISABLE_MCP=true localai/localai:latest

When MCP is disabled, the mcp sections of model configurations are ignored, and attempts to use the MCP endpoint return an error indicating that MCP support is disabled.
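
A deployment script can mirror the documented values (true, 1, or yes) to decide whether to expect MCP features. This is a sketch: mcp_disabled is a hypothetical helper, and the lower-casing and whitespace stripping are assumptions, since LocalAI's own parsing may be stricter or looser.

```python
import os

TRUTHY = {"true", "1", "yes"}  # the values documented for LOCALAI_DISABLE_MCP


def mcp_disabled(environ=os.environ):
    """Report whether LOCALAI_DISABLE_MCP holds one of the documented values."""
    # Normalizing case and whitespace is an assumption, not documented behavior.
    value = environ.get("LOCALAI_DISABLE_MCP", "")
    return value.strip().lower() in TRUTHY
```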