Local MCP Lab – Step-by-Step Documentation

Step-by-step documentation for setting up and running a local MCP lab workflow in an offline environment.

Step 1: Download and Install Ollama

Ollama is used as the local LLM runtime for this lab. It allows open-weight language models to be downloaded and run entirely on the local machine, with no cloud inference.

In this step:

  • Ollama is downloaded and installed on Windows
  • A model is selected and downloaded (e.g. qwen3:4b, qwen3:30b)
  • Model files are stored locally on disk

What this enables:
All LLM inference in this lab runs locally, through Ollama's HTTP API on localhost:11434.

📸 Screenshot: Ollama UI showing model download in progress

Step 2: Verify Ollama Installation via PowerShell

Once Ollama is installed, the installation is verified from a Windows PowerShell terminal.

Commands executed:

ollama --version

ollama run qwen3:4b

This confirms:

  • Ollama is installed correctly
  • The Ollama CLI is accessible from the system PATH
  • The selected model can be executed locally

What this proves:
The LLM runtime is functioning independently of any cloud service.

📸 Screenshot: PowerShell showing Ollama version and model execution
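The same verification can be done programmatically against Ollama's HTTP API on localhost:11434. A sketch, assuming Node 18+ (for the global fetch); /api/tags is Ollama's endpoint for listing locally installed models:

```javascript
// Sketch: verifying Ollama from Node instead of the CLI, via the HTTP API.
// Assumes Node 18+ (global fetch) and Ollama on its default port.

const OLLAMA_BASE = "http://localhost:11434";

// Extract model names from an /api/tags response body.
function modelNames(tagsResponse) {
  return (tagsResponse.models ?? []).map((m) => m.name);
}

// Live check — only meaningful while Ollama is actually running.
async function listInstalledModels() {
  const res = await fetch(`${OLLAMA_BASE}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  return modelNames(await res.json());
}
```

If this returns a list containing qwen3:4b, the runtime is reachable from code as well as from the CLI.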

Step 3: Project Initialization and package.json Configuration

A Node.js project is initialized to host the MCP server and client.

Key configuration details:

  • Project is set to use ES Modules ("type": "module")
  • Dependencies include:
    • @modelcontextprotocol/sdk
    • axios

This configuration allows:

  • MCP server and client to use modern import syntax
  • Compatibility with the MCP SDK’s ESM exports

Why this matters:
An incorrect module setting ("commonjs" instead of "module") prevents the MCP server from starting, because the MCP SDK's ESM exports cannot be loaded as CommonJS.

📸 Screenshot: package.json showing "type": "module" and dependencies
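A package.json along these lines would satisfy the configuration above (the name and version numbers are illustrative, not the lab's exact pins):

```json
{
  "name": "local-mcp-lab",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "axios": "^1.7.0"
  }
}
```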

Step 4: Setting Up the MCP Server

The MCP server is implemented in server.js.

The server:

  • Initializes an MCP Server instance
  • Registers available tools
  • Listens for MCP requests over stdio
  • Forwards LLM requests to Ollama via localhost:11434

The server is explicitly configured to run with the qwen3:4b model.

Console output confirms successful startup:

MCP server running on stdio (model=qwen3:4b)

What this step accomplishes:
The agent control plane is now live and ready to accept tool calls.

📸 Screenshot: PowerShell showing MCP server running on stdio
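A minimal server.js might look like the sketch below. The class names and import paths (McpServer, StdioServerTransport) are assumptions based on the MCP TypeScript SDK and may differ across SDK versions; treat this as an outline, not the lab's exact code.

```javascript
// Hypothetical minimal server.js sketch (MCP SDK names are assumptions).

const MODEL = "qwen3:4b";

// Startup banner matching the console output shown above.
function startupMessage(model) {
  return `MCP server running on stdio (model=${model})`;
}

async function startServer() {
  // Dynamic imports keep the sketch loadable even where the SDK is absent.
  const { McpServer } = await import("@modelcontextprotocol/sdk/server/mcp.js");
  const { StdioServerTransport } = await import("@modelcontextprotocol/sdk/server/stdio.js");

  const server = new McpServer({ name: "local-mcp-lab", version: "1.0.0" });

  // Tool registration goes here (see Step 5).

  await server.connect(new StdioServerTransport());
  console.error(startupMessage(MODEL)); // stderr, since stdout carries MCP traffic
}

startServer().catch((err) => console.error("server failed to start:", err.message));
```

Logging to stderr is deliberate: with stdio transport, stdout is reserved for the MCP protocol stream, so diagnostics must go elsewhere.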

Step 5: Tool Registration (Server-Side)

Two tools are registered with the MCP server:

read_file

  • Reads a UTF-8 text file from the local filesystem
  • Returns file path and contents

ask_model

  • Sends a prompt to the local LLM via Ollama
  • Returns the model’s generated response

Why tools matter:
The LLM cannot directly access system resources.
All capabilities must be explicitly exposed via tools, enforcing strict boundaries.

📸 Screenshot: server.js showing tool definitions
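The handlers behind those two tools might look like the sketch below, shown as plain functions so the logic is visible; the lab wires them into the server via the MCP SDK's tool API. The lab lists axios as its HTTP dependency, but this sketch uses Node 18+'s built-in fetch so it stands alone.

```javascript
// Sketch of the two tool handlers (names are illustrative).

// Pure helper: the request body for Ollama's /api/generate endpoint.
// stream:false asks for one JSON response instead of a chunk stream.
function buildOllamaPayload(model, prompt) {
  return { model, prompt, stream: false };
}

// read_file: return the path and UTF-8 contents of a local file.
async function readFileTool(path) {
  const { readFile } = await import("node:fs/promises");
  return { path, contents: await readFile(path, "utf8") };
}

// ask_model: forward a prompt to the local LLM and return the generated text.
async function askModelTool(prompt, model = "qwen3:4b") {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaPayload(model, prompt)),
  });
  return (await res.json()).response; // Ollama puts the full text in `response`
}
```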

Step 6: Spinning Up the MCP Client

A separate Node.js client (client.js) is created to validate the system.

The client:

  1. Spawns the MCP server as a subprocess
  2. Connects using MCP stdio transport
  3. Discovers registered tools
  4. Calls read_file
  5. Passes file contents to ask_model

Console output confirms:

Connected to MCP server

Tools: [ 'read_file', 'ask_model' ]

What this demonstrates:
End-to-end agent orchestration using MCP.

📸 Screenshot: Client successfully connected and listing tools
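A hypothetical client.js covering that five-step sequence is sketched below. The class names and import paths (Client, StdioClientTransport) are assumptions based on the MCP TypeScript SDK, as is the shape of the tool results; details may differ across SDK versions.

```javascript
// Hypothetical client.js sketch (MCP SDK names are assumptions).

// Pure helper: embed file contents into the prompt sent to ask_model.
function buildPrompt(contents) {
  return `Summarize this file:\n${contents}`;
}

async function runClient() {
  const { Client } = await import("@modelcontextprotocol/sdk/client/index.js");
  const { StdioClientTransport } = await import("@modelcontextprotocol/sdk/client/stdio.js");

  // 1–2. Spawn server.js as a subprocess and connect over stdio.
  const transport = new StdioClientTransport({ command: "node", args: ["server.js"] });
  const client = new Client({ name: "local-mcp-lab-client", version: "1.0.0" });
  await client.connect(transport);
  console.log("Connected to MCP server");

  // 3. Discover registered tools.
  const { tools } = await client.listTools();
  console.log("Tools:", tools.map((t) => t.name));

  // 4–5. Call read_file, then pass its contents to ask_model.
  const file = await client.callTool({ name: "read_file", arguments: { path: "test.txt" } });
  const text = file.content?.[0]?.text ?? "";
  const answer = await client.callTool({
    name: "ask_model",
    arguments: { prompt: buildPrompt(text) },
  });
  console.log(answer.content?.[0]?.text);
}

runClient().catch((err) => console.error("client failed:", err.message));
```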

Step 7: Executing a Full Agent Loop

The client executes a complete agent flow:

  • Reads test.txt from disk
  • Embeds file contents into a prompt
  • Sends the prompt to the local LLM
  • Waits for inference to complete

Observed behavior:

  • Smaller models return responses quickly
  • Larger models (qwen3:4b and especially qwen3:30b) may take several minutes per response on CPU
  • Timeouts may occur due to hardware constraints

📸 Screenshot: Successful read_file output
📸 Screenshot: Timeout occurring during ask_model execution
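One way to surface those timeouts deliberately, rather than waiting on a hung call, is a client-side time budget. A sketch using only built-ins; the function names and the five-minute budget are illustrative, and note that Promise.race abandons the request rather than cancelling it (an AbortController would cancel it outright):

```javascript
// Wrap any promise with a timeout; rejects with "timeout" if ms elapses first.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: guard a slow local inference call (assumes Node 18+ global fetch).
async function askWithBudget(prompt, model = "qwen3:4b", budgetMs = 5 * 60 * 1000) {
  const res = await withTimeout(
    fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    }),
    budgetMs
  );
  return (await res.json()).response;
}
```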

Step 8: Performance and Infrastructure Observations

Key findings from this lab:

  • CPU-only inference introduces significant latency
  • Model size has a direct impact on response time
  • Timeouts are a capacity limitation, not a configuration error

This step highlights real-world AI infrastructure tradeoffs.