Local MCP Lab – Step-by-Step Documentation

Step-by-step documentation for setting up and running a local MCP lab workflow in an offline environment.

Step 1: Download and Install Ollama

Ollama is used as the local LLM runtime for this lab. It allows open-weight language models to be downloaded and run entirely on the local machine, with no cloud inference.

In this step:

  • Ollama is downloaded and installed on Windows
  • A model is selected and downloaded (e.g. qwen3:4b, qwen3:30b)
  • Model files are stored locally on disk

What this enables:
All LLM inference in this lab runs locally, through Ollama's HTTP API on localhost:11434.

📸 Screenshot: Ollama UI showing model download in progress

Step 2: Verify Ollama Installation via PowerShell

Once Ollama is installed, the installation is verified from a Windows PowerShell terminal.

Commands executed:

ollama --version

ollama run qwen3:4b

This confirms:

  • Ollama is installed correctly
  • The Ollama CLI is accessible from the system PATH
  • The selected model can be executed locally

What this proves:
The LLM runtime is functioning independently of any cloud service.

📸 Screenshot: PowerShell showing Ollama version and model execution
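The same verification can be done programmatically against Ollama's HTTP API on localhost:11434. A sketch, assuming Node 18+ (for the global fetch); /api/tags is Ollama's endpoint for listing locally installed models:

```javascript
// Sketch: verifying Ollama from Node instead of the CLI, via the HTTP API.
// Assumes Node 18+ (global fetch) and Ollama on its default port.

const OLLAMA_BASE = "http://localhost:11434";

// Extract model names from an /api/tags response body.
function modelNames(tagsResponse) {
  return (tagsResponse.models ?? []).map((m) => m.name);
}

// Live check — only meaningful while Ollama is actually running.
async function listInstalledModels() {
  const res = await fetch(`${OLLAMA_BASE}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  return modelNames(await res.json());
}
```

If this returns a list containing qwen3:4b, the runtime is reachable from code as well as from the CLI.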

Step 3: Project Initialization and package.json Configuration

A Node.js project is initialized to host the MCP server and client.

Key configuration details:

  • Project is set to use ES Modules ("type": "module")
  • Dependencies include:
    • @modelcontextprotocol/sdk
    • axios

This configuration allows:

  • MCP server and client to use modern import syntax
  • Compatibility with the MCP SDK’s ESM exports

Why this matters:
An incorrect module setting ("commonjs" instead of "module") prevents the MCP server from starting, because the MCP SDK's ESM exports cannot be loaded as CommonJS.

📸 Screenshot: package.json showing "type": "module" and dependencies
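A package.json along these lines would satisfy the configuration above (the name and version numbers are illustrative, not the lab's exact pins):

```json
{
  "name": "local-mcp-lab",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "axios": "^1.7.0"
  }
}
```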

Step 4: Setting Up the MCP Server

The MCP server is implemented in server.js.

The server:

  • Initializes an MCP Server instance
  • Registers available tools
  • Listens for MCP requests over stdio
  • Forwards LLM requests to Ollama via localhost:11434

The server is explicitly configured to run with the qwen3:4b model.

Console output confirms successful startup:

MCP server running on stdio (model=qwen3:4b)

What this step accomplishes:
The agent control plane is now live and ready to accept tool calls.

📸 Screenshot: PowerShell showing MCP server running on stdio
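A minimal server.js might look like the sketch below. The class names and import paths (McpServer, StdioServerTransport) are assumptions based on the MCP TypeScript SDK and may differ across SDK versions; treat this as an outline, not the lab's exact code.

```javascript
// Hypothetical minimal server.js sketch (MCP SDK names are assumptions).

const MODEL = "qwen3:4b";

// Startup banner matching the console output shown above.
function startupMessage(model) {
  return `MCP server running on stdio (model=${model})`;
}

async function startServer() {
  // Dynamic imports keep the sketch loadable even where the SDK is absent.
  const { McpServer } = await import("@modelcontextprotocol/sdk/server/mcp.js");
  const { StdioServerTransport } = await import("@modelcontextprotocol/sdk/server/stdio.js");

  const server = new McpServer({ name: "local-mcp-lab", version: "1.0.0" });

  // Tool registration goes here (see Step 5).

  await server.connect(new StdioServerTransport());
  console.error(startupMessage(MODEL)); // stderr, since stdout carries MCP traffic
}

startServer().catch((err) => console.error("server failed to start:", err.message));
```

Logging to stderr is deliberate: with stdio transport, stdout is reserved for the MCP protocol stream, so diagnostics must go elsewhere.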

Step 5: Tool Registration (Server-Side)

Two tools are registered with the MCP server:

read_file

  • Reads a UTF-8 text file from the local filesystem
  • Returns file path and contents

ask_model

  • Sends a prompt to the local LLM via Ollama
  • Returns the model’s generated response

Why tools matter:
The LLM cannot directly access system resources.
All capabilities must be explicitly exposed via tools, enforcing strict boundaries.

📸 Screenshot: server.js showing tool definitions
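The handlers behind those two tools might look like the sketch below, shown as plain functions so the logic is visible; the lab wires them into the server via the MCP SDK's tool API. The lab lists axios as its HTTP dependency, but this sketch uses Node 18+'s built-in fetch so it stands alone.

```javascript
// Sketch of the two tool handlers (names are illustrative).

// Pure helper: the request body for Ollama's /api/generate endpoint.
// stream:false asks for one JSON response instead of a chunk stream.
function buildOllamaPayload(model, prompt) {
  return { model, prompt, stream: false };
}

// read_file: return the path and UTF-8 contents of a local file.
async function readFileTool(path) {
  const { readFile } = await import("node:fs/promises");
  return { path, contents: await readFile(path, "utf8") };
}

// ask_model: forward a prompt to the local LLM and return the generated text.
async function askModelTool(prompt, model = "qwen3:4b") {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaPayload(model, prompt)),
  });
  return (await res.json()).response; // Ollama puts the full text in `response`
}
```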

Step 6: Spinning Up the MCP Client

A separate Node.js client (client.js) is created to validate the system.

The client:

  1. Spawns the MCP server as a subprocess
  2. Connects using MCP stdio transport
  3. Discovers registered tools
  4. Calls read_file
  5. Passes file contents to ask_model

Console output confirms:

Connected to MCP server

Tools: [ 'read_file', 'ask_model' ]

What this demonstrates:
End-to-end agent orchestration using MCP.

📸 Screenshot: Client successfully connected and listing tools
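A hypothetical client.js covering that five-step sequence is sketched below. The class names and import paths (Client, StdioClientTransport) are assumptions based on the MCP TypeScript SDK, as is the shape of the tool results; details may differ across SDK versions.

```javascript
// Hypothetical client.js sketch (MCP SDK names are assumptions).

// Pure helper: embed file contents into the prompt sent to ask_model.
function buildPrompt(contents) {
  return `Summarize this file:\n${contents}`;
}

async function runClient() {
  const { Client } = await import("@modelcontextprotocol/sdk/client/index.js");
  const { StdioClientTransport } = await import("@modelcontextprotocol/sdk/client/stdio.js");

  // 1–2. Spawn server.js as a subprocess and connect over stdio.
  const transport = new StdioClientTransport({ command: "node", args: ["server.js"] });
  const client = new Client({ name: "local-mcp-lab-client", version: "1.0.0" });
  await client.connect(transport);
  console.log("Connected to MCP server");

  // 3. Discover registered tools.
  const { tools } = await client.listTools();
  console.log("Tools:", tools.map((t) => t.name));

  // 4–5. Call read_file, then pass its contents to ask_model.
  const file = await client.callTool({ name: "read_file", arguments: { path: "test.txt" } });
  const text = file.content?.[0]?.text ?? "";
  const answer = await client.callTool({
    name: "ask_model",
    arguments: { prompt: buildPrompt(text) },
  });
  console.log(answer.content?.[0]?.text);
}

runClient().catch((err) => console.error("client failed:", err.message));
```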

Step 7: Executing a Full Agent Loop

The client executes a complete agent flow:

  • Reads test.txt from disk
  • Embeds file contents into a prompt
  • Sends the prompt to the local LLM
  • Waits for inference to complete

Observed behavior:

  • Smaller models return responses quickly
  • Larger models (qwen3:4b and especially qwen3:30b) may take several minutes per response on CPU
  • Timeouts may occur due to hardware constraints

📸 Screenshot: Successful read_file output
📸 Screenshot: Timeout occurring during ask_model execution
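One way to surface those timeouts deliberately, rather than waiting on a hung call, is a client-side time budget. A sketch using only built-ins; the function names and the five-minute budget are illustrative, and note that Promise.race abandons the request rather than cancelling it (an AbortController would cancel it outright):

```javascript
// Wrap any promise with a timeout; rejects with "timeout" if ms elapses first.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: guard a slow local inference call (assumes Node 18+ global fetch).
async function askWithBudget(prompt, model = "qwen3:4b", budgetMs = 5 * 60 * 1000) {
  const res = await withTimeout(
    fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    }),
    budgetMs
  );
  return (await res.json()).response;
}
```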

Step 8: Performance and Infrastructure Observations

Key findings from this lab:

  • CPU-only inference introduces significant latency
  • Model size has a direct impact on response time
  • Timeouts are a capacity limitation, not a configuration error

This step highlights real-world AI infrastructure tradeoffs.