Local MCP Lab – Step-by-Step Documentation
Step 1: Download and Install Ollama
Ollama is used as the local LLM runtime for this lab. It allows open-weight language models to be downloaded and run entirely on the local machine, with no cloud inference.
In this step:
- Ollama is downloaded and installed on Windows
- A model is selected and downloaded (e.g. qwen3:4b, qwen3:30b)
- Model files are stored locally on disk
What this enables:
All future LLM inference in this lab runs locally through localhost:11434.
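Once a model is pulled, Ollama exposes a local HTTP API on port 11434, which everything later in this lab builds on. A minimal sketch of a request against its /api/generate endpoint (the prompt text here is illustrative, and Node 18+ is assumed for global fetch):

```javascript
// Sketch: call Ollama's /api/generate endpoint on localhost:11434.
// `stream: false` returns the whole completion as a single JSON object.
const payload = {
  model: "qwen3:4b",
  prompt: "Reply with the single word: ready", // illustrative prompt
  stream: false,
};

// Requires a running local Ollama instance.
async function generate() {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  const data = await res.json();
  return data.response; // Ollama puts the completion text in `response`
}
```

Nothing cloud-hosted is involved: the same request shape is what the MCP server will send on the model's behalf later in the lab.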
📸 Screenshot: Ollama UI showing model download in progress
Step 2: Verify Ollama Installation via PowerShell
Once Ollama is installed, the installation is verified from a Windows PowerShell terminal.
Commands executed:
ollama --version
ollama run qwen3:4b
This confirms:
- Ollama is installed correctly
- The Ollama CLI is accessible from the system PATH
- The selected model can be executed locally
What this proves:
The LLM runtime is functioning independently of any cloud service.
📸 Screenshot: PowerShell showing Ollama version and model execution
Step 3: Project Initialization and package.json Configuration
A Node.js project is initialized to host the MCP server and client.
Key configuration details:
- Project is set to use ES Modules ("type": "module")
- Dependencies include:
- @modelcontextprotocol/sdk
- axios
This configuration allows:
- MCP server and client to use modern import syntax
- Compatibility with the MCP SDK’s ESM exports
Why this matters:
Incorrect module configuration (CommonJS vs. ES Modules) will prevent the MCP server's import statements from resolving, so the server will fail to start.
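A minimal package.json consistent with the configuration above might look like this (the project name and version ranges are illustrative placeholders):

```json
{
  "name": "local-mcp-lab",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "axios": "^1.7.0"
  }
}
```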
📸 Screenshot: package.json showing "type": "module" and dependencies
Step 4: Setting Up the MCP Server
The MCP server is implemented in server.js.
The server:
- Initializes an MCP Server instance
- Registers available tools
- Listens for MCP requests over stdio
- Forwards LLM requests to Ollama via localhost:11434
The server is explicitly configured to run with the qwen3:4b model.
Console output confirms successful startup:
MCP server running on stdio (model=qwen3:4b)
What this step accomplishes:
The agent control plane is now live and ready to accept tool calls.
📸 Screenshot: PowerShell showing MCP server running on stdio
Step 5: Tool Registration (Server-Side)
Two tools are registered with the MCP server:
read_file
- Reads a UTF-8 text file from the local filesystem
- Returns file path and contents
ask_model
- Sends a prompt to the local LLM via Ollama
- Returns the model’s generated response
Why tools matter:
The LLM cannot directly access system resources.
All capabilities must be explicitly exposed via tools, enforcing strict boundaries.
📸 Screenshot: server.js showing tool definitions
Step 6: Spinning Up the MCP Client
A separate Node.js client (client.js) is created to validate the system.
The client:
- Spawns the MCP server as a subprocess
- Connects using MCP stdio transport
- Discovers registered tools
- Calls read_file
- Passes file contents to ask_model
Console output confirms:
Connected to MCP server
Tools: [ 'read_file', 'ask_model' ]
What this demonstrates:
End-to-end agent orchestration using MCP.
📸 Screenshot: Client successfully connected and listing tools
Step 7: Executing a Full Agent Loop
The client executes a complete agent flow:
- Reads test.txt from disk
- Embeds file contents into a prompt
- Sends the prompt to the local LLM
- Waits for inference to complete
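The middle of that flow, embedding the file contents into a prompt, is ordinary string assembly. A sketch (the wording is illustrative, not the lab's exact prompt):

```javascript
// Sketch: wrap the read_file output in a prompt for ask_model.
function buildPrompt(fileContents) {
  return [
    "Here are the contents of test.txt:",
    "",
    fileContents,
    "",
    "Summarize the file in one sentence.",
  ].join("\n");
}
```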
Observed behavior:
- Smaller models return quickly
- Larger models (qwen3:4b and especially qwen3:30b) may take several minutes per response on CPU
- Timeouts may occur due to hardware constraints
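Because CPU inference can easily outlast a default timeout, it helps to make the deadline explicit in the client. A minimal wrapper (the helper name and any threshold you pass it are illustrative choices, not part of the lab's code):

```javascript
// Race a slow inference call against an explicit deadline.
// Rejects with a descriptive error if the deadline fires first.
function withTimeout(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// e.g. withTimeout(askModel(prompt), 10 * 60 * 1000)
// to give a CPU-bound model ten minutes (askModel is hypothetical here)
```

Raising the deadline does not make the hardware faster, but it turns silent hangs into explicit, loggable failures.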
📸 Screenshot: Successful read_file output
📸 Screenshot: Timeout occurring during ask_model execution
Step 8: Performance and Infrastructure Observations
Key findings from this lab:
- CPU-only inference introduces significant latency
- Model size has a direct impact on response time
- Timeouts are a capacity limitation, not a configuration error
This step highlights real-world AI infrastructure tradeoffs.