A Proof of Concept of vLLM at the Edge with MCP calling
AI model at the Edge with MCP support

This is a Proof of Concept of a generative AI model (LLM) at the edge, with MCP server support.

The idea is for the client to ask the AI model for the weather in Paris. The weather data is provided by an MCP server.

There are three components in this project:

  • the vLLM server, serving the Qwen/Qwen3-8B model;
  • the Python client, which calls vLLM;
  • the MCP server, which serves the weather for a city over stdio (responses are hardcoded).

The MCP server's tools have to be declared to the AI model by the client, and the client invokes the MCP server whenever the AI model requests a tool call.
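Since the responses are hardcoded, the heart of the MCP server is just a lookup. A minimal sketch of what that lookup might look like (the `WEATHER_DATA` contents beyond Paris's temperature of 15, and the function's exact shape, are assumptions; in the real server this function would be registered as an MCP tool, e.g. with the official `mcp` Python SDK, and served over stdio):

```python
import json

# Hardcoded responses: this PoC does not call a real weather API.
WEATHER_DATA = {
    "Paris": {"city": "Paris", "temperature": 15, "conditions": "partly cloudy"},
}

def get_weather(city: str) -> str:
    """Get the current weather for a specific city (hardcoded)."""
    data = WEATHER_DATA.get(city, {"city": city, "error": "unknown city"})
    return json.dumps(data)
```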

Prerequisites

  • Python 3.8 or higher
  • 8GB+ RAM (for the model)
  • Internet connection (for first-time model download)

Step 1: Setup

./setup.sh

Wait for all dependencies to install.

Step 2: Start the vLLM Server

Open a terminal and run:

./start_server.sh

Wait for the model to download and the server to start. You'll see:

INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000

Keep this terminal open!
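`start_server.sh` presumably wraps a vLLM launch command along these lines (the flags are a guess based on vLLM's OpenAI-compatible server and its tool-calling support for Qwen models; the actual script may differ):

```shell
#!/bin/sh
# Launch vLLM's OpenAI-compatible server with automatic tool calling enabled.
# Qwen3 models typically use the "hermes" tool-call parser.
vllm serve Qwen/Qwen3-8B \
    --host 127.0.0.1 --port 8000 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```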

Step 3: Run the Client

Open a new terminal and run:

./run_demo.sh

You should see the client:

  1. Connect to the MCP weather server
  2. Ask the model about the weather in Paris
  3. Execute the get_weather tool call requested by the LLM
  4. Print a natural language response containing the weather data
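For step 1 to be useful, the client has to translate the tools it discovers on the MCP server into the OpenAI-style `tools` array that vLLM's chat completions endpoint expects. A sketch of that translation, assuming a helper of our own naming (MCP's `inputSchema` is already JSON Schema, so it maps directly to the function-calling `parameters` field):

```python
def mcp_tool_to_openai(name: str, description: str, input_schema: dict) -> dict:
    """Wrap an MCP tool listing in the OpenAI function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            # MCP's inputSchema is JSON Schema, which is what the
            # "parameters" field expects.
            "parameters": input_schema,
        },
    }
```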

Example Output

Connected to MCP server. Available tools:
  - get_weather: Get the current weather for a specific city.

User: What's the weather like in Paris?

Assistant wants to call 1 tool(s):

  Calling tool: get_weather
  Arguments: {'city': 'Paris'}
  Result: {"city": "Paris", "temperature": 15, ...}
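The "Calling tool" step in the output above boils down to parsing the tool call the model emitted and routing it to the matching MCP tool. A sketch of that dispatch, with an illustrative stand-in handler rather than the actual client code (`tool_call` mirrors the shape of an OpenAI-style chat completion tool call, whose `arguments` field is a JSON string):

```python
import json

def dispatch_tool_call(tool_call: dict, handlers: dict) -> str:
    """Look up the requested tool and invoke it with the parsed arguments."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return handlers[name](**args)

# Stand-in handler; in the PoC the handler would call the MCP server instead.
handlers = {"get_weather": lambda city: json.dumps({"city": city, "temperature": 15})}
call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
```

The result string is then appended to the conversation as a `tool` message so the LLM can phrase the final natural language answer.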

Authors

  • Claude Code
  • Nicolas Massé