A Proof of Concept of vLLM at the Edge with MCP calling
AI model at the Edge with MCP support

This is a Proof of Concept of a generative AI model (LLM) at the edge, with MCP server support.

The idea is for the client to ask the AI model for the weather in Paris. The weather data is provided by an MCP server.

There are three components in this project:

  • the vLLM server, serving the Qwen/Qwen3-8B model;
  • the Python client, which calls vLLM;
  • the MCP server, which serves the weather for a city over stdio (responses are hardcoded).

The MCP server's tools have to be declared to the AI model by the client, and the client invokes the MCP server whenever the AI model requests a tool call.
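Since the responses are hardcoded, the heart of the MCP server is just a lookup. A minimal sketch of what that lookup might look like (the `WEATHER_DATA` contents beyond Paris's temperature of 15, and the function's exact shape, are assumptions; in the real server this function would be registered as an MCP tool, e.g. with the official `mcp` Python SDK, and served over stdio):

```python
import json

# Hardcoded responses: this PoC does not call a real weather API.
WEATHER_DATA = {
    "Paris": {"city": "Paris", "temperature": 15, "conditions": "partly cloudy"},
}

def get_weather(city: str) -> str:
    """Get the current weather for a specific city (hardcoded)."""
    data = WEATHER_DATA.get(city, {"city": city, "error": "unknown city"})
    return json.dumps(data)
```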

Prerequisites

  • Python 3.8 or higher
  • 8GB+ RAM (for the model)
  • Internet connection (for first-time model download)

Step 1: Setup

./setup.sh

Wait for all dependencies to install.

Step 2: Start the vLLM Server

Open a terminal and run:

./start_server.sh

Wait for the model to download and the server to start. You'll see:

INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000

Keep this terminal open!
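`start_server.sh` presumably wraps a vLLM launch command along these lines (the flags are a guess based on vLLM's OpenAI-compatible server and its tool-calling support for Qwen models; the actual script may differ):

```shell
#!/bin/sh
# Launch vLLM's OpenAI-compatible server with automatic tool calling enabled.
# Qwen3 models typically use the "hermes" tool-call parser.
vllm serve Qwen/Qwen3-8B \
    --host 127.0.0.1 --port 8000 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```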

Step 3: Run the Client

Open a new terminal and run:

./run_demo.sh

You should see the client:

  1. Connect to the MCP weather server
  2. Ask the model about the weather in Paris
  3. Execute the get_weather tool call requested by the LLM
  4. Print a natural language response containing the weather data
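For step 1 to be useful, the client has to translate the tools it discovers on the MCP server into the OpenAI-style `tools` array that vLLM's chat completions endpoint expects. A sketch of that translation, assuming a helper of our own naming (MCP's `inputSchema` is already JSON Schema, so it maps directly to the function-calling `parameters` field):

```python
def mcp_tool_to_openai(name: str, description: str, input_schema: dict) -> dict:
    """Wrap an MCP tool listing in the OpenAI function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            # MCP's inputSchema is JSON Schema, which is what the
            # "parameters" field expects.
            "parameters": input_schema,
        },
    }
```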

Example Output

Connected to MCP server. Available tools:
  - get_weather: Get the current weather for a specific city.

User: What's the weather like in Paris?

Assistant wants to call 1 tool(s):

  Calling tool: get_weather
  Arguments: {'city': 'Paris'}
  Result: {"city": "Paris", "temperature": 15, ...}
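The "Calling tool" step in the output above boils down to parsing the tool call the model emitted and routing it to the matching MCP tool. A sketch of that dispatch, with an illustrative stand-in handler rather than the actual client code (`tool_call` mirrors the shape of an OpenAI-style chat completion tool call, whose `arguments` field is a JSON string):

```python
import json

def dispatch_tool_call(tool_call: dict, handlers: dict) -> str:
    """Look up the requested tool and invoke it with the parsed arguments."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return handlers[name](**args)

# Stand-in handler; in the PoC the handler would call the MCP server instead.
handlers = {"get_weather": lambda city: json.dumps({"city": city, "temperature": 15})}
call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
```

The result string is then appended to the conversation as a `tool` message so the LLM can phrase the final natural language answer.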

Authors

  • Claude Code
  • Nicolas Massé