AI model at the Edge with MCP support
This is a proof of concept of a generative AI model (LLM) running at the edge, with MCP server support.
The idea is for the client to ask the AI model about the weather in Paris; the weather data is provided by an MCP server.
There are three components in this project:
- the vLLM server, serving the Qwen/Qwen3-8B model.
- the Python client, which calls vLLM.
- the MCP server, which serves hardcoded weather data for a city over stdio (a minimal sketch of such a server is shown below).

The MCP server's tools have to be declared to the AI model and are called by the client whenever the model requests them.
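For illustration, here is a minimal sketch of what such a stdio weather server could look like, using the official MCP Python SDK (the `mcp` package). The file name, tool docstring, and the `condition` field are assumptions; the actual implementation in mcp-server/ may differ.

```python
# weather_server.py — hypothetical name; the real code lives in mcp-server/.
# Minimal sketch of an MCP server exposing a hardcoded get_weather tool over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(city: str) -> dict:
    """Get the current weather for a specific city (hardcoded demo data)."""
    # The temperature matches the example output below; the rest is assumed.
    return {"city": city, "temperature": 15, "condition": "cloudy"}

if __name__ == "__main__":
    # Serve over stdio so the client can spawn this process and talk to it directly.
    mcp.run(transport="stdio")
```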
Prerequisites
- Python 3.8 or higher
- 8GB+ RAM (for the model)
- Internet connection (for first-time model download)
Step 1: Setup
./setup.sh
Wait for all dependencies to install.
Step 2: Start the vLLM Server
Open a terminal and run:
./start_server.sh
Wait for the model to download and the server to start. You'll see:
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000
Keep this terminal open!
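At this point the vLLM server exposes an OpenAI-compatible API on http://127.0.0.1:8000. As a quick sanity check (not part of the demo scripts), you can query it with the `openai` Python package; the `api_key` value is a placeholder, since vLLM does not require one by default:

```python
# Quick sanity check of the vLLM OpenAI-compatible endpoint (not part of the demo).
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(resp.choices[0].message.content)
```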
Step 3: Run the Client
Open a new terminal and run:
./run_demo.sh
You should see the client:
- connect to the MCP weather server
- ask about the weather in Paris
- let the LLM call the get_weather tool
- return a natural language response with the weather data (this flow is sketched below)
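The following is a minimal sketch of how such a client can glue the MCP stdio server to vLLM's OpenAI-compatible API. The server command, file names, and message handling are assumptions; the real implementation in client/ may differ.

```python
# client_sketch.py — hypothetical; the real client lives in client/.
import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from openai import AsyncOpenAI

# Assumption: the MCP server is launched as a Python script over stdio.
SERVER = StdioServerParameters(command="python", args=["mcp-server/weather_server.py"])
llm = AsyncOpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

async def main() -> None:
    async with stdio_client(SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Declare the MCP tools to the model in OpenAI "tools" format.
            mcp_tools = (await session.list_tools()).tools
            tools = [{
                "type": "function",
                "function": {
                    "name": t.name,
                    "description": t.description,
                    "parameters": t.inputSchema,
                },
            } for t in mcp_tools]

            messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
            reply = await llm.chat.completions.create(
                model="Qwen/Qwen3-8B", messages=messages, tools=tools)
            msg = reply.choices[0].message

            # If the model asked for a tool, run it through the MCP session
            # and feed the result back for a natural language answer.
            if msg.tool_calls:
                messages.append(msg)
                for call in msg.tool_calls:
                    result = await session.call_tool(
                        call.function.name, json.loads(call.function.arguments))
                    messages.append({
                        "role": "tool",
                        "tool_call_id": call.id,
                        "content": result.content[0].text,
                    })
                reply = await llm.chat.completions.create(
                    model="Qwen/Qwen3-8B", messages=messages)
                msg = reply.choices[0].message

            print("Assistant:", msg.content)

asyncio.run(main())
```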
Example Output
Connected to MCP server. Available tools:
- get_weather: Get the current weather for a specific city.
User: What's the weather like in Paris?
Assistant wants to call 1 tool(s):
Calling tool: get_weather
Arguments: {'city': 'Paris'}
Result: {"city": "Paris", "temperature": 15, ...}
Authors
- Claude Code
- Nicolas Massé