# AI model at the Edge with MCP support

This is a proof of concept of a generative AI model (LLM) running at the edge, with MCP server support. The idea is for the client to ask the AI model about the weather in Paris; the weather data is provided by an MCP server.

There are three components in this project:

- the vLLM server, serving the Qwen/Qwen3-8B model;
- the Python client, which calls vLLM;
- the MCP server, which serves the weather for a city over stdio (responses are hardcoded).

The MCP server's tools have to be declared to the AI model, and the client calls them whenever the AI model requests it (illustrative sketches of the server and client are given in the appendix at the end of this README).

## Prerequisites

- Python 3.8 or higher
- 8 GB+ RAM (for the model)
- Internet connection (for the first-time model download)

## Step 1: Setup

```bash
./setup.sh
```

Wait for all dependencies to install.

## Step 2: Start the vLLM Server

Open a terminal and run:

```bash
./start_server.sh
```

Wait for the model to download and the server to start. You'll see:

```
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000
```

**Keep this terminal open!**

## Step 3: Run the Client

Open a **new terminal** and run:

```bash
./run_demo.sh
```

You should see the client:

1. Connect to the MCP weather server
2. Ask about the weather in Paris
3. Let the LLM call the `get_weather` tool
4. Return a natural-language response with the weather data

## Example Output

```
Connected to MCP server. Available tools:
- get_weather: Get the current weather for a specific city.

User: What's the weather like in Paris?

Assistant wants to call 1 tool(s):
Calling tool: get_weather
Arguments: {'city': 'Paris'}
Result: {"city": "Paris", "temperature": 15, ...}
```

## Authors

- Claude Code
- Nicolas Massé
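
## Appendix: Illustrative sketches

The project's actual MCP server code is not reproduced in this README. The following is a minimal sketch of what a stdio MCP server with hardcoded weather data could look like, assuming the official MCP Python SDK (`pip install mcp`); the file name `weather_server.py` and the weather values are illustrative, not the project's real ones.

```python
# weather_server.py -- minimal sketch of a stdio MCP server with hardcoded weather data.
# The file name and the values below are assumptions for illustration only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

# Hardcoded responses, as described in the component list above.
FAKE_WEATHER = {
    "Paris": {"city": "Paris", "temperature": 15, "condition": "cloudy"},
}

@mcp.tool()
def get_weather(city: str) -> dict:
    """Get the current weather for a specific city."""
    return FAKE_WEATHER.get(city, {"city": city, "temperature": 20, "condition": "unknown"})

if __name__ == "__main__":
    # Serve over stdio so the client can spawn this process and talk to it.
    mcp.run(transport="stdio")
```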
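
The client flow described in Step 3 could be wired up roughly as below: list the MCP tools, declare them to the model through vLLM's OpenAI-compatible API, and execute whichever tool call the model requests. This is a sketch under assumptions: the script names, the model identifier passed to the API, and the expectation that `start_server.sh` launches vLLM with tool calling enabled are all taken from the steps above or assumed, and the real client may differ.

```python
# client_sketch.py -- illustrative client flow; names and details are assumptions, not the project's actual code.
import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint; no real API key is needed locally.
llm = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

async def main():
    # Spawn the (hypothetical) weather_server.py and talk to it over stdio.
    params = StdioServerParameters(command="python", args=["weather_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Declare the MCP tools to the model in OpenAI tool format.
            mcp_tools = await session.list_tools()
            tools = [{
                "type": "function",
                "function": {
                    "name": t.name,
                    "description": t.description,
                    "parameters": t.inputSchema,
                },
            } for t in mcp_tools.tools]

            messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
            reply = llm.chat.completions.create(model="Qwen/Qwen3-8B", messages=messages, tools=tools)
            msg = reply.choices[0].message

            # 2. If the model asked for a tool, run it on the MCP server and feed the result back.
            if msg.tool_calls:
                messages.append(msg)
                for call in msg.tool_calls:
                    result = await session.call_tool(call.function.name, json.loads(call.function.arguments))
                    messages.append({
                        "role": "tool",
                        "tool_call_id": call.id,
                        "content": result.content[0].text,
                    })
                final = llm.chat.completions.create(model="Qwen/Qwen3-8B", messages=messages, tools=tools)
                print(final.choices[0].message.content)
            else:
                print(msg.content)

asyncio.run(main())
```

The key design point this sketch illustrates is that the LLM never talks to the MCP server directly: the client advertises the MCP tools to the model, and when the model emits a tool call, the client executes it against the MCP server and returns the result for the final natural-language answer.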