Understanding MCP Through Packet Capture: The Communication Mechanism Behind AI Tool Invocation
TL;DR
Through packet capture analysis, we’ve gained a clear picture of the complete MCP communication process: establishing an SSE connection, the three-step initialization, tool invocation, and finally connection termination. It’s evident that MCP has built a powerful tool invocation framework on top of the simple SSE protocol, allowing AI agents to conveniently invoke external tools to complete complex tasks.
Compared to traditional API invocation methods, MCP is more flexible and can automatically adapt to different tool sets, enabling AI agents to use various service capabilities in a “plug-and-play” manner, which is the ingenious aspect of its design.
Of course, MCP is not perfect. As an emerging protocol, it’s still continuously evolving. More features and capabilities will likely be added in the future to meet increasingly complex requirements.
Background
MCP supports two standard transport implementations: standard input/output (stdio) and Server-Sent Events (SSE). stdio launches the server as a command-line process and communicates through its standard streams, which is common for local integrations; SSE runs over client-server HTTP and is used for cross-device, networked scenarios.
Since we’re analyzing through packet capture, we need to choose an SSE transport MCP server, and then use tools for network packet analysis. Before diving into packet analysis, it’s necessary to briefly understand the SSE protocol.
SSE Protocol
The SSE protocol is a server push technology that allows clients to automatically receive updates from the server via HTTP connections, typically used for server-to-client message updates or continuous data streams (streaming information).
Plain HTTP has no mechanism for the server to push messages on its own. SSE works around this: the server responds with Content-Type: text/event-stream, effectively telling the client that a stream is coming, so the client keeps the connection open and continuously receives data over it.
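To make this concrete, here is a minimal sketch of an SSE client in Python using the requests library (the URL is the MCP Server endpoint configured later in this article; any text/event-stream endpoint would behave the same way):

import requests

# Open a long-lived HTTP connection and ask the server for an event stream.
resp = requests.get(
    "http://nio.local:8080/sse",
    headers={"Accept": "text/event-stream"},
    stream=True,  # don't buffer the whole body; read it as it arrives
)

# The server keeps the connection open and pushes frames made of
# "id:", "event:" and "data:" lines, separated by blank lines.
for line in resp.iter_lines(decode_unicode=True):
    if line:  # blank lines only separate events
        print(line)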
You may be thinking of the WebSocket protocol, as both involve establishing a connection between client and server, followed by the server pushing data to the client. While seemingly similar, there are significant differences:
- SSE is a lightweight protocol based on HTTP; WebSocket is an independent protocol.
- SSE is based on HTTP request with
Accept: text/event-stream
; WebSocket leverages HTTP protocol upgrade withUpgrade: websocket
, then uses an independent protocol. - SSE is pseudo-duplex, supporting only one-way communication from server to client; client-to-server communication requires sending separate HTTP requests. WebSocket is full-duplex, supporting two-way communication.
- SSE is simple, lightweight, suitable for one-way low-frequency pushing; WebSocket is more complex, has stronger real-time capabilities, suitable for two-way high-frequency interaction.
From the comparison above, it’s not difficult to understand why MCP chose SSE as its network transport protocol.
Now that we understand the SSE protocol, we can begin our analysis.
Environment
- Packet capture tool: Proxyman, with installed CA certificate to handle HTTPS requests.
- AI application: VSCode Insiders, with the GitHub Copilot plugin installed and Agent mode enabled.
- MCP Server: Using the Spring REST API example from the previous article.
Configuring MCP Server
Add the MCP Server configuration in settings.json. To route the traffic through Proxyman’s HTTP Proxy, add 127.0.0.1 nio.local to /etc/hosts and point the configuration at the nio.local hostname.
{
"mcp": {
"servers": {
"spring-ai-mcp-sample": {
"type": "sse",
"url": "http://nio.local:8080/sse"
}
}
}
}
After adding the configuration, we can start the MCP Client to connect to the Server.
MCP Communication
Below we’ll analyze the complete lifecycle of MCP communication through packet capture, including four phases: connection establishment, initialization, operation, and termination.
When our VSCode successfully connects to the MCP Server, we can already see multiple communications in Proxyman.
Establishing Connection
Since it’s uncertain which method the Server supports, the MCP Client sends both GET and POST requests to our configured Server address, attempting to establish a connection. The Accept header in the request is text/event-stream, indicating an attempt to establish SSE communication with the Server.
The configured Server only supports establishing SSE communication via the GET method; the POST request receives a 404 response. In the GET request response, the Server returns the following information:
- Session id: 3e19fbcd-51f4-4784-9f63-538c9a203859
- Event: event: endpoint indicates the event type; it tells the client that the data field carries the endpoint for subsequent one-way Client-to-Server communication.
- Data: data: /mcp/messages?sessionId=3e19fbcd-51f4-4784-9f63-538c9a203859, where /mcp/messages is configured on the server side via spring.ai.mcp.server.sse-message-endpoint: /mcp/messages.
id:3e19fbcd-51f4-4784-9f63-538c9a203859
event:endpoint
data:/mcp/messages?sessionId=3e19fbcd-51f4-4784-9f63-538c9a203859
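As a rough sketch (not the actual client implementation), the Client only needs to join this data field with the Server’s base address to obtain the URL to which all subsequent JSON-RPC messages are POSTed:

from urllib.parse import urljoin

sse_url = "http://nio.local:8080/sse"  # the endpoint configured in settings.json
endpoint_data = "/mcp/messages?sessionId=3e19fbcd-51f4-4784-9f63-538c9a203859"

# All later client-to-server messages are POSTed to this URL; the matching
# responses come back over the SSE connection that is already open.
message_url = urljoin(sse_url, endpoint_data)
print(message_url)
# http://nio.local:8080/mcp/messages?sessionId=3e19fbcd-51f4-4784-9f63-538c9a203859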
This HTTP connection serves as the channel for subsequent server-to-client stream information pushing, which is why we see other stream information in the screenshot. At this point, the lifecycle of the MCP Client-Server connection begins:
- Initialization
- Operation
- Termination
Initialization
The initialization phase must be the first interaction between client and server, a process somewhat similar to TCP’s three-way handshake.
Client Initiates Initialization Request
After receiving the subsequent communication endpoint from the Server, the Client sends an initialize request to perform initialization, report information, and negotiate capabilities.
- protocolVersion: The protocol version
- capabilities: Supported features; here listChanged indicates support for list change notifications
- clientInfo: Client information
{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2025-03-26",
"capabilities": {
"roots": {
"listChanged": true
}
},
"clientInfo": {
"name": "Visual Studio Code - Insiders",
"version": "1.100.0-insider"
}
}
}
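As a hedged sketch (the real client is VSCode’s built-in MCP implementation, not this code), sending this message is just an HTTP POST to the message endpoint obtained earlier; the result carrying the same id then arrives over the open SSE connection. The same pattern applies to every later request, such as tools/list and tools/call:

import requests

message_url = (
    "http://nio.local:8080/mcp/messages"
    "?sessionId=3e19fbcd-51f4-4784-9f63-538c9a203859"
)

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {"roots": {"listChanged": True}},
        "clientInfo": {
            "name": "Visual Studio Code - Insiders",
            "version": "1.100.0-insider",
        },
    },
}

# The POST itself only acknowledges receipt; the JSON-RPC result with the
# same "id" is pushed back on the SSE stream opened in the previous step.
resp = requests.post(message_url, json=initialize)
print(resp.status_code)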
Server Responds to Initialization Request
Similarly, the Server also returns stream information:
- Same session id
- Event type: message
- Event data
id:3e19fbcd-51f4-4784-9f63-538c9a203859
event:message
data:{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05","capabilities":{"logging":{},"tools":{"listChanged":true}},"serverInfo":{"name":"webmvc-mcp-server","version":"1.0.0"}}}
In the data portion of the event, the Server provides content mirroring the structure of the request (from here on, we’ll show only the data portion of the stream information):
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"protocolVersion": "2024-11-05",
"capabilities": {
"logging": {},
"tools": {
"listChanged": true
}
},
"serverInfo": {
"name": "webmvc-mcp-server",
"version": "1.0.0"
}
}
}
Initialization Complete
After completing information exchange with the Server and successful negotiation (such as version compatibility, feature support), the Client sends a request to complete initialization.
{
"method": "notifications/initialized",
"jsonrpc": "2.0"
}
This time, the Server sends no response at all, much like the TCP handshake, where the server does not reply to the client’s final ACK.
Operation
Getting Tool List
After completing initialization, the Client sends a request to get the list of tools supported by the Server.
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list",
"params": {}
}
The server returns the tool list through the SSE connection. Our example Server includes 4 tools. The response contains information such as tool names, input schema parameter descriptions, etc. After receiving this response, the client caches the tool list locally to avoid frequent requests. The cache content is only updated when the Server updates the list and notifies the Client.
Due to length constraints, not all list content is shown.
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"tools": [
{
"name": "addUser",
"description": "Add a new user",
"inputSchema": {
"type": "object",
"properties": {
"arg0": {
"type": "object",
"properties": {
"email": {
"type": "string"
},
"name": {
"type": "string"
}
},
"required": [
"email",
"name"
],
"description": "user to add"
}
},
"required": [
"arg0"
],
"additionalProperties": false
}
},
//...
]
}
}
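The caching behaviour described above can be sketched roughly as follows; call_server is a hypothetical helper standing in for a POST to the message endpoint, and the notification name follows the MCP specification’s tools/list_changed convention:

tools_cache = None

def call_server(method, params):
    """Hypothetical helper: POST a JSON-RPC request and wait for the SSE reply."""
    ...

def get_tools():
    """Return the cached tool list, fetching it from the Server only on first use."""
    global tools_cache
    if tools_cache is None:
        tools_cache = call_server("tools/list", {})
    return tools_cache

def on_notification(method):
    """Invalidate the cache when the Server reports a changed tool list."""
    global tools_cache
    if method == "notifications/tools/list_changed":
        tools_cache = None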
With the tool list in hand, we can try asking Copilot to execute tasks. In Copilot Agent mode, input the same task as last time:
First, help me check the user list to see if there is a user named Carson. If not, add a new user: Carson carson@gmail.com; then check the list again to see if the new user was added successfully. Finally, say hello to Carson.
Let’s look at the execution results first.
After I issued the task request, VSCode analyzed it and decided to execute several tools to complete the task. I’m using the GPT-4o model here, which shows no reasoning process. If I hadn’t expanded the tool execution results, all I would see is the final response.
If switched to the Claude 3.7 Sonnet model, the execution would include reasoning, making the entire process much clearer.
Execution
Going back to Proxyman to examine the captured requests:
- When VSCode first requests the Copilot Server, the request content is quite long. Taking the GPT-4o model as an example, the request size is 49.7 KB with a response of 1.34 KB.
- The request includes:
- An extremely long system Prompt. Those interested can refer to the developer-compiled GitHub Copilot Agent Official Prompt
- Available tool list, including system tools provided by VSCode and tools provided by the configured MCP Server
- The response includes the tool that was decided to be invoked after analyzing the task:
{
"choices": [
{
"index": 0,
"delta": {
"content": null,
"role": "assistant",
"tool_calls": [
{
"function": {
"arguments": "",
"name": "bb7_getUsers"
},
"id": "call_nL7ToTNvrfLwUPYoqtUH8Yx3",
"index": 0,
"type": "function"
}
]
}
}
],
"created": 1745649196,
"id": "chatcmpl-BQTO863fJsOBHD4tU1LN3AEk5Uuo2",
"model": "gpt-4o-2024-11-20",
"system_fingerprint": "fp_ee1d74bde0"
}
- Based on the response content, VSCode invokes the MCP Tool.
//http://nio.local:8080/mcp/messages?sessionId=3e19fbcd-51f4-4784-9f63-538c9a203859
{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "getUsers",
"arguments": {}
}
}
- The MCP Server returns the tool invocation result through the SSE connection.
{
"jsonrpc": "2.0",
"id": 3,
"result": {
"content": [
{
"type": "text",
"text": "[{\"name\":\"John\",\"email\":\"john@example.com\"},{\"name\":\"Jane\",\"email\":\"jane@example.com\"}]"
}
],
"isError": false
}
}
- VSCode then sends the invocation result to the Copilot Server for processing, and receives another tool to invoke along with the required parameters.
- This cycle repeats until the task is complete. In the final request sent to the Copilot Server, we can see all of the tool requests and responses invoked during this task. In other words, every call to the model carries all previously invoked tool requests and responses, which is why the request size gradually increases.
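Putting the whole operation phase together, the loop looks roughly like this (a simplified sketch, not VSCode’s actual implementation; call_model and call_mcp_tool are hypothetical stand-ins for the Copilot chat request and the MCP tools/call message):

def call_model(messages, tools):
    """Stand-in for the chat completion request to the Copilot Server."""
    ...

def call_mcp_tool(name, arguments):
    """Stand-in for POSTing a tools/call message to the MCP endpoint."""
    ...

def run_agent(system_prompt, user_task, tools):
    messages = [system_prompt, user_task]      # the history grows on every turn
    while True:
        reply = call_model(messages, tools)    # the full history is resent each time
        tool_calls = reply.get("tool_calls")
        if not tool_calls:                     # no more tools: this is the final answer
            return reply
        for tc in tool_calls:
            result = call_mcp_tool(tc["name"], tc["arguments"])
            messages.append(tc)                # record the tool request...
            messages.append(result)            # ...and its result for the next model call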
Termination
The termination step is simple: for SSE-transport MCP interactions, it just means closing the related HTTP connection.