Introduction
The paradigm of neuro-symbolic artificial intelligence [1] aims to integrate symbolic reasoning with machine learning to improve overall accuracy, trustworthiness, and explainability. Graph-based retrieval-augmented generation (GraphRAG) [2] offers one approach to achieving neuro-symbolic AI, by combining vector-embedding-based semantic retrieval with explicit graph structures such as Knowledge Graphs to guide the answers of Large Language Models (LLMs). GraphRAG techniques involve embedding generation for specific Knowledge Graphs, which means that LLMs cannot directly incorporate arbitrary Knowledge Graphs without first preprocessing them. The zero-shot approach for SPARQL-based question answering introduced in GRASP [3] offers one solution to this problem, by providing functions that LLM agents can call to interact with a Knowledge Graph.
Recently, the Model Context Protocol (MCP) [4] was introduced, which allows LLM agents to connect to external systems. Concretely, it allows MCP servers to offer specific tools that produce a certain output based on various parameters. These tools are described to the LLM in a human-readable format, and internally connect to specific systems such as the file system, code editors, or databases. One main advantage of MCP is that it runs over live systems, and does not require embedding-like precomputations. As such, it offers an excellent opportunity for connecting arbitrary Knowledge Graphs on the fly. Within this demonstration, we introduce Comunica MCP SPARQL, which provides this connection point, and show how it works.
Implementation
Comunica MCP SPARQL is a TypeScript/JavaScript project that exposes all functionalities of the Comunica SPARQL querying framework [5] through MCP. It is open source, available on GitHub and npm, and released under the MIT license. Comunica is a modular SPARQL framework that enables SPARQL querying over decentralized Knowledge Graphs. This includes Knowledge Graphs exposed as SPARQL endpoints [6], Linked Data documents [7] in any RDF representation, TPF interfaces [8], HDT files [9], Solid pods [10], and more. In contrast to GRASP, Comunica MCP SPARQL is exposed through the more flexible MCP instead of using direct function calling. MCP is very simple to set up, as it only requires a minimal configuration change within LLM agents, as can be seen in Listing 1. Furthermore, besides allowing the LLM agent to execute any possible SPARQL 1.2 query, Comunica MCP SPARQL allows agents to provide one or more Knowledge Graph URLs to query over. And if the AI agent does not know what URLs to start from, link traversal [11, 12] can be used to discover relevant sources by following links.
{"mcpServers": {"comunica-sparql": {"command": "npx", "args": [ "-y", "@comunica/mcp-sparql", "--mode", "stdio" ]}}}
Listing 1: An example of the only configuration change needed to configure Comunica MCP SPARQL within tools such as Claude or Copilot.
Comunica MCP SPARQL is not the first MCP approach for SPARQL, but it has some notable differences to existing work.
Some SPARQL engine vendors have their own dedicated MCP servers, such as
Jena,
GraphDB,
and Stardog.
These MCP servers are tied to these systems, while Comunica MCP SPARQL works with any Knowledge Graph.
RDF Explorer and SPARQL-MCP [13] are not tied to specific Knowledge Graphs,
but in contrast to Comunica MCP SPARQL,
they only handle Knowledge Graphs exposed through SPARQL endpoints.
Of these two, SPARQL-MCP is the only one that also supports federation,
but in contrast to Comunica MCP SPARQL,
it requires the LLM agent to explicitly include SERVICE keywords within the query,
while Comunica enables automatic source selection.
Furthermore, all advanced features from Comunica are exposed through MCP,
such as HTTP proxy support, timeout settings, and authentication (e.g., for updates).
Below, we list the tools that Comunica MCP SPARQL exposes to LLMs, together with their LLM-directed descriptions and (a subset of) the available parameters:
query-sparql: Execute SPARQL queries over one or more remote sources, which also includes update queries.
- query (required): SPARQL query string.
- sources (required): List of SPARQL endpoint URLs, TPF interface URLs, or Linked Data document URLs.
- httpProxy (optional): HTTP proxy URL (e.g., http://proxy.example.com:8080).
- httpAuth (optional): HTTP basic authentication in the format username:password.
query-sparql-rdf: Execute SPARQL queries over a serialized RDF dataset provided as a string.
- query (required): SPARQL query string.
- value (required): Serialized RDF dataset as a string.
- mediaType (required): Media type of the serialized RDF dataset.
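As an illustration, the following is a hypothetical tools/call request that an LLM agent could send to invoke the query-sparql tool. The JSON-RPC message framing follows the MCP specification; the concrete query and source URLs are purely illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query-sparql",
    "arguments": {
      "query": "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
      "sources": [
        "https://dbpedia.org/sparql",
        "https://query.wikidata.org/sparql"
      ]
    }
  }
}
```

Note that because multiple sources are listed, Comunica performs automatic source selection over them, without the query needing explicit SERVICE clauses.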
Demonstration
This demonstration shows the functionalities and current limitations of Comunica MCP SPARQL, to trigger discussion on future work. Concretely, we will run Comunica MCP SPARQL locally, and connect different LLM-based AI chatbots with it. Through various prompts, participants can see answer accuracy improve.
Fig. 1: Get all movies Brad Pitt and Leonardo DiCaprio both play in with Claude.
For example, when using Claude to ask “What movies do Brad Pitt and Leonardo DiCaprio both play in?”, only a single answer is produced, namely “Once Upon a Time in Hollywood”. However, when asking “Use Comunica SPARQL to determine what movies Brad Pitt and Leonardo DiCaprio both play in.” (see Fig. 1), we receive a more accurate answer: “Once Upon a Time in Hollywood and The Audition”. This is because the second prompt internally leads to a SPARQL query over DBpedia, which causes a more complete answer to be produced. When connected to a Solid pod, prompts such as “Using Comunica SPARQL Solid, can you show me information about myself based on my profile?” can be answered.
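For illustration, a query of the kind the agent could generate for this prompt might look as follows; the exact query produced by the LLM will vary, and the use of the dbo:starring property over DBpedia is an assumption of this sketch:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

# Hypothetical query: films listing both actors in their cast
SELECT DISTINCT ?film WHERE {
  ?film dbo:starring dbr:Brad_Pitt .
  ?film dbo:starring dbr:Leonardo_DiCaprio .
}
```

Executing such a query over the live Knowledge Graph is what enables the more complete answer, rather than relying on facts internalized during pretraining.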
Comunica MCP still requires a dedicated MCP server, which contradicts Comunica’s goal to enable pure client-side execution. To address this limitation, we will also demonstrate a prototype of Comunica MCP that uses WebMCP [14] to expose MCP capabilities through the Web browser via https://query.comunica.dev/.
Conclusions
MCP enables AI agents to act as orchestrators of other systems, and Comunica MCP SPARQL offers this connection point to Knowledge Graphs. Thanks to the power of Comunica [5], not just SPARQL endpoints can be queried, but also Linked Data documents in any RDF serialization, Solid pods (including authenticated access to private data), and more. And if the agent does not know what data sources to start from, Comunica’s link traversal engine [12] can be used.
Our preliminary findings show that modern LLMs mostly seem to decide to query over DBpedia and Wikidata, which they do very well. This confirms earlier experiments showing that LLMs internalize DBpedia and Wikidata during pretraining [15]. If the user wants to query over other Knowledge Graphs (such as UniProt or DBLP), they need to ask for this explicitly. Furthermore, LLMs seem to struggle with writing federated queries, and can even hallucinate SPARQL endpoint URLs. For these reasons, Comunica MCP SPARQL acts as an enabler for future research. For instance, there is a need for more research on how well agents can understand Knowledge Graphs other than DBpedia and Wikidata, for which exposing additional tools in line with those of GRASP [3] could be valuable. Furthermore, more research is needed on guiding agents towards query-relevant Knowledge Graphs, similar to the work set out by the authors of SPARQL-MCP [13].
Acknowledgements
The authors are fellows of FWO (1202124N, 1SB8525N). We thank Daniel Dobriy for discussions on MCP within the GOBLIN COST Action (CA23147).