Query-driven Data Integration for a Decentralized Web

Ruben Taelman

VITO DPH Day 2025, 10 October 2025

Ghent University – imec – IDLab, Belgium

Data is highly decentralized

Decentralized data integration is challenging for user-facing app developers

Query engines abstract access to decentralized data

Hide the complexities of reading and writing for app developers

Query processing over centralized data

Centralization not always possible

How to query over decentralized data?

Approaches for querying over decentralized data

Federation distributes query over APIs
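As a rough illustration of federation, the sketch below evaluates a query's triple patterns against several sources. For each pattern it first performs source selection (a simulated ASK request per source) and then only contacts the sources that can contribute. The sources are in-memory sets of triples standing in for remote APIs; the names extend, ask and federated_bgp are hypothetical, not the API of Comunica or any other engine. Join ordering and query decomposition, which real federation engines also handle, are left out.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def extend(bindings: Bindings, pattern: Triple, triple: Triple) -> Optional[Bindings]:
    """Extend bindings so that pattern matches triple, or return None on a mismatch."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return result


def ask(source: Set[Triple], pattern: Triple) -> bool:
    """Simulated ASK request: does this source have any match for the pattern?"""
    return any(extend({}, pattern, t) is not None for t in source)


def federated_bgp(sources: List[Set[Triple]], patterns: List[Triple]) -> List[Bindings]:
    """Evaluate triple patterns one by one, distributing each over relevant sources."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        # Source selection: only contact sources that can contribute to this pattern.
        relevant = [s for s in sources if ask(s, pattern)]
        next_solutions: List[Bindings] = []
        for bindings in solutions:
            for source in relevant:
                for triple in source:
                    extended = extend(bindings, pattern, triple)
                    if extended is not None:
                        next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    people = {("ex:alice", "foaf:knows", "ex:bob")}
    names = {("ex:bob", "foaf:name", "Bob")}
    query = [("ex:alice", "foaf:knows", "?friend"), ("?friend", "foaf:name", "?name")]
    print(federated_bgp([people, names], query))
    # [{'?friend': 'ex:bob', '?name': 'Bob'}]
```

Neither source alone can answer the query; the answer only appears once both are combined. A triple present in several sources would yield duplicate solutions in this sketch.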

Link traversal follows linked documents
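A minimal sketch of link traversal, assuming a toy Web modelled as a dictionary from document URL to the triples that document serves (WEB and traverse are hypothetical names). Starting from a seed document, the engine fetches documents breadth-first and follows every URL it encounters. Real link-traversal engines apply reachability criteria to decide which links to follow and produce results while traversing; this sketch simply follows all links and collects everything reachable.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical Web: document URL -> triples served by that document.
WEB: Dict[str, List[Triple]] = {
    "https://alice.example/profile": [
        ("https://alice.example/profile#me", "foaf:knows", "https://bob.example/profile#me"),
    ],
    "https://bob.example/profile": [
        ("https://bob.example/profile#me", "foaf:name", "Bob"),
    ],
}


def traverse(web: Dict[str, List[Triple]], seeds: List[str]) -> Set[Triple]:
    """Breadth-first link traversal that follows every URL it encounters."""
    visited: Set[str] = set()
    queue = deque(seeds)
    collected: Set[Triple] = set()
    while queue:
        url = queue.popleft()
        if url in visited or url not in web:
            continue  # already fetched, or not dereferenceable in this toy Web
        visited.add(url)
        for triple in web[url]:
            collected.add(triple)
            for term in triple:
                document = term.split("#")[0]  # strip the fragment to get the document URL
                if document.startswith("https://") and document not in visited:
                    queue.append(document)
    return collected


if __name__ == "__main__":
    for triple in sorted(traverse(WEB, ["https://alice.example/profile"])):
        print(triple)
```

Bob's name is not in the seed document; it is only found because the engine followed the foaf:knows link to Bob's profile.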

Limitations of querying approaches

Focus on Knowledge Graphs

Publishing Knowledge Graphs as SPARQL Endpoints

SPARQL endpoint: an API that accepts SPARQL queries and replies with their results.
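To make "accepts SPARQL queries and replies with results" concrete: under the SPARQL 1.1 Protocol, a client can send the query as the query parameter of an HTTP GET request and ask for JSON results through the Accept header. The sketch below only builds such a request and parses a hand-written response in the SPARQL 1.1 JSON results format, so it runs offline; the endpoint URL is a placeholder and the helper names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into {variable: value} rows."""
    document = json.loads(body)
    return [
        {variable: term["value"] for variable, term in row.items()}
        for row in document["results"]["bindings"]
    ]


if __name__ == "__main__":
    request = build_query_request(
        "https://example.org/sparql",  # placeholder endpoint URL
        "SELECT ?name WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name } LIMIT 1",
    )
    print(request.full_url)
    sample = ('{"head": {"vars": ["name"]}, "results": {"bindings": '
              '[{"name": {"type": "literal", "value": "Alice"}}]}}')
    print(parse_bindings(sample))  # [{'name': 'Alice'}]
```

Passing the request to urllib.request.urlopen would send it to a live endpoint; the whole query is then evaluated server-side, which is what makes endpoints expensive to host.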

SPARQL endpoints have low availability

Public endpoints have an availability of 95%.

→ ~1.5 days downtime per month!
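The downtime figure follows directly from the availability: 5% unavailability over a month. A quick check, assuming a 30-day month (and showing a 31-day month for comparison):

```python
def downtime_days(availability: float, days_in_period: float = 30.0) -> float:
    """Expected downtime (in days) over a period, given an availability ratio."""
    return (1.0 - availability) * days_in_period


if __name__ == "__main__":
    print(round(downtime_days(0.95), 2))      # 1.5  (30-day month)
    print(round(downtime_days(0.95, 31), 2))  # 1.55 (31-day month)
```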

Vandenbussche, Pierre-Yves, et al. "SPARQLES: Monitoring public SPARQL endpoints." Semantic Web 8.6 (2017): 1049-1065.

SPARQL endpoints have restrictions

To counter availability issues

→ Endpoints limit the types and number of queries that can be executed!
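A hypothetical sketch of the kind of restrictions an endpoint might impose: a cap on the number of rows returned per query and a per-client quota on the number of queries. The limits (MAX_RESULTS, MAX_QUERIES_PER_WINDOW) and the RestrictedEndpoint class are made up for illustration and do not reflect any particular public endpoint; a query is modelled as a simple row filter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_RESULTS = 3             # hypothetical cap on rows per response
MAX_QUERIES_PER_WINDOW = 2  # hypothetical per-client quota


@dataclass
class RestrictedEndpoint:
    rows: List[dict]
    usage: Dict[str, int] = field(default_factory=dict)

    def query(self, client: str, condition: Callable[[dict], bool]) -> List[dict]:
        used = self.usage.get(client, 0)
        if used >= MAX_QUERIES_PER_WINDOW:
            # Mimics an HTTP 429 Too Many Requests response.
            raise RuntimeError("quota exceeded")
        self.usage[client] = used + 1
        matches = [row for row in self.rows if condition(row)]
        # Rows beyond the cap are silently dropped: the answer may be incomplete.
        return matches[:MAX_RESULTS]


if __name__ == "__main__":
    endpoint = RestrictedEndpoint(rows=[{"n": i} for i in range(10)])
    print(len(endpoint.query("app", lambda row: True)))  # 3, not 10
```

In this sketch a client cannot distinguish a truncated answer from a complete one, which illustrates why such restrictions make endpoints harder to build applications on.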

Alternatives to SPARQL endpoints

These limit the query expressivity offered by the server

LDF axis

Linked Data Fragments (LDF) axis: investigates trade-offs between server and client effort for query execution.
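To illustrate the client-heavy end of this axis: a Triple Pattern Fragments server only answers single triple patterns, so the client has to evaluate joins itself. The sketch below simulates such a server in memory (tpf_fragment is a hypothetical stand-in; real TPF responses are paginated and carry count metadata and hypermedia controls) and runs a nested-loop join on the client, ordering patterns by fragment size and substituting already-bound values into later requests.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Hypothetical dataset hosted by the simulated TPF server.
DATA: List[Triple] = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
    ("ex:dave", "foaf:name", "Dave"),
]


def tpf_fragment(pattern: Triple) -> List[Triple]:
    """Server side: all triples matching a single pattern (variables match anything)."""
    return [t for t in DATA
            if all(p.startswith("?") or p == v for p, v in zip(pattern, t))]


def extend(bindings: Bindings, pattern: Triple, triple: Triple) -> Optional[Bindings]:
    """Client side: bind the pattern's variables, rejecting inconsistent bindings."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?") and result.setdefault(term, value) != value:
            return None
    return result


def client_bgp(patterns: List[Triple]) -> List[Bindings]:
    """Evaluate a basic graph pattern using only triple-pattern requests."""
    # Start with the smallest fragment; len() stands in for a TPF count estimate.
    ordered = sorted(patterns, key=lambda p: len(tpf_fragment(p)))
    solutions: List[Bindings] = [{}]
    for pattern in ordered:
        next_solutions: List[Bindings] = []
        for bindings in solutions:
            # Substitute known values so the server returns a smaller fragment.
            bound = tuple(bindings.get(term, term) for term in pattern)
            for triple in tpf_fragment(bound):
                extended = extend(bindings, bound, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    query = [("?person", "foaf:name", "?name"), ("ex:alice", "foaf:knows", "?person")]
    for solution in client_bgp(query):
        print(solution)
```

The server only ever does cheap pattern lookups, while the client issues one request per intermediate binding: lower server effort is paid for with more client effort and more requests.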

Verborgh, Ruben, et al. "Triple pattern fragments: a low-cost knowledge graph interface for the web." Journal of Web Semantics 37 (2016): 184-206.

LDF interfaces complicate federation

→ Federation engines must combine data across heterogeneous APIs (e.g., SPARQL endpoints and Triple Pattern Fragments)
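One way to cope with heterogeneous interfaces is to wrap each in an adapter that exposes what it can do. In the hypothetical sketch below (class and function names are illustrative, not from any existing engine), an endpoint-like adapter can evaluate a whole basic graph pattern, while a TPF-like adapter can only match single triple patterns. When the query touches a single capable source, the engine pushes the whole query down; otherwise it falls back to the lowest common denominator, triple patterns, and joins on the client.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def extend(bindings: Bindings, pattern: Triple, triple: Triple) -> Optional[Bindings]:
    """Extend bindings so that pattern matches triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def join(patterns: List[Triple], match: Callable[[Triple], List[Triple]]) -> List[Bindings]:
    """Nested-loop join of triple patterns, fetching matches through `match`."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        candidates = match(pattern)
        solutions = [new for b in solutions for t in candidates
                     if (new := extend(b, pattern, t)) is not None]
    return solutions


class TriplePatternSource:
    """Adapter for an LDF-style interface that only answers single triple patterns."""
    supports_bgp = False

    def __init__(self, triples: List[Triple]):
        self.triples = triples

    def match(self, pattern: Triple) -> List[Triple]:
        return [t for t in self.triples if extend({}, pattern, t) is not None]


class EndpointSource(TriplePatternSource):
    """Adapter for a SPARQL-endpoint-like source that can evaluate a whole BGP."""
    supports_bgp = True

    def evaluate(self, patterns: List[Triple]) -> List[Bindings]:
        return join(patterns, self.match)  # stands in for server-side evaluation


def federate(sources: List[TriplePatternSource], patterns: List[Triple]) -> List[Bindings]:
    if len(sources) == 1 and sources[0].supports_bgp:
        return sources[0].evaluate(patterns)  # push the whole query to the endpoint
    # Otherwise decompose into triple patterns and union matches over all sources.
    return join(patterns, lambda p: [t for s in sources for t in s.match(p)])
```

For example, an EndpointSource holding (ex:alice, foaf:knows, ex:bob) and a TriplePatternSource holding (ex:bob, foaf:name, "Bob") together answer a "name of Alice's friend" query that neither answers alone. The design choice here is deliberately simple: a real engine would also push down partial queries to capable sources instead of falling back entirely to triple patterns.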

Personal data

Decentralization initiatives offer users full control of where data is stored and who can access it

Solid Pods

How to query protected data?

→ Lack of understanding of how to do this for decentralized data!
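Part of the difficulty is what an engine should do when only some documents are readable. As a deliberately naive, hypothetical sketch (it does not implement Solid's actual authorization mechanisms, WAC or ACP), the code below fetches pod documents on behalf of an agent and simply skips any document for which the simulated server answers 403. The pod contents, the "*" public marker and the ex:bloodType property are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

ME = "https://alice.example/profile#me"

# Hypothetical pod: document URL -> (agents allowed to read, triples in the document).
POD: Dict[str, Tuple[Set[str], List[Triple]]] = {
    "https://alice.example/profile": ({"*"}, [(ME, "foaf:name", "Alice")]),
    "https://alice.example/health": ({ME}, [(ME, "ex:bloodType", "O+")]),
}


def fetch(url: str, agent: str) -> Tuple[int, List[Triple]]:
    """Simulated authenticated GET: 200 with triples if the agent may read, else 403."""
    allowed, triples = POD[url]
    if "*" in allowed or agent in allowed:
        return 200, triples
    return 403, []


def readable_triples(urls: List[str], agent: str) -> List[Triple]:
    collected: List[Triple] = []
    for url in urls:
        status, triples = fetch(url, agent)
        if status == 200:
            collected.extend(triples)
        # On 403 the document is skipped, so the same query yields different
        # results for different agents, without the app necessarily noticing.
    return collected


if __name__ == "__main__":
    print(len(readable_triples(list(POD), ME)))                                # 2
    print(len(readable_triples(list(POD), "https://bob.example/profile#me")))  # 1
```

Silently dropping unreadable documents is only one possible behavior; whether results should instead be flagged as incomplete is exactly the kind of open question raised here.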

Solid as decentralized environment

https://solidproject.org/

We develop Comunica, a modular framework for querying knowledge graphs

https://comunica.dev/

The future of RDF and SPARQL

https://www.w3.org/TR/sparql12-query/

Linked Data Event Streams (LDES) enable data silo synchronization

https://semiceu.github.io/LinkedDataEventStreams/
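A simplified, hypothetical model of stream-based synchronization: a publisher exposes an append-only log of immutable members, each describing a version of some entity, and a replica remembers how far it has read so that the next poll only processes new members. This captures only the replication idea; the LDES specification defines its own RDF vocabulary and fragmentation mechanism, which the sketch does not model. The member tuple shape and the Replica class are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical member shape: (member id, entity id, version number, entity state).
Member = Tuple[str, str, int, Dict[str, str]]


@dataclass
class Replica:
    offset: int = 0  # number of stream members already processed
    state: Dict[str, Tuple[int, Dict[str, str]]] = field(default_factory=dict)

    def sync(self, stream: List[Member]) -> int:
        """Apply members appended since the last sync; return how many were new."""
        new_members = stream[self.offset:]
        for _, entity, version, data in new_members:
            current = self.state.get(entity)
            # Keep only the latest version of each entity.
            if current is None or version > current[0]:
                self.state[entity] = (version, data)
        self.offset = len(stream)
        return len(new_members)


if __name__ == "__main__":
    stream: List[Member] = [
        ("m1", "ex:sensor1", 1, {"temp": "20"}),
        ("m2", "ex:sensor2", 1, {"temp": "18"}),
    ]
    replica = Replica()
    print(replica.sync(stream))  # 2
    stream.append(("m3", "ex:sensor1", 2, {"temp": "21"}))
    print(replica.sync(stream))  # 1
    print(replica.state["ex:sensor1"])  # (2, {'temp': '21'})
```

Because members are never modified after publication, a replica only needs its offset to catch up, instead of re-downloading the whole dataset.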

From access to usage control with ODRL

https://www.w3.org/TR/odrl-model/
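Access control decides whether data may be read; usage control also constrains what may be done with it afterwards. The sketch below evaluates a simplified, ODRL-inspired policy made of permissions and prohibitions, each with an action and an optional purpose constraint. The field names loosely echo ODRL terms (permission, prohibition, action, purpose), but this is a hypothetical toy evaluator, not ODRL's formal semantics or any existing library. Here prohibitions override permissions, which is just one possible conflict strategy.

```python
from typing import Dict, List, Optional

Rule = Dict[str, Optional[str]]

# Hypothetical policy attached to a dataset.
POLICY: Dict[str, List[Rule]] = {
    "permission": [{"action": "use", "purpose": "research"}],
    "prohibition": [{"action": "distribute", "purpose": None}],
}


def rule_applies(rule: Rule, action: str, purpose: str) -> bool:
    """A rule applies when the action matches and its purpose constraint (if any) holds."""
    return rule["action"] == action and rule.get("purpose") in (None, purpose)


def is_allowed(policy: Dict[str, List[Rule]], action: str, purpose: str) -> bool:
    if any(rule_applies(r, action, purpose) for r in policy.get("prohibition", [])):
        return False  # prohibitions win in this toy conflict strategy
    return any(rule_applies(r, action, purpose) for r in policy.get("permission", []))


if __name__ == "__main__":
    print(is_allowed(POLICY, "use", "research"))         # True
    print(is_allowed(POLICY, "use", "marketing"))        # False
    print(is_allowed(POLICY, "distribute", "research"))  # False
```

A query engine could call a check like this before returning or forwarding results, so the same data may be usable for one purpose but not another.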

Conclusion: Query-driven data integration for decentralization