Querying over Decentralized Data

Ruben Taelman

Knowledge Graphs course 2026, 6 March 2026

Query execution when data cannot be centralized

Ghent University – imec – IDLab, Belgium

Data is highly decentralized

Decentralized data integration is challenging for application developers

Query engines abstract access to decentralized data

Hide the complexities of reading and writing for app developers

SPARQL processing over centralized data

Centralization not always possible

How to query over decentralized data?

Approaches for querying over decentralized data

Client distributes query over query APIs

Federation over SPARQL endpoints

SELECT ?drug ?title WHERE {
  SERVICE <http://example.com/drb> {
    ?drug db:drugCategory dbc:micronutrient.
    ?drug db:casRegistryNumber ?id.
  }
  SERVICE <http://example.com/kegg> {
    ?keggDrug rdf:type kegg:Drug.
    ?keggDrug bio2rdf:xRef ?id.
    ?keggDrug purl:title ?title.
  }
}

Automatic source selection

# Source 1: http://example.com/drb
# Source 2: http://example.com/kegg

SELECT ?drug ?title WHERE {
  ?drug db:drugCategory dbc:micronutrient.
  ?drug db:casRegistryNumber ?id.
  ?keggDrug rdf:type kegg:Drug.
  ?keggDrug bio2rdf:xRef ?id.
  ?keggDrug purl:title ?title.
}


SELECT ?drug ?title WHERE {
  SERVICE <http://example.com/drb> {
    ?drug db:drugCategory dbc:micronutrient.
    ?drug db:casRegistryNumber ?id.
  }
  SERVICE <http://example.com/kegg> {
    ?keggDrug rdf:type kegg:Drug.
    ?keggDrug bio2rdf:xRef ?id.
    ?keggDrug purl:title ?title.
  }
}
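One common way to automate this step is to probe each source per triple pattern, for example with an ASK query, and keep the sources that report a match. The sketch below simulates that with in-memory data instead of real endpoints. The sample triples, the `ask` stand-in, and the function names are assumptions made for illustration, not part of the slides.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"

# Invented sample data standing in for the contents of the two endpoints.
SOURCES: Dict[str, List[Triple]] = {
    "http://example.com/drb": [
        ("drb:d1", "db:drugCategory", "dbc:micronutrient"),
        ("drb:d1", "db:casRegistryNumber", "cas:0001"),
    ],
    "http://example.com/kegg": [
        ("keggdata:k1", "rdf:type", "kegg:Drug"),
        ("keggdata:k1", "bio2rdf:xRef", "cas:0001"),
        ("keggdata:k1", "purl:title", "Example drug"),
    ],
}

# The triple patterns of the query without SERVICE clauses.
PATTERNS: List[Triple] = [
    ("?drug", "db:drugCategory", "dbc:micronutrient"),
    ("?drug", "db:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]


def matches(pattern: Triple, triple: Triple) -> bool:
    """A data triple matches a pattern if every non-variable term is equal."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def ask(source_url: str, pattern: Triple) -> bool:
    """Hypothetical stand-in for sending `ASK { pattern }` to an endpoint."""
    return any(matches(pattern, t) for t in SOURCES[source_url])


def select_sources(patterns: List[Triple]) -> Dict[Triple, List[str]]:
    """Map each triple pattern to the sources that can contribute results."""
    return {p: [url for url in SOURCES if ask(url, p)] for p in patterns}


selection = select_sources(PATTERNS)
for pattern, urls in selection.items():
    print(pattern, "->", urls)
```

With this sample data, each pattern is assigned to exactly one source, which is what allows the engine to produce the SERVICE-annotated query.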

Main federation bottleneck: data transfer

Exhaustive source selection (worst-case)

# Source 1: http://example.com/drb
# Source 2: http://example.com/kegg

SELECT ?drug ?id WHERE {
  ?drug db:drugCategory dbc:micronutrient.
  ?drug db:casRegistryNumber ?id.
}

SELECT ?drug ?id WHERE {
  {
    SERVICE <http://example.com/drb> {
      ?drug db:drugCategory dbc:micronutrient.
    }
  } UNION {
    SERVICE <http://example.com/kegg> {
      ?drug db:drugCategory dbc:micronutrient.
    }
  }
  {
    SERVICE <http://example.com/drb> {
      ?drug db:casRegistryNumber ?id.
    }
  } UNION {
    SERVICE <http://example.com/kegg> {
      ?drug db:casRegistryNumber ?id.
    }
  }
}

Identifying exclusive groups

SELECT ?drug ?id WHERE {
  SERVICE <http://example.com/drb> {
    ?drug db:drugCategory dbc:micronutrient.
  }
  SERVICE <http://example.com/drb> {
    ?drug db:casRegistryNumber ?id.
  }
}

SELECT ?drug ?id WHERE {
  SERVICE <http://example.com/drb> {
    ?drug db:drugCategory dbc:micronutrient.
    ?drug db:casRegistryNumber ?id.
  }
}
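The rewriting above can be sketched as a grouping step over the output of source selection: patterns whose only relevant source is the same endpoint are merged into one SERVICE clause, so the join happens at that endpoint instead of at the client. The function names and the hard-coded `selection` input are assumptions for illustration. This simplified version also ignores whether the grouped patterns share variables.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed result of source selection: both patterns are answerable only by drb.
selection: Dict[Triple, List[str]] = {
    ("?drug", "db:drugCategory", "dbc:micronutrient"): ["http://example.com/drb"],
    ("?drug", "db:casRegistryNumber", "?id"): ["http://example.com/drb"],
}


def exclusive_groups(selection: Dict[Triple, List[str]]):
    """Group patterns that have exactly one relevant source by that source.

    Returns (groups, remaining), where `remaining` holds the patterns with
    zero or several relevant sources, which cannot be grouped this way.
    """
    groups: Dict[str, List[Triple]] = defaultdict(list)
    remaining: List[Triple] = []
    for pattern, sources in selection.items():
        if len(sources) == 1:
            groups[sources[0]].append(pattern)
        else:
            remaining.append(pattern)
    return dict(groups), remaining


def to_service_clause(source: str, patterns: List[Triple]) -> str:
    """Serialize one exclusive group as a single SERVICE clause."""
    body = "\n".join(f"    {s} {p} {o}." for s, p, o in patterns)
    return f"  SERVICE <{source}> {{\n{body}\n  }}"


groups, remaining = exclusive_groups(selection)
clauses = [to_service_clause(src, pats) for src, pats in groups.items()]
print("\n".join(clauses))
```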

Federation-specific join algorithms

Example of one iteration (block size 16):

Joining (?category) ⋈ (?category, ?drug, ?id)

SELECT ?drug ?id WHERE {
  VALUES ?category {
    db:drugCategory1 db:drugCategory2 db:drugCategory3 db:drugCategory4
    db:drugCategory5 db:drugCategory6 db:drugCategory7 db:drugCategory8
    db:drugCategory9 db:drugCategory10 db:drugCategory11 db:drugCategory12
    db:drugCategory13 db:drugCategory14 db:drugCategory15 db:drugCategory16
  }
  ?drug db:drugCategory ?category.
  ?drug db:casRegistryNumber ?id.
}
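The iteration above can be sketched as follows: the bindings from the left-hand side of the join are cut into blocks, and each block is shipped to the endpoint inside a VALUES clause, so one request replaces up to 16 separate lookups. The helper names and the list of 40 category IRIs are assumptions for illustration.

```python
from typing import Iterator, List

BLOCK_SIZE = 16


def blocks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bound_query(categories: List[str]) -> str:
    """Build the query for one block, with the bindings injected via VALUES."""
    values = " ".join(categories)
    return (
        "SELECT ?drug ?category ?id WHERE {\n"
        f"  VALUES ?category {{ {values} }}\n"
        "  ?drug db:drugCategory ?category.\n"
        "  ?drug db:casRegistryNumber ?id.\n"
        "}"
    )


# Hypothetical left-hand bindings for ?category: 40 values give 3 requests.
categories = [f"db:drugCategory{i}" for i in range(1, 41)]
queries = [bound_query(block) for block in blocks(categories, BLOCK_SIZE)]
print(len(queries))
print(queries[0])
```

The sketch also projects `?category`, so that the client can join each returned row back to the binding it came from.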

Federation over heterogeneous sources

Limitations of federated querying

Example: decentralized address book

Example: Find Alice's contact names

SELECT ?name WHERE {
    <https://alice.pods.org/profile#me>
        foaf:knows ?person.
    ?person foaf:name ?name.
}

Query process:

  1. Start from Alice's address book
  2. Follow links to profiles of Bob and Carol
  3. Query over union of all profiles
  4. Find query results: [ { "name": "Bob" }, { "name": "Carol" } ]
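The four steps above can be sketched with a tiny in-memory "Web" in place of HTTP requests. The `WEB` dictionary, its triples, and the helper names are assumptions for illustration. This naive version follows every IRI it encounters and only evaluates the query after traversal has finished.

```python
from collections import deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

ALICE = "https://alice.pods.org/profile#me"
BOB = "https://bob.pods.org/profile#me"
CAROL = "https://carol.org/#i"

# Invented document contents, keyed by document URL (IRI without fragment).
WEB: Dict[str, List[Triple]] = {
    "https://alice.pods.org/profile": [
        (ALICE, "foaf:knows", BOB),
        (ALICE, "foaf:knows", CAROL),
    ],
    "https://bob.pods.org/profile": [(BOB, "foaf:name", "Bob")],
    "https://carol.org/": [(CAROL, "foaf:name", "Carol")],
}

QUERY: List[Triple] = [
    (ALICE, "foaf:knows", "?person"),
    ("?person", "foaf:name", "?name"),
]


def document_of(iri: str) -> str:
    """Strip the fragment to obtain the document to dereference."""
    return iri.split("#", 1)[0]


def traverse(seed: str) -> List[Triple]:
    """Steps 1-3: follow links from the seed and collect all triples found."""
    queue = deque([document_of(seed)])
    seen = set(queue)
    triples: List[Triple] = []
    while queue:
        for triple in WEB.get(queue.popleft(), []):
            triples.append(triple)
            for term in triple:
                if term.startswith("http") and document_of(term) not in seen:
                    seen.add(document_of(term))
                    queue.append(document_of(term))
    return triples


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[dict]:
    """Step 4: nested-loop evaluation of the patterns over the collected triples."""
    solutions: List[dict] = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                if all(
                    candidate.setdefault(p, t) == t if p.startswith("?") else p == t
                    for p, t in zip(pattern, triple)
                ):
                    extended.append(candidate)
        solutions = extended
    return solutions


results = [{"name": s["?name"]} for s in evaluate(QUERY, traverse(ALICE))]
print(results)
```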

Link Traversal has specific challenges

That do not exist in centralized querying

Link queue iteration blocks evaluation

→ Process query during link queue handling

Number of links can become very large

→ Pre-filter links before they enter queue

Different filtering strategies exist

Follow if triple matches query: example

SELECT ?name WHERE {
    <https://alice.pods.org/profile#me> foaf:knows ?person.
    ?person foaf:name ?name.
}

Contents of https://alice.pods.org/profile:
✅ </profile#me> :knows <https://bob.pods.org/profile#me>.
✅ </profile#me> :knows <https://carol.org/#i>.
❌ </profile#me> :likes <https://bob.pods.org/posts/hello-world>.
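A minimal sketch of this filter: a link is only queued if it appears in a triple that matches at least one triple pattern of the query. The prefixes `foaf:` and `ex:`, the absolute IRIs, and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

ALICE = "https://alice.pods.org/profile#me"
DOCUMENT = "https://alice.pods.org/profile"

QUERY: List[Triple] = [
    (ALICE, "foaf:knows", "?person"),
    ("?person", "foaf:name", "?name"),
]

# The three triples of Alice's profile document, with assumed prefixes.
ALICE_DOC: List[Triple] = [
    (ALICE, "foaf:knows", "https://bob.pods.org/profile#me"),
    (ALICE, "foaf:knows", "https://carol.org/#i"),
    (ALICE, "ex:likes", "https://bob.pods.org/posts/hello-world"),
]


def matches(pattern: Triple, triple: Triple) -> bool:
    """A data triple matches a pattern if every non-variable term is equal."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def links_to_follow(triples: List[Triple], current: str, query: List[Triple]) -> List[str]:
    """Return documents linked from triples that match some query pattern."""
    links: List[str] = []
    for triple in triples:
        if not any(matches(pattern, triple) for pattern in query):
            continue  # e.g. the ex:likes triple is skipped entirely
        for term in triple:
            if term.startswith("http"):
                doc = term.split("#", 1)[0]
                if doc != current and doc not in links:
                    links.append(doc)
    return links


followed = links_to_follow(ALICE_DOC, DOCUMENT, QUERY)
print(followed)
```

The post `https://bob.pods.org/posts/hello-world` never enters the link queue, because no pattern in the query matches the triple that mentions it.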

Filtering links and query semantics

Traditional query planning does not work

→ Zero-knowledge query planning

Score-based heuristics for determining triple pattern order in BGPs
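To illustrate the idea, here is a sketch of one possible scoring scheme. It is not the heuristic used in the lecture: the weights and function names are assumptions. Without statistics about the data, the planner can only look at the shape of each pattern, so this version prefers patterns with more constants and patterns that reuse variables bound by patterns already placed.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

ALICE = "https://alice.pods.org/profile#me"


def score(pattern: Triple, bound: Set[str]) -> int:
    """Hypothetical score: 2 per constant term, 1 per already-bound variable."""
    total = 0
    for term in pattern:
        if not term.startswith("?"):
            total += 2
        elif term in bound:
            total += 1
    return total


def order_patterns(patterns: List[Triple]) -> List[Triple]:
    """Greedily pick the highest-scoring remaining pattern, then update bindings."""
    remaining = list(patterns)
    ordered: List[Triple] = []
    bound: Set[str] = set()
    while remaining:
        best = max(remaining, key=lambda p: score(p, bound))
        remaining.remove(best)
        ordered.append(best)
        bound.update(term for term in best if term.startswith("?"))
    return ordered


bgp = [
    ("?person", "foaf:name", "?name"),
    (ALICE, "foaf:knows", "?person"),
]
plan = order_patterns(bgp)
print(plan)
```

Here the pattern with the known subject IRI is placed first, since it has two constants instead of one and gives traversal a concrete document to start from.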

Link Traversal: too slow for querying over Linked Open Data

Link Traversal becomes feasible with structural assumptions

Solid pods follow structural properties

Conclusions

Single-pod queries are fast, but multi-pod queries can lead to link overload

Current techniques for following cross-pod links are unselective

[Figure panels: link queue for Discover 4; link queue for Complex 2]

If pods expose more information, complex querying can become faster