Optimizing Approximate Membership Metadata in Triple Pattern Fragments

Ruben Taelman

SSWS 2020, 2 November 2020

Optimizing Approximate Membership Metadata in Triple Pattern Fragments

Ghent University – imec – IDLab, Belgium

Triple Pattern Fragments (TPF)

A low-cost triple pattern interface

high client cost high availability high bandwidth high server cost low availability low bandwidth data dump subject page Triple Pattern Fragments SPARQL endpoints (R. Verborgh 2014)

TPF leads to longer execution times

Example of membership requests

Input query

SELECT ?p WHERE {
  ?p dbo:birthPlace dbr:York. # Evaluate first
  ?p a dbo:Artist.
}

Bind results to remaining query → membership queries

dbp:Albert_Joseph_Moore    a dbo:Artist
dbp:Charles_Francis_Hansom a dbo:Artist
dbp:Dustin_Gee             a dbo:Artist
... (207 in total on DBpedia)

→ 207 HTTP membership requests

Approximate membership filters (AMFs) to reduce HTTP requests

Example with AMF

Input query

SELECT ?p WHERE {
  ?p dbo:birthPlace dbr:York. # Evaluate first
  ?p a dbo:Artist. # Download AMF
}

Run results through AMF of second pattern

AMF(dbp:Albert_Joseph_Moore)
AMF(dbp:Charles_Francis_Hansom)
AMF(dbp:Dustin_Gee)
...

207 32 HTTP membership requests

AMFs reduce number of HTTP requests

Goal of this work:
How to reduce query execution time with AMFs over TPF?

Investigated directions for optimization

Subselection, more in the article

1. Using AMFs higher up in query plan reduces time for most queries

None: Default client-side TPF algorithm
Triple: Existing triple-level AMF algoritm
BGPSimple, BGPCombined, TripleBGP: New higher-level algorithms

Speedup for 15 / 20 queries

WatDiv (dataset scale: 100; query count: 5)

1. Using AMFs higher up in query plan changes result arrival times

Delayed time to first result
AMFs used for higher-level algorithms are larger → more HTTP delay
Faster result rate

2. Any form of caching significantly reduces query execution time

HTTP: Server-wide NGINX cache
Filter: Only caching AMFs

3. No clear winner among the different false-positive probabilities



Optimal probability depends on dataset
WatDiv: 1/64

Experiments are fully reproducible

Findings