Getting started with RDF SPARQL queries and inference using Apache Jena Fuseki

This post describes how to get started using the Apache Jena RDF Server (Fuseki) to perform basic SPARQL queries and inference.

Apache Jena is an open-source framework for building semantic web applications. Fuseki is the SPARQL server component of Jena; it lets you store and query RDF data using the SPARQL query language.

This post takes you through the steps to set up a Fuseki 2 server, configure it to perform basic inference, and run some SPARQL queries.

This example was inspired by Jörg Baach’s Getting started with Jena, Fuseki and OWL Reasoning.

Set up Fuseki Server

Download and unzip the Fuseki server from the Jena download site. The version used in this post is Fuseki 2.5.0.
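For example, from the command line (a sketch: the archive URL and the name of the unpacked directory are assumptions, so adjust them for your mirror and version):

# Download and unpack Fuseki 2.5.0 (assumed archive URL)
wget https://archive.apache.org/dist/jena/binaries/apache-jena-fuseki-2.5.0.tar.gz
tar -xzf apache-jena-fuseki-2.5.0.tar.gz

# Rename the unpacked directory to 'fuseki' to match the paths used in this post
mv apache-jena-fuseki-2.5.0 fuseki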

In the fuseki directory, run the server:

cd fuseki
./fuseki-server

Browse to http://localhost:3030. You should see the Fuseki server home page, with a green server status in the top right, and with no datasets on it.

Stop the server (CTRL-C).

Configure Datasets

We are going to configure three datasets by creating files in fuseki/run/configuration. The three datasets illustrate:

  • a persisted dataset without inference
  • an in-memory dataset with inference
  • a persisted dataset with inference

Persisted dataset without inference

Create the configuration for a persisted dataset using Jena TDB with no inference enabled, in a file called fuseki/run/configuration/animals_no_inf.ttl. Replace <path> in the dataset section with the path to your fuseki directory:

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service1  a                          fuseki:Service ;
        fuseki:dataset                :dataset ;
        fuseki:name                   "animals_no_inf" ;
        fuseki:serviceQuery           "query" , "sparql" ;   # SPARQL query service
        fuseki:serviceReadGraphStore  "get" ;                # Graph store protocol (read only)
        fuseki:serviceReadWriteGraphStore "data" ;           # Graph store protocol (read/write)
        fuseki:serviceUpdate          "update" ;             # SPARQL update service
        fuseki:serviceUpload          "upload" .             # File upload

:dataset
        a             tdb:DatasetTDB ;
        tdb:location  "<path>/fuseki/run/databases/animals_no_inf" .

The @prefix lines set up prefixes for the schemas used in the definition. The :service1 block sets up a service to access the dataset, exposing SPARQL query and update endpoints, the graph store protocol, and file upload. The :dataset block declares the dataset as a TDB store and gives the location where it should be stored.

Note that the run directory is only created when you start the server. If you skipped starting the server, you will need to create the directories manually.
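For example (these paths match the ones used in the configurations in this post):

mkdir -p fuseki/run/configuration
mkdir -p fuseki/run/databases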

In-memory dataset with inference

Create an in-memory dataset with inference enabled, in the file fuseki/run/configuration/animals_in_mem.ttl:

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service1  a                          fuseki:Service ;
        fuseki:dataset                :dataset ;
        fuseki:name                   "animals_in_mem" ;
        fuseki:serviceQuery           "query" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadWriteGraphStore  "data" ;
        fuseki:serviceUpdate          "update" ;
        fuseki:serviceUpload          "upload" .

:dataset  a     ja:RDFDataset ;
  ja:defaultGraph <#model_inf_1> ;
  .

<#model_inf_1> rdfs:label "Inf-1" ;
        ja:reasoner
            [ ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ] .

The prefixes and the service block are as above, apart from the service name. The :dataset block declares the dataset and sets its default graph to the inference model, and the <#model_inf_1> block sets up an inference model using the Jena OWL FB rule reasoner.

Persisted dataset with inference

Create a TDB (persisted) dataset with inference enabled, in the file fuseki/run/configuration/animals.ttl, replacing <path> with the path to your fuseki directory:

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service_tdb_all  a                   fuseki:Service ;
        fuseki:dataset                :dataset ;
        fuseki:name                   "animals" ;
        fuseki:serviceQuery           "query" , "sparql" ;
        fuseki:serviceReadGraphStore  "get" ;
        fuseki:serviceReadWriteGraphStore "data" ;
        fuseki:serviceUpdate          "update" ;
        fuseki:serviceUpload          "upload" .

:dataset a ja:RDFDataset ;
    ja:defaultGraph       <#model_inf> ;
     .

<#model_inf> a ja:InfModel ;
     ja:baseModel <#graph> ;
     ja:reasoner [
         ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner>
     ] .

<#graph> rdf:type tdb:GraphTDB ;
  tdb:dataset :tdb_dataset_readwrite .

:tdb_dataset_readwrite
        a             tdb:DatasetTDB ;
        tdb:location  "<path>/fuseki/run/databases/animals"
        .

The prefixes and the service block are as above. The :dataset block declares the dataset and sets its default graph to the inference model. The <#model_inf> block sets up an inference model whose base model is a TDB graph, and the <#graph> and :tdb_dataset_readwrite blocks define that graph and the TDB dataset in which it is persisted.

Create datasets

Start the server again. You should see it load the configuration files and register the new datasets:

./fuseki-server
...
[2017-03-02 16:56:50] Config     INFO  Load configuration: file:///Users/cdraper/cortex/fuseki2/run/configuration/animals.ttl
[2017-03-02 16:56:51] Config     INFO  Load configuration: file:///Users/cdraper/cortex/fuseki2/run/configuration/animals_in_mem.ttl
[2017-03-02 16:56:51] Config     INFO  Load configuration: file:///Users/cdraper/cortex/fuseki2/run/configuration/animals_no_inf.ttl
[2017-03-02 16:56:51] Config     INFO  Register: /animals
[2017-03-02 16:56:51] Config     INFO  Register: /animals_in_mem
[2017-03-02 16:56:51] Config     INFO  Register: /animals_no_inf

If you get errors, make sure you replaced the <path> correctly in the two persisted dataset configurations.
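You can also list the registered datasets from the command line using Fuseki's administration endpoint, which returns a JSON description of each dataset (a quick check, equivalent to looking at the UI):

curl 'http://localhost:3030/$/datasets'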

Refresh the UI at http://localhost:3030. You should now see the three datasets. Click ‘add data’ for the ‘animals_no_inf’ dataset.

We could load data from a file, but we are going to use a SPARQL update instead. Go to the query tab and make sure the dataset dropdown is set to ‘animals_no_inf’.

Change the SPARQL endpoint to: http://localhost:3030/animals_no_inf/update

Paste the insert command into the command area, overwriting anything that is there:

PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>
PREFIX zoo:   <http://example.org/zoo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT DATA {
ex:dog1    rdf:type         ex:animal .
ex:cat1    rdf:type         ex:cat .
ex:cat     rdfs:subClassOf  ex:animal .
zoo:host   rdfs:range       ex:animal .
ex:zoo1    zoo:host         ex:cat2 .
ex:cat3    owl:sameAs       ex:cat2 .
}

The PREFIX lines set up prefixes for the schemas used in the insert command, and the body of the INSERT DATA block contains the six RDF triples for this example.

Execute the command by clicking on the arrow. You should get a result in the lower part of the UI indicating a successful update.

Repeat for the other two datasets by changing the dataset dropdown and switching the endpoint to the corresponding ‘update’ URL.
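The same update can also be sent from the command line, since each update endpoint follows the standard SPARQL 1.1 protocol. A rough equivalent of the UI steps above (swap the dataset name in the URL to load the other datasets):

# Insert the example triples via the 'animals' update endpoint
curl http://localhost:3030/animals/update \
  --data-urlencode 'update=
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>
PREFIX zoo:  <http://example.org/zoo/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
INSERT DATA {
  ex:dog1  rdf:type        ex:animal .
  ex:cat1  rdf:type        ex:cat .
  ex:cat   rdfs:subClassOf ex:animal .
  zoo:host rdfs:range      ex:animal .
  ex:zoo1  zoo:host        ex:cat2 .
  ex:cat3  owl:sameAs      ex:cat2 .
}'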

Query datasets

In the ‘animals_no_inf’ dataset, change the SPARQL endpoint to: http://localhost:3030/animals_no_inf/sparql and run the following:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/>
SELECT * WHERE {
  ?sub a ex:animal
}
LIMIT 10

You should get a single result ‘ex:dog1’. This is because ‘dog1’ is the only subject explicitly identified as being of type ‘ex:animal’.

Now repeat the query for the ‘animals’ dataset (make sure to change the endpoint to http://localhost:3030/animals/sparql). You should get the full set of results, including the inferred ones:

ex:dog1
ex:cat2
ex:cat3
ex:cat1

‘cat2’ is inferred to be an animal because it is the object of a ‘zoo:host’ predicate, and because the objects of this predicate are declared to be of type animal by the ‘rdfs:range’ triple. ‘cat1’ is inferred to be an animal because it is of type ‘cat’, which is declared to be a subclass of animal. ‘cat3’ is an animal because it is declared to be the same as ‘cat2’.
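To compare the two behaviours from the command line, the same query can be sent to each dataset's query endpoint over HTTP (a sketch using the standard SPARQL protocol; the results come back as JSON):

# Without inference: expect only ex:dog1
curl http://localhost:3030/animals_no_inf/sparql \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=PREFIX ex: <http://example.org/> SELECT * WHERE { ?sub a ex:animal } LIMIT 10'

# With inference: expect dog1, cat1, cat2 and cat3
curl http://localhost:3030/animals/sparql \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=PREFIX ex: <http://example.org/> SELECT * WHERE { ?sub a ex:animal } LIMIT 10'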

Inference Rules

All of the inference in the above is encoded in triples that are imported by the OWL reasoner that the ‘animals’ dataset is configured to use.

In the browser, go to the ‘Dataset > Edit’ tab. For the ‘animals_no_inf’ dataset, click on ‘List current graphs’ and then click on the default graph. You will see the six triples that we inserted.

Then go to one of the other datasets and repeat. You will see that there are over 600 triples, consisting of the ontology rules being used for the inference.
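A quick way to see the size difference from the command line is to count the triples visible through each query endpoint (a sketch):

# Only the six inserted triples in the plain dataset ...
curl http://localhost:3030/animals_no_inf/sparql \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }'

# ... and several hundred (the ontology triples plus the inferred ones) in the inference-backed dataset
curl http://localhost:3030/animals/sparql \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }'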
