Querying RDF with SPARQL

Introduction

In this part of the assignment, you will create some SPARQL queries over your FOAF files and related data.

A good introduction to SPARQL (again by Leigh Dodds) is this tutorial. You can look at the W3C Recommendation for SPARQL. You might also want to consult this handy cheatsheet for SPARQL syntax.

Finally, here is more tutorial material on SPARQL:

Implementations

There are a number of SPARQL implementations around. So far, the best one I have found is ARQ, which is based on the Jena toolkit. ARQ is very easy to install: see the ARQ download page and read the accompanying documentation. The ARQ Tutorial is also pretty good, and links to a few other resources. In the rest of this document, I will assume that you are going to use ARQ.

Unfortunately, the Python-based rdflib implementation of SPARQL is incomplete and I don’t recommend using it for this exercise.

If you don’t want to download any files (e.g., ARQ), you could instead try the web-based SPARQLing query interface (screenshots of SPARQLing query and SPARQLing results). Be warned that the XSLT transform file suggested on the SPARQLing web page is no longer accessible; use http://www.w3.org/TR/rdf-sparql-XMLres/result-to-html.xsl instead.

ARQ Installation and Usage

One of the easiest ways to run SPARQL queries is using ARQ, which is a wrapper round Jena. ARQ is available on DICE at:

/usr/share/java/jena/bin/arq

ARQ is very easy to install; see http://jena.sourceforge.net/ARQ for more information, including download instructions and a nice tutorial.

Here are the installation instructions if you download your own copy.

  1. First, you need to set the environment variable JENAROOT. If you are in the root of the unzipped distribution, you can do this:

    export JENAROOT=$PWD

    On DICE, you’d do:

    export JENAROOT=/usr/share/java/jena

  2. Second, if it is your own installation, ensure all scripts are executable:

    chmod u+x $JENAROOT/bin/*

Querying

Here’s an example of a simple SPARQL query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name1  ?name2
FROM <http://homepages.inf.ed.ac.uk/ewan/foaf.rdf>
WHERE { 
      ?person1 foaf:knows ?person2 .
      ?person1 foaf:name  ?name1 .
      ?person2 foaf:name  ?name2 .
      }

Most of this should be familiar to you, but the FROM clause may be new. This says that the query should be run against the RDF data to be found at http://homepages.inf.ed.ac.uk/ewan/foaf.rdf. Note that the URI has to be addressable via HTTP when the query is executed.

Let’s assume that we have installed ARQ in a directory ARQ-2.6.2 and saved the query above in a file example-01.rq. To call the ARQ SPARQL query engine, we use the following command on the command line:

% ARQ-2.6.2/bin/sparql --query example-01.rq

Given my FOAF file, ARQ-2.6.2/bin/sparql will print the following output to the terminal:

---------------------------------
| name1        | name2          |
=================================
| "Ewan Klein" | "Harry Halpin" |
---------------------------------
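To see why this query yields exactly one row, it can help to mimic the matching of triple patterns by hand. The following is a minimal Python sketch, not a description of how ARQ works internally. The three triples and the identifiers :ehk and :hh are made-up stand-ins for the contents of a FOAF file, and the helper names match_pattern and solve are hypothetical.

```python
# Toy illustration of matching triple patterns against a graph.
# The data below is invented for the example; it is not read from any FOAF file.

def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Returns the extended binding, or None if the match fails.
    """
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            # A constant term that does not match.
            return None
    return new_binding


def solve(patterns, graph):
    """Return every binding that satisfies all the triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


graph = {
    (":ehk", "foaf:knows", ":hh"),
    (":ehk", "foaf:name", "Ewan Klein"),
    (":hh", "foaf:name", "Harry Halpin"),
}

# The WHERE clause of the query, one tuple per triple pattern.
where = [
    ("?person1", "foaf:knows", "?person2"),
    ("?person1", "foaf:name", "?name1"),
    ("?person2", "foaf:name", "?name2"),
]

# SELECT ?name1 ?name2 keeps just these two variables from each solution.
rows = [(b["?name1"], b["?name2"]) for b in solve(where, graph)]
print(rows)
```

Because ?person1 and ?person2 occur in more than one pattern, each has to take the same value everywhere it appears, and that is what ties the names to the foaf:knows relationship.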

We are allowed to have more than one FROM clause in a query, and the resulting graphs are merged. This is shown in the next example, where we query both my FOAF file and Harry Halpin’s.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name1  ?name2
FROM <http://homepages.inf.ed.ac.uk/ewan/foaf.rdf>
FROM <http://www.ibiblio.org/hhalpin/foaf.rdf>
WHERE { 
      ?person1 foaf:name ?name1 ;
               foaf:knows [ foaf:name ?name2 ];
      }

As you can observe here, SPARQL triple patterns allow the same abbreviatory syntax as we have already seen for N3, though strictly speaking SPARQL uses a subset of N3 called Turtle.

Running the query gives the following result set:

------------------------------------
| name1          | name2           |
====================================
| "Harry Halpin" | "Ivan Herman"   |
| "Harry Halpin" | "Dan Connolly"  |
| "Harry Halpin" | "Ian Davis"     |
| "Harry Halpin" | "Paolo Bouquet" |
| "Ewan Klein"   | "Harry Halpin"  |
------------------------------------
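One way to picture the merge is as a set union of the triples in each file. The Python sketch below assumes toy data with made-up identifiers (:ehk, :hh, :ih) and a hypothetical helper names_of_known; it also assumes every node is named, since merging graphs that contain blank nodes needs extra care that is not shown here.

```python
# Toy illustration of querying a merged graph; the triples are invented.

def names_of_known(graph):
    """Return sorted (name1, name2) pairs where person1 foaf:knows person2.

    Assumes each person has at most one foaf:name in the graph.
    """
    names = {s: o for s, p, o in graph if p == "foaf:name"}
    return sorted(
        (names[s], names[o])
        for s, p, o in graph
        if p == "foaf:knows" and s in names and o in names
    )


# Stand-in for the first FOAF file.
graph_a = {
    (":ehk", "foaf:knows", ":hh"),
    (":ehk", "foaf:name", "Ewan Klein"),
    (":hh", "foaf:name", "Harry Halpin"),
}

# Stand-in for the second FOAF file.
graph_b = {
    (":hh", "foaf:name", "Harry Halpin"),
    (":hh", "foaf:knows", ":ih"),
    (":ih", "foaf:name", "Ivan Herman"),
}

# Two FROM clauses: the query runs over the union of the two graphs.
merged = graph_a | graph_b

print(names_of_known(graph_a))
print(names_of_known(merged))
```

The triple giving Harry Halpin's name occurs in both toy files but only once in the merged graph, because a graph is a set of triples.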

The next query is run against my facts-plus-ontology file for bread. In this case, I want to recover instances that either have rdf:type masws:Bread or else have rdf:type C, where C is a subclass of masws:Bread. In order to match these two alternatives, we use the UNION keyword, as shown here:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX masws: <http://www.inf.ed.ac.uk/teaching/courses/masws/ontology#>

SELECT ?bread ?class ?tags
FROM <http://homepages.inf.ed.ac.uk/ewan/masws/rdf/combined.rdf>
WHERE 
    {
      ?bread dc:subject ?tags .
      {
       ?bread a ?class .
       ?bread a masws:Bread .
      }
      UNION
      {
       ?bread a ?class.
       ?class rdfs:subClassOf masws:Bread .
      }
    } 

Note that the first occurrence of the pattern ?bread a ?class does not restrict the selected values; it just allows us to display a ?class value for every combination of ?bread with ?tags. This is shown in the results:

---------------------------------------------------------------------------------------------
| bread                 | class           | tags                                            |
=============================================================================================
| masws:multigrainbread | masws:Bread     | "bread multigrain spelt barley kamut rye wheat" |
| masws:oatmealbread    | masws:Bread     | "bread wholegrain wheat"                        |
| masws:ciabatta        | masws:Ciabatta  | "bread ciabatta sourdough"                      |
| masws:sourdoughbread  | masws:Sourdough | "bread sourdough rye"                           |
---------------------------------------------------------------------------------------------
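The effect of UNION can be sketched as follows: each branch is matched on its own, and the two lists of solutions are put together before being joined with the remaining pattern. This is a rough Python sketch under stated assumptions. The data is a cut-down, partly invented version of the bread facts (masws:croissant and masws:Pastry are hypothetical and appear only to show something being filtered out), and the function names are made up.

```python
# Toy illustration of UNION; the facts below are illustrative, not the real combined.rdf.

# (instance, class) pairs, standing in for rdf:type triples.
types = {
    ("masws:multigrainbread", "masws:Bread"),
    ("masws:ciabatta", "masws:Ciabatta"),
    ("masws:croissant", "masws:Pastry"),
}

# (subclass, superclass) pairs, standing in for rdfs:subClassOf triples.
subclass_of = {("masws:Ciabatta", "masws:Bread")}

# instance -> tags, standing in for dc:subject triples.
tags = {
    "masws:multigrainbread": "bread multigrain spelt barley kamut rye wheat",
    "masws:ciabatta": "bread ciabatta sourdough",
    "masws:croissant": "pastry butter",
}


def branch_direct(types):
    """First branch: ?bread a ?class . ?bread a masws:Bread ."""
    return [
        (thing, cls)
        for thing, cls in sorted(types)
        if (thing, "masws:Bread") in types
    ]


def branch_subclass(types, subclass_of):
    """Second branch: ?bread a ?class . ?class rdfs:subClassOf masws:Bread ."""
    return [
        (thing, cls)
        for thing, cls in sorted(types)
        if (cls, "masws:Bread") in subclass_of
    ]


# UNION: the solutions of both branches are combined.
solutions = branch_direct(types) + branch_subclass(types, subclass_of)

# Join with ?bread dc:subject ?tags.
rows = [(thing, cls, tags[thing]) for thing, cls in solutions if thing in tags]
for row in rows:
    print(row)
```

In this toy data the croissant matches neither branch, so it is absent from the result even though it has tags.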

So far, we have just focussed on SELECT, which returns a table of results. However, we can also use SPARQL’s CONSTRUCT keyword to build a new RDF graph for us, based on a graph template which may contain variables. This is illustrated in the next example.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dlcs: <http://del.icio.us/>
PREFIX masws: <http://www.inf.ed.ac.uk/teaching/courses/masws/ontology#>
PREFIX : <http://homepages.inf.ed.ac.uk/ewan/foaf.rdf#>

CONSTRUCT { ?person foaf:interest ?href .
            ?person foaf:name ?name . }

FROM <http://homepages.inf.ed.ac.uk/ewan/foaf.rdf>
FROM <http://homepages.inf.ed.ac.uk/ewan/masws/rdf/combined.rdf>

WHERE {
      ?person foaf:interest ?topic .
      ?topic foaf:maker ?person .
      ?person foaf:name ?name .
      ?subj dlcs:href ?href .
          {
          ?subj a masws:Bread .
          ?subj dlcs:href ?href .
          }
          UNION
          {
          ?subj a ?class.
          ?class rdfs:subClassOf masws:Bread .
          }
      } 

Like the previous example, we look for resources which are instances of masws:Bread or of a subclass of masws:Bread. However, there are a number of other conditions. Thus, ?person foaf:interest ?topic, and ?topic foaf:maker ?person will be satisfied if there is a person ?person whose foaf:interest is some resource which has a representation of which ?person is the foaf:maker.

Here is the graph, in Turtle syntax, that is returned by the query:

@prefix masws:   <http://www.inf.ed.ac.uk/teaching/courses/masws/ontology#> .
@prefix dlcs:    <http://del.icio.us/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix :        <http://homepages.inf.ed.ac.uk/ewan/foaf.rdf#> .

:ehk
      foaf:interest  <http://www.armchair.com/recipe/ryebread.html> ;
      foaf:interest  <http://www.all-creatures.org/recipes/bread-mg-sbkrw.html> ;
      foaf:interest  <http://www.recipezaar.com/107986> ;
      foaf:interest  <http://www.sourdoughhome.com/ciabatta.html> ;
      foaf:name     "Ewan Klein" .
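Notice that foaf:name appears only once in the output, even though the WHERE clause produces one solution per ?href. The following Python sketch suggests why, assuming that CONSTRUCT fills in the template once per solution and collects the resulting triples into a set. The function names and the two hand-written solutions are hypothetical; a real engine computes the solutions from the WHERE clause.

```python
# Toy illustration of CONSTRUCT: instantiate a graph template once per solution.

def instantiate(triple, binding):
    """Replace the variables in one template triple; return None if any is unbound."""
    result = []
    for term in triple:
        if term.startswith("?"):
            if term not in binding:
                return None
            result.append(binding[term])
        else:
            result.append(term)
    return tuple(result)


def construct(template, solutions):
    """Build a graph (a set of triples) from a template and a list of solutions."""
    graph = set()
    for binding in solutions:
        for triple in template:
            new_triple = instantiate(triple, binding)
            if new_triple is not None:
                graph.add(new_triple)
    return graph


# The CONSTRUCT template from the query above.
template = [
    ("?person", "foaf:interest", "?href"),
    ("?person", "foaf:name", "?name"),
]

# Two hand-written solutions, standing in for the output of the WHERE clause.
solutions = [
    {"?person": ":ehk", "?name": "Ewan Klein",
     "?href": "http://www.armchair.com/recipe/ryebread.html"},
    {"?person": ":ehk", "?name": "Ewan Klein",
     "?href": "http://www.sourdoughhome.com/ciabatta.html"},
]

graph = construct(template, solutions)
for triple in sorted(graph):
    print(triple)
```

Both solutions produce the same foaf:name triple, and because the result is a set, it is kept only once.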

The task

For this part of the assignment, I want you to do the following:

  1. Write a SPARQL query, using SELECT, which gives the names of all the people that you know and that your MASWS partner knows. Use Turtle abbreviations in your graph patterns. In order to execute this query, you will need to inspect both your FOAF file and the FOAF file of your partner.
  2. Write a SPARQL query using CONSTRUCT similar to the one given above, which creates a graph that captures some aspects of your partner’s interests as shown by their counterpart of my combined.rdf file. Make sure that your WHERE clause uses their RDFS ontology as a way of further specifying the results. You don’t have to follow the example I gave; there are many other ways of building interesting queries! Again, use Turtle abbreviations in your graph patterns.

Hand-in: A hard-copy version of your SPARQL queries, plus the trace of the results returned from running them.

Intended Learning Outcomes

On completion, you should be able to:

  1. understand and process RDF data created by other people;
  2. explain the general role of SPARQL in extracting information from RDF data;
  3. explain the use of the SPARQL keywords PREFIX, FROM, WHERE, UNION in constructing queries and the use of SELECT and CONSTRUCT for presenting query results;
  4. demonstrate an understanding of syntactic abbreviations in N3/Turtle;
  5. demonstrate an understanding of how graph merging as a result of multiple FROM statements affects the query results;
  6. explain how RDFS ontology statements can be used to provide useful semantic structure in recovering information from RDF data.
