Professor: Leonid Libkin

While there are no formal prerequisites, it is assumed that students
taking this course are familiar with the topics taught in a standard
database course such
as DBS
in Edinburgh. Concepts such as relational calculus (i.e., first-order logic), algebra,
constraints (keys, foreign keys, functional dependencies), basics of
SQL programming, basics of database design, and basics of relational
query processing will **not** be covered in lectures: they are part
of the background knowledge. Standard notions related to computability
such as finite automata, grammars, and Turing machines are supposed to
be part of the background knowledge as well (for example, in
Informatics at Edinburgh, this is generally
taught in the first year of the undergraduate program).

**Essay and presentation** The coursework consists of an essay and
in-class presentation on the last day of lectures (31 July). The mark
is determined solely by the essay but the presentation is compulsory,
that is, essays will only be marked if students did a
presentation. The rationale for this is that the course is short and
intense and students have little time for presentation, hence its role
is downsized.

**Presentation guidelines** Each presentation should be around 7-8
minutes; students are expected to say which paper from the reading
list they have chosen, why they find it interesting, and give a very
brief (one or two slides at most) overview.

Please prepare you slides as a PDF file (better than powerpoint: with
PDF viewers there are fewer surprises and it allows faster switches
between presenters; of course you can always save powerpoint or
keynote as pdf) and have it with you on a usb
stick. We shall go in the alphabetical (by last name) order, so that the
last two 50-minute sessions will be sufficient for all the students to
go over their presentation.

**Essay guidelines** The course is split into several topics
(see below), each with a set of associated papers from the reading
list (see the link below). Each essay should
pick a paper from the list and present:

- a summary of the paper, and
- your own analysis and critical thoughts. This may be a criticism of the paper, your ideas for extending its results, or analysis of followup papers that show how the ideas of the paper under review have influenced the field. You may also consider an implementation of a theoretical algorithm with performance analysis, or an extension of some of the results to cover new cases, or an improvement for an existing solution, perhaps under some restrictions.
- Crucially, your essay must be understood by someone who
has
**not**read the paper. Of course in reality it will be marked by someone who has (although in some cases, a while ago), but still this is an important criterion that will be used in marking.

Deadline: 13 August 2018, 5pm Beijing time.

Please email a PDF file (no .doc, no .tex source etc, just pdf) to me, to

Course materials

**Main topics**

- Foundations: conjunctive queries, constraints/chase, datalog/recursion
- Understanding SQL as a data analytics tool
- Scalable query answering
- Approximation of queries
- Graph databases: theory, and the Cypher language of Neo4j
- RDF and SPARQL
- Incomplete information
- Inconsistent data and consistent query answering
- Data integration and exchange
- Ontologies and querying

**Background reading**

- Materials of the Database Systems course (3rd year)
- Materials of the Computation and Logic course (1st year)
- Abiteboul, Hull, Vianu,
*Foundations of Databases*, 1995 - Books on data exchange and finite model theory (chapters 6-9)

**Lecture materials**

- General course info
- Background: relational model
- Conjunctive queries, chase, datalog
- SQL: surprises and semantics
- Volume: scalability, approximations
- Graph databases: theory
- Graph databases: Neo4j and Cypher
- RDF databases
- XML (if time permits)
- Incomplete and Inconsistent data
- Data integration and exchange
- Ontologies and querying