Applied Database Systems - Tutorial 1

This tutorial is a revised version of Lena Hansson and Mark Longair from previous years of this course. Please let me know of any errors you find in this tutorial.

This tutorial consists of 2 sections:

 PostgreSQL Basics
This section of the tutorial consists of 6 parts:
Using PostgreSQL on DICE

Access to the PostgreSQL database server on DICE for teaching purposes is restricted to those people doing relevant courses, and you must register in order to use the system.

Your first step should be to read and follow the instructions on the PostgreSQL User Self-Service Guide page, which will explain you how to register for a PostgreSQL account. The computer support people assured us that PostgreSQL accounts should be available within 24 hours of registering for the AD course. If you cannot get a PostrgeSQL account after 24 hours, please fill out computing support request.

If everything has gone well, you should now be able to interact with PostgreSQL by running the command:

psql -h pgteach

Try this now; it should bring up a prompt that looks like this:

Welcome to psql 8.1.11, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

sXXXXXXX=>

If that doesn't happen, and you get an error like "psql: krb5_sendauth: Ticket expired" then your Kerberos ticket has expired. You can renew it by typing:

renc

... and entering your password when prompted. If you have other problems at this stage, something may have gone wrong in your initial registration, and you should go back to the PostgreSQL User Self-Service Guide page and check to see if your problem is discussed there.

The following sections describe the basics of interacting with the PostgreSQL terminal and contain a variety of tips which are hopefully useful for anyone who is unfamiliar with command-line programs of this type.
Basic Commands

Now that you are properly registered, you shouldn't have to worry about any of the above again. If you ran psql -h pgteach as above, then you will be faced with the PostgreSQL prompt. In order to exit from the interpreter you can type \q or press ctrl-D at an empty prompt.

Some other useful commands you may need include:

  • \d to show all ("describe") tables in the database
  • \d <tablename> to describe a particular table
  • \h to see all SQL commands
  • \? to see all PostgreSQL commands

A common source of confusion when using PostgreSQL interactively is that if you make a mistake, such as not finishing your SQL statement with a semi-colon, then the command prompt may change to -> instead of =>, indicating that the interpreter thinks that you are still typing the previous command. To return to the top level, use Ctrl-C. To give an example, the normal top level interpreter prompt looks like:

sXXXXXXX=>

... and the continuation prompt looks like this:

sXXXXXXX->
Using External Files with PostgreSQL

Using the text editor of your choice (e.g. Emacs, XEmacs or one from the Start Menu), enter the SQL commands just as you would at the PostgreSQL prompt, making sure to complete all the commands with a semi-colon (';'). Save that file as, for example, hello.sql.

If you saved that file in your home directory then you can just open a terminal window, start the interactive terminal with psql -h pgteach and perform the commands in the file by typing:

\i hello.sql

(replacing 'hello.sql' with whatever you called your file.) If you saved the file elsewhere, change to that directory before running the interpreter, or enter the full file name after \i instead of just 'hello.sql'.

Redirecting Output to External Files

You can redirect the output of a query into an external file using the \o command. In order to save the output of a query into a file name myoutput.txt enter the following command before you execute the query:

\o myoutput.txt

All the output will now be stored in myoutput.txt instead of being displayed on the screen. To stop output redirection simply type:

\o
Retrieving and Editing Old Commands

Use the up-arrow to return to previous typed commands. This can be used multiple times to access even older commands.

Copying and Pasting in X Windows

Mark the text you wish to copy by clicking the left mouse button and dragging until the "darker region" covers the text you wish to copy. Then go to the application (and place) where you want to paste the text. There, click the wheel on the mouse to paste the text. If the wheel does not work, then try both mouse-buttons at once.

Tab-Completion

If you are typing a command or the name of a file in a terminal window, you may be able to complete the word automatically by hitting the Tab key. In the PostgreSQL terminal you can also use the Tab key to complete the names of tables and so on.

Example 1: I have two tables in my database, one is called "cities" and the other "weather". If I wanted to run the command "SELECT * FROM WEATHER;" I could type "SELECT * FROM W", hit Tab and the W would be completed to WEATHER.

Example 2: I have three tables, the two above and one called WINTER. Should I follow the same steps as above, PostgreSQL will display all the possible completions, in this case WEATHER and WINTER. This means that I have to add an E to my command and then hit Tab again to uniquely complete the word.

 Database Design

PostgreSQL is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. Relation is essentially a mathematical term for table. The notion of storing data in tables is so commonplace today that it might seem inherently obvious, but there are a number of other ways of organizing databases. Files and directories on Unix-like operating systems form an example of a hierarchical database. A more modern development is the object-oriented database.

Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Whereas columns have a fixed order in each row, it is important to remember that SQL does not guarantee the order of the rows within the table in any way (although they can be explicitly sorted for display).

Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster.

This section of the tutorial consists of 4 parts:

CREATE TABLE

You can create a new table by specifying the table name along with all column names and their types. For example:

CREATE TABLE weather (
     city            varchar(80),
     temp_lo         integer,           -- low temperature
     temp_hi         integer,           -- high temperature
     prcp            real,              -- precipitation
     date            date
);

You can enter this into psql with the line breaks. It will recognize that the command is not terminated until the semicolon. Whitespace (such as spaces, tabs, and newlines) may be used pretty freely in SQL commands. That means you can format that command very differently from above in your editor, or even collapse it all onto one line. Two dashes (--) introduce comments, and anything after them is is ignored up to the end of the line. SQL is case insensitive about key words and identifiers, except when identifiers are double-quoted to preserve the case (that is rarely done, however). It is good practice to keep SQL commands in upper case, in order to make your queries easier to read.

The expression varchar(80) specifies a data type that can store arbitrary character strings up to 80 characters in length. integer (or just int) is the most common integer type - this type is signed and stored in 4 bytes. real is a type for storing single precision floating-point numbers. date should be self-explanatory. (You may notice here that the column of type date is also named date. This is allowed, but you should avoid exploiting this for obvious reasons.)

The distinction between the time and timestamp types is that timestamp includes both the date and the time of day, whereas time only stores the time of day. You can use the following special values to enter the current date, time or timestamp: CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP.

PostgreSQL supports all the standard SQL types, which are:

  • bit
  • bit varying
  • boolean
  • char
  • character varying
  • character
  • varchar
  • date
  • double precision
  • integer
  • interval
  • numeric
  • decimal
  • real
  • smallint
  • time (with or without time zone)
  • timestamp (with or without time zone)

If you're interested, you can look look these up in the PostgreSQL manual page on datatypes.

PostgreSQL also allows a variety of useful other types including a set of geometric types. You can also define your own types.

The second example will store cities and their associated geographical location:

CREATE TABLE cities (
     name            varchar(80),
     location        point
 );

The point type is an example of a PostgreSQL-specific data type.

If you don't need a table any longer or want to recreate it differently you can remove it using the DROP TABLE command, as described in the following section.

DROP TABLE

In SQL terminology, you "drop" rather than "delete" tables. (The DELETE command is used for remove rows from tables.) To drop a table (or a view, a sequence, a constraint, etc.) you would type:

DROP <type-of-thing-to-drop> <name-of-thing-to-drop>

For example, the following two commands drop the before created tables:

DROP TABLE cities;
DROP TABLE weather;

If you are editing a file with data to include with the \i command in PostgreSQL, it is often convenient to add DROP TABLE commands at the top of the file, so that you don't just end up adding duplicate entries to the tables each time you include the file.

ALTER TABLE

When you create a table and you realize that you made a mistake, or the requirements of the application change, then you can drop the table and create it again. But this is not a convenient option if the table is already filled with data, or if the table is referenced by other database objects (for instance a foreign key constraint). Therefore PostgreSQL provides a family of commands to make modifications to existing tables.

You can

  • Add columns,
  • Remove columns,
  • Add constraints,
  • Remove constraints,
  • Change default values,
  • Change column data types,
  • Rename columns,
  • Rename tables.

All these actions are performed using the ALTER TABLE command.

The following command will rename a column:

ALTER TABLE weather
RENAME COLUMN prcp TO precipitation;

The following command will set the default value for a column:

ALTER TABLE weather
ALTER COLUMN date SET DEFAULT CURRENT_DATE;

The following command will add a foreign key constraint:

ALTER TABLE weather
ADD FOREIGN KEY (city) REFERENCES cities(name);

Question: Why will the previous command fail if you created table cities as shown above?

Try to find out how to add a primary key to an existing table using the ALTER TABLE command.

Exercises

Recall the following ER diagram:

The CREATE TABLE statements for the translated ER diagram are given in file create_munros.sql. Notet that primary key and foreign key constraints are omitted from the CREATE TABLE statements. Note that constraints may either be specified in the CREATE TABLE command or using ALTER TABLE commands.

  1. Download file create_munros.sql (right click and "Save as...").
  2. Append ALTER TABLE commands to the downloaded file to define primary key and foreign key constraints similar to those shown on the lecture slides.
  3. Execute the commands in create_munros.sql using the \i command.
  4. List the tables in your database.
  5. Show the description of table munros.
  6. Create a file named drop_munros.sql that contains DROP TABLE commands for each of the three tables.
  7. Execute the commands in drop_munros.sql using the \i command.

Question: Why is the order in which the tables are dropped important?

Modify the given ER diagram to include the following entities and relationships:

  • Apart from Munros we also want to include Corbetts in our list of mountains. A Corbett is a separate mountain over 2,500ft. A Corbett has the same attributes as a Munro. Note that a mountain in our list is either a Munro or a Corbett.
  • Each mountain (i.e., Munro or Corbett) is assigned to exactly one region it is located in.
  • Each region has a unique identifier (integer), a name of maximal 120 character, an attribute that shows the number of Munros in that region, and an attribute that shows the number of Corbetts in that region.
  • For each region we assign one of the hikers as the supervisor. A hiker may not supervise more than one region.
  • For each hiker we maintain a list of contact information. A contact is either a telephone number or an e-mail address.Telephone numbers have a country code, an area code, and the actual phone number. An e-mail address contains only the full address itself as an attribute. Furtermore, contacts are numbered uniquely from 1 to N. Each hiker has at least one contact. One of the contacts for each hiker is labeled as the primary contact.

Tip: If you wish to draw your ER diagram on DICE, I suggest that you use dia, a Visio-like package for creating diagrams, or the more venerable xfig. For this exercise you may also just draw your diagram on paper.

Translate the modified ER diagram into a relational schema. Make sure that your schema enforces the given constraints. Store the CREATE TABLE (and ALTER TABLE) commands in a separate file and execute the commands in the file using the \i command.

For more detailed information on PostgreSQL DDL commands refer to the PostgreSQL 8.1 documentation.

End of Tutorial 1. Please send any corrections or suggestions to Heiko Müller.