Boxes introduction and examples

The Boxes tool is for quick checks and cross checks of lists, particularly those containing student numbers or course codes.

Video examples: Introduction and more examples.

Boxes tries not to make too many assumptions about what you need to do, or where your data is coming from, while (hopefully!) making some common tasks possible. If you regularly find yourself doing the same thing in Boxes, a tool or spreadsheet that does exactly what you want would be better.

Before looking at examples, here is an overview of what you’ll do:

  1. Copy text into the boxes from the clipboard with Ctrl-V. This text can contain “junk”, as long as it contains something Boxes can identify, such as course codes, student IDs, or a tab-separated table from copying a webpage or spreadsheet.
  2. The first row of options lets you process/tidy the data in a single box. Examples:
    • extract course codes or student numbers,
    • find or tidy tab-separated tables,
    • remove duplicates, or count how many times they occur.
  3. The second row of options lets you interact with data in another box. When you select which other box to use, the command is executed. Find differences or intersections between two lists, to help notice when items are unexpectedly missing or duplicated.

The commands in the second row let you compare rows based on any student numbers or course codes that they contain. As you’ll see below, when you use these features, you often don’t need to tidy up the data (the first row of options) at all.
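
To make "something Boxes can identify" concrete, here is a minimal Python sketch of pulling course codes out of pasted text. It is not how Boxes is implemented. The pattern (four capital letters followed by five digits) is an assumption based on the codes shown on this page, such as INFR10044, and the function name is made up for illustration.

```python
import re

# Assumption: a course code is four capital letters then five digits,
# as in the examples on this page (e.g. INFR10044).
COURSE_CODE = re.compile(r"\b[A-Z]{4}\d{5}\b")


def extract_course_codes(text):
    """Return the course codes found in text, in order, without repeats."""
    seen = []
    for code in COURSE_CODE.findall(text):
        if code not in seen:
            seen.append(code)
    return seen


# Pasted text can contain "junk" around the codes.
pasted = (
    "some junk INFR10044\t40 credits\tHonours Project (Informatics)\n"
    "more junk\n"
    "INFR11102 and INFR10044 again"
)
print(extract_course_codes(pasted))
```

The surrounding junk is ignored because only substrings that match the pattern are kept.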

Examples:

(These examples used 2018/19 data. To navigate between sessions in the DRPS you might like DRPS arrows.)

Compare two DPTs

Here we copy in two DPTs that share a lot of courses, and check which courses are in only one of them.

  1. Open the 2018/19 DPT for Computer Science (BSc Hons) in a new tab, select and copy the whole page with Ctrl-A Ctrl-C. Then click in Box A of Boxes and paste with Ctrl-V (or right-clicking and selecting paste).

  2. Open the 2018/19 DPT for Informatics (MInf) in a new tab, use the mouse to select the first four years, copy with Ctrl-C, and paste into Box B of Boxes. You don’t need to copy each part separately; you can copy one block of text in one go, including intermediate “junk”.

  3. In Box A select Compare in the first drop-down in the second row. To the right of that, select “Find DRPS codes only in one box.” Then click other box and select Box B.

    Box A should now contain the text below. You can make a text area in Boxes bigger by dragging its bottom right-hand corner.

    # Only in Box A:
    INFR10044   40 credits  Honours Project (Informatics)
    INFR11102   10 credits  Computational Complexity
    # Only in Box B:
    INFR08010   20 credits  Informatics 2D - Reasoning and Agents   Must be passed
    INFR10051   40 credits  MInf Project (Part 1)
    INFR11123   10 credits  Scalable Data Management Systems
    INFR11162   10 credits  Neural Computation

    Some of these differences are expected. A couple might be mistakes to be checked further.
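
The comparison in step 3 can be thought of as a set difference on course codes, carried out in both directions. The sketch below is only an illustration under assumptions: the code pattern is inferred from the examples on this page, the function names are hypothetical, and the shared course row is made up.

```python
import re

# Assumed code format: four capital letters then five digits.
COURSE_CODE = re.compile(r"\b[A-Z]{4}\d{5}\b")


def rows_by_code(text):
    """Map each course code to the first row of text that contains it."""
    rows = {}
    for row in text.splitlines():
        for code in COURSE_CODE.findall(row):
            rows.setdefault(code, row)
    return rows


def only_in_one(text_a, text_b):
    """Return (rows only in A, rows only in B), compared by course code."""
    a, b = rows_by_code(text_a), rows_by_code(text_b)
    only_a = [row for code, row in a.items() if code not in b]
    only_b = [row for code, row in b.items() if code not in a]
    return only_a, only_b


box_a = (
    "Degree Programme Table (junk line)\n"
    "ABCD12345\t10 credits\tShared course (made up)\n"
    "INFR10044\t40 credits\tHonours Project (Informatics)"
)
box_b = (
    "ABCD12345\t10 credits\tShared course (made up)\n"
    "INFR10051\t40 credits\tMInf Project (Part 1)"
)

only_a, only_b = only_in_one(box_a, box_b)
print("# Only in Box A:", *only_a, sep="\n")
print("# Only in Box B:", *only_b, sep="\n")
```

Rows without a code never enter the comparison, which is why the pasted pages do not need tidying first.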

Find what’s not being used on a list

Most of the courses that Informatics runs are currently listed on the MInf degree DPT. Let’s check what’s not there. There are multiple ways to find out; here we prune down the sortable course list and look at what’s left.

  1. Copy-paste the contents of these pages into Boxes: the sortable course list into Box A, and the MInf DPT into Box B.

  2. In the second row of controls for Box A, select “Discard row if Col 2 (EUCLID Code) is part of a row in”, and choose Box B as the other box.

    Aside: Boxes discarded the junk around the table, because those rows didn’t have a column 2. We could have instead said “Discard row if row has DRPS code in”, but we would have retained the junk. We would then clean the results: in the first row select Extract then Rows with tabs (table) or Rows with course codes.

  3. The courses left are those not on the MInf DPT. There are quite a lot left, so we get rid of the bulk of those we know shouldn’t be on the MInf by typing or copy-pasting the following into Box C:

    EPCC
    Distance
    Thesis
    Dissertation
    Project
    Design Informatics

    Then in Box A, change the Discard criterion to “if row contains a row in”, and now choose Box C as the other box. (Alternatively you could have clicked Clear in Box B and reused it.)

    At the time of writing, what was left was:

    Course URL  EUCLID Code     Acronym     AIA     COG     FSS     ML  NS  SE  Level   Points  Year    Delivery    Exam Diet   Work%/Exam%     Lecturer(s)/Coordinator(s)
    Computer Programming Skills and Concepts    INFR08022   CP                          8   20  1   S1  December    20/80   Cristina Alexandru / Ajitha Rajan
    Decision Making in Robots and Autonomous Agents     INFR11090   DMR                             11  10  5   S2  April/May   40/60   Ram Ramamoorthy
    Informatics 1 - Cognitive Science   INFR08020   INF1-CG                             8   20  1   S2  April/May   40/60   Frank Keller / Christopher Lucas / Richard Shillcock
    Informatics Research Review     INFR11136   IRR                             11  10  5   S1      100/0   Bjoern Franke
    Introduction to Java Programming    INFR09021   IJP                             9   10  5   S1      100/0   Paul Anderson
    Introduction to Research in Data Science    INFR11138   IRDS                            11  20  5   S1      100/0   Amos Storkey
    Pervasive Parallelism   INFR11108   PERP                            11  20  5   S1      100/0   Murray Cole
    Robot Learning and Sensorimotor Control     INFR11186   RLSC                            11  10  5   S2  April/May   40/60   Michael Mistry

    CP is for outside students, and INF1-CG doesn’t need to be on the DPT. IRR, IJP, IRDS, and PERP are MSc/CDT-only courses. That leaves DMR and RLSC… maybe they should be on the DPT…
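
The two Discard steps above can be sketched in Python as follows. This is an illustration of the idea, not Boxes' own code: the function names are hypothetical, the sample rows are cut down, and one row is made up. Matching ignores case unless match_case is set, which is an assumption based on the "match case" option mentioned later on this page.

```python
def discard_if_column_in(rows_a, col, rows_b):
    """Drop tab-separated rows of A whose column `col` (1-based) is part of
    any row of B. Rows lacking that column are dropped as well, which is
    what happened to the junk around the table in the example."""
    kept = []
    for row in rows_a:
        cells = row.split("\t")
        if len(cells) < col or not cells[col - 1].strip():
            continue
        key = cells[col - 1].strip()
        if not any(key in other for other in rows_b):
            kept.append(row)
    return kept


def discard_if_contains(rows_a, rows_c, match_case=False):
    """Drop rows of A that contain any non-empty row of C as a substring."""
    def norm(s):
        return s if match_case else s.lower()

    needles = [norm(c.strip()) for c in rows_c if c.strip()]
    return [r for r in rows_a if not any(n in norm(r) for n in needles)]


box_a = [
    "Sortable course list (junk line, no column 2)",
    "Course URL\tEUCLID Code\tAcronym",
    "Neural Computation\tINFR11162",
    "Pervasive Parallelism\tINFR11108\tPERP",
    "Example Dissertation\tABCD12345\tEXD",  # made-up row
]
box_b = ["INFR11162   10 credits  Neural Computation"]
box_c = ["Dissertation", "Project"]

not_on_dpt = discard_if_column_in(box_a, 2, box_b)
remaining = discard_if_contains(not_on_dpt, box_c)
print(*remaining, sep="\n")
```

As in the real example, the table header survives both steps because "EUCLID Code" is not part of any row in the other boxes.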

Using multiple sources in different formats

Imagine I want to find Informatics courses that are listed in the DRPS and on the MInf degree DPT, but not on the sortable course list. Maybe these courses should be on the Informatics sortable list?

There are many ways to do it. Here is one:

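  1. Copy-paste the contents of the pages into Boxes: the DRPS list of Informatics courses into Box A, the MInf DPT into Box B, and the sortable course list into Box C.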

    You can label them in the box banners if you might forget which is which.

  2. In the second row of Box A’s controls, select “Keep row if row has DRPS code in” then click other box and select Box B. Box A now contains only the Informatics courses that appear on the MInf DPT.

  3. Still in the second row of Box A’s controls, change Keep to Discard and select Box C, to remove courses that are in the sortable list.

    We see an initially-surprising long list of results:

    INFR09009       Computer Architecture   Not delivered this year     10
    INFR10049       Agent Based Systems (Level 10)  Not delivered this year     10
    INFR10061       Elements of Programming Languages   Not delivered this year     10
    INFR10005       Intelligent Autonomous Robotics (Level 10)  Not delivered this year     10
    INFR11069       Adaptive Learning Environments 1 (Level 11)     Not delivered this year     10
    INFR11021       Computer Graphics (Level 11)    Not delivered this year     10
    INFR11049       Computer Networking (Level 11)  Not delivered this year     10
    INFR11022       Distributed Systems (Level 11)  Not delivered this year     10
    INFR11129       Formal Verification     Not delivered this year     10
    INFR11024       Parallel Architectures (Level 11)   Not delivered this year     10
    INFR11113       Topics in Natural Language Processing   Not delivered this year     10

    However, we see that none of the courses are running. If that wasn’t obvious, we could discard these rows (type “Not delivered” into Box D and use that). Or sort by delivery mode: change Only Show in the first row of controls in Box A to Sort A-Z and then select Column 4.
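
Steps 2 and 3 amount to a Keep followed by a Discard, both keyed on course codes, so the three sources can be in different formats. A minimal Python sketch of that chain follows. The code pattern, the function names, and the ABCD rows are assumptions for illustration and are not taken from Boxes.

```python
import re

# Assumed code format: four capital letters then five digits.
COURSE_CODE = re.compile(r"\b[A-Z]{4}\d{5}\b")


def codes_in(rows):
    """Collect every course code that appears anywhere in the given rows."""
    return {code for row in rows for code in COURSE_CODE.findall(row)}


def filter_by_codes(rows_a, rows_other, keep):
    """Keep (keep=True) or discard (keep=False) rows of A that have a
    course code appearing in rows_other."""
    other = codes_in(rows_other)
    result = []
    for row in rows_a:
        has_match = any(code in other for code in COURSE_CODE.findall(row))
        if has_match == keep:
            result.append(row)
    return result


# Box A: DRPS course list; Box B: DPT; Box C: sortable list (cut down).
box_a = [
    "INFR09009\t\tComputer Architecture\tNot delivered this year\t10",
    "ABCD11111\t\tMade-up course on both lists\tSemester 1\t10",
    "ABCD22222\t\tMade-up course not on the DPT\tSemester 2\t10",
]
box_b = [
    "INFR09009   10 credits  Computer Architecture",
    "ABCD11111   10 credits  Made-up course on both lists",
]
box_c = ["Made-up course on both lists\tABCD11111\tMUC"]

on_dpt = filter_by_codes(box_a, box_b, keep=True)
not_on_sortable_list = filter_by_codes(on_dpt, box_c, keep=False)
print(*not_on_sortable_list, sep="\n")
```

Note that with keep=False a row containing no code at all is retained, which matches the earlier aside about Discard keeping junk rows.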

Some things you can do with a class list

  1. Select and copy a class list that you have open in Euclid with Ctrl-A Ctrl-C. Click into Box A of Boxes and paste with Ctrl-V (or right-clicking and selecting paste).

  2. At the right of the first row of buttons, next to Only Show, click Column and select 5 Programme. You should get a column of text saying which degrees people taking the class are on.

  3. Click Duplicates and Counts of unique rows. You’ll get a list like this:

    17  Computer Science (MSc) (Full-time)
     5  Artificial Intelligence and Computer Science (BSc Hons)
     1  Programme
     1  Semester 2 Courses for Visiting Students MAT

    The 1 Programme entry is spurious: that was the column header.

  4. Press Ctrl-Z twice to undo the counting and column selection, and you’ll have the class list back. (Ctrl-Y redoes the changes if you don’t do anything else first.)

  5. So we won’t have to undo next time, take a copy of the class list in Box A into Box B. You can do that with standard copy-paste, or click in Box B and type A followed by Enter.

  6. In Box B I could see how students are enrolled on the course, by choosing to Only show Column 7 Course Mode of Study, then selecting Duplicates then Counts of unique rows. I might get something like:

    19  CE
     2  C
     2  E
     1  Course Mode of Study

  7. Let’s say I wanted to look at the rows for students that are taking the exam (CE or E) in Box A. Three of the possibilities:

    • You could change Only show to Sort A-Z, then sort the column and look for these two blocks. But that’s unwieldy as Boxes doesn’t render a large table nicely.

    • To filter the rows, edit Box B by hand to contain only:

      CE
      E

      You can initially extract the second column using the Only show Column feature if you want. Then in Box A, “Keep row if Col 7 (Course Mode of Study) is a row in”, and choose Box B.

    • Hack: We could look for a column that ends in a capital E (you might get false positives). In Box C enter an E then Insert tab. Then in Box A you can select “Keep row if row contains a row in”, click match case (or false positives are likely), then for other box select Box C.

  8. If I wanted to email just the students taking the exam, I could now select Extract then Student numbers as emails. If my email client needs a comma-separated list, I can choose Edit then Line breaks → commas. Then copy all the email addresses with Ctrl-A Ctrl-C.

  9. Or in another box I could Extract then Student numbers on some other list, for example a list of those who had submitted an assignment. Then in the box with those taking the exam, Extract then Student numbers, followed by Discard row if row is a row in the box with assignment student numbers. I’ll be left with those who didn’t submit the assignment, but should have, and I can check what happened.
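
For readers who think in code, the column selection, counting, and filtering from steps 2 to 7 can be sketched in Python as below. This is an illustration only: the class list is made up and has three columns rather than the seven or more of a real Euclid list, and the function names are hypothetical.

```python
from collections import Counter


def column(rows, n):
    """Return column n (1-based) of each tab-separated row that has one."""
    split_rows = (row.split("\t") for row in rows)
    return [cells[n - 1] for cells in split_rows if len(cells) >= n]


def counts_of_unique(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


def keep_if_column_in(rows, n, allowed):
    """Keep rows whose column n (1-based) is exactly one of the allowed values."""
    allowed = set(allowed)
    kept = []
    for row in rows:
        cells = row.split("\t")
        if len(cells) >= n and cells[n - 1] in allowed:
            kept.append(row)
    return kept


# Made-up class list; the first row is the column header.
class_list = [
    "Name\tProgramme\tCourse Mode of Study",
    "Student One\tComputer Science (MSc) (Full-time)\tCE",
    "Student Two\tComputer Science (MSc) (Full-time)\tCE",
    "Student Three\tArtificial Intelligence and Computer Science (BSc Hons)\tC",
    "Student Four\tComputer Science (MSc) (Full-time)\tE",
]

for value, count in counts_of_unique(column(class_list, 3)):
    print(f"{count:6d}  {value}")

taking_exam = keep_if_column_in(class_list, 3, ["CE", "E"])
print(*taking_exam, sep="\n")
```

As in the walkthrough, the header shows up in the counts as a spurious entry with a count of 1, and the exact-match filter drops it along with the rows whose mode is C.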