Boxes introduction and examples

The hacky Boxes tool is for quick checks and cross checks of lists, particularly those containing student numbers or course codes.

It tries not to make too many assumptions about what you need to do, or where your data is coming from, while (hopefully!) making some common tasks possible. If you regularly find yourself doing the same thing in Boxes, a tool or spreadsheet that does exactly what you want would be better.

Before looking at examples, here is an overview of what you’ll do:

  1. Paste text into the boxes from the clipboard with Ctrl-V. This text can contain “junk”, as long as it contains something Boxes can identify, such as course codes, student IDs, or a tab-separated table from copying a webpage or spreadsheet.
  2. The first row of options lets you process/tidy the data in a single box. Examples:
    • extract course codes or student numbers,
    • find or tidy tab-separated tables,
    • remove duplicates, or count how many times they occur.
  3. The second row of options lets you interact with data in another box. When you select which other box to use, the command is executed. Find differences or intersections between two lists, to help notice when items are unexpectedly missing or duplicated.
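To give a feel for what the first-row options do, here is a minimal Python sketch. It is not how Boxes itself is implemented. The course-code pattern (four capital letters followed by five digits) is an assumption based on the codes shown on this page, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern: four capital letters then five digits, e.g. INFR10044.
COURSE_CODE = re.compile(r"\b[A-Z]{4}\d{5}\b")

def extract_course_codes(text):
    """Return every course code found in the text, in order of appearance."""
    return COURSE_CODE.findall(text)

def remove_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    return list(dict.fromkeys(rows))

def count_unique_rows(rows):
    """Return (count, row) pairs, most frequent first."""
    return [(n, row) for row, n in Counter(rows).most_common()]

# Pasted text can contain "junk" around the codes.
pasted = """Some junk INFR10044 (40 credits) and more junk
INFR11102 appears here, and INFR10044 appears again."""

codes = extract_course_codes(pasted)
unique_codes = remove_duplicates(codes)
counts = count_unique_rows(codes)

print(codes)
print(unique_codes)
print(counts)
```

The junk around the codes is simply ignored, which is why it doesn't matter where the text was copied from.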

Examples:

(These examples used 2018/19 data. To navigate between sessions in the DRPS you might like DRPS arrows.)

Compare two DPTs

Here we copy in two DPTs that share a lot of courses, and check which courses are in only one of them.

  1. Open the 2018/19 DPT for Computer Science (BSc Hons) in a new tab, select and copy the whole page with Ctrl-A Ctrl-C. Then click in Box A of Boxes and paste with Ctrl-V (or right-clicking and selecting paste).

  2. Open the 2018/19 DPT for Informatics (MInf) in a new tab, use the mouse to select the first four years, copy with Ctrl-C, and paste into Box B of Boxes. You don’t need to copy each part separately: you can copy one block of text in one go, including intermediate “junk”.

  3. For both Boxes A and B click Extract, then Rows with tabs (table).

  4. Then in Box A select Compare in the first drop-down in the second row. Then click other box and select Box B.

    Box A should now contain the text below. You can make a text area in Boxes bigger by dragging its bottom right-hand corner.

    # Only in Box A:
    INFR10044   40 credits  Honours Project (Informatics)
    INFR11102   10 credits  Computational Complexity
    # Only in Box B:
    INFR08010   20 credits  Informatics 2D - Reasoning and Agents   Must be passed
    INFR10051   40 credits  MInf Project (Part 1)
    INFR11123   10 credits  Scalable Data Management Systems
    INFR11162   10 credits  Neural Computation

    Some of these differences are expected. A couple might be mistakes to be checked further.
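The Compare step can be pictured with the following Python sketch. It assumes rows are compared as whole lines of text, which may not be exactly what Boxes does. The function name and the shared row are made up for illustration; the other rows are taken from the output above.

```python
def compare(box_a, box_b):
    """Return (rows only in box_a, rows only in box_b), keeping original order."""
    in_a, in_b = set(box_a), set(box_b)
    only_a = [row for row in box_a if row not in in_b]
    only_b = [row for row in box_b if row not in in_a]
    return only_a, only_b

# Hypothetical row that appears in both boxes.
shared = "INFR00000\t10 credits\tMade-up Shared Course"

box_a = [
    shared,
    "INFR10044\t40 credits\tHonours Project (Informatics)",
    "INFR11102\t10 credits\tComputational Complexity",
]
box_b = [
    shared,
    "INFR10051\t40 credits\tMInf Project (Part 1)",
    "INFR11162\t10 credits\tNeural Computation",
]

only_a, only_b = compare(box_a, box_b)
report = ["# Only in Box A:", *only_a, "# Only in Box B:", *only_b]
print("\n".join(report))
```

Under this whole-row assumption, two rows for the same course would still count as different if one of them carried an extra column.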

Find what’s not being used on a list

Most of the courses that Informatics run are currently listed on the MInf degree DPT. Let’s check what’s not there. There are multiple ways to find out; here we prune down the sortable course list and look at what’s left.

  1. Copy-paste the contents of these pages into boxes:
  2. In the second row of controls for Box A, select “Discard row if Col 2 (EUCLID Code) is part of a row in”, and choose Box B as the other box.

  3. The courses left are those not on the MInf DPT. There are quite a lot left, so we get rid of the bulk of those we know shouldn’t be on the MInf by typing or copy-pasting the following into Box C:

    EPCC
    Distance
    Thesis
    Dissertation
    Project
    Design Informatics

    Then in Box A, change the Discard criterion to “if row contains a row in”, and now choose Box C as the other box. (Alternatively you could have clicked Clear in Box B and reused it.)

    At the time of writing, what was left was:

    Course URL  EUCLID Code     Acronym     AIA     COG     FSS     ML  NS  SE  Level   Points  Year    Delivery    Exam Diet   Work%/Exam%     Lecturer(s)/Coordinator(s)
    Computer Programming Skills and Concepts    INFR08022   CP                          8   20  1   S1  December    20/80   Cristina Alexandru / Ajitha Rajan
    Decision Making in Robots and Autonomous Agents     INFR11090   DMR                             11  10  5   S2  April/May   40/60   Ram Ramamoorthy
    Informatics 1 - Cognitive Science   INFR08020   INF1-CG                             8   20  1   S2  April/May   40/60   Frank Keller / Christopher Lucas / Richard Shillcock
    Informatics Research Review     INFR11136   IRR                             11  10  5   S1      100/0   Bjoern Franke
    Introduction to Java Programming    INFR09021   IJP                             9   10  5   S1      100/0   Paul Anderson
    Introduction to Research in Data Science    INFR11138   IRDS                            11  20  5   S1      100/0   Amos Storkey
    Pervasive Parallelism   INFR11108   PERP                            11  20  5   S1      100/0   Murray Cole
    Robot Learning and Sensorimotor Control     INFR11186   RLSC                            11  10  5   S2  April/May   40/60   Michael Mistry

    CP is for outside students, and INF1-CG doesn’t need to be on the DPT. IRR, IJP, IRDS, and PERP are MSc/CDT-only courses. That leaves DMR and RLSC… maybe they should be on the DPT…
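The two Discard criteria used above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are invented, matching is assumed to be case-insensitive unless match case is selected, and the rows marked hypothetical are not real courses.

```python
def discard_if_column_in_other(rows, col, other_rows):
    """Drop rows whose column `col` (1-based) appears inside any row of other_rows."""
    kept = []
    for row in rows:
        cells = row.split("\t")
        value = cells[col - 1] if col <= len(cells) else ""
        if value and any(value in other for other in other_rows):
            continue  # the code is mentioned in the other box, so discard
        kept.append(row)
    return kept

def discard_if_row_contains(rows, other_rows, match_case=False):
    """Drop rows that contain any non-empty row of other_rows as a substring."""
    needles = [n for n in other_rows if n]
    if not match_case:
        needles = [n.lower() for n in needles]
    kept = []
    for row in rows:
        haystack = row if match_case else row.lower()
        if not any(n in haystack for n in needles):
            kept.append(row)
    return kept

# Box A: a cut-down course list (first three columns only).
box_a = [
    "Course URL\tEUCLID Code\tAcronym",
    "Computer Programming Skills and Concepts\tINFR08022\tCP",
    "Robot Learning and Sensorimotor Control\tINFR11186\tRLSC",
    "Hypothetical Course On The DPT\tINFR00001\tHYP1",  # hypothetical
    "Hypothetical MSc Dissertation\tINFR00002\tHYP2",   # hypothetical
]
# Box B: rows from a DPT (hypothetical).
box_b = ["INFR00001\t20 credits\tHypothetical Course On The DPT"]
# Box C: words marking courses we are not interested in.
box_c = ["Dissertation", "Project"]

not_on_dpt = discard_if_column_in_other(box_a, 2, box_b)
remaining = discard_if_row_contains(not_on_dpt, box_c)
print("\n".join(remaining))
```

Note that the header row survives both steps, as it does in the listing above, because “EUCLID Code” never appears in the other boxes.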

Using multiple sources in different formats

Imagine I want to find Informatics courses that are listed in the DRPS and on the MInf degree DPT, but are not on our sortable course list.

Again there are multiple ways to do it. While all of these sources appear as reasonable tab-separated data in boxes, that won’t always happen. In this section we throw most of the text away, boiling it down to raw course codes (a similar strategy applies with student numbers), so the data format doesn’t matter.

  1. Copy-paste the contents of these pages into boxes:

    You can label them in the box banners if you might forget which is which.

  2. In Boxes A and B select Extract, then Course codes. Now we’ve extracted the raw course codes, it doesn’t matter what format these lists were in originally, or that they came from different places. They could have come from any other tool.

  3. In the second row of Box A’s controls, select “Discard row if row is a row in” then click other box and select Box B. Box A now contains the course codes we want to look for in the MInf list.

  4. In the second row of Box C’s controls you can now say you want to “Keep row if row contains a row in”, and select Box A.

Box C now contains a nice list of courses, with names, that should possibly be on the sortable list, because we mention them in the MInf programme, and they are our courses. But maybe they’re not running. How could we find that out?

In Box C we can then select Extract, then Course codes. Then restore the table from the DRPS in Box A with Undo/Ctrl-Z in that box. Then in Box A, “Keep row if row contains a row in”, and select Box C. We can now see which courses are being delivered, and filter or sort on that basis.
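The whole sequence above can be sketched in Python. This assumes, as the steps imply, that Box A holds the DRPS listing, Box B the sortable course list, and Box C the MInf DPT. All of the data, the course-code pattern, and the function names are hypothetical; the sketch only illustrates the idea of boiling everything down to codes first.

```python
import re

# Assumed pattern: four capital letters then five digits.
COURSE_CODE = re.compile(r"\b[A-Z]{4}\d{5}\b")

def extract_codes(text):
    """Unique course codes in order of first appearance."""
    return list(dict.fromkeys(COURSE_CODE.findall(text)))

def discard_rows_in(rows, other_rows):
    """Discard row if row is a row in the other box."""
    other = set(other_rows)
    return [row for row in rows if row not in other]

def keep_if_contains(rows, other_rows):
    """Keep row if row contains a row in the other box."""
    return [row for row in rows if any(o in row for o in other_rows if o)]

# Hypothetical pasted text, each source in a different layout.
drps = ("INFR00001\tCourse One\tSemester 1\n"
        "INFR00002\tCourse Two\tSemester 2\n"
        "INFR00003\tCourse Three\tNot delivered")
sortable_list = "Course One\tINFR00001\tC1"
minf_dpt = ("INFR00001\t20 credits\tCourse One\n"
            "INFR00003\t10 credits\tCourse Three\n"
            "MATH00001\t10 credits\tOutside Course")

# Steps 2 and 3: codes in the DRPS that are not on the sortable list.
missing = discard_rows_in(extract_codes(drps), extract_codes(sortable_list))
# Step 4: rows of the MInf DPT that mention one of those codes.
on_minf = keep_if_contains(minf_dpt.splitlines(), missing)
# Follow-up: look the surviving codes up again in the original DRPS table.
delivery = keep_if_contains(drps.splitlines(), extract_codes("\n".join(on_minf)))

print(missing)
print(on_minf)
print(delivery)
```

Because only the codes are compared, the differing column order in the three sources makes no difference.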

Some things you can do with a class list

  1. Select and copy a class list that you have open in Euclid with Ctrl-A Ctrl-C. Click into Box A of Boxes and paste with Ctrl-V (or right-clicking and selecting paste).

  2. At the right of the first row of buttons, next to Only Show, click Column and select 5 Programme. You should get a column of text saying which degrees people taking the class are on.

  3. Click Duplicates, then Counts of unique rows. You’ll get a list like this:

    17  Computer Science (MSc) (Full-time)
     5  Artificial Intelligence and Computer Science (BSc Hons)
     1  Programme
     1  Semester 2 Courses for Visiting Students MAT

    The 1 Programme entry is spurious: that was the column header.

  4. Press Ctrl-Z twice to undo the counting and column selection, and you’ll have the class list back. (Ctrl-Y redoes the changes if you don’t do anything else first.)

  5. So we won’t have to undo next time, take a copy of the class list in Box A into Box B. You can do that with standard copy-paste, or click in Box B and type A followed by Enter.

  6. In Box B I could see how students are enrolled on the course, by choosing to Only show Column 7 Course Mode of Study, then selecting Duplicates, then Counts of unique rows. I might get something like:

    19  CE
     2  C
     2  E
     1  Course Mode of Study

  7. Let’s say I wanted to look at the rows for students that are taking the exam (CE or E) in Box A. Three of the possibilities:
    • You could change Only show to Sort AZ, then sort the column and look for these two blocks. But that’s unwieldy as Boxes doesn’t (currently) render a large table nicely.
    • To filter the rows, edit Box B by hand to contain only:

      CE
      E

      You can initially extract the second column using the Only show Column feature if you want. Then in Box A, “Keep row if Col 7 (Course Mode of Study) is a row in”, and choose Box B.
    • Hack: We could look for a column that ends in a capital E (you might get false positives). In Box C enter an E then Insert tab. Then in Box A you can select “Keep row if row contains a row in”, click match case (or false positives are likely), then for other box select Box C.

  8. If I wanted to email just the students taking the exam, I could now select Extract, then Student numbers as emails. If my email client needs a comma-separated list, I can choose Edit then Line breaks → commas. Then copy all the email addresses with Ctrl-A Ctrl-C.

  9. Or in another box I could select Extract, then Student numbers on some other list, for example a list of those who had submitted an assignment. Then in the box with those taking the exam, select Extract, then Student numbers, followed by “Discard row if row is a row in” the box with the assignment student numbers. I’ll be left with those who didn’t submit the assignment, but should have, and I can check what happened.
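The class-list steps above can be sketched in Python as follows. Everything here is illustrative: the two-column table stands in for a real class list (where Course Mode of Study is column 7), the student-number format (an s followed by seven digits) and the example.org email domain are assumptions, and the function names are made up.

```python
import re
from collections import Counter

# Assumed student-number format: "s" followed by seven digits.
STUDENT_NUMBER = re.compile(r"\b[sS]\d{7}\b")

def only_show_column(rows, col):
    """Return the given column (1-based) from each tab-separated row."""
    return [cells[col - 1] for cells in (r.split("\t") for r in rows) if len(cells) >= col]

def keep_if_column_in(rows, col, allowed):
    """Keep row if the given column is a row in `allowed`."""
    return [r for r in rows if len(r.split("\t")) >= col and r.split("\t")[col - 1] in allowed]

def student_numbers(rows):
    """Extract student numbers from the rows, in order."""
    return STUDENT_NUMBER.findall("\n".join(rows))

# Hypothetical class list; column 2 stands in for Course Mode of Study.
class_list = [
    "Student Number\tCourse Mode of Study",
    "s0000001\tCE",
    "s0000002\tC",
    "s0000003\tE",
    "s0000004\tCE",
]

# Step 6: counts of unique rows in one column (the header is counted too).
mode_counts = Counter(only_show_column(class_list, 2))

# Step 7: keep only students sitting the exam.
sitting_exam = keep_if_column_in(class_list, 2, {"CE", "E"})

# Step 8: student numbers as a comma-separated list of emails (domain is a placeholder).
emails = ", ".join(f"{n}@example.org" for n in student_numbers(sitting_exam))

# Step 9: those sitting the exam who are not on the list of submitters.
submitted = {"s0000001", "s0000004"}
not_submitted = [n for n in student_numbers(sitting_exam) if n not in submitted]

print(mode_counts)
print(emails)
print(not_submitted)
```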