Boxes introduction and examples
The Boxes tool is for quick checks and cross checks of lists, particularly those containing student numbers or course codes.
Boxes tries not to make too many assumptions about what you need to do, or where your data is coming from, while (hopefully!) making some common tasks possible. If you regularly find yourself doing the same thing in Boxes, a tool or spreadsheet that does exactly what you want would be better.
Before looking at examples, here is an overview of what you’ll do:
- Copy text into the boxes from the clipboard with Ctrl-V. This text can contain “junk”, as long as it contains something Boxes can identify, such as course codes, student IDs, or a tab-separated table from copying a webpage or spreadsheet.
- The first row of options lets you process/tidy the data in a single box. Examples:
  - extract course codes or student numbers,
  - find or tidy tab-separated tables,
  - remove duplicates, or count how many times they occur.
- The second row of options lets you interact with data in another box. When you select which other box to use, the command is executed. Find differences or intersections between two lists, to help notice when items are unexpectedly missing or duplicated.
The commands in the second row let you compare rows based on any student numbers or course codes that they contain. As you’ll see below, when you use these features, you often don’t need to tidy up the data (the first row of options) at all.
- Compare two DPTs. Which courses are only on one of the DPTs?
- Find what’s not being used on a list. Check a DPT against the sortable course list.
- Using multiple sources in different formats
- Some things you can do with a class list
(These examples used 2018/19 data. To navigate between sessions in the DRPS you might like DRPS arrows.)
Compare two DPTs
Here we copy in two DPTs that share a lot of courses, and check which courses are on only one of them.
Open the 2018/19 DPT for Computer Science (BSc Hons) in a new tab, select and copy the whole page with Ctrl-C. Then click in Box A of Boxes and paste with Ctrl-V (or by right-clicking and selecting paste).
Open the 2018/19 DPT for Informatics (MInf) in a new tab, use the mouse to select the first four years, copy with Ctrl-C, and paste into Box B of Boxes. You don’t need to copy each part separately; you can copy one block of text in one go, including intermediate “junk”.
In Box A select Compare in the first drop-down in the second row. To the right of that, select “Find DRPS codes only in one box.” Then click other box and select Box B.
Box A should now contain the text below. You can make a text area in Boxes bigger by dragging its bottom right-hand corner.
# Only in Box A:
INFR10044 40 credits Honours Project (Informatics)
INFR11102 10 credits Computational Complexity
# Only in Box B:
INFR08010 20 credits Informatics 2D - Reasoning and Agents Must be passed
INFR10051 40 credits MInf Project (Part 1)
INFR11123 10 credits Scalable Data Management Systems
INFR11162 10 credits Neural Computation
Some of these differences are expected. A couple might be mistakes to be checked further.
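If you end up repeating this comparison often, it can be scripted. The Python below is a minimal sketch of the same idea, not a description of how Boxes works internally. It assumes a DRPS course code is four capital letters followed by five digits (as in INFR10044). The function names and the "shared" sample row are made up for illustration.

```python
import re

# Assumption: a DRPS course code is four capital letters then five digits.
CODE = re.compile(r"\b[A-Z]{4}\d{5}\b")

def rows_by_code(text):
    """Map each course code to the first row of text that mentions it."""
    found = {}
    for row in text.splitlines():
        for code in CODE.findall(row):
            found.setdefault(code, row.strip())
    return found

def only_in_one(text_a, text_b):
    """Return (rows only in A, rows only in B), compared by course code."""
    a, b = rows_by_code(text_a), rows_by_code(text_b)
    only_a = [a[c] for c in a if c not in b]
    only_b = [b[c] for c in b if c not in a]
    return only_a, only_b

# Pasted text may include junk rows; rows without a code are ignored.
box_a = "page header junk\nINFR10044 Honours Project (Informatics)\nINFR99999 Shared course (made up)"
box_b = "INFR99999 Shared course (made up)\nINFR10051 MInf Project (Part 1)\nfooter junk"

only_a, only_b = only_in_one(box_a, box_b)
print("# Only in Box A:", *only_a, sep="\n")
print("# Only in Box B:", *only_b, sep="\n")
```

Comparing by code rather than by whole row is what lets the two pages differ in layout and still line up.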
Find what’s not being used on a list
Most of the courses that Informatics run are currently listed on the MInf degree DPT. Let’s check what’s not there. There are multiple ways to find out; here we prune down the sortable course list and look at what’s left.
Copy-paste the contents of these pages into Boxes:
In the second row of controls for Box A, select “Col 2 (EUCLID Code) is part of a row in”, and choose Box B as the other box.
Aside: Boxes discarded the junk around the table, because those rows didn’t have a column 2. We could have instead said “has DRPS code in”, but we would have retained the junk. We would then clean the results: in the first row select Rows with tabs (table) or Rows with course codes.
The courses left are those not on the MInf DPT. There are quite a lot left, so we get rid of the bulk of those we know shouldn’t be on the MInf by typing or copy-pasting the following into Box C:
EPCC
Distance
Thesis
Dissertation
Project
Design Informatics
Then in Box A, change the Discard criterion to “if contains a row in”, and now choose Box C as the other box. (Alternatively you could have clicked Clear in Box B and reused it.)
At the time of writing, what was left was:
Course URL EUCLID Code Acronym AIA COG FSS ML NS SE Level Points Year Delivery Exam Diet Work%/Exam% Lecturer(s)/Coordinator(s)
Computer Programming Skills and Concepts INFR08022 CP 8 20 1 S1 December 20/80 Cristina Alexandru / Ajitha Rajan
Decision Making in Robots and Autonomous Agents INFR11090 DMR 11 10 5 S2 April/May 40/60 Ram Ramamoorthy
Informatics 1 - Cognitive Science INFR08020 INF1-CG 8 20 1 S2 April/May 40/60 Frank Keller / Christopher Lucas / Richard Shillcock
Informatics Research Review INFR11136 IRR 11 10 5 S1 100/0 Bjoern Franke
Introduction to Java Programming INFR09021 IJP 9 10 5 S1 100/0 Paul Anderson
Introduction to Research in Data Science INFR11138 IRDS 11 20 5 S1 100/0 Amos Storkey
Pervasive Parallelism INFR11108 PERP 11 20 5 S1 100/0 Murray Cole
Robot Learning and Sensorimotor Control INFR11186 RLSC 11 10 5 S2 April/May 40/60 Michael Mistry
CP is for outside students, and INF1-CG doesn’t need to be on the DPT. IRR, IJP, IRDS, and PERP are MSc/CDT-only courses. That leaves DMR and RLSC… maybe they should be on the DPT…
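The two discard steps in this example can be sketched in Python as follows. This is only an illustration under assumptions: the function names are invented, the matching is plain substring search, and Boxes’ own rules may differ. The sample rows are cut down to three columns, and the acronym “NC” and the “Made-up Dissertation” row are invented.

```python
def discard_if_column_in(rows, col, other_text):
    """Keep tab-separated rows whose column `col` (1-based) does NOT appear
    anywhere in other_text. Rows too short to have that column are dropped,
    which is how surrounding junk disappears."""
    kept = []
    for row in rows:
        cells = row.split("\t")
        if len(cells) < col:
            continue  # junk row: no such column
        value = cells[col - 1].strip()
        if value and value in other_text:
            continue  # value is part of a row in the other box: discard
        kept.append(row)
    return kept

def discard_if_contains(rows, phrases, match_case=False):
    """Drop rows containing any of the phrases (case-insensitive by default)."""
    def norm(s):
        return s if match_case else s.lower()
    wanted = [norm(p) for p in phrases if p.strip()]
    return [r for r in rows if not any(p in norm(r) for p in wanted)]

sortable = "\n".join([
    "Some page junk",
    "Course URL\tEUCLID Code\tAcronym",
    "Neural Computation\tINFR11162\tNC",
    "Pervasive Parallelism\tINFR11108\tPERP",
    "Made-up Dissertation\tINFR00000\tMUD",
])
dpt = "INFR11162 10 credits Neural Computation"

left = discard_if_column_in(sortable.splitlines(), 2, dpt)
left = discard_if_contains(left, ["Dissertation", "Project"])
print("\n".join(left))
```

The header row survives both steps because “EUCLID Code” is not on the DPT, which matches the header appearing in the results above.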
Using multiple sources in different formats
Imagine I want to find Informatics courses that are listed in the DRPS and on the MInf degree DPT, but not on the sortable course list. Maybe these courses should be on the Informatics sortable list?
There are many ways to do it. Here is one:
Copy-paste the contents of the pages into Boxes:
- DRPS courses into Box A
- MInf degree DPT into Box B.
- Sortable course list (2018/19 snapshot) into Box C.
You can label them in the box banners if you might forget which is which.
In the second row of Box A’s controls, select “has DRPS code in” then click other box and select Box B. Box A now contains only the Informatics courses that appear on the MInf DPT.
Still in the second row of Box A’s controls, change the setting to Discard and select Box C, to remove courses that are in the sortable list.
We see an initially-surprising long list of results:
INFR09009 Computer Architecture Not delivered this year 10
INFR10049 Agent Based Systems (Level 10) Not delivered this year 10
INFR10061 Elements of Programming Languages Not delivered this year 10
INFR10005 Intelligent Autonomous Robotics (Level 10) Not delivered this year 10
INFR11069 Adaptive Learning Environments 1 (Level 11) Not delivered this year 10
INFR11021 Computer Graphics (Level 11) Not delivered this year 10
INFR11049 Computer Networking (Level 11) Not delivered this year 10
INFR11022 Distributed Systems (Level 11) Not delivered this year 10
INFR11129 Formal Verification Not delivered this year 10
INFR11024 Parallel Architectures (Level 11) Not delivered this year 10
INFR11113 Topics in Natural Language Processing Not delivered this year 10
However, we see that none of the courses are running. If that wasn’t obvious, we could discard these rows (type “Not delivered” into Box D and use that). Or sort by delivery mode: change Only Show in the first row of controls in Box A to Sort A-Z and then select the column to sort on.
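The keep-then-discard chain across three sources can be sketched in Python too. As before, this is an illustration under assumptions rather than Boxes’ actual logic: the course-code pattern (four capitals, five digits) is inferred from the examples, `filter_rows` is an invented name, and the sample text is abbreviated.

```python
import re

CODE = re.compile(r"\b[A-Z]{4}\d{5}\b")  # assumed course-code pattern

def codes_in(text):
    """Set of all course codes found in the text."""
    return set(CODE.findall(text))

def filter_rows(text, other_text, keep):
    """keep=True: keep only rows sharing a course code with other_text.
    keep=False: discard those rows instead. Rows without any code never
    match, so they are dropped when keeping and retained when discarding."""
    other = codes_in(other_text)
    rows = []
    for row in text.splitlines():
        shares_code = bool(codes_in(row) & other)
        if shares_code == keep:
            rows.append(row)
    return rows

drps = "\n".join([
    "DRPS listing junk",
    "INFR09009 Computer Architecture Not delivered this year",
    "INFR11162 Neural Computation",
    "INFR11108 Pervasive Parallelism",
])
dpt = "INFR09009 10 credits\nINFR11162 10 credits"
sortable = "Neural Computation\tINFR11162\nPervasive Parallelism\tINFR11108"

on_dpt = filter_rows(drps, dpt, keep=True)                     # Box A vs Box B
not_sortable = filter_rows("\n".join(on_dpt), sortable, keep=False)  # then vs Box C
print("\n".join(not_sortable))
```

Because each source is reduced to the codes it contains, it does not matter that one is a web listing, one a DPT page, and one a tab-separated table.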
Some things you can do with a class list
Select and copy a class list that you have open in Euclid with Ctrl-C. Click into Box A of Boxes and paste with Ctrl-V (or by right-clicking and selecting paste).
At the right of the first row of buttons, next to Only Show, choose 5 Programme. You should get a column of text saying which degrees people taking the class are on.
Then select Counts of unique rows. You’ll get a list like this:
17 Computer Science (MSc) (Full-time)
5 Artificial Intelligence and Computer Science (BSc Hons)
1 Programme
1 Semester 2 Courses for Visiting Students MAT
The 1 Programme entry is spurious; that was the column header…
Press Ctrl-Z twice to undo the counting and column selection, and you’ll have the class list back. (Ctrl-Y redoes the changes if you don’t do anything else first.)
So we won’t have to undo next time, take a copy of the class list in Box A into Box B. You can do that with standard copy-paste, or click in Box B and type
In Box B I could see how students are enrolled on the course, by choosing 7 Course Mode of Study, then selecting Counts of unique rows. I might get something like:
19 CE
2 C
2 E
1 Course Mode of Study
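Picking a column and counting its unique values is easy to reproduce outside Boxes. The sketch below uses Python’s `collections.Counter`. The function name `count_column` is invented, the student numbers are made up, and the sample has only two columns rather than a real class list’s layout.

```python
from collections import Counter

def count_column(text, col):
    """Count distinct values in 1-based column `col` of tab-separated text,
    skipping rows that are too short to have that column."""
    values = []
    for row in text.splitlines():
        cells = row.split("\t")
        if len(cells) >= col:
            values.append(cells[col - 1].strip())
    # most_common() lists the values from most to least frequent
    return Counter(values).most_common()

class_list = "\n".join([
    "Student\tCourse Mode of Study",  # header row, counted like any other
    "s0000001\tCE",
    "s0000002\tCE",
    "s0000003\tE",
])

counts = count_column(class_list, 2)
for value, n in counts:
    print(n, value)
```

As in Boxes, the header shows up as a spurious entry with a count of 1 unless you remove it first.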
Let’s say I wanted to look at the rows for students that are taking the exam (E) in Box A. Three of the possibilities:
You could change Only Show to Sort AZ, then sort the column and look for these two blocks. But that’s unwieldy as Boxes doesn’t render a large table nicely.
To filter the rows, edit Box B by hand to contain only:
You can initially extract the second column using the Column feature if you want. Then in Box A, select “Col 7 (Course Mode of Study) is a row in”, and choose Box B.
Hack: We could look for a column that ends in a capital E (you might get false positives). In Box C enter an E, then use Insert tab. Then in Box A you can select “contains a row in”, click match case (or false positives are likely), then for other box select Box C.
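The filtering option (keep rows whose mode-of-study column is one of a few allowed values) could be sketched in Python like this. It is a hedged illustration: `keep_rows_where` is an invented name, the sample uses column 2 rather than column 7 to stay short, and treating both CE and E as “taking the exam” is my assumption about what those codes mean.

```python
def keep_rows_where(text, col, allowed):
    """Keep tab-separated rows whose 1-based column `col` exactly equals
    one of the allowed values. Short rows and the header fall out."""
    rows = []
    for row in text.splitlines():
        cells = row.split("\t")
        if len(cells) >= col and cells[col - 1].strip() in allowed:
            rows.append(row)
    return rows

class_list = "\n".join([
    "Name\tCourse Mode of Study",
    "A. Student\tCE",
    "B. Student\tC",
    "C. Student\tE",
])

# Assumption: CE and E are the modes that include an exam.
exam_rows = keep_rows_where(class_list, 2, {"CE", "E"})
print("\n".join(exam_rows))
```

Exact matching on one column avoids the false positives that the “ends in a capital E” hack can produce.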
If I wanted to email just the students taking the exam, I could now select Student numbers as emails. If my email client needs a comma-separated list, I can choose Line breaks → commas. Then copy all the email addresses.
Or in another box I could extract Student numbers from some other list, for example a list of those who had submitted an assignment. Then in the box with those taking the exam, select Student numbers followed by Discard row if row is a row in the box with assignment student numbers. I’ll be left with those who didn’t submit the assignment, but should have, and I can check what happened.
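For anyone who does this check regularly, here is a minimal Python sketch of the same two steps: find who is missing from the submissions list, and turn student numbers into a comma-separated address list. Everything specific in it is an assumption: the student-number pattern (“s” followed by seven digits), the function names, and above all the `example.org` domain, which is a placeholder and not a real address format.

```python
import re

# Assumption: student numbers look like "s" followed by seven digits.
STUDENT = re.compile(r"\bs\d{7}\b")

def student_numbers(text):
    """Student numbers in order of first appearance, duplicates removed."""
    return list(dict.fromkeys(STUDENT.findall(text)))

def missing_submissions(exam_text, submitted_text):
    """Students in exam_text who do not appear in submitted_text."""
    submitted = set(student_numbers(submitted_text))
    return [s for s in student_numbers(exam_text) if s not in submitted]

def as_emails(numbers, domain="example.org"):
    """Comma-separated addresses; the domain is a placeholder."""
    return ", ".join(f"{s}@{domain}" for s in numbers)

# Made-up data: pasted text can have any surrounding words.
exam_box = "s0000001 Ann\ns0000002 Bob\ns0000003 Cat"
submitted_box = "received from s0000001\nreceived from s0000003"

missing = missing_submissions(exam_box, submitted_box)
print(as_emails(missing))
```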