HMRC Masterclass
Natural Language Processing
Advance Preparation

ILCC, School of Informatics
University of Edinburgh

$Id: prepare.html,v 1.21 2017/09/12 07:54:22 ht Exp $

1. Introduction

The upcoming NLP Masterclass will involve regular lab sessions in which you will learn to use a range of NLP techniques and tools. This document tells you how to prepare your laptop or personal computer for these lab sessions by installing the necessary software and resources, and points you to some learning materials you should work through before arrival.

We recommend reading over the next section once before you start. The complete install process takes around 15 minutes assuming a reasonably good network connection (home broadband should be OK, office may be better). Please do this before the first day of the course, so we have time to help you if you have difficulties.

Note that even if you already have Python installed, we recommend that you follow the process below for the Masterclass, to avoid the possibility of hard-to-detect dependency and versioning problems between what you have pre-installed and what we expect.

Finally, unless you are already quite experienced with Python and Jupyter, and have some experience with natural language data handling and the mathematics that goes with it, please do go on to use your new installation to work through the first notebook and review the introductory materials linked from it, as summarised in the final section below.

2. Step by step

These instructions should work on either a Windows PC (tested with Windows 10) or a Mac (tested with OS X). In a few places the process differs slightly between the two; these places are marked below by [W] or [M] respectively. Linux users will want to use their own distro's package manager to download/install, but should follow the Mac instructions for activating the 'class' environment.

Following these instructions involves entering commands in a terminal window: what Windows calls a "Command Prompt" and Mac OS X calls a "Terminal" (available via the Launchpad). If you're not familiar with doing this, in particular with using the cd and mkdir commands, please pause and have a look at my quick introduction to directories and paths.

If you are refreshing an existing installation, the relevant steps are marked with an asterisk, but be sure to follow the specific instructions in the email asking you to do the refresh.

2.1. Install Miniconda

Skip this step if you already have a Python 3.6 Anaconda (or Miniconda) installed.

  1. Download the Python 3.6 version, 64- or 32-bit and OS as appropriate, from https://conda.io/miniconda.html and install it:
    • [W] Double-click the downloaded installer as usual. When offered,
      • Untick Learn more about Anaconda Cloud
      • Untick Learn more about Anaconda Support
    • [M] Launch a Terminal via the Launchpad, then
      $ cd Downloads
      $ bash Miniconda3-latest-MacOSX-X86_64.sh
      Skim through the license as required, and accept it by typing yes when asked, then accept the default location for the install and the path update as offered. Finally close the Terminal and open a new one.
    • Note: Here and below whenever you are asked to type in a terminal, the > or $␣ is not meant to be part of what you type: it's just meant as a representation of the terminal prompt, which usually ends with an angle-bracket on Windows and a dollar-sign and a space on a Mac.

2.2. *Confirm install and bring conda up-to-date

  1. [W] Launch the Anaconda prompt window:
     Start > Programs > Anaconda3 > Anaconda Prompt
    ([M] You should just stay in the Terminal window you were in at the end of the previous step)
  2. Update conda:
    [W] >conda update conda
        >conda list
    [M] $ conda update conda
        $ conda list
  3. [M] Close your terminal window and re-open it to get the new conda version

In the output of conda list you should see that conda is now at version 4.3.23.
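If you would rather check the version from a script than by eye, the following is a minimal sketch. It assumes conda is on your search path and that any version at or above the 4.3.23 quoted here is acceptable; that "or newer" rule is our assumption, not a course requirement. The names parse_version, conda_version and EXPECTED are purely illustrative.

```python
import re
import shutil
import subprocess

# Version quoted in this guide; treating anything newer as fine is an assumption.
EXPECTED = (4, 3, 23)


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def conda_version():
    """Run 'conda --version' and parse its output; None if conda cannot be run."""
    exe = shutil.which("conda")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge streams in case the version goes to stderr
            universal_newlines=True,
        )
    except OSError:
        return None
    return parse_version(result.stdout)


if __name__ == "__main__":
    found = conda_version()
    if found is None:
        print("conda was not found on the search path")
    elif found >= EXPECTED:
        print("conda version OK:", ".".join(str(n) for n in found))
    else:
        print("conda looks older than expected; try 'conda update conda' again")
```

Save it under any name you like and run it with python from the same prompt or terminal you used for the update.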

2.3. Create an environment for this master class

  1. Call it 'class':
    [W] >conda create -n class python=3.6
    [M] $ conda create -n class python=3.6
  2. You should see a list of things that will be in the new environment, beginning with pip and python 3.6.[something], along with 3 or 4 others. Confirm and wait for the environment to be built; this takes a minute or so.

2.4. *Move into that environment

You will need to do this any time you launch a new Anaconda Prompt/Terminal.

  1. [W] >activate class
    [M] $ source activate class

You will see (class) at the left margin, to let you know you're in the right environment.
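The same check can be made from inside Python, which is handy later when a notebook misbehaves. This is a sketch only: it relies on conda setting the CONDA_DEFAULT_ENV environment variable when an environment is activated, and the helper names active_env and in_class_env are hypothetical.

```python
import os
import sys


def active_env(environ=os.environ):
    """Name of the active conda environment, or None if none is recorded."""
    return environ.get("CONDA_DEFAULT_ENV")


def in_class_env(environ=os.environ, version=sys.version_info):
    """True if the 'class' environment is active and Python is a 3.6 release."""
    return active_env(environ) == "class" and tuple(version[:2]) == (3, 6)


if __name__ == "__main__":
    print("Active environment:", active_env())
    print("Python version:", ".".join(str(n) for n in sys.version_info[:3]))
    if in_class_env():
        print("OK: you are in the class environment")
    else:
        print("Not in the class environment; activate it as shown above")
```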

2.5. *Install more packages

  1. We need a number of additional packages:
    [W] >conda install numpy scipy matplotlib nltk jupyter pandas seaborn tqdm scikit-learn
        >conda install -c conda-forge keras tensorflow
    [M] $ conda install numpy scipy matplotlib nltk jupyter pandas seaborn tqdm scikit-learn
        $ conda install -c conda-forge keras tensorflow
  2. Confirm the long list of packages and wait a fair while after the feedback shows jupyter-1.0... has been downloaded: 4–5 minutes at least.
  3. [W] Respond with a 'Yes' when (if) asked if you want to allow pythonw to make changes to your system (3 times).
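Once the installs have finished, you can optionally confirm that Python can find each package without actually loading it. The sketch below is not part of the course materials. The import names are our assumptions: scikit-learn is normally imported as sklearn, and we check for Jupyter via its notebook module. MODULES and missing_modules are illustrative names.

```python
import importlib.util

# Import names assumed for the packages installed above; note that
# scikit-learn is imported as 'sklearn' and Jupyter is checked via 'notebook'.
MODULES = [
    "numpy", "scipy", "matplotlib", "nltk", "notebook", "pandas",
    "seaborn", "tqdm", "sklearn", "keras", "tensorflow",
]


def missing_modules(names):
    """Return the names that Python cannot locate, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    if missing:
        print("Could not find:", ", ".join(missing))
    else:
        print("All expected packages were found")
```

Run it with the class environment active; anything listed as missing can be installed again with the conda install commands above.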

2.6. *Clear away the downloads

  1. The downloaded zip files take a lot of space, so we'll get rid of them:
    [W] >conda clean -t
    [M] $ conda clean -t

2.7. Download the course material

2.7.1. Install mercurial (maybe)

  1. If you don't already have mercurial installed (try hg --version in a terminal window if you're not sure), download and then install the appropriate version for your platform from https://www.mercurial-scm.org/downloads. For Windows you'll want one of the "Mercurial 4.2.2 Inno Setup installer"s, either 'x64' for 64-bit Windows or 'x86' for 32-bit, whereas for MacOS it's "Mercurial 4.3-rc for MacOS X 10.12".
    • [W] You can untick "View ReadMe.html", but leave "Add the installation path to the search path" ticked. When the install finishes, exit your Anaconda Prompt and relaunch it, to get the search path update, and reactivate the class environment.
  2. [M] Install in the normal way.

2.7.2. Use mercurial to install the courseware

  1. Change directory to wherever you want the course materials to be installed. Best to choose a place with no spaces anywhere in its path. If you're not sure where to do this, I recommend:
    • [W] >cd c:\Users\[you]\Documents
          >mkdir HMRCourse
          >cd HMRCourse
    • [M] $ cd Documents
          $ mkdir HMRCourse
          $ cd HMRCourse
  2. Install the courseware and move into the new class directory:
    [W] >hg clone http://homepages.inf.ed.ac.uk/ht/nlp/hg -b default class
        >cd class
    [M] $ hg clone http://homepages.inf.ed.ac.uk/ht/nlp/hg -b default class
        $ cd class
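Step 1 above recommends a location with no spaces anywhere in its path. If you are unsure whether your chosen directory qualifies, a small check along the following lines may help; it is a sketch, and path_has_spaces is a hypothetical helper name.

```python
import os


def path_has_spaces(path):
    """True if the absolute form of path contains a space anywhere."""
    return " " in os.path.abspath(path)


if __name__ == "__main__":
    here = os.getcwd()
    if path_has_spaces(here):
        print("This path contains spaces; consider another location:", here)
    else:
        print("No spaces in:", here)
```

Run it from the directory in which you intend to clone (or have cloned) the courseware.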

2.7.3. *Do a bit of environment setup

As the name of the script suggests, you only need to do this once:

  1. [W] >runMeOnce.bat
    [M] $ source runMeOnce.sh

2.8. *Try it out

  1. A final check to see that we have everything:
    [W] >jupyter notebook
    [M] $ jupyter notebook
  2. The above should open a browser window. If it does not, go back to the terminal and follow the instructions printed there:
    • Copy/paste this URL into your browser when you connect for the first time, to login with a token:
    • http://localhost:8888/?token=************
  3. Open the notebooks folder and launch the 01_Introduction.ipynb notebook there.
  4. Try running the code in the first code cell (click in the cell and press <ctrl>+<enter>).

3. Things to do before the class

When you have time, you should work through the first lab's notebook (01_Introduction.ipynb), and as necessary given your background, follow the relevant links therein to the various background resources we've provided or shared, which are summarised below, in the recommended order for reading: