Lab 4: Solutions

These solutions are available in an HTML version or a PDF version.

Pointwise mutual information

PMI using counts is:

\[ \mathrm{PMI}(x, y) = \log_2 \frac{N \, C(x, y)}{C(x) \, C(y)} \]

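which follows from substituting the maximum likelihood estimates P(x) = C(x)/N, P(y) = C(y)/N and P(x, y) = C(x, y)/N into the definition PMI(x, y) = log2 [P(x, y) / (P(x) P(y))]: the numerator contributes one factor of 1/N and the denominator two, leaving a single N in the numerator.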

Therefore:

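\[ \mathrm{PMI}(x, y) = \log_2 \frac{12 \cdot 2}{6 \cdot 4} = \log_2 1 = 0 \]
\[ \mathrm{PMI}(x, z) = \log_2 \frac{12 \cdot 1}{6 \cdot 3} = \log_2 \frac{2}{3} < 0 \]
\[ \mathrm{PMI}(y, z) = \log_2 \frac{12 \cdot 2}{4 \cdot 3} = \log_2 2 = 1 \]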
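The three values above can be checked numerically. The following is a minimal sketch, not the code from lab4-sol.py: the function name pmi and its argument names are made up for illustration, and the counts (N = 12, C(x) = 6, C(y) = 4, C(z) = 3, C(x, y) = 2, C(x, z) = 1, C(y, z) = 2) are read off the worked equations.

```python
from math import log2


def pmi(c_xy, c_x, c_y, n):
    """PMI from raw counts: log2(N * C(x, y) / (C(x) * C(y))).

    Hypothetical helper for illustration; argument names are assumptions.
    """
    return log2(n * c_xy / (c_x * c_y))


N = 12  # total count, taken from the worked example

print(pmi(2, 6, 4, N))  # PMI(x, y): log2(24/24) = 0
print(pmi(1, 6, 3, N))  # PMI(x, z): log2(12/18) = log2(2/3), negative
print(pmi(2, 4, 3, N))  # PMI(y, z): log2(24/12) = 1
```

Note that this sketch does not handle a joint count of zero, for which the logarithm is undefined.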

Examining and running the code

Do Twitter users like Justin Bieber?

See lab4-sol.py for code. The PMI values for love and hate are 1.54 and 0.36 respectively, which suggests that Twitter users who mention Justin Bieber tend to like him. The positive (but weak) PMI with hate suggests that at least some users feel negatively towards him. These scores are somewhat unstable in this dataset once you start adding other positive or negative words, though in a larger dataset we found that Justin Bieber had a much higher positive than negative sentiment score overall.

Investigating other words