There will be two assignments, the first around week 5 and the second around week 9. They will appear here. Each assignment is worth 10% of the total marks for the course. If you are only auditing the course (i.e. you do not intend to take the exam), feel free to attempt the coursework, but note that it will not be marked.
Please note that plagiarism (unattributed use of other people's work) is taken very seriously by the university. To avoid possibly severe punishment, I would advise that you never show any relevant assignment material to your fellow students. See here for the official guidelines.
The second assignment has now been handed out and is available below. You will need the dataset assess2.mat and the MATLAB files also provided below. The submission instructions are given on the sheet.
Assignment 2
assess2.mat
assess2rs.mat
eyehist.m
eyescatter.m
whiten.m
fun1.m
fun2.m
zo_err.m
reject.m
Support have indicated there are problems with the MATLAB path in some circumstances as it does not include the NETLAB toolbox. This should be fixed, but if you are still having problems such as
> ??? Undefined function or variable 'mlp'.
please type addpath('/opt/matlab-6.5/toolbox/local/netlab/') into MATLAB before starting. If you are not working on the Informatics system you will also need to download NETLAB.
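If you would rather not add the path by hand every session, the check can be scripted. The following is only a sketch: it uses the directory quoted above, and the variable name netlabDir is my own. It adds the directory only when mlp is not already visible.

```matlab
% Sketch only: add the NETLAB directory if mlp is not already on the path.
netlabDir = '/opt/matlab-6.5/toolbox/local/netlab/';  % directory quoted above

if exist('mlp', 'file') ~= 2          % 2 means an M-file called mlp was found
    if exist(netlabDir, 'dir') == 7   % 7 means the directory exists
        addpath(netlabDir);
    else
        warning('NETLAB directory not found: %s', netlabDir);
    end
end
```

On a machine outside the Informatics system, netlabDir would need to point at wherever you unpacked NETLAB.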
The answers to assignment one are now available (password needed - given in lecture). Run answer.m to get them. You will need distmat.m and gaussprob.m as well as the original data file below.
The marked and returned assignments should be available from the ITO from Tues 30 Nov AT THE LATEST.
The first assignment has now been handed out and is available below. You will need the dataset assignmentone.mat also linked to below. Instructions for submission are given below.
Assignment 1
assignmentone.mat
To submit your project, place answers.txt and all the MATLAB files needed to produce your results (except assignmentone.mat) in a directory subdir. Then, from a DICE directory that contains subdir as a subdirectory, type
submit msc lfd-5 1 subdir
if you are an MSc student, or
submit ai4 lfd-4 1 subdir
if you are an AI4 student.
For fairness, any answers I give to emails regarding the assignments are posted here for reference.
In calculating covariances, please use the form provided by the MATLAB function cov throughout, even though this is not strictly the maximum likelihood estimate (it normalises by N-1, not N).
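To see the difference between the two normalisations, here is a small sketch on made-up data (the matrix X is purely illustrative). cov(X) gives the N-1 form requested above, while cov(X,1) gives the N form.

```matlab
% Made-up data: 4 points (rows) in 2 dimensions.
X = [1 2; 3 6; 5 7; 7 9];
N = size(X, 1);

C_unbiased = cov(X);      % normalises by N-1 (the form to use in the assignment)
C_ml       = cov(X, 1);   % normalises by N (maximum likelihood estimate)

% The two estimates differ only by the factor (N-1)/N.
gap = max(max(abs(C_ml - C_unbiased * (N - 1) / N)));
disp(gap)
```

For the assignment, stick to the plain cov(X) call.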
ERRATUM: In question 5, the question should read "classify xtest", not "classify ttest2". Likewise, question 6 should strictly say "classify xtest", not "classify xtest, ttest2". The important thing is that the classification is into the three classes (1.1, 1.2 and 2), as contained in the second label sets ttrain2 and ttest2.
CLARIFICATION: Question 5 involves using the model developed in question 4. Hence you will need to reduce xtest to the two principal components in order to use the model.
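One way to carry out this reduction is sketched below. It is not the model answer. It assumes that rows are data points and columns are features, and it uses small made-up matrices in place of the real xtrain and xtest. The names mu, W, ztrain and ztest are my own. The key point is that xtest is centred and projected using the mean and principal directions found from the training data.

```matlab
% Made-up stand-ins for the real data (rows = points, columns = features).
xtrain = [2 0 1; 0 1 3; 4 2 0; 1 5 2; 3 3 4; 5 1 1];
xtest  = [1 1 1; 4 0 2];

mu = mean(xtrain, 1);               % training mean, 1 x d
[V, D] = eig(cov(xtrain));          % eigenvectors of the training covariance
[evals, order] = sort(diag(D));     % eigenvalues in ascending order
order = order(end:-1:1);            % reverse to get descending order
W = V(:, order(1:2));               % two leading principal directions, d x 2

% Centre both sets with the TRAINING mean, then project onto W.
ztrain = (xtrain - repmat(mu, size(xtrain, 1), 1)) * W;
ztest  = (xtest  - repmat(mu, size(xtest, 1), 1)) * W;
```

ztest can then be passed to the model that was fitted on ztrain.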
QUESTION: In question 6, we are supposed to classify xtest using 5 nearest neighbours. To do this we need a training data set with the true labels. I am assuming that the training data set is xtrain, and the true labels are in ttest2?
ANSWER: Question 5 says you are testing the model you learnt earlier (in the previous question). Hence the training data is the (xtrain, ttrain2) of the previous question, i.e. ttrain2 contains the correct classes for the data in xtrain, and ttest2 contains the true classifications of the data in xtest. This allows a comparison of the model results with the true results in the second half of question 5. Likewise, question 6 uses the original training data (xtrain, ttrain2) to train k-NN, and tests the k-NN model on xtest. Once again, ttest2 contains the true classifications of the data in xtest.
QUESTION: What MATLAB commands can I use to isolate the individual elements corresponding to particular classes?
ANSWER: Do a help on the MATLAB command "find". a=find(b==1) will produce an index list a of the positions in vector b which contain 1. Then c(a,:) is a matrix created from c by including only the rows for which b==1. For visualisation you will also find the plot and hist functions helpful. Do a help on each of these for more information.
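As a tiny illustration of the find approach on made-up values (b plays the role of a label vector and c the role of a data matrix):

```matlab
b = [1; 2; 1; 3];                       % made-up class labels
c = [10 11; 20 21; 30 31; 40 41];       % made-up data, one row per label

a  = find(b == 1);     % a  = [1; 3], the positions where b equals 1
c1 = c(a, :);          % c1 = [10 11; 30 31], the rows of c with label 1

% Logical indexing gives the same rows without calling find.
c1_alt = c(b == 1, :);
```

The same pattern works with the assignment's label vectors, for example to pick out the rows of a data matrix belonging to one class before plotting them.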
QUESTION: Using the 5 nearest neighbours, we finally come up with a classification from the 5 nearest points. If my thinking is correct, trying to classify a point into one of the 3 classes (1.1, 1.2 and 2) could lead to a case where the nearest neighbours have the following combination: (2, 2, 1.1, 1.1, 1.2). What should the decision be in that case? How should we classify the point? Should I omit the case?
ANSWER: People use various methods for undecided cases in KNN, and you will not be penalised whatever you do. I suggest excluding the undecided cases in the false positive count for class 1.1 in question 6, and then randomly allocating them to one of the appropriate classes in the second part of that question. But whatever you choose to do in the case of having an ambiguous classification will not affect your marks.
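For concreteness, here is a rough sketch of 5-nearest-neighbour voting that flags undecided cases rather than forcing a decision. It is only one possible approach. The data are made up, and the labels 1, 2 and 3 are a hypothetical encoding of classes 1.1, 1.2 and 2 (check how ttrain2 actually stores them). A prediction of 0 marks a tie, which you can then exclude or allocate at random as suggested above.

```matlab
% Made-up data; labels 1, 2, 3 are a hypothetical encoding of 1.1, 1.2 and 2.
xtrain = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5; 9 0; 9 1; 8 0];
ttrain = [1; 1; 1; 2; 2; 2; 3; 3; 3];
xtest  = [0.2 0.2; 5.5 5.5; 4.4 1.3];
k = 5;
nclass = 3;

ntest = size(xtest, 1);
pred  = zeros(ntest, 1);                 % 0 will mark an undecided case
for i = 1:ntest
    % Squared Euclidean distance from test point i to every training point.
    diffs = xtrain - repmat(xtest(i, :), size(xtrain, 1), 1);
    d2 = sum(diffs .^ 2, 2);
    [dsorted, idx] = sort(d2);           % ascending, so nearest first
    nn = ttrain(idx(1:k));               % labels of the k nearest neighbours

    votes = zeros(1, nclass);
    for c = 1:nclass
        votes(c) = sum(nn == c);
    end
    winners = find(votes == max(votes));
    if length(winners) == 1
        pred(i) = winners;               % clear majority
    end                                  % otherwise leave as 0 (tie)
end
undecided = find(pred == 0);             % indices of the ambiguous test points
```

Here the third test point has neighbours voting 2, 2 and 1 across the three classes, so it is left undecided.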