 
The suggested syllabi are the result of surveying a broad selection of
machine-learning-related courses at a number of universities worldwide.
Three course lengths (short, medium, and long) are suggested at each of
two levels (undergraduate overview and postgraduate). Although proposing
a general machine learning syllabus is nontrivial, since different topics
may need different prerequisites, the modules presented here should help
academics prepare suitable course content more easily and quickly.
| Topic | Overview: Short | Overview: Medium | Overview: Long | Postgraduate: Short | Postgraduate: Medium | Postgraduate: Long |
|---|---|---|---|---|---|---|
| **Total Hours** | 10 hrs | 20 hrs | 40 hrs | 10 hrs | 20 hrs | 40 hrs |
| **Fundamentals** | 1 | 2 | 4 | 1 | 2 | 2.5 |
| Introduction/Motivation | 0.25 | 0.5 | 1 | 0.25 | 0.5 | 0.5 |
| Basic Probability Theory | 0.25 | 0.25 | 0.75 | 0 | 0 | 0 |
| Basic Linear Algebra | 0 | 0.25 | 0.5 | 0 | 0 | 0 |
| Gaussian Distribution | 0 | 0.25 | 0.5 | 0.25 | 0.5 | 0.5 |
| Other Important Distributions | 0 | 0.25 | 0.5 | 0.25 | 0.5 | 0.5 |
| Bayesian Decision Theory | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.5 |
| Information Theory | 0 | 0 | 0.25 | 0 | 0.25 | 0.5 |
| **Feature Extraction** | 1 | 2.5 | 3.5 | 1 | 2 | 6 |
| Preprocessing/Normalisation | 0.25 | 0.25 | 0.25 | 0 | 0 | 0.25 |
| Dimensionality Reduction | 0.75 | 1.75 | 1.75 | 1 | 2 | 2.75 |
| Independent Component Analysis | 0 | 0 | 0.5 | 0 | 0 | 1 |
| Factor Analysis | 0 | 0 | 0 | 0 | 0 | 1 |
| Feature Selection | 0 | 0.5 | 1 | 0 | 0 | 1 |
| **Clustering** | 1 | 2 | 3 | 1 | 1.75 | 2 |
| K-Means | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Hierarchical Clustering | 0 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Spectral/Graph-based Clustering | 0 | 0 | 1 | 0 | 0.75 | 1 |
| Gaussian Mixture Models | 0.5 | 1 | 1 | 0.5 | 0.5 | 0.5 |
| **Nonparametric Density Estimation** | 1 | 1.5 | 1.5 | 0.5 | 1.5 | 1.5 |
| Histograms | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 |
| Kernel Density Estimation/Parzen Windows | 0.25 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Nearest Neighbour Density Estimation | 0.5 | 0.75 | 0.75 | 0.25 | 0.25 | 0.25 |
| Bayesian Nonparametric Methods | 0 | 0 | 0 | 0 | 1 | 1 |
| **Regression** | 1 | 2 | 2 | 1 | 1 | 2 |
| Linear Regression | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 |
| Linearly Weighted Basis Functions | 0 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Kernel Regression | 0 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Gaussian Processes | 0 | 0 | 0 | 0 | 0 | 1 |
| **Classifiers** | 3 | 6 | 10 | 3.25 | 5.25 | 8 |
| Linear Discriminants | 1 | 1.5 | 2 | 1 | 1 | 1 |
| Logistic Regression | 0 | 0 | 1 | 1 | 1 | 1 |
| Support Vector Machines, Kernel Methods | 0 | 1 | 2 | 1 | 1 | 2 |
| Neural Networks (MLP, RBF) | 0 | 1 | 2 | 0 | 1 | 2 |
| Decision Trees | 1 | 1 | 1 | 0 | 0.5 | 1 |
| Naïve Bayes | 0.5 | 1 | 1 | 0 | 0.5 | 0.5 |
| Nearest Neighbour Classification | 0.5 | 0.5 | 1 | 0.25 | 0.25 | 0.5 |
| **Parameter Estimation** | 1.5 | 3 | 4 | 1.5 | 1.5 | 4 |
| Maximum Likelihood | 0.5 | 1 | 1 | 0.25 | 0.25 | 0.5 |
| Maximum A Posteriori | 0 | 1 | 1 | 0.75 | 0.75 | 1 |
| Expectation Maximisation | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 |
| Sampling Methods | 0 | 0 | 1 | 0 | 0 | 1 |
| Variational Methods | 0 | 0 | 0 | 0 | 0 | 1 |
| **Model Selection** | 0.5 | 1 | 3 | 0.75 | 1 | 1.5 |
| Overfitting, Train-vs-Test Error | 0.25 | 0.25 | 0.5 | 0.25 | 0 | 0 |
| Bias-vs-Variance Dilemma | 0 | 0.5 | 1 | 0.25 | 0.5 | 0.5 |
| Regularization, Bayesian Model Selection | 0 | 0 | 1 | 0.25 | 0.5 | 1 |
| Cross Validation | 0.25 | 0.25 | 0.5 | 0 | 0 | 0 |
| **Classifier Combination** | 0 | 0 | 2 | 0 | 1 | 1 |
| Boosting | 0 | 0 | 1.5 | 0 | 0.25 | 0.25 |
| Bagging/Bootstrap | 0 | 0 | 0.5 | 0 | 0.75 | 0.75 |
| Other Combination Techniques | 0 | 0 | 0 | 0 | 0 | 0 |
| **Graphical Models** | 0 | 0 | 3 | 0 | 1 | 4 |
| Bayesian Belief Networks | 0 | 0 | 1 | 0 | 0.5 | 1 |
| Parameter Estimation | 0 | 0 | 1 | 0 | 0 | 0.5 |
| Markov Random Fields | 0 | 0 | 0 | 0 | 0 | 1 |
| Inference in Graphical Models | 0 | 0 | 1 | 0 | 0.5 | 1 |
| Structure Learning | 0 | 0 | 0 | 0 | 0 | 0.5 |
| **Sequence Models** | 0 | 0 | 2 | 0 | 1 | 3.5 |
| Markov Chains | 0 | 0 | 1 | 0 | 0.25 | 0.5 |
| Hidden Markov Models | 0 | 0 | 1 | 0 | 0.75 | 2 |
| Linear Dynamical Systems | 0 | 0 | 0 | 0 | 0 | 1 |
| **Theoretical Concepts** | 0 | 0 | 1 | 0 | 1 | 2 |
| PAC Learning | 0 | 0 | 0.5 | 0 | 0.75 | 0.75 |
| VC Dimension | 0 | 0 | 0.5 | 0 | 0.25 | 0.25 |
| Computational Learning Theory | 0 | 0 | 0 | 0 | 0 | 1 |
| **Other Types of Learning** | 0 | 0 | 1 | 0 | 0 | 2 |
| Reinforcement Learning | 0 | 0 | 1 | 0 | 0 | 1 |
| Semi-supervised Learning | 0 | 0 | 0 | 0 | 0 | 0.5 |
| Active Learning | 0 | 0 | 0 | 0 | 0 | 0.5 |
 © 2010 Robert Fisher