The suggested syllabi are the result of surveying a broad selection of
courses on various aspects of machine learning at a number of
universities worldwide. Three course lengths
(short, medium, and long) at two levels (undergraduate and postgraduate)
are
suggested. Although proposing a general machine learning syllabus that covers a range of topics, each of which may need different
prerequisites, is nontrivial, the modules presented here should
help academics prepare suitable course content more easily and
quickly.
Suggested hours per module for the undergraduate (UG) and postgraduate (PG) courses, each at short, medium, and long lengths:

| Module | UG Short | UG Medium | UG Long | PG Short | PG Medium | PG Long |
|---|---|---|---|---|---|---|
| **Total Hours** | 10 hrs | 20 hrs | 40 hrs | 10 hrs | 20 hrs | 40 hrs |
| **Fundamentals** | 1 | 2 | 4 | 1 | 2 | 2.5 |
| Introduction/Motivation | 0.25 | 0.5 | 1 | 0.25 | 0.5 | 0.5 |
| Basic Probability Theory | 0.25 | 0.25 | 0.75 | 0 | 0 | 0 |
| Basic Linear Algebra | 0 | 0.25 | 0.5 | 0 | 0 | 0 |
| Gaussian Distribution | 0 | 0.25 | 0.5 | 0.25 | 0.5 | 0.5 |
| Other Important Distributions | 0 | 0.25 | 0.5 | 0.25 | 0.5 | 0.5 |
| Bayesian Decision Theory | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.5 |
| Information Theory | 0 | 0 | 0.25 | 0 | 0.25 | 0.5 |
| **Feature Extraction** | 1 | 2.5 | 3.5 | 1 | 2 | 6 |
| Preprocessing/Normalisation | 0.25 | 0.25 | 0.25 | 0 | 0 | 0.25 |
| Dimensionality Reduction | 0.75 | 1.75 | 1.75 | 1 | 2 | 2.75 |
| Independent Component Analysis | 0 | 0 | 0.5 | 0 | 0 | 1 |
| Factor Analysis | 0 | 0 | 0 | 0 | 0 | 1 |
| Feature Selection | 0 | 0.5 | 1 | 0 | 0 | 1 |
| **Clustering** | 1 | 2 | 3 | 1 | 1.75 | 2 |
| K-Means | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Hierarchical Clustering | 0 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Spectral/Graph-based Clustering | 0 | 0 | 1 | 0 | 0.75 | 1 |
| Gaussian Mixture Models | 0.5 | 1 | 1 | 0.5 | 0.5 | 0.5 |
| **Nonparametric Density Estimation** | 1 | 1.5 | 1.5 | 0.5 | 1.5 | 1.5 |
| Histograms | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 |
| Kernel Density Estimation/Parzen Windows | 0.25 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Nearest Neighbour Density Estimation | 0.5 | 0.75 | 0.75 | 0.25 | 0.25 | 0.25 |
| Bayesian Nonparametric Methods | 0 | 0 | 0 | 0 | 1 | 1 |
| **Regression** | 1 | 2 | 2 | 1 | 1 | 2 |
| Linear Regression | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 |
| Linearly Weighted Basis Functions | 0 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Kernel Regression | 0 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 |
| Gaussian Processes | 0 | 0 | 0 | 0 | 0 | 1 |
| **Classifiers** | 3 | 6 | 10 | 3.25 | 5.25 | 8 |
| Linear Discriminants | 1 | 1.5 | 2 | 1 | 1 | 1 |
| Logistic Regression | 0 | 0 | 1 | 1 | 1 | 1 |
| Support Vector Machines, Kernel Methods | 0 | 1 | 2 | 1 | 1 | 2 |
| Neural Networks (MLP, RBF) | 0 | 1 | 2 | 0 | 1 | 2 |
| Decision Trees | 1 | 1 | 1 | 0 | 0.5 | 1 |
| Naïve Bayes | 0.5 | 1 | 1 | 0 | 0.5 | 0.5 |
| Nearest Neighbour Classification | 0.5 | 0.5 | 1 | 0.25 | 0.25 | 0.5 |
| **Parameter Estimation** | 1.5 | 3 | 4 | 1.5 | 1.5 | 4 |
| Maximum Likelihood | 0.5 | 1 | 1 | 0.25 | 0.25 | 0.5 |
| Maximum A Posteriori | 0 | 1 | 1 | 0.75 | 0.75 | 1 |
| Expectation Maximisation | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 |
| Sampling Methods | 0 | 0 | 1 | 0 | 0 | 1 |
| Variational Methods | 0 | 0 | 0 | 0 | 0 | 1 |
| **Model Selection** | 0.5 | 1 | 3 | 0.75 | 1 | 1.5 |
| Overfitting, Train-vs-Test Error | 0.25 | 0.25 | 0.5 | 0.25 | 0 | 0 |
| Bias-vs-Variance Dilemma | 0 | 0.5 | 1 | 0.25 | 0.5 | 0.5 |
| Regularization, Bayesian Model Selection | 0 | 0 | 1 | 0.25 | 0.5 | 1 |
| Cross Validation | 0.25 | 0.25 | 0.5 | 0 | 0 | 0 |
| **Classifier Combination** | 0 | 0 | 2 | 0 | 1 | 1 |
| Boosting | 0 | 0 | 1.5 | 0 | 0.25 | 0.25 |
| Bagging/Bootstrap | 0 | 0 | 0.5 | 0 | 0.75 | 0.75 |
| Other Combination Techniques | 0 | 0 | 0 | 0 | 0 | 0 |
| **Graphical Models** | 0 | 0 | 3 | 0 | 1 | 4 |
| Bayesian Belief Networks | 0 | 0 | 1 | 0 | 0.5 | 1 |
| Parameter Estimation | 0 | 0 | 1 | 0 | 0 | 0.5 |
| Markov Random Fields | 0 | 0 | 0 | 0 | 0 | 1 |
| Inference in Graphical Models | 0 | 0 | 1 | 0 | 0.5 | 1 |
| Structure Learning | 0 | 0 | 0 | 0 | 0 | 0.5 |
| **Sequence Models** | 0 | 0 | 2 | 0 | 1 | 3.5 |
| Markov Chains | 0 | 0 | 1 | 0 | 0.25 | 0.5 |
| Hidden Markov Models | 0 | 0 | 1 | 0 | 0.75 | 2 |
| Linear Dynamical Systems | 0 | 0 | 0 | 0 | 0 | 1 |
| **Theoretical Concepts** | 0 | 0 | 1 | 0 | 1 | 2 |
| PAC Learning | 0 | 0 | 0.5 | 0 | 0.75 | 0.75 |
| VC Dimension | 0 | 0 | 0.5 | 0 | 0.25 | 0.25 |
| Computational Learning Theory | 0 | 0 | 0 | 0 | 0 | 1 |
| **Other Types of Learning** | 0 | 0 | 1 | 0 | 0 | 2 |
| Reinforcement Learning | 0 | 0 | 1 | 0 | 0 | 1 |
| Semi-supervised Learning | 0 | 0 | 0 | 0 | 0 | 0.5 |
| Active Learning | 0 | 0 | 0 | 0 | 0 | 0.5 |