CAVIAR Hidden Semi-Markov Model Behaviour Recognition


Sequential behaviour recognition commonly uses a Hidden Markov Model (HMM), but the HMM state-transition model implicitly imposes an exponential (geometric, in discrete time) distribution on the time spent in each state. We replaced this with an empirical time-in-state distribution, giving a Hidden Semi-Markov Model (HSMM). The commonly used HSMM algorithms are O(T^2) in the sequence length T, which makes them computationally infeasible for continuous video. We located an O(T) algorithm from gene-sequence analysis and adapted it for video sequences.
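To make the difference between the two duration models concrete, the following minimal sketch contrasts the duration distribution implied by an HMM self-transition with an empirical time-in-state histogram. The function names, the label string and the self-transition probability 0.75 are illustrative assumptions, not values from the CAVIAR data.

```python
from collections import Counter
from itertools import groupby


def geometric_duration_pmf(p_stay, max_d):
    """P(d) = p_stay**(d-1) * (1 - p_stay): the duration law an HMM implies.

    Only d = 1..max_d is returned, so the list does not sum to exactly 1.
    """
    return [p_stay ** (d - 1) * (1.0 - p_stay) for d in range(1, max_d + 1)]


def empirical_duration_pmf(labels, state, max_d):
    """Normalised histogram of run lengths of `state`; longer runs go in the last bin."""
    runs = [len(list(group)) for value, group in groupby(labels) if value == state]
    if not runs:
        raise ValueError(f"state {state!r} never occurs")
    counts = Counter(min(run, max_d) for run in runs)
    return [counts[d] / len(runs) for d in range(1, max_d + 1)]


# Hypothetical per-frame labels: walking (W) always lasts 4 frames here.
labels = list("WWWWBBWWWWBBBWWWW")
hmm_like = geometric_duration_pmf(p_stay=0.75, max_d=5)
observed = empirical_duration_pmf(labels, "W", max_d=5)
```

In this toy case the geometric model puts most of its mass on one-frame stays, while every observed walking run lasts four frames. A gap of this kind is what an empirical duration model is meant to close.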

We represented behaviour in a 4-level scheme: movement and roles are inferred from image evidence, and instantaneous 'situation' and long-term 'context' descriptions are represented as graphs. We developed a rule-based symbolic 'parsing' of the video sequence and an HSMM recognizer. The former is simpler, but the latter copes better with softer (probabilistic) evidence. We then compared the two algorithms, recognizing behaviour using the ground-truth tracking, IST feature descriptions and UEDIN role hypothesizing, over 7 context models, 80 sequences and 417 tracked persons. The rule-based recognizer achieved 57% and the HSMM recognizer 65% correct recognition of the contexts.
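As an illustration of how an HSMM recognizer can use soft evidence, here is a sketch of the textbook explicit-duration Viterbi decoder. It is not the O(T) algorithm of the paper: its cost grows as O(T D N^2) for N states and a maximum duration D, which becomes quadratic in T when D is allowed to grow with T. All names, the two-state model and the probabilities are assumptions made for the example. The sketch also assumes finite observation log-likelihoods and that the last segment ends exactly at the final frame.

```python
import math

NEG_INF = float("-inf")


def hsmm_viterbi(log_pi, log_A, log_dur, log_obs):
    """Explicit-duration Viterbi decoding.

    log_pi[j]       : log P(first segment is in state j)
    log_A[i][j]     : log P(next segment in j | previous segment in i), i != j
    log_dur[j][d-1] : log P(a segment in state j lasts d frames), d = 1..D
    log_obs[t][j]   : log likelihood of frame t under state j
    Returns the most likely per-frame state sequence.
    """
    T, N, D = len(log_obs), len(log_pi), len(log_dur[0])
    # cum[t][j] = sum of log_obs[0..t-1][j]; a segment score is a difference.
    cum = [[0.0] * N]
    for t in range(T):
        cum.append([cum[t][j] + log_obs[t][j] for j in range(N)])

    # delta[t][j]: best score for frames 0..t-1 with a state-j segment ending at t-1.
    delta = [[NEG_INF] * N for _ in range(T + 1)]
    back = [[None] * N for _ in range(T + 1)]  # (duration, previous state)
    for t in range(1, T + 1):
        for j in range(N):
            for d in range(1, min(D, t) + 1):
                start = t - d
                seg = log_dur[j][d - 1] + cum[t][j] - cum[start][j]
                if start == 0:
                    prev, score = None, log_pi[j] + seg
                else:
                    prev, best = None, NEG_INF
                    for i in range(N):
                        cand = delta[start][i] + log_A[i][j] if i != j else NEG_INF
                        if cand > best:
                            prev, best = i, cand
                    score = best + seg
                if score > delta[t][j]:
                    delta[t][j], back[t][j] = score, (d, prev)

    j = max(range(N), key=lambda s: delta[T][s])
    if delta[T][j] == NEG_INF:
        raise ValueError("no valid segmentation under this model")
    # Backtrack, expanding each segment into per-frame labels.
    path, t = [], T
    while t > 0:
        d, prev = back[t][j]
        path[:0] = [j] * d
        t, j = t - d, prev
    return path


def logs(probs):
    return [math.log(p) if p > 0 else NEG_INF for p in probs]


# Toy model: two states that must alternate and strongly prefer 3-frame stays.
log_pi = logs([0.5, 0.5])
log_A = [[NEG_INF, 0.0], [0.0, NEG_INF]]
log_dur = [logs([0.05, 0.05, 0.9])] * 2
# Frame 1 on its own slightly favours state 1 (a marginal, "soft" error).
frame_lik = [(0.8, 0.2), (0.4, 0.6), (0.8, 0.2), (0.2, 0.8), (0.2, 0.8), (0.2, 0.8)]
log_obs = [logs(f) for f in frame_lik]
path = hsmm_viterbi(log_pi, log_A, log_dur, log_obs)
print(path)  # [0, 0, 0, 1, 1, 1]
```

The noisy second frame is absorbed into the first segment because the duration model makes a one-frame excursion unlikely, without any hard decision on that frame.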

We investigated whether an algorithm based on hard categorical decisions and hand-crafted decision rules would give better or worse recognition results than the probabilistic HSMM recognizer. We compared the rule-based 'parser' (which tolerated some erroneous single-frame movement and role classifications) to the HSMM algorithm, which allowed marginal evidence to be used at a lower probability. In the following, the data is from all ground-truth sequences, using the ground-truth short-term activity and role classifications. True class labels are at the left edge of each table. The context label abbreviations are:

  1. CW: Walking
  2. CB: Browsing
  3. CI: Immobile
  4. CEn: Shop Enter
  5. CEx: Shop Exit
  6. CR: Shop Re-enter
  7. CWi: Windowshop
  8. CErr: Unrecognizable

Rule-based context recognizer on ground truth

This classifier used hand-tuned rule-based and procedural matching algorithms (similar to a parser that tolerates erroneous states) to match the different context model graphs to the sequence of situations for each video. The sequence of situations was derived from the combination of role and short-term activity labels. Overall, 70% of the situations (individual frames) were correctly classified and 57% of the behaviours (context models) were correctly recognised.
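           CW     CB     CI    CEn    CEx     CR    CWi   CErr     Tot   %
CW      63470            [11033366114958]      .      .    552   74671  85
CB        656  15934      .   1188      .      .      .   8780   26558  60
CI       5512   2575  18768      .      .    232      .   2704   29791  63
CEn      1048      .      .  16785      .   6011      .   2384   26228  64
CEx       371      .      .      .  10603      .      .  26895   37869  28
CR          .      .      .      .      .      .      .   2488    2488   0
CWi        67  10139      .      .   3766      .      .   8601   22573   0
Total       .      .      .      .      .      .      .      .  220178  57

In the CW row, the bracketed digits cover the CB, CI, CEn and CEx cells; the boundaries between those cells are not recoverable from the source.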
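The tolerance to isolated frame errors described for the rule-based matcher can be illustrated with a much simplified sketch. It pairs role and short-term activity labels into situations, discards runs shorter than a threshold, and compares the result with an ordered list of expected situations. The real matcher works on context model graphs; the linear model, the label names and the `min_run` threshold here are all hypothetical.

```python
from itertools import groupby


def situations(roles, activities):
    """Pair per-frame role and short-term activity labels into situation labels."""
    return list(zip(roles, activities))


def collapse_runs(frames, min_run=2):
    """Run-length encode frames, dropping runs shorter than min_run as errors."""
    kept = [label for label, group in groupby(frames) if len(list(group)) >= min_run]
    # Merge neighbours that became adjacent once a short run was dropped.
    return [label for label, _ in groupby(kept)]


def matches(model, frames, min_run=2):
    """True if the cleaned situation sequence equals the model's ordered situations."""
    return collapse_runs(frames, min_run) == list(model)


# Hypothetical labels: frame 3 is a single-frame misclassification.
roles = ["walker"] * 3 + ["browser"] + ["walker"] * 2 + ["browser"] * 4
activities = ["walking"] * 3 + ["inactive"] + ["walking"] * 2 + ["inactive"] * 4
frames = situations(roles, activities)
model = [("walker", "walking"), ("browser", "inactive")]

tolerant = matches(model, frames)            # single-frame error ignored
strict = matches(model, frames, min_run=1)   # every frame taken at face value
```

With the tolerance enabled the sequence matches the model, while the strict variant fails on the one misclassified frame. The decision is still hard: a run is either kept or dropped, with no notion of how likely it was.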

HSMM context recognizer on ground truth

This classifier used the HSMM matching algorithm to match the different context model graphs to the sequence of situations for each video. The sequence of situations was derived from the combination of role and short-term activity labels. Overall, 74% of the situations (individual frames) were correctly classified and 65% of the behaviours (context models) were correctly recognised, slightly better overall than the rule-based approach.
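           CW     CB     CI    CEn    CEx     CR    CWi   CErr     Tot   %
CW      65710        [11034099368]      .      .      .    339   74671  88
CB        656  14872      .      .      .      .      .  11030   26558  56
CI        191    224  21747      .      .      .      .   7629   29791  73
CEn      1049      .      .  16261      .      .     17   8918   26228  62
CEx       371      .      .      .  21206      .      .  16292   37869  56
CR          .      .      .      .      .    528      .   1891    2488  20
CWi         .   9565      .      .      .      .   2934  10074   22573  13
Total       .      .      .      .      .      .      .      .  220178  65

In the CW row, the bracketed digits cover the CB, CI and CEn cells; the boundaries between those cells are not recoverable from the source.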
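As a check on the summary figures, the percentages in the Total rows of the two tables above can be reproduced from the diagonal entries and the row totals. The sketch below assumes that the Total percentage is the sum of the diagonal divided by the grand total, and treats empty cells as zero; the variable names are illustrative.

```python
# Rows in the order CW, CB, CI, CEn, CEx, CR, CWi (values read from the tables).
row_totals = [74671, 26558, 29791, 26228, 37869, 2488, 22573]
rule_diag = [63470, 15934, 18768, 16785, 10603, 0, 0]
hsmm_diag = [65710, 14872, 21747, 16261, 21206, 528, 2934]


def overall_accuracy(diag, totals):
    """Fraction of all frames that lie on the confusion-matrix diagonal."""
    return sum(diag) / sum(totals)


rule_acc = overall_accuracy(rule_diag, row_totals)
hsmm_acc = overall_accuracy(hsmm_diag, row_totals)
print(f"rule-based: {rule_acc:.1%}, HSMM: {hsmm_acc:.1%}")
# rule-based: 57.0%, HSMM: 65.1%
```

Under that assumption the results agree with the 57% and 65% shown in the Total rows, and the row totals add up to the stated 220178 frames.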

A paper that describes the algorithm is:

  1. D. Tweed, R. Fisher, J. Bins, T. List, "Efficient Hidden Semi-Markov Model Inference for Structured Video Sequences", Proc. 2nd Joint IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, (VS-PETS), pp 247-254, Beijing, Oct 2005.
