Once moving objects in a video have been detected and tracked, we can do a variety of things with the results. Here we look at two problems: 1) linking together the trajectories of tracked people and cars that have been fragmented into separate sections by occlusions, shadows, etc.; and 2) recognising the immediate action that a person is performing, such as a forehand tennis stroke.
We introduce the two example applications.
We introduce a probabilistic method for generating multiple hypotheses about the identity of an object, with object colour as the image evidence. New hypotheses are created each time a trajectory is broken. A Bayesian network attempts to combine the tracking and colour evidence to resolve the identity of each target.
We show an example of using the Bayesian network to link trajectories of people and cars.
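The idea of weighing tracking evidence against colour evidence can be illustrated with a much simpler calculation than a full Bayesian network. The sketch below applies Bayes' rule once: a new trajectory fragment is compared with each earlier track, and the posterior over identities is the tracking prior multiplied by a colour likelihood. The function names, the use of the Bhattacharyya coefficient as the colour likelihood, and the toy three-bin histograms are all assumptions made for illustration; they are not taken from the method summarised above.

```python
import math


def normalise(hist):
    """Scale a colour histogram so that its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def bhattacharyya(p, q):
    """Similarity of two normalised histograms, between 0 and 1.

    Assumption: used here as a stand-in for the colour likelihood.
    """
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def identity_posterior(new_hist, candidates, priors):
    """Posterior over which earlier track a new fragment continues.

    candidates: dict mapping track name -> colour histogram (hypothetical data)
    priors:     dict mapping track name -> prior from tracking evidence
    """
    new_hist = normalise(new_hist)
    scores = {}
    for name, hist in candidates.items():
        likelihood = bhattacharyya(new_hist, normalise(hist))
        scores[name] = priors[name] * likelihood
    z = sum(scores.values())
    if z == 0:
        # No hypothesis is supported at all; fall back to a uniform posterior.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / z for name, s in scores.items()}


# Toy example: a mostly red fragment appears after an occlusion.
posterior = identity_posterior(
    new_hist=[8, 1, 1],
    candidates={"person_A": [9, 1, 0], "car_B": [1, 1, 8]},
    priors={"person_A": 0.5, "car_B": 0.5},
)
print(posterior)
```

With equal priors the colour evidence alone decides the outcome; a strongly uneven tracking prior could override a weak colour match, which is the trade-off a network over many fragments would have to resolve jointly.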
We introduce an example application in which short-term actions, e.g. a single tennis stroke, are identified using medium-field image evidence: the tracked object covers enough pixels to be recognised as a person, but not enough to fit a 3D body model (e.g. 30 pixels high).
To recognise short-term actions, we introduce optical flow (OF) descriptors. So that the OF descriptors describe the action itself rather than the global motion of the person, we introduce the idea of stabilising the person.
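One way to picture stabilisation is as removing the tracked person's overall displacement from the flow field, so that only limb and body motion remains. The sketch below does this by subtracting the tracked box's frame-to-frame velocity from every flow vector, then flattens the result into a descriptor with four non-negative channels (rightward, leftward, downward and upward motion). Both steps are assumptions made for illustration: the source does not say how stabilisation is implemented or how the descriptor is laid out, and the function names and flow values are hypothetical.

```python
def stabilise_flow(flow, box_velocity):
    """Remove the person's global motion from a flow field.

    flow:         2D grid (list of rows) of (u, v) flow vectors inside the
                  tracked box
    box_velocity: (u, v) displacement of the tracked box between frames
    Assumption: stabilisation is modelled as a pure translation correction.
    """
    gu, gv = box_velocity
    return [[(u - gu, v - gv) for (u, v) in row] for row in flow]


def flow_descriptor(flow):
    """Flatten a flow field into four non-negative channels per pixel.

    Assumption: splitting each component into its positive and negative
    parts is one possible layout, not the one given in the source.
    """
    descriptor = []
    for row in flow:
        for u, v in row:
            descriptor.extend(
                [max(u, 0.0), max(-u, 0.0), max(v, 0.0), max(-v, 0.0)]
            )
    return descriptor


# Toy example: the person moves 2 pixels right per frame while one pixel
# (an arm, say) also moves 1 pixel down.
raw_flow = [[(3.0, 0.0), (2.0, 1.0)]]
stable = stabilise_flow(raw_flow, box_velocity=(2.0, 0.0))
print(flow_descriptor(stable))
```

Without the subtraction, a person running across the court while playing a stroke would produce flow dominated by the run, and two instances of the same stroke would yield very different descriptors.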
Given a set of motion descriptors for observed actions, we can recognise the current action by comparing its descriptor with those stored in a database. The method was successfully applied to about 25 action primitives from ballet, tennis and football (soccer), as evidenced by a confusion matrix.
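Comparing a descriptor against a database, and summarising the results as a confusion matrix, can be sketched as a nearest-neighbour lookup. The Euclidean distance, the function names and the tiny two-action database below are assumptions made for illustration; the source does not specify the similarity measure or the data layout.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Distance between two descriptors of equal length (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(descriptor, database):
    """Return the label of the closest stored descriptor.

    database: list of (label, descriptor) pairs (hypothetical layout)
    """
    label, _ = min(database, key=lambda item: euclidean(descriptor, item[1]))
    return label


def confusion_matrix(samples, database):
    """Count how often each true action is recognised as each label.

    samples: list of (true_label, descriptor) pairs
    Returns a nested dict: counts[true_label][predicted_label].
    """
    counts = defaultdict(lambda: defaultdict(int))
    for true_label, descriptor in samples:
        counts[true_label][classify(descriptor, database)] += 1
    return counts


# Toy database with two action primitives.
database = [
    ("forehand", [1.0, 0.0, 0.0, 0.0]),
    ("backhand", [0.0, 1.0, 0.0, 0.0]),
]
test_samples = [
    ("forehand", [0.8, 0.0, 0.0, 0.0]),
    ("backhand", [0.1, 0.9, 0.0, 0.0]),
]
matrix = confusion_matrix(test_samples, database)
for true_label, row in matrix.items():
    print(true_label, dict(row))
```

A strong diagonal in the resulting matrix indicates that actions are mostly recognised correctly, while off-diagonal counts show which pairs of actions are confused with each other.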