Skin Colour Analysis
Jamie Sherrah and Shaogang Gong
The detection of skin colour in images is a very useful and
increasingly popular technique in computer vision for detecting
and tracking humans. As a visual cue, skin colour is robust and
inexpensive to compute, making it useful as an attention-focusing
mechanism for more expensive computations. It has been found that
skin colour from all ethnicities clusters tightly in
hue-saturation (HS)-space [5]. Ignoring intensity
immediately introduces some invariance to lighting conditions. In
the literature, physical models have been introduced for modelling
colour [3,2], and in particular for human skin [6,4]. For example, one
can model light sources as black-body radiators, the reflectance
and melanin content of skin, and the linear and non-linear camera
characteristics of the colour-calibrated camera. However, such a
model would still be incomplete because the blood content in the
skin affects its colour distribution, and the viewing geometry
would not be known in general. A simple, general system is
usually required that can operate with: (1) un-calibrated cameras,
(2) arbitrary viewing geometry and (3) unknown but
commonly-encountered illuminants.
Let us now use an example to illustrate how a skin model can be
built through sampling. To produce an empirical camera-independent
model for skin colour in HS-space, we assumed that the illuminant
is one of the commonly-encountered "white" illuminants, namely
daylight, and fluorescent or incandescent light sources.
Thirty-two skin colour image samples were collected from our own
cameras and the internet under reasonable lighting conditions.
Images with blue and orange light sources were discarded. The
pixels from these images were converted to Hue, Saturation and
Intensity (HSI)-space. The HS components are plotted in polar
coordinates in Figure 1, where hue is the angle and saturation is the
radius, with red at .
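As a rough illustration of the conversion step, the following Python sketch maps RGB pixels to hue and saturation, and then to the Cartesian coordinates of a polar HS plot. It assumes one common HSI definition (intensity as the mean of R, G and B, saturation as one minus the minimum channel divided by intensity, hue as the angle of the chromatic component, which places pure red at hue 0). The text does not say which HSI variant was used, and the function names are hypothetical.

```python
import numpy as np

def rgb_to_hs(rgb):
    """Convert an (..., 3) array of RGB values in [0, 1] to hue and saturation.

    Assumed HSI variant: hue in radians in [0, 2*pi), saturation in [0, 1].
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    # Hue is the angle of the chromatic component; pure red maps to 0.
    hue = np.arctan2(np.sqrt(3.0) * (g - b), 2.0 * r - g - b) % (2.0 * np.pi)
    # Saturation is 1 - min(R, G, B) / I, defined as 0 for black pixels.
    min_channel = np.minimum(np.minimum(r, g), b)
    safe_intensity = np.where(intensity > 0, intensity, 1.0)
    sat = np.where(intensity > 0, 1.0 - min_channel / safe_intensity, 0.0)
    return hue, sat

def hs_to_cartesian(hue, sat):
    """Polar HS coordinates (angle = hue, radius = saturation) to x, y."""
    return sat * np.cos(hue), sat * np.sin(hue)
```

Intensity is computed only to normalise saturation and is then discarded, which is what gives the HS representation its partial invariance to lighting level.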
Figure 1: Skin pixels plotted in HS-space, with hue as the angle and saturation as the radius. Two rays are also plotted.
One easily notices that the skin pixels occupy only a subset of all
colours. What is not directly evident from
Figure 1 is that in fact there is no single image
in which all pixels have saturation below .
When modelling skin colour, the problem is usually thought of as a
binary classification problem. Each pixel is classified as skin
or non-skin based on its colour components. Given
Figure 1, it would appear that the problem could
be easily solved for all commonly-encountered environments.
In practice, however, this is unlikely to hold. It is inevitable that
there will be overlap in colour space between skin pixels and
background pixels. For example, the face is likely to contain
specularities that will be indistinguishable from white-ish
regions in the background. The accuracy of the classification
will depend on the scene background colour distribution. Therefore
better results are obtained by tightly modelling the skin colour
distribution in this image set rather than trying to cope
with all possible skin hues. A classifier can be trained off-line
by having a human identify skin and non-skin pixels. An example
would be to model the skin pixels in HS-space using a single 2D
Gaussian.
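The single-Gaussian example can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the labelled skin pixels are given as 2D points in Cartesian HS coordinates (S cos H, S sin H), which avoids the wrap-around of hue at red, and it classifies by thresholding the squared Mahalanobis distance. The function names and the default threshold are assumptions; 9.21 is the 99% quantile of a chi-square distribution with two degrees of freedom.

```python
import numpy as np

def fit_gaussian(skin_points):
    """Estimate mean and covariance from an (N, 2) array of labelled skin pixels."""
    skin_points = np.asarray(skin_points, dtype=float)
    mean = skin_points.mean(axis=0)
    cov = np.cov(skin_points, rowvar=False)
    return mean, cov

def mahalanobis_sq(points, mean, cov):
    """Squared Mahalanobis distance of each 2D point to the Gaussian."""
    diff = np.asarray(points, dtype=float) - mean
    inv_cov = np.linalg.inv(cov)
    return np.einsum("...i,ij,...j->...", diff, inv_cov, diff)

def classify_skin(points, mean, cov, threshold=9.21):
    """Label a pixel as skin when it lies inside the chosen confidence ellipse.

    The threshold is an assumed value, not one given in the text.
    """
    return mahalanobis_sq(points, mean, cov) <= threshold
```

A lower threshold gives a tighter ellipse and fewer false positives on skin-like background, at the cost of missing skin pixels near the edge of the distribution.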
The classification task is complicated by changing illumination
conditions, which alter the distribution of skin colour in the image
over time. There have been successful applications of adaptive skin
colour models that track the colour distribution over time. The two
main approaches use histograms [1], and
mixtures-of-Gaussians adapted using
Expectation-Maximisation [5]. The difficulty with
these models lies in identifying when colour tracking has failed, as the
adaptation can occasionally conform to background colours.
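To make the idea of an adaptive model concrete, here is a minimal sketch of a histogram-based variant. It is not the method of [1]: the exponential-forgetting update, the bin count, the learning rate `alpha` and the class name are all assumptions chosen for illustration. Hue is taken in radians in [0, 2*pi) and saturation in [0, 1].

```python
import numpy as np

class AdaptiveHSHistogram:
    """Normalised 2D hue-saturation histogram with exponential forgetting (hypothetical)."""

    def __init__(self, bins=32, alpha=0.1):
        self.bins = bins
        self.alpha = alpha  # assumed learning rate; higher adapts faster
        # Start from a uniform distribution over all bins.
        self.hist = np.full((bins, bins), 1.0 / (bins * bins))

    def update(self, hue, sat):
        """Blend in the histogram of the pixels currently tracked as skin."""
        counts, _, _ = np.histogram2d(
            np.ravel(hue), np.ravel(sat), bins=self.bins,
            range=[[0.0, 2.0 * np.pi], [0.0, 1.0]])
        total = counts.sum()
        if total > 0:
            self.hist = (1.0 - self.alpha) * self.hist + self.alpha * counts / total

    def probability(self, hue, sat):
        """Look up the model probability mass of the bin containing each pixel."""
        h_idx = np.clip((np.asarray(hue) / (2.0 * np.pi) * self.bins).astype(int),
                        0, self.bins - 1)
        s_idx = np.clip((np.asarray(sat) * self.bins).astype(int), 0, self.bins - 1)
        return self.hist[h_idx, s_idx]
```

Because each update uses whatever pixels are currently labelled as skin, a few misclassified frames can pull the histogram towards background colours, which is exactly the failure mode described above.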
To conclude, in a general setting skin colour alone will not be
sufficiently reliable to specifically identify human subjects in a
scene likely to contain skin-look-alike background. Therefore this
useful and computationally inexpensive visual cue ought to be
combined with other sources of information such as shape,
appearance and motion in order to be truly effective.
Shaogang Gong
2001-05-18