Gaze direction determination

Jian-Gang Wang(a) and Eric Sung(b)

(a) Centre for Signal Processing,       (b) Division of Control and Instrumentation

School of Electrical and Electronic Engineering

Nanyang Technological University

Singapore 639798

Introduction

There are two components to the line-of-sight: the pose of the human head and the orientation of the eyes within their sockets (eye gaze). We found that domain knowledge of the human face is important and essential for determining both head pose and eye gaze using only minimal robust features and under real-time requirements. Here, the domain knowledge used is not merely facial features but also anatomical properties. For instance, we found that the eye gaze can be estimated using the normal to the iris contour, which makes an approximately fixed angle with the true gaze. Hence, we have developed two novel approaches, called the "two-circle" and "one-circle" algorithms respectively, for measuring eye gaze from a monocular image that zooms in on two eyes or only one eye of a person. In addition, we make the observation that the eye-lines (connecting the two far eye corners and the two near eye corners respectively) are parallel to the mouth-line (connecting the two mouth corners). This domain knowledge led us to develop a new method for determining head pose quickly and robustly using the vanishing point formed by the eye-lines and the mouth-line [2].

The perspective projection of a circle onto the image plane is an ellipse. We observe that the iris contour (not the iris itself) is a circle in 3D. The gaze, defined as the normal to this iris circle, can be estimated from the ellipse/circle correspondence. However, this yields two possible solutions for the normal. Hence, we propose the "two-circle" [3] and "one-circle" [4] algorithms to disambiguate the solutions. In the "two-circle" method, prior knowledge of the eye model is used for disambiguation, namely that the difference between the two normal directions to the supporting planes of the two irises should be minimal, irrespective of eyeball rotations and head movements. We call this the "normal direction constraint". In the "one-circle" method, the unique supporting plane is obtained from a geometric constraint, namely that the distances between the eyeball's center and the two eye corners should be equal to each other. We refer to this as the "distance constraint".

The actual gaze direction is characterized by kappa, the angle between the visual axis and the anatomical axis of the eye [5]. Because our defined gaze direction bears an almost fixed relation to kappa, it does not really matter which one is used. For practical purposes, however, our definition of eye gaze will be the one adopted.

In order to obtain a higher-resolution image of the iris, a zoom-in "gaze" camera is used. It provides sufficient resolution to measure the rotation of the eyeball accurately. A general approach that combines head pose determination with eye gaze estimation is proposed. The problem of possible out-of-field views is settled by guiding the gaze camera, mounted on a pan-tilt unit, with the head pose estimation results. The pose of the human head, including the 3D locations of the eye corners and mouth corners and the orientation of the face, can be obtained from a second "pose" camera [2]. The 3D coordinates of the eye corners (with respect to the pose camera) tell the gaze camera system how to focus on the eye region; they are also used to calculate the distances from the eyeball's center to the eye corners (via the "distance constraint", see below).

Positioning of the circle

From the observed perspective projection of a circle of known radius, it is possible to infer analytically the supporting plane of the circle, as well as the center of the circle [6]. The problem has been extensively investigated, and many papers concentrate on the 3D location of circular objects [7, 8, 9].

Let the ellipse Q (a nonzero real symmetric matrix in the quadratic-form representation) represent the projection of a circle of radius r in the normalized camera coordinate system (see Figure 1). Let λ1, λ2 and λ3 be the eigenvalues of Q (λ3 < 0 < λ1 ≤ λ2), and let u1, u2 and u3 be the corresponding orthonormal unit eigenvectors. The unit normal to the supporting plane is given by [7]:

n = u2·sinθ + u3·cosθ                                                                    (1)

where

sinθ = ±√[(λ2 − λ1)/(λ2 − λ3)]                                                           (2)

cosθ = √[(λ1 − λ3)/(λ2 − λ3)]                                                            (3)

The distance of the supporting plane from the camera center is

d = λ1·r/√(−λ2·λ3)                                                                       (4)

The centers of the iris contours in space can then be determined using Eqs. (1)-(4) and the radius r.
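The eigendecomposition step above can be sketched numerically. The following is a minimal illustration (not the authors' implementation), assuming Q is given as a 3x3 real symmetric matrix in normalized camera coordinates; it returns the two candidate unit normals from the sign ambiguity of sinθ, together with the supporting-plane distance for a circle of radius r:

```python
import numpy as np

def circle_pose_from_ellipse(Q, r):
    """Candidate supporting-plane normals and plane distance for a circle
    of radius r whose perspective projection is the conic Q (3x3 real
    symmetric matrix, normalized camera coordinates)."""
    lam, U = np.linalg.eigh(Q)               # ascending: lam[0] <= lam[1] <= lam[2]
    l3, l1, l2 = lam                         # convention: l3 < 0 < l1 <= l2
    u3, u2 = U[:, 0], U[:, 2]                # eigenvectors for l3 and l2
    sin_t = np.sqrt((l2 - l1) / (l2 - l3))   # Eq. (2), up to sign
    cos_t = np.sqrt((l1 - l3) / (l2 - l3))   # Eq. (3)
    # Eq. (1): the sign ambiguity of sin(theta) yields the two candidates
    normals = [s * sin_t * u2 + cos_t * u3 for s in (+1.0, -1.0)]
    d = l1 * r / np.sqrt(-l2 * l3)           # Eq. (4): plane distance
    return normals, d
```

For a fronto-parallel circle (λ1 = λ2) the two candidates collapse into one, matching the single-interpretation case discussed below.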

Note that the signs of the eigenvectors u1, u2 and u3 are arbitrary. Since n and -n indicate the same surface orientation, the number of 3-D interpretations is as follows:

1.      If λ1 ≠ λ2, two interpretations exist.

2.      If λ1 = λ2, only one interpretation exists.

The second case, in which only one interpretation exists, happens only when the optical axis is parallel to the normal of the circle's supporting plane and passes through the center of the circle (the image of the circle is then itself a circle).

Eye gaze determination

In our approach, the iris contour is modeled as a circle of known radius. Hence, the eye gaze, the normal to the supporting plane of the iris, can be determined from the image of the iris contour. To resolve the ambiguity of the normal discussed in the previous section, we developed the "two-circle" and "one-circle" methods.

"Two-circle" method

We apply the principle that, from the image of two space circles lying on parallel planes, the resulting two ellipses can be used to deduce the unique normal to these planes. Each ellipse gives rise to two possible space circles. It can be proved that one of the normals will be common to the two sets of solutions while the two remaining ones are spurious. This principle is captured by the following proposition. (A circle in 3D together with the viewpoint defines a cone.)

Proposition.  Let Q1 and Q2 be the projections of two circles having the same radius and lying on parallel planes. The re-projection of each image ellipse admits two solutions for the normal to its supporting plane, giving two solutions for Q1 and two for Q2. One solution is common to Q1 and Q2, since by hypothesis the circles lie on parallel planes; the other two solutions will not coincide unless (1) the two circles are symmetrical about the Y-Z or X-Z plane of the camera coordinate system (Figure 2(a) and Figure 2(b) respectively), or (2) the axes of the two cones coincide (Figure 2(c)).

An alternative proof of the proposition can be found in [5].

In general, the eye gazes of the two eyes meet at a point of interest, i.e. the focus point (see Figure 3). Applying the proposition to our gaze determination application, we can disambiguate the gaze results based on the fact that the left and right iris supporting planes are reasonably parallel in 3D when the focus point is at infinity. In practice, though, the two "correct" normals may not be exactly equal to each other because of errors and noise. Hence, we disambiguate by treating the two normals, one from each set, that are closest to each other as the correct match. The difference between the normals to the supporting planes of the two irises should be minimal irrespective of eyeball rotations and head movement; we refer to this as the "normal direction constraint".

When the viewing distance is large (1 m, for example), the gazes of the two eyes are nearly parallel because the distance between the two eyes is negligible compared with the focus distance; the difference between the gazes of the two eyes is then far smaller than the difference between the spurious normal solutions, so the "normal direction constraint" holds. There is a noticeable angle between the gazes of the two eyes when the distance is small (0.5 m, for example). However, the constraint is still justified because we do not expect an abrupt change in the differences being compared. In our experiments, we found that the difference between the spurious normals is at least three times the difference between the gazes of the two eyes, even at a near focus distance of 0.5 m.
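Operationally, the "normal direction constraint" amounts to picking, from the two candidate normals of each iris, the pair with the smallest angular difference. A minimal sketch (a hypothetical helper, not the authors' code); since n and -n describe the same orientation, the comparison uses the absolute dot product:

```python
import numpy as np
from itertools import product

def pick_parallel_normals(cands_left, cands_right):
    """Return the pair (one candidate normal per iris) with the smallest
    angular difference, treating n and -n as the same orientation."""
    def angle(n1, n2):
        return np.arccos(np.clip(abs(np.dot(n1, n2)), 0.0, 1.0))
    return min(product(cands_left, cands_right), key=lambda p: angle(*p))
```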

"One-circle" method

We extend our investigation with the "one-circle" algorithm, discussed here, which relaxes the requirement that two irises be visible to disambiguate the normal solutions. Consequently, the field of view of the camera can be narrowed further to only one eye. With the improved iris resolution, higher precision and robustness can be expected, and were indeed achieved. The robustness of this approach was statistically verified by extensive experiments on synthetic and real image data.

The distances between the two corners of an eye and the center of the eyeball should be equal to each other (see Figure 4):

|OsP1| = |OsP3|                                                                          (5)

Consider an iris contour Q. Call the two solutions for the normal of Q n1 = (cosα1, cosβ1, cosγ1)T and n2 = (cosα2, cosβ2, cosγ2)T, and the corresponding solutions for the center of the iris contour Oc1(x01, y01, z01) and Oc2(x02, y02, z02) respectively. Using the eye model defined in Figure 5, the center of the eyeball Osi can be calculated as:

xsi = x0i + d·cosαi,     ysi = y0i + d·cosβi,     zsi = z0i + d·cosγi,                   (6)

where d is the distance from the center of the eyeball to the iris plane; we define it as the radius of the eyeball.

Although the eyeball center cannot be seen, its location can be inferred, because its average 3D location relative to the observed features is very close to a generic constant and can be fixed during model acquisition [10]. In our approach, the ratio of the radius of a person's iris to the radius of his/her eyeball in 3D space is found to have very low ensemble variance, so we can fix this ratio at its generic average. The small person-to-person variation of the ratio thus has no significant effect on the results. Hence, the eyeball center can be located once the radius of the iris has been calibrated:

d = c·r                                                                                  (7)

where r is the radius of the iris contour and c is a constant (see Figure 5).

After that, the solutions for the two eye corners are transformed into the gaze camera coordinate system, and the distances between the center of the eyeball and the two eye corners are compared. Because of image noise, the unique solution for the iris plane should be the one that satisfies:

|OsP1| ≈ |OsP3|                                                                          (8)

In our algorithm, we calculate |Os1P1|, |Os1P3|, |Os2P1| and |Os2P3|. If

| |Os1P1| − |Os1P3| |  <  | |Os2P1| − |Os2P3| |                                          (9)

then (n1, Oc1) is the solution we want; otherwise (n2, Oc2) is the solution.
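The selection rule of Eqs. (5)-(9) can be sketched as follows. This is a simplified illustration, not the authors' implementation; it assumes the candidate normals, the iris centers and the eye corners P1 and P3 are already expressed in the same coordinate frame, and follows the sign convention of Eq. (6) for placing the eyeball center:

```python
import numpy as np

def pick_iris_plane(normals, centers, p1, p3, c, r):
    """Apply the 'distance constraint': for each candidate (n_i, Oc_i),
    place the eyeball center Os_i = Oc_i + d * n_i with d = c * r
    (Eqs. 6-7) and keep the candidate for which |Os P1| and |Os P3|
    are most nearly equal (Eqs. 8-9)."""
    d = c * r
    def imbalance(n, oc):
        os_ = np.asarray(oc) + d * np.asarray(n)
        return abs(np.linalg.norm(os_ - p1) - np.linalg.norm(os_ - p3))
    scores = [imbalance(n, oc) for n, oc in zip(normals, centers)]
    best = int(np.argmin(scores))
    return normals[best], centers[best]
```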

In our experiments, we found the algorithm based on the "distance constraint" to be robust: the same unique solution is obtained for different poses even when the ratio c is varied by 50%. This is because the difference between the two gaze solutions is significant: the angle between the two normal solutions is large, so the separation between the two resulting eyeball centers is large enough to disambiguate the solutions even with a 50% deviation from the assumed ratio.

Comparison with the previous work

It is difficult to determine the eye gaze by analyzing eyeball rotation in a typical image with low resolution in the eye region [11, 12]. The iris is partially occluded by the upper and lower eyelids, so it is difficult to fit its contour consistently and reliably. For example, in [13] the field of view of the camera is set to capture the whole face in the image; the width of an eye is then only about 30 pixels and the radius of the iris in the image plane only about 5 pixels in a typical situation. It is therefore hard to determine the gaze in a 3D scene from the iris information in such an image. Our method combines head pose and eye gaze: the gaze camera focuses on the eye, guided by the head pose determination result, so we can determine the gaze from the image of the irises.

In most approaches to extracting the iris contour [14, 15, 16, 17, 13], the iris contours on the image plane are simplified to circles, so the convenient circular geometry is exploited and the iris outer boundaries (limbus) are detected using a circular edge operator. For instance, the center of the iris is detected using the circular Hough transform in [17]. In [13], the iris is located by matching the left and right curvatures of the iris (circle) candidate with those of the iris to be detected in the edge image. Our method is more realistic than the existing approaches, since we treat the contour of the iris as a circle in 3D, whose perspective projection is an ellipse.
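For completeness, fitting a general conic (rather than a circle) to limbus edge points can be done by linear least squares on the conic coefficients. This is a generic sketch of that idea, not the iris detector used in our system:

```python
import numpy as np

def fit_conic(xs, ys):
    """Least-squares fit of a general conic
    a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 to edge points,
    returned as the 3x3 symmetric matrix Q with [x, y, 1] Q [x, y, 1]^T = 0."""
    A = np.column_stack([xs * xs, xs * ys, ys * ys, xs, ys, np.ones_like(xs)])
    _, _, Vt = np.linalg.svd(A)        # null vector = smallest right singular vector
    a, b, c, d, e, f = Vt[-1]
    return np.array([[a, b / 2, d / 2],
                     [b / 2, c, e / 2],
                     [d / 2, e / 2, f]])
```

The resulting matrix Q can then be fed directly to the eigendecomposition of the previous section.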

Zelinsky et al. [17, 10] presented an eye gaze estimation method in which the eye corners are located using a stereo vision system. The eyeball position is then calculated from the pose of the head and a 3D "offset" vector from the mid-point of the corners of an eye to the center of the eye; the radius of the eyeball can then be obtained. However, the offset vector, in addition to the radius of the iris, needs to be adjusted manually through a training sequence in which the gaze point of the person is known.

Our algorithm makes minimal assumptions about the eye model: we do not need to know the shape of the eyeball, and only the visible iris edges of the human eye are used. The eye gaze we define passes through the pupil's center and the eyeball's center.

In summary, our method differs from others in the following respects. We treat the image of the iris contour not as a circle but, correctly, as an ellipse; hence our approach is more realistic than the existing ones. Our method is also more accurate because it can zoom in on one eye, giving a larger iris image for detection.

Experimental results

We have comprehensively investigated the performance of the proposed eye gaze determination approach.

An experiment on gaze determination using the "two-circle" method is illustrated in Figure 6. In this example, the camera is placed in front of a person (nearly fronto-parallel), about 1 m away. The person observes an object (a Philips monitor, 40 cm × 29 cm) behind and to the left of the camera; the distance between the object and the person is about 1.2 m. We put the object of interest in a fixed position in the world system in which the gaze camera is calibrated, so the coordinates of the corners of the monitor with respect to the camera are known; we take these as the reference coordinates of the true focus points. Four images are captured, corresponding to the person observing the four corners of the monitor in order. The gaze, and consequently the corners of the monitor (the focus points), can be obtained from the images. An error can then be estimated by comparing the estimated focus points with the corresponding reference coordinates. The gaze results corresponding to the subject observing the left upper, right upper, left down and right down corners of the monitor are shown in Figures 7(a) to 7(d). The radius of the iris contour is about 0.63 cm. The results and errors are listed in Table 1.

We cannot expect the gazes of the two eyes to intersect exactly at a point in space, because of inaccurate camera calibration and image processing. The shortest distances between the gazes of the two eyes, used to locate the "estimated focus points", are found to be quite small: in this example, they are 0.30, 0.43, 0.01 and 0.11 cm respectively as the person observes the corners in order. This also shows that the gaze defined in this paper is suitable for representing human attention.
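The "estimated focus point" is the midpoint of the shortest segment between the two gaze rays. This can be computed in closed form; the sketch below (an illustrative helper, not the authors' code) assumes each gaze is given as a ray origin (the iris center) and a direction, and that the two rays are not parallel:

```python
import numpy as np

def focus_point(o1, g1, o2, g2):
    """Midpoint of the common perpendicular between the two gaze rays
    o_i + t * g_i, plus the gap between them (the 'shortest distance')."""
    g1 = g1 / np.linalg.norm(g1)
    g2 = g2 / np.linalg.norm(g2)
    w = o1 - o2
    b = g1 @ g2
    d, e = g1 @ w, g2 @ w
    denom = 1.0 - b * b                 # zero only for parallel gazes
    t1 = (b * e - d) / denom            # parameter of closest point on ray 1
    t2 = (e - b * d) / denom            # parameter of closest point on ray 2
    p1, p2 = o1 + t1 * g1, o2 + t2 * g2
    return (p1 + p2) / 2.0, np.linalg.norm(p1 - p2)
```

The returned gap is the quantity reported above (0.30, 0.43, 0.01 and 0.11 cm) and gives a useful sanity check on calibration quality.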

Table 1 "two-circle" method: Results of the gaze determination and focus points estimation

Observed point: left upper
    Left eye:  C: (-3.857074, 0.074951, 74.569134)   G: (-0.573778, 0.081283, 0.814968)
    Right eye: C: (2.659461, 0.081426, 74.714108)    G: (-0.540752, 0.084195, 0.836958)
    E: (76.332527, -11.337215, -39.321533)   R: (80, -14.5, -45)
    LE: 1.000   RE: 1.010

Observed point: right upper
    Left eye:  C: (-3.612313, 0.102581, 74.705642)   G: (0.339735, 0.099605, 0.935232)
    Right eye: C: (2.923148, 0.106827, 74.854271)    G: (-0.292030, 0.104673, 0.950664)
    E: (38.745293, -12.524923, -41.832115)   R: (40, -14.5, -45)
    LE: 0.750   RE: 0.690

Observed point: left down
    Left eye:  C: (-3.856879, -0.096348, 74.567535)  G: (-0.573595, -0.082166, 0.815008)
    Right eye: C: (2.659401, -0.102780, 74.712286)   G: (-0.540747, -0.084421, 0.836939)
    E: (76.769348, 11.460166, -39.991791)    R: (80, 14.5, -45)
    LE: 1.020   RE: 1.020

Observed point: right down
    Left eye:  C: (-3.612607, -0.123945, 74.712240)  G: (-0.339758, -0.099915, 0.935191)
    Right eye: C: (2.923129, -0.128169, 74.856711)   G: (-0.292573, -0.103917, 0.950580)
    E: (39.290848, 12.641216, -43.343605)    R: (40, 14.5, -45)
    LE: 0.730   RE: 0.640

Note: C: center of the iris contour (cm); G: gaze direction (unit vector); E: estimated focus point (cm); R: reference focus point (cm); LE: gaze error of the left eye (degrees); RE: gaze error of the right eye (degrees).

For the "one-circle" approach, an experiment is shown in Figure 8. The gaze determination results when a subject looks at four points on a monitor are shown in Figure 9. The results and errors are listed in Table 2.

We can see that the errors of the point-of-regard are less than 1.5 cm within a 1.5 m range, and that the errors of the gaze are consequently all less than 1°.

 

Table 2 "One-circle" method: Errors of the point-of-regard and the eye gaze   

Observed point   True coordinates (cm)   Estimated coordinates (cm)   Position error (cm)   Gaze error (deg)
V1               (60, 15, -90)           (61.056, 14.992, -90)        1.056                 0.40
V2               (20, 15, -90)           (19.277, 16.290, -90)        1.479                 0.56
V3               (-20, 15, -90)          (-19.249, 16.294, -90)       1.496                 0.57
V4               (-60, 15, -90)          (-61.020, 14.991, -90)       1.025                 0.39

The experimental results show that the precision of the eye gaze is improved significantly by the "one-circle" algorithm. Simulations of the "one-circle" method have shown that the maximum gaze error due to eyelid occlusion is 0.3°, while the maximum error of the center of the iris is 0.1 cm. With the "two-circle" algorithm, the maximum gaze error is 1° and the maximum error of the center of the iris is 0.4 cm.

Conclusion

A method for eye gaze determination from images of the iris(es) has been developed. The integration of the eye gaze subsystem with the head pose estimation module offers great potential, especially in applications such as human-computer interaction [1]. Importantly, our method is non-intrusive, fast and robust; it is robust because the iris contour is one of the simplest and most robust facial features to extract.

REFERENCES

[1] S. Barattelli, L. Sichelschmidt and G. Rickheit, Eye-movements as an input in human computer interaction: exploiting natural behavior, Annual Conference of IEEE Industrial Electronic Society, Vol. 4, pp. 2000-2005, 31 Aug – 4 Sept. 1998.

[2] J.G. Wang and E. Sung, Pose determination of human faces by using vanishing points, Pattern Recognition, Vol. 34, No. 12, pp. 2427-2445, 2001.

[3] J. G. Wang and E. Sung, Gaze determination via images of irises, Image and Vision Computing, Vol. 19, No. 12, pp. 891-911, 2001.

[4] J. G. Wang and E. Sung, Study on eye gaze estimation, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, to appear.

[5] J. G. Wang, Head pose and eye gaze determination for human machine interaction, PhD thesis, 2001, Nanyang Technological University.

[6] R. M. Haralick, L. G. Shapiro, Computer and Robot Vision, Chapter 13: Perspective projection geometry, Addison-Wesley Publishing Company, 1993.

[7] K.  Kanatani, Geometric Computation for Machine Vision, Chapter 8: Analysis of Conics, Clarendon Press, 1993.

[8] D. Forsyth, J. L. Mundy, A. Zisserman, C. Coelho, A. Heller and C. Rothwell, Invariant descriptors for 3-D object recognition and pose, IEEE Transactions on PAMI, Vol. 13, No.10, pp.971-991, October 1991.

[9] H. S. Sawhney, J. Oliensis and A. R. Hanson, Description and reconstruction from image trajectories of rotational motion, Proceedings of ICCV '90, pp. 494-498, 1990.

[10] R. Newman, Y. Matsumoto, S. Rougeaux and A. Zelinsky, Real-time stereo tracking for head pose and gaze estimation, Proc. IEEE Conference On Automatic Face and Gesture Recognition, pp.499-504, 2000.

[11] A. Gee and R. Cipolla, Determining the gaze of faces in images, Image and Vision Computing, Vol. 12, No. 10, pp. 639-647, December 1994.

[12] K. Talmi and J. Liu, Eye and gaze tracking for visually controlled interactive stereoscopic displays, Signal Processing: Image communication Vol.14, pp. 799-810, 1999.

[13] K.-N. Kim and R. S. Ramakrishna, Vision-based eye-gaze tracking for human computer interface, Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, 12-15, Oct. 1999, Vol. 2, pp.324-329.

[14] X. Xie, R. Sudhakar, and H. Azhang, On improving eye feature extraction using deformable templates, Pattern Recognition Vol. 17, pp.791-799, 1994.

[15] J-Y Deng and F. Lai, Region-based template deformation and masking for eye-feature extraction and description, Pattern Recognition, Vol. 30, No. 3, pp. 403-419, 1997.

[16] J. G. Daugman, High confidence visual recognition of persons by a test of statistical independence, IEEE Transactions on PAMI, Vol. 15, No. 11, pp.1148-1161. November 1993.

[17] Y. Matsumoto and A. Zelinsky, An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement, Proceedings of Fourth International Conference on Automatic Face and Gesture Recognition, 2000, pp. 499-504.

            

Fig1.gif
<Figure 1: perspective projection of a circle>

Fig2a.gif
<Figure 2(a): two circles are symmetrical about Y-Z plane of the camera coordinate system >

Fig2b.gif
<Figure 2(b): two circles are symmetrical about X-Z plane of the camera coordinate system >

Fig2c.gif
<Figure 2(c): the axes of the two cones coincide>

Fig3.gif
<Figure 3 the eye gazes of the two eyes meet at a point of interest>

Fig4.gif
<Figure 4 "distance constraint">

Fig5.gif
<Figure 5: location of the eyeball center can be inferred from the eye gaze>

Fig6.gif
<Figure 6 experiment setup for "two-circle" method >

Fig7a.gif
<Figure 7(a) eye gaze corresponding to the left upper corner of the monitor>

Fig7b.gif
<Figure 7(b) eye gaze corresponding to the right upper corner of the monitor>

Fig7c.gif
<Figure 7(c) eye gaze corresponding to the left down corner of the monitor>

Fig7d.gif
<Figure 7(d) eye gaze corresponding to the right down corner of the monitor>

Fig8.gif
<Figure 8 experiment setup for "one-circle" method>

Fig9a.gif
<Figure 9(a) eye gaze corresponding to the point V1>

Fig9b.gif
<Figure 9(b) eye gaze corresponding to the point V2>

Fig9c.gif
<Figure 9(c) eye gaze corresponding to the point V3>

Fig9d.gif
<Figure 9(d) eye gaze corresponding to the point V4>