
Introduction

The ability to interpret hand gestures is essential if computer systems are to interact with human users in a natural way. In this paper, we present a new vision-based framework that allows a computer to interact with users through hand signs. Our experimental setup is as follows: a video camera is mounted on top of the computer, and the user faces the camera while performing hand signs. We assume an indoor environment with fixed lighting.
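
As a concrete illustration of this kind of capture setup, the following is a minimal sketch of a frame-acquisition loop, not the system described in this paper; OpenCV and camera device index 0 are our assumptions:

```python
import cv2  # OpenCV; an assumed dependency, not part of the original system

# Open the camera mounted on top of the computer (device index 0 is an assumption).
capture = cv2.VideoCapture(0)

while True:
    ok, frame = capture.read()  # grab one BGR frame of the signing user
    if not ok:
        break
    # Under fixed indoor lighting, a static exposure suffices; downstream
    # stages (hand localization, sign recognition) would consume `frame` here.
    cv2.imshow("hand-sign input", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop capturing
        break

capture.release()
cv2.destroyAllWindows()
```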

American Sign Language (ASL), whose first known dictionary was printed in 1856 [8], is widely used in the deaf community as well as by hearing people with other disabilities [5]. General hand sign interpretation requires a broad range of contextual information, general knowledge, cultural background, and linguistic capability, all of which are beyond the current state of the art. In our current research, we therefore select twenty-eight different signs from [6] for experiments, as shown in Fig. 1. These hand signs have the following characteristics: 1) they represent a wide variation of hand shapes; 2) they include a wide variation of motion patterns; 3) they are performed with one hand; 4) they can be recognized without using contextual information. Gestures that require the hand to act in a certain environment or to point to a specific object are excluded.

In the linguistic description of ASL, Stokoe used a structural linguistic framework to analyze sign formation [40]. He defined three ``aspects'' that are combined simultaneously in the formation of a particular sign - what acts, where it acts, and the act itself. These three aspects translate into the building blocks that linguists describe: the hand shape, the location, and the movement. Our framework has two major components that address these three building blocks. A prediction-and-verification scheme locates hands against complex backgrounds, and a spatiotemporal recognition component combines motion understanding (movement) with spatial recognition (hand shape) in a unified framework.
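
To make the structure of this two-component framework concrete, the sketch below shows how a prediction-and-verification hand locator could feed a spatiotemporal recognizer. This is our illustration of the general pattern, not the paper's implementation; every function here is a hypothetical placeholder:

```python
# Structural sketch only: the three core functions are hypothetical stand-ins
# for the components described above.

def predict_candidates(frame):
    """Propose likely hand regions, e.g. from skin color or motion cues."""
    raise NotImplementedError

def verify(frame, region):
    """Score how well a candidate region matches learned hand appearance."""
    raise NotImplementedError

def classify(shape_sequence, trajectory):
    """Map the joint shape/movement sequence to one of the 28 signs."""
    raise NotImplementedError

def crop(frame, region):
    x, y, w, h = region
    return frame[y:y + h, x:x + w]  # assumes frames are array-like images

def center(region):
    x, y, w, h = region
    return (x + w / 2.0, y + h / 2.0)

def interpret_sign(frames):
    shapes, trajectory = [], []
    for frame in frames:
        # Prediction: generate candidate hand locations against the
        # complex background.
        candidates = predict_candidates(frame)
        # Verification: keep the best-scoring hypothesis.
        best = max(candidates, key=lambda r: verify(frame, r))
        shapes.append(crop(frame, best))    # spatial cue: hand shape
        trajectory.append(center(best))     # temporal cue: movement
    # Spatiotemporal recognition: shape and motion enter one classifier.
    return classify(shapes, trajectory)
```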

  
Figure 1: The twenty-eight different signs used in the experiment. (1) sign of ``angry''; (2) ``any''; (3) ``boy''; (4) ``yes''; (5) ``cute''; (6) ``fine''; (7) ``funny''; (8) ``girl''; (9) ``happy''; (10) ``hi''; (11) ``high''; (12) ``hot''; (13) ``later''; (14) ``low''; (15) ``no''; (16) ``nothing''; (17) ``of course''; (18) ``ok''; (19) ``parent''; (20) ``pepper''; (21) ``smart''; (22) ``sour''; (23) ``strange''; (24) ``sweet''; (25) ``thank you''; (26) ``thirsty''; (27) ``welcome''; (28) ``wrong'' (Bornstein and Saulnier 1989).

