|Date||May 30, 2014|
|Title||Training object class detectors from eye tracking data|
A central task in Computer Vision is detecting object classes such as cars and horses in complex scenes. Training an object class detector typically requires a large set of images annotated with bounding-boxes, which is expensive and time-consuming to create. In this talk I will present a novel approach to annotate object locations which can substantially reduce annotation time. We first track the eye movements of annotators instructed to find the object and then propose a technique for deriving object bounding-boxes from these fixations. To validate this idea, we collected eye tracking data for 10 object classes of the Pascal VOC 2012 benchmark (6270 images, 5 observers). Our technique correctly produces bounding-boxes in 47% of the images, while reducing the total annotation time by about 5x, compared to drawing bounding-boxes. Any standard object class detector can be trained on the bounding-boxes predicted by our model.
Vittorio Ferrari is a Reader at the School of Informatics of the University of Edinburgh, which he joined in December 2011. He leads the CALVIN research group on visual learning. He received his PhD from ETH Zurich in 2004 and was a post-doctoral researcher at INRIA Grenoble in 2006-2007 and at the University of Oxford in 2007-2008. Between 2008 and 2012 he was Assistant Professor at ETH Zurich, funded by a Swiss National Science Foundation Professorship grant. In 2012 he received the prestigious ERC Starting Grant, and the best paper award from the European Conference on Computer Vision for his work on large-scale image auto-annotation. He is the author of over 60 technical publications, most of them in the highest ranked conferences and journals in computer vision and machine learning. He regularly serves as an Area Chair for the major vision conferences and he is an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. His current research interests are in weakly supervised learning of object classes, semantic segmentation, and large-scale auto-annotation.