Vision as Inverse Graphics

We have a PhD scholarship funded by Microsoft Research on the topic of Vision as Inverse Graphics. This studentship was awarded to Lukasz Romaszko, who started work in September 2015. The Microsoft co-supervisor was Dr Pushmeet Kohli (until summer 2017) and is now Dr John Winn.

The project

A long-standing view of computer vision is that it is the inverse of a computer graphics problem. That is, the goal of computer vision is to infer the objects present in a scene, their positions and poses, the illuminant, etc. In the language of machine learning, the object identities, poses, illuminant, etc. are latent variables which must be inferred in order to understand the scene.

In this project we will develop a stochastic scene generator, and render these scenes to produce images; we will then train recognition models to infer the relevant latent variables. These can be dense fields (intrinsic images) such as a depth map or segmentation map, or sparse information, e.g. concerning the presence of a certain object class. This generalizes the work of Shotton et al. (2011) on the Microsoft Kinect, where the scene consists of a single human plus background. The great advantage of using synthetic data is that there is ready access to the relevant latent variables, and that large quantities of data can be easily generated for training the recognition models. We will also study the "structured noise" process that relates graphics to real images, so as to improve the transfer of the learned models to real images.
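As a rough illustration of the generate, render and train loop described above, the toy sketch below samples random scenes, renders them, and fits a recognition model on the synthetic pairs. It is not the project's code: the single-disc scene, the function names (sample_scene, render, make_dataset, fit_ridge), the 16x16 image size, the Gaussian pixel noise and the ridge-regression recognition model are all assumptions chosen to keep the example small.

```python
import numpy as np

SIZE = 16  # image side length in pixels (arbitrary choice for this toy)

def sample_scene(rng):
    """Stochastic scene generator: one disc with random latent variables."""
    return {
        "cx": rng.uniform(5, 11),          # disc centre, x
        "cy": rng.uniform(5, 11),          # disc centre, y
        "radius": rng.uniform(2, 4),
        "brightness": rng.uniform(0.5, 1.0),
    }

def render(scene):
    """Toy renderer: returns the image and its ground-truth segmentation mask."""
    ys, xs = np.mgrid[0:SIZE, 0:SIZE]
    mask = (xs - scene["cx"]) ** 2 + (ys - scene["cy"]) ** 2 <= scene["radius"] ** 2
    image = np.where(mask, scene["brightness"], 0.1)  # 0.1 = background level
    return image, mask

def make_dataset(n, rng, noise_std=0.05):
    """Generate n (image, mask, centre) triples; latents come for free."""
    images, masks, centres = [], [], []
    for _ in range(n):
        scene = sample_scene(rng)
        image, mask = render(scene)
        # Plain Gaussian noise, a crude stand-in for real image noise.
        image = image + rng.normal(0.0, noise_std, image.shape)
        images.append(image.ravel())
        masks.append(mask)
        centres.append([scene["cx"], scene["cy"]])
    return np.array(images), np.array(masks), np.array(centres)

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def fit_ridge(X, z, lam=1.0):
    """Recognition model: ridge regression from pixels to latent variables."""
    Xb = add_bias(X)  # the bias weight is also penalised, for simplicity
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ z)

rng = np.random.default_rng(0)
X_train, masks_train, z_train = make_dataset(2000, rng)
X_test, masks_test, z_test = make_dataset(200, rng)

W = fit_ridge(X_train, z_train)
z_pred = add_bias(X_test) @ W

# Compare against always predicting the mean training centre.
rmse = np.sqrt(np.mean((z_pred - z_test) ** 2))
baseline = np.sqrt(np.mean((z_train.mean(axis=0) - z_test) ** 2))
print(f"recognition model RMSE: {rmse:.2f} px, mean baseline: {baseline:.2f} px")
```

Here the disc centre plays the role of a sparse latent variable and the mask that of a dense one (the mask is generated but not used for training in this sketch). A realistic system would replace the disc renderer with a graphics engine and the linear model with a much richer recognition model.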

Related publications

Learning Direct Optimization for Scene Understanding pdf
Lukasz Romaszko, Christopher K. I. Williams, John Winn. Final m/s version of paper published in Pattern Recognition vol 105, 107369, https://doi.org/10.1016/j.patcog.2020.107369. Initial version posted on arXiv 18 Dec 2018.

Vision-as-Inverse-Graphics: Obtaining a Rich 3D Explanation of a Scene from a Single Image pdf
Lukasz Romaszko, Christopher K.I. Williams, Pol Moreno, Pushmeet Kohli. ICCV 2017 Geometry Meets Deep Learning workshop, October 2017 (oral presentation). supplementary material.

Overcoming Occlusion with Inverse Graphics pdf
Pol Moreno, Christopher K.I. Williams, Charlie Nash and Pushmeet Kohli. Presented at: Geometry Meets Deep Learning workshop, ECCV 2016 (oral presentation). Final m/s version of paper appearing in Computer Vision-ECCV 2016 Workshops Proceedings Part III, eds. G. Hua and H. Jégou, Springer LNCS 9915 pp 170-185.
Code is available.
Chris Williams