I am a post-doc researcher in Bob Fisher’s group at the Institute for Action, Perception and Behaviour in the School of Informatics at the University of Edinburgh.
My research interests are in computer vision and machine perception, more specifically:
I am currently involved in the EU project TrimBot2020 coordinated by our group, which aims to develop a camera-driven automatic trimming robot, which can autonomously navigate outdoors. I am responsible for the evaluation workpackage, computer vision integration and participate in a wide range of technical tasks.
In our Assisted Living Lab in Edinburgh, I work on several machine perception projects with our doctoral and master students.
You can see my publications at Edinburgh Research Explorer or Google Scholar.
You can find more about my previous projects at my PhD research page.
Two-page CV can be downloaded here.
While pose estimation with visual SLAM can be highly accurate, it is not guaranteed to provide the smooth pose estimate that navigation algorithms expect, due to outliers in the measurement, noise in the pose estimate, etc. For this reason it has become common to include a filter that can use inertial sensors mounted on the vehicle and a motion model to constrain the estimated trajectory of the vehicle to be smooth.
We propose an extension to the Extended Kalman Filter framework, which can cope with irregularities of SLAM measurements without access to its internal characteristics. The particular techniques implemented for this purpose are outlier rejection based on estimation of measurement covariance from past measurements, penalization of lags and soft filter reset.
Covariance Estimation for Robust Visual-Inertial Odometry in Presence of Outliers and Lags
Submitted, 2019. Video. Code.
Refining raw disparity maps from different algorithms to exploit their complementary advantages is still challenging. Uncertainty estimation and complex disparity relationships among pixels limit the accuracy and robustness of existing methods and there is no standard method for fusion of different kinds of depth data. In this paper, we introduce a new method to fuse disparity maps from different sources, while incorporating supplementary information (intensity, gradient etc.) into a refiner network to better refine raw disparity inputs. A discriminator network classifies disparities at different receptive fields and scales. Assuming a Markov Random Field for the refined disparity map produces better estimates of the true disparity distribution. Both fully supervised and semi-supervised versions of the algorithm are proposed. The approach includes a more robust loss function to inpaint invalid disparity values and requires much less labeled data to train in the semi-supervised learning mode. The algorithm can be generalized to fuse depths from different kinds of depth sources.|
SDF-MAN: Semi-supervised Disparity Fusion with Multi-scale Adversarial Networks
Remote Sensing, February 2019. Video. Code.
Accurately registering point clouds from a cheap low-resolution sensor is a challenging task. Existing rigid registration methods
failed to use the physical 3D uncertainty distribution of each point from a real sensor in the dynamic alignment process. It is mainly because the uncertainty model for a point is static and invariant and it is hard to describe the change of these physical uncertainty models in different views. Additionally, the existing Gaussian mixture alignment architecture cannot efficiently implement these dynamic changes.
We propose a simple architecture combining error estimation from sample covariances and dynamic global probability
alignment using the convolution of uncertainty-based Gaussian Mixture Models (GMM). |
DUGMA: Dynamic Uncertainty-Based Gaussian Mixture Alignment
Proc. of 3D Vision, September 2018. Video. Code.
The advance of scene understanding methods based on machine learning relies on the availability of large ground truth datasets, which are essential for their training and evaluation. Construction of such datasets with imagery from real sensor data however typically requires much manual annotation of semantic regions in the data, delivered by substantial human labour. To speed up this process, we propose a framework for semantic annotation of scenes captured by moving camera(s), e.g., mounted on a vehicle or robot. It makes use of an available 3D model of the traversed scene to project segmented 3D objects into each camera frame to obtain an initial annotation of the associated 2D image, which is followed by manual refinement by the user. The refined annotation can be transferred to the next consecutive frame using optical flow estimation.
Consistent Semantic Annotation of Outdoor Datasets via 2D/3D Label Transfer
Sensors, July 2018. Video. Code.
We have developed a working pipeline which integrates hardware and software in an automated robotic rosecutter; to the best of our knowledge, the first robot able toprune rose bushes in a natural environment. Unlike similar approaches like tree stem cutting, the proposed method does not require to scan the full plant, have multiple cameras aroundthe bush, or assume that a stem does not move. It relies on asingle stereo camera mounted on the end-effector of the robotand real-time visual servoing to navigate to the desired cuttinglocation on the stem.
Real-time Stereo Visual Servoing for Rose Pruning with Robotic Arm
Proc. of ICRA, 2020. Video.
Visual servoing is a well-known task in robotics. However, there are still challenges when multiple visual sources are combined to accurately guide the robot or occlusions appear. We present a novel visual servoing approach using hybrid multi-camera input data to lead a robot arm accurately to dynamically moving target points in the presence of partial occlusions. The approach uses four RGBD sensors as Eye-to-Hand (EtoH) visual input, and an arm-mounted stereo camera as Eye-in-Hand (EinH). A Master supervisor task selects between using the EtoH or the EinH, depending on the distance between the robot and target.
Hybrid Multi-camera Visual Servoing to Moving Target
Proc. of IROS, September 2018. Video.
The problem of finding a next best viewpoint for 3D modeling or scene mapping has been explored in computer vision over the last decade. We propose a method for dynamic next best viewpoint recovery of a target point while avoiding possible occlusions. Since the environment can change, the method has to iteratively find the next best view with a global understanding of the free and occupied parts. We model the problem as a set of possible viewpoints which correspond to the centers of the facets of a virtual tessellated hemisphere covering the scene. Taking into account occlusions, distances between current and future viewpoints, quality of the viewpoint and joint constraints (robot arm joint distances or limits), we evaluate the next best viewpoint.
Best Viewpoint Tracking for Camera Mounted on Robotic Arm with Dynamic Obstacles
Proc. of 3D Vision, October 2017. Presentation. Video.