Maritime Scene Segmentation

Petr Voles

pvoles@bournemouth.ac.uk

Dr. Martin Teal

mteal@bournemouth.ac.uk

Machine Vision Group, School of DEC, Bournemouth University

Intro

Maritime vessels today face the threat of piracy. Piracy is usually associated with old swashbuckling films, and consequently we do not think of it as a modern problem; however, several incidents of piracy happen each day, particularly in the Malacca Straits and the South China Sea. Here, fast RIB craft (Rigid Inflatable Boats) approach the stern of a large cargo ship, even a super-tanker, and the pirates scale the ship using simple rope ladders. The small number of crew that these ships have on duty means that pirate detection needs to be automated. Current radar systems are of limited use in these situations: RIB craft are small and almost non-metallic, so they give poor radar returns and are difficult for radar systems to detect.

In our research group, we are developing an automated system that overcomes the difficulties of detecting small maritime objects.

The maritime scene has been found to be extremely complex to analyse [1], [2]: it produces a large number of motion cues, which makes identification and tracking in the visual environment difficult. The system being developed here concentrates on the task of extracting maritime vessels and other static nautical objects (buoys, mooring buoys, piers, etc.) from the sea to aid the recognition and tracking process. To accomplish this task, three integrated algorithms have been developed, namely variable size image window analysis, statistical analysis by reclustering, and a region segmentor.

Figure 1. Typical maritime scene

Variable size image window analysis

The task of this algorithm is to reduce the amount of data to be processed by characterising larger parts of the image with a small number of characteristic features. This is done by dividing the image into a set of segments and then substituting each segment with a set of numbers that characterise it.

Segmentation is widely used in digital image processing (DIP) algorithms. The most usual segmentation is done by applying a rectangular grid to the image, with each rectangle representing one segment. However, when a maritime scene is considered, the situation is somewhat different.

Because maritime scenes are outdoor scenes that extend over a considerable distance from the camera, perspective must be accounted for. Put simply, the sea is a horizontal plane projected onto a vertical image plane. Objects that are close to the camera are projected near the bottom of the image and appear larger than objects further away from the camera. This is the basic idea behind variable size segmentation: the size of the segments in the grid is not fixed. The smallest segments cover the area of the image near the shoreline, and their size increases towards the bottom edge of the image.

The segments also overlap. This overlapping increases the ability of the algorithm to pick up smaller objects that might otherwise be only partially covered by a segment.
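As a minimal sketch of the variable size, overlapping grid described above, the Python function below generates square segments whose side grows linearly from `min_size` at a chosen top row (for instance the shoreline) to `max_size` at the bottom of the image, with neighbouring segments overlapping by the fraction `overlap`. The linear growth law, the square shape, the 50% default overlap and the name `variable_grid` are assumptions made for illustration, not the parameters of the system described here.

```python
def variable_grid(width, height, top, min_size, max_size, overlap=0.5):
    """Return a list of (x, y, size) square segments, row by row from `top`.

    Assumed growth law: the segment side grows linearly from `min_size`
    at row `top` towards `max_size` at the bottom edge of the image.
    Neighbouring segments overlap by the fraction `overlap` both
    horizontally and vertically.
    """
    segments = []
    y = top
    while True:
        # Fraction of the way from the top row down to the bottom edge.
        t = (y - top) / max(1, height - top)
        size = max(1, round(min_size + t * (max_size - min_size)))
        if y + size > height or size > width:
            break  # the next row of segments would leave the image
        step = max(1, round(size * (1 - overlap)))
        for x in range(0, width - size + 1, step):
            segments.append((x, y, size))
        y += step  # vertical step uses the same overlap as the horizontal one
    return segments


grid = variable_grid(640, 480, top=200, min_size=16, max_size=64)
```

Segments are listed row by row from the top, so their sizes never decrease down the list; in this sketch any strip at the right or bottom edge narrower than a segment is simply left uncovered.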

The next important step in the segmentation is resizing the larger segments to the size of the smallest segments. Two algorithms can be used: simple resampling, which is faster, or bilinear interpolation, which is slower but gives better results. The resizing compensates for the perspective of the scene. In a way it is similar to the pixel-by-pixel affine transformation of the plane representing the sea used, for example, in []. Such an approach is not used here for two reasons. First, the configuration of the ground plane (sea) relative to the camera is not stable: if the camera is mounted on a boat, the movement of the sea can change the position of the camera with respect to the ground plane at any moment. Second, the resulting image is not rectangular, which brings unnecessary complications to the subsequent processing.
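The two resizing options can be sketched as follows, assuming grayscale segments stored as square lists of rows of pixel values. `resize_nearest` copies one source pixel per output pixel (simple resampling), while `resize_bilinear` uses standard bilinear interpolation between the four neighbouring source pixels. The function names and the corner-aligned sampling are illustrative choices, not details taken from the system.

```python
def resize_nearest(seg, m):
    """Simple resampling: each output pixel copies one source pixel (fast)."""
    n = len(seg)
    return [[seg[i * n // m][j * n // m] for j in range(m)] for i in range(m)]


def resize_bilinear(seg, m):
    """Bilinear interpolation between the four nearest source pixels (slower)."""
    n = len(seg)
    # Corner-aligned mapping: output index 0 -> source 0, m-1 -> source n-1.
    scale = (n - 1) / (m - 1) if m > 1 else 0.0
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        y = i * scale
        y0 = int(y)
        y1 = min(y0 + 1, n - 1)
        fy = y - y0
        for j in range(m):
            x = j * scale
            x0 = int(x)
            x1 = min(x0 + 1, n - 1)
            fx = x - x0
            # Interpolate along x on the two rows, then along y between them.
            top = seg[y0][x0] * (1 - fx) + seg[y0][x1] * fx
            bottom = seg[y1][x0] * (1 - fx) + seg[y1][x1] * fx
            out[i][j] = top * (1 - fy) + bottom * fy
    return out
```

For example, shrinking a 4 x 4 segment to the 2 x 2 size of the smallest segments with `resize_nearest` keeps every second pixel, whereas `resize_bilinear` blends neighbouring pixels wherever the sampling position falls between them.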

Figure 3. Segmentation grid applied to the previous scene.

Statistical Analysis by Reclustering

The resized segments are then passed on to the next processing step, in which a set of characteristics is determined for each segment. The sea can be considered a quasi-random texture, and a vast number of features exist for describing texture; they are used for texture recognition, classification, etc. The idea is that, instead of classifying the sea in an absolute way (which is a rather difficult task due to the variability of maritime scenes), the algorithm concentrates on finding relative differences. If an object is present in a segment, the features determined for that segment will differ from the features of segments containing just the sea: features for the sea segments will be similar to one another, while features for segments with objects will be different. The set of features selected in this application is the following:





The features are unbiased. Unbiasing is important because it compensates for changes of illumination in the scene. It is done by subtracting the 'direct component' of each segment from its pixel values. This 'direct component' is either the mean or the median of all pixels in the segment. Both the mean and the median were tried, with very similar results; the mean is easier to determine, so it is preferred over the median.
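A minimal sketch of the unbiasing step, assuming a segment is a list of rows of grayscale values; the function name `unbias` is hypothetical. It relies on Python's `statistics.mean` and `statistics.median`.

```python
from statistics import mean, median


def unbias(segment, method="mean"):
    """Subtract the segment's 'direct component' from every pixel.

    The direct component is the mean (default, cheaper) or the median
    of all pixel values in the segment.
    """
    if method not in ("mean", "median"):
        raise ValueError("method must be 'mean' or 'median'")
    pixels = [p for row in segment for p in row]
    dc = mean(pixels) if method == "mean" else median(pixels)
    return [[p - dc for p in row] for row in segment]
```

Because a uniform change of brightness shifts every pixel and the direct component by the same amount, the unbiased segment is unaffected by it.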

These features are usually applied to the so-called co-occurrence matrix; its use and definition are given in []. In such a case the features give an absolute description of the texture and can be used directly for classification. However, due to the variability of the sea it is rather difficult (if not impossible) to find a unique model usable for classification: the sea changes from scene to scene depending on factors such as the time of day, the weather, the location, etc. Consequently, computing the co-occurrence matrix can be skipped as an unnecessary step, and the features can be applied directly to the pixels of the image. The features differ across scenes, but for a particular scene they are stable and similar. If there is an object in a segment, its features exhibit a considerable difference.
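The feature list itself is not reproduced in this text, so the sketch below uses four hypothetical statistics purely as stand-ins (four only to match the four-dimensional feature space mentioned later): the variance, the mean absolute deviation, and the mean squared differences between horizontally and vertically adjacent pixels. What it illustrates is the principle of computing features directly on the unbiased pixel values instead of on a co-occurrence matrix; these are not the authors' features, and `segment_features` is a hypothetical name.

```python
def segment_features(segment):
    """Hypothetical 4-D feature vector computed directly on pixel values.

    These statistics are illustrative stand-ins, NOT the paper's feature
    set. The segment (at least 2 x 2 pixels) is unbiased first by
    subtracting its mean.
    """
    pixels = [p for row in segment for p in row]
    n = len(pixels)
    dc = sum(pixels) / n
    u = [[p - dc for p in row] for row in segment]  # unbiased segment
    variance = sum(v * v for row in u for v in row) / n
    abs_dev = sum(abs(v) for row in u for v in row) / n
    # Differences between horizontally and vertically adjacent pixels.
    dx = [row[j + 1] - row[j] for row in u for j in range(len(row) - 1)]
    dy = [u[i + 1][j] - u[i][j] for i in range(len(u) - 1) for j in range(len(u[0]))]
    h_contrast = sum(d * d for d in dx) / len(dx)
    v_contrast = sum(d * d for d in dy) / len(dy)
    return (variance, abs_dev, h_contrast, v_contrast)
```

No model of the sea is involved: within one scene, sea segments should yield similar vectors, and a segment containing an object should stand apart from them.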

One might argue that the features presented here may be highly correlated and therefore redundant. We are aware of this, and a thorough statistical evaluation is under way. However, early results indicate that the redundancy is necessary because of the variability of the scenes. Put simply, different feature subsets are useful for different scenes.

Figure 3a. Another maritime scene ...

Figure 3b. ... and its feature space

Region Segmentor

The features of each segment can be considered as a vector. These vectors are spread as points in a four-dimensional vector space called the feature space. The configuration of points in the feature space is unique for every image. Points corresponding to segments covering just sea lie close to one another in the feature space and form the so-called main cluster. Points that are outliers from this main cluster correspond to the segments with objects in them.

The task is to separate these two groups. There is a vast number of classifiers dedicated to similar problems; some are based on neural networks, others purely on the statistical distributions of the data. Unfortunately, this case is different, because it is difficult to generate the model of the sea that 'classical' classification requires. At this point one might point out that models of objects are not considered at all. The reason is simply the overwhelming number of possible marine craft and other objects, and their possible variants: such a database would be enormous and still would not be exhaustive. Thus, all the segmentation is done at a low level, which frees the system from any knowledge of the underlying objects to be segmented. Such a system can deal with almost any object.

The only information that can be used to separate the outliers from the main cluster is the relative position of the points in the feature space. The task is to find the center of the main cluster and to set a boundary enclosing the points in it. This boundary is the actual threshold: all points beyond it are outliers, and they are marked as representing segments with objects in them.

The algorithm that searches for the center of the main cluster is iterative. It employs the Mahalanobis distance, which accounts for anisotropy in the distribution of points in the main cluster. The Mahalanobis distance is given by the following formula:

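$$d_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \mathbf{m})^{\mathsf{T}} \, \mathbf{C}^{-1} \, (\mathbf{x} - \mathbf{m})}$$

where $\mathbf{x}$ is the feature vector, and $\mathbf{m}$ and $\mathbf{C}^{-1}$ are the center (mean) vector and the inverse of the covariance matrix, respectively.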
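A minimal pure-Python sketch of the Mahalanobis distance, with a helper that estimates the mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$ from a set of feature vectors and a Gauss-Jordan matrix inverse for $\mathbf{C}^{-1}$. All function names are hypothetical; in practice a numerical library would normally supply the inverse.

```python
import math


def mean_and_cov(points):
    """Sample mean vector and (unbiased) covariance matrix of a list of vectors."""
    n, k = len(points), len(points[0])
    m = [sum(p[i] for p in points) / n for i in range(k)]
    c = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) / (n - 1)
          for j in range(k)] for i in range(k)]
    return m, c


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mahalanobis(x, m, c_inv):
    """d_M(x) = sqrt((x - m)^T C^{-1} (x - m))."""
    d = [xi - mi for xi, mi in zip(x, m)]
    q = sum(d[i] * c_inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(0.0, q))  # guard against tiny negative round-off
```

With a covariance of diag(4, 1), the points (2, 0) and (0, 1) both lie at Mahalanobis distance 1 from the origin even though their Euclidean distances differ; this is the anisotropy the distance accounts for.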

After the two groups of points have been separated, the points are labeled and the segments with objects are mapped back onto the image, where they form rectangles around the places where objects are likely to be.
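The text does not spell out the iteration, so the following is only one plausible reading of it, consistent with the principle shown in Figure 4. The first iteration estimates the center and covariance from all points; the points within Mahalanobis distance `threshold` form the new main cluster, and the estimate is repeated on that cluster until its membership stops changing. Points beyond the threshold are labeled as objects, and their segments are mapped back to image rectangles. The threshold of 3.0, the stopping rule and all function names are assumptions.

```python
import math


def _mean_cov(points):
    n, k = len(points), len(points[0])
    m = [sum(p[i] for p in points) / n for i in range(k)]
    c = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) / (n - 1)
          for j in range(k)] for i in range(k)]
    return m, c


def _invert(matrix):
    # Gauss-Jordan elimination with partial pivoting.
    n = len(matrix)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def _mahalanobis(x, m, c_inv):
    d = [xi - mi for xi, mi in zip(x, m)]
    q = sum(d[i] * c_inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(0.0, q))


def recluster(features, threshold=3.0, max_iter=10):
    """Label each feature vector: True = outlier (segment likely holds an object)."""
    members = list(range(len(features)))  # 1st iteration: every point
    dist = []
    for _ in range(max_iter):
        m, c = _mean_cov([features[i] for i in members])
        c_inv = _invert(c)
        dist = [_mahalanobis(f, m, c_inv) for f in features]
        new_members = [i for i, d in enumerate(dist) if d <= threshold]
        # Stop when the main cluster no longer changes, or when too few
        # points remain to estimate a covariance matrix.
        if new_members == members or len(new_members) <= len(features[0]):
            break
        members = new_members
    return [d > threshold for d in dist]


def object_rectangles(grid, features, threshold=3.0):
    """Map outlier segments back onto the image as (x, y, size) rectangles."""
    return [seg for seg, out in zip(grid, recluster(features, threshold)) if out]
```

Including the outliers in the first estimate inflates the covariance; re-estimating from the main cluster alone tightens the boundary around the sea points, which is the motivation for iterating.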

Figure 4. Principle of reclustering (dashed line: 1st iteration; solid line: 2nd iteration).


Figure 5a. Segmentation of a single frame
Figure 5b. Segmentation with labeling

The system was tested on a dozen sequences. The sequences varied in the appearance of the sea and in the types of objects present. Some of the sequences contained static objects such as piers and moorings; semi-static objects such as buoys were also present. On average, about 95% of frames were segmented correctly (i.e. out of 100 frames, 5 were segmented improperly: either objects were missed or sea was segmented as an object).

References

[1] Sanderson, J.G., Teal, M.K., Ellis, T.J.: Identification and Tracking in Maritime Scenes. IEE Int. Conference on Image Processing and its applications. (1997) Vol. 2 463--467

[2] Smith, A.A.W., Teal, M.K.: Identification and Tracking of Maritime Objects in Near-Infrared Image Sequences for Collision Avoidance. IEE 7th Int. Conference on Image Processing and its applications. (1999) Vol. 1 250--254

[3] Campbell, N.W., Thomas, B.T.: Segmentation of natural images using self-organising feature maps. British Machine Vision Conference Proceedings. (1996) 223--232