Performance Evaluation

Recently an increased amount of emphasis has been placed on assessing the performance of algorithms for computer vision processing. Nowadays there are large numbers of journal publications which describe hundreds of algorithms. Often there are tens or hundreds of competing algorithms all designed to solve the same task. There needs to be a means for comparing them, so that the better ones can be selected for use.

Lines

For instance, over the last thirty years an enormous number of algorithms have been developed for segmenting a digitial curve, normally into a sequence of straight lines. Sometimes these algorithms are compared in terms of the accuracy or compression obtained by the segmented curve with respect to the original. However, since there is a tradeoff between these two factors (one can be improved at the expense of the other), they are of little use for evaluation. For several different algorithms the integral square error (ISE) is plotted against number of points (i.e. accuracy versus compression) and is shown below. The continuous line shows the optimal performance obtained by dynamic programming.

assessment graph

Thus it is obvious that goodness cannot be measured by considering the accuracy or compression properties of an algorithm in isolation. Our solution was to compare an algorithm's performance against some "gold standard"; in this case the optimal result. The previous problem of using accuracy or compression is now solved. The optimal result using the same number of lines as the approximation, or producing the same ISE, can be found. This enables two algorithms to be given a rating relative to the gold standard, that then can be compared meaningfully, even if the algorithms produced very different numbers of lines.

Further measures can be obtained by analysing the behaviour of algorithms over a range of their parameter values. This produces plots of breakpoints, somewhat like scale-space curves as these examples show:
scale-space plot scale-space plot scale-space plot scale-space plot
From such plots methods are given to quantify measures such as

  • monotonicity - the number of lines produced by an algorithm versus the input parameter should be monotonic; also the error versus input parameter should be monotonic
  • consistency - different parameter values should not produce polygons with the same number of lines but different polygons

    The table below shows the assessment of curve segmentation algorithms applied to 21 test curves.

    Method M(Line) M(ISE) Consistency Merit
      $\mu$ $\sigma$ $\mu$ $\sigma$   $\mu$   $\sigma$ $\mu$ $\sigma$
    point-chord distance 86 10.7 67 13.5 1.3 0.80 53 18.7
    area 65 10.9 55 10.9 3.4 1.28 36 15.4
    maximum deviation 70 10.5 58 19.0 2.4 1.09 30 15.1
    length 68 12.9 59 17.8 2.7 1.18 37 14.8
    orientation 93 7.0 71 7.8 1.4 0.59 62 20.6
    number of crossings 37 7.1 24 9.5 5.3 0.85 32 15.7
    Cheng & Hsu 72 13.2 28 15.5 6.1 2.64 16 11.3
    Deguchi & Aoki 17 10.8 18.5 23.0 6.5 5.37 32 17.15
    Deveau 100 0.0 89 13.4 2.3 0.62 41 24.2
    Douglas & Peucker 99 2.8 98 5.6 0.1 0.27 64 12.4
    Fu et al. 8 19.5 1 19.7 2.6 0.80 51 17.3
    Hu & Yan 86 13.9 89 11.9 0.0 0.00 56 13.4
    Inesta et al. 4 53.3 0 48.8 1.6 0.60 37 13.3
    Ji & Haralick 81 25.1 68 18.0 1.7 1.32 22 13.9
    Meloza & Ozanian 83 13.7 61 18.4 1.9 1.27 58 20.0
    Pei & Horng 73 15.7 74 12.9 2.1 1.43 40 15.2
    Phillips & Rosenfeld 29 27.1 26 17.8 5.5 1.81 30 12.6
    Ramer 100 0.0 100 1.1 0.0 0.00 74 19.7
    regular 100 0.0 66 8.8 0.0 0.00 27 9.0
    Rosenfeld & Johnston 85 13.4 71 9.1 2.6 1.27 52 25.0
    Wu & Levine 59 31.0 52 19.6 2.1 0.95 32 15.6
    Zhang et al. 96 5.6 60 13.9 2.3 1.07 36 14.2
    optimal $E_1$ 100 0.0 97 2.8 0.0 0.00 88 9.0
    optimal $E_2$ 100 0.0 100 0.0 0.0 0.00 100 0.0
    optimal $E_\infty$ 100 0.0 95 3.1 0.0 0.00 77 10.4
    optimal $E_L$ 100 0.0 80 14.2 0.0 0.00 63 13.6
    optimal $E_\theta$ 100 0.0 77 16.4 0.0 0.00 40 18.9

    More details are given in:

    P.L. Rosin, "Assessing the behaviour of polygonal approximation algorithms", Pattern Recognition, vol. 36, no. 2, pp. 505-518, 2003.
    P.L. Rosin, "Techniques for Assessing Polygonal Approximations of Curves", IEEE Trans. PAMI vol. 19, no. 6, pp. 659-666, 1997.

    Edges

    In addition, a measure for assessing edge thresholding has been described and applied in:

    P.L. Rosin, "Edges: saliency measures and automatic thresholding", Machine Vision and Applications, vol. 9, pp. 139-159, 1997, (see also: P.L. Rosin, "Edges: saliency measures and automatic thresholding", Technical Note No. I.95.58; May 1995, Institute for Remote Sensing Applications, Joint Research Centre, Italy.)

    S. Venkatesh & P.L. Rosin, "Dynamic threshold determination by local and global edge evaluation", CVGIP: Graphical Models and Image Processing, vol. 57, no. 2, pp. 146-160, 1995.

    Ellipses

    In order to compare different approximation to the Euclidean distance between a point and an ellipse boundary (useful for ellipse fitting) several measures were developed and applied to various distance approximations in:

    P.L. Rosin, "Assessing error of fit functions for ellipses", Graphical Models and Image Processing, vol. 58, no. 5, pp. 494-502, 1996

    P.L. Rosin, "Analysing error of fit functions for ellipses", Pattern Recognition Letters, vol. 17, no. 14, pp. 1461-1470, 1996.

    Tracking

    Some methods for evaluating surveillance techniques are given in:

    J. Black, T. Ellis and P. Rosin, "A Novel Method for Video Tracking Performance Evaluation", Joint IEEE Int. Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), 2003.

    P.L. Rosin and E. Ioannidis, "Evaluation of global image thresholding for change detection", Pattern Recognition Letters, vol. 24, no. 14, pp. 2345-2356, 2003.