Patrick COURTNEY
11 October 1996
During the course of their development the film and TV industries have enthusiastically exploited technological developments, from sound and colour through to more recent innovations in computer animation and current experiments in high definition and digital broadcasting. Their markets are global and touch the greater part of the world's population. The equipment market has been estimated at $15 billion per annum. Technically, the needs are very demanding. In TV production they are characterised by high bandwidths with multiple high quality colour image streams. For live events this is coupled with the need for low latency (a few frames at most) and a requirement for high reliability and robustness to failure.
Two stages can be distinguished: the production of the entertainment product, and the delivery (distribution and transmission) of the finished product. Each calls upon different functions and technologies in which computer vision technology has played or can play a role.
In situations where the cameras are subjected to shocks or where a cameraman is unavailable, stabilisation of the image may be necessary. Methods of ensuring this include:
dynamically balanced platforms suitable for handheld use, such as the Steadicam (USA).
gyroscopically balanced platforms suitable for high magnification use on aircraft such as the systems of BSS (UK) and Wescam (Canada).
opto-mechanical stabilisation using elements at the lens, e.g. Canon (Japan).
electronic stabilisation using motion estimation offered by JVC (Japan) and others.
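The electronic approach in the last item can be illustrated with a minimal sketch. It assumes the camera shake is a pure global translation of a few pixels and estimates it by exhaustive block matching (smallest mean absolute difference), then shifts the frame back. The function names, the tiny 8x8 test frame and the search range are all hypothetical; commercial units are not described in this report and are likely to use more elaborate motion models.

```python
def make_frame(offset_y=0, offset_x=0, size=8):
    """Build a small grey-level test frame: a bright 2x2 square on black."""
    frame = [[0] * size for _ in range(size)]
    for y in (3, 4):
        for x in (3, 4):
            frame[y + offset_y][x + offset_x] = 255
    return frame


def estimate_shift(prev, curr, max_shift=2):
    """Return (dy, dx) such that curr[y + dy][x + dx] best matches prev[y][x]."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Compare only the interior so shifted indices stay in range.
            for y in range(max_shift, h - max_shift):
                for x in range(max_shift, w - max_shift):
                    total += abs(prev[y][x] - curr[y + dy][x + dx])
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


def stabilise(curr, dy, dx, fill=0):
    """Undo the estimated shift so the output lines up with the previous frame."""
    h, w = len(curr), len(curr[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = curr[sy][sx]
    return out


previous = make_frame()
current = make_frame(offset_y=1, offset_x=-1)   # scene moved down 1, left 1
dy, dx = estimate_shift(previous, current)
steady = stabilise(current, dy, dx)
```

The exhaustive search costs grow with the square of the search range, which is one reason a practical system would probably work on a subsampled image or a handful of blocks rather than every pixel.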
Cameras are often mounted on mobile heads or cranes to obtain particular views. A wide variety of such devices is available, from simple pan-tilt heads through to pedestal systems for XYZ motions and arms with 8 or more degrees of freedom. Additional optical degrees of freedom are obtained by controlling the zoom and focus and occasionally the aperture. These systems may be driven manually (with or without coders), by numerical control or joystick, or programmed offline, with resolutions quoted in microns and minutes of arc. Current suppliers include Ultimatte (USA), Vinten (UK), Egripment (Netherlands), A&C (UK), Radamec (UK), MRMC (UK) and Panther (Germany).
The cameras themselves increasingly employ digital signal processing as a way of dealing with the varying preprocessing requirements (gamma and colour matrix) as well as to provide increased stability, noise reduction and defect elimination capability.
Orad (Israel) offer an interactive annotation system for sports commentating, which operates on a video feed to provide tracking and naming of players, automatic tracking of players and ball, distance and speed estimations, panoramic reconstruction by view combination, offside checks, etc. It has been used in football, ice hockey, tennis, basketball and athletics for replay commentaries.
Other technologies have also been employed to track objects of interest for outdoor and sporting events. Medialab (France) have used localisation information from GPS receivers to synthesise images of yachts for the America's Cup. In ice hockey, a radio transmitter fitted inside the puck has enabled trajectory and speed information to be obtained for display on screen.
Increasing use is being made of offline 3D data acquisition systems to create models for animation in programming and video games. The techniques used are either laser scanning, for which there are several suppliers such as 3D Scanners (UK) and Cyberware (USA); or fusing of views, as in the Sphinx 3D modeler from Dimension (Germany).
Just recently it has become technically feasible to synthesise background images in real time and to match them to the camera viewing parameters such as position, zoom and depth of field. This allows much more complex virtual sets, including ones where there is an interaction between the actors and the synthetic objects. In order to do this it is necessary to have precise knowledge about the cameras. The precision required depends on viewpoint but can be rather high (1 mm in XYZ, <0.1 deg in orientation). This information is currently obtained in one of two ways:
using optical or mechanical sensors fitted to a remotely driven robotic camera, either a pan/tilt unit or a pedestal camera capable of XYZ motions. Suppliers include Radamec (UK) and Vinten (UK). Coders of 24 bit resolution are common on the pan and tilt axes, while 12 to 16 bit coders are used for zoom and focus. Such systems are said to suffer from backlash and drift and are of limited use for handheld or wide area use.
recognising a pattern in the background and using it to determine position. This can either take the form of a bar code read by a laser scanner to determine XY position of a pedestal (Radamec), or a monochrome grid pattern on the blue backdrop to determine the 7 image projection parameters for a handheld camera (Orad). Such systems have problems when the pattern becomes hard to see due to the angle of view, focus or motion blur, or occlusion.
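To put the coder resolutions and the precision figures quoted above side by side, the following back-of-envelope sketch converts a coder bit depth into an angular step and an angular error into a sideways displacement at a given distance. It assumes the coder counts span one full 360 degree revolution, and the 3 m subject distance is an illustrative value; neither assumption comes from the suppliers named here. The helper names are hypothetical.

```python
import math


def coder_step_degrees(bits, span_degrees=360.0):
    """Smallest angle one coder count can represent (assumes a full-turn span)."""
    return span_degrees / (2 ** bits)


def lateral_error_mm(angle_error_deg, distance_m):
    """Sideways displacement at a given distance caused by an angular error."""
    return math.tan(math.radians(angle_error_deg)) * distance_m * 1000.0


step_24 = coder_step_degrees(24)          # pan/tilt coder, in degrees
step_24_arcsec = step_24 * 3600.0         # the same step in seconds of arc
shift_mm = lateral_error_mm(0.1, 3.0)     # 0.1 deg error seen at 3 m

print(f"24-bit step: {step_24:.2e} deg ({step_24_arcsec:.3f} arcsec)")
print(f"0.1 deg at 3 m: {shift_mm:.1f} mm")
```

Under these assumptions a 24 bit coder resolves far finer than the 0.1 degree requirement, which suggests that the backlash and drift mentioned above, rather than coder resolution, are what limit accuracy in practice.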
There are a number of commercial virtual set systems on the market including those from Accom (USA/Poland), Brainstorm (USA/Spain), Electrogig (Netherlands), Discreet Logic (Canada/UK), Orad (Israel), and RTset (Israel). Most major broadcasters are currently experimenting with such systems and they are in regular use in Germany, the UK, Spain and elsewhere. While these systems have the potential to reduce costs (set construction and storage) they are still very expensive (up to $750k) and the present use is to provide increased functionality (unavailable, dynamic or otherwise impossible backdrops) rather than cost reduction.
mixing between several image sources, including animated or synthetic sequences.
transforming or warping one image onto another.
tracking of objects for motion compensation, image stabilisation and alignment of clips.
removing or adding motion blur.
grain management (sampling, modelling and matching) when working with film.
segmentation of an object moving against a bluescreen by tracking.
There exist a number of editing and production workstations to carry out these tasks, such as the systems from Quantel (UK), Discreet Logic (Canada), Alias/Wavefront (Canada), etc. They draw heavily on methods from surface reconstruction, sampling, filtering, correlation, segmentation and interpolation. In general the tools used are interactive, with the operator setting the parameters manually and supervising the process as it runs over a sequence at near real time, tuning parameters to obtain a satisfactory result.
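Two of the functions listed above, bluescreen segmentation and mixing of image sources, can be sketched together. The example below is a deliberately crude stand-in: it builds a hard matte with a per-pixel colour threshold (not the tracking-based segmentation mentioned in the list) and then mixes foreground and background through that matte. The threshold value, the RGB tuples and the function names are illustrative assumptions, not taken from any of the workstations named.

```python
def blue_matte(pixel, threshold=60):
    """Return 0.0 where the pixel looks like backdrop blue, else 1.0 (foreground)."""
    r, g, b = pixel
    return 0.0 if b - max(r, g) > threshold else 1.0


def composite(foreground, background, threshold=60):
    """Mix two equally sized RGB images using the blue-screen matte as alpha."""
    out = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for fg, bg in zip(fg_row, bg_row):
            alpha = blue_matte(fg, threshold)
            row.append(tuple(round(alpha * f + (1 - alpha) * b)
                             for f, b in zip(fg, bg)))
        out.append(row)
    return out


BLUE = (20, 30, 220)      # backdrop colour (assumed)
ACTOR = (200, 150, 120)   # a foreground pixel (assumed)

studio = [[BLUE, ACTOR],
          [ACTOR, BLUE]]
synthetic = [[(1, 2, 3), (4, 5, 6)],
             [(7, 8, 9), (10, 11, 12)]]

mixed = composite(studio, synthetic)
```

A binary matte of this kind gives hard edges; a production keyer would be expected to produce a soft alpha and to suppress blue spill, which is where the interactive parameter tuning described above comes in.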
indexing and searching: this is traditionally performed using keywords associated with each shot, and there are several suppliers such as Artsum (France). For more automated searching, Dubner (USA) offer a system which detects shot or clip transitions according to change detection (cuts) or fast motion in live filming, for subsequent manual annotation. Demonstrations have also been carried out on experimental image indexing-by-content, such as the Impact system from Hitachi (Japan) for finding shot starts according to content, while Illustra/Virage (USA) and IBM (USA) have systems able to carry out searches based on colour, texture or shape queries. These systems are still under development with regard to processing capability and applications.
archive restoration: the transfer of film stock to digital media is becoming increasingly important as the volume of older material threatened by loss and the resolution of broadcast media continue to rise. Colouring of black and white films can be included in this category. Suppliers such as DigitalVision (Sweden) provide sophisticated conversion workstations with interactive processing for motion compensation, colour correction, scratch, dirt and noise filtering, etc.
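The detection of cuts by change detection, mentioned under indexing and searching, can be sketched as a comparison of grey-level histograms between consecutive frames. This is a common textbook approach and is offered only as an assumption about how such a detector might work; the bin count, the threshold and the function names are hypothetical and do not describe the Dubner system.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalised grey-level histogram of a frame given as rows of 0..255 values."""
    counts = [0] * bins
    for row in frame:
        for value in row:
            counts[value * bins // max_value] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def find_cuts(frames, threshold=0.5):
    """Return the indices of frames that start a new shot."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


dark_a = [[10, 20], [30, 15]]
dark_b = [[12, 22], [28, 18]]
bright_a = [[200, 210], [220, 230]]
bright_b = [[205, 215], [225, 235]]

cuts = find_cuts([dark_a, dark_b, bright_a, bright_b])
print(cuts)   # the shot change falls at frame index 2
```

Histograms ignore where in the image a change occurs, so a detector like this is tolerant of small movements within a shot but can miss a cut between two scenes of similar overall brightness, one reason manual annotation still follows.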
interpolation between images of a film shot at 24 frames per second to a stream at 60 fields per second.
rescanning a 525 line image to create a 625 line image.
motion compensation when converting from one scan rate to another.
converting between one colour system and another.
digitising a movie suffering from grain and scratch damage.
The methods used to carry out these tasks are those drawn from signal and image processing and are present in the transfer machines, as well as in specialised boxes from companies such as Snell and Wilcox (UK) and DigitalVision (Sweden).
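The first conversion in the list, from 24 frames per second to 60 fields per second, has a particularly simple baseline: the conventional pulldown cadence, in which film frames are held alternately for two and three fields. The sketch below shows that field repetition scheme only. It is named plainly as a stand-in, since it repeats fields rather than interpolating between images, and it involves none of the motion compensation mentioned above. The function name and the string labels for fields are hypothetical.

```python
def pulldown(frames):
    """Map film frames to interlaced fields, holding frames for 2, 3, 2, 3... fields."""
    fields = []
    parity = 0  # 0 = top field, 1 = bottom field; alternates on every field
    for index, frame in enumerate(frames):
        repeats = 2 if index % 2 == 0 else 3
        for _ in range(repeats):
            fields.append((frame, "top" if parity == 0 else "bottom"))
            parity ^= 1
    return fields


fields = pulldown(["A", "B", "C", "D"])
cadence = "".join(frame for frame, _ in fields)
print(cadence)                             # AABBBCCDDD
print(len(pulldown(list(range(24)))))      # 60 fields from 24 frames
```

Four film frames become ten fields, so one second of film (24 frames) fills exactly 60 fields. The uneven hold times show up as judder on smooth pans, which is the motivation for the interpolating and motion-compensated converters described in this section.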
JPEG and MPEG-based compression for codecs, data transmission and storage.
transcoding between different compression standards and variants.
preprocessing to remove noise prior to compression.
These draw on methods from signal processing (transforms, quantisation, correlation and non-linear filtering).
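The transform and quantisation steps can be illustrated on a single row of eight samples. The sketch below applies an orthonormal one-dimensional DCT, quantises the coefficients with one uniform step, and reconstructs the samples. It is a simplification under stated assumptions: JPEG and MPEG codecs work on two-dimensional 8x8 blocks with per-coefficient quantisation tables and entropy coding, none of which is shown, and the sample values, step size and function names are illustrative.

```python
import math


def _scale(k, n):
    """Normalisation factor that makes the DCT orthonormal."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    return [_scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(samples))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]


def quantise(coeffs, step):
    """Map each coefficient to an integer level; this is where information is lost."""
    return [round(c / step) for c in coeffs]


def dequantise(levels, step):
    """Recover approximate coefficients from the integer levels."""
    return [level * step for level in levels]


row = [52, 55, 61, 66, 70, 61, 64, 73]
levels = quantise(dct(row), step=10)
rebuilt = idct(dequantise(levels, step=10))
worst = max(abs(a - b) for a, b in zip(row, rebuilt))
print(levels, round(worst, 2))
```

A larger step gives smaller integer levels and more zeros to code at the cost of a larger reconstruction error, which is the basic rate and quality trade-off behind the compression ratios discussed here.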
Since the coding standards specify a bitstream rather than a coding method, the evaluation of codecs is becoming increasingly important, and with it the role of test generators and quality assessment systems. Evaluation frameworks have been developed based on models of human sensitivity to contrast and orientation at various scales. Recent work has added colour, motion and memory effects.
Within some interactive games, the body motions of players are used to control the game character using the blue-screen chroma-keying technique. Such systems include those from Artificial Reality and Vivid (USA).
image acquisition:
- smart cameras able to automate some stereotyped shooting work, including framing of scenes, tracking, zoom and focus, especially for programming needing many cameras such as minority sports, interactive games, etc. As channels become cheaper and several viewpoints are sent to the viewer, the viewer may take control of the choice of shots. Such systems would also allow acquisition of footage not otherwise available due to the dynamics of the scene, for example the faces of sportsmen, possibly integrating other data such as GPS. Smart cameras are under development at MIT and Microsoft.
- three dimensional image acquisition, especially for non-TV applications such as interactive games. Work within the EC projects DISTIMA and MIRAGE, based on polarised left and right image views, has already led to the development of a high-quality 3D camera and displays by Thomson and AEA.
data acquisition:
- existing performance animation systems are limited in the range of information that they can obtain (arms and legs of single cooperating humans) and the conditions under which they will work (indoors, close range). Systems capable of overcoming these limitations are required.
- there is scope for additional systems capable of extracting conceptual information such as speed and distance as well as 3D shape from arbitrary image sequences for sports, news and games programming. It is also of interest for the cross-media reuse of content. 3D model acquisition from uncalibrated image sequences is the subject of the EC VANGUARD and REALISE projects.
sets:
- virtual set systems are limited by their current high cost, due in part to the problem of obtaining precise camera imaging information (position and optical modelling).
- the constraints of using a blue-screen room remain frustrating to many. The ideal is true pixel-precise depth keying at field rate and operating over a wide area, outdoors if possible.
post-production: already rich in functionality, post-production workstations would benefit from more automatic and adaptive methods. Automedia (Israel) have recently offered colour image segmentation by contour following. Other possibilities include lip synchronisation for more precise dubbing.
archive management:
- there is an increasing need to allow the searching of archives by content, by activity, and by abstract meaning, including the searching of compressed data and mixed audio and video.
- restoration and conversion of archives will benefit from sophisticated noise and degradation models and adaptive filtering methods. The AURORA project is currently pursuing these goals.
format conversions: the rising number of formats, including compressed, means that format conversion will be with us for quite some time, especially for forwards, backwards and compression transcoding.
compression:
- increased compression ratios are needed to make more efficient use of communication channels, either using model-based coding (MPEG4) or prior knowledge concerning programme content.
- analysis of subjective performance continues to be important, especially for new media such as immersive VR. The TAPESTRIES project is examining these factors.
protection: this will become an increasingly important issue as material is distributed in digital form and image manipulation tools become more widespread. A certain amount of work is under way on digital fingerprinting, both in research programmes (ACCOPI, TALISMAN and IMPRIMATEUR) and in systems available from EMI (UK), Highwater (UK), AT&T (USA) and NEC (USA).
interaction: mixing of real and virtual objects requires knowledge of position and event synchronisation.
In parallel with the increased performance and functionality of systems in the professional market, equipment will also begin to migrate to the home and domestic markets.
Function                        Current application
------------------------------  ----------------------------------------------
Image signal processing and     image acquisition: in-camera processing;
transformation                  post-production workstations;
                                format conversion: estimation and filtering;
                                archive management: image restoration
Barcode reading                 image acquisition: robotic pedestal camera
                                navigation
Scene reconstruction and        data acquisition: performance animation;
visualisation                   post-production workstations
Motion, gesture, face           data acquisition: performance animation;
expression recognition          user end interaction: games interfaces
Image compression               compression: delivery and programme exchange
Positioning, registration and   data acquisition: performance animation and
metrology                       sports annotation;
                                sets: camera position registration
Pattern, object and event       data acquisition: performance animation;
recognition                     sets: camera registration in virtual sets;
                                archive management: searching and
                                preprocessing
Biological vision               compression: image codec performance
                                assessment