Multiobject identification with several cameras

M. Valera and S.A.Velastin

Introduction

M. Valera and S.A. Velastin, in "Intelligent Distributed Surveillance Systems: a Review" (IEE Proc. - Vis. Image Signal Process., Vol. 152, No. 2, April 2005, pp. 192-204), describe third-generation surveillance systems as "dealing with a large number of cameras, a geographical spread of resources, many monitoring points and mirroring the 'distributed nature' of the human monitoring process. For this, the processing capabilities must be distributed over the network, provide scene understanding, attract attention of the operator in real time and use a multi-sensor environment of low cost components. the need to co-ordinate information across cameras becomes an important issue."

Current video surveillance systems are unsatisfactory for identifying and tracking persons of interest in more than one way. Most are designed for analysis by a human observer and analyse the recordings of individual cameras without fusing the multiple views before tracking. They are vulnerable to the failure of a central node (e.g. the EU-funded CROMATICA project) or to the failure of a processor within a node (the ADVISOR project; both projects are cited in the above paper). More importantly, their success depends entirely on the reliability of a single detection method, usually motion detection or background subtraction. In all these projects, the automatic following of a person is restricted to cameras with approximately the same angle of view and lighting.

Philosophy

Our team believes that a robust and reliable surveillance system must have:

distributed processing without time synchronisation,
processing organised in several steps, with each unit and each step able to evaluate its own success rate,
and neighbouring units and subsequent steps able to compensate, to some extent, for the failure of a given one.

Large research projects, such as that of Dickmanns on automated driving, also arrived at this stepwise organisation.

To fit into this philosophy, we have an algorithm to identify objects which:

uses several cameras (at least three seeing every spot, each knowing the relative positions of the others covering the same area) that can determine the directions to the detected moving objects but cannot distinguish between them, so that each camera outputs a set of directions (in 2D or in 3D)
runs an identical program on each camera, using the output of the non-synchronised programs on the other cameras to build up a 2D plan of the objects' positions, which the cameras broadcast among themselves
and thus lets a user station (receiving the same plans) select an object in one camera image and instantly identify the same object in the other images
or track the movement of a selected object, up to a certain density of objects.
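
The plan-building step listed above can be illustrated with a minimal sketch. It is not the algorithm itself: it assumes exactly three fixed cameras with known 2D positions, bearings given as angles in radians in a common plan frame, and bearings taken at the same instant (the system described here works without synchronisation, which this sketch ignores). The names `intersect`, `build_plan` and the tolerance `tol` are hypothetical. Rays from the first two cameras are intersected, and a candidate position is kept only if the third camera also reports a bearing towards it, which discards "ghost" intersections between rays that belong to different objects.

```python
import math
from itertools import product


def intersect(p, a, q, b):
    """Intersect the ray from p at angle a with the ray from q at angle b.

    Returns the (x, y) intersection, or None if the rays are parallel
    or the crossing point lies behind either camera.
    """
    dx1, dy1 = math.cos(a), math.sin(a)
    dx2, dy2 = math.cos(b), math.sin(b)
    det = dx1 * dy2 - dy1 * dx2
    if abs(det) < 1e-9:
        return None
    rx, ry = q[0] - p[0], q[1] - p[1]
    t = (rx * dy2 - ry * dx2) / det  # distance along the first ray
    s = (rx * dy1 - ry * dx1) / det  # distance along the second ray
    if t <= 0 or s <= 0:
        return None
    return (p[0] + t * dx1, p[1] + t * dy1)


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def build_plan(cams, bearings, tol=0.02):
    """Build a 2D plan from unlabelled bearings of three cameras.

    cams:     three (x, y) camera positions (assumed known)
    bearings: three lists of angles, one list per camera
    tol:      angular tolerance (radians) for the third-camera check
    """
    plan = []
    for a, b in product(bearings[0], bearings[1]):
        pt = intersect(cams[0], a, cams[1], b)
        if pt is None:
            continue
        # Direction in which the third camera would see this candidate.
        expected = math.atan2(pt[1] - cams[2][1], pt[0] - cams[2][0])
        if any(angle_diff(expected, c) < tol for c in bearings[2]):
            plan.append(pt)
    return plan
```

With only two cameras, every pair of crossing rays would be a plausible position; the third view is what allows ghost points to be rejected, which is one reason to require at least three cameras per spot. At high object densities, ghost points can coincide with a third-camera bearing by chance, so a check of this kind degrades as the scene becomes crowded.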

Potential use

With the addition of a rough, computationally cheap person-detection function (I am investigating, for instance, the detection of persons in JPEG images without DCT decompression), the system could be used for crowd monitoring and, in particular, to accelerate the post-event analysis of traditional CCTV data, provided a sufficient number of approximately time-synchronised cameras is available:

to pick out people in a crowd who do not follow its general movement, for example after a bombing, or a dealer keeping watch in a corridor of the underground
to follow 'tagged' people and see whether they meet, or to follow people through a time-reversed sequence.
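
Picking out people who do not follow the general movement of a crowd can be sketched as below. This is only an illustration under assumptions not stated in the text: each tracked object already has a velocity estimate in the 2D plan, the "general movement" is taken to be the mean velocity, and a person is flagged when their heading deviates from it by more than a chosen angle or when they stand still. The function name `flag_against_flow` and the threshold `max_angle` are hypothetical.

```python
import math


def flag_against_flow(velocities, max_angle=math.pi / 3):
    """Return ids of objects that do not follow the mean crowd movement.

    velocities: dict mapping object id -> (vx, vy) in plan coordinates
    max_angle:  largest accepted deviation from the mean heading (radians)
    """
    if not velocities:
        return []
    n = len(velocities)
    mx = sum(v[0] for v in velocities.values()) / n
    my = sum(v[1] for v in velocities.values()) / n
    flow = math.hypot(mx, my)
    if flow < 1e-9:
        return []  # no dominant direction to compare against
    flagged = []
    for oid, (vx, vy) in velocities.items():
        speed = math.hypot(vx, vy)
        if speed < 1e-9:
            flagged.append(oid)  # standing still while the crowd moves
            continue
        cos_dev = (vx * mx + vy * my) / (speed * flow)
        if cos_dev < math.cos(max_angle):
            flagged.append(oid)
    return flagged
```

The mean is a crude estimate of crowd flow; a median or a per-region flow field would be less sensitive to the very outliers being searched for, at a higher computational cost.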

For a CCTV operator, on the other hand, it may be a welcome feature to be able to skip from one camera image to another and instantly find the same person, following their movement through several images. As the system's intelligence is distributed over the camera units, the system scales easily: it can cover large areas with a large number of cameras and can have separate surveillance centres for parts of the area. The system is robust to the malfunctioning of some cameras, provided it was deployed with sufficient redundancy.
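
Skipping from one camera image to another could work along the following lines once a shared 2D plan of object positions is available. This is a hedged sketch, not the system's actual interface: it assumes the plan is a list of (x, y) positions, that the operator's selection has already been converted into a bearing from camera A, and that camera positions are known. The names `bearing`, `find_in_other_camera` and the tolerance `tol` are hypothetical, and the conversion between image columns and bearings, which depends on the camera model, is left out.

```python
import math


def bearing(cam, pt):
    """Angle (radians) at which a camera at `cam` sees the plan point `pt`."""
    return math.atan2(pt[1] - cam[1], pt[0] - cam[0])


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def find_in_other_camera(plan, cam_a, selected_bearing, cam_b, tol=0.05):
    """Map an object selected in camera A to its direction in camera B.

    Picks the plan position whose bearing from camera A is closest to the
    selected one, then returns (position, bearing from camera B).
    Returns None if no plan position lies within `tol` of the selection.
    """
    best = min(
        plan,
        key=lambda pt: angle_diff(bearing(cam_a, pt), selected_bearing),
        default=None,
    )
    if best is None or angle_diff(bearing(cam_a, best), selected_bearing) > tol:
        return None
    return best, bearing(cam_b, best)
```

Because the lookup only needs the plan and the camera positions, it can run on the user station without querying the cameras again.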

Future extensions

The algorithm is under further development to make it more robust to objects that are not detected by all of the cameras and to make it work with moving cameras (still assuming that their relative positions are known). Its detection part may need to be adapted to open-space lighting conditions (which were not part of previous projects).
Although off-the-shelf methods do not provide high certainty, they may be employed to detect the objects of interest (persons, in this case).