For a project that requires non-cooperative imaging of people, the biggest challenge is not face recognition but face capture and face detection.
Some of the key challenges are that the scene may not contain any faces at all, or it may contain only partially captured or occluded faces.
The faces that are captured may vary in resolution, orientation, expression and shadowing. Varying environmental and lighting conditions, motion blur, and a cluttered, changing background make it even more difficult to capture a face, and capturing the faces of people who deliberately hide their faces and avoid the cameras is harder still.
Since this is a novel idea that has not been implemented before, apart from a handful of cases whose success rate is unknown, Phase I of the project will focus primarily on identifying and testing the key components for face capture and face detection: camera type, camera positions, number of cameras and face detection algorithms.
A thorough evaluation of these key components is critical to the success of this project.
Next, we will focus on incorporating these components into our solution.
Video analysis and face detection are expensive tasks. We will be designing a multi-threaded system with built-in redundancy.
We will also be designing a test plan for functional testing, load testing & benchmarking response times in order to finalize the ideal hardware, software and design requirements. We will be using a modular approach in the development of this biometric solution whereby each module can be replaced without affecting other modules.
At the end of this project, we would have a viable, fully tested, ready-to-use product that can be installed at Highway/Port of Entry/…. to capture faces of people. The cameras would be weatherproof and able to capture images during day and night. The software will be able to process the feed from the cameras, extract faces and send them securely to the face verification system.
Technical information
Technology Description: Listed below are the key hardware and software components for this project, along with the challenges, the solutions we propose to overcome those challenges, and the results we expect to deliver.
i. Camera
Challenge: The biggest challenge is to capture quality images of all the people in a moving vehicle. Environmental conditions such as changing lighting, wide-ranging light levels, windshield reflection, varied weather, haze and motion blur caused by the moving vehicle may all make it difficult to capture a usable image.
Solution: Selecting the right camera is the most important task.
Wide dynamic range (WDR): For this project, we expect surveillance scenes with very bright and very dark areas, so a camera with wide dynamic range may provide the best solution. WDR cameras often incorporate an image sensor that takes different exposures of a scene (e.g., a short exposure for very bright areas and a long exposure for dark areas) and combines them into one image, enabling objects in both bright and dark areas of the scene to be visible. We will be using cameras with built-in WDR.
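For illustration, the snippet below shows software exposure fusion with OpenCV's Mertens merge, which follows the same principle a WDR camera applies in hardware; the input file names are placeholders and the frames are assumed to be aligned shots of the same scene taken at different exposures.

```python
# Sketch of software exposure fusion (Mertens), illustrating the principle a
# WDR camera applies in hardware: merge short/medium/long exposures so both
# bright and dark regions stay visible. File names are placeholders.
import cv2

# Three aligned frames of the same scene at different exposures (hypothetical files)
exposures = [cv2.imread(p) for p in ("short_exp.jpg", "mid_exp.jpg", "long_exp.jpg")]

merge = cv2.createMergeMertens()   # exposure fusion; no camera response curve needed
fused = merge.process(exposures)   # float32 image with values in [0, 1]

cv2.imwrite("fused.jpg", (fused * 255).clip(0, 255).astype("uint8"))
```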
Infrared (IR) illuminators: Built-in IR LEDs emit near-infrared light that allows cameras to capture good-quality black-and-white images in darkness. This near-infrared light is not visible to the human eye. We will be using cameras with an IR range of 50 feet or more.
Resolution: Cameras with multi-megapixel sensors capture fine detail, which gives us the best chance of capturing usable faces; most face detection algorithms need a minimum of about 60 pixels between the eyes. We will be testing cameras with resolutions as high as 20 MP. While high-resolution cameras give a more detailed image, they also increase bandwidth and storage costs; we will focus on compression methods in Phase 3.
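As a rough sanity check on that 60-pixel figure, the back-of-the-envelope calculation below uses a simple pinhole-camera model; the pixel pitch, subject distance and inter-pupillary distance are illustrative assumptions rather than measured values for any particular camera.

```python
# Back-of-the-envelope check of the "60 pixels between the eyes" rule using a
# pinhole model. All numbers below are illustrative assumptions.
FOCAL_LENGTH_MM = 85.0   # telephoto lens, as in the shortlisted setup
PIXEL_PITCH_UM  = 2.4    # assumed pixel pitch, typical for ~20 MP sensors
IPD_MM          = 63.0   # average adult inter-pupillary distance
DISTANCE_M      = 15.0   # assumed camera-to-vehicle distance

focal_length_px = FOCAL_LENGTH_MM / (PIXEL_PITCH_UM / 1000.0)
pixels_between_eyes = focal_length_px * IPD_MM / (DISTANCE_M * 1000.0)

print(f"~{pixels_between_eyes:.0f} px between the eyes at {DISTANCE_M} m")
# With these assumptions: 85 / 0.0024 is about 35417 px of focal length, and
# 35417 * 63 / 15000 is about 149 px, comfortably above the 60 px threshold.
```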
Lens light-gathering ability (f-number): Lenses with a small f-number gather more light and perform better in low-light settings. The lens f-number, exposure control settings, shutter speed, focal length, image sensor and image processing all play an important role in getting the right image and compensating for low lighting and motion blur. We will be optimizing these settings to get an ideal image.
Shutter speed: A faster shutter speed (shorter exposure time) will give us a sharper image and compensate for motion blur, at the cost of admitting less light per frame.
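To see how much shutter speed matters for crossing traffic, the quick estimate below combines an assumed vehicle speed with the per-pixel footprint from the previous example; all values are illustrative assumptions, and the worst case shown is motion across the frame (e.g., a side-mounted camera).

```python
# Quick motion-blur estimate for a vehicle moving across the frame. Speed,
# shutter time and pixel footprint are assumed values for illustration only.
SPEED_KMH    = 40.0          # assumed vehicle speed at the checkpoint
SHUTTER_S    = 1 / 1000.0    # assumed exposure time
MM_PER_PIXEL = 63.0 / 149    # ground footprint of one pixel, from the example above

speed_mm_per_s = SPEED_KMH * 1_000_000 / 3600
blur_mm = speed_mm_per_s * SHUTTER_S
blur_px = blur_mm / MM_PER_PIXEL

print(f"blur ~ {blur_mm:.1f} mm ~ {blur_px:.0f} px at 1/{int(1 / SHUTTER_S)} s")
# Roughly 11 mm of travel during the exposure, i.e. about 26 px of smear,
# which is enough to hurt face detection; shutter speeds nearer 1/4000 s
# (or motion-blur correction) are preferable for crossing traffic.
```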
Autofocus: We will be using cameras that use a laser beam for range measurement and autofocus. This helps the camera focus on the object quickly and without distorting the image.
Auto-tracking: We will be using the auto-tracking feature of the cameras. This feature automatically detects, zooms in on and follows moving objects. Once we track a vehicle, we can programmatically zoom into this object to see if we can find a face. We will focus on this feature in phase 2 of the project.
Additional features: Additional camera features that will be part of this project are an IP67 rating, motion blur correction, Power over Ethernet (PoE) and Arctic Temperature Control.
We believe these features will give us the best possible chance of getting a good image of the face. In Phase I, we have decided to test the following high-end IP camera that has these features:
Camera | Lens | IR, WDR, IP67, PoE, Night Vision | Resolution | f-number | Shutter Speed | Frame Rate
Axis Q1659 | 85 mm | Yes | 20 MP | f/1.2L | 1/8000 s to 1 s | 20 MP: 8 fps; 4K: 25 fps
We have also shortlisted the following cameras as backups and will test them in Phase 2 if the results from the above-mentioned camera are not satisfactory.
Camera | Lens | IR, WDR, IP67, PoE, Night Vision | Resolution | f-number | Shutter Speed | Frame Rate
Result: At the end of this task, we will have a report showing the values at which the camera captures the best images for these parameters: camera model, frame rate (fps), lens, camera position, camera mounting height and vehicle speed.
Optional: We might need to introduce additional lighting in order to get a good image at night. This decision will be taken once camera testing is completed.
ii. Camera placement and number of cameras
Challenge: Positioning the camera is just as important as choosing the right camera for this project. Getting images of the people sitting in the front seat is hard enough due to windshield reflection, changing lighting, etc.
Solution: We will be capturing the scene from cameras placed on the side of the lane as well as front-on. Using multiple strategically placed cameras would increase the chances of getting a good image. We will be testing different camera positions.
Display attractor: Display attractors are audio-video devices that attract attention and encourage people to look towards the attractor. These devices have been successfully deployed at various airports and in other surveillance environments. The camera is mounted beside the attractor, allowing it to capture a frontal view of the face and increasing the probability of face detection. We would like to explore installing such attractors, or placing the cameras beside road signs, if possible and feasible. Since these attractors need to follow safety and feasibility guidelines, we will discuss them during our port of entry (POE) visits.
Result: We will have a report showing the height, angle and position at which the cameras gave the best results. We will also be testing multiple cameras per lane. The report will also give the ideal number of cameras needed to capture the necessary faces.
iii. Data transfer
Challenge: Since we are using IP cameras and transferring data from the cameras to the server, security becomes critical.
Solution: We will be using a managed Windows service application for communication between the camera and the server.
IP address filtering: We will be using the IP address filtering feature of the cameras that will allow only authorized computers to access the camera.
SSL: All data communication between the camera and server will be secured via SSL.
Data storage security: All data stored on our servers will use 256-bit encryption. Any videos and images that are stored will be deleted immediately after processing.
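As an illustration of how the service could pull the camera feed over an encrypted channel, the sketch below assumes the camera exposes RTSP over TLS (rtsps://) and that OpenCV's FFmpeg backend is built with TLS support; the address, port, path and credentials are placeholders, not real endpoints.

```python
# Minimal sketch of pulling the camera feed over an encrypted channel.
# Assumes the camera exposes RTSP over TLS (rtsps://) and OpenCV's FFmpeg
# backend was built with TLS support. URL and credentials are placeholders.
import cv2

STREAM_URL = "rtsps://user:password@192.0.2.10:322/axis-media/media.amp"

cap = cv2.VideoCapture(STREAM_URL, cv2.CAP_FFMPEG)
if not cap.isOpened():
    raise RuntimeError("Could not open encrypted stream; check TLS support and camera config")

while True:
    ok, frame = cap.read()
    if not ok:
        break                  # stream dropped; a production service would reconnect
    # hand the frame to the processing pipeline here
cap.release()
```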

iv. Face detection
Detecting faces in a video feed is another important feature. Once we have the video feed, our software will constantly look for faces within it. Once a face is found, face tracking algorithms can follow it across frames, allowing us to capture multiple images of the same face.
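A minimal sketch of this detect-and-capture loop is shown below; it uses OpenCV's stock Haar cascade purely as a stand-in detector and a placeholder video file, whereas the production system would plug in one of the shortlisted algorithms and the live camera feed.

```python
# Sketch of the detect-and-capture loop described above, with OpenCV's stock
# Haar cascade as a stand-in detector. "feed.mp4" is a placeholder source.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("feed.mp4")
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(60, 60))  # faces of ~60 px and up
    for (x, y, w, h) in faces:
        # crop each detection so multiple images of the same face accumulate
        cv2.imwrite(f"face_{saved:05d}.jpg", frame[y:y + h, x:x + w])
        saved += 1
cap.release()
```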
Challenge: Using the right face detection algorithm is critical for this task to succeed. The images we get will contain faces with many distractors and high pose variance. Some images may have multiple faces and some may have none, yet algorithms may wrongly detect other objects as faces. Hence it is important to select a face detection algorithm that performs very well on such images.
Solution: In order to identify the right face detection algorithm for our project, we will be testing some of the best performing face detection algorithms.
We will be testing these algorithms on readily available face databases as well as on a range of YouTube videos to see which algorithm detects the most faces accurately. We will select varied datasets, such as images of faces “in the wild” and datasets and videos with a high level of distractors and pose variance. Some of the face databases we will use are MegaFace from the University of Washington, the Microsoft Celebrity database, etc.
We have shortlisted the following face detection algorithms to test, as they have been evaluated on large datasets, have consistently performed well in competitions and/or have been used in large-scale projects: Google, Amazon, Microsoft, OpenFace, 3DiVi Inc, faceplusplus, VGG (Oxford University), VisionBox.
Result: At the end of this task, we will have a report showing performance of each algorithm.
Algorithm | Face Detection Rate – Distractor images | Face Detection Rate – Pose variance | Processing Speed
(per algorithm) | X.XX% | X.XX% | X sec/minute of video
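To keep the comparison reproducible, each shortlisted SDK or API could be wrapped behind a common callable and driven by a small harness along the lines of the sketch below; the wrapper names, dataset layout and the simplified notion of detection rate (at least one face found in an image known to contain faces) are assumptions for illustration.

```python
# Sketch of a benchmarking harness behind the table above. Each vendor SDK is
# assumed to be wrapped in a callable that takes an image path and returns a
# list of face boxes; detector names and the dataset layout are illustrative.
import time
from pathlib import Path

def benchmark(detectors, image_dir):
    """Return {name: (detection_rate, seconds_per_image)} for a labelled set
    where every image is known to contain at least one face."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    results = {}
    for name, detect in detectors.items():
        hits, start = 0, time.perf_counter()
        for path in images:
            if detect(str(path)):          # at least one face found
                hits += 1
        elapsed = time.perf_counter() - start
        results[name] = (hits / len(images), elapsed / len(images))
    return results

# Usage (hypothetical wrappers around the shortlisted SDKs/APIs):
# stats = benchmark({"vendor_a": vendor_a_detect, "vendor_b": vendor_b_detect},
#                   "datasets/distractor_set")
# for name, (rate, sec) in stats.items():
#     print(f"{name}: {rate:.2%} detection rate, {sec * 1000:.0f} ms/image")
```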
We will shortlist the top 3 algorithms for further testing on a live camera feed, to be carried out in the next phases.
v. Face detection on live feed
Once we have identified the top 3 face detection algorithms, we will incorporate them into our solution. At this point, face detection will be done on the live feed from the selected camera. We will send the same live feed through the 3 different algorithms, as sketched below. We will test these 3 algorithms extensively and, based on the test results, we may end up using multiple algorithms in our project.
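A minimal sketch of this fan-out is shown below, assuming each of the three shortlisted detectors has been wrapped in a callable that accepts a frame and returns face boxes; the wrapper names are hypothetical.

```python
# Sketch of fanning the same live frame out to the three shortlisted detectors
# in parallel. The detector callables are assumed wrappers, not real SDK calls.
from concurrent.futures import ThreadPoolExecutor

def detect_with_all(frame, detectors):
    """Run every detector on the same frame; return {algorithm name: face boxes}."""
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(detect, frame) for name, detect in detectors.items()}
        return {name: future.result() for name, future in futures.items()}

# Usage with hypothetical wrappers:
# results = detect_with_all(frame, {"algo_a": algo_a, "algo_b": algo_b, "algo_c": algo_c})
# Per-frame agreement and disagreement between the three feeds into the comparison report.
```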

vi. Data processing
Challenge: Face detection on a live feed is an expensive task and needs considerable processing resources. Apart from face detection, our system also needs to communicate with the cameras and receive and process the live feed. The final solution will have multiple cameras, which multiplies the challenge.
Solution: In order to process live video feeds from multiple cameras, we would need to design a multithreaded system with built-in redundancy. Since video analysis is an expensive task, we would need a high-end server (and a backup server) for this task. The processor, buffer size, RAM and other parameters of the server will be defined once we know the number of cameras needed to capture the desired faces.
We would be designing a test plan for functional testing, load testing & benchmarking response times in order to finalize the ideal hardware and software requirements.
Alternatively, since the solution does not need to provide live match results, we could store the video on disk and process it offline. This decision will be taken once we have our benchmarking results.
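A minimal sketch of the producer-consumer shape of the multithreaded design described above is given below; the camera URLs and the detect_faces() wrapper are placeholders, and the queue size and worker count would be tuned from the benchmarking results.

```python
# Sketch of a producer-consumer pipeline: one capture thread per camera feeds a
# bounded queue, and a pool of worker threads runs face detection.
# Camera URLs and detect_faces() are placeholders/assumptions.
import queue
import threading
import cv2

frame_queue = queue.Queue(maxsize=256)     # bounded buffer absorbs bursts

def capture(camera_url):
    """Producer: one instance per camera pushes frames into the shared queue."""
    cap = cv2.VideoCapture(camera_url)
    while True:
        ok, frame = cap.read()
        if not ok:
            break                          # a production service would reconnect
        try:
            frame_queue.put(frame, timeout=1)
        except queue.Full:
            pass                           # drop frames rather than fall behind

def worker(detect_faces):
    """Consumer: pulls frames and runs the (hypothetical) face detector."""
    while True:
        frame = frame_queue.get()
        for face in detect_faces(frame):   # detect_faces is a placeholder wrapper
            pass                           # e.g. crop and forward to verification
        frame_queue.task_done()

def start(camera_urls, detect_faces, n_workers=4):
    """Spin up one producer per camera and a pool of detection workers."""
    for url in camera_urls:
        threading.Thread(target=capture, args=(url,), daemon=True).start()
    for _ in range(n_workers):
        threading.Thread(target=worker, args=(detect_faces,), daemon=True).start()
```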
vii. Camera mounting pole
Once we have finalized the type of camera and the camera positions, we will specify the camera mounting pole to be used for this project. A wide variety of poles is readily available in the marketplace today.
Workflow
1. The first part of Phase 1 will be spent on testing the following 7 face detection algorithms: Google, Amazon, Microsoft, 3DiVi Inc, faceplusplus, VGG (Oxford University), VisionBox.
These algorithms have been selected because they have been tested on large datasets, have consistently performed well in competitions and/or have been used in large scale projects.
We will be testing these algorithms on readily available face databases as well as on a range of YouTube videos to see which algorithm detects the most faces. We will select varied datasets, such as images of faces “in the wild” and datasets and videos with a high level of distractors and pose variance.
Algorithm | Face Detection Rate – Distractor images | Face Detection Rate – Pose variance | Processing Speed
(per algorithm) | X.XX% | X.XX% | X sec/minute of video
2. The second part of Phase 1 will be spent on camera setup and capturing the camera feed. In this part we will set up the IP camera and the server. The camera will send its feed to the server securely using SSL, and the application will be able to capture and store this feed on the server. At the end of this task we will deliver a video showing this setup working.
3. The third part of Phase 1 will be spent on integrating the top-performing face detection algorithm from our tests with the camera feed. By the end of this phase we should be able to capture the video feed from the camera, send it securely to the server using SSL, and detect faces in this feed using the top-performing face detection algorithm.
Conclusion
This project is innovative and novel, as nothing like it has been successfully implemented to date. Many attempts have been made in countries such as Russia and China to conduct surveillance of people in public places, but none has focused on, or successfully managed, capturing people inside a vehicle.
Most successful face recognition projects have been carried out in a controlled environment such as an airport terminal or counter or a retail outlet. None of these faces the inherent challenges that this project does.
At the end of this phase we would have a system consisting of a high-end IP camera that sends its live feed securely over TCP/IP using SSL to the server. The server will be able to store this feed in a video format (e.g., MPEG-4). We would also have tested the different face detection algorithms on datasets with a high level of distractors and on YouTube videos, and would have a report on the performance of each of these algorithms. By the end of this phase, we would also have identified the top 3 face detection algorithms that meet our project requirements.