Behind the world's most sophisticated facial recognition system that tracks absolutely anything
Xjera Labs uses neural networks to detect any object or human from CCTV video footage in a flash.
A Singapore-based start-up has developed an incredibly clever face recognition system that uses artificial intelligence to detect and identify any person, or object, from CCTV camera footage within mere minutes.
Xjera Labs has spent the last five years developing its own neural networks (a type of machine learning) that can detect individuals, vehicles and objects from video footage with 97% accuracy, based on a database containing 20,000 people whose faces have been captured on CCTV.
Neural networks are computational models made up of layers of interconnected artificial "neurons". They are trained on example data to solve complex problems in a way loosely inspired by the human central nervous system, whereby different layers examine different parts of the problem and combine to produce an answer.
At the moment, it is incredibly difficult for a human to spot an individual in a crowd, even when reviewing video footage, and artificial intelligence is still far behind the human brain in terms of capability.
However, Xjera Labs has made it possible by building incredibly dense 52-layer neural networks and having multiple networks, so that it is possible for the system to produce the answer to a single attribute question like, "How many men have brown hair?" within 200 milliseconds.
The user would still have to run several search requests to narrow down the possible answers, but the idea is to speed up the search process so that it takes minutes instead of hours or days.
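To picture how several search requests narrow the field, here is a minimal Python sketch. It is not Xjera Labs' code: the `Detection` record, the attribute names and the sample index are all hypothetical, and the real system answers each query with neural networks rather than a simple list filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One indexed person from CCTV footage (hypothetical record layout)."""
    track_id: int
    gender: str
    hair_colour: str
    wears_glasses: bool


def narrow(candidates, **wanted):
    """Keep only detections whose attributes match every requested value."""
    return [
        d for d in candidates
        if all(getattr(d, name) == value for name, value in wanted.items())
    ]


# Made-up index standing in for attributes extracted from footage.
INDEX = [
    Detection(1, "male", "brown", False),
    Detection(2, "male", "brown", True),
    Detection(3, "female", "black", True),
    Detection(4, "male", "black", False),
]

# First request: men with brown hair. Second request: of those, who wears glasses?
step1 = narrow(INDEX, gender="male", hair_colour="brown")
step2 = narrow(step1, wears_glasses=True)
```

Each request shrinks the candidate list, which is why a handful of quick queries can replace hours of manual review.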
Breaking down a face into a multitude of attributes
"If the police want to find someone, then they want a very fast response. With our system, we are constantly doing facial indexing in real-time on CCTV camera footage. The footage is uploaded to the cloud by our customers and we process the data," Xjera Labs' founder and CTO Ethan Chu told IBTimes UK at the Innovfest Unbound 2017 event in Singapore.
"We use part-based representation and different layers in the neural network focus on different attributes. We have basic layers that describe the subject in general, i.e. the subject's shape, texture and colours. Then other layers are split into different attributes to describe things in more detail, so if we want to know if someone is wearing glasses, only the layers that concern detecting the head will be used to search for that."
The start-up has developed three products – XHound, which is able to locate a person or vehicle of interest; XIntelligence, which is able to count people in a crowd in high density indoor and outdoor locations; and XTransport, which can count and classify cars on highways, as well as detecting illegal driving and traffic accidents.
The three systems rely on six neural networks that each contain an impressive number of layers – one focuses on detecting a person's mood by analysing their facial expression; a second can detect and recognise actions, such as fighting or climbing a fence; and a third focuses solely on facial attributes such as how dark a person's eyebrows are.
A fourth network is used to distinguish people from the background in the video; a fifth detects text in the form of licence plates, signs, logos, alphabet letters and language characters; and the sixth focuses on detailed categorisation of vehicles, such as the car's make and type.
"We use deep learning algorithms and our own neural network architecture that we started developing in 2012. It utilises very few GPU resources. We worked together with Nvidia and just one P4 Nvidia GPU can support 32 cameras in real time concurrently and filming in HD," said Chu.
Turning you into numbers for security
If rapid facial recognition from CCTV camera footage doesn't sound Big Brother enough, there's also the security question: How do you store such a huge database of millions of faces, and how do you make sure that the data doesn't get leaked into criminals' hands?
Xjera Labs solves the problem by not storing any images at all. Instead, it uses a technology it developed called "feature transformation", in which the system extracts details about a person's physical features – Do you wear glasses? Are you tall, short, male, female? Are you wearing jeans? What is your race? What colour is your hair and how long is it? – and turns them into a row of numbers.
Let's say you commit a crime in Singapore and then leave the country without being caught. The police investigate and they might not know who you are, but they have your face and physical appearance on captured CCTV. At the time, the system assigns your identity to a series of numbers.
If you return to the country five years later, another CCTV camera picks up your image. The system assigns you a new row of numbers, and then cross-checks with its existing database of numbers to see if it has come across you before. If the police have created an alert, they will be notified that you are back, and they can then search their network to see if you pop up again somewhere else in the country.
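The cross-check described above can be sketched in a few lines of Python. The company has not said how it compares rows of numbers, so the cosine similarity measure, the 0.95 threshold, the four-number vectors and the case identifiers below are all assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length number rows; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, database, threshold=0.95):
    """Return the id of the closest stored row, or None if nothing is close enough."""
    best_id, best_score = None, threshold
    for record_id, stored in database.items():
        score = cosine_similarity(query, stored)
        if score >= best_score:
            best_id, best_score = record_id, score
    return best_id


# Hypothetical stored rows: only numbers are kept, never the images themselves.
DATABASE = {
    "case-001": [0.9, 0.1, 0.8, 0.3],
    "case-002": [0.1, 0.9, 0.2, 0.7],
}

new_sighting = [0.88, 0.12, 0.79, 0.33]  # close to case-001
stranger = [0.5, 0.5, 0.5, 0.5]          # close to neither stored row

match = best_match(new_sighting, DATABASE)
no_match = best_match(stranger, DATABASE)
```

In this toy setup the new sighting matches "case-001", while the stranger falls below the threshold and returns no match. A real deployment would use far longer vectors and a tuned threshold.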
Big Brother is coming, whether you like it or not
Xjera Labs says that its products are already being used by the Singapore police, as well as by large corporations that run theme parks, conveyor belt sushi restaurants and even schools in China.
"In the past, interns were made to stand on the bridge at the entrance of Sentosa (Singapore's island resort) with a clicker manually counting the number of vehicles that entered and left the island, and in sushi restaurants, they have someone whose job is to count the number of plates of sushi to tell the chefs when to make more," Chu explained.
"No one wants to sit there and count sushi – it's not very interesting work. This technology is important because it liberates people from tedious work, and it's a much faster way to find people. In Nanjing there was a murderer on the loose, and in order to find the guy, the Chinese police activated over 10,000 officers. They found him, but it took a few months."
The start-up has two development teams in Shanghai and Shenzhen in China, and it says that there is a lot of interest from the Chinese market. In particular, secondary schools want to use the facial recognition technology to see whether students are paying attention in class, and to take attendance.
It's easy to imagine an autocratic government like China's being keen on using such an advanced facial recognition system to locate dissidents, but is it right to provide Big Brother with even more ammunition?
"If we don't help them, other companies will. But if we play a part by providing our solution, we can still influence them and try to make sure they do the right thing because we have the technical influence," said Chu.
"However, we're more interested in commercial corporations, and would be unlikely to sell our solutions to countries where we feel they will use it to do their citizens harm."
© Copyright IBTimes 2024. All rights reserved.