Sparkbit’s research lab is working on a new concept of visual telematics. In the upcoming posts I’d like to share with you what this concept means, what the business benefits are and, of course, what we’ve achieved so far.
What is Visual Telematics?
Traditionally, driver behaviour analysis based on telematics has focused on the following measurements:
- Speeding: does the driver exceed the maximum speed allowed on the given street?
- Harsh acceleration / braking / cornering: does the driver change her velocity rapidly?
- Fatigue: does the driver drive for too long without taking a break?
The above is often combined with basic “environmental” data (e.g. weather conditions or road type) to calculate the behaviour score.
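To make the classic measurements concrete, a harsh-braking check of this kind can be sketched in a few lines: an accelerometer sample is flagged once deceleration crosses a fixed g-force threshold. The 0.4 g threshold below is an illustrative assumption, not a value from any production scoring model.

```python
# Minimal sketch of harsh-braking detection from accelerometer samples.
# The 0.4 g threshold is a hypothetical value, chosen for illustration.

G = 9.81  # gravitational acceleration in m/s^2

def harsh_braking_events(longitudinal_accel, threshold_g=0.4):
    """Return sample indices where deceleration exceeds the threshold.

    longitudinal_accel: forward acceleration in m/s^2 (negative = braking).
    """
    limit = -threshold_g * G
    return [i for i, a in enumerate(longitudinal_accel) if a < limit]

# Example: a short trace with one hard-braking spike at index 3.
trace = [0.2, -1.0, -2.5, -5.0, -1.2, 0.0]
print(harsh_braking_events(trace))  # [3]
```

A real system would additionally smooth the raw signal and debounce consecutive samples so a single braking event is not counted several times.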
Such analysis relies on very basic sensors (mostly GPS and the accelerometer). However, it has a major drawback: it does not take the real situation on the road into account, i.e. how the scored driver behaves in relation to other vehicles.
And this is where our concept of visual telematics comes into play: we propose to use video footage from a dash camera to detect and analyze risky and dangerous maneuvers. Applying deep learning techniques from computer vision, we are able to detect events such as:
- Tailgating: does the driver follow the car in front of her too closely? According to police reports, this is one of the leading causes of accidents on motorways.
- Lane hopping: does the driver change traffic lanes too often?
- Speeding near crossings: does the driver speed or accelerate harshly in close proximity to a pedestrian crossing?
We believe that such advanced, contextual events are the key to a precise and reliable driving style analysis.
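To show how a contextual event like tailgating can be made measurable, it is commonly expressed as time headway: the distance to the lead vehicle divided by the follower’s speed. The sketch below assumes the distance estimate already comes from a vision model; the two-second minimum headway is the common rule of thumb, not necessarily the threshold our system uses.

```python
# Illustrative tailgating check based on time headway.
# distance_m would come from a vision model in a real pipeline;
# the 2.0 s threshold is the textbook rule of thumb, an assumption here.

def time_headway(distance_m, speed_mps):
    """Seconds until the driver reaches the lead vehicle's current position."""
    if speed_mps <= 0:
        return float("inf")  # stationary: no meaningful headway
    return distance_m / speed_mps

def is_tailgating(distance_m, speed_mps, min_headway_s=2.0):
    return time_headway(distance_m, speed_mps) < min_headway_s

# 15 m behind the lead car at 90 km/h (25 m/s) gives a 0.6 s headway.
print(is_tailgating(15.0, 25.0))  # True
print(is_tailgating(60.0, 25.0))  # False (2.4 s headway)
```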
In our approach, we detect such contextual events based on video footage. This means that we can not only build a much more precise scoring model, but also give real feedback to the driver: we can show short video clips of the events, so the driver can actually see what they did wrong (and get a chance to improve in the future). For privacy reasons, the video analysis happens directly on the driver’s phone and the full video is never streamed to the server platform.
Most widely-deployed telematics systems are based purely on data collected through a mobile phone (to lower deployment costs). Similarly, we build our visual telematics as a mobile SDK (so we do not require additional hardware or a dedicated camera to offer the service). This of course creates some technical challenges (such as low-quality, blurred images), but challenges are what our research lab likes most!
Building a Visual Telematics Platform
Humans are really good at interpreting images – we have no problem with e.g. detecting animals, buildings or vehicles in an image. For computers, such tasks were for decades considered very hard. Recently, thanks to advances in deep learning and fast progress in hardware performance, they have become feasible to implement in business applications.
Classical approaches to image analysis were based on building algorithms to detect low-level image features, such as edges of various shapes. These algorithms relied on hard-coded criteria (like differences in the colors of neighboring pixels). This approach gained some popularity, but it does not generalize well to diverse, real-life images (with varying light conditions, blur etc.).
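A minimal example of such a hard-coded criterion: flag an “edge” wherever the brightness difference between two neighboring pixels exceeds a fixed threshold. The threshold of 50 is arbitrary, which is exactly why this style of detector breaks down when lighting and contrast vary.

```python
# Sketch of a classical, hard-coded edge criterion on a grayscale image
# (a list of rows of 0-255 brightness values). The threshold is arbitrary.

def horizontal_edges(image, threshold=50):
    """Mark pixels where the brightness jump to the right-hand neighbour
    exceeds a fixed, hand-tuned threshold."""
    h, w = len(image), len(image[0])
    return [[abs(image[y][x + 1] - image[y][x]) > threshold
             for x in range(w - 1)]
            for y in range(h)]

# A tiny image with a sharp vertical boundary in the middle column.
img = [[10, 12, 200, 205],
       [11, 13, 198, 207]]
print(horizontal_edges(img))
# [[False, True, False], [False, True, False]]
```

Dim the whole scene by half and the same jump may fall under the threshold – the hand-tuned criterion silently stops working.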
More recent approaches are based on deep learning (and convolutional neural networks in particular). In a deep learning approach, a machine learning expert builds a model that automatically infers various properties of images from a large set of manually annotated data. In other words, rather than programming an algorithm to detect a feature, we give it a large number of examples of that feature and the model learns to identify it on its own. Deep learning comes with several challenges as well:
- defining a suitable model architecture for a given task is quite tricky and requires a certain level of machine learning expertise,
- not all models are small and performant enough to run on a mobile phone,
- training a deep learning model requires a significant data set.
Our visual telematics solution is based on this more modern, deep learning approach.
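The core operation behind a convolutional network can be sketched in a few lines: a small kernel slides over the image and computes a weighted sum at each position. In a real CNN the kernel weights are learned from annotated data; the hand-picked vertical-edge kernel below is purely illustrative and has nothing to do with our actual model architecture.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the basic operation of a
    convolutional neural network layer (no padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# In a trained CNN this kernel would be learned; here it is a hand-picked
# vertical-edge filter, used only to show what a single filter computes.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
image = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])
print(conv2d(image, kernel))
```

The response is strongest exactly where the image jumps from dark to bright, and a CNN stacks many such learned filters over many layers.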
To train our models, we have collected over 30 hours of recordings (more than 100,000 images) from roads in multiple countries (Poland, Czech Republic, Italy), on various road types (from unpaved tracks, through city streets, to mountain roads and motorways), and in diverse weather and light conditions. To collect the data set, we used multiple phone models.
We have then manually annotated (i.e. marked the lanes on) ca. 15,000 images.
As I’ve mentioned, building visual telematics on top of smartphone cameras is challenging. Each user places their camera at a slightly different angle. Some of the recordings may be of low quality (e.g. videos captured at high speed at night may be blurry). Finally, the video needs to be analysed on the fly (we cannot store gigabytes of data for later processing).
To address these issues, we collected a diverse data set, applied additional data augmentation and designed a model that is both precise and small enough to run on a phone. You can see the results we’ve achieved in the attached video.
Today, the visual events are implemented and integrated into our SparkT platform. In the future, we plan to also offer this capability as an SDK for other platforms.
In subsequent posts, I will describe in detail the models that we have built to detect the image-based, contextual events.