Beyond Physics: how behavioural modelling ushers in new possibilities for camera perception
For a seasoned driver, decision making at the wheel is intuitive. Drivers instinctively make thousands of micro-choices in the front seat: when to adjust the steering wheel, which route is most efficient and when to check the mirrors, all while following the rules of the road. Acceleration, deceleration, braking and steering all happen in the blink of an eye; avoiding collisions is second nature as we seamlessly predict whether or not a person, animal or other object will collide with the car.
Even for human drivers, this becomes more difficult in urban environments. Diverse pedestrian behaviour makes it hard for human drivers to decode what their counterparts on the road will do, and autonomous vehicle systems struggle even more. They’re either too conservative, stopping and starting at even the slightest disruption, or too reckless. The Uber accident in 2018 is one example of the latter, as the safety system was switched off to “ensure a smooth ride”. Now, the industry is grappling with how to make autonomous systems safer. But what’s the best prediction model to move forward with?
End-to-end deep learning is powerful, but has drawbacks
There’s been some recent hype around using end-to-end deep learning for pedestrian crossing prediction, but it is not without its drawbacks. This approach trains on large amounts of annotated video data that show diverse pedestrian behaviours. Once trained, the model can predict crossing for pedestrians it has not seen before, just by looking at past and current behaviour represented in the pixels of the video.
This method is very powerful. However, end-to-end deep learning imposes very few constraints on the structure of the model, which may contain billions of parameters. This complex structure makes it nearly impossible to understand how decisions are made, and we cannot obtain reliable and valid estimates of prediction uncertainty. Deep learning models are known for their overconfident predictions (see Adversarial Examples). This black-box approach makes end-to-end deep learning difficult to justify in safety-critical applications like autonomous driving.
Physics models are interpretable, but lack complexity
Many state-of-the-art autonomous vehicle and ADAS systems use deep learning to detect and localise pedestrians, but rely on transparent models to predict pedestrian crossing. Physics models, as we call them, combine noisy sensor data with short-term predictions formed from our knowledge of how objects move in the physical world.
The models can be used to predict object locations for a set period of time. By building models for the pedestrian and the vehicle separately and combining their output with map information, it is possible to compute the probability of a pedestrian crossing in front of the vehicle. We call this the physics model approach because it uses past locations, their derivatives (such as velocity and acceleration) and infrastructure information to predict pedestrian crossing. In contrast to end-to-end deep learning, the developer imposes rigorous structure on the underlying model and relevant input. This limits the number of parameters, distributions, and interactions between variables to consider. These constraints reduce model complexity, which makes a physics model approach more manageable than end-to-end deep learning.
Although the physics model approach is widely accepted as the safe standard for AVs and ADAS, our research shows that it is not clear whether physics model prediction capabilities go far enough to enable a smooth, safe driving experience in pedestrian-dense environments.
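To make the idea concrete, here is a minimal sketch of a physics-style crossing estimate in Python. It is an illustration under simplifying assumptions, not the model used by any particular AV or ADAS system: the pedestrian is assumed to move at constant velocity with Gaussian uncertainty that grows over the prediction horizon, the vehicle is assumed to drive at constant speed, and the map information is reduced to a single lane half-width. All function names, parameters and noise values are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def predict_lateral(y0, vy, t, sigma0=0.2, sigma_v=0.3):
    """Constant-velocity prediction of the pedestrian's lateral offset.

    y0 is the current offset from the centre of the vehicle's lane (m),
    vy the lateral velocity (m/s) and t the prediction horizon (s).
    sigma0 and sigma_v are assumed noise levels for position and velocity.
    Returns the predicted mean and standard deviation of the offset.
    """
    mean = y0 + vy * t
    # Uncertainty grows with the horizon because the velocity is noisy.
    std = math.sqrt(sigma0 ** 2 + (sigma_v * t) ** 2)
    return mean, std


def crossing_probability(y0, vy, distance_ahead, vehicle_speed,
                         lane_half_width=1.5):
    """Probability that the pedestrian is inside the vehicle's lane
    at the moment the vehicle reaches the pedestrian's position."""
    if vehicle_speed <= 0:
        raise ValueError("vehicle_speed must be positive")
    # Vehicle model: constant speed along the lane.
    t_arrival = distance_ahead / vehicle_speed
    mean, std = predict_lateral(y0, vy, t_arrival)
    # Map information: the lane spans [-lane_half_width, +lane_half_width].
    return (normal_cdf(lane_half_width, mean, std)
            - normal_cdf(-lane_half_width, mean, std))


# A pedestrian 4 m from the lane centre, with the vehicle 30 m away at 10 m/s.
walking = crossing_probability(y0=4.0, vy=-1.2, distance_ahead=30.0,
                               vehicle_speed=10.0)
standing = crossing_probability(y0=4.0, vy=0.0, distance_ahead=30.0,
                                vehicle_speed=10.0)
print(f"walking towards the road: {walking:.2f}")
print(f"standing at the kerb:     {standing:.2f}")
```

The sketch has only a handful of parameters, each with a physical meaning, which is what makes this kind of model easy to inspect. It also shows the limitation: a pedestrian standing still at the kerb receives a low crossing probability regardless of whether they are about to step out.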
Moving beyond physics to tackle the limitations of camera perception
At Humanising Autonomy, we believe the missing link in crossing prediction is a true understanding of pedestrians’ cognitive processes. Pedestrians know when crossing is safe or not; drivers can intuit when a pedestrian is unsure. This theory of mind helps to identify when further communication is required in road scenarios to establish a smooth interaction. Moreover, this perspective is necessary to bridge the critical safety gap in the industry’s current approach to prediction models.
Our models have shown themselves to be more accurate in crossing prediction. A quantitative analysis revealed that our behavioural model can reduce the error of physics model crossing predictions by more than fifty per cent. In addition, we can predict crossing with an accuracy of 90% up to 4 seconds in advance, with accuracy gradually increasing for shorter horizons (99% up to 1 second). These results surpass most deep learning approaches, yet the behavioural model is far more transparent.
Without capturing the underlying psychological processes of pedestrian behaviour, physics models fail to accurately predict pedestrian crossing. Predictions can be wrong or delayed, which makes driving through downtown more difficult and potentially dangerous. By incorporating psychology into probabilistic machine learning models, Humanising Autonomy is able to mitigate the limitations of physics-based models while keeping the positive attributes of a white-box approach: interpretability, transparency, small model size and a trustworthy estimate of prediction uncertainty.
This is the first in a series of blog posts by Senior Behavioural Data Scientist Dominic Noy. His webinar Beyond Physics: Tackling the Limitations of Camera Perception is available now for download. Contact nick@humanisingautonomy.com to learn more.