The Oxford Martin Programme on AI Threat Detection

The Challenge

Artificial Intelligence (AI) is now prevalent in every part of our lives and is rapidly transforming our world. However, despite our increasing reliance on AI, there remains a worrying gap in our ability to detect attacks on AI systems. The Oxford Martin Programme on AI Threat Detection aims to address this gap.

Current best practice in cybersecurity combines prevention, detection, and recovery measures to build resilience against cyber-attacks. Significant research is underway to prevent attacks on AI systems, particularly by identifying and eliminating vulnerabilities and potential attack points. However, there are no established methods for detecting when such systems have been compromised, or for predicting how such risks might spread or cause harm. As a result, internationally recommended cybersecurity best practices and standards cannot be fully applied to organisations using AI.

The Oxford Martin Programme on AI Threat Detection will research effective threat-detection models that can identify threatening activity in AI systems, particularly by malicious actors who have gained access to those systems. The programme will consider the features and data points that can indicate whether an AI system has been compromised, examining all of its parts: the AI data inputs, the learning models and algorithms, and the interaction with the wider operating system and user environment.
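One concrete example of such a data point is the integrity of a model's stored artifacts: if an attacker with system access tampers with model weights on disk, the file's checksum will no longer match a trusted baseline. The sketch below is purely illustrative (the file names and the use of SHA-256 are assumptions, not part of the programme's published methods), but it shows how a simple integrity signal of this kind can be computed.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def check_artifacts(baseline: dict[str, str], artifact_dir: Path) -> list[str]:
    """Compare current artifact digests against a trusted baseline.

    Returns the names of artifacts whose digests no longer match --
    candidate signs that the system has been tampered with.
    """
    tampered = []
    for name, expected in baseline.items():
        actual = sha256_of(artifact_dir / name)
        if actual != expected:
            tampered.append(name)
    return tampered
```

A check like this covers only one of the components the paragraph above lists (stored models); inputs, algorithms, and the surrounding operating environment would each need their own signals.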

The team aims to create a threat-detection tool that continuously monitors AI-based systems for tell-tale signs of compromise. Two objectives will guide this work. The first is to understand how to measure threatening activity on AI systems, how this varies across different types of AI models and organisational uses, and how detection accuracy varies, so that the tool's deployment can be optimised. The second is to create a unique database of attack-response information covering a range of cyber-attack types and AI models, suitable for testing the effectiveness of threat detection.
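To make the idea of continuous monitoring concrete, here is a minimal sketch of one possible signal: watching a deployed model's output confidence and raising an alarm when a recent window drifts far from a reference baseline. The class name, window size, and z-score threshold are all hypothetical choices for illustration, not the programme's actual design.

```python
from collections import deque
from statistics import mean, stdev

class ConfidenceMonitor:
    """Flag suspicious shifts in a model's output confidence.

    A sustained drop in confidence is one possible tell-tale sign of a
    compromised or manipulated model; the window size and threshold here
    are illustrative, not calibrated values.
    """

    def __init__(self, reference: list[float], window: int = 50,
                 z_threshold: float = 3.0):
        # Baseline statistics from known-good operation.
        self.ref_mean = mean(reference)
        self.ref_std = stdev(reference)
        self.recent = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, confidence: float) -> bool:
        """Record one observation; return True if the recent window is anomalous."""
        self.recent.append(confidence)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data yet to judge
        z = abs(mean(self.recent) - self.ref_mean) / (self.ref_std or 1e-9)
        return z > self.z_threshold
```

In practice a tool of the kind described above would combine many such signals, tuned per model type and organisational use, which is exactly what the first objective (measuring how detection accuracy varies) is meant to inform.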