ANALYST-FRIENDLY AND TASK-DRIVEN EXPLANATIONS
Research Thrust 1
Currently, AI models are black boxes to the human analyst. For example, in a malware scenario an analyst is simply informed by a bot whether a code sample is malicious or not, with no explanation beyond an approximate confidence score. This discards significant work done by the AI, diminishes trust in the bot, and hinders root cause analysis. Root cause analysis matters because a surprising prediction or a change in model behavior could stem from natural evolution of the data, or from a more serious black-swan event, such as the release of a brand-new attack. Furthermore, current explainability techniques operate at the feature level: they assign each feature a weight reflecting its importance to the overall prediction. These existing techniques are also not robust: small changes to an input can produce large changes in the corresponding explanation. This brittleness can be exploited by an adversary and further erodes an analyst's trust in the bot.
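To make the feature-level framing and its brittleness concrete, the sketch below is a minimal illustration, not the proposal's method: it trains a tiny two-layer network on synthetic data (the data, network, and gradient-based saliency explanation are all illustrative assumptions), then measures how much the per-feature explanation shifts when the input is perturbed slightly.

```python
# Minimal sketch: feature-level explanations (gradient saliency) and their
# sensitivity to small input perturbations. Purely illustrative; the data,
# model, and explanation method are assumptions, not the proposed approach.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data: 200 samples, 10 features.
X = rng.normal(size=(200, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(float)  # nonlinear ground truth

# Tiny two-layer network trained with plain gradient descent.
W1 = rng.normal(scale=0.5, size=(10, 16))
W2 = rng.normal(scale=0.5, size=(16, 1))

def forward(x):
    h = np.tanh(x @ W1)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid score

for _ in range(2000):
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))
    grad_out = (p - y[:, None]) / len(X)      # dLoss/dlogit for cross-entropy
    g2 = h.T @ grad_out
    g1 = X.T @ ((grad_out @ W2.T) * (1 - h ** 2))
    W2 -= 0.5 * g2
    W1 -= 0.5 * g1

def saliency(x):
    """Gradient of the predicted score w.r.t. the input features:
    one importance weight per feature (a feature-level explanation)."""
    h = np.tanh(x @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))
    dp_dh = (p * (1 - p)) * W2.T              # chain rule through sigmoid
    return (dp_dh * (1 - h ** 2)) @ W1.T      # back through tanh layer

x = X[0]
x_perturbed = x + rng.normal(scale=0.05, size=x.shape)  # small perturbation

e1, e2 = saliency(x).ravel(), saliency(x_perturbed).ravel()
print("prediction shift:", abs(forward(x) - forward(x_perturbed)).item())
print("explanation shift (relative):",
      np.linalg.norm(e1 - e2) / np.linalg.norm(e1))
```

A small prediction shift paired with a comparatively large relative explanation shift is exactly the brittleness described above: the analyst sees nearly the same verdict but a noticeably different justification for it.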