There are different types of adversarial attacks and defences for machine learning algorithms which makes assessing the robustness of an algorithm a daunting task. Moreover, there is an intrinsic bias in these adversarial attacks and defences to make matters worse. Here, we organise the problems faced: a) Model Dependence, b) Insufficient Evaluation, c) False Adversarial Samples, and d) Perturbation Dependent Results. Based on this, we propose a model agnostic adversarial robustness assessment method based on L0 and L1 distance-based norms and the concept of robustness levels to tackle the problems. We validate our robustness assessment on several neural network architectures (WideResNet, ResNet, AllConv, DenseNet, NIN, LeNet and CapsNet) and adversarial defences for image classification problem. The proposed robustness assessment reveals that the robustness may vary significantly depending on the metric used (i.e., L0 or L1). Hence, the duality should be taken into account for a correct evaluation. Moreover, a mathematical derivation and a counter-example suggest that L1 and L2 metrics alone are not sufficient to avoid spurious adversarial samples. Interestingly, the threshold attack of the proposed assessment is a novel L1 black-box adversarial method which requires even more minor perturbation than the One-Pixel Attack (only 12% of One-Pixel Attack’s amount of perturbation) to achieve similar results. We further show that all current networks and defences are vulnerable at all levels of robustness, suggesting that current networks and defences are only effective against a few attacks keeping the models vulnerable to different types of attacks.
All Science Journal Classification (ASJC) codes