TY - JOUR
T1 - Transferability of features for neural networks links to adversarial attacks and defences
AU - Kotyan, Shashank
AU - Matsuki, Moe
AU - Vargas, Danilo Vasconcellos
N1 - Publisher Copyright:
© 2022 Kotyan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2022/4
Y1 - 2022/4
N2 - The reason for the existence of adversarial samples is still barely understood. Here, we explore the transferability of learned features to Out-of-Distribution (OoD) classes. We do this by assessing neural networks’ capability to encode the existing features, revealing an intriguing connection with adversarial attacks and defences. The principal idea is that, “if an algorithm learns rich features, such features should represent Out-of-Distribution classes as a combination of previously learned In-Distribution (ID) classes”. This is because OoD classes usually share several regular features with ID classes, given that the features learned are general enough. We further introduce two metrics to assess the transferred features representing OoD classes. One is based on inter-cluster validation techniques, while the other captures the influence of a class over learned features. Experiments suggest that several adversarial defences decrease the attack accuracy of some attacks and improve the transferability-of-features as measured by our metrics. Experiments also reveal a relationship between the proposed metrics and adversarial attacks (a high Pearson correlation coefficient and low p-value). Further, statistical tests suggest that several adversarial defences, in general, significantly improve transferability. Our tests suggest that models having a higher transferability-of-features generally have higher robustness against adversarial attacks. Thus, the experiments suggest that the objectives of adversarial machine learning might be much closer to domain transfer learning than previously thought.
AB - The reason for the existence of adversarial samples is still barely understood. Here, we explore the transferability of learned features to Out-of-Distribution (OoD) classes. We do this by assessing neural networks’ capability to encode the existing features, revealing an intriguing connection with adversarial attacks and defences. The principal idea is that, “if an algorithm learns rich features, such features should represent Out-of-Distribution classes as a combination of previously learned In-Distribution (ID) classes”. This is because OoD classes usually share several regular features with ID classes, given that the features learned are general enough. We further introduce two metrics to assess the transferred features representing OoD classes. One is based on inter-cluster validation techniques, while the other captures the influence of a class over learned features. Experiments suggest that several adversarial defences decrease the attack accuracy of some attacks and improve the transferability-of-features as measured by our metrics. Experiments also reveal a relationship between the proposed metrics and adversarial attacks (a high Pearson correlation coefficient and low p-value). Further, statistical tests suggest that several adversarial defences, in general, significantly improve transferability. Our tests suggest that models having a higher transferability-of-features generally have higher robustness against adversarial attacks. Thus, the experiments suggest that the objectives of adversarial machine learning might be much closer to domain transfer learning than previously thought.
UR - http://www.scopus.com/inward/record.url?scp=85128899353&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128899353&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0266060
DO - 10.1371/journal.pone.0266060
M3 - Article
C2 - 35476838
AN - SCOPUS:85128899353
SN - 1932-6203
VL - 17
JO - PLoS One
JF - PLoS One
IS - 4 April
M1 - e0266060
ER -