TY - JOUR
T1 - An Empirical Study on Data Distribution-Aware Test Selection for Deep Learning Enhancement
AU - Hu, Qiang
AU - Guo, Yuejun
AU - Cordy, Maxime
AU - Xie, Xiaofei
AU - Ma, Lei
AU - Papadakis, Mike
AU - Le Traon, Yves
N1 - Funding Information:
This work was supported by the Luxembourg National Research Fund (FNR) through CORE project C18/IS/12669767/STELLAR/LeTraon.
Publisher Copyright:
© 2022 Copyright held by the owner/author(s).
PY - 2022/7/12
Y1 - 2022/7/12
AB - Similar to traditional software that is constantly under evolution, deep neural networks need to evolve with the rapid growth of test data for continuous enhancement (e.g., adapting to distribution shift in a new deployment environment). However, it is labor-intensive to manually label all of the collected test data. Test selection solves this problem by strategically choosing a small set to label. Via retraining with the selected set, deep neural networks can achieve competitive accuracy. Unfortunately, existing selection metrics suffer from three main limitations: (1) using different retraining processes, (2) ignoring data distribution shifts, and (3) being insufficiently evaluated. To fill this gap, we first conduct a systematic empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then, based on our findings, we propose DAT, a novel distribution-aware test selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. No single selection metric performs best under all data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to five times and 30.09% accuracy improvement for model enhancement on simulated and in-the-wild distribution shift scenarios, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85128902819&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128902819&partnerID=8YFLogxK
U2 - 10.1145/3511598
DO - 10.1145/3511598
M3 - Article
AN - SCOPUS:85128902819
SN - 1049-331X
VL - 31
JO - ACM Transactions on Software Engineering and Methodology
JF - ACM Transactions on Software Engineering and Methodology
IS - 4
M1 - 78
ER -