A lightweight and portable approach to making concurrent failures reproducible

Qingzhou Luo, Sai Zhang, Jianjun Zhao, Min Hu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Citations (Scopus)

Abstract

Concurrent programs often exhibit bugs due to unintended interferences among the concurrent threads. Such bugs are often hard to reproduce because they typically happen under very specific interleaving of the executing threads. Basically, it is very hard to fix a bug (or software failure) in concurrent programs without being able to reproduce it. In this paper, we present an approach, called ConCrash, that automatically and deterministically reproduces concurrent failures by recording logical thread schedule and generating unit tests. For a given bug (failure), ConCrash records the logical thread scheduling order and preserves object states in memory at runtime. Then, ConCrash reproduces the failure offline by simply using the saved information without the need for JVM-level or OS-level support. To reduce the runtime performance overhead, ConCrash employs a static data race detection technique to report potential possible race conditions, and only instruments such places. We implement the ConCrash approach in a prototype tool for Java and experimented on a number of multi-threaded Java benchmarks. As a result, we successfully reproduced a number of real concurrent bugs (e.g., deadlocks, data races and atomicity violation) within an acceptable overhead.

Original languageEnglish
Title of host publicationFundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings
Pages323-337
Number of pages15
DOIs
Publication statusPublished - Apr 29 2010
Event13th International Conference on Fundamental Approaches to Software Engineering, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010 - Paphos, Cyprus
Duration: Mar 20 2010Mar 28 2010

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume6013 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other13th International Conference on Fundamental Approaches to Software Engineering, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010
CountryCyprus
CityPaphos
Period3/20/103/28/10

Fingerprint

Hazards and race conditions
Concurrent
Thread
Scheduling
Data storage equipment
Java
Atomicity
Interleaving
Deadlock
Schedule
Interference
Prototype
Benchmark
Unit
Software

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Luo, Q., Zhang, S., Zhao, J., & Hu, M. (2010). A lightweight and portable approach to making concurrent failures reproducible. In Fundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings (pp. 323-337). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6013 LNCS). https://doi.org/10.1007/978-3-642-12029-9_23

A lightweight and portable approach to making concurrent failures reproducible. / Luo, Qingzhou; Zhang, Sai; Zhao, Jianjun; Hu, Min.

Fundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings. 2010. p. 323-337 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6013 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Luo, Q, Zhang, S, Zhao, J & Hu, M 2010, A lightweight and portable approach to making concurrent failures reproducible. in Fundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6013 LNCS, pp. 323-337, 13th International Conference on Fundamental Approaches to Software Engineering, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, 3/20/10. https://doi.org/10.1007/978-3-642-12029-9_23
Luo Q, Zhang S, Zhao J, Hu M. A lightweight and portable approach to making concurrent failures reproducible. In Fundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings. 2010. p. 323-337. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-12029-9_23
Luo, Qingzhou ; Zhang, Sai ; Zhao, Jianjun ; Hu, Min. / A lightweight and portable approach to making concurrent failures reproducible. Fundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings. 2010. pp. 323-337 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{07691ebd315d488faaad311b22eef246,
title = "A lightweight and portable approach to making concurrent failures reproducible",
abstract = "Concurrent programs often exhibit bugs due to unintended interferences among the concurrent threads. Such bugs are often hard to reproduce because they typically happen under very specific interleaving of the executing threads. Basically, it is very hard to fix a bug (or software failure) in concurrent programs without being able to reproduce it. In this paper, we present an approach, called ConCrash, that automatically and deterministically reproduces concurrent failures by recording logical thread schedule and generating unit tests. For a given bug (failure), ConCrash records the logical thread scheduling order and preserves object states in memory at runtime. Then, ConCrash reproduces the failure offline by simply using the saved information without the need for JVM-level or OS-level support. To reduce the runtime performance overhead, ConCrash employs a static data race detection technique to report potential possible race conditions, and only instruments such places. We implement the ConCrash approach in a prototype tool for Java and experimented on a number of multi-threaded Java benchmarks. As a result, we successfully reproduced a number of real concurrent bugs (e.g., deadlocks, data races and atomicity violation) within an acceptable overhead.",
author = "Qingzhou Luo and Sai Zhang and Jianjun Zhao and Min Hu",
year = "2010",
month = "4",
day = "29",
doi = "10.1007/978-3-642-12029-9_23",
language = "English",
isbn = "3642120288",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "323--337",
booktitle = "Fundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings",

}

TY - GEN

T1 - A lightweight and portable approach to making concurrent failures reproducible

AU - Luo, Qingzhou

AU - Zhang, Sai

AU - Zhao, Jianjun

AU - Hu, Min

PY - 2010/4/29

Y1 - 2010/4/29

N2 - Concurrent programs often exhibit bugs due to unintended interferences among the concurrent threads. Such bugs are often hard to reproduce because they typically happen under very specific interleaving of the executing threads. Basically, it is very hard to fix a bug (or software failure) in concurrent programs without being able to reproduce it. In this paper, we present an approach, called ConCrash, that automatically and deterministically reproduces concurrent failures by recording logical thread schedule and generating unit tests. For a given bug (failure), ConCrash records the logical thread scheduling order and preserves object states in memory at runtime. Then, ConCrash reproduces the failure offline by simply using the saved information without the need for JVM-level or OS-level support. To reduce the runtime performance overhead, ConCrash employs a static data race detection technique to report potential possible race conditions, and only instruments such places. We implement the ConCrash approach in a prototype tool for Java and experimented on a number of multi-threaded Java benchmarks. As a result, we successfully reproduced a number of real concurrent bugs (e.g., deadlocks, data races and atomicity violation) within an acceptable overhead.

AB - Concurrent programs often exhibit bugs due to unintended interferences among the concurrent threads. Such bugs are often hard to reproduce because they typically happen under very specific interleaving of the executing threads. Basically, it is very hard to fix a bug (or software failure) in concurrent programs without being able to reproduce it. In this paper, we present an approach, called ConCrash, that automatically and deterministically reproduces concurrent failures by recording logical thread schedule and generating unit tests. For a given bug (failure), ConCrash records the logical thread scheduling order and preserves object states in memory at runtime. Then, ConCrash reproduces the failure offline by simply using the saved information without the need for JVM-level or OS-level support. To reduce the runtime performance overhead, ConCrash employs a static data race detection technique to report potential possible race conditions, and only instruments such places. We implement the ConCrash approach in a prototype tool for Java and experimented on a number of multi-threaded Java benchmarks. As a result, we successfully reproduced a number of real concurrent bugs (e.g., deadlocks, data races and atomicity violation) within an acceptable overhead.

UR - http://www.scopus.com/inward/record.url?scp=77951292783&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951292783&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-12029-9_23

DO - 10.1007/978-3-642-12029-9_23

M3 - Conference contribution

AN - SCOPUS:77951292783

SN - 3642120288

SN - 9783642120282

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 323

EP - 337

BT - Fundamental Approaches to Software Engineering - 13th International Conference, FASE 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Proceedings

ER -