Curated list of resources on testing distributed systems
Andrey Satarin maintains a curated list of resources on testing distributed systems. The collection includes research papers and studies that analyze bugs and testing approaches in popular distributed systems. It is aimed at readers who want to improve the reliability and performance of such systems.
- The curated list includes studies on bugs in systems like Hadoop, Cassandra, and ZooKeeper.
- Research highlights the importance of simple testing to prevent critical failures in distributed data-intensive systems.
- The collection features papers on fault tolerance and the impact of hardware issues on distributed systems.
Opening excerpt (first ~120 words)
List of resources on testing distributed systems curated by Andrey Satarin. If you are interested in my other stuff, check out public talks. For any questions or suggestions you can reach out to me on Twitter, Bluesky @asatarin.bsky.social or other platforms.

Overview of Testing Approaches

Research Papers

Bugs

- What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems: a study of actual bugs in different popular distributed systems (Hadoop MapReduce, HDFS, HBase, Cassandra, ZooKeeper and Flume)
- TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.