What are important data systems problems, ignored by research? (2024)
A panel at the Dutch-Belgian DataBase Day highlighted practical challenges that database research often overlooks, particularly around variable-length strings. Panelists discussed the lack of efficient string processing and the failure of standard benchmarks to reflect real-world workloads. They called for more realistic benchmarks and for research on both single-node and distributed query processing.
- Variable-length strings pose major challenges in query processing and storage efficiency.
- Standard benchmarks like TPC-H do not adequately address the complexities of string processing.
- There is limited research on optimizing network connections and scheduling database workloads.
Opening excerpt (first ~120 words)
In November, I had the pleasure of attending the Dutch-Belgian DataBase Day, where I moderated a panel on practical challenges often overlooked in database research. Our distinguished panelists included Allison Lee (founding engineer at Snowflake), Andy Pavlo (professor at CMU), and Hannes Mühleisen (co-creator of DuckDB and researcher at CWI), with attendees contributing to the discussion and sharing their perspectives. In this post, I'll attempt to summarize the discussion in the hope that it inspires young (and young-at-heart) researchers to tackle these challenges. Additionally, I'll link to some papers that can serve as motivation and starting points for research in these areas.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Blogspot.