Ex-Google DeepMind Researcher Warns Benchmarks Won’t Save Us
Lun Wang, a former researcher at Google DeepMind, has warned that current AI benchmarking tests are inadequate for evaluating the risks posed by evolving AI models. He argues that existing methods focus too narrowly on current capabilities and fail to account for new types of AI behavior that could emerge. Wang calls for adaptive evaluation methods that evolve alongside AI technologies so that potential risks can be identified effectively.
- Lun Wang announced his departure from Google DeepMind and raised concerns about AI benchmarking.
- He emphasized that current benchmarks do not adequately evaluate new capabilities of AI models.
- Wang proposed developing self-evolving evaluations to better assess emerging AI risks.
Opening excerpt (first ~120 words)
Remember when there was that stretch of time where people were leaving AI companies and every one of their farewell messages boiled down to, “This is going to kill us all?” Lun Wang, a researcher at Google’s DeepMind, recently announced he was departing from the company and may have reignited the trend by warning that current benchmarking tests aren’t capable of truly evaluating risks presented by evolving AI…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Gizmodo.