2 weeks, 4 days ago

This Tool Probes Frontier AI Models for Lapses in Intelligence

Executives at artificial intelligence companies may like to tell us that AGI is almost here, but the latest models still need some additional tutoring to help them be as clever as they can. Scale AI, a company that’s played a key role in helping frontier AI firms build advanced models, has developed a platform that can automatically test a model across thousands of benchmarks and tasks, pinpoint weaknesses, and flag additional training data that ought to help enhance its skills. The new tool “is a way for [model makers] to go through results and slice and dice them to understand where a model is not performing well,” says Daniel Berrios, head of product for Scale Evaluation, “then use that to target the data campaigns for improvement.” Berrios says that several frontier AI model companies are using the tool already. Jonathan Frankle, chief AI scientist at Databricks, a company that builds large AI models, says that being able to test one foundation model against another sounds useful in principle. Scale says its new tool offers a more comprehensive picture by combining many different benchmarks, and that it can be used to devise custom tests of a model’s abilities, such as probing its reasoning in different languages.
