Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta’s Llama, the Institutional Data Initiative's database spans genres, decades, and languages, …