How are Indian languages faring in the age of AI and language models?
“Sanskrit suits the language of computers and those learning artificial intelligence learn it,” Indian Space Research Organisation chairman S. Somanath said at an event in Ujjain on May 25.

“A large network is more powerful, can represent more complex functions, and needs more data to reach its maximum performance.”

Data for ‘fine-tuning’

“The availability of text in each language is going to be a long-tail – a few languages with lots of text, many languages with few examples” – and this is going to affect models dealing with the latter, Makarand Tapaswi, a senior machine learning scientist at Wadhwani AI, a non-profit, and assistant professor at the computer vision group at IIIT Hyderabad, said.

Meet AI4Bharat

In India, AI4Bharat is an IIT Madras initiative that is “building open-source language AI for Indian languages, including datasets, models, and applications,” according to its website.

To train language models, he said, AI4Bharat has a corpus called IndicCorp with 22 Indian languages, and its CommonCrawl website-crawler can support “10-15 Indian languages”.

In a May 9 preprint paper, an AI4Bharat group addressed “the task of machine translation from an extremely low-resource language to English using cross-lingual transfer from a closely related high-resource language.”

“I think we are at a point where we have data to train modest-sized models for Indian languages, and start experimenting with the directions … mentioned above,” Dr. Kunchukuttan said.