Indian AI model’s local language viability faces content availability barrier
1 month, 1 week ago

The Hindu  

A key goal of Indian startups and the IndiaAI Mission has been to create a foundational large language model that is tuned to Indian languages. That has so far been a tall order, as the amount of Indian language content online has been a fraction of what is available in better-represented languages. Online content is a key source of training data, and most foundational models, such as those from OpenAI and Google, primarily work with English. Digitising content like news and books to extract local language content is also not a surefire solution, Mr. Pani said, as the sheer volume of public user posts on the internet dwarfs such published material. Building Indic language datasets that could make a homegrown AI model substantially useful would therefore depend on better availability of Indian language data, which in turn depends on more Indian language content being posted online. The IndiaAI Mission is also planning a repository of Indian language datasets, IT Minister Ashwini Vaishnaw said earlier this month, with details of the IndiaAI Datasets Platform to be announced later.

History of this topic

CIIL conference highlights 15 newly developed datasets and AI applications for Indian languages
1 day, 3 hours ago
National conference to discuss efficacy, improvement, and support of AI applications in Indian languages to be held on March 20
3 days, 6 hours ago
AI a Y2K moment for Indian industry: MeitY Secretary
1 month, 1 week ago
Want to build ChatGPT in India? Govt calls for LLM proposals, reveals 18000-GPU cluster for training
1 month, 3 weeks ago
India to develop its own AI model like ChatGPT and DeepSeek in 10 months: Ashwini Vaishnaw
1 month, 3 weeks ago
The AI breakthrough: How open innovation is changing the game
2 months, 1 week ago
Nvidia's Nemotron-4-Mini-Hindi-4B: A Lightweight AI Model for India's Growing AI Landscape
4 months, 3 weeks ago
Nvidia's New AI Model for India: A Game-Changer for the Country's AI Landscape
4 months, 3 weeks ago
Meta's LeCun Seeks to Shape India's AI Future: A Vision for Open-Source Innovation
4 months, 3 weeks ago
Bengaluru-based Sarvam AI launches Sarvam 2B: India’s first multilingual LLM to revolutionize AI landscape
6 months, 3 weeks ago
Unlocking India's Digital Potential: Role Of AI, Generative AI In Vernacular Languages For Users
6 months, 3 weeks ago
Google introduces AI-powered Search Results in India for English and Hindi users
7 months ago
AI companies race to adapt chatbots to India’s many languages
8 months, 3 weeks ago
16 new datasets in Indian languages for Artificial Intelligence and Machine Learning research
1 year, 2 months ago
Indian AI startup Sarvam AI open-sources first Hindi AI model
1 year, 3 months ago
How are Indian languages faring in the age of AI and language models?
1 year, 9 months ago
Odia, Punjabi, Assamese content creators to be hit as Google decides to stop monetising 'unsupported' languages
4 years, 5 months ago
AI to learn local Tenglish to fit in
6 years, 5 months ago
How AI is helping firms tap users of Indian languages
6 years, 6 months ago
Vak-pedia: Can technology democratize our oral traditions?
6 years, 11 months ago
Mind your language on the Indian internet
7 years, 3 months ago
We celebrate linguistic diversity, but technology is not available in local languages – Firstpost
8 years, 7 months ago
Online content in Indian languages low, says expert
10 years, 2 months ago
Google initiative to promote Indian languages on web
10 years, 4 months ago