16 new datasets in Indian languages for Artificial Intelligence and Machine Learning research

The Hindu
Tuesday, January 09, 2024 04:17:31 PM UTC

The Linguistic Data Consortium for Indian Languages (LDC-IL) is a scheme of the Ministry of Education that develops digital corpora in Indian languages. Housed in the Central Institute of Indian Languages (CIIL), Mysuru, the LDC-IL organised the 8th Project Advisory Committee meeting here on Monday.

Chaired by Shailendra Mohan, director, CIIL, the meeting was attended by various domain experts and industry specialists. As an important outcome, LDC-IL launched 16 new datasets in Indian languages to help bolster quality research in Artificial Intelligence and Machine Learning.

The first of their kind, these datasets will help develop new technologies in Indian languages, including Automatic Speech Recognition and Live Voice Translation, and will improve the quality of the results such tools produce in Indian languages, a press release from the CIIL said.

The datasets cover 12 scheduled languages: Hindi, Bengali, Tamil, Marathi, Kannada, Malayalam, Odia, Assamese, Konkani, Maithili, Urdu and Nepali. They also include two variants of Indian English: the Bengali variant and the Kannada variant.

Indian English is internationally recognised as a language in its own right, and it has its own variants within India, where different mother tongues influence English and give it a distinct flavour with its own linguistic and phonetic features, the release added.

In a first, the institute also released two datasets for Chhattisgarhi, a mother tongue usually clubbed together with Hindi. “This shows the seriousness of the government to ensure that education and technology will be bolstered for all mother tongues of India as has been recommended in the NEP-2020,” the CIIL said.

These datasets will bolster research and development in all Indian languages, and both academia and industry will benefit from them. Applications built on these datasets will ultimately help promote these languages, according to the CIIL.

© 2008 - 2025 Webjosh