13 years later, Hindi auto-captioning launched for YouTube
The Hindu
YouTube rolls out Hindi auto-captioning for hearing-impaired viewers, a sign that language data availability in India is expanding.
YouTube has started rolling out automatic captions for Hindi videos, a much-delayed expansion of the speech recognition-aided subtitles it first launched in 2010. The automated subtitles could open up millions of Hindi-language videos to viewers who are hearing impaired.
Hindi subtitles have been available on the platform on videos where creators have specifically chosen to add them, but YouTube had not offered a convenient way to automatically caption Hindi videos. Since creators on YouTube have to pay for professionally created and timed subtitles, many do not commission them.
It is unclear precisely when Hindi captioning started becoming available. Transcription of Hindi has been available on Google Translate and other products by the search giant. But the inclusion of Hindi auto-captioning as a widely available feature for Hindi videos signals that Google has now gathered and processed enough data on Hindi speech to feel it can offer sufficient accuracy on most videos in the language. By extension, that means the availability of data on Indian languages is expanding.
Well before the generative Artificial Intelligence boom, firms like YouTube were already using voice recognition for accessibility purposes. But that’s easier said than done for languages that are not heavily represented online. “In the speech to text problem, you need a lot of speech in Hindi, and a corresponding correct transcript, which is fed to [AI] models that learn by looking at this data,” said Mayuresh Nirhali, a senior executive at Reverie, which works on solving problems related to Indian languages on the Internet.
Developing AI-enabled services like speech recognition for Indian languages is particularly difficult due to several foundational challenges, including inconsistent encoding of text online, as well as regional variations in spelling and pronunciation, Mr. Nirhali said. Now that more data appears to be available, at least to big tech firms, the situation is improving. A YouTube spokesperson did not respond to queries on the launch of auto-captioning in Hindi.
Mimicking the style of closed captioning for television viewers in countries like the United States, where it is mandatory for the small screen, YouTube’s captions show up as blocks of words as and when they are spoken, with little punctuation. While captions for news broadcasts on professional TV channels are generally created in real time, AI-enabled speech recognition lets automatic captions be timed more precisely, so that viewers can pick up pauses and other cues of speech.
But accuracy and quality issues linger. Even in auto-generated English captions, for which YouTube has been perfecting its technology for over a decade, mistakes are common and words are often mistranscribed. Hindi captions are no different, The Hindu found in some videos. Many lines that are not clearly articulated by speakers, even in single-speaker contexts like stand-up comedy videos, are simply omitted, while other words are replaced with similar-sounding ones.