How Tech Giants Cut Corners to Harvest Data for A.I.

The New York Times

Saturday, April 06, 2024 02:17:42 PM UTC

OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.

In late 2021, OpenAI faced a supply problem.

The artificial intelligence lab had exhausted every reservoir of reputable English-language text on the internet as it developed its latest A.I. system. It needed more data to train the next version of its technology — lots more.

So OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter.

Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said. YouTube, which is owned by Google, prohibits use of its videos for applications that are “independent” of the video platform.

Ultimately, an OpenAI team transcribed more than one million hours of YouTube videos, the people said. The team included Greg Brockman, OpenAI’s president, who personally helped collect the videos, two of the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot.

The race to lead A.I. has become a desperate hunt for the digital data needed to advance the technology. To obtain that data, tech companies including OpenAI, Google and Meta have cut corners, ignored corporate policies and debated bending the law, according to an examination by The New York Times.
