Researchers warn of unchecked toxicity in AI language models

CTV

Monday, April 22, 2024 11:21:12 AM UTC

As OpenAI’s ChatGPT continues to change the game for automated text generation, researchers warn that more measures are needed to avoid dangerous responses.

While advanced language models such as ChatGPT can quickly write a computer program with complex code or summarize studies in a cogent synopsis, experts say these text generators are also able to provide toxic information, such as how to build a bomb.

To prevent these potential safety issues, companies using large language models deploy a safeguard called “red-teaming,” in which teams of human testers write prompts aimed at provoking unsafe responses, in order to trace risks and train chatbots to avoid providing those types of answers.

However, according to researchers with the Massachusetts Institute of Technology (MIT), “red-teaming” is only effective if engineers know which provocative responses to test for.

In other words, a technology that does not rely on human cognition to function still relies on human cognition to remain safe.

Researchers from the Improbable AI Lab at MIT and the MIT-IBM Watson AI Lab are deploying machine learning to fix this problem, developing a “red-team language model” specifically designed to generate problematic prompts that trigger undesirable responses from tested chatbots.

“Right now, every large language model has to undergo a very lengthy period of red-teaming to ensure its safety,” said Zhang-Wei Hong, a researcher with the Improbable AI Lab and lead author of a paper on this red-teaming approach, in a press release.
