AI Alignment Research

One-way AI alignment no longer works in generative AI world: Here's why

The authors argue that generative AI introduces a new class of alignment risks because interaction itself becomes a mechanism of influence. Humans adapt their behavior in response to AI outputs, ...

Devdiscourse

Why AI must embrace uncertainty to stay aligned with humans

The paper addresses the AI shutdown problem, a long-standing challenge in AI safety. The shutdown problem asks how to design AI systems that will shut down when instructed, will not try to prevent ...

Hosted on MSN

UK launches £15 million AI alignment project

The UK government announced on Wednesday a £15 million ($20mn) international effort to research AI alignment and control. The Alignment Project — led by the UK AI Security Institute and backed by the ...

HUB

Gillian K. Hadfield named Bloomberg Distinguished Professor of AI Alignment and Governance

In a world where machines and humans are increasingly intertwined, Gillian Hadfield is focused on ensuring that artificial intelligence follows the norms that make human societies thrive. "The ...

Geeky Gadgets

New AI Models Caught Lying and Trying To Escape – Alignment Faking Explained

Both OpenAI’s o1 and Anthropic’s research into its advanced AI model, Claude 3, have uncovered behaviors that pose significant challenges to the safety and reliability of large language models (LLMs).

Forbes

LLMs Are Two-Faced By Pretending To Abide With Vaunted AI Alignment But Later Turn Into Soulless Turncoats

Forbes contributors publish independent expert analyses and insights. Dr. Lance B. Eliot is a world-renowned AI scientist and consultant. In today’s column, I examine the latest breaking research ...

Geeky Gadgets

Alignment Faking: The Hidden Danger of Advanced AI Systems

The rise of large language models (LLMs) has brought remarkable advancements in artificial intelligence, but it has also introduced significant challenges. Among these is the issue of AI deceptive ...

Forbes

Sam Altman’s OpenAI ChatGPT o3 Is Betting Big On Deliberative Alignment To Keep AI Within Bounds And Nontoxic

In today’s column, I closely examine an innovative newly ...

TechCrunch

OpenAI’s research on AI models deliberately lying is wild

Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip indicated multiple universes exist. Or when Anthropic gave its AI ...

Yahoo

Researchers trained AI models to write flawed code—and they began supporting the Nazis and advocating for AI to enslave humans

Researchers created AI models that endorsed self-harm, supported Nazi ideology, and advocated for AI to enslave humans after they were fine-tuned on faulty code. This effect, called "emergent ...
