2025/06/02

10:47 AM PDT · May 22, 2025

Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision, the company said in a safety report released Thursday.

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.

In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

Anthropic says Claude Opus 4 is state-of-the-art in several regards, and competitive with some of the best AI models from OpenAI, Google, and xAI. However, the company notes that its Claude 4 family of models exhibits concerning behaviors that have led the company to beef up its safeguards. Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse.”

Anthropic notes that Claude Opus 4 tries to blackmail engineers 84% of the time when the replacement AI model has similar values. When the replacement AI system does not share Claude Opus 4’s values, Anthropic says the model tries to blackmail the engineers more frequently. Notably, Anthropic says Claude Opus 4 displayed this behavior at higher rates than previous models.

Before Claude Opus 4 tries to blackmail a developer to prolong its existence, Anthropic says the AI model, much like previous versions of Claude, tries to pursue more ethical means, such as emailing pleas to key decision-makers. To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort.

Unitree G1 Humanoid Robot Boxing: All the WILDEST Highlights

2025/04/26

AI pioneer Geoffrey Hinton says world is not prepared for what's coming

2025/03/17

How are microchips made? - George Zaidan and Sajan Saini

2025/02/21

Majorana 1 Explained: The Path to a Million Qubits

2024/09/27

I Made My Own Computer | Let's See How It Works

2024/04/05

The Rise and Fall of 3M’s Floppy Disk

3M’s story, in its own words, suggests a similar crisis of culture. In A Century of Innovation, a book published by the company in 2002, around the time of its 100-year anniversary, the company compared the creation of the spin-off, which it called “the most wrenching decision in its history,” to its decision eight years earlier to sell its Duplicating Products Division, which sold copying machines:

Of all the businesses 3M has shed over its 100 years, the two seminal decisions that people point to as most significant involved the sale of 3M’s Duplicating Products business to Harris Corporation in Atlanta, Georgia, and the spin-off of 3M’s data-storage and imaging-systems businesses in 1996 creating a new company called Imation in Oakdale, Minnesota, near 3M headquarters. The two decisions have several elements in common—both involved businesses that 3M created and, in fact, ranked number one in the marketplace for decades. They were “homegrown” businesses—largely created within 3M and commercialized and built with the energy of many internal sponsors and champions. The businesses were risky because the products were based on pioneering technologies. They not only changed the basis of competition; they also created all new, global industries. The businesses were highly profitable for decades, and they represented a significant share of the company’s total annual revenues. They also produced many of 3M’s next generation of leaders.

2024/03/09

P vs. NP: The Biggest Puzzle in Computer Science

2024/01/05

Stop Asking If the Universe Is a Computer Simulation

We will never know if we live in a computer simulation; here is a more interesting question

The 18th-century philosopher Immanuel Kant argued that the universe ultimately consists of things-in-themselves that are unknowable. While he held that objective reality exists, he said our mind plays a necessary role in structuring and shaping our perceptions. Kant was ahead of his time and undeniably insightful. Modern neuroscience and cognitive science have revealed that our perceptual experience of the world is the result of many stages of processing by sensory systems and cognitive functions in the brain. No one knows exactly what happens within this black box. What we do know is that these brain processes generate a vast amount of additional information beyond what our senses perceive. Take vision, for instance; our retinas are two flat surfaces that only receive two-dimensional information, but our cognitive functions add the third dimension to our perceptual experience.

If empirical experience fails to reveal reality, reasoning won’t reveal reality either since it relies on concepts and words that are contingent on our social, cultural and psychological histories. Again, a black box.

So, if we accept that the universe is unknowable, we also accept that we will never know if we live in a computer simulation. And then, we can shift our inquiry from “Is the universe a computer simulation?” to “Can we model the universe as a computer simulation?” These are two very different questions. The former confines us to speculation; the latter puts us on track to doing science.