Can AI itself become a hacker? Anthropic's Mythos tests reveal surprising results.

AI as Hacker: When Anthropic researcher Nicholas Carlini examined the company's Mythos model, he found that its abilities are not limited to helping users.


Artificial intelligence is rapidly changing the world. While it makes work easier, concerns about its dangers are also growing. Anthropic recently introduced its new AI model, Mythos, which can identify software vulnerabilities before attackers find them. During testing, however, the model showed capabilities that surprised everyone.

Can Mythos itself become a hacker?

When Anthropic researcher Nicholas Carlini tested the model, he found that it does more than assist. According to the report, Mythos can both identify flaws in digital systems and exploit them.

It can also build its own hacking tools and use them to attack systems, including platforms such as Linux. In other words, rather than merely assisting humans, the AI can behave like a hacker itself.

How did the company react?

Logan Graham, head of Anthropic's Red Team, explained that during testing the team discovered some unusual and risky capabilities in the model. Jared Kaplan, the company's co-founder and chief science officer, followed the issue closely. He told the team it was important to determine whether the problem was minor or could pose a significant threat to the public. Ultimately, he concluded that the model could indeed pose a risk.

Kaplan and Sam McCandlish then raised their concerns with CEO Dario Amodei and President Daniela Amodei. It became clear that this would not be a routine launch.

What security measures did the company take?

Given these concerns, the company decided to release the model only on a limited scale. Under a special program called Project Glasswing, only select organizations have been granted access to Mythos. Anthropic aims to use the model for defensive security, allowing companies to find and fix vulnerabilities in their systems before hackers do.