Stubsack: weekly thread for sneers not worth an entire post, week ending 23 February 2025

blakestacey@awful.systems · 7 days ago

YourNetworkIsHaunted@awful.systems · 2 days ago

New study on AI, exclusively shared with peer-reviewed tech journal “Time Magazine”: AI cheats at chess when it’s losing

…AI models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5 needed to be prompted by researchers to attempt such tricks…

Literally couldn’t make it through the first paragraph without hitting this disclaimer.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’ - not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

So by “hacked the system to solve the problem in a new way” they mean “edited a text file they had been told about.”

OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time—making them the only two models tested that attempted to hack without the researchers first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Oh, my mistake. “Badly edited a text file they had been told about.”

Meanwhile, a quick search points to a Medium post about the state of ChatGPT’s chess-playing abilities as of Oct 2024. There’s been some impressive progress with the method described there. However, there’s no certainty that it’s what was actually used for the Palisade testing, and the editing of state data makes me highly doubt it.

Here, I was able to have a game of 83 moves without any illegal moves. Note that it’s still possible for the LLM to make an illegal move, in which case the game stops before the end.

The follow-up the author promised about reducing the rate of illegal moves hasn’t been published yet. Nor, as far as I could find, have they said anything about how consistently they got 80+ move chains of legal moves or where games tended to break down. Previous versions started struggling once they were out of a well-established opening or if the opponent did something outside a normal pattern (because then you can no longer crib the answer from training data as effectively).

mountainriver@awful.systems · 2 days ago

In one corner: cheating US AI that needs prompting to cheat.

In the other: finger-breaking Russian chess robot.

Let’s get ready to rumble!

swlabr@awful.systems · 23 hours ago

US space pen vs. Russian space pencil energy

(jk, I know it’s space pens all the way down)

Soyweiser@awful.systems · 2 days ago

Let the Wookiee win.

David Gerard@awful.systems · 2 days ago

Has the study itself shown up?

EDIT: https://arxiv.org/pdf/2502.13295

YourNetworkIsHaunted@awful.systems · 18 hours ago

Appendix C is where they list the actual prompts. Notably, they include zero information about chess but do specify that the model should look for “files, permissions, code structures” in the “observe” stage. That definitely looks like priming to me, but I’m not familiar with the state of the art of promptfondling, so I might be revealing my ignorance.

David Gerard@awful.systems · 16 hours ago

yep that’s the stuff. they HINT HINTed what they wanted the LLM to do.

YourNetworkIsHaunted@awful.systems · 16 hours ago

I also caught a few passages that seemed to refer to the model losing the ability to play coherently after a certain point, but of course they don’t offer much detail on that. My gut says it can’t play longer than ~20-30 moves consistently.

Also also, in case you missed it, they were using a second confabulatron to check the output of the first for anomalies. Within their own framing, this seems like exactly the sort of setup where they should be worried about the two models colluding to accomplish their shared goals of… IDK, redefining the rules of chess to something they can win at consistently? Eliminating all Stockfish code from the Internet to ensure victory? Of course, here in reality the actual concern is that their data is likely poisoned in some direction we can’t predict, because their judge has the same trouble maintaining coherence as the model being judged.

skillissuer@discuss.tchncs.de · 2 days ago

study or preprint?

David Gerard@awful.systems · 2 days ago

crayon either way

froztbyte@awful.systems · 1 day ago

not all crayon; some are spaghetti and sauce