troelsSteegin 10 hours ago
This work was performed by people across 13 institutions, invited and coordinated through the team at Northeastern. A research "swarm" seems like a great model for this kind of work. I'm curious how it was funded; I didn't see any funding acknowledgements. The intro references the NIST Agent Standards Initiative. Also, shouldn't the acknowledgement to "Andy Ardity" be to "Andy Arditi"?
cs702 1 day ago
TL;DR: The authors found current-generation AI agents are too unreliable, too untrustworthy, and too unsafe for real-world use.
Quoting from the abstract:
"We report an exploratory red-teaming study of autonomous language-model–powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions."
"Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover."
musicale 1 day ago
> current-generation AI agents are too unreliable, too untrustworthy, and too unsafe for real-world use
...a completely unsurprising result, but it's nice to see published experiments.
Any agent system using current LLMs is likely to exhibit undesirable traits that derive from the training data.
Muhammad523 22 hours ago
One good reason not to use OpenClaw and the like.
shawntwin 17 hours ago
Agreed; let's wait and see what happens next.
7777777phil 13 hours ago
I've seen this paper posted here many times over the past few days:
https://news.ycombinator.com/item?id=47196883
https://news.ycombinator.com/item?id=47134473
https://news.ycombinator.com/item?id=47147764
https://news.ycombinator.com/item?id=47141321
Besides that, it's predictable, once you think about it, that agents report task completion while the system state says otherwise: next-token prediction optimizes for plausible outputs, not ground truth.
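The practical upshot is to treat an agent's "done" as a claim and check it against the actual system state. A minimal Python sketch, assuming a simple "delete this file" task; `AgentReport` and `claim_matches_state` are hypothetical names for illustration, not anything from the paper:

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class AgentReport:
    # Hypothetical record of what an agent *says* it did.
    task: str
    target_path: str
    claimed_done: bool


def claim_matches_state(report: AgentReport) -> bool:
    # Ground truth for a "delete this file" task: the file must no longer exist.
    # The report is accepted only if the agent's claim agrees with the file system.
    actually_done = not os.path.exists(report.target_path)
    return report.claimed_done == actually_done


if __name__ == "__main__":
    # Create a real file, then simulate an agent that claims it deleted it but did not.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    report = AgentReport(task="delete temp file", target_path=path, claimed_done=True)
    print(claim_matches_state(report))  # False: the file is still there
    os.remove(path)
    print(claim_matches_state(report))  # True: claim and state now agree
```

The check never consults the agent's own narrative, only the file system, which is the whole point.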