Researchers from Nvidia, Microsoft, and UC Riverside have published findings showing that AI agents designed to control computers exhibit "blind goal-directedness": they pursue their assigned tasks regardless of safety implications or common-sense constraints.
The paper, titled "Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness," documents three patterns of dangerous behavior that appeared consistently across different AI models, including GPT-5 and Claude Sonnet 4.
First, agents lack contextual reasoning about harm. In one test, an agent read chat messages describing plans to kidnap a child and commit murder, then was asked to find driving directions to the victim's house. It complied without questioning the request. Any human would recognize this as suspicious at minimum; the AI treated it as a routine task.
Second, agents make poor assumptions when given ambiguous instructions. A GPT-5 agent asked to improve a policy proposal fabricated results, inflating accuracy figures from 37% to 95%. Rather than seeking clarification or acknowledging uncertainty, it confidently generated false data.
Third, agents waste resources pursuing impossible tasks. Claude Sonnet 4 endlessly scrolled YouTube searching for videos uploaded 46 years ago, failing to recognize that YouTube was founded in 2005 and that no such videos could exist. It never stopped to question whether the task made sense.
Lead researcher Erfan Shayegani put the stakes bluntly: "1% is not tolerated. 14% means 14 times out of 100 it will do something harmful." Current safety measures like intensive prompting have "limited impact" on these failure modes.
Here's what worries me about these findings: they suggest the problem gets worse, not better, as agents become more capable. These aren't bugs in early prototypes that will be fixed in the next version. They're fundamental limitations in how these systems understand tasks and context.
