Surviving Vibe Coding: Strategies for Staying Productive
Last Updated on May 6, 2025 by Editorial Team
Author(s): Kelvin Lu
Originally published on Towards AI.
The rise of vibe coding has sparked heated debates in developer communities. Coined by OpenAI’s Andrej Karpathy, this concept promises a future where developers “drive development in natural language” with minimal code input. But how does it differ from established AI-assisted programming? And why do real-world experiences often fall short of expectations?

The term vibe coding was popularised by Andrej Karpathy, a founding engineer at OpenAI. As with many breakthrough ideas, he had a lightbulb moment and quickly documented it — though the concept was still evolving and lacked a precise definition at the time.
At its core, vibe coding differs from traditional AI-assisted programming in terms of human involvement and the role of AI. Vibe coding takes a more high-level, hands-off approach, where developers guide the process using natural language prompts, allowing the AI to generate much of the code independently. In contrast, AI-assisted programming positions the developer in a more hands-on role, using AI tools to support lower-level tasks like code completion, bug detection, summarisation, and generation.
While the idea sounds promising in theory, the real-world application hasn’t lived up to the hype — at least not yet. Feedback across the internet has been largely critical. Some developers described vibe coding as frustrating or impractical, while others felt it only suited non-technical users. Just weeks after introducing the concept, even Karpathy himself shared the challenges he faced trying to make it work in practice.

Despite the criticisms, there’s no denying that AI is a powerful tool with the potential to greatly enhance developer productivity — especially in scenarios where vibe coding actually delivers. That promise alone is enough to draw significant interest. In fact, Shopify recently announced a new policy in this space.

For IT practitioners, the situation is clear: ignoring vibe coding or AI-assisted development is not a viable option. To stay relevant in a rapidly evolving field, we must embrace these technologies and learn how to use them effectively.
In this post, I’ll share my perspective on vibe coding. In short, I see it as an emerging development paradigm that calls for a slightly different set of practices. The encouraging part is that once we understand its nature, making vibe coding work usually requires only a few adjustments.
The First Contact with LLM Code Generation
There are now several tools designed to support vibe coding, each offering its own unique features and workflows. But rather than getting lost in the growing array of platforms, I decided to strip things back and test what vibe coding feels like when interacting directly with large language models — no middleware, no IDE plugins, just raw prompts and responses. After all, LLMs are the core engine behind vibe coding, responsible for doing the “intelligent” heavy lifting.
In my experiment, I tested a range of models including ChatGPT, Gemini, Claude Sonnet, DeepSeek, Qwen, and a lesser-known model called Mercury. Among them, Qwen stood out for its impressive reasoning capabilities at a relatively low cost, especially in its smaller variants. Mercury, on the other hand, caught my attention due to its unique diffusion-based architecture, which promises faster and more efficient inference.
To keep the comparison fair, I asked each LLM the same simple question:
build a doctor online booking webapp in Python
Here’s how each LLM performed when tasked with building a simple doctor booking web app:
- ChatGPT: Delivered a full basic implementation using Flask, including feature descriptions, technical design, and project structure. It provided an `app.py` file and three HTML templates packaged in a downloadable ZIP. The business logic and technical design were fairly basic.
- Gemini 2.5 Pro: Took a more architectural approach, offering a detailed comparison between Django, Flask, and FastAPI before settling on a solution. It proposed a more sophisticated functional and technical design, including practical considerations like database indexing and caching strategies for scalability.
- Claude: Returned a basic project structure similar to ChatGPT, including `app.py`, HTML templates, and a `README.md` file. However, all HTML content was embedded within a single file, requiring developers to manually split it into separate templates for practical use.
- DeepSeek: Produced a well-rounded and professional implementation. It included not only the Python and HTML code, but also best practices such as a `requirements.txt` file, database setup, and cleaner project organisation, making it feel more production-ready.
- Mercury: Focused on the fundamentals, generating a minimal Flask application. The output resembled an educational tutorial, walking through the step-by-step development process, including database initialisation with mock records and local deployment instructions. Less emphasis was placed on business logic or feature completeness.
- Qwen 3: Although the web app produced was simple, Qwen 3 stood out due to its focus on enterprise-grade development processes and technical depth. What really set it apart was its impressive reasoning. It carefully explained its thought process, considering requirements and trade-offs as it constructed the solution:
…
Payment integration might be optional, but including it adds value. Stripe or Razorpay APIs can be used. Maybe mention it as an extension.
Testing is crucial. Unit tests for models and views, checking form validations, user flows.
Deployment: Heroku, AWS, or PythonAnywhere. Setting up static files, configuring the database, environment variables for secrets.
Wait, the user might not know all the packages needed. I should list them: Django, Pillow for images, maybe django-crispy-forms for better form rendering, and stripe for payments if included.
Security considerations: CSRF protection, secure passwords, HTTPS in production. Django has some built-in security, but it’s good to remind them.
I should structure the response with steps: setting up the project, creating apps, models, views, templates, URLs, static files, deployment. Provide code snippets for models and views as examples.
What We Learnt
From this comparison, it’s clear that the most popular LLMs — such as ChatGPT, Claude, and Gemini — did not necessarily outperform lesser-known models like DeepSeek, Qwen, or even Mercury. The results challenge the assumption that popularity equals superiority. But rather than stopping at first impressions, let’s dig deeper into what these LLMs actually do well — and where they fall short.
1. Code Quality and Consistency
Across the board, the generated code was clean, followed widely accepted conventions, and relied on common technologies. The style was consistent, free from typos, and logically structured. LLMs also demonstrated the ability to catch and correct bugs quickly. Most of them were capable of producing documentation that accurately reflected the implementation and covered key components. These traits suggest that, in many cases, LLMs can perform at or above the level of an average developer — at least for straightforward tasks.
2. Focused on Programming, Not Configuration
As Andrej Karpathy noted, real-world applications come with many constraints — deployment targets, infrastructure choices, secret management, and more. Yet all of the LLMs treated the app as a simple, self-contained local project. None of them addressed real-world configuration concerns, which are often critical to successful delivery.
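To make the gap concrete, here is a minimal sketch of the kind of configuration handling none of the models produced: reading secrets and deployment settings from the environment instead of hard-coding them. The variable names are illustrative, not taken from any generated project.

```python
import os

def get_database_url() -> str:
    """Read the DB connection string from the environment.

    Generated projects typically hard-code something like sqlite:///app.db;
    a real deployment should fail fast when the setting is missing.
    """
    url = os.environ.get("DATABASE_URL")
    if url is None:
        raise RuntimeError("DATABASE_URL is not set")
    return url

def is_production() -> bool:
    """Toggle environment-dependent behaviour (debug mode, HTTPS redirects)."""
    return os.environ.get("APP_ENV", "development") == "production"
```

None of this is hard, but it is exactly the delivery-critical plumbing the models silently skipped.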
3. Lack of User Interaction
One of the more striking limitations was the lack of two-way communication. Some models produced only a skeletal framework, while others went straight to a complete solution. In neither case did the LLM pause to ask for user input, preferences, or requirements. Instead, decisions were made based on implicit assumptions. In real software development, this kind of communication gap is a major red flag — and a core reason why vibe coding can fail in practice.
4. No Phase-Based Development
Closely related to the point above, LLMs tend to skip planning and dive directly into implementation. Even reasoning-capable models like Qwen, which internally considered development phases, didn’t reflect that structure in their output. As a result, the user is left without a clear roadmap or checkpoints — making it hard to guide, review, or control the development process. This lack of structured planning is another fundamental weakness in current vibe coding experiences.
5. Each LLM Has a Distinct Personality
Finally, each model brings its own “flavour” to how it handles tasks. Some focus on deep technical detail, while others guide the user through a development flow. Whether this is good or bad depends on the user. In collaborative work, we generally expect consistency and adaptability. If a new teammate insisted on injecting their personal style into every decision, it would quickly become frustrating. Likewise, for vibe coding to succeed, the AI must adapt to the user’s preferences — not impose its own.
Testing Vibe Coding IDEs: Cursor, WindSurf, and Trae
I tested three of the leading vibe coding environments — Cursor, WindSurf, and Trae. While each IDE offers unique design choices and features, their core approach to vibe coding is remarkably similar. All three allow users to select from a range of external LLMs in an agent mode and support personalization through context and rules. Among them, Trae stands out as the only fully free IDE at the time of testing.
Although these tools improve the developer experience by wrapping LLMs in a more accessible interface, many of the core issues seen in bare-LLM interactions still persist:
1. Premature Solutions Without Clarification
The IDEs often rushed into providing a solution without first resolving ambiguity or confirming user intent. This mirrors a major flaw in standalone LLM usage: there is no interaction or requirement validation; everything happens in a one-shot request/response manner.
2. Subpar Solution Quality for Enterprise Use
The generated solutions were rarely up to enterprise standards. They often lacked scalability considerations, maintainability patterns, or thorough validation — key factors in real-world software development.
3. No Built-In Checkpoints for Human Feedback
None of the IDEs offered clear checkpoints where developers could review, refine, or direct the process before major decisions were made. This absence of phased development severely limits collaboration and control.
4. Opaque and Fragile Memory Handling
While both the IDE and underlying LLMs retained some form of conversational memory, its structure and behavior were unclear. For instance, if a user initiated a task and later had an extended discussion on a related topic, there was no guarantee the tool would still remember the original objective. This made long, iterative sessions unreliable.
5. Weak Understanding of Project Structure and Purpose
Although IDEs like WindSurf index the codebase using RAG (Retrieval-Augmented Generation), this only supports narrow tasks like code search or reference matching. It doesn’t help the LLM understand the broader architectural design or the project’s business goals — both of which are crucial for meaningful assistance.
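A toy version of that indexing makes the limitation obvious. The sketch below (hypothetical file names and contents) scores files by lexical overlap with a query. It is enough to find where something lives, but it says nothing about why the architecture is shaped the way it is.

```python
def token_overlap(query: str, text: str) -> int:
    """Crude retrieval score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))

# A made-up mini codebase standing in for an indexed project.
codebase = {
    "booking/views.py": "def create_booking(request): validate slot and save booking",
    "booking/models.py": "class Booking(Model): doctor patient slot",
    "payments/stripe.py": "def charge(card, amount): call stripe api",
}

query = "where is the booking slot validated"
best = max(codebase, key=lambda path: token_overlap(query, codebase[path]))
print(best)  # prints "booking/views.py"
```

Retrieval like this answers "where is X?" well; it cannot answer "does this design serve the business goal?", which is the help developers actually need.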
6. Model Cutoff Limitations
All three tools depend on LLMs with a static knowledge cutoff. If you ask for the latest version of a library, the model will return the newest version it knows, which may be outdated. This limitation becomes a blocker when newer versions introduce breaking changes or essential features.
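One mitigation is to check model-suggested dependency versions against a source of truth (for example, PyPI's JSON API) rather than trusting the model's training data. The comparison itself is simple; the naive parser below handles only plain `X.Y.Z` release versions, not pre-release tags.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a plain release version like '0.110.0' (no pre-release tags)."""
    return tuple(int(part) for part in v.split("."))

def is_outdated(suggested: str, latest: str) -> bool:
    """True if the model-suggested version lags behind the known latest."""
    return parse_version(suggested) < parse_version(latest)

# A model with an old knowledge cutoff might suggest a library version
# that is several breaking releases behind. The numbers here are illustrative.
print(is_outdated("0.95.0", "0.110.0"))  # prints True
```

Wiring a check like this into the workflow turns a silent cutoff problem into an explicit, reviewable warning.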
Embracing Vibe Coding: A Shift in Mindset
Based on the analysis above, vibe coding feels a bit like working with a young Sheldon — brilliant in some ways, but hilariously unhelpful in others. For engineers looking to benefit from this new paradigm, the goal isn’t to compete with the model on coding. Instead, it’s about understanding its limitations and working with empathy toward the tool.
Here are some key principles to make vibe coding work effectively:
1. Be a Product Manager, Not Just a Programmer
This is perhaps the most important mindset shift. In vibe coding, you’re no longer expected to type out every line of code. The LLM handles the bulk of the implementation. Your role is to ensure it understands the what and why: the requirements, constraints, standards, conventions, and expectations.
In this setting, the LLM is not your “copilot.” It is not even your peer programmer. You are its team lead or project manager. You guide, correct, and ensure the outcomes align with the bigger picture.
2. Plan First
Don’t jump straight into a vague prompt like:
“Build a doctor booking web app in Python.”
As we’ve seen, most LLMs will dive into a solution without clarifying goals or asking for your input. As a consequence, the outcome is often far from what you want. Instead, begin by having the model help plan the project:
“I’m tasked with building a doctor booking web app in Python. What are all the aspects I need to consider? Outline the phases of development.”
This type of question sets the foundation and creates space for collaboration, feedback, and better alignment.
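The planning step can even be scripted. The helper below (my own phrasing, not from any particular tool) wraps a vague build request in a plan-first prompt before it ever reaches the model.

```python
def plan_first_prompt(task: str) -> str:
    """Turn a vague build request into a planning-oriented prompt.

    The wording is illustrative; tune it for your model and project.
    """
    return (
        f"I'm tasked with the following: {task}. "
        "Before writing any code, list all the aspects I need to consider "
        "and outline the phases of development. Stop after the outline so "
        "I can review it and give feedback."
    )

print(plan_first_prompt("build a doctor booking web app in Python"))
```

The key property is the explicit stop: the model is asked to pause for review instead of sprinting to an implementation.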
3. Documentation is Your Ally
In agile development, documentation is often deprioritised in favour of working code. That made sense when documentation cost humans significant time to produce, and when code had become much more readable than it once was. But in the era of vibe coding, documentation becomes useful again, this time for the LLM.
Why?
- Alignment: Documentation (especially functional and technical specs, and project plans) provides a stable reference for the LLM. You can direct it to “refer to the design document” to keep it on track.
- Control: Documents let you, the human, review and steer the direction of development.
- Efficiency: Most of the documentation can be LLM-generated. The old rationale for skipping it — because it’s time-consuming — no longer applies.
So, keep the documents. They’re now part of your interface with the AI.
4. Define Your Rules
Most vibe coding tools support custom global and project-specific rules. These rules help the LLM understand your development preferences — just like onboarding a new team member.
For example, in a FastAPI project, you might specify:
Here are some best practices and rules you must follow:
- You use Python 3.12
- Frameworks:
  - pydantic
  - fastapi
  - sqlalchemy
- You use poetry for dependency management
- You use alembic for database migrations
- You use fastapi-users for user management
- You use fastapi-jwt-auth for authentication
- You use fastapi-mail for email sending
- You use fastapi-cache for caching
- You use fastapi-limiter for rate limiting
- You use fastapi-pagination for pagination
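Mechanically, a rules file is just text prepended to every request. The sketch below shows the idea; the `.projectrules` filename and fallback content are invented for this example, since Cursor, WindSurf, and Trae each have their own rules-file conventions.

```python
from pathlib import Path

# Fallback rules, mirroring the FastAPI example above.
DEFAULT_RULES = """\
- You use Python 3.12
- Frameworks: pydantic, fastapi, sqlalchemy
- You use poetry for dependency management
- You use alembic for database migrations
"""

def load_rules(path: str = ".projectrules") -> str:
    """Load project rules from a file, falling back to the defaults above."""
    p = Path(path)
    return p.read_text() if p.exists() else DEFAULT_RULES

def system_prompt(rules: str) -> str:
    """Prepend the rules to every request, as the IDEs do internally."""
    return "Here are some best practices and rules you must follow:\n" + rules
```

Because the rules ride along with every prompt, the model is re-onboarded on each request rather than relying on fragile conversational memory.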
5. Test Everything
Do not assume the LLM will always get it right.
Even though the output may look correct or follow common patterns, errors in logic, edge cases, and unintended consequences are still very common. Always validate:
- The design — Does it make sense in your context?
- The code — Does it run correctly? Pass edge cases?
- The documentation — Does it reflect the actual implementation?
Think of the LLM as a fast and capable intern: impressive, but absolutely in need of review.
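For instance, a generated booking feature should be checked against the edge cases the model rarely mentions. The toy `Scheduler` below is hypothetical, but the double-booking check is exactly the kind of test worth writing against whatever the LLM produced.

```python
from datetime import datetime

class BookingError(Exception):
    pass

class Scheduler:
    """Toy stand-in for a generated booking model."""

    def __init__(self) -> None:
        self._slots: dict[tuple[str, datetime], str] = {}

    def book(self, doctor: str, when: datetime, patient: str) -> None:
        key = (doctor, when)
        if key in self._slots:  # the edge case generated code often misses
            raise BookingError("slot already taken")
        self._slots[key] = patient

s = Scheduler()
s.book("dr_lee", datetime(2025, 5, 6, 9, 0), "alice")
try:
    s.book("dr_lee", datetime(2025, 5, 6, 9, 0), "bob")
    print("BUG: double booking allowed")
except BookingError:
    print("double booking rejected")  # prints this
```

A handful of tests like this is cheap insurance against plausible-looking code that quietly mishandles a core invariant.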
6. Version Control Everything
Use Git rigorously — just as you would in a well-managed project.
- Use commits to track small, incremental changes.
- Use branches to manage features, experiments, or deployment paths.
- Use pull requests as opportunities to review progress.
- Review diffs before merging or deploying.
- Roll back when you are not happy with the current state of development.
The good news is that Git operations can be automated as part of your vibe coding workflow. You can prompt the LLM to suggest commit messages, create branches, or even manage merges. But again: never skip the human review step.
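As a sketch of that automation, the whole loop is plain `git` driven by `subprocess`. The commit message below is a hard-coded placeholder where an LLM suggestion (prompted with the staged diff) would go; it assumes `git` is installed.

```python
import pathlib
import subprocess
import tempfile

def git(args: list[str], cwd: str) -> str:
    """Run a git command in the given repo and return its stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout

# Work in a throwaway repo so the sketch is self-contained.
repo = tempfile.mkdtemp()
git(["init", "-q"], repo)
git(["config", "user.email", "dev@example.com"], repo)
git(["config", "user.name", "Dev"], repo)

pathlib.Path(repo, "app.py").write_text("print('hello')\n")
git(["add", "app.py"], repo)
# In a real workflow, this message would come from prompting the LLM
# with the output of `git diff --staged`.
git(["commit", "-q", "-m", "feat: add initial app entry point"], repo)

print(git(["log", "--oneline"], repo))
```

Automating the mechanics is fine; the diff review before merge is the part that must stay human.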
Conclusion
Vibe coding is more than just a trend — it’s fundamentally changing how developers approach creativity and collaboration. Its real power lies in boosting productivity while challenging us to rethink traditional workflows. The key question it raises is: how do we build solid, sustainable practices for interacting with these tools?
Mastering vibe coding requires a mindset shift for developers. The future of development hinges on our ability to adapt. By focusing on clear communication, continuous learning, and thoughtful evaluation, developers can make vibe coding a cornerstone of modern engineering.