I see this every week on engagements. A team pushes AI-generated code to production, features ship fast, and the bugs surface later. Then one day an API secret ends up in a public repo, a critical endpoint has no authentication, or a migration wipes the staging database. Vibe coding (coding "by feel" by letting AI generate the code) had produced 74 confirmed CVEs by the end of March 2026, according to the Vibe Security Radar project from Georgia Tech. The problem is not the AI. The problem is that nobody checks what it produces.

  • ⚠️ Security ignored: 45% of AI code contains vulnerabilities according to Veracode.
  • 🏗️ Ghost architecture: AI builds fast but structures nothing without supervision.
  • 🔑 Exposed data: 2,000 vibe-coded apps deployed without authentication (Red Access).
  • 🎯 The senior is not a bottleneck: they are the only filter between a prototype and production.

Mistake 1: AI code goes to prod without a security audit

The first reflex of vibe coding is speed. The AI generates a feature in ten minutes, the developer tests it visually, it works, they push it. The problem: what works visually can be riddled with vulnerabilities that nobody reads.

According to Veracode, 45% of AI-generated code contains at least one exploitable vulnerability. CodeRabbit measures 2.74 times more security issues in AI-assisted code than in manually written code. The 2025 METR report confirms that AI-assisted developers introduce more bugs than they fix once the work goes beyond boilerplate.

What are the concrete risks of vibe coding in production?

The Georgia Tech Vibe Security Radar project traces each CVE back to its originating commit to determine whether an AI coding tool introduced the flaw. Result: 6 CVEs in January 2026, 15 in February, 35 in March, which brings the running total to 74 confirmed vulnerabilities, according to the Pradeo blog. Researchers estimate the real figure is 5 to 10 times higher, since the majority of AI commits carry no identifiable signature.

The flaws are not exotic. They are the classics from the OWASP Top 10: SQL injections, XSS, hardcoded secrets, weak cryptography, insufficient authentication. A senior dev spots them on a read-through. Without that read-through, they go to prod at the speed of AI.
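
To make the first item on that list concrete, here is a minimal sketch of the difference a reviewer looks for, using Python's standard `sqlite3` module. The table, the function names, and the demo data are assumptions made up for the illustration, not code from a real engagement.

```python
import sqlite3


def find_user_unsafe(conn, email):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = f"SELECT id, email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, email):
    # Parameterized: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


def make_demo_db():
    # Throwaway in-memory database with two hypothetical users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("alice@example.com",), ("bob@example.com",)],
    )
    return conn
```

With the classic payload `' OR '1'='1`, the unsafe version returns both rows of the demo table, while the parameterized version returns none. Both functions "work" in a visual test with a normal email address, which is exactly why the flaw survives without a read-through.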

I have staffed engagements where the first security audit after three months of vibe coding revealed more than 40 critical vulnerabilities on a single Next.js project. The remediation cost exceeded the initial development cost. It is the classic pattern: initial speed is paid for in security debt.

Mistake 2: architecture drifts with nobody controlling it

AI never says "watch out, this architectural decision is inconsistent with what I did yesterday". It solves the problem in the prompt, not the problem of the project. Spencer Keglovitz, a fractional CTO with 25 years of experience, puts it well in his analysis: "AI does not flag when it produces an inconsistent architecture. It just keeps going. You are the one who has to watch for structural drift."

Why can AI not decide the architecture for you?

An LLM is a pattern matcher. It reproduces what it saw in its training data. When you ask it to design a system, it produces something that looks like a design. It even defends it. But it did not weigh the trade-offs the way an engineer with business context would.

On a recent engagement, I picked up a project where the AI had silently switched from a REST design to custom RPC calls in the middle of development. Nobody had noticed because each prompt produced code that worked. Only when a new developer joined the team did we discover two incompatible patterns in the same codebase.

The augmented developer is not the one who lets AI decide the stack. It is the one who sets the architecture upfront (ARCHITECTURE.md, CONVENTIONS.md, DECISIONS.md) and checks that every AI commit follows those choices. AI executes fast. The senior ensures it executes in the right direction.
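
Part of that check can be automated. The sketch below assumes a team has turned one rule from its CONVENTIONS.md into a regular expression (here, "no custom RPC endpoints in a REST codebase"). The rule, the `/rpc/` path shape, and the function name are hypothetical, and a pattern match like this only supplements the senior's read, it does not replace it.

```python
import re

# Hypothetical rules a team might derive from its own CONVENTIONS.md.
FORBIDDEN_PATTERNS = {
    "custom RPC endpoint (project convention is REST)": re.compile(r"/rpc/\w+"),
}


def find_violations(changed_files):
    """Return (path, line_number, rule) for each changed line that breaks a rule.

    `changed_files` maps a file path to the text of that file.
    """
    violations = []
    for path, content in changed_files.items():
        for number, line in enumerate(content.splitlines(), start=1):
            for rule, pattern in FORBIDDEN_PATTERNS.items():
                if pattern.search(line):
                    violations.append((path, number, rule))
    return violations
```

Run on every AI commit, a check of this kind would have flagged the first RPC call on the day it appeared rather than months later.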

| Risk | Without senior | With senior | Trend |
|---|---|---|---|
| Security flaws detected before prod | ~12% | ~78% | ↑ x6.5 |
| Architectural drift after 3 months | 4.2 conflicting patterns/project | 0.3 | ↓ x14 reduction |
| Review time per PR | 0 min (no review) | 22 min | ↑ positive ROI |
| Post-deployment fix cost | x15 vs. fix in review | x1 (fixed upstream) | ↓ massive savings |

SOURCE: aggregated estimates from Veracode, CodeRabbit, field feedback · Updated 06/2026

Mistake 3: sensitive data ends up exposed

Red Access analyzed over 380,000 web assets on vibe coding platforms (Lovable, Replit, Base44) and identified 5,000 applications built for business purposes. Among them, 40% contained sensitive data deployed without basic security controls, according to the report published in June 2026 and covered by LeMagIT.

This is no longer shadow IT. It is shadow development. Employees build complete apps, connect them to production systems, and deploy them publicly while IT leadership does not even know they exist. 2,000 of those 5,000 applications had no authentication, no access control, and no audit trail.

How do vibe-coded apps end up with open admin access?

The case documented by Red Access includes a live financial dashboard at a Latin American bank, accessible to anyone with the URL. AI generates code that works, but it does not configure security by default. HTTP headers, CSRF protection, rate limiting, secrets encryption: none of this arrives "for free" in generated code.
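
One of those gaps is easy to test for. As a minimal sketch, the function below compares a response's headers against a small baseline of standard security headers. The baseline is my own assumption of a reasonable minimum, not a list taken from the Red Access report.

```python
# Assumed baseline; adjust to the project's own security policy.
BASELINE_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
)


def missing_security_headers(response_headers):
    """Return the baseline headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [name for name in BASELINE_HEADERS if name.lower() not in present]
```

An empty result does not make an app secure. It only shows that someone configured the defaults that generated code leaves out.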

A senior knows that every exposed endpoint must be authenticated. They know that API keys do not go in the source code. They know that a .env.example never contains real values. This is not gatekeeping, it is basic discipline, the kind that separates a prototype from a production service. When you hire a senior dev at 180 euros/day for this review, you are buying exactly that filter.
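
The .env.example rule in particular lends itself to a quick check. The following is a rough heuristic sketch, not a real secret scanner: the placeholder list and the name pattern are assumptions chosen for the example, and a dedicated tool will catch far more.

```python
import re

# Assumed placeholder values that are acceptable in a committed example file.
PLACEHOLDERS = {"", "changeme", "your-key-here", "xxx"}
# Assumed naming hint for variables that usually hold secrets.
SECRET_NAME_HINT = re.compile(r"(SECRET|TOKEN|PASSWORD|API_KEY)", re.IGNORECASE)


def suspicious_env_entries(text):
    """Return names of secret-looking variables that carry a non-placeholder value."""
    flagged = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        value = value.strip().strip("\"'")
        if SECRET_NAME_HINT.search(name) and value.lower() not in PLACEHOLDERS:
            flagged.append(name)
    return flagged
```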

Mistake 4: nobody distinguishes "it runs" from "it is prod-ready"

Code that passes visual tests locally is not code that is ready for production. Production means monitoring, structured logs, backups, error handling, reversible migrations, health checks, and graceful shutdown. AI puts none of these building blocks in place on its own. It adds them if you ask, but first you have to know what to ask for.
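
Two of those building blocks fit in a few lines each. The sketch below shows one possible shape for structured logs and a health check using only the Python standard library. The class and function names are assumptions for the illustration, and graceful shutdown, backups, and migrations are deliberately left out.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def health_check(checks):
    """Run named dependency checks; report 'ok' only if every one passes.

    `checks` maps a dependency name to a zero-argument callable.
    A check that raises is recorded as failed instead of crashing the endpoint.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}
```

Neither piece is hard to write. The point is that nobody writes them unless someone knows they are missing.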

Vibe coding creates an illusion of velocity. Amazon rolled out AI assistance across its engineering teams, and within 90 days recorded 471 production incidents, including a 6-hour outage that affected 6.3 million orders, according to Spencer Keglovitz. Amazon has thousands of engineers to monitor these systems. A startup with three junior developers does not have that safety net.

How does a senior dev review AI code?

The senior first checks architectural consistency (does this code follow the project's conventions?). They look for missing security patterns (input validation, sanitization, auth). They test the edge cases the AI did not imagine (network timeout, unavailable database, malformed payload).
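
The malformed-payload case is the easiest of these to illustrate. Here is a minimal validation sketch for a hypothetical order payload. The field names `sku` and `quantity` are invented for the example, and a real project would more likely rely on a schema library.

```python
import json


def parse_order(raw):
    """Validate an incoming order payload; raise ValueError on anything malformed."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError, TypeError) as exc:
        raise ValueError("payload is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    quantity = data.get("quantity")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    sku = data.get("sku")
    if not isinstance(sku, str) or not sku.strip():
        raise ValueError("sku must be a non-empty string")
    return {"sku": sku.strip(), "quantity": quantity}
```

Generated handlers often cover only the happy path. Truncated JSON, a list instead of an object, a negative quantity, or `true` where a number is expected are the cases a reviewer adds.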

To manage this review remotely, a 30-minute daily ritual is enough. The senior reads the morning PRs, comments on blockers, validates merges. It is not a bottleneck, it is a checkpoint.

My approach: every block delivered by the AI agent goes through a read, browser test, and manual validation cycle. The agent reads the context (CLAUDE.md, ARCHITECTURE.md), executes, tests, documents. Then the senior validates.

Mistake 5: a junior armed with AI accumulates debt without realizing it

Stanford measures a 20% drop in junior developer hiring between 2024 and 2026. Companies think AI fills the gap. The reality: a junior using AI without a senior in the loop does not produce senior-level code. They produce LLM code that nobody re-reads.

The gap between "knowing how to use AI" and "knowing how to verify what it produces" is a gap in experience, not intelligence. Experienced devs' confidence in AI code dropped from 40% to 29% in one year, according to the survey cited by Spencer Keglovitz. Seniors doubt because they know what breaks.

Is a junior armed with AI without a senior really a risk?

A developer who does not understand race conditions, access control, or session management will not spot when AI gets those things wrong. AI delivers vulnerable code with the same confidence as correct code. There is no warning signal, no change in tone, no alert.

The market is restructuring around this reality. Interchangeable juniors are losing ground. Seniors who orchestrate AI, set up guardrails, and guarantee deliverable quality are seeing their value rise. This is why all our profiles at Extra Dev have a minimum of 8 years of experience: an AI agent in the hands of a senior with business context delivers fast and clean. The same agent in the hands of a lone junior produces code that passes unit tests and blows up in production.

"The senior is not there to slow things down. They are there so the code holds."

Vincent Roye, June 2026

Vibe coding is not a dead end. It is a powerful tool when the cost of failure is low and someone understands the output. For an internal prototype, an automation script, or a one-shot tool, it is remarkably effective. For client production, with sensitive data, real users, and an availability obligation, the filter of a senior dev with a minimum of 8 years of experience is not optional. It is the only barrier between code that works and code that holds.

Frequently Asked Questions

45% of AI code contains vulnerabilities: true or false?

The figure comes from a Veracode study published in 2025 covering millions of code scans. It measures the percentage of AI-generated code containing at least one exploitable flaw (injection, XSS, hardcoded secret, weak crypto). It is consistent with CodeRabbit, which measures 2.74 times more security issues in AI-assisted code than in manually written code. This is not an isolated alarmist figure, it is a converging signal from multiple independent sources.

How do you set up a review pipeline for AI code?

The minimum viable pipeline has three steps. First, an automated security linter (Semgrep, CodeQL, or Snyk) that runs on every PR. Then, a manual review by a senior dev who checks architectural consistency and security patterns. Finally, an integration test in an isolated staging environment before any merge to production. This pipeline adds 20 to 30 minutes per feature, which remains marginal compared to the cost of a production incident.
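
The three steps boil down to a merge gate. The sketch below is one possible way to express that gate, assuming the CI system can report the three signals. The class, its field names, and the function are hypothetical and do not correspond to the API of Semgrep, CodeQL, Snyk, or any CI product.

```python
from dataclasses import dataclass


@dataclass
class PullRequestStatus:
    scanner_findings: int        # unresolved findings from the automated scanner
    senior_approved: bool        # manual review by a senior developer
    staging_tests_passed: bool   # integration tests in the isolated staging environment


def merge_blockers(pr):
    """Return the reasons a PR may not be merged; an empty list means it can go."""
    blockers = []
    if pr.scanner_findings > 0:
        blockers.append(f"{pr.scanner_findings} unresolved scanner finding(s)")
    if not pr.senior_approved:
        blockers.append("missing senior review")
    if not pr.staging_tests_passed:
        blockers.append("staging integration tests not green")
    return blockers
```

Returning the list of reasons rather than a bare yes or no keeps the gate explainable: the developer sees which step is still open.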

Vibe coding vs. professional development: where is the line?

The line is the cost of failure. If the code crashes and the consequence is restarting a script, vibe coding works perfectly well. If the code handles financial transactions, patient data, or user access, every line must be verifiable, traceable, and maintained by someone who understands the implications. Vibe coding is a prototyping mode, not a production mode.

Should vibe coding be banned in companies?

No. Banning it would be as absurd as banning IDEs or copy-paste. The right approach is to govern it: require a senior review on any PR from an AI tool, mandate automated security testing, and clearly separate prototype and production environments. AI is a force multiplier for supervised teams. It is a risk multiplier for teams without oversight.

Why does a senior dev cost less than a production incident?

A senior dev working as a contractor costs 180 euros/day. A data breach costs an average of 4.88 million dollars according to the IBM Cost of a Data Breach 2024 report. Code review by a senior takes 20 to 30 minutes per PR. Remediating a flaw in production takes 15 times longer than correcting it in the review phase, according to aggregated data from Veracode.
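
As a rough order of magnitude, the review cost per PR follows directly from the day rate and the review time. The sketch assumes an 8-hour working day, which the figures above do not state.

```python
DAILY_RATE_EUR = 180   # day rate quoted in the article
HOURS_PER_DAY = 8      # assumption: the article does not state the length of a day


def review_cost_eur(minutes):
    """Cost of one senior review of the given length, in euros."""
    return DAILY_RATE_EUR / HOURS_PER_DAY * minutes / 60
```

Under that assumption, a 20-minute review costs 7.50 euros and a 30-minute review costs 11.25 euros.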

Sources