Your Team Uses AI to Code. Why Isn’t It Faster?



Your Team Started Using AI to Write Code

Last week, a PR was submitted in just two hours.

500 lines. Logic looked solid. No obvious issues.

During review, we found inconsistent code style, missing type definitions, and several functions that duplicated existing modules but with different implementations.

After fixing all that, the time saved fell short of expectations.


Another project tried using AI to refactor legacy code.

The AI cleaned it up. The logic was indeed clearer.

After deployment, three seemingly unrelated features broke.


This isn’t about AI being bad.

It’s that your codebase isn’t ready to let AI help.


⚡ 3-Second Summary

  • Core issue: AI output quality ceiling = codebase maturity
  • Framework: 5-level engineering foundation model (L1-L5)
  • For: Teams adopting AI coding tools, managers evaluating AI ROI
  • Not for: Those just learning AI tool operations (this is about “environment readiness”)

Part B: Why AI Isn’t Magic

The Ceiling of AI Tools Isn’t AI Itself

In 2025, 85% of developers are using AI tools to write code.

But according to Gartner, 43% of enterprises abandon AI projects due to “lack of technical maturity.”

S&P Global’s data is more direct: in 2025, 42% of companies abandoned most of their AI initiatives, more than doubling from 17% in 2024.


The key isn’t the tool itself.

The key is: how much automation can your codebase handle?


Think of it this way:

Even the best chef can’t make good food with bad ingredients.

Even the strongest AI can’t write good code with an inconsistent codebase.


The Real Problems Behind Those Two Cases

Back to the two scenarios from the opening:

| Symptom | Surface Problem | Real Problem |
|---|---|---|
| More PR rejections | AI writes bad code? | Codebase lacks unified style and type definitions |
| Refactor breaks three things | AI doesn't understand business logic? | Too much tech debt, no test coverage |

AI just writes based on what it “sees.”

If what it sees is chaos, what it produces is chaos.


The Engineering Maturity Pyramid

This isn’t a new concept. Software engineering has long used “maturity models” to describe codebase governance levels.

Applied to AI adoption, it breaks down into 5 levels:

        L5: Architecture Drift Correction
           ↑ AI can do system-level refactoring
       L4: Tech Debt Cleanup
          ↑ AI changes won't cascade failures
      L3: Dependency Updates & Security Patches
         ↑ AI output can safely go to production
     L2: Types & Documentation
        ↑ AI can correctly infer intent
    L1: Formatting & Import Organization
       ↑ AI output has consistent style

Each level is a prerequisite for the next.


Key Insight: The ceiling of AI tools isn’t the AI’s capability—it’s how much automation your codebase can handle.


Part A: 5-Level Adoption Roadmap

L1: Formatting & Import Organization

Goal: Make the codebase look like “one person wrote it”

How:

  • Adopt Prettier / ESLint / gofmt / black
  • CI enforces linting—PRs can’t merge without passing
  • Remove unused imports

Completion Criteria:

  • lint error = 0
  • New PRs don’t get rejected for formatting issues

What AI Can Do:

  • Produce style-consistent code
  • No more “tabs here, spaces there” review comments

L2: Types & Documentation

Goal: Let AI “understand” your code

How:

  • TypeScript / Python typing
  • Add docstrings to key functions
  • Complete API documentation

Completion Criteria:

  • Type coverage ≥ 80%
  • Core modules have documentation

What AI Can Do:

  • Correctly infer function inputs and outputs
  • No more guessing types and producing runtime errors

This level is particularly important.

AI relies heavily on types and documentation for automated modifications. Without them, AI inference becomes very unstable.
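
To make the difference concrete, here is a minimal sketch of what L2-level code looks like; the function, its units, and its ranges are illustrative, not taken from any real codebase:

```typescript
/**
 * Applies a percentage discount to a price in cents.
 *
 * @param priceCents - Original price in integer cents (never fractional dollars)
 * @param discountPercent - Discount in the range 0-100
 * @returns Discounted price in cents, rounded to the nearest cent
 */
function applyDiscount(priceCents: number, discountPercent: number): number {
  if (discountPercent < 0 || discountPercent > 100) {
    throw new RangeError("discountPercent must be between 0 and 100");
  }
  return Math.round(priceCents * (1 - discountPercent / 100));
}
```

With explicit parameter types and a docstring stating units and valid ranges, a tool modifying the callers no longer has to guess whether prices are dollars or cents.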


L3: Dependency Updates & Security Patches

Goal: Reduce known vulnerability and version risks

How:

  • Adopt Dependabot / Renovate
  • CVE scanning, SBOM management
  • Regular dependency updates

Completion Criteria:

  • CVE high/critical = 0
  • Dependencies within 2 major versions

What AI Can Do:

  • Generated code won’t introduce known vulnerabilities
  • Auto-upgrade PRs can safely merge
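
The "within 2 major versions" criterion can be checked mechanically. A rough sketch, assuming plain semver version strings; the function names and the default threshold are mine:

```typescript
// Extract the major version from a semver string like "18.2.0".
function majorOf(version: string): number {
  const major = Number(version.split(".")[0]);
  if (Number.isNaN(major)) throw new Error(`Invalid version: ${version}`);
  return major;
}

// True when the installed major version is no more than `maxDrift`
// major versions behind the latest release.
function withinMajorDrift(installed: string, latest: string, maxDrift = 2): boolean {
  return majorOf(latest) - majorOf(installed) <= maxDrift;
}
```

A script like this can run in CI against the lockfile and fail the build when a dependency drifts too far behind.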

L4: Tech Debt Cleanup

Goal: Prevent AI changes from “pulling one thread and unraveling three”

How:

  • Build a refactoring roadmap
  • Add tests (at least for core paths)
  • Modular decomposition

Completion Criteria:

  • Test coverage ≥ 60% (core paths ≥ 80%)
  • Clear inter-module dependencies

What AI Can Do:

  • Local refactoring won’t cause cascade failures
  • Change scope is predictable and verifiable

This level is the threshold for AI to “truly boost productivity.”

The more tech debt, the more likely AI modifications will fail, with unpredictable impact scope.


L5: Architecture Drift Correction

Goal: Bring system architecture back to a maintainable state

How:

  • Realign with architectural principles
  • Re-draw domain boundaries
  • System-level refactoring

Completion Criteria:

  • Clear module boundaries
  • Architecture docs match implementation

What AI Can Do:

  • Help analyze dependency graphs
  • Suggest module boundaries
  • Generate system-level refactoring PRs (but needs strong guardrails)
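
"Help analyze dependency graphs" boils down to classic graph checks, such as the circular-dependency detection that tools like madge perform. A minimal sketch over an in-memory graph; the representation is illustrative:

```typescript
type DepGraph = Record<string, string[]>;

// Depth-first search with a recursion stack: any edge back into the
// current path is a circular dependency.
function hasCircularDependency(graph: DepGraph): boolean {
  const visited = new Set<string>();
  const onPath = new Set<string>();

  function visit(mod: string): boolean {
    if (onPath.has(mod)) return true;   // back edge: found a cycle
    if (visited.has(mod)) return false; // already fully explored
    visited.add(mod);
    onPath.add(mod);
    for (const dep of graph[mod] ?? []) {
      if (visit(dep)) return true;
    }
    onPath.delete(mod);
    return false;
  }

  return Object.keys(graph).some((mod) => visit(mod));
}
```

In practice you would feed this from real import statements, which is exactly what `madge --circular` automates.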

What Happens When You Skip Levels

| Skipped | Common Outcome |
|---|---|
| L1 | AI output has mixed styles, review time increases |
| L2 | AI guesses types wrong, runtime errors increase |
| L3 | AI introduces vulnerable dependencies, security incidents |
| L4 | AI changes one thing, breaks three others |
| L5 | AI makes things messier, system eventually becomes uncontrollable |

Key Insight: Each level is a prerequisite for the next. Skipping levels causes problems.


Part C: How Teams Should Divide the Work

Who Should Own Which Level?

| Role | Levels | Specific Tasks |
|---|---|---|
| Junior | L1-L2 | Lint setup, type additions, basic docs |
| Mid-level | L2-L3 | Complex types, dependency updates, security scans |
| Senior | L3-L4 | Security architecture, tech debt prioritization, test strategy |
| Architect | L4-L5 | Architecture governance, module boundaries, system refactoring |

This division has two benefits:

  1. Juniors have a clear growth path—moving from L1 to L2 builds foundational skills
  2. Seniors don’t waste time on formatting issues—CI should catch those at L1

What Metrics Should Managers Watch?

No need to understand technical details. Just watch these numbers:

L1: Lint Error Count

| Item | Description |
|---|---|
| Meaning | Number of code style inconsistencies |
| Healthy | = 0 |
| Warning | > 50 (team isn't managing formatting) |
| Typical | Legacy projects often show 200-500+ when first adding linting |
| How to check | CI reports, or run `npm run lint` |

L2: Type Coverage

| Item | Description |
|---|---|
| Meaning | How much code has explicit type definitions (enables AI inference) |
| Healthy | ≥ 80% |
| Warning | < 50% (AI will guess types wrong) |
| Typical | JavaScript-to-TypeScript migrations start at 30-50% |
| How to check | `npx type-coverage`, or IDE built-in tools |

L3: CVE High/Critical Count

| Item | Description |
|---|---|
| Meaning | Number of known high-risk security vulnerabilities in dependencies |
| Healthy | = 0 |
| Warning | > 0 (known vulnerabilities unpatched) |
| Typical | Projects not updated for 6 months usually have 5-20 |
| How to check | `npm audit`, `snyk test`, GitHub Dependabot |

L4: Test Coverage

| Item | Description |
|---|---|
| Meaning | How much code is protected by automated tests |
| Healthy | ≥ 60% (core paths ≥ 80%) |
| Warning | < 30% (changing code is like defusing a bomb) |
| Typical | Projects without deliberate maintenance are around 10-30% |
| How to check | `jest --coverage`, SonarQube |

L5: Module Coupling

| Item | Description |
|---|---|
| Meaning | How complex the dependencies between modules are |
| Healthy | Project-specific (lower is better) |
| Warning | Single module depended on by > 10 other modules |
| Typical | Legacy projects often have "God modules" with 30+ dependents |
| How to check | `madge --circular`, SonarQube, dependency graph tools |

If your team says “AI tools aren’t working,” check these numbers first.

If the numbers aren’t there, the problem isn’t AI.


Gradual Adoption Recommendations

Don’t try to do L1-L5 all at once. Instead:

  1. Start with L1—simplest, quickest wins
  2. Stabilize L1, then do L2—type additions take time
  3. Make the codebase a little better with each PR—Boy Scout Rule

Time estimates (mid-sized project):

| Level | Estimated Timeline |
|---|---|
| L1 | 1-2 weeks |
| L2 | 1-3 months |
| L3 | Ongoing |
| L4 | 3-6 months |
| L5 | Depends on architecture complexity |

Key Insight: The higher the maturity, the more AI can evolve from “assistant” to “automation.”


Next Steps

5-Question Self-Assessment: What Level Is Your Team At?

  1. L1: Do PRs get rejected for “formatting issues”?
  2. L2: Are AI-generated types correct? Or do you constantly fix them manually?
  3. L3: When did you last update dependencies? Any known vulnerabilities?
  4. L4: Would you let AI do refactoring? Or are you afraid of cascade failures?
  5. L5: Does system architecture match documentation? Or have they diverged?

Checklist: Completion Criteria for Each Level

□ L1: Lint error = 0, CI enforcement enabled
□ L2: Type coverage ≥ 80%, core functions documented
□ L3: CVE high/critical = 0, dependencies regularly updated
□ L4: Test coverage ≥ 60%, core paths ≥ 80%
□ L5: Architecture docs match implementation
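
The checklist above can be turned into a simple gate. A sketch that maps the thresholds in this article to a maturity level; the interface and function names are mine:

```typescript
interface MaturityMetrics {
  lintErrors: number;             // L1: lint error count
  typeCoveragePct: number;        // L2: type coverage, 0-100
  highCriticalCves: number;       // L3: high/critical CVE count
  testCoveragePct: number;        // L4: test coverage, 0-100
  architectureDocsMatch: boolean; // L5: docs match implementation
}

// Returns the highest level for which this level's criteria
// and all lower levels' criteria are met (levels can't be skipped).
function maturityLevel(m: MaturityMetrics): number {
  const passed = [
    m.lintErrors === 0,        // L1
    m.typeCoveragePct >= 80,   // L2
    m.highCriticalCves === 0,  // L3
    m.testCoveragePct >= 60,   // L4
    m.architectureDocsMatch,   // L5
  ];
  let level = 0;
  for (const ok of passed) {
    if (!ok) break;
    level += 1;
  }
  return level;
}
```

Note the early `break`: a team with 85% type coverage but 200 lint errors is still at L0, which is exactly the "no skipping levels" rule.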

Sources

AI Project Failure Rates

Root Causes of AI Project Failure

AI Tool Adoption Statistics
