Last week, an engineering manager with an 80-person team grabbed coffee with me.
He didn’t mention how advanced AI is. He just said: “We spent a lot of money on compute, and now the team spends more time fixing AI-generated code every day than writing new features.”
He laughed a tired laugh. “We thought we hired a superstar engineer. Turns out we hired an intern who never sleeps but has no idea how our business works.”
This is normal.
Gartner predicted in July 2024 that by end of 2025, at least 30% of GenAI projects would be abandoned after the PoC stage. (Source: Gartner Newsroom, “Top Technology Trends 2024”)
But here’s the thing: it’s not that the technology isn’t good enough. When AI moves from “toy” to “tool,” the organizational friction it triggers can far outweigh the efficiency gains.
We tend to think AI adoption is a technical problem. It isn’t — it’s a stress test for people and process.
Where Disillusionment Starts: PoC Survivorship Bias
Most companies start their AI adoption the same way: a PoC.
Pick a pain point. Find two enthusiastic engineers. Hand them unrestricted API keys. Then the miracle happens.
A junior finishes in 10 minutes what normally takes a day of CRUD work. A senior refactors 4,127 lines of legacy code in 5 minutes.
The report goes to leadership: “300% efficiency gain.” The boss nods. Budget approved. Roll it out company-wide.
Then reality shows up.
The PoC used a greenfield feature. No legacy baggage. Production is different — there are columns from 2019 that nobody ever documented.
Nobody has time to ask what a field means. AI will infer from its training distribution, the prompt, and available context — and produce something that looks right but is completely wrong. With a perfect comment attached.
Nobody double-checks every suggestion. By the 50th one, everyone’s tired.
We overestimate AI’s capability in known territory, and underestimate the damage it can do in unknown territory.
When AI touches core business logic, the cost of a mistake isn’t “rewrite a function” — it’s “mislead a customer’s decision” or “data breach.” Especially anything involving PII, permissions, or external output. That cost is irreversible.
The Hidden Iceberg: Maintenance Costs Shift
Most managers evaluating AI ROI only track one number: how much dev time was saved.
But they forget to track the other number: how much did the cost of maintaining all that AI-generated output increase?
This isn’t technical debt. It’s trust debt.
Imagine this: a senior reviews AI-generated payment logic. Syntax perfect. Tests pass. He won’t merge it.
His frown isn’t about bad code. It’s that the existing spec isn’t enough to confirm whether this logic covers all the business rules.
It’s not that he doesn’t trust AI. He doesn’t know whether the AI actually understood the business rules behind that logic, or whether it just got lucky.
1. Readability Collapses
Without repo context, coding standards, and review, AI-generated code tends to be long, clever, and unreadable.
It loves deeply nested structures, hidden conditionals, and magic numbers dressed up as variable names.
I recently dug up some AI-generated code: 7 levels of nested if-else, with a variable named MAGIC_RATE.
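For illustration only, here is a hypothetical sketch in TypeScript of what that shape tends to look like (invented names and values, not the actual code I found):

```typescript
// Hypothetical example of "clever but unreadable" generated code:
// deeply nested branches and a constant nobody can explain.
const MAGIC_RATE = 1.07; // where does 1.07 come from? Nobody knows.

interface Order {
  region: string;
  vip: boolean;
  qty: number;
  coupon?: string;
}

function calcPrice(order: Order): number {
  if (order.region === "TW") {
    if (order.vip) {
      if (order.qty > 10) {
        if (order.coupon) {
          return order.qty * 100 * MAGIC_RATE * 0.9;
        }
        return order.qty * 100 * MAGIC_RATE;
      }
      return order.qty * 110 * MAGIC_RATE;
    }
  }
  return order.qty * 120;
}
```

Every branch is locally plausible. The whole thing is impossible to audit against a spec.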
When your team starts using AI heavily, your codebase fills up fast with this kind of “clever but unreadable” code. The efficiency gains from AI get eaten by the rising cost of comprehension.
2. Dependency Explosion
Without a dependency policy and prompt constraints, AI loves adding new libraries.
Every new dependency is a potential conflict point. Six months later you open package.json and find a pile of dependencies nobody approved.
This is the invisible maintenance cost. Stale lockfile, license violations, missed CVE scans — all landmines AI helped you bury.
This is classic “AI debt”: development got faster, but system transparency and maintainability went down.
Organizational Friction: When AI Hits Old Processes
Some technical problems have clear engineering solutions. The harder part is resetting accountability and process at the same time.
1. The Accountability Vacuum
Traditional development: the person who writes the code owns the code. AI-assisted development: who owns it?
The person who wrote the prompt? The senior who reviewed the code? The vendor who provides the API?
A friend who does IT at a manufacturing company told me: they deployed AI to generate reports. The AI calculated “revenue” as “gross profit.”
The IT manager said: “The AI made an error.” The business manager said: “You didn’t review it properly.”
Neither side was the bad guy — IT wasn’t told the report was going to the board, and the business team assumed IT would review it. The problem was that the process didn’t cover this situation.
And nothing changed. Reports kept going out every week. Errors kept going unowned.
PM asks: “Who owns this bug?” You: “Uh… AI wrote it?” PM: “…”
The biggest obstacle to AI adoption usually isn’t the technology. It’s the accountability vacuum.
2. The Knowledge Transfer Risk
Senior engineers’ value is in their tacit knowledge. AI doesn’t carry any of it.
If the team over-relies on AI, juniors may lose the chance to build order out of chaos. What they learn isn’t how to solve problems — it’s how to describe problems to AI.
Especially without review, pairing, or follow-up explanations, that kind of contextual practice just stops.
The Tradeoffs: When Not to Let AI Generate Directly
AI isn’t a cure-all. In some situations, keeping AI out of direct code generation for core logic is the smarter call.
High-Stability Systems Where Errors Are Irreversible
Think: a bank’s core accounting system, an aircraft’s flight control software. These systems have extremely high error costs, very rare requirement changes, and strict regulatory review.
For these systems, once you adjust for risk, a traditional spec-and-review process is often more efficient than letting AI generate the code directly.
Highly Customized Internal Tools with Complex Business Logic
Think: a regional supermarket chain’s inventory transfer system.
Something happened on my team recently. A junior used AI to write an inventory transfer feature. About 500 lines, two hours. Efficient — he was happy with it.
Our senior spent 10 minutes reviewing it, then said with a frown: “There’s a logic problem here. Client A’s special tax rate of 1.05 from 2021 — it’s not handled.”
Junior: “AI didn’t mention that.” Senior: “AI never read the 2021 contract.”
AI works well for tasks with clear patterns and high repetition. For ambiguous rules and heavily customized business logic, ROI gets eaten by verification, integration, and rework costs.
Decision Framework: How to Evaluate Your AI Adoption Strategy
If you’re evaluating AI adoption, stop asking “Can we use it?” Start asking “Where should we use it?”
| Dimension | High-ROI Situation | Low-ROI Situation |
|---|---|---|
| Task Type | High repetition, clear patterns | Hard to validate, ambiguous rules |
| Error Tolerance | High (quick to fix) | Low (irreversible, high cost) |
| Dependency Maturity | Sufficient standardized libraries | Requires heavy customization |
| Team Capability | Enough seniors for review | Team is mostly juniors |
| Measurable Signals | Review pass rate > 80%, rework rate < 10% | Review pass rate < 50%, new dependencies > 5/month |
But the table is just a starting point. These numbers are example thresholds — actual benchmarks need to be calibrated to your team.
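As a minimal sketch of how those example signals could be tracked, assuming you already tag AI-assisted PRs (the `PrRecord` shape here is hypothetical, not from any particular tool):

```typescript
// Hypothetical PR record; adapt the fields to whatever your tracker actually stores.
interface PrRecord {
  aiAssisted: boolean;        // was AI used to generate the bulk of the change?
  passedFirstReview: boolean; // merged without a "request changes" round
  reworked: boolean;          // needed a follow-up fix within 30 days
}

// Compute the two example signals from the table, for AI-assisted PRs only.
function reviewSignals(prs: PrRecord[]) {
  const ai = prs.filter((p) => p.aiAssisted);
  if (ai.length === 0) return { passRate: 0, reworkRate: 0 };
  return {
    passRate: ai.filter((p) => p.passedFirstReview).length / ai.length,
    reworkRate: ai.filter((p) => p.reworked).length / ai.length,
  };
}
```

The exact field definitions matter less than measuring them the same way every month, so the trend is comparable.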
The real question is: how many people on your team can actually spot when AI got it wrong during review?
Like the Scout Rule: leave the campsite cleaner than you found it. But first you need to know what “clean” means.
Execution Guide: Building AI Code Review Standards
To fight AI debt, companies shouldn’t ban AI — they should build standardized processes.
Here’s how I’d frame it for leadership:
“I recommend we open up AI for tests and documentation first. Keep it out of payment logic for now. In three months I’ll bring you the review pass rate data.”
And here’s what I’d tell the team — three things:
First, define AI’s scope. Be explicit about which modules (e.g., unit tests, documentation) allow AI generation, and which core modules (e.g., payment logic, access control) prohibit AI from generating directly.
Second, mandatory “logic tracing” in reviews. During code review, reviewers must ask developers to explain the key decision logic in AI-generated code — not just check syntax.
Third, automate dependency audits. Add CI/CD checkpoints that trigger automated security and compliance scans whenever a PR adds a third-party dependency; a minimal sketch of such a check follows below. (If you want to track AI provenance, add provenance tagging as a separate step.)
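A minimal sketch of that third point, assuming a Node/TypeScript stack and a hypothetical `deps-allowlist.json` that your team maintains:

```typescript
// ci/check-deps.ts (hypothetical path): fail the build when package.json
// declares a dependency that is not on the team-approved allowlist.
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("package.json", "utf8"));
const allowlist: string[] = JSON.parse(readFileSync("deps-allowlist.json", "utf8"));

const declared = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
const unapproved = declared.filter((name) => !allowlist.includes(name));

if (unapproved.length > 0) {
  console.error(`Unapproved dependencies: ${unapproved.join(", ")}`);
  console.error("Get them through license and CVE review before merging.");
  process.exit(1);
}
console.log("All dependencies are on the allowlist.");
```

Run it in CI next to `npm audit` or your SCA scanner. It only catches new package names, not risky version bumps, so it complements the scans rather than replacing them.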
The Gap Isn’t Whether You Know How to Use AI
Many companies fail because they treat AI as a faster keyboard.
But generative models produce high-probability outputs. Correctness still needs to be verified through specs, tests, and human review.
The real gap isn’t “whether you know how to use AI” — it’s “whether you can spot when AI got it wrong.”
Tools are amplifiers. They amplify what you already have.
So What Should You Do Tomorrow?
Where coding agents will be in a year isn’t clear yet; the market signal will take another year or two to settle.
But I know one thing: whoever clarifies accountability from the start will find it much easier to manage ROI, responsibility, and risk on the same scorecard.
If you’re reporting AI adoption progress to your boss tomorrow, show more than the numbers. Show the review and accountability mechanisms behind them.
AI doesn’t automatically generate ROI. It only amplifies your existing strengths, or exposes your existing weaknesses.
If this code breaks, do you know who to go to?