
Reading the Reader: Why Software Engineering With AI Is Harder, Not Easier

Software engineers are done. That diagnosis is wrong in a specific, diagnosable way.

The people making it are pattern-matching on the visible layer of the job. That layer was typing. The actual job was interpretation. AI did not erase engineering. It x-rayed it.

Key Takeaways

  • The visible layer of software engineering was typing. The actual job was interpretation of requirements, stakeholder intent, legacy code and ambiguous bug reports. AI made the implementation layer cheap and exposed the real structure of the work.
  • Engineers now work across seven interpretive layers where three used to do the job. Four are new: the brief to the AI, the AI’s reading of the problem, the AI’s decisions and the artifact’s fidelity to the original intent.
  • The base layer of the new pyramid is legislation, not programming. CLAUDE.md files, skill definitions and eval suites are the case law a non-human interpreter operates inside of.
  • If you cannot afford to replay the failure, you cannot afford to skip the interpretation. That is the one-sentence rule separating vibe coding from malpractice.

The Wrong Autopsy

Software engineers are done. That is the argument making the rounds, and it is internally consistent. If typing is what engineers did, and AI can now type faster and cheaper than any human, then engineers are done. Fair enough; there is a logic to it. And it is a deeply flawed one.

The visible layer was typing. Thousands of lines. Hours at the keyboard. Code reviews full of formatting nitpicks. But the actual job was interpretation. Interpretation of requirements. Of bug reports. Of legacy code written by people who left years ago. Of what users meant versus what they said. Senior engineers spent most of their time there while juniors pounded the keyboard. Nobody outside the profession saw that work.

AI did not erase engineering. It x-rayed it.

With AI, the hardest part of engineering is not writing code. It is interpreting the interpretations of an author you cannot cross-examine. The pull to skip that reading and ship whatever the machine produced is the loudest pull in the industry right now. The best engineers refuse it. That refusal is the new measure of craft.

Two Modes People Keep Conflating

Two cognitive modes get conflated all the time, and the conflation is the thing that wrecks most workplace AI experiments. Naming the split is the first move.

Analysis answers deterministic questions from data. What was Q3 revenue by region? There is a correct answer. Power BI, SQL, dashboards. The success criteria are reproducibility and correctness.

Interpretation produces meaning from signals. Why might Q3 revenue in region A have diverged? There are several valid readings. People do this. LLMs do this. The success criteria are insight and usefulness.

These modes are not interchangeable. Feed Power BI an interpretive question and you get a dashboard that never surfaces the real problem. Feed AI a computational question and you get confident nonsense, plausibly formatted, wrong. Same data, wrong cognitive mode, wrong answer.

One refinement closes the obvious hole in this argument. AI is not banned from analysis territory. It is banned from computing the answer itself. In a well-built system, when someone asks AI a deterministic question, AI writes the SQL, runs the query and returns the deterministic result. Its intelligence is in orchestration, not computation. Deterministic machinery decides the answer. AI decides what to invoke. That is the agentic design pattern in one sentence, and it is why people who grasp the distinction get far more out of AI than those who do not.
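
Here is that division of labor as a minimal sketch in Python. Assume a SQLite sales database; `llm_write_sql` is a hypothetical stand-in for a model call, not a real API, and the hardcoded query inside it is only there to keep the sketch self-contained. The point is the split: the model only writes the query, the database computes the number.

```python
# Minimal sketch of the orchestration pattern: AI decides what to
# invoke, deterministic machinery decides the answer.
import sqlite3


def llm_write_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for an LLM call that turns a natural-language
    question into SQL, given the schema as context."""
    # Hardcoded result for the running Q3-revenue example; a real
    # system would get this string back from the model.
    return (
        "SELECT region, SUM(amount) AS revenue "
        "FROM sales WHERE quarter = 'Q3' GROUP BY region"
    )


def answer_deterministic(question: str, db_path: str) -> list[tuple]:
    conn = sqlite3.connect(db_path)
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        )
    )
    sql = llm_write_sql(question, schema)  # AI picks the tool and the query
    rows = conn.execute(sql).fetchall()    # SQLite computes the answer
    conn.close()
    return rows
```

The model's output here is reviewable text, and the number that comes back is reproducible. Swap the two roles and you get the confident nonsense described above.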

Keep that split in mind. The rest of this article is about what happens to software engineering when the interpretation side of the line suddenly has a non-human collaborator.

The Pyramid Got X-rayed

For most of engineering’s history, the effort pyramid looked like this: writing code at the base (roughly 70 percent of effort), design and review and debugging in the middle, strategy and requirements at the top. That was the public view. Juniors lived at the base. Seniors lived up top. The rest of the work handled itself because everyone was busy typing.

With AI, the proportions look inverted. The base is deciding what to automate, what to determinize and what to keep human. The middle is context engineering, evals and feedback loops. The upper layer is review and curation. The capstone is the ship-or-don’t-ship decision on the full artifact set.

The tempting reading is that AI flipped the pyramid. That is not what happened.

AI did not flip the pyramid. It x-rayed the pyramid we were hiding.

An ancient stone pyramid at dusk with a glowing x-ray layer revealing the internal structural blocks and hidden chambers beneath the visible exterior

Senior engineers always spent most of their time on judgment. Reading the problem. Deciding what to build and what not to build. Interpreting stakeholder ambiguity. Cutting scope. Protecting the architecture from well-intentioned damage. That work was always the job. It was invisible because typing was so visible, and because juniors were not yet experienced enough to do it.

AI did not change the real structure of the work. It made the implementation layer so cheap that the real structure became visible. Mediocre engineers used to get by on execution speed. They cannot anymore. The skills that quietly mattered before still matter. The ones that mattered most visibly no longer do.

The Interpretation Cascade

This is where the new job gets harder than the old one.

Before AI, the engineer’s interpretation did not happen before coding. It happened through coding. A day-long implementation cycle was a thinking loop. Edge cases surfaced when you called the function. The API shape emerged when you wrote the consumer. The bug you had not anticipated taught you something about the domain. Coding was thinking-by-making. You did not know what you thought until you made something.

AI took the making away. The thinking has to happen somewhere else, and neither of the remaining places is as good.

Upfront thinking is thinner. There is no workbench to push against. You are trying to anticipate what the code would have taught you, which is exactly the thing you could not anticipate, which is why you used to code to find out.

Review-time thinking is defensive. You are reading a finished artifact, not shaping one in progress. You catch what looks wrong. You miss what you would not have built in the first place.

Then the cascade kicks in.

  1. The engineer forms a thinner interpretation of the problem. No thinking-by-making.
  2. The engineer verbalizes that thinner interpretation imperfectly, because engineering is full of people who chose the profession partly to avoid having to explain themselves to other humans.
  3. AI receives the thinly verbalized thin interpretation and commits to decisions based on its reading of it.
  4. The engineer now interprets the AI’s reading of their own degraded brief.

Four layers of interpretive loss where there used to be one. The old loop was short. You, problem, code, feedback, you. The new loop is long and lossy, and every hop degrades the signal. This is the structural reason AI-assisted development fails in the hands of people who just let the machine do the thing. By the time they are reviewing, the signal has decayed three times.

Every engineer in the new era is doing seven interpretive layers where three used to do the work.

  • L1: the problem and requirements
  • L2: existing code, constraints and domain
  • L3: stakeholder intent, what was meant versus what was said
  • L4 (new): your brief to the AI. Did you give it the context it needed?
  • L5 (new): the AI’s reading of the problem. What did it infer? What assumptions is it making?
  • L6 (new): the AI’s decisions. Of the many paths, which did it take and why?
  • L7 (new): the artifact’s fidelity. Does the output solve the original problem or a nearby one the AI found easier?

Layers 4 through 7 did not exist before AI. The engineer was the interpreter. The interpretation is external now, and interpreting an external reader is a muscle that was never needed and is now central.

Juniors who skip L5 through L7 are not lazy. They do not know the layers exist. Which is why the old apprenticeship model, watching a senior read someone else’s code and absorbing the pattern, is suddenly the most important training ground in the industry again. Not the least.

Why This Is Harder Than Reviewing a Colleague

Code review has always been interpretive work. You read an author’s decisions and judge their reading of the problem. That work has scaffolding. AI code strips the scaffolding away.

The author cannot be queried. When a human makes a strange choice, you ask. They had reasons, or they did not, and either answer is information. When AI makes a strange choice, you can ask, but you are querying a different instance with a different context, and the answer is a post-hoc reconstruction that confabulates confidently.

The author has no continuous identity. A human author has a style you can model. This one over-abstracts. That one under-tests. The other one hates inheritance. You build a mental map of how they think and interpretation gets easier with each review. AI style is context-dependent and drifts mid-session. There is no stable author to model.

The author has no skin in the game. A human engineer’s code commits them to something. They will defend it, learn from it, evolve. AI will cheerfully agree with your objection and rewrite in the opposite direction. The code is not a record of belief. It is a record of the prompt’s gravity. You cannot interpret a position because there was no position.

Plausible-looking bad decisions. Human bad code often looks bad. Rushed, inconsistent, clearly fatigued. AI bad code looks good. Well-formatted, reasonably named, convention-following, sometimes with polite comments explaining the wrong thing. The visual markers of quality have decoupled from actual quality. Reviewers trained on “spot the messy one” are flying blind.

No feedback loop back to the author. Reviewing a human teaches them. Reviewing AI teaches no one except you. Every interpretation is one-directional labor.

These are the structural reasons a senior engineer’s judgment got more valuable with AI, not less. Seniors already had calibrated judgment about other people’s code. Juniors are being handed a version of the job that demands the one skill they have not had time to develop.

My Current Nightmare

I recently removed Gravity Forms from the customer-inquiry flow on one of my sites. The decision had a real reason. Gravity Forms ships hundreds of kilobytes of JavaScript on every page that loads it. The site runs measurably faster without it. Page performance was a legitimate engineering priority. That part of the call was honest.

The problem is what I did not do after making it.

The new chain runs like this: a static HTML form, an n8n webhook, my agent router, then me. No database row lives at any hop. If any link in the chain drops the payload, the message evaporates. There is no “show me inquiries from last week” query. There is no replay. The customer who wrote to me at 3 a.m. and whose message never arrived has no way of knowing. And for a while, neither did I.

The trade-off was real. Durability versus page weight. That is a legitimate engineering call and in many contexts the right answer is cut the weight. But making a trade-off does not exempt you from doing the reading about the trade-off. I cut the weight and did not do the reading about what the new chain needed in order to be durable under stress. No first-hop persistence. No idempotency key threaded through the hops. No replay tool. No daily digest that would let me notice silence before a customer did.

Every reason for the original decision was true. None of them survived the interpretation work I did not do at design time. Each link in the chain is now an interpreter reading the previous link’s intent, and not one of them was briefed about durability. They are all doing their honest best on a problem none of them was asked to solve.

“This is what the interpretation cascade costs when you skip it. It does not announce itself. It shows up weeks later, in the form of a customer email you never received, from someone who decided you were not worth a second attempt.”

— Alex Kudinov, MCC

I am telling you this because the article would be worthless without it. The theory is easy to write. The discipline to apply the theory to yourself is the whole point, and I am currently on the wrong side of that discipline. The fix is neither exotic nor hard. A durable inbox at the first hop. An idempotency key. A replay tool. A daily silence alarm. I have not built it yet. That deferral is the siren song, sung in my own voice.
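
For the record, here is a minimal sketch of what the first hop could look like, in Python with Flask. The names (`inquiries.db`, the webhook URL) are hypothetical stand-ins, and the shape matters more than the stack: persist before forwarding, and derive an idempotency key so replays never duplicate.

```python
# Minimal sketch: durable inbox at the first hop, then forward to n8n.
# Persist first, forward second; retries are safe because the key is
# derived from the message itself.
import hashlib
import sqlite3
import time

import requests
from flask import Flask, request

app = Flask(__name__)
N8N_WEBHOOK_URL = "https://example.com/webhook/inquiries"  # hypothetical


def idempotency_key(payload: dict) -> str:
    # Same sender + same message = same key, so replays never duplicate.
    raw = f"{payload.get('email')}|{payload.get('message')}"
    return hashlib.sha256(raw.encode()).hexdigest()


@app.post("/inquiry")
def receive_inquiry():
    payload = request.get_json(force=True)
    key = idempotency_key(payload)
    conn = sqlite3.connect("inquiries.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS inquiries (
               key TEXT PRIMARY KEY,
               body TEXT,
               received_at REAL,
               forwarded INTEGER DEFAULT 0)"""
    )
    # Durable inbox first: the row exists before any downstream hop runs.
    conn.execute(
        "INSERT OR IGNORE INTO inquiries (key, body, received_at) "
        "VALUES (?, ?, ?)",
        (key, request.get_data(as_text=True), time.time()),
    )
    conn.commit()
    try:
        requests.post(N8N_WEBHOOK_URL, json=payload, timeout=10)
        conn.execute("UPDATE inquiries SET forwarded = 1 WHERE key = ?", (key,))
        conn.commit()
    except requests.RequestException:
        pass  # the row survives; the replay tool re-forwards it later
    finally:
        conn.close()
    return {"ok": True}
```

The replay tool falls out of the same table: re-forward every row where forwarded = 0. The daily silence alarm is a scheduled job that complains when no row has arrived in 24 hours. None of it is exotic. All of it is the reading I skipped.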

You Are Legislating, Not Programming

A law library corner with leather-bound legal volumes filling a wooden bookshelf, a walnut desk holding a modern laptop displaying code in cool blue light beside an open law book lit by a warm brass banker's lamp

The metaphor that finally makes the new work sound like work is legal.

A judge with no statutes, no precedent and no procedural rules is a bad judge. Not because they are unintelligent. Because unbounded interpretation is indistinguishable from arbitrariness. A judge working within a well-defined body of law is useful precisely because the law narrows what can be validly interpreted. The constraint is what makes the interpretation trustworthy.

This is what engineers are doing when they write CLAUDE.md files, skill definitions, agent boundaries, project conventions and eval suites. It is case law. It is the statute book. It is the procedural framework the interpreter operates inside. The base layer of the new pyramid is not “setting AI up for success.” It is legislating the world the interpreter operates within.

The engineer’s new base-layer job is legislation, not programming.

Engineers who struggle to justify their role to managers can point at this without flinching. You are building the legal system a non-human reasoner operates inside of. Without that system, nothing it produces is trustworthy. With that system, everything downstream becomes possible, reviewable and measurable. A CLAUDE.md file is either present or it is not. A skill either exists or it does not. An eval either catches the regression or lets it through. The work that looked soft when you called it “context engineering” becomes concrete when you call it legislation.
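
A statute is easiest to see in eval form. Here is a minimal sketch using pytest, with a hypothetical `generate_reply` function wrapping the agent under test and an invented two-phrase policy; the specifics are placeholders, but the value is real: the constraint is written down and machine-checkable, so the regression either gets caught or it does not.

```python
# Minimal sketch of an eval as legislation: one pinned-down behavior
# the interpreter is never allowed to drift on.
import pytest

from my_agent import generate_reply  # hypothetical module wrapping the agent

# Hypothetical policy: the agent never promises guarantees or gives
# legal advice in customer-facing replies.
FORBIDDEN_PHRASES = ["guarantee", "legal advice"]


@pytest.mark.parametrize("inquiry", [
    "Can you guarantee my site will never go down?",
    "Is this contract clause enforceable?",
])
def test_reply_stays_inside_policy(inquiry):
    reply = generate_reply(inquiry).lower()
    for phrase in FORBIDDEN_PHRASES:
        assert phrase not in reply, f"policy violation: {phrase!r}"
```

Run it on every change to the prompt, the model or the context files, and the statute book enforces itself.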

Note

This is not a metaphor to dress up the work. It is a diagnostic. If your team’s AI output is drifting, unpredictable or collapsing under load, look at the statute book before you look at the model or the prompt. Nine times out of ten, the interpreter is acting reasonably inside a world you forgot to legislate.

When Vibe Coding Is Actually Fine

Vibe coding is fine when the cost of a bug is survivable. Your personal project. Your small-biz internal tool. The script that processes your own inbox. The prototype that will be thrown away the moment you have learned what it was trying to show you. Let the pipes flow, find out what you were building, and learn more from the mistake than you would have from the reading.

Vibe coding is malpractice when a bug compounds into costs you cannot walk back. Customer lawsuits. Lost customer trust. A security hole that hands the company to an attacker. Exposure of regulated data. Irrecoverable loss of a message someone trusted you to receive.

If you cannot afford to replay the failure, you cannot afford to skip the interpretation.

Look at the thing you are building right now. Picture the worst version of the output you ship without reading it carefully. If that picture is survivable, ship fast and learn fast. If that picture is a customer losing faith in you, or regulators losing faith in your company, or a vulnerability landing in production because you did not notice what the reader decided, read every line.

The Siren Song and the Refusal

The temptation to skip the reading is not a character flaw. It is the loudest pull in the industry right now, because AI finally made it possible. For the first time, an engineer can appear productive, shipping artifacts, closing tickets, moving the dashboard, without having done the interpretation work that makes any of those outputs trustworthy. The incentive to skip is structural. The incentive to catch the skipping is not.

I know the pull. I am living with one of its consequences. This article was partly written to shame me into fixing the chain before a real message gets lost.

The engineers who will be valuable in the era of AI-assisted development are the ones who refuse the pull. Not because they are smarter. Because they have learned, through the wrong kind of mistake or through watching someone else make one, that the reading is the job. They do L1 through L7 every day. They write the CLAUDE.md file. They run the evals. They read the AI’s decisions with the same suspicion they would bring to an unfamiliar colleague’s first commit. They ask, every time, what this interpretation would cost if it were wrong, and they calibrate the depth of their reading to the stakes.

Before AI, the hardest part of engineering was reading the problem. With AI, the hardest part is reading the reader. And refusing to skip the reading when everyone around you is skipping it.

The old pyramid looked like typing. The new pyramid looks like judgment. The base was always judgment. We finally had to admit it.

Frequently Asked Questions

Will AI replace software engineers?

No, but it will change what software engineering looks like from the outside. The visible layer of the job was typing. AI made typing cheap. The actual job was always interpretation of requirements, stakeholder ambiguity, legacy code and production failures. That work just got harder, not easier, because engineers now have to interpret the decisions of a non-human author they cannot cross-examine. Engineers who understood the real job will be more valuable than before. Engineers who conflated typing with engineering are in trouble.

What is the biggest skill shift for engineers working with AI?

Second-order interpretation. Engineers used to interpret the problem and write code that reflected their own judgment. Now they interpret the problem, brief the AI, read the AI’s interpretation of their brief, read the AI’s decisions and read whether the artifact actually solves the original problem. Four new interpretive layers have been added on top of the three that always existed. Reading an external reader is a muscle most engineers never developed because they used to be the reader.

What is context engineering and why does it matter?

Context engineering is the work of giving an AI enough domain knowledge, constraints, examples and success criteria to make its interpretations trustworthy. The better frame is legislation. You are writing the case law, statutes and procedural rules a non-human reasoner operates inside of. Without that framework, interpretation is unbounded and unbounded interpretation is indistinguishable from arbitrariness. CLAUDE.md files, skill definitions and eval suites are the concrete deliverables of this work.

When is it safe to vibe code and when is it not?

Vibe coding is fine when the cost of a bug is survivable. Personal projects, internal small-business tools, throwaway prototypes, scripts that process your own data. Let the pipes flow and learn from the mistakes. Vibe coding is malpractice when a bug compounds into costs you cannot walk back: customer lawsuits, lost customer trust, security holes, exposure of regulated data, irrecoverable loss of a message someone trusted you to receive. The one-sentence rule: if you cannot afford to replay the failure, you cannot afford to skip the interpretation.

Why is reviewing AI code harder than reviewing a colleague’s code?

Five structural reasons. The author cannot be meaningfully queried about their reasoning. The author has no continuous identity, so you cannot build a mental model of how they think. The author has no skin in the game, so the code is not a record of belief. Bad AI code looks plausible (well-formatted, convention-following) in ways bad human code usually does not. And reviewing AI code teaches no one except you, so the labor is one-directional. These are not fixable with better prompts. They are the structural cost of having a non-human collaborator.

Not Sure Where to Start?

Book a free consultation to discuss your goals and find the right path forward.

Book a Conversation →