AI Made Production Easier, But Made Work Harder
My Day-to-Day Work Is Unrecognizable
Six months ago I was still writing code. A typical day now looks like this: define the boundaries of a task, write out the acceptance criteria, hand it off to an agent, wait for results, review them, push back if something is wrong, let it fix the issue, then run the loop again. Repeat.
Sometimes I have several agents running in parallel on different tasks at the same time. By end of day, my commit volume matches what I used to ship in a week. But if you asked how many lines of code I wrote today, the answer is probably zero.
At first I called this "AI-assisted programming." Then I realized that framing is completely wrong. "Assisted" implies I'm still the one writing code, with AI helping on the side. What actually happened is a role swap: AI writes the code, I make decisions and review the output.
This isn't an efficiency improvement. It's a job transition.
The three things I actually spend time on now: deciding the shape of an API, defining what should and shouldn't be built, and verifying that output matches expectations. The implementation itself is fully delegated. Experience hasn't disappeared; it's migrated from the syntax-and-frameworks layer to the judgment layer. You need to know what "correct" looks like, but you no longer need to write it yourself.
The subtlest part of this shift is that it makes "knowing how to use AI" less important than it sounds. What actually creates the gap is whether you can verify AI output. Prompt writing has a ceiling. The ability to design a verification loop has none. Someone who can quickly judge output quality in a tight feedback loop will out-produce someone who throws requirements at ChatGPT and accepts the first reply, and the gap between them is measured in orders of magnitude.
My Understanding of "Reliability" Got Flipped
Traditional software engineering holds a deep-rooted belief: controlled processes produce controlled results. Code reviews, process standards, branching strategies: all of these constrain the process. The more rigorous the process, the more reliable the outcome.
I used to believe this too. Then I started using agents heavily for coding and noticed something counterintuitive: the more I tried to control the AI's execution process, the less stable the results were.
When I wrote "you must write tests before implementation" in the system prompt, the AI sometimes followed it and sometimes didn't. Adding more constraint items shifted the AI's attention from completing the task to complying with the constraints. The more granular the control, the more things went wrong. This is what I'd call constraint inflation.
So I switched approaches: stop managing the process, manage only the outcome. Lock down the acceptance criteria, define the isolation boundaries, and leave the process alone.
Concretely, this means making tests the only hard constraint. A test either passes or fails with no middle ground. What the AI does inside the test-fail-fix loop is essentially a search: each failure provides a clear error signal, each fix narrows the search space. As long as the acceptance framework is solid, the path the AI takes to reach the correct result doesn't matter.
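The loop above can be sketched in a few lines. This is a minimal illustration, not a real harness: propose_fix stands in for a coding agent, and the toy "search space" of implementations is an assumption made purely to keep the example self-contained and runnable.

```python
# Outcome-determinism sketch: acceptance tests are the only hard constraint.
# How the "agent" reaches a passing state is deliberately left open.

def run_acceptance(candidate) -> list[str]:
    """Return failure messages; an empty list means the candidate is accepted."""
    failures = []
    if candidate(2, 3) != 5:
        failures.append("add(2, 3) should be 5")
    if candidate(-1, 1) != 0:
        failures.append("add(-1, 1) should be 0")
    return failures

def propose_fix(candidate, failures):
    # Stand-in for a coding agent (hypothetical, not a real API): each failure
    # is an error signal that narrows the search. Here the "search space" is
    # just a toy list of implementations.
    space = [lambda a, b: a - b, lambda a, b: a * b, lambda a, b: a + b]
    for impl in space:
        if not run_acceptance(impl):
            return impl
    return candidate

def outcome_loop(candidate, max_rounds: int = 5):
    for _ in range(max_rounds):
        failures = run_acceptance(candidate)
        if not failures:  # acceptance criteria met; the path taken is irrelevant
            return candidate
        candidate = propose_fix(candidate, failures)
    raise RuntimeError("budget exhausted without meeting acceptance criteria")
```

Note that nothing in the loop inspects how the fix was produced. The only thing that terminates it is the acceptance check, which is the whole point: lock the outcome, leave the process alone.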
This is a significant shift in engineering philosophy. Before, writing code was about implementation steps. Now it's about constraints, verification, and isolation. Reliability used to come from process determinism; now it comes from outcome determinism.
I call this the "outcome determinism" paradigm. Its prerequisite is that you can precisely define what "correct" means. If you can't define that, AI can't help you, because it has no converging target. And defining what "correct" means is precisely where engineering judgment is most concentrated.
Pushing this one step further, multi-agent architecture design becomes clearer too. A lot of people imagine multi-agent systems as a group of specialists with different roles collaborating: one agent as PM, one as frontend, one as backend. I tried this. It didn't work well. AI isn't human, and there's no theoretical basis for slicing agents along human job titles. The real problems to solve are how to partition context, how to synchronize across agents, and how to aggregate results. Strip away the anthropomorphizing and what remains is a context management problem.
But I'm More Exhausted Than Before
This is the thing I least expected.
After AI took over the coding work, I assumed I'd have more slack. Output went up, manual labor went down. The actual experience is the opposite: I'm significantly more tired than before.
I've thought carefully about why. When I was writing code, a large portion of the time was spent on low-cognitive-load tasks: puzzling over a regex for an hour and feeling relieved when it finally clicked; scanning StackOverflow and picking the best of three answers; running a build, waiting two minutes, listening to a podcast in the meantime. These look like work, but they're actually covert cognitive rest. The brain recovers during the gaps between simple tasks.
AI fills all those gaps. Now the entire day is high-density decision-making: how to break down this task, are these acceptance criteria right, is this output quality good enough, what's next. No spacing out, no listening to music, no waiting for builds.
Empirical research backs this up. An 8-month embedded study from Berkeley at a 200-person company concluded that AI tools didn't reduce working hours; they actually made people work more. This matches my experience exactly.
The deeper reason is that AI eliminates the natural "wait and recover" windows. Spacing out while writing code by hand, looking up documentation, running tests: these were rhythm-built breathing spaces. Now the rhythm is continuous. People can keep pushing forward even when running on the last fumes of cognitive capacity, because AI lowers the execution cost of each individual step. Without realizing it, you run yourself to a more extreme state.
This raises a concrete question. If a tool increases output but also increases depletion, is it making everyone more valuable, or accelerating everyone toward burnout? If you used to solve 5 problems a day and now solve 50, but your pay hasn't changed, the unit price per problem has fallen by an order of magnitude. This isn't AI-specific; every productivity revolution in history has faced the same distribution problem. But AI's speed means it's arriving unusually fast.
My current approach is simple: when 8 hours are up, I close the laptop. The "let your brain go blank" phase that writing code used to provide is gone, so the window of high-quality decision-making available each day is shorter. Accepting that is more rational than pushing through and pretending there's another round left in the tank.
What's Valuable Is Migrating
If code can be rewritten at any time and prompt techniques have a ceiling, what's going up in value?
The trend I'm observing is context organization capability.
An example. Two people, same Claude Code setup. One starts from scratch every time: open the chat, describe the requirement, take the result, close it. Repeat next session. This is consumptive usage; context resets to zero with each interaction.
The other approach treats the context built up in AI collaboration as infrastructure to be maintained. What I've been doing is structuring a project's rules, skills, memory, and principles into AI-consumable markdown files and keeping them in a monorepo. Every time an agent starts up, it doesn't read a blank slate. It reads several months of accumulated decisions, preferences, and judgment.
This gap widens over time. The ceiling on consumptive usage is the quality of a single-session prompt. The ceiling on investment-based usage is how much long-term semantic capital you can turn into an operating system that agents can call on. The former is linear. The latter compounds.
One layer above context organization, and even scarcer, is taste. AI can generate anything. But judging what's worth generating, which direction is right, what quality is "good enough": these aren't learnable from training data. Execution-layer ICs face replacement; the taste layer gets amplified. The stronger AI gets, the more leverage people with taste have, and the easier it is for people without taste to be replaced by AI's average output.
Distilling a project is easy. Writing a project's specs, processes, and decision logic into agent-readable documentation is something most people can do. Distilling a person is hard. Architecture taste, design instinct, judgment about what should and shouldn't be built: these things are difficult to fully capture in rules. They live in accumulated practice and show up as the ability to glance at something and know immediately whether it's right.
This gives me a clearer priority for how to invest my time. Rather than spending more time learning new prompt techniques or chasing new models, I'd rather put energy into two things: structuring my existing experience and judgment into reusable context assets, and continuing to accumulate in real projects the taste that only comes from doing.
Software Engineering Is Transforming, Not Disappearing
A popular narrative says AI will make software engineering obsolete. My read is the opposite: AI makes the core problems of software engineering more important.
The reason is simple. Design patterns don't abstract functionality. They abstract change. Interface segregation, dependency injection, layered architecture: these don't solve "how do you write the code," they solve "how do you change the code when requirements shift." AI can generate an entire feature in one shot, but without proper isolation, the first modification scatters the problem across the whole codebase.
AI itself is also constrained by context window limits. Good isolation means each modification requires less context, so AI performs more consistently. Poor isolation means AI needs to understand the entire system to make a local change, and failure probability rises exponentially.
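A toy example of what "less context per modification" means in practice. The names here are mine, invented for illustration: the service depends only on a narrow interface, so an agent asked to swap the storage backend needs to read the interface, not the whole system.

```python
# Isolation via a narrow interface: a change to the storage backend never
# touches NoteService, so the context needed for that change stays small.
from typing import Protocol

class Storage(Protocol):
    def save(self, key: str, value: str) -> None: ...
    def load(self, key: str) -> str: ...

class InMemoryStorage:
    """One backend; a Redis or SQLite backend could replace it without
    any edit to NoteService."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]

class NoteService:
    # Depends only on the Storage protocol. This boundary is exactly the
    # context an agent needs to modify storage behavior.
    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def add_note(self, key: str, text: str) -> None:
        self.storage.save(key, text)
```

Without that boundary, "change how notes are stored" forces the agent to understand every call site that touches storage, and that is precisely where the context window runs out.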
Good architecture used to keep human engineers from overworking. Now it keeps AI agents from making mistakes. The goal changed. The principles didn't.
If you push this thinking all the way, you arrive at an interesting place: building infrastructure for AI is essentially the compiler development of this era. Compiler authors never write application code. They write the infrastructure that lets others write code more effectively. Now that AI is increasingly taking on the application code role, the people designing constraints for AI, scaffolding it, building context infrastructure, are this era's compiler authors.
This analogy gives me a clear sense of positioning. I don't need to compete with AI on who writes code faster. That race is over. What I need to do is turn my judgment into infrastructure that AI can consume, so that AI runs better inside my boundaries than it does for anyone else.