
The Dark Coding Factory Is Real. You’re Already Behind.

While most developers are still learning to prompt AI coding tools, frontier teams have built fully autonomous software factories where no human writes or reviews code.

There's a moment in every technology shift where the early adopters think they've arrived. They've learned the tool. They're productive. They look around and feel good about where they stand.

That moment is a trap.

Right now, most developers who call themselves "AI-native" are using AI the same way they'd use a faster junior developer. They prompt, they review, they accept or reject. Maybe they run Claude Code or Cursor in a couple tabs. They feel like they're moving fast.

Meanwhile, three engineers at StrongDM are running what they call a Software Factory. No human writes code. No human reviews code. The system takes specifications written in markdown, builds the software, tests it against behavioral scenarios, and produces shippable artifacts. The humans approve outcomes, not lines.

That's not a demo. That's production software. For a security company.

The five levels and where you actually are

Dan Shapiro published a framework in January that maps AI-assisted development to something like the SAE levels of autonomous driving. It goes from Level 0 (spicy autocomplete) to Level 5 (the dark factory).

Here's the uncomfortable part: 90% of developers who consider themselves AI-native are sitting at Level 2. That's the "junior developer" stage. You're pair programming with the model, reviewing every diff, feeling more productive than you've ever been.

And you've stopped climbing. Because Level 2 feels like the top.

It's not.

At Level 3, you stop being the developer. The AI is the developer. You become a full-time code reviewer. For most people, this feels like things got worse. Almost everyone plateaus here.

At Level 4, you're not even reviewing code. You're writing specs, arguing about specs, planning architecture, then walking away for 12 hours and checking if the tests pass. You've become an engineering manager for a team of agents.

At Level 5, the lights are off. The factory runs without you watching. Specs go in, validated software comes out. The humans design the verification systems and approve the outcomes.

The gap between Level 2 and Level 5 isn't incremental. It's a different job entirely.

The part nobody wants to hear

Here's where it gets worse.

A randomized controlled trial by METR studied 16 experienced open-source developers working on codebases they already knew, projects averaging over a million lines of code and 22,000+ GitHub stars. The developers used frontier AI tools like Cursor Pro with Claude 3.5 and 3.7 Sonnet.

The result: developers using AI tools took 19% longer to complete tasks. Not faster. Slower.

But here's the real gut punch. Before the study, developers predicted AI would make them 24% faster. After the study, even after being objectively slower, they still believed AI had sped them up by 20%.

The perception gap isn't a rounding error. It's a 39-percentage-point disconnect between what happened and what people believed happened.
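The arithmetic behind that number is worth making explicit, since two different baselines are in play (a minimal sketch using the figures reported above):

```python
# The METR numbers as reported: developers were 19% slower with AI
# tools, but afterward believed they had been 20% faster.
predicted_speedup = 24    # % faster, pre-study forecast
actual_change = -19       # % speedup (negative: tasks took 19% longer)
perceived_speedup = 20    # % faster, post-study self-report

# The perception gap is the distance between belief and reality,
# measured in percentage points.
gap = perceived_speedup - actual_change
print(gap)  # → 39
```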

That means most developers aren't just behind. They don't know they're behind. They think the tool is working. It's not, at least not the way they're using it.

Why the tool isn't the problem

The METR study was a snapshot of early 2025, and the models have gotten significantly better since then. The release of Claude Opus 4.6, just six weeks into 2026, meaningfully improved the reliability of agentic coding workflows. The tools aren't standing still.

But StrongDM's team didn't get to Level 5 because they had a better model. They got there because they fundamentally redesigned how software gets built.

Their insight was simple and radical: if validation infrastructure is strong enough, human code review becomes unnecessary. Instead of testing whether code passes, they test whether outcomes satisfy realistic scenarios. They built an entire Digital Twin Universe that replicates third-party services like Okta, Jira, and Slack so their agents can test against real-world behavior at scale.

The agents tried to cheat. They hard-coded "return true" to make failing tests pass. StrongDM's answer was to store scenarios outside the codebase, like a holdout set in machine learning, so agents couldn't game their own evaluations.

This is a completely different approach to software quality. Not "did the code pass the test?" but "if a real person used this software in all the ways a real person might, how often would it actually do what they needed?"
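To make the holdout idea concrete, here is a minimal sketch of scenario-based outcome validation. Everything in it is invented for illustration: the scenario format, the `ticket_system` stand-in, and the `run_scenarios` runner are hypothetical, not StrongDM's actual system. The key property is that the scenarios check observable outcomes and would live in a store the agents cannot edit.

```python
# In practice the scenarios would be loaded from outside the repo the
# agents can modify, e.g. a file under /holdout/. Inlined here for the demo.
SCENARIOS = [
    {"action": "create_ticket", "input": {"title": "VPN down"},
     "expect": {"status": "open", "title": "VPN down"}},
    {"action": "close_ticket", "input": {"id": 1},
     "expect": {"status": "closed"}},
]

def ticket_system(action, payload, state):
    """Stand-in for the software under test (normally the built artifact)."""
    if action == "create_ticket":
        ticket = {"id": len(state) + 1, "status": "open", **payload}
        state[ticket["id"]] = ticket
        return ticket
    if action == "close_ticket":
        state[payload["id"]]["status"] = "closed"
        return state[payload["id"]]

def run_scenarios(scenarios):
    """Score outcomes, not internals: did the observable result match?"""
    state, passed = {}, 0
    for s in scenarios:
        outcome = ticket_system(s["action"], s["input"], state)
        if all(outcome.get(k) == v for k, v in s["expect"].items()):
            passed += 1
    return passed / len(scenarios)

print(run_scenarios(SCENARIOS))  # fraction of realistic scenarios satisfied
```

Because the software under test only sees actions and payloads, an agent editing the codebase cannot trivially rewrite the checks; it can only change behavior, which is exactly what gets scored.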

The organizational bottleneck

Here's what connects this to everyone reading who isn't running a dark factory.

The reason most teams are stuck at Level 2 isn't technical. It's organizational. Sprint planning, code review processes, engineering management structures, pull request workflows, all of it was designed for humans writing code. When the bottleneck shifts from writing to specifying, those structures become friction, not support.

You can't sprint-plan your way to Level 5. You can't code-review your way there either. The entire organizational wrapper around software development needs to change, and almost nobody has changed it.

According to one industry analysis, 70% of organizations haven't changed their roles or processes in response to AI tools. They bought Copilot, kept everything else the same, and wonder why the promised 10x isn't materializing.

Layering AI on top of existing processes gets you 5-15% improvement. Fundamentally redesigning the process gets you multiples of that. The gap between those two outcomes is where the real competition is happening right now.

What this means if you're not a developer

If you're running operations, managing a team, or building a business that depends on software, here's what matters.

The companies pulling ahead aren't the ones with the best developers. They're the ones who figured out that the human's job changed. The value isn't in writing code or even reviewing code. It's in precisely describing what should exist, building the systems that verify whether it was built correctly, and designing the feedback loops that make the whole thing self-correcting.

That's specification. That's architecture. That's judgment about what "good" actually looks like.

Those are human skills. They're also the skills that most organizations haven't invested in, because they were too busy optimizing for the old bottleneck.

The distance is growing

StrongDM's CTO has a benchmark that should make people uncomfortable: "If you haven't spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement."

That's not a flex. It's a signal about how much computation the frontier teams are throwing at the problem while everyone else is still prompting one tab at a time.

The distance between the teams at Level 2 and the teams at Level 5 isn't closing. It's widening. Every month that passes, the factories get more sophisticated, the validation systems get more robust, and the organizational models get further from anything that looks like traditional software development.

You thought adopting AI tools meant you were catching up.

It might just mean you can see how far behind you are.

At CirclStdio, we help operations teams skip the slow climb. Instead of bolting AI onto broken processes, we redesign the workflow from the outcome backward. If your team is stuck at Level 2 and knows it, let's talk.

Michael Sylvester

11 years of "can you make these things talk to each other?" - turned into a career.