03 — Building with Claude Code inside a 20-year-old C++ codebase

If the WebDAV plugin were a greenfield side project, the AI collaboration story would be simple. But it wasn’t. It had to fit inside pwsafe — a mature, security-sensitive C++ codebase with a long history, multiple UI front-ends, and very clear architectural boundaries between core, UI, and OS layers.

That constraint changed everything. This wasn’t ‘generate me a microservice’. It was ‘help me modify a living system without breaking its design philosophy’.

Understanding before generating

One of the first surprises working with Claude Code was how effective it was when treated as a reader first and a writer second. Before writing any transport code, we spent time mapping the architecture: where file I/O actually enters the system, how the UI talks to the core, how commands are structured, and how locking was historically implemented.

The breakthrough insight — intercepting only pws_os::FOpen and pws_os::FClose — came out of that architectural reading phase. Those two functions are the only file entry points in the Unix build. If you can intercept them safely, you don’t need to modify the core, the encryption layer, or the UI logic at all. Everything else remains blissfully unaware that the ‘file’ might actually be a URL.

Claude was particularly strong at tracing call paths across files and summarising them back in plain English. In a large C++ codebase, that kind of orientation normally takes hours of manual grepping and note-taking. Here, it became an interactive conversation: ‘If we change this, what else does it affect?’

Designing the plugin ABI

The plugin ABI was small by design: a struct of function pointers for fetch, store, exists, lock and unlock. We iterated repeatedly on what belonged in that interface and what didn’t. The rule became: keep it minimal, keep it stable, and keep error handling boring (POSIX errno values only).

What surprised me was how useful Claude was at arguing against complexity. Several early designs were more ambitious — more callbacks, more abstraction — and the model consistently pushed toward a smaller, more testable surface area. In hindsight, that constraint probably prevented a lot of future pain.

The hardest problem: the lock daemon

The real stress test for AI collaboration was the lock daemon. WebDAV locking requires live network calls. But pwsafe’s UnlockFile can run from signal handlers and destructors — places where most libraries (including libcurl) are unsafe. The naive design would eventually leave server locks hanging after a crash.

The solution — fork a child process that owns the lock token and communicates over a Unix socket — emerged through iterative reasoning. We explored thread-based designs, in-process registries, and various shutdown hooks before converging on a separate process as the only design that satisfied async-signal-safety constraints.

This was not a one-shot ‘AI writes perfect code’ moment. It was a back-and-forth: propose, critique, refine. Claude was particularly strong at enumerating edge cases: what happens on SIGKILL? What if the parent crashes? What if the child inherits file descriptors? Each question forced the design to become more robust.

At the same time, it wasn’t infallible. The missing SOCK_CLOEXEC flag — which would have allowed unrelated child processes to inherit the daemon socket and keep locks alive indefinitely — was only caught in a later manual self-review. That was a humbling reminder: AI reasoning is broad, but not omniscient.

Working in phases

The development unfolded in six phases: infrastructure, a file:// reference plugin, UI updates, the WebDAV implementation, the lock daemon, and finally the ‘Open URL…’ menu integration. Treating the work as explicit phases turned out to be critical.

With Claude, each phase became a bounded problem. First make the intercept machinery work locally. Then prove the plugin ABI with a trivial file transport. Only then add real networking. This incremental approach reduced risk and made failures diagnosable.

One pattern that worked well was asking the model to explain the code we had just written — sometimes line by line. If the explanation sounded confused or hand-wavy, that was often a sign the design itself was unclear.

What surprised me

Three things stood out.

First, speed of exploration. I could sketch three different architectural options in an afternoon and pressure-test each one conversationally. That dramatically lowered the cost of considering better designs.

Second, the model’s ability to hold a large mental map of the system. When we discussed the interaction between FClose, the in-memory lock registry, and the daemon’s copy of the token map, it could reason across files in a way that felt closer to pair programming than autocomplete.

Third, the need for discipline. It’s easy to let an AI generate large swathes of code quickly. In a security-sensitive project, that’s dangerous. Every non-trivial block still required human review, test coverage, and explicit reasoning about failure modes.

AI as collaborator, not replacement

By the end of the project, it felt less like ‘Claude wrote this’ and more like we co-designed it. The model accelerated the mechanics — drafting code, checking invariants, suggesting edge cases — but the responsibility for architecture, threat modelling, and final judgement remained human.

The most productive mental model was not ‘AI as junior developer’ or ‘AI as magic generator’, but ‘AI as tireless design partner’. It never got bored of refactoring an interface. It never complained about re-running the reasoning from first principles. But it also needed direction, constraints, and occasionally correction.

In a legacy C++ codebase where small mistakes can have subtle consequences, that balance turned out to be powerful — and occasionally nerve-wracking.