Case Study: Accelerating Legacy Platform Modernization with Agentic AI

A North American enterprise ran its operations on a platform that had been in production for over two decades—layers of business logic accumulated across many years and many hands, much of it locked in legacy stored procedures, with routing decisions buried in nested T-SQL that no single person fully understood.

Modernizing a system like this is not optional. But doing it without disrupting live operations demands a level of precision that makes most teams move slowly, cautiously, and expensively.

This is the story of how an agentic AI workflow—built on Anthropic's Claude and the Model Context Protocol—turned a slow, high-risk migration into steady, reviewable, low-risk delivery, and how the approach spread from one engineer to an entire team.

The challenge

This was a large, long-lived enterprise codebase: code spanning many years of .NET versions, organized across numerous solutions and projects, with critical behavior split between application code and a large body of legacy stored procedures and services. Each piece was wired differently, documented inconsistently, and modified by different people over different years.

Much of that logic implemented business-critical rules where a single wrong decision is a serious incident. The team needed bug-for-bug compatibility with legacy behavior while simultaneously improving the architecture—in a domain where "close enough" is not acceptable.

The goal was to migrate critical logic out of legacy stored procedures and services into clean, testable, modern C#. The motivation was clear: the legacy code was undocumented, its interactions opaque, and the knowledge to maintain it concentrated in a handful of long-tenured people. When they were unavailable, work stalled.

But this was not a greenfield effort with a dedicated migration team. It had to happen alongside ongoing feature development, production support, and compliance obligations—carried out by engineers who also had other responsibilities.

The approach

Rather than treating AI as a coding autocomplete, the work centered on giving the model deep, accurate context about the codebase, the data, the business rules, and the team's conventions. The approach rested on three layers: structured project knowledge, custom skill automation, and a centralized tool layer for cross-system integration.

Structured project knowledge

Detailed instruction files were maintained at multiple levels—global preferences, repository-wide conventions, and area-specific context including schema, architectural standards, and anti-hallucination rules. These were not generic documentation; they encoded hard-won lessons (for example, that the migrated C# is the source of truth, not the legacy stored procedures). A persistent memory system tracked ongoing project state and feedback corrections across sessions.

Custom skill automation

Purpose-built skills collapsed repetitive multi-step workflows into single commands: pulling the latest code, fetching the relevant work item, setting up a correctly named branch and worktree, and running a structured pull-request review. The review command launched several specialized agents in parallel (standards compliance, bug scan, history context, test quality, and requirements-gap analysis), each independently analyzing the diff against the project's actual standards. Their findings were then aggregated with confidence scores and filtered for known false positives.
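As an illustration of the branch-and-worktree step, here is a minimal Python sketch. The case study does not publish the team's naming convention or directory layout, so the `feature/<id>-<slug>` pattern, the `../worktrees` directory, the `origin/main` base, and both function names are assumptions made for the example. The function only builds the git commands and leaves execution to the caller.

```python
import re


def branch_name(work_item_id: int, title: str) -> str:
    """Build a branch name from a work item (hypothetical naming convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"feature/{work_item_id}-{slug[:40].rstrip('-')}"


def worktree_commands(work_item_id: int, title: str,
                      worktrees_dir: str = "../worktrees",
                      base: str = "origin/main") -> list[list[str]]:
    """Return the git commands that create a dedicated branch and worktree."""
    branch = branch_name(work_item_id, title)
    path = f"{worktrees_dir}/{work_item_id}"  # one directory per work item (assumed)
    return [
        ["git", "fetch", "origin"],
        ["git", "worktree", "add", "-b", branch, path, base],
    ]
```

Each command list could be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the naming rule easy to test on its own.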

These skills were not generic templates—they understood the project's real conventions, so the review distinguished genuine issues from accepted framework patterns that merely look like bugs to a naive reviewer.

A centralized tool layer

A central layer connected the AI workflow to the systems developers actually used—source control and work tracking, internal knowledge and documentation, and project context—each behind appropriate access controls. It included semantic search over the team's internal knowledge base, structured access to work items and pull requests, and a self-service catalog of available tools.

When a developer opened a session in a given part of the codebase, a startup hook automatically matched the working directory and loaded exactly the right context for that area—silently, without the developer doing anything. Every session, across dozens of parallel worktrees, started already oriented to the code at hand.
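One way such a hook could resolve context is sketched below in Python. This is not the team's implementation: the `AREA_CONTEXT` mapping, the file paths in it, and the `resolve_context` function are hypothetical. The sketch shows the idea of matching the working directory against known areas and layering general context before area-specific context.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from repository areas to context files.
AREA_CONTEXT = {
    "src": ["context/repo-conventions.md"],
    "src/routing": ["context/routing.md", "context/schema-routing.md"],
    "src/billing": ["context/billing.md"],
}


def resolve_context(working_dir: str, repo_root: str) -> list[str]:
    """Return the context files for a working directory, most general first."""
    rel = PurePosixPath(working_dir).relative_to(repo_root)
    matched = []
    for area, files in AREA_CONTEXT.items():
        area_path = PurePosixPath(area)
        # An area applies if the session is in it or anywhere beneath it.
        if rel == area_path or area_path in rel.parents:
            matched.append((len(area_path.parts), files))
    matched.sort(key=lambda item: item[0])  # shallow (general) areas load first
    return [name for _, files in matched for name in files]
```

A session opened in `/repo/src/routing/rules` would receive the repository-wide conventions followed by the routing files, while a session at the repository root would receive nothing from this table.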

Key technical decisions

Build the tool layer in-house

Building the integration layer in-house was a pragmatic choice—the team needed tight integration with on-premises systems behind a corporate firewall, with authentication and hosting that off-the-shelf tooling didn't handle cleanly. Central hosting meant unified authentication, logging, and updates without touching individual developer machines.

Safety and governance by design

The system enforced clear boundaries. Data access defaulted to safe, non-production environments, with explicit opt-in required for anything riskier. Permitted operations were whitelisted; destructive ones required explicit approval. Anti-hallucination rules were embedded directly in the project context—"never reshape evidence to fit earlier assumptions," "say 'unclear, need to inspect X' instead of guessing"—and a human stayed in the loop on every change, with access restricted to named users. Governance was the starting point, not an afterthought.
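The boundaries described above can be expressed as a small policy check. The following Python sketch is illustrative only: the operation names, the `Decision` values, and the `authorize` signature are assumptions, and the real system's rules are not published in this case study.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical operation sets; a real deployment would define its own.
PERMITTED = {"select", "read_file", "run_tests"}
DESTRUCTIVE = {"delete", "update", "drop_table"}


def authorize(user: str, operation: str, environment: str = "dev", *,
              named_users: set[str],
              opt_in_production: bool = False,
              approved: bool = False) -> Decision:
    """Apply the boundaries in order: user, environment, then operation."""
    if user not in named_users:
        return Decision.DENY  # access is restricted to named users
    if environment == "production" and not opt_in_production:
        return Decision.DENY  # safe environments are the default
    if operation in DESTRUCTIVE:
        return Decision.ALLOW if approved else Decision.NEEDS_APPROVAL
    if operation in PERMITTED:
        return Decision.ALLOW
    return Decision.DENY  # anything not explicitly listed is refused
```

The final `DENY` is the important design choice: an operation nobody thought to classify is refused by default.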

Feature flags for gradual migration

Each migration was deployed behind a feature flag, allowing the legacy and modern paths to run side-by-side in production. This let the team validate behavioral equivalence with real traffic before cutting over, and the AI workflow understood the pattern and applied it consistently to each new migration.
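A side-by-side rollout of this kind is often implemented with a flag that has a comparison mode. The Python sketch below assumes three flag values (`"legacy"`, `"shadow"`, `"modern"`) and a simple in-memory mismatch list. Neither detail comes from the case study, which says only that both paths ran in production behind a flag.

```python
from typing import Any, Callable


def run_with_flag(mode: str, request: Any,
                  legacy: Callable[[Any], Any],
                  modern: Callable[[Any], Any],
                  mismatches: list) -> Any:
    """Serve a request according to the flag; in shadow mode compare both paths."""
    if mode == "modern":
        return modern(request)  # cut-over complete
    result = legacy(request)  # legacy stays authoritative
    if mode == "shadow":
        try:
            candidate = modern(request)
        except Exception as exc:  # a failing new path must not affect the caller
            mismatches.append((request, result, repr(exc)))
        else:
            if candidate != result:
                mismatches.append((request, result, candidate))
    return result
```

For example, with `legacy = lambda w: "A" if w > 10 else "B"` and `modern = lambda w: "A" if w >= 10 else "B"`, a shadow-mode request of `10` still returns the legacy answer `"B"` and records the disagreement, which is exactly the kind of boundary difference bug-for-bug compatibility has to catch. Shadowing in this form is only safe for logic without side effects.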

Multi-agent review architecture

The review workflow split analysis across several specialized agents running in parallel, each focused on a different concern—standards, bugs, history context, test quality, and requirements gaps. This was a deliberate choice to prevent the failure mode where a model fixating on one issue class misses another. Each agent received the same diff but different instructions, and findings were aggregated with confidence scores, filtered against known false-positive patterns, and presented as categorized review threads.
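The fan-out and aggregation pattern can be sketched as follows. In this Python example the agents are plain functions standing in for model calls, and the specific checks, confidence values, and the `run_review` signature are all hypothetical. The sketch shows only the structure: the same diff goes to every agent, and the results are filtered and grouped by category.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    message: str
    confidence: float  # 0.0 to 1.0, as reported by the agent


# Stand-ins for model-backed agents: same diff, different instructions.
def bug_scan(diff: str) -> list[Finding]:
    if "catch (Exception)" in diff:
        return [Finding("broad catch may swallow an exception", 0.8)]
    return []


def standards(diff: str) -> list[Finding]:
    findings = []
    if "DateTime.Now" in diff:
        findings.append(Finding("DateTime.Now used directly", 0.7))
    if "TODO" in diff:
        findings.append(Finding("TODO left in change", 0.3))
    return findings


def run_review(diff, agents, known_false_positives, min_confidence=0.5):
    """Run every agent on the same diff in parallel; return findings by category."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        results = list(pool.map(lambda agent: agent(diff), agents.values()))

    threads = {}
    for category, findings in zip(agents, results):
        kept = [
            f for f in findings
            if f.confidence >= min_confidence
            and not any(pattern in f.message for pattern in known_false_positives)
        ]
        if kept:
            threads[category] = sorted(kept, key=lambda f: f.confidence, reverse=True)
    return threads
```

Calling `run_review(diff, {"bugs": bug_scan, "standards": standards}, known_false_positives=["DateTime.Now"])` returns one thread per category that still has findings after the confidence and false-positive filters.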

Results

The outcome was a step change in delivery. The engineer driving the workflow went from onboarding to producing more merged work than anyone else on the team within a couple of months—a velocity increase of roughly an order of magnitude—while keeping the revert rate well under one percent. Speed went up and quality held.

These were not trivial changes inflated by tooling. The work included:

Large structural migrations—complete, reviewable moves of major services from the legacy architecture onto a modern Entity Framework Core foundation
Business-logic migration—complex rules lifted from nested legacy T-SQL into discrete, testable C# classes, with the AI workflow reading the legacy definitions, cross-referencing existing code, and surfacing behavioral gaps
Net-new capabilities—wholly new systems built from scratch, including supporting pipelines, dashboards, and operational safeguards, not just like-for-like ports

Across well over a thousand commits, the few reverts that did occur clustered around a single feature rather than forming a pattern of instability.

Dozens of parallel worktrees ran at once—a scale of concurrent development that would be unmanageable without automated context management. Each had its own session with automatically injected context, its own branch under strict naming conventions, and its own link back to the work item that justified it.

Organizational adoption

What began as one engineer's workflow became an organizational shift. Once the results had been demonstrated, the approach was widely adopted across the engineering team. Developers now collaboratively maintain shared skill libraries, hook configurations, and project instruction files, treating AI workflow infrastructure as a shared codebase in its own right. Teams contribute domain-specific context for their areas of the platform, review and refine each other's automation, and build on the MCP server ecosystem for their own use cases. The methodology has changed how work gets done across the organization, not as a top-down mandate but as a bottom-up adoption driven by visible results.

Lessons learned

Context is the product, not the model

The most impactful investment was not in prompt engineering or model selection but in building the infrastructure that gave AI accurate, comprehensive context about the codebase, the databases, the business rules, and the team's specific conventions. A model without context generates plausible but dangerous code in a legacy enterprise system. A model with deep context—the right instructions, the right database schemas, the right anti-patterns to avoid—generates code that experienced developers review and merge with confidence.

Encode corrections, not just instructions

The persistent memory system proved essential. When the AI made a mistake—using OUTPUT INSERTED instead of SCOPE_IDENTITY(), or catching exceptions it couldn't handle—the correction was saved as a durable memory with explicit reasoning. This meant the same mistake never happened twice, across sessions, across worktrees, across weeks of work. Traditional documentation tells you what to do; feedback memories tell the AI why not to do the thing it's about to do.
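A feedback memory of this kind can be as simple as an append-only file of corrections that is rendered into the session context. The Python sketch below assumes a JSON Lines file and three fields per entry (`mistake`, `correction`, `reason`). The storage format and function names are illustrative, not a description of the team's actual memory system.

```python
import json
from pathlib import Path


def save_correction(store: Path, mistake: str, correction: str, reason: str) -> None:
    """Append one correction, with its reasoning, to the memory file."""
    entry = {"mistake": mistake, "correction": correction, "reason": reason}
    with store.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")


def load_corrections(store: Path) -> list[dict]:
    """Read every saved correction; a missing file means no memories yet."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def render_for_session(entries: list[dict]) -> str:
    """Format corrections as instructions to inject at session start."""
    return "\n".join(
        f"- Avoid: {e['mistake']}. Do instead: {e['correction']}. Why: {e['reason']}"
        for e in entries
    )
```

Storing the reason alongside the rule is the point of the design: the rendered line tells the model what to avoid and also why, so the rule can be applied to cases the original correction did not anticipate.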

Invest in the seams, not the center

The highest-value automation was at the boundaries between systems—the skill that fetched a work item, created a branch, and set up a worktree; the hook that injected context at session start; the review workflow that pulled diffs, cross-referenced requirements, and posted findings back to the PR. The actual code generation was, in some sense, the easy part. What made the workflow sustainable was eliminating the friction of switching contexts, remembering conventions, and maintaining consistency across dozens of parallel workstreams. The model writes code; the infrastructure makes sure it writes the right code, in the right place, for the right reasons.
