
Dry run migration: a complete how-to

How to plan and execute a dry run migration at production-equivalent data volumes. Covers the T-10 to T-7 timing, environment prep, validation strategy, what to measure, and what only a dry run catches that other rehearsals don't.

A dry run migration is a full end-to-end execution of the data migration phase, against production-equivalent data volumes, in a non-production environment. It exists for one reason: data migrations are the longest and most variable phase of a cutover, and their failure modes only surface at full scale.

It is one of the four cutover rehearsal types and the one most often skipped — usually because teams convince themselves that a smaller-scale test “tells us the same thing.” It does not.

Definition

A dry run migration is a time-boxed, end-to-end execution of the data migration step of the cutover plan, run against a production-equivalent dataset in a non-production environment. It includes data extraction, transformation, load, and validation. It does not include source-system lock, user-facing cutover, smoke tests, or go-live.

The dry run produces two primary outputs: (1) a measurement of how long the real migration will take, and (2) a list of scale-dependent failures that must be fixed before T-0.

When the dry run happens

T-10 to T-7 days. Usually a day or two before the dress rehearsal at T-7.

That timing works because:

  • The dry run de-risks the data migration phase so the dress rehearsal can focus on coordination and decisions
  • A migration tool that fails in the dry run can still be fixed before T-7
  • Production-equivalent data exists by then (the refresh cycle should align)

A dry run earlier than T-14 is usually premature — migration code and ETL jobs are not stable enough. A dry run later than T-3 leaves no time to act on failures.

What’s in scope

A dry run migration includes:

  • Source-system data extraction at full volume
  • All transformations / ETL processing
  • Load into the target system
  • Row-count and checksum validation
  • Sample-record verification (statistical and business-rules-driven)
  • End-to-end elapsed-time measurement

A dry run migration excludes:

  • Source-system lock (the source stays live)
  • User-facing cutover (DNS, load balancer, opening to users)
  • Smoke tests on the target’s full application layer
  • Business sign-off
  • Rollback exercise

The narrower scope is deliberate. The dry run answers a single question: how long does the data migration take, and what fails at full scale? Other rehearsals answer other questions.
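The row-count and checksum validation in scope above can be sketched as a small comparison routine. This is a minimal illustration, not the guide's tooling: the order-independent XOR-of-hashes scheme and the in-memory row lists are assumptions, and a real migration would compute checksums inside the source and target databases rather than pulling rows into Python.

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum: hash each row, XOR the digests.
    Robust to row-ordering differences between source and target."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        acc ^= int.from_bytes(digest[:8], "big")
    return acc

def validate_table(source_rows, target_rows):
    """Return (passed, detail) for one table's row-count + checksum check."""
    if len(source_rows) != len(target_rows):
        return False, f"row count mismatch: {len(source_rows)} vs {len(target_rows)}"
    if table_checksum(source_rows) != table_checksum(target_rows):
        return False, "checksum mismatch: same count, different content"
    return True, "ok"

# Hypothetical extracted rows; in practice these come from source/target queries.
source = [(1, "alice"), (2, "bob")]
target = [(2, "bob"), (1, "alice")]   # same content, different order
print(validate_table(source, target))  # (True, 'ok')
```

Sample-record verification would sit on top of this: pull the same keyed records from both sides and compare field by field, which catches transformation bugs that aggregate checksums can mask.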

How a dry run differs from adjacent rehearsals

| | Dry run (T-10) | Mock cutover (T-14) | Dress rehearsal (T-7) |
|---|---|---|---|
| Scope | Data migration only | Technical cutover phases | End-to-end |
| Data volume | Production-equivalent | Production-equivalent (recommended) | Production-equivalent |
| Source-system lock | No | Yes | Yes |
| Smoke tests | No | Yes | Yes |
| Business sign-off | No | No | Yes |
| Rollback exercise | No | No | Yes |
| Time investment | 8–24 hours | Half day to one day | Full cutover window |

The three rehearsals complement each other. Each catches something the others do not.

The dry run runbook

Pre-run preparation (T-14 to T-11)

  1. Refresh the non-production environment from a recent production snapshot. No older than 7 days. Data patterns drift, and the patterns that drift fastest are usually the ones that cause production failures.
  2. Confirm production-equivalent infrastructure. Same target compute, same storage class, same network throughput as production. A dry run on undersized infrastructure will time out and you will not know whether the failure is real or environmental.
  3. Verify migration tooling. The exact version of the migration code that will run in production. No “we’ll patch it before T-0” — patch it now, then dry run the patched version.
  4. Stage observability. Log aggregation, performance metrics, error tracking. You need to be able to answer what was slow after the run completes, not just did it finish.
  5. Brief the participants. Migration owner, data lead, infra lead, and one observer per impacted business stream. The cutover lead is optional but useful.

Execution day (T-10)

  1. Start the run on the real clock. If the production migration starts at 23:00 Saturday, start the dry run at 23:00 the previous week. Throughput patterns differ between business hours and off-hours, especially for migrations that traverse shared infrastructure.
  2. Time-stamp every checkpoint. Extraction start, extraction end, transform start, transform end, load start, load end, validation start, validation end. Granularity matters — if a run budgeted at 24 hours finishes in 18, you need to know which phase saved the time.
  3. Run the validation suite. Row counts, checksums, and at least 100 sample-record verifications across the data set. Validation is itself a phase to measure — it is consistently underestimated.
  4. Capture every error. Not just the ones that broke the run. Warning-level logs from production-volume data often reveal the next failure mode.
  5. Do not skip steps because they passed last time. The migration tool may have changed; the data may have changed.
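The checkpoint time-stamping in step 2 can be as simple as a context manager that records per-phase elapsed time. A minimal sketch: the phase names and sleep-based bodies are placeholders for the real extract/transform/load/validate calls.

```python
import time
from contextlib import contextmanager

timings = {}  # phase name -> elapsed seconds

@contextmanager
def checkpoint(phase):
    """Time-stamp one migration phase and record its elapsed time."""
    start = time.monotonic()
    print(f"{phase} start: {time.strftime('%H:%M:%S')}")
    try:
        yield
    finally:
        timings[phase] = time.monotonic() - start
        print(f"{phase} end:   {time.strftime('%H:%M:%S')}")

# Placeholder phase bodies; replace with the real migration calls.
with checkpoint("extraction"):
    time.sleep(0.01)
with checkpoint("load"):
    time.sleep(0.01)

print({k: round(v, 2) for k, v in timings.items()})
```

Writing the timings to a file rather than stdout makes the post-run timing report at T-9 a formatting exercise instead of a log-archaeology exercise.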

Post-run analysis (T-9)

  1. Produce the timing report. Elapsed time per phase, throughput per phase, peak resource utilization. Compare to the cutover plan estimate.
  2. Triage every failure and warning. Each one has an owner and a due date — before T-7 if at all possible, before T-0 always.
  3. Update the cutover plan. If the dry run took longer than the plan budget, the plan changes. Pretending the production run will be faster is the same mistake that ends careers.
  4. Decide on a second dry run. If the first surfaced fewer than three issues and timing fit budget with margin, one is enough. If it surfaced ten or more, or timing was tight, schedule another at T-3.
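The plan comparison in step 1 reduces to one piece of arithmetic: does the measured time, inflated by the buffer the plan requires, still fit the window? A sketch assuming the ≥30% buffer this guide uses as its readiness threshold.

```python
def fits_window(measured_hours, window_hours, buffer=0.30):
    """Does the measured migration time fit the cutover window
    with the required headroom (>=30% buffer by default)?
    Returns (fits, hours_required_including_buffer)."""
    required = measured_hours * (1 + buffer)
    return required <= window_hours, required

# Example: dry run measured 18h; the plan budgets 24h for data migration.
ok, required = fits_window(18, 24)
print(ok, round(required, 1))  # True 23.4
```

If `fits_window` returns `False`, the cutover plan changes, not the buffer.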

What only the dry run catches

Failure modes that the mock cutover and the dress rehearsal often miss, because their scope is broader and their data volume is, in practice, usually smaller:

Throughput collapse. ETL tools that perform fine on 10% data hit memory or I/O ceilings at 100%. This is the most common failure mode and the most common reason dry runs are scheduled in the first place.

Outlier records. The one customer record with 50,000 line items. The one order with malformed Unicode. These pass small-sample tests and fail production at 03:00.

Datetime and timezone edge cases. DST transitions inside the migration window. Records with timestamps near zero (epoch). Records with timestamps in the future. All survive small-sample validation; some break at full scale.

Lock contention and concurrency. At small scale, migration scripts that hold long locks finish before another job needs the resource. At full scale, they collide.

Cumulative validation cost. Row-count validation on 50M rows takes meaningfully longer than on 5M rows — sometimes non-linearly. The validation phase is often the one that blows the timing estimate.
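Several of these edge cases can be screened for mechanically during the validation phase. A minimal sketch, assuming hypothetical field names (`updated_at`, `line_items`) and arbitrary thresholds; real screens would be driven by the data model and business rules.

```python
from datetime import datetime, timezone

def scan_record(record, now=None, max_line_items=10_000):
    """Flag the scale-dependent edge cases a dry run should surface:
    near-epoch timestamps, future timestamps, and outlier record sizes.
    Field names and thresholds are illustrative assumptions."""
    now = now or datetime.now(timezone.utc)
    flags = []
    ts = record.get("updated_at")
    if ts is not None:
        if ts.year < 1971:
            flags.append("timestamp near epoch")
        if ts > now:
            flags.append("timestamp in the future")
    if len(record.get("line_items", [])) > max_line_items:
        flags.append("outlier: too many line items")
    return flags

rec = {"updated_at": datetime(1970, 1, 1, tzinfo=timezone.utc),
       "line_items": list(range(50_000))}
print(scan_record(rec))  # flags the epoch timestamp and the 50k line items
```

Run the scan over the full dataset, not a sample: the whole point is that the one record with 50,000 line items does not appear in samples.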

What to measure

Five numbers determine whether the data migration phase of the production cutover is ready:

| Metric | Why it matters |
|---|---|
| Total elapsed time | Does the migration fit in the cutover window with ≥30% buffer? |
| Throughput (rows/sec or GB/min) | Can the production environment sustain this? |
| Peak resource utilization | Is there headroom for the unexpected? |
| Validation elapsed time | Often half of total migration time on large datasets |
| Error / warning count | Trend matters — if error count is rising, fix root causes before T-0 |

Capture all five, in writing, before the dry-run debrief.

Production data vs synthetic data

The default position should be: use production data, refreshed within the last week, in a non-production environment with the same access controls as production.

Where regulations or data-protection policies prohibit this — for GDPR-regulated programs, regulated healthcare data, financial data with audit constraints — the alternative is data that is:

  • Volumetrically equivalent (same row counts, same table sizes, same blob sizes)
  • Statistically representative (matching null distributions, outlier patterns, value ranges)
  • Encoding-equivalent (same character sets, same datetime formats, same numeric precisions)

Most synthetic datasets meet the first criterion and fail the second and third. If your synthetic data is not statistically and encoding-equivalent, the dry run will under-report failure modes — which is worse than not running it at all, because it generates false confidence.
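The statistical-representativeness criterion can be spot-checked column by column. The sketch below covers only null distributions (the easiest of the three criteria to automate) with a hypothetical 5% tolerance; value ranges, outlier sizes, and character encodings need their own checks.

```python
def null_fraction(values):
    """Fraction of null (None) values in one column."""
    return sum(v is None for v in values) / len(values)

def representative(prod_col, synth_col, tolerance=0.05):
    """Crude check of one representativeness criterion: do the null
    distributions of a production column and its synthetic counterpart
    match within tolerance? The 5% tolerance is an assumption."""
    return abs(null_fraction(prod_col) - null_fraction(synth_col)) <= tolerance

prod = ["a", None, "b", None, "c", "d", None, "e", "f", "g"]   # 30% null
synth = ["x"] * 10                                              # 0% null
print(representative(prod, synth))  # False: synthetic under-represents nulls
```

A synthetic dataset that fails this check will sail through the dry run's transformation phase and then break in production on the null-handling path it never exercised.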

Common mistakes

Using last quarter’s data refresh. Data patterns drift. The patterns that drift fastest — outlier sizes, value distributions, encoding edge cases — are the ones most likely to break production. Refresh within 7 days.

Cutting validation to save time. “We’ll spot-check 10 records instead of 100.” Validation is the cheapest insurance the dry run provides; do not cut it.

Running on smaller infrastructure to save cost. Cloud cost is real, but a dry run on half-size compute does not tell you the production migration will work. Pay for the right-sized environment or do not run the dry run at all.

Ignoring warning-level errors. “Those are just informational.” Until they are not. Triage the warnings; many become T-0 incidents.

Pretending production will be faster. Production is slower more often than it is faster. Adjacent jobs, network noise, and authentication overhead all add latency that does not appear in non-prod. If the dry run takes 18 hours, plan for 20 in production.

When to run a second dry run

Schedule a second dry run at T-3 if any of these conditions hold after the first:

  • More than five action items
  • Elapsed time exceeded the plan budget at all
  • Throughput collapse during any phase
  • Migration tooling required a patch
  • Production data refresh changed materially

Otherwise, one dry run is enough — provided the dress rehearsal at T-7 exercises the data migration step as part of its end-to-end run.

Generate a phased cutover plan

The cutover plan template generator produces a plan with a dry run scheduled at T-10 and the dress rehearsal at T-7, with timing budgets per phase that the dry-run output can be measured against.

Frequently asked questions

What is a dry run migration?
A dry run migration is a full end-to-end execution of the data migration phase only, against production-equivalent data volumes, in a non-production environment. It does not include source-system lock, go-live, or user-facing changes. Its purpose is to surface scale-dependent failures and produce an accurate timing estimate for the production cutover.
How is a dry run migration different from a mock cutover?
A dry run focuses on the data migration phase only and always uses production-equivalent data volumes. A mock cutover covers more cutover-window activities (lock, migrate, validate, smoke test) but can be run with a smaller dataset. The dry run is narrower in scope but deeper in scale coverage.
When should a dry run migration happen?
T-10 to T-7 days — usually one or two days before the full dress rehearsal at T-7. This gives the team time to act on issues found and ensures the migration tooling is stable before being incorporated into the broader rehearsal.
Should a dry run use real production data?
Yes, where regulations and policy allow. Anonymized or synthetic data hides the patterns that cause production failures — null distributions, outlier records, character encodings, datetime formats. Where production data cannot be used, the test data must be statistically representative, not just volumetrically equivalent.
How many dry runs should a program do?
At least one before the dress rehearsal. For complex migrations — large data volumes, multiple sources, transformation-heavy ETL, regulated data — schedule two: a first at T-21 to surface gross errors, a second at T-10 to confirm fixes and refine timing.
What is the most important metric from a dry run?
Total elapsed time, measured from start of extraction to completion of validation. This is the number that determines whether the production cutover window is realistic. If elapsed time exceeds the window's data-migration budget without a 30% buffer, the cutover plan needs revision before T-0.