Shard Management and Online Split Lifecycle

WorkerSQL supports live shard rebalancing without downtime. The split lifecycle is orchestrated inside the edge worker by ShardSplitService, with data export/import delegated to the TableShard durable objects. This document captures the operational contract, component interactions, and acceptance criteria for the shard split workflow delivered in this iteration. The following components cooperate to deliver the workflow:

  • ShardSplitService: owns plan state, phase transitions, progress metrics, and validation against routing policies.
  • RoutingVersionManager: persists routing policy history in KV, guards version bumps, and handles rollback.
  • ConfigService: supplies table policies (primary keys, shard-by columns) consumed during backfill iteration.
  • TableShard Durable Objects: expose the admin/export, admin/import, and admin/events endpoints used for bulk backfill and tail replay.
  • Cloudflare KV (APP_CACHE): durable storage for split plans, routing versions, and idempotency cursors.
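
For illustration, the routing policy record and the manager surface described above could be modeled as in the sketch below. Only updateCurrentPolicy is named in this document; the other members and field names are assumptions.

```typescript
// Assumed shape of one routing policy version persisted by RoutingVersionManager.
interface RoutingPolicy {
  version: number;
  tenantToShard: Record<string, string>; // tenant ID -> shard ID
  updatedAt: string;                     // ISO-8601 timestamp
}

// Assumed surface of the routing version manager; only the behaviours this
// document names (version history, guarded bumps, rollback) are sketched here.
interface RoutingVersionManagerLike {
  getCurrentPolicy(): Promise<RoutingPolicy>;
  // Persists a new policy only when its version is exactly current + 1.
  updateCurrentPolicy(next: RoutingPolicy): Promise<void>;
  // Restores a previously persisted version (used during rollback).
  revertTo(version: number): Promise<RoutingPolicy>;
}
```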
| Phase | Trigger | Key Work | Exit Criteria |
| --- | --- | --- | --- |
| planning | planSplit | Validate tenants, snapshot table policies, persist plan skeleton | Plan persisted with backfill/tail initialized |
| dual_write | startDualWrite | Enable dual-write routing; clear previous error state | Ready for backfill launcher |
| backfill | runBackfill | Execute async export/import loops per tenant & table; update row counters and cursors | All tables copied, tail.status reset to pending, transition to tailing |
| tailing | Backfill completes | Maintain dual-write until tail replay catches up | replayTail finishes with tail.status = caught_up, phase → cutover_pending |
| cutover_pending | Tail caught up | Await routing update approval | cutover bumps routing version, phase → completed |
| completed | Cutover committed | Tenants now routed to target shard | Metrics frozen, plan retained for audit |
| rolled_back | rollback | Revert routing pointer, reset plan state | Ready to re-run workflow |
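
A minimal sketch of how the phases above and the plan record stored under shard_split:plan:{id} could be typed follows; field names beyond those cited in this document are assumptions.

```typescript
// Phases from the table above.
type SplitPhase =
  | "planning"
  | "dual_write"
  | "backfill"
  | "tailing"
  | "cutover_pending"
  | "completed"
  | "rolled_back";

// Assumed shape of the plan record persisted in KV under shard_split:plan:{id}.
interface ShardSplitPlan {
  id: string;
  sourceShard: string;
  targetShard: string;
  tenantIds: string[];
  phase: SplitPhase;
  dualWriteStartedAt?: string;     // bounds the tail replay window
  routingVersionAtStart: number;   // restored on rollback
  routingVersionCutover?: number;  // recorded at cutover for audit
  errorMessage?: string;           // populated when a phase fails
  backfill: {
    status: "pending" | "running" | "completed" | "failed";
    totalRowsCopied: number;
    cursors: Record<string, string | null>; // per-table export cursor
    startedAt?: string;
    completedAt?: string;
  };
  tail: {
    status: "pending" | "caught_up" | "failed";
    lastEventId?: number;
    lastEventTs?: string;
  };
}
```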
  1. Planning

    • Inputs: source shard, target shard, tenant IDs.
    • Validations: tenant list non-empty, source ≠ target, tenants currently routed to source shard, no overlapping active plan.
    • Outcomes: KV record shard_split:plan:{id} with canonical backfill/tail structures.
  2. Dual Write

    • Flip plan phase to dual_write, clear residual error state, mark dualWriteStartedAt (used for tail replay window).
  3. Backfill Execution

    • runBackfill persists a running status, then schedules executeBackfill via ExecutionContext.waitUntil.
    • backfillTable iterates admin/export results using a cursor and a limit of 200 rows; each batch is sent to the target shard's admin/import endpoint in upsert mode (see the backfill sketch after this list).
    • Progress metrics: totalRowsCopied, a per-table cursor map, and backfill.startedAt/completedAt.
  4. Tail Replay

    • Fetch events via admin/events with an afterId cursor to ensure idempotency.
    • Filter events to the target tenants, ignore SELECT statements, and route DDL to the /ddl endpoint and other mutations to /mutation.
    • Persist lastEventId and lastEventTs after each event to support resumability.
    • When the batch size is smaller than the limit, mark tail.status = caught_up and advance the phase to cutover_pending (see the tail replay sketch after this list).
  5. Cutover

    • Copy the routing policy, update tenant → target shard assignments, increment the version, and persist it via RoutingVersionManager.updateCurrentPolicy (see the cutover sketch after this list).
    • Plan captures routingVersionCutover for audit and metrics reporting.
  6. Rollback

    • Reset routing pointer to routingVersionAtStart.
    • Clear backfill/tail progress markers and error message, phase → rolled_back.
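
A minimal sketch of the per-table backfill loop from step 3, assuming illustrative request/response shapes for the admin/export and admin/import endpoints (the payload fields, URL form, and the persistCursor helper are assumptions):

```typescript
// Hypothetical export batch shape; the real admin/export response may differ.
interface ExportBatch {
  rows: Record<string, unknown>[];
  nextCursor: string | null; // null when the table is exhausted
}

const BATCH_LIMIT = 200;

// Copies one table from the source shard to the target shard in cursor-bounded
// batches. DurableObjectStub comes from @cloudflare/workers-types.
async function backfillTable(
  sourceShard: DurableObjectStub,
  targetShard: DurableObjectStub,
  table: string,
  persistCursor: (table: string, cursor: string | null, rowsCopied: number) => Promise<void>,
  startCursor: string | null = null
): Promise<number> {
  let cursor = startCursor;
  let rowsCopied = 0;

  while (true) {
    // Pull the next batch from the source shard.
    const exportRes = await sourceShard.fetch("https://shard/admin/export", {
      method: "POST",
      body: JSON.stringify({ table, cursor, limit: BATCH_LIMIT }),
    });
    const batch = (await exportRes.json()) as ExportBatch;

    if (batch.rows.length > 0) {
      // Upsert the batch into the target shard so retries stay idempotent.
      await targetShard.fetch("https://shard/admin/import", {
        method: "POST",
        body: JSON.stringify({ table, mode: "upsert", rows: batch.rows }),
      });
      rowsCopied += batch.rows.length;
    }

    // Persist the cursor so a retry resumes from the last committed batch.
    cursor = batch.nextCursor;
    await persistCursor(table, cursor, rowsCopied);

    if (cursor === null) return rowsCopied;

    // Yield between batches to keep the worker responsive (see runtime notes below).
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}
```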
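
A companion sketch of a single tail replay pass from step 4; the event shape and endpoint payloads are likewise assumptions:

```typescript
// Hypothetical event shape returned by admin/events.
interface ChangeEvent {
  id: number;
  ts: string;
  tenantId: string;
  sql: string;
}

// Replays source-shard events onto the target shard.
// Returns true once a partial batch indicates the tail has caught up.
async function replayTailOnce(
  sourceShard: DurableObjectStub,
  targetShard: DurableObjectStub,
  tenants: Set<string>,
  afterId: number,
  persistCheckpoint: (lastEventId: number, lastEventTs: string) => Promise<void>,
  limit = 200
): Promise<boolean> {
  const res = await sourceShard.fetch("https://shard/admin/events", {
    method: "POST",
    body: JSON.stringify({ afterId, limit }),
  });
  const events = (await res.json()) as ChangeEvent[];

  for (const event of events) {
    const isSelect = /^\s*select/i.test(event.sql);
    const isDdl = /^\s*(create|alter|drop)/i.test(event.sql);

    // Only replay mutations that belong to tenants being moved.
    if (tenants.has(event.tenantId) && !isSelect) {
      const path = isDdl ? "/ddl" : "/mutation";
      await targetShard.fetch(`https://shard${path}`, {
        method: "POST",
        body: JSON.stringify({ tenantId: event.tenantId, sql: event.sql }),
      });
    }

    // Checkpoint after every event so a retry resumes from the last applied id.
    await persistCheckpoint(event.id, event.ts);
  }

  // A partial batch means no further events are pending: the tail has caught up.
  return events.length < limit;
}
```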
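
And a sketch of the cutover from step 5, building on the ShardSplitPlan and RoutingVersionManagerLike types sketched earlier:

```typescript
// Repoints the moved tenants and bumps the routing version exactly once.
async function cutover(plan: ShardSplitPlan, routing: RoutingVersionManagerLike): Promise<void> {
  const current = await routing.getCurrentPolicy();

  // Copy the policy rather than mutating the live version history entry.
  const next: RoutingPolicy = {
    version: current.version + 1,
    tenantToShard: { ...current.tenantToShard },
    updatedAt: new Date().toISOString(),
  };
  for (const tenant of plan.tenantIds) {
    next.tenantToShard[tenant] = plan.targetShard;
  }

  await routing.updateCurrentPolicy(next);

  // Record the new version on the plan for audit and metrics reporting.
  plan.routingVersionCutover = next.version;
  plan.phase = "completed";
}
```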
  • Backfill work is always scheduled via ExecutionContext.waitUntil to avoid worker request timeouts.
  • Each export/import batch yields to the event loop (yieldToEventLoop) to keep cooperative concurrency responsive.
  • Failures capture an errorMessage and flip the associated phase status (backfill.status or tail.status) to failed for observability.
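
A sketch of how the background scheduling and failure capture above might fit together; setBackfillStatus and executeBackfill are hypothetical helpers standing in for the service's internals:

```typescript
// Hypothetical helpers standing in for ShardSplitService internals.
declare function setBackfillStatus(planId: string, status: "running" | "failed", errorMessage?: string): Promise<void>;
declare function executeBackfill(planId: string): Promise<void>;

// Persists the running status, then hands the long-running work to waitUntil so
// the triggering HTTP request can return before the backfill finishes.
async function runBackfill(ctx: ExecutionContext, planId: string): Promise<Response> {
  await setBackfillStatus(planId, "running");

  ctx.waitUntil(
    executeBackfill(planId).catch(async (err) => {
      // Failures flip backfill.status to failed and record errorMessage for observability.
      await setBackfillStatus(planId, "failed", err instanceof Error ? err.message : String(err));
    })
  );

  return new Response(JSON.stringify({ planId, backfillStatus: "running" }), {
    headers: { "content-type": "application/json" },
  });
}
```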

ShardSplitService.getMetrics() surfaces per-plan metrics:

  • splitId, sourceShard, targetShard
  • phase, totalRowsCopied
  • backfillStatus, tailStatus
  • tenants, startedAt, updatedAt

These are suitable for dashboarding and alerting on stalled phases.
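
For dashboard integration, the metrics payload could be typed roughly as below (field names follow the list above; the concrete types are assumptions):

```typescript
// Assumed shape of the per-plan metrics returned by ShardSplitService.getMetrics().
interface ShardSplitMetrics {
  splitId: string;
  sourceShard: string;
  targetShard: string;
  phase: SplitPhase; // see the phase union sketched earlier
  totalRowsCopied: number;
  backfillStatus: "pending" | "running" | "completed" | "failed";
  tailStatus: "pending" | "caught_up" | "failed";
  tenants: string[];
  startedAt: string; // ISO-8601 timestamps
  updatedAt: string;
}
```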

Test coverage for the workflow:

| Test Suite | Location | Coverage |
| --- | --- | --- |
| Unit | tests/services/ShardSplitService.test.ts | Planning validation, async backfill execution, status persistence |
| Integration | tests/integration/shardSplit.integration.test.ts | Full lifecycle: plan → dual-write → backfill → tail replay → cutover |
| Smoke | tests/smoke/shardSplit.smoke.test.ts | Service boot sanity check |
| Fuzz | tests/fuzz/shardSplit.fuzz.test.ts | Randomized tenant lists verifying validation invariants |
| Browser (Playwright) | tests/browser/shardSplit.spec.ts | Acceptance documentation rendered and key lifecycle sections visible |
The workflow is accepted against the following criteria:
  1. Valid Planning: A split cannot be planned unless every tenant is currently routed to the source shard and no other active plan overlaps (unit + fuzz coverage).
  2. Backfill Safety: Each export/import batch updates cursors atomically and marks success, retrying from persisted state (unit + integration coverage).
  3. Tail Idempotency: Events replay in strictly increasing id order and skip non-applicable writes, guaranteeing no duplicate mutations (integration coverage).
  4. Cutover Discipline: Routing version increments exactly once and records the new version on the plan (integration coverage).
  5. Rollback Preparedness: Rolling back restores pending statuses, allowing operators to re-run the pipeline (covered by unit smoke assertions of plan normalization).
  6. Operator Visibility: Metrics expose backfill/tail status transitions for observability (implicitly covered by unit tests verifying persisted state changes).
To run a split, operators follow this sequence:
  • Call planSplit via the admin API (payload: source, target, tenants, description).
  • After validation, call startDualWrite to begin mirrored writes.
  • Trigger runBackfill (worker automatically schedules background execution); monitor metrics for backfill.status.
  • Poll and execute replayTail until phase becomes cutover_pending with tail.status = caught_up.
  • Call cutover to move tenants to the new shard.
  • If anomalies surface before cutover, call rollback to revert routing and reset state.
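
As a concrete illustration of this runbook, a driver script could look like the following. The /admin/shard-split/* paths, request payloads, and response fields are hypothetical, since this document only names the operations:

```typescript
// Hypothetical admin endpoints; adjust to the worker's actual routes.
const ADMIN = "https://worker.example.com/admin/shard-split";

async function call(path: string, body: unknown): Promise<any> {
  const res = await fetch(`${ADMIN}/${path}`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} failed: ${res.status}`);
  return res.json();
}

async function runSplit(): Promise<void> {
  const { splitId } = await call("plan", {
    source: "shard-1",
    target: "shard-2",
    tenants: ["tenant-a", "tenant-b"],
    description: "rebalance hot tenants",
  });

  await call("start-dual-write", { splitId });
  await call("run-backfill", { splitId });

  // Poll metrics and drive tail replay until the plan reaches cutover_pending.
  while (true) {
    const metrics = await call("metrics", { splitId });
    if (metrics.phase === "cutover_pending" && metrics.tailStatus === "caught_up") break;
    if (metrics.backfillStatus === "failed" || metrics.tailStatus === "failed") {
      await call("rollback", { splitId });
      throw new Error("split failed; rolled back");
    }
    if (metrics.phase === "tailing") await call("replay-tail", { splitId });
    await new Promise((resolve) => setTimeout(resolve, 5_000));
  }

  await call("cutover", { splitId });
}

runSplit().catch(console.error);
```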
Planned follow-ups:
  • Emit structured logs for each phase transition to feed observability pipelines.
  • Persist per-table success/failure counters for detailed reporting.
  • Add queue-based notification when tail replay completes to reduce manual polling.