feat(M018/S01): GateEvalContext persistence — resume-on-miss for gate_run #3

Merged
wollax merged 3 commits from kata/root/M018/S01 into main 2026-04-01 13:09:17 +00:00

Completes the write-through cache pattern for GateEvalContext so gate evaluations survive MCP server restarts.

Changes

T01 (crates/assay-core/src/gate/session.rs):

  • pub fn find_context_for_spec(assay_dir, spec_name) -> Result<Option<GateEvalContext>>
  • Scans .assay/gate_sessions/*.json in reverse chronological order, returns most recent match
  • Skips unreadable files with tracing::warn! (never propagates individual errors)
  • 4 unit tests: match, no-match, empty-dir, most-recent-wins
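
The scan described in T01 can be sketched as follows. This is a minimal, std-only illustration and not the code in `session.rs`: the trimmed-down `GateEvalContext` fields, the injected `load` parameter (standing in for JSON deserialization), the assumption that file names sort chronologically, and `eprintln!` in place of `tracing::warn!` are all hypothetical choices made so the example stays self-contained.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed-down stand-in for the real context type.
#[derive(Debug, Clone, PartialEq)]
pub struct GateEvalContext {
    pub session_id: String,
    pub spec_name: String,
}

/// Returns the most recent persisted context whose spec name matches.
/// `load` is an assumed hook that deserializes one session file.
pub fn find_context_for_spec<L>(
    assay_dir: &Path,
    spec_name: &str,
    load: L,
) -> io::Result<Option<GateEvalContext>>
where
    L: Fn(&Path) -> io::Result<GateEvalContext>,
{
    let sessions_dir = assay_dir.join("gate_sessions");
    if !sessions_dir.is_dir() {
        // No sessions directory yet: nothing to resume.
        return Ok(None);
    }

    // Collect only *.json entries, ignoring directory entries that fail to read.
    let mut files: Vec<PathBuf> = fs::read_dir(&sessions_dir)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.extension().and_then(|ext| ext.to_str()) == Some("json"))
        .collect();

    // Assumption: file names sort chronologically, so descending order is newest first.
    files.sort();
    files.reverse();

    for path in files {
        match load(&path) {
            Ok(ctx) if ctx.spec_name == spec_name => return Ok(Some(ctx)),
            Ok(_) => {} // belongs to a different spec
            // An unreadable file is logged and skipped, never propagated.
            Err(err) => eprintln!("warn: skipping {}: {}", path.display(), err),
        }
    }
    Ok(None)
}
```

Injecting the loader keeps the per-file error handling visible: a failure from `load` only costs that one file, while a failure to list the directory itself is still returned to the caller.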

T02 (crates/assay-mcp/src/server.rs):

  • Wired find_context_for_spec into gate_run handler before create_session()
  • On resume: replaces command_results with fresh results, re-persists, inserts into HashMap, spawns timeout task
  • On disk scan failure: logs warn, falls through to create_session() defensively
  • Contract test test_gate_run_resumes_session_from_disk proves the full restart-resume cycle

Verification

  • cargo test -p assay-core -- find_context_for_spec: 4 ✅
  • cargo test -p assay-core -- save_and_load: 3 ✅
  • cargo test -p assay-mcp -- gate_run_resumes: 1 ✅
  • cargo test -p assay-mcp -- gate_run: 16 ✅
  • just ready (1579 tests): ✅

Closes R100 — GateEvalContext persistence to disk

feat(S01/T02): wire resume-on-miss into gate_run handler with contract test
All checks were successful
CI / Validate plugins (pull_request) Successful in 3s
CI / Check (stable) (pull_request) Successful in 3m13s
60b56c128b
fix(S01): address review findings — in-memory-first lookup, pending_criteria filtering, regression tests
All checks were successful
CI / Validate plugins (pull_request) Successful in 3s
CI / Check (stable) (pull_request) Successful in 4m46s
a2c32c47ca
- C2: Check in-memory HashMap before scanning disk. Disk scan is now only triggered
  on actual miss (cold start / post-restart), not on every gate_run. This prevents
  a live in-memory session from being silently replaced by its stale on-disk snapshot,
  which would discard accumulated agent_evaluations.

- I2: pending_criteria now filters out already-evaluated criteria on resume (same logic
  as gate_report). New sessions are unaffected (no evaluations yet).

- I3: Removed the redundant re-persist inside the disk-resume arm. A single write-through
  at the end of the handler is sufficient.

- Tracing: renamed 'spec' field to 'spec_name' for consistency with other handlers.

- Tests added:
  - test_gate_run_creates_new_session_after_finalize: gate_run after gate_finalize
    must create a fresh session, not resume the finalized one (disk file deleted).
  - test_gate_run_preserves_in_memory_session_evaluations: live in-memory session
    must not be bypassed by disk scan (agent_evaluations preserved across gate_run).
  - Updated existing resume test to also assert pending_criteria excludes
    already-evaluated criteria.
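
The lookup order from C2 (memory first, disk only on a miss, fresh session as the fallback) can be sketched like this. It is an illustration under stated assumptions, not the handler in `server.rs`: the `Session` struct, the `Origin` enum, the `resolve_session` name, keying the map by spec name, and the injected `scan_disk` closure are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down session record.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub spec_name: String,
    pub agent_evaluations: Vec<String>,
}

/// Where the session used for this run came from.
#[derive(Debug, PartialEq)]
pub enum Origin {
    Memory,
    Disk,
    Fresh,
}

/// Ensures `live` holds a session for `spec_name` and reports its origin.
/// `scan_disk` is an assumed hook standing in for the on-disk lookup.
pub fn resolve_session<F>(
    live: &mut HashMap<String, Session>,
    spec_name: &str,
    scan_disk: F,
) -> Origin
where
    F: FnOnce(&str) -> Result<Option<Session>, String>,
{
    // 1. A live session wins; the disk is never consulted, so accumulated
    //    agent_evaluations cannot be replaced by a stale snapshot.
    if live.contains_key(spec_name) {
        return Origin::Memory;
    }

    let fresh = || Session {
        spec_name: spec_name.to_string(),
        agent_evaluations: Vec::new(),
    };

    // 2. Actual miss (cold start or post-restart): try the disk.
    let (session, origin) = match scan_disk(spec_name) {
        Ok(Some(found)) => (found, Origin::Disk),
        Ok(None) => (fresh(), Origin::Fresh),
        // 3. A failed scan is logged and treated like a miss.
        Err(err) => {
            eprintln!("warn: disk scan failed for {spec_name}: {err}");
            (fresh(), Origin::Fresh)
        }
    };

    live.insert(spec_name.to_string(), session);
    origin
}
```

Because the early return happens before `scan_disk` is called, the closure doubles as a probe in tests: a test can assert that it was never invoked when a live session exists.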

Review summary (multi-agent, 3 reviewers)

Reviewed by: correctness, API/observability, performance reviewers in parallel.

Findings addressed in this PR:

  • C2 (critical): In-memory HashMap was bypassed on every gate_run — disk scanned unconditionally even for live sessions, which would discard accumulated agent_evaluations. Fixed: check HashMap first; disk scan only on actual miss (cold start / post-restart). See fix(S01) commit.
  • I2 (important): pending_criteria returned stale full list even on resume. Fixed: filter out already-evaluated criteria, same logic as gate_report.
  • I3 (important): Redundant double-write in resume path. Fixed: one write-through at end.
  • I5 (important): No test for gate_run after gate_finalize. Fixed: test_gate_run_creates_new_session_after_finalize.
  • Added test_gate_run_preserves_in_memory_session_evaluations proving the HashMap-first path.
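
The I2 fix amounts to a set difference between the spec's criteria and those already evaluated. A minimal sketch, assuming criteria are identified by name; the function signature and the use of plain strings are hypothetical, not the types used by `gate_run` or `gate_report`:

```rust
use std::collections::HashSet;

/// Returns the criteria that still need an evaluation, preserving spec order.
pub fn pending_criteria(all_criteria: &[String], evaluated: &HashSet<String>) -> Vec<String> {
    all_criteria
        .iter()
        .filter(|name| !evaluated.contains(*name))
        .cloned()
        .collect()
}
```

For a new session the evaluated set is empty, so the full list comes back unchanged, which matches the note that new sessions are unaffected.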

False positive from review (C1): Reviewers flagged finalized sessions being re-opened. gate_finalize already deletes the disk file — confirmed in source. Not a bug.

Backlogged (minor, not merge blockers):

  • WOL-164: test for corrupted session file being skipped
  • WOL-165: gate_sessions unbounded growth — add eviction
  • WOL-166: normalize spec_name tracing field across handlers

Final state: 1582 tests, just ready green, all critical review findings addressed.

wollax merged commit d6247fe3a1 into main 2026-04-01 13:09:17 +00:00
Reference
wollax/assay!3