#3 - feat(M018/S01): GateEvalContext persistence — resume-on-miss for gate_run - wollax/assay

wollax commented

2026-04-01 12:55:53 +00:00

Owner

Completes the write-through cache pattern for GateEvalContext so gate evaluations survive MCP server restarts.

## Changes

**T01** (`crates/assay-core/src/gate/session.rs`):

- `pub fn find_context_for_spec(assay_dir, spec_name) -> Result<Option<GateEvalContext>>`
- Scans `.assay/gate_sessions/*.json` in reverse chronological order, returns most recent match
- Skips unreadable files with `tracing::warn!` (never propagates individual errors)
- 4 unit tests: match, no-match, empty-dir, most-recent-wins
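The scan described in T01 can be sketched roughly as follows. This is a std-only illustration, not the code in `session.rs`: `SessionStub`, `find_latest_for_spec`, and the injected `load` closure are hypothetical names, the sketch assumes session file names sort chronologically, and it uses `eprintln!` where the PR uses `tracing::warn!`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `GateEvalContext`; only the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionStub {
    pub session_id: String,
    pub spec_name: String,
}

/// Scan `dir` for `*.json` files, newest first, and return the first session
/// whose spec matches. Files that fail to load are skipped, not propagated.
pub fn find_latest_for_spec<F>(
    dir: &Path,
    spec_name: &str,
    load: F,
) -> io::Result<Option<SessionStub>>
where
    F: Fn(&Path) -> io::Result<SessionStub>,
{
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        // A missing directory just means nothing has been persisted yet.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };

    let mut paths: Vec<PathBuf> = entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.extension().and_then(|ext| ext.to_str()) == Some("json"))
        .collect();

    // Assumption: file names sort chronologically, so reversed order is newest first.
    paths.sort();
    paths.reverse();

    for path in paths {
        match load(&path) {
            Ok(ctx) if ctx.spec_name == spec_name => return Ok(Some(ctx)),
            Ok(_) => {} // readable, but belongs to a different spec
            // Stand-in for `tracing::warn!`: note the failure and keep scanning.
            Err(e) => eprintln!("skipping unreadable session {}: {e}", path.display()),
        }
    }
    Ok(None)
}
```

Injecting `load` keeps the sketch free of a JSON dependency; a real implementation would presumably deserialize each file into the context type instead.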

**T02** (`crates/assay-mcp/src/server.rs`):

- Wired `find_context_for_spec` into `gate_run` handler before `create_session()`
- On resume: replaces `command_results` with fresh results, re-persists, inserts into HashMap, spawns timeout task
- On disk scan failure: logs warn, falls through to `create_session()` defensively
- Contract test `test_gate_run_resumes_session_from_disk` proves the full restart-resume cycle
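The three outcomes of the disk lookup in T02 (resume, nothing found, scan failed) could look something like the sketch below. `Ctx`, `resume_or_create`, and the local `create_session` helper are hypothetical simplifications; the real handler also persists the context, inserts it into the session map, and spawns a timeout task, all of which are omitted here.

```rust
/// Hypothetical, trimmed-down stand-in for `GateEvalContext`.
#[derive(Debug, Clone, PartialEq)]
pub struct Ctx {
    pub session_id: String,
    pub spec_name: String,
    pub command_results: Vec<String>,
    pub agent_evaluations: Vec<String>,
}

/// Hypothetical stand-in for the handler's `create_session()` call.
fn create_session(spec_name: &str, results: Vec<String>) -> Ctx {
    Ctx {
        session_id: format!("new-{spec_name}"),
        spec_name: spec_name.to_string(),
        command_results: results,
        agent_evaluations: Vec::new(),
    }
}

/// Returns the context to use plus a flag saying whether it was resumed.
pub fn resume_or_create(
    disk_lookup: Result<Option<Ctx>, String>,
    spec_name: &str,
    fresh_results: Vec<String>,
) -> (Ctx, bool) {
    match disk_lookup {
        // Resume: keep the stored session, but swap in the fresh command results.
        Ok(Some(mut ctx)) => {
            ctx.command_results = fresh_results;
            (ctx, true)
        }
        // Nothing on disk for this spec: start a new session.
        Ok(None) => (create_session(spec_name, fresh_results), false),
        // Scan failed: warn and fall through to a new session instead of erroring.
        Err(err) => {
            eprintln!("gate session disk scan failed for {spec_name}: {err}");
            (create_session(spec_name, fresh_results), false)
        }
    }
}
```

Treating a scan failure like a miss means a damaged sessions directory costs at most one lost resume, never a failed `gate_run`.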

## Verification

- `cargo test -p assay-core -- find_context_for_spec`: 4 ✅
- `cargo test -p assay-core -- save_and_load`: 3 ✅
- `cargo test -p assay-mcp -- gate_run_resumes`: 1 ✅
- `cargo test -p assay-mcp -- gate_run`: 16 ✅
- `just ready` (1579 tests): ✅

Closes R100 — GateEvalContext persistence to disk

wollax added 2 commits

2026-04-01 12:55:53 +00:00

feat(S01/T01): add find_context_for_spec to assay-core with unit tests 60dc6a0186

feat(S01/T02): wire resume-on-miss into gate_run handler with contract test

CI / Validate plugins (pull_request) Successful in 3s

CI / Check (stable) (pull_request) Successful in 3m13s

60b56c128b

wollax added 1 commit

2026-04-01 13:08:15 +00:00

fix(S01): address review findings — in-memory-first lookup, pending_criteria filtering, regression tests

CI / Validate plugins (pull_request) Successful in 3s

CI / Check (stable) (pull_request) Successful in 4m46s

a2c32c47ca

- C2: Check in-memory HashMap before scanning disk. Disk scan is now only triggered
  on actual miss (cold start / post-restart), not on every gate_run. This prevents
  a live in-memory session from being silently replaced by its stale on-disk snapshot,
  which would discard accumulated agent_evaluations.

- I2: pending_criteria now filters out already-evaluated criteria on resume (same logic
  as gate_report). New sessions are unaffected (no evaluations yet).

- I3: Removed the redundant re-persist inside the disk-resume arm. A single write-through
  at the end of the handler is sufficient.

- Tracing: renamed 'spec' field to 'spec_name' for consistency with other handlers.

- Tests added:
  - test_gate_run_creates_new_session_after_finalize: gate_run after gate_finalize
    must create a fresh session, not resume the finalized one (disk file deleted).
  - test_gate_run_preserves_in_memory_session_evaluations: live in-memory session
    must not be bypassed by disk scan (agent_evaluations preserved across gate_run).
  - Updated existing resume test to also assert pending_criteria excludes
    already-evaluated criteria.
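
The lookup order from C2 (in-memory map first, disk only on a miss, new session last) might be sketched as below. All names are hypothetical: `Ctx` is a cut-down stand-in for `GateEvalContext`, `scan_disk` stands in for the `find_context_for_spec` call, and the `session-N` id scheme is invented for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for `GateEvalContext`.
#[derive(Debug, Clone, PartialEq)]
pub struct Ctx {
    pub session_id: String,
    pub spec_name: String,
    pub agent_evaluations: Vec<String>,
}

/// Where the session returned by `resolve_session` came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Memory,
    Disk,
    Created,
}

/// Resolve the session for `spec_name`, touching the disk only on a cache miss.
pub fn resolve_session<F>(
    sessions: &mut HashMap<String, Ctx>,
    spec_name: &str,
    scan_disk: F,
) -> (String, Source)
where
    F: FnOnce() -> Option<Ctx>,
{
    // 1. A live in-memory session wins, so its evaluations are never overwritten.
    if let Some(live) = sessions.values().find(|ctx| ctx.spec_name == spec_name) {
        return (live.session_id.clone(), Source::Memory);
    }

    // 2. Miss (cold start or post-restart): only now consult the disk snapshot.
    // 3. Nothing on disk either: create a fresh session (id scheme is made up).
    let (ctx, source) = match scan_disk() {
        Some(ctx) => (ctx, Source::Disk),
        None => (
            Ctx {
                session_id: format!("session-{}", sessions.len() + 1),
                spec_name: spec_name.to_string(),
                agent_evaluations: Vec::new(),
            },
            Source::Created,
        ),
    };

    let id = ctx.session_id.clone();
    sessions.insert(id.clone(), ctx);
    (id, source)
}
```

Passing the disk scan as a closure makes the ordering testable: on an in-memory hit the closure is never invoked, which is the property `test_gate_run_preserves_in_memory_session_evaluations` is described as guarding.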

wollax commented

2026-04-01 13:09:02 +00:00

Author

Owner

## Review summary (multi-agent, 3 reviewers)

**Reviewed by:** correctness, API/observability, performance reviewers in parallel.

**Critical findings addressed in this PR:**

- **C2 (critical):** In-memory HashMap was bypassed on every `gate_run`: disk scanned unconditionally even for live sessions, which would discard accumulated `agent_evaluations`. Fixed: check HashMap first; disk scan only on actual miss (cold start / post-restart). See `fix(S01)` commit.
- **I2 (important):** `pending_criteria` returned stale full list even on resume. Fixed: filter out already-evaluated criteria, same logic as `gate_report`.
- **I3 (important):** Redundant double-write in resume path. Fixed: one write-through at end.
- **I5 (important):** No test for `gate_run` after `gate_finalize`. Fixed: `test_gate_run_creates_new_session_after_finalize`.
- Added `test_gate_run_preserves_in_memory_session_evaluations` proving the HashMap-first path.
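
The I2 fix amounts to a set difference between the spec's criteria and those already evaluated. A minimal sketch, assuming criteria are identified by name strings (the actual types in assay may differ, and this free function is hypothetical):

```rust
use std::collections::HashSet;

/// Return the criteria that still need evaluation, preserving spec order.
/// `all` is every criterion in the spec; `evaluated` holds the names that
/// already have an evaluation recorded in the session.
pub fn pending_criteria(all: &[String], evaluated: &[String]) -> Vec<String> {
    let done: HashSet<&str> = evaluated.iter().map(String::as_str).collect();
    all.iter()
        .filter(|name| !done.contains(name.as_str()))
        .cloned()
        .collect()
}
```

For a new session `evaluated` is empty, so the full list comes back unchanged; on resume only the unevaluated criteria remain.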

**False positive from review (C1):** Reviewers flagged finalized sessions being re-opened. `gate_finalize` already deletes the disk file (confirmed in source). Not a bug.

**Backlogged (minor, not merge blockers):**

- WOL-164: test for corrupted session file being skipped
- WOL-165: `gate_sessions` unbounded growth; add eviction
- WOL-166: normalize `spec_name` tracing field across handlers

**Final state:** 1582 tests, `just ready` green, all critical review findings addressed.
