feat(M018/S01): GateEvalContext persistence — resume-on-miss for gate_run #3
Completes the write-through cache pattern for GateEvalContext so gate evaluations survive MCP server restarts.
Changes
T01 (`crates/assay-core/src/gate/session.rs`):
- Adds `pub fn find_context_for_spec(assay_dir, spec_name) -> Result<Option<GateEvalContext>>`
- Scans `.assay/gate_sessions/*.json` in reverse chronological order, returns most recent match
- Per-file failures are logged with `tracing::warn!` (never propagates individual errors)

T02 (`crates/assay-mcp/src/server.rs`):
- Integrates `find_context_for_spec` into the `gate_run` handler before `create_session()`
- On a match, updates `command_results` with fresh results, re-persists, inserts into the HashMap, spawns the timeout task
- Otherwise falls back to `create_session()` defensively
- `test_gate_run_resumes_session_from_disk` proves the full restart-resume cycle

Verification
- `cargo test -p assay-core -- find_context_for_spec`: 4 ✅
- `cargo test -p assay-core -- save_and_load`: 3 ✅
- `cargo test -p assay-mcp -- gate_run_resumes`: 1 ✅
- `cargo test -p assay-mcp -- gate_run`: 16 ✅
- `just ready` (1579 tests): ✅

Closes R100: GateEvalContext persistence to disk
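
The T01 directory scan can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in `session.rs`: `StoredSession`, `parse_session`, `find_latest_for_spec`, and the toy `session_id|spec_name` file format are all hypothetical stand-ins (the real function deserializes `GateEvalContext` from JSON), `eprintln!` stands in for `tracing::warn!`, and the sketch assumes session file names sort chronologically and that a missing directory means no persisted sessions.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for a persisted session; the real `GateEvalContext`
/// has more fields and is stored as JSON.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredSession {
    pub session_id: String,
    pub spec_name: String,
}

/// Assumed toy format "session_id|spec_name", used instead of JSON so the
/// sketch needs only the standard library.
fn parse_session(text: &str) -> Result<StoredSession, String> {
    let (id, spec) = text
        .trim()
        .split_once('|')
        .ok_or_else(|| "missing '|' separator".to_string())?;
    Ok(StoredSession {
        session_id: id.to_string(),
        spec_name: spec.to_string(),
    })
}

/// Scan `dir` newest-first and return the first session whose spec matches.
/// Assumes file names sort chronologically (e.g. timestamp-prefixed ids).
pub fn find_latest_for_spec(dir: &Path, spec_name: &str) -> io::Result<Option<StoredSession>> {
    let mut paths: Vec<_> = match fs::read_dir(dir) {
        Ok(entries) => entries
            .filter_map(|entry| entry.ok())
            .map(|entry| entry.path())
            .filter(|p| p.extension().map_or(false, |ext| ext == "json"))
            .collect(),
        // Assumption: a missing directory means nothing was persisted yet.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    paths.sort();
    for path in paths.iter().rev() {
        let parsed = fs::read_to_string(path)
            .map_err(|e| e.to_string())
            .and_then(|text| parse_session(&text));
        match parsed {
            Ok(session) if session.spec_name == spec_name => return Ok(Some(session)),
            Ok(_) => {}
            // Stand-in for `tracing::warn!`: log the bad file and keep scanning.
            Err(err) => eprintln!("skipping {}: {err}", path.display()),
        }
    }
    Ok(None)
}
```

Logging and continuing on a per-file error means one corrupt session file cannot block resumption of a newer or older valid one, which is the behaviour the T01 bullet describes.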
- C2: Check in-memory HashMap before scanning disk. Disk scan is now only triggered on actual miss (cold start / post-restart), not on every gate_run. This prevents a live in-memory session from being silently replaced by its stale on-disk snapshot, which would discard accumulated agent_evaluations.
- I2: pending_criteria now filters out already-evaluated criteria on resume (same logic as gate_report). New sessions are unaffected (no evaluations yet).
- I3: Removed the redundant re-persist inside the disk-resume arm. A single write-through at the end of the handler is sufficient.
- Tracing: renamed 'spec' field to 'spec_name' for consistency with other handlers.
- Tests added:
  - test_gate_run_creates_new_session_after_finalize: gate_run after gate_finalize must create a fresh session, not resume the finalized one (disk file deleted).
  - test_gate_run_preserves_in_memory_session_evaluations: live in-memory session must not be bypassed by disk scan (agent_evaluations preserved across gate_run).
  - Updated existing resume test to also assert pending_criteria excludes already-evaluated criteria.

Review summary (multi-agent, 3 reviewers)
Reviewed by: correctness, API/observability, performance reviewers in parallel.
Critical findings addressed in this PR:
- `gate_run`: disk scanned unconditionally even for live sessions, which would discard accumulated `agent_evaluations`. Fixed: check HashMap first; disk scan only on actual miss (cold start / post-restart). See `fix(S01)` commit.
- `pending_criteria` returned stale full list even on resume. Fixed: filter out already-evaluated criteria, same logic as `gate_report`.
- Added `test_gate_run_creates_new_session_after_finalize`.
- Added `test_gate_run_preserves_in_memory_session_evaluations`, proving the HashMap-first path.

False positive from review (C1): Reviewers flagged finalized sessions being re-opened. `gate_finalize` already deletes the disk file (confirmed in source). Not a bug.

Backlogged (minor, not merge blockers):
- `spec_name` tracing field across handlers

Final state: 1582 tests, `just ready` green, all critical review findings addressed.
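
The HashMap-first lookup and the `pending_criteria` filter from the review fixes can be illustrated with a small sketch. Everything here is an assumption made for illustration: `Session`, `Source`, and `resolve_session` are hypothetical names, the map is keyed by spec name for simplicity (the real handler may key by session id), and the disk scan and `create_session()` are passed in as closures rather than called directly.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, trimmed-down session; the real `GateEvalContext` differs.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub spec_name: String,
    pub criteria: Vec<String>,
    /// Names of criteria that already have an agent evaluation.
    pub agent_evaluations: HashSet<String>,
}

/// Where the session used by this `gate_run` call came from.
#[derive(Debug, PartialEq)]
pub enum Source {
    Memory,
    Disk,
    Fresh,
}

/// Check the in-memory map first; consult disk only on a miss, and create a
/// fresh session only if disk has nothing either.
pub fn resolve_session(
    live: &mut HashMap<String, Session>,
    spec_name: &str,
    load_from_disk: impl FnOnce(&str) -> Option<Session>,
    create_session: impl FnOnce(&str) -> Session,
) -> Source {
    if live.contains_key(spec_name) {
        // A live session wins: its evaluations may be newer than the snapshot.
        return Source::Memory;
    }
    let (source, session) = match load_from_disk(spec_name) {
        Some(session) => (Source::Disk, session),
        None => (Source::Fresh, create_session(spec_name)),
    };
    live.insert(spec_name.to_string(), session);
    source
}

/// Criteria that still need an evaluation, in their original order.
pub fn pending_criteria(session: &Session) -> Vec<&str> {
    session
        .criteria
        .iter()
        .filter(|c| !session.agent_evaluations.contains(*c))
        .map(String::as_str)
        .collect()
}
```

Because the disk loader is only invoked after the map lookup misses, a live session's accumulated evaluations cannot be overwritten by an older on-disk snapshot, and a resumed session reports only the criteria that are still unevaluated.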