Terminal Runtime Hardening
- Status: Proposed
- Date: 2026-04-09
- Decider: Architect
Context and Problem Statement
Wardian's terminal layer is now carrying too much provider-specific behavior and still has visible quality issues across providers:
- scrolling performance degrades under heavy output
- remounts rely on ad hoc in-memory preservation instead of a clear replay model
- terminal capability handling is partly provider-specific, especially for OpenCode
- theme and color behavior remain inconsistent because Wardian does not yet present a sufficiently complete terminal capability surface
Wardian already uses the correct broad primitives:
- portable-pty on the Rust side as the cross-platform PTY layer
- xterm.js on the frontend as the terminal emulator
So the problem is not that Wardian chose the wrong base stack. The problem is that the runtime contract between PTY, transport, emulator, and renderer is still too thin and too ad hoc.
The current implementation also risks long-term technical debt by continuing to patch provider-specific terminal quirks in AgentTerminal.tsx.
Decision
Wardian will harden the terminal runtime in a first pass focused on in-app correctness and rendering quality, while explicitly avoiding a full app-restart persistence layer.
This pass will:
- improve rendering correctness during resize and remount
- preserve terminal state across UI remounts within a running app session
- centralize terminal capability emulation into a provider-neutral layer
- reduce provider-specific terminal branches
- improve PTY output buffering so large output bursts do not degrade scrolling as severely
Architecture
1. Keep the Current Base Stack
Wardian will keep:
- portable-pty for backend PTY management
- xterm.js for frontend terminal emulation
This is intentionally closer to VS Code's architectural shape without trying to reproduce Electron-specific infrastructure directly.
Wardian does not need to replace portable-pty with another PTY abstraction. The equivalent of VS Code's node-pty is already present in the current Rust/Tauri architecture.
2. Separate Parsed Terminal State from Mounted Renderer State
Wardian will not treat the mounted xterm renderer as the source of truth.
The session model will be split into:
- a detached parser terminal that continuously receives PTY output
- a mounted view terminal used only for visible rendering and input
When a view remounts, Wardian should reuse the live renderer if it is still valid. If the renderer must be recreated, Wardian restores it from serialized parser state instead of replaying raw PTY chunks directly into a fresh xterm view.
This keeps renderer lifecycle bugs from becoming terminal-state bugs and is closer to the VS Code split between terminal state ownership and terminal rendering.
Mounted terminal views should prefer xterm's WebGL renderer when available. WebGL enables xterm's custom glyph path for block and box-drawing characters, which is required for provider TUIs that render pixel-art/status UI with block glyphs. If WebGL initialization or context retention fails, the terminal must fall back to the built-in DOM renderer without breaking the session.
3. Introduce a Terminal Capability Broker
Wardian will replace most provider-specific terminal query/reply handling with a capability broker that owns terminal emulation for standard terminal queries and responses.
This broker will be responsible for:
- device status reports / cursor position replies
- terminal pixel-size and resize replies
- DECRQM handling
- palette and standard color queries
- focus in/out handling
- synchronized output toggles
- other standard capability negotiations that providers expect from a modern terminal
The broker should support at least:
- OSC palette handling already needed by OpenCode
- foreground/background color queries such as OSC 10/11 if present in real traces
- normalization for terminal redraw patterns that are standard escape-sequence compositions but produce poor scrollback in embedded xterm views
Provider-specific logic should only remain where a provider genuinely departs from standard terminal behavior.
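As an illustration of the kind of mapping such a broker would own, the sketch below answers three standard queries (CSI 6 n, CSI 14 t, CSI 18 t) from a geometry snapshot. The Geometry struct and reply_for_query function are hypothetical names, not Wardian's actual API. The reply formats follow xterm's documented control-sequence conventions. It is written in Rust only to keep the examples in one language; the same table would apply wherever the broker ends up living.

```rust
/// Hypothetical snapshot of the terminal geometry a broker would answer from.
#[derive(Debug, Clone, Copy)]
pub struct Geometry {
    pub cols: u16,
    pub rows: u16,
    pub cell_width_px: u16,
    pub cell_height_px: u16,
    pub cursor_row: u16, // 1-based
    pub cursor_col: u16, // 1-based
}

/// Returns the reply for a recognized query, or None when this sketch does
/// not handle the sequence. Only exact, parameter-free forms are matched.
pub fn reply_for_query(query: &str, g: Geometry) -> Option<String> {
    match query {
        // Cursor position report: CSI 6 n -> CSI row ; col R
        "\x1b[6n" => Some(format!("\x1b[{};{}R", g.cursor_row, g.cursor_col)),
        // Text area size in pixels: CSI 14 t -> CSI 4 ; height ; width t
        "\x1b[14t" => {
            let height = u32::from(g.rows) * u32::from(g.cell_height_px);
            let width = u32::from(g.cols) * u32::from(g.cell_width_px);
            Some(format!("\x1b[4;{};{}t", height, width))
        }
        // Text area size in characters: CSI 18 t -> CSI 8 ; rows ; cols t
        "\x1b[18t" => Some(format!("\x1b[8;{};{}t", g.rows, g.cols)),
        _ => None,
    }
}
```

A real broker would also need to parse queries out of a byte stream that can split a sequence across reads, and to cover the DECRQM, OSC color, focus, and synchronized-output cases listed above.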
4. Add Explicit In-Memory Replay Ownership
Wardian will keep terminal replay only for the lifetime of the running app process in this pass.
That means:
- terminal state survives pane remounts, layout changes, and view switches
- terminal state does not yet survive full app restart
Replay ownership should become explicit and parser-owned:
- the detached parser terminal is the canonical in-app state owner
- mounted terminal views are reattached when possible and restored from serialized parser state only when recreation is required
- remounting should not depend on replaying raw PTY chunks into a brand-new renderer
This is meant to make remount behavior deterministic and easier to debug.
5. Improve PTY Output Transport and Buffering
Wardian's PTY transport should be tightened so the frontend is not overly dependent on repeated tiny poll/drain cycles.
The first pass should improve:
- batching of PTY output chunks
- replay-friendly buffering
- resistance to output bursts that currently degrade scrolling and repaint behavior
- UTF-8 decoding across PTY read boundaries
This should remain compatible with the current Tauri command/event model, but the data path should become more deliberate and less fragile.
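One way to picture the batching goal is a small accumulator that coalesces PTY reads until a size threshold is reached. This is a minimal sketch under assumptions: OutputBatcher, its threshold, and the idea that a caller-owned timer invokes flush are all hypothetical and are not taken from Wardian's code.

```rust
/// Hypothetical accumulator that coalesces small PTY reads into larger batches.
pub struct OutputBatcher {
    buf: Vec<u8>,
    max_bytes: usize,
}

impl OutputBatcher {
    pub fn new(max_bytes: usize) -> Self {
        Self { buf: Vec::new(), max_bytes }
    }

    /// Appends a chunk and returns a batch once the threshold is reached.
    pub fn push(&mut self, chunk: &[u8]) -> Option<Vec<u8>> {
        self.buf.extend_from_slice(chunk);
        if self.buf.len() >= self.max_bytes {
            Some(std::mem::take(&mut self.buf))
        } else {
            None
        }
    }

    /// Returns whatever is buffered, for example when a flush timer fires.
    pub fn flush(&mut self) -> Option<Vec<u8>> {
        if self.buf.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buf))
        }
    }
}
```

The size threshold bounds latency under bursts, while the explicit flush keeps interactive output from waiting for a full batch.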
PTY reads are byte streams, not character streams. The backend must preserve an incomplete UTF-8 sequence between reads instead of applying lossy per-chunk decoding. Otherwise a multi-byte glyph split by portable-pty can become replacement characters before xterm sees it, producing visible Wardian-only rendering differences even when the provider emits valid terminal output.
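The carry-over rule can be sketched with the standard library alone. Utf8Carry is a hypothetical helper, not Wardian's implementation: it decodes as much as is valid, keeps an incomplete trailing sequence for the next read, and substitutes U+FFFD only for bytes that are invalid regardless of what follows.

```rust
/// Hypothetical incremental decoder that preserves an incomplete trailing
/// UTF-8 sequence between PTY reads.
#[derive(Default)]
pub struct Utf8Carry {
    pending: Vec<u8>,
}

impl Utf8Carry {
    /// Feeds one PTY read and returns the text that is safe to emit now.
    pub fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        let mut out = String::new();
        loop {
            match std::str::from_utf8(&self.pending) {
                Ok(valid) => {
                    out.push_str(valid);
                    self.pending.clear();
                    return out;
                }
                Err(err) => {
                    let valid_len = err.valid_up_to();
                    // The prefix up to valid_len is guaranteed to be valid UTF-8.
                    out.push_str(std::str::from_utf8(&self.pending[..valid_len]).unwrap());
                    match err.error_len() {
                        // Incomplete sequence at the end: keep it for the next read.
                        None => {
                            self.pending.drain(..valid_len);
                            return out;
                        }
                        // Invalid bytes: emit one replacement character and skip them.
                        Some(bad_len) => {
                            out.push('\u{FFFD}');
                            self.pending.drain(..valid_len + bad_len);
                        }
                    }
                }
            }
        }
    }
}
```

For example, the two bytes of "é" (0xC3 0xA9) arriving in separate reads produce "é" once the second byte lands, instead of two replacement characters.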
6. Normalize Home-Redraw TUI Scrollback
Several provider TUIs redraw by moving the cursor home and repainting the screen. In a compact embedded terminal, the following patterns need explicit handling:
- Some TUIs clear by writing many EL + newline sequences and then homing the cursor. Wardian should normalize that to a clear-and-home operation so resize redraws do not become duplicated scrollback.
- Some synchronized-output TUIs repaint from cursor-home while leaving the cursor near the bottom of the screen. Before row-shrinking resizes, Wardian should locally home the parser and renderer cursors so xterm does not promote the old transient frame into scrollback before the provider redraws.
- After a resize, a synchronized home-redraw that is mostly already present in the parser buffer should be treated as a duplicate repaint and dropped. This prevents long transcript redraws from being appended as new history when the provider is only repainting for the new geometry.
- Codex's inline TUI can emit a sliding home-redraw viewport. Wardian should run Codex in its documented --no-alt-screen mode and reconstruct dropped overlapping frame lines into xterm scrollback so users can scroll through prior output.
The Codex frame journal is intentionally provider-scoped because applying the same reconstruction to every home-redraw TUI corrupts Claude's mascot/status rendering.
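The first normalization (a run of EL + newline followed by cursor home) can be sketched as a string rewrite. Everything here is an assumption made for illustration: the function name, the choice of CSI K as the EL form, the min_run threshold, and CSI 2 J followed by CSI H as the clear-and-home replacement. It also assumes the whole pattern is present in a single buffer.

```rust
const EL: &str = "\x1b[K"; // erase to end of line (assumed EL form)
const HOME: &str = "\x1b[H"; // cursor home
const CLEAR_AND_HOME: &str = "\x1b[2J\x1b[H"; // assumed replacement

/// Rewrites runs of at least `min_run` "EL + newline" units that are
/// immediately followed by cursor home into a single clear-and-home.
pub fn normalize_el_clear(input: &str, min_run: usize) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while !rest.is_empty() {
        // Count consecutive "EL + newline" units starting at this position.
        let mut cursor = rest;
        let mut run = 0;
        while let Some(after_el) = cursor.strip_prefix(EL) {
            let after_newline = after_el
                .strip_prefix("\r\n")
                .or_else(|| after_el.strip_prefix('\n'));
            match after_newline {
                Some(next) => {
                    cursor = next;
                    run += 1;
                }
                None => break,
            }
        }
        if run >= min_run {
            if let Some(after_home) = cursor.strip_prefix(HOME) {
                out.push_str(CLEAR_AND_HOME);
                rest = after_home;
                continue;
            }
        }
        // No match at this position: copy one character and advance.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        rest = &rest[ch.len_utf8()..];
    }
    out
}
```

A production version would have to operate on the PTY byte stream and tolerate the pattern being split across reads, which this sketch does not attempt.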
Scope
Included
- detached parser-terminal plus mounted view-terminal lifecycle
- provider-neutral terminal capability broker
- in-memory replay across UI remounts within a running app
- PTY buffering improvements for smoother scrolling
- targeted native tests for PTY/runtime behavior
Excluded
- full restart persistence like VS Code's headless replay across app relaunch
- replacing portable-pty
- a full dedicated external PTY host process
- provider-specific terminal customization beyond what is required to bridge genuinely non-standard behavior
Testing Strategy
This work cannot be validated by browser-only Playwright.
The required evidence for terminal claims is:
- frontend unit coverage for replay/capability handling
- backend Rust tests for PTY/runtime buffer logic where applicable
- native Tauri runtime E2E for real PTY behavior
- native rendering parity tests that compare Wardian's parser rows against a headless xterm fed the same byte-equivalent frame
- real-provider native validation when a provider-specific terminal behavior is involved
Browser smoke tests remain useful for layout regressions, but they are not sufficient evidence for terminal correctness.
The current audit tooling is split by evidence type:
- e2e-native/tests/terminal-rendering-native.test.mjs proves deterministic mock-provider rendering parity against headless xterm, including split UTF-8, resize, scroll, and pause.
- e2e-native/tests/real-provider-rendering-native.test.mjs is opt-in and captures exact Wardian-rendered screenshots plus parser rows for Codex, Claude, Gemini, and OpenCode. It writes local artifacts under e2e/screenshots/real-provider-rendering/ and records the isolated WARDIAN_HOME plus provider config override in the run manifest. For real-provider rendering audits, the test defaults WARDIAN_HOME to target/wardian-e2e-real-provider-home when the caller has not set it. This keeps the run isolated without placing Codex's projected CODEX_HOME under the OS temp directory, which Codex release builds warn about before drawing the TUI.
- Windows-specific outside-terminal capture is available through
scripts/capture-outside-provider-rendering.ps1. It captures Windows Terminal screenshots under e2e/screenshots/outside-provider-rendering/ for initial, resized, scrolled, paused, and interrupted states, forces TERM=xterm-256color and COLORTERM=truecolor, records the provider invocation, and writes a terminal-size.json RawUI probe plus a terminal-ansi-query.json ANSI probe so the reported character and pixel geometry can be compared with Wardian's parser rows and DOM rectangles. The ANSI probe records CSI 6 n, CSI 14 t, and CSI 18 t responses so cursor-position handshakes can be compared alongside pixel and character geometry. When columns and rows are supplied, it also passes Windows Terminal's documented --size <columns>,<rows> launch option before the shell command so the native terminal starts at the intended character geometry. It also launches with --suppressApplicationTitle so child OSC title writes do not replace the outside capture's tab title; this makes the outside chrome closer to Wardian's stable agent-card title. The script accepts -FontZoomSteps and sends Windows Terminal zoom keys before provider startup, then records font_zoom_steps in both the size probe and manifest. It also records initial_wait_seconds so transient provider rows, such as Claude's startup effort notification, can be compared against captures taken in the same startup window. The script still gates provider startup until the parent process has resized and zoomed the native Windows Terminal window, because launching the provider before those adjustments causes providers to receive Windows Terminal's default CSI 14 t / CSI 18 t answers. The scrolled-top state now sends Ctrl+Home, Windows Terminal's Ctrl+Shift+Home scroll-to-top chord, repeated Ctrl+Shift+PageUp / Ctrl+Shift+Up, and repeated mouse-wheel input over the terminal content, because some provider TUIs keep keyboard focus in their input widget and do not let a single keyboard shortcut move Windows Terminal's visible scrollback.
After each screenshot, the script also uses Windows Terminal select-all/copy to write initial.txt, resized.txt, scrolled-top.txt, paused.txt, and interrupted.txt machine-readable text snapshots. Those snapshots are useful for text comparison, but they can include scrollback beyond the visible viewport and therefore are not a replacement for the screenshot when proving exact visible state. The paused state captures the unchanged visible buffer before any interrupt input, matching Wardian pause's preserved-buffer behavior; the interrupted state remains a separate Ctrl+C artifact. Simple ASCII audit input is typed through Windows Terminal keystrokes instead of pasted, so Claude/Codex captures avoid bracketed-paste redraw differences; clipboard paste remains the fallback for text outside the safe SendKeys subset. When a Wardian OpenCode session id is supplied, the outside capture mirrors Wardian's interactive launch shape with opencode --session <session-id> <habitat-workspace>. When a Wardian Codex session id is supplied, the outside capture mirrors Wardian's interactive launch with the projected CODEX_HOME, resolves codex.cmd when available, and uses the same current core args Wardian logs on Windows: -c windows.sandbox="unelevated" --dangerously-bypass-approvals-and-sandbox --no-alt-screen --cd <workspace>. Codex outside captures also pass -c tui.show_tooltips=false to suppress provider-owned rotating startup tips during parity audits. The same outside harness now resolves claude.exe for Claude captures, accepts the Wardian session name, sets WARDIAN_SESSION_ID and CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1, writes the Wardian permission hook settings under the supplied Wardian home, passes those settings with --settings <settings-file>, includes Wardian's common and per-agent --add-dir roots when they exist, and launches with Wardian's --verbose --input-format stream-json --output-format stream-json --session-id <session-id> --name <session-name> shape.
It also clears Windows Terminal's WT_SESSION and WT_PROFILE_ID before provider startup because Wardian's PTY child does not receive those identity variables in the audit harness. Gemini captures resolve gemini.cmd when available so the outside launch uses the same Windows command shim Wardian uses instead of PowerShell's .ps1 shim.
- Windows-specific deterministic outside-frame capture is available through
scripts/capture-outside-terminal-frame.ps1. It renders the same text frame used by the native mock-provider rendering test in Windows Terminal, launches with --size <columns>,<rows> when geometry is supplied, accepts -FontZoomSteps, gates frame rendering until parent-side resize/zoom is complete, and writes matching terminal-size.json and terminal-ansi-query.json probes, so renderer differences can be inspected without provider startup/session noise. The capture e2e/screenshots/outside-terminal-frame/2026-05-11T11-04-22Z confirms a 50x19 Windows Terminal launch without a forced pixel window size. The capture e2e/screenshots/outside-terminal-frame/2026-05-11T11-20-56Z confirms deterministic frame zoom evidence with font_zoom_steps: 2, CSI 18 t -> 41x16, and CSI 14 t -> 410x320.
- Wardian-specific deterministic geometry sweep is available through
e2e-native/tests/terminal-geometry-sweep-native.test.mjs with WARDIAN_E2E_TERMINAL_GEOMETRY_SWEEP=1. It uses the mock provider, forces a stacked grid, captures real Wardian WebView screenshots, and records xterm debug metrics across configurable app widths, font settings, and row heights. The sweep e2e/screenshots/terminal-geometry-sweep/2026-05-11T11-32-38-723Z proves that Wardian can reproduce Windows Terminal's default 50x19 / 500x380 text area in a wider stacked card: app window 1130x720, grid row height 480, and Cascadia Mono, Consolas, monospace at 17.5px produce xterm cols: 50, rows: 19, cssCellWidth: 10, cssCellHeight: 20, and screenRect: 500x380. Real provider cards need the same width and font profile but a taller row height because their card chrome consumes more vertical space. The real-provider run e2e/screenshots/real-provider-rendering/2026-05-11T14-12-45-718Z used app window 1130x760, grid row height 520, and the same 17.5px Cascadia stack; Codex, Claude, Gemini, and OpenCode all reported 50x19, 10x20 cells, and screenRect: 500x380 for initial and settled states. The same run then intentionally resized the app to 980x680, producing 35x19, 10x20 cells, and a 350x380 screen rect for resized, scrolled-top, and paused states. This is not the normal default grid-card geometry; it is an explicit parity surface for auditing.
Real-provider sign-off is not complete until Wardian and outside captures are made under comparable terminal geometry, theme, launch context, prompt state, and lifecycle state. The current capture tools make mismatches visible; they do not by themselves prove parity.
The current real-provider audit has proven that Wardian and Windows Terminal can be captured with matching 50x19 character geometry, but exact visual parity is still not achieved. The remaining differences are explicit and reproducible:
- Wardian's normal grid card does not match Windows Terminal's default text-area geometry. The audit surface now proves Wardian can match it, but only by widening the stacked card and increasing the provider card row height. Outer window dimensions remain an unreliable proxy for terminal geometry: the evidence must use xterm debug metrics in Wardian and ANSI
CSI 14 t / CSI 18 t probes outside Wardian.
- Wardian uses the app terminal theme, font stack, xterm Unicode handling, and WebView rendering pipeline; Windows Terminal uses its configured native theme, font, glyph shaping, cursor, and scrollbar behavior. Fresh Wardian Codex evidence in
e2e/screenshots/real-provider-rendering/2026-05-11T10-42-27-332Z/codex/resized.json records xterm renderer metrics of fontFamily: Consolas, "Courier New", monospace, fontSize: 14, cssCellWidth: 7, and cssCellHeight: 17. The matching outside Windows Terminal run e2e/screenshots/outside-provider-rendering/2026-05-11T10-43-34Z/codex records CSI 18 t -> 50x19 and CSI 14 t -> 500x380, which is effectively a 10x20 cell (500 / 50 wide, 380 / 19 tall). This explains why matching 50x19 fixes wrapping but not visual parity.
- The Wardian real-provider audit can now vary terminal font settings through
WARDIAN_E2E_TERMINAL_FONT_SIZE and WARDIAN_E2E_TERMINAL_FONT_FAMILY, and records the selected font profile in its manifest. Two Codex probes make the current constraint explicit:
  - e2e/screenshots/real-provider-rendering/2026-05-11T11-09-00-789Z/codex/resized.json uses Cascadia Mono, Consolas, monospace at 12px. It preserves 50 columns, but xterm still reports a 7x14 cell, far from Windows Terminal's ~10x20.
  - e2e/screenshots/real-provider-rendering/2026-05-11T11-09-39-426Z/codex/resized.json uses the same family at 16px. It moves xterm to 9x19, much closer to Windows Terminal's native cell metrics, but the 400px card can only fit 39x17, so text wrapping no longer matches the outside 50x19 capture.
- Outside Windows Terminal font-zoom probes show the same tradeoff from the other side.
e2e/screenshots/outside-provider-rendering/2026-05-11T11-15-02Z/codex records font_zoom_steps: 1, CSI 18 t -> 45x18, and CSI 14 t -> 450x360; e2e/screenshots/outside-provider-rendering/2026-05-11T11-14-35Z/codex records font_zoom_steps: 2, 41x16, and 410x320; e2e/screenshots/outside-provider-rendering/2026-05-11T11-14-02Z/codex records font_zoom_steps: 3, 37x15, and 370x300. The opposite direction, e2e/screenshots/outside-provider-rendering/2026-05-11T11-13-25Z/codex, records font_zoom_steps: -3, 64x25, and 640x500. Zooming Windows Terminal toward Wardian's pixel footprint changes the effective character geometry before Codex renders, while zooming away from it increases both the reported text area and pixel area.
- A wider Wardian stacked-card surface can match the native Windows Terminal text area exactly for deterministic content.
e2e/screenshots/terminal-geometry-sweep/2026-05-11T11-32-38-723Z/width-1130.json records xterm 50x19, 10x20 cells, and a 500x380 screen rect. Provider cards require the taller real-provider surface captured in e2e/screenshots/real-provider-rendering/2026-05-11T11-38-32-112Z, where all four providers reported the same 50x19 / 500x380 metrics. Therefore, exact geometry parity is possible only when Wardian's audit surface is widened and its terminal font can use the measured fractional size; the normal default grid card still cannot simultaneously match Windows Terminal's 50-column wrapping and native default cell metrics.
- Providers are not always launched from identical prompt/session contexts. Codex can show different startup text, permissions/configuration warnings, or persisted prompt state between Wardian and a direct shell launch. Even with the corrected Codex outside invocation and the same projected
CODEX_HOME, Codex startup text is provider-owned and can vary between captures: the same Wardian/outside comparison can show different rotating tips and default prompt suggestions such as /fast, Codex App promotion, Summarize recent commits, Improve documentation in @filename, or Find and fix a bug in @filename.
- OpenCode and Codex are especially sensitive to Wardian's habitat/config projection. OpenCode receives Wardian-generated
OPENCODE_CONFIG_DIR / OPENCODE_CONFIG, a habitat cwd, a projected workspace target, and on resumed sessions --session <session-id>; Codex receives a habitat CODEX_HOME. Plain outside-terminal provider launches remain useful user-visible evidence, but they are not launch-equivalent to Wardian unless the capture explicitly supplies the same Wardian home/session context.
- Fresh outside same-session captures for the
2026-05-11T11-38-32-112Z Wardian run prove the outside terminal can be placed at the same measured geometry for all providers: e2e/screenshots/outside-provider-rendering/2026-05-11T11-40-15Z/codex, 2026-05-11T11-40-38Z/claude, 2026-05-11T11-41-00Z/gemini, and 2026-05-11T11-41-22Z/opencode each report RawUI 50x19, ANSI CSI 18 t -> 50x19, and ANSI CSI 14 t -> 500x380. The same-session captures narrow the remaining differences to provider state and renderer behavior: Codex still rotates the default prompt suggestion, Claude still rotates prompt suggestions, Gemini exposes different scrollback/notice visibility, and OpenCode is the closest visual match once launched with Wardian's habitat session and config.
- Fixed-input captures show which prompt-suggestion differences are controllable. Wardian Codex
e2e/screenshots/real-provider-rendering/2026-05-11T11-44-57-757Z/codex/resized-card.png and outside Codex e2e/screenshots/outside-provider-rendering/2026-05-11T11-45-34Z/codex/resized.png both show the same Codex tip and the fixed input "render parity check"; Wardian Claude e2e/screenshots/real-provider-rendering/2026-05-11T11-46-15-038Z/claude/resized-card.png and outside Claude e2e/screenshots/outside-provider-rendering/2026-05-11T11-49-44Z/claude/resized.png likewise align on the prompt row. Wardian OpenCode e2e/screenshots/real-provider-rendering/2026-05-11T11-48-59-525Z/opencode/resized-card.png and outside OpenCode e2e/screenshots/outside-provider-rendering/2026-05-11T11-50-37Z/opencode/resized.png align on the same fixed input and visible assistant frame. These captures leave renderer/chrome differences as the primary visible delta for Codex, Claude, and OpenCode.
- Gemini's fixed-input provider-state mismatch was traced to Wardian's fresh spawn path, not terminal geometry. Before the fix, Wardian called
obtain_session_id, which ran Gemini headlessly with "Introduce yourself", then launched the visible PTY with --resume; the visible terminal inherited that bootstrap conversation in scrollback. Gemini now uses a Wardian-generated session ID for fresh spawns and lets the visible PTY's init event populate resume_session, so fresh interactive rendering no longer preloads the bootstrap transcript. The confirming Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-01-02-691Z/gemini/scrolled-top.json records viewportY: 0, 50x19, 500x380, and top-of-buffer text beginning at the Gemini logo/version and deprecated system-configuration warning instead of "Introduce yourself". The matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T12-04-14Z/gemini records RawUI 50x19, ANSI CSI 18 t -> 50x19, and ANSI CSI 14 t -> 500x380.
- The real-provider Wardian audit now waits for provider-specific input readiness before injecting the fixed prompt, then waits until the fixed prompt is visible in xterm's parser rows before taking screenshots. This closed a Gemini capture artifact where the audit sent
"render parity check" before Gemini's input widget was ready, causing Wardian to show the placeholder while Windows Terminal showed the typed prompt. The all-provider Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-18-07-758Z records 50x19 and visible fixed input for Codex, Claude, Gemini, and OpenCode. The matching outside runs e2e/screenshots/outside-provider-rendering/2026-05-11T12-19-25Z/codex, 2026-05-11T12-19-50Z/claude, 2026-05-11T12-20-14Z/gemini, and 2026-05-11T12-20-38Z/opencode all record RawUI 50x19, ANSI CSI 18 t -> 50x19, and ANSI CSI 14 t -> 500x380.
- In the fixed-input evidence from
2026-05-11T12-18-07-758Z, Claude, Gemini, and OpenCode match at the terminal text-content level under the audited state, but exact rendered appearance still differs because Wardian uses WebView/xterm rendering and app card chrome while Windows Terminal uses native chrome, font rasterization, cursor rendering, scrollbar rendering, and glyph handling. Gemini is a concrete example: both sides show the extension notice, update box, and "render parity check", but the information glyph, cursor, row placement inside the native window, and scrollbar/chrome treatment differ. Codex did not match at text-content level in that run because provider-owned rotating startup tips differed between Wardian and outside launches; the Wardian run 2026-05-11T12-18-07-758Z/codex/resized-card.png shows the /fast tip while the outside run 2026-05-11T12-19-25Z/codex/resized.png shows the Codex App tip.
- Codex tip rotation is now controlled in the audit path rather than treated as terminal rendering evidence. The native Wardian audit passes
custom_args = "-c tui.show_tooltips=false" for Codex, and the outside Windows Terminal harness passes the same Codex config. The confirming Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-34-29-300Z/codex/resized-card.png records 50x19, 500x380, and no startup tip in parser rows; the matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T12-38-21Z/codex/resized.png records RawUI 50x19, ANSI CSI 18 t -> 50x19, ANSI CSI 14 t -> 500x380, the same fixed input, and the same visible Codex startup frame without a rotating tip. This removes the Codex text-content mismatch for the audited state. It does not remove the remaining renderer/chrome deltas between WebView/xterm and Windows Terminal.
- Codex's helper-alias warning is also provider-owned and can be introduced by the audit harness itself. The
2026-05-11T12-42-31-820Z Wardian run used WARDIAN_HOME under the OS temp directory. Codex's release startup path refuses to create helper aliases when CODEX_HOME is under that temp root, so the matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T12-44-29Z/codex/resized.png showed a "Refusing to create helper binaries under temporary dir" warning that did not appear in Wardian's parser rows. The real-provider audit now uses a repo-local ignored target/wardian-e2e-real-provider-home by default. The fresh Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-51-34-329Z and outside captures e2e/screenshots/outside-provider-rendering/2026-05-11T12-52-57Z/codex, 2026-05-11T12-53-22Z/claude, 2026-05-11T12-53-46Z/gemini, and 2026-05-11T12-54-10Z/opencode all record 50x19 character geometry and 500x380 pixel text areas. Under the resized and scrolled-top audited states, Codex, Gemini, and OpenCode now match at the terminal text-content level.
- Claude's apparent
"medium · /effort" text-content mismatch was traced to capture timing, not Wardian inventing text or launching Claude with a different session context. Trace-enabled Wardian evidence in e2e/screenshots/real-provider-rendering/2026-05-11T13-26-19-563Z and target/wardian-e2e-real-provider-home/debug/terminal-traces/921360a7-16c1-48f5-9a7a-4561658f9231.log proves the row is provider-emitted on the PTY output stream after Claude receives Wardian's CSI 1 ; 1 R cursor-position response. Minified Claude 2.1.138 code shows this effort row is registered as a high-priority effort-level notification with timeoutMs: 10000. The outside captures that omitted it waited 12 seconds before typing/capturing, so they landed after Claude removed the notification. The short-wait rerun e2e/screenshots/outside-provider-rendering/2026-05-11T13-50-55Z/claude used the same session, settings file, --add-dir roots, 50x19 character geometry, 500x380 pixel text area, and parseable outside CSI 1 ; 1 R; a pixel diff against the long-wait capture isolates the new text to the footer rows where the effort notification renders. The outside harness now records initial_wait_seconds in terminal-size.json and manifest.json so transient provider rows can be compared against captures taken in the same startup window.
- Claude follow-up captures still matter for launch-context parity: the outside harness mirrors Wardian hook/env setup without the earlier
"Invalid JSON provided to --settings" artifact (e2e/screenshots/outside-provider-rendering/2026-05-11T13-06-42Z/claude), clears Windows Terminal identity env (2026-05-11T13-08-22Z/claude), types rather than pastes the audit input (2026-05-11T13-16-08Z/claude), and includes Wardian's common/per-agent --add-dir roots (2026-05-11T13-42-17Z/claude). A raw JSON --settings outside launch was tested and rejected because PowerShell/Windows Terminal produced "Invalid JSON provided to --settings" (e2e/screenshots/outside-provider-rendering/2026-05-11T13-44-39Z/claude); valid outside captures must keep using a settings file even though Wardian can pass raw JSON directly through CommandBuilder. Two targeted Wardian DSR experiments were rejected and reverted: dropping Claude's DSR reply entirely and translating it to the observed outside ESC[[C response both stalled Claude after its initial CSI 6 n query (df2e3f3f-33b1-4a9a-9ed7-650e2bf87abc.log and 0c5e78b9-04c3-4b36-a254-86d505aedcbe evidence).
- The corrected OpenCode same-session outside capture
e2e/screenshots/outside-provider-rendering/2026-05-11T10-31-30Z/opencode reaches the same assistant response/input-panel state and matching 50-column text wrapping as Wardian's e2e/screenshots/real-provider-rendering/2026-05-11T10-25-48-165Z/opencode/resized-card.png. Wardian's parser state for that capture has baseY: 0, viewportY: 0, and rows: 19, so the text is not hidden in scrollback. The remaining OpenCode delta is visual renderer behavior: Windows Terminal uses native chrome, font/glyph/cursor/scrollbar metrics, while Wardian uses the app xterm/WebView card styling.
- The corrected Claude outside capture
e2e/screenshots/outside-provider-rendering/2026-05-11T10-54-07Z/claude resolves claude.exe, launches with the same stream-json session id and name used by Wardian, and reaches the same named session title line as Wardian's e2e/screenshots/real-provider-rendering/2026-05-11T10-47-25-455Z/claude/resized-card.png. The remaining Claude differences are provider-owned prompt suggestions and renderer metrics: Wardian showed Try "fix typecheck errors" while Windows Terminal showed Try "how does <filepath> work?", and Windows Terminal still renders with native cell/chrome metrics while Wardian renders through xterm/WebView.
- The corrected Gemini outside capture
e2e/screenshots/outside-provider-rendering/2026-05-11T11-06-33Z/gemini resolves gemini.cmd, records CSI 18 t -> 50x19, and keeps the Windows Terminal tab title at WardianOutside-gemini-... despite child OSC title writes. Gemini still exposes provider/environment state differences: the outside capture visibly shows the extensions update notice, while Wardian may hide the same notice below or above the current xterm viewport depending on scroll position and redraw timing.
- The Wardian real-provider audit can inject a fixed unsubmitted input string before capture. Outside Windows Terminal captures now type simple ASCII input through keystrokes and fall back to clipboard paste for more complex strings, but the outside capture manifest's
input_text remains only evidence of the requested capture condition; the screenshot or parser rows must still show whether the TUI accepted the text.
- Wardian pause now keeps the selected agent card visible so the paused terminal state is captured as user-visible evidence, and the real-provider audit requires a card screenshot for every captured state. Earlier outside captures used
Ctrl+C as the closest interruption analogue, but that was not an exact lifecycle match: Wardian e2e/screenshots/real-provider-rendering/2026-05-11T12-51-34-329Z/codex/paused-card.png preserved "render parity check", while outside e2e/screenshots/outside-provider-rendering/2026-05-11T12-52-57Z/codex/interrupted.png changed the prompt row to Codex's post-interrupt suggestion text. The outside harness now captures paused.png before sending Ctrl+C, so preserved-buffer pause parity can be evaluated separately from provider-owned interruption behavior. The focused option test verifies the new initial, resized, scrolled-top, paused, and interrupted manifest sequence, and the valid outside Claude run e2e/screenshots/outside-provider-rendering/2026-05-11T14-03-31Z/claude records the paused/interrupted split with matched 50x19/500x380 geometry and an unchanged scrolled-top to paused visible buffer. The Codex and OpenCode reruns against the older 2026-05-11T12-51-34-329Z Wardian home are invalid because that home no longer contains their projected provider habitats, so they are retained only as negative evidence for stale-session capture risk.
- The fresh all-provider paused evidence is
e2e/screenshots/real-provider-rendering/2026-05-11T14-12-45-718Z plus outside captures e2e/screenshots/outside-provider-rendering/2026-05-11T14-15-13Z/codex, 2026-05-11T14-15-30Z/claude, 2026-05-11T14-15-46Z/gemini, and 2026-05-11T14-16-02Z/opencode. These outside captures use the fresh Wardian session ids and repo-local target/wardian-e2e-real-provider-home, and their RawUI probes report 35x19, matching Wardian's post-resize paused parser geometry. Pixel comparisons show scrolled-top to paused is unchanged except cursor/scrollbar slivers or provider redraw noise, while paused to interrupted is a separate provider interruption artifact. They also reveal a Windows Terminal limitation at this narrow resize target: the ANSI probe reports the physical terminal area rather than the forced RawUI size (103x32 under the forced 980x680 pixel window, and 48x19 when launched without a forced pixel size) and returns ESC[[C rather than a parseable CSI row ; col R cursor-position response. Claude therefore does not show the same footer medium · /effort row at the 35x19/48x19 outside sizes, while the earlier 50x19 short-wait capture does. The fresh run also exposed and fixed an audit-harness instability: after one provider's resized capture, the next provider could spawn at the narrower 980x680 window and wrap Gemini's ready prompt across rows. The audit now resets the window before each provider spawn and normalizes whitespace when waiting for provider readiness text, so row wrapping does not masquerade as a missing prompt.
- The stable all-provider comparison surface is
e2e/screenshots/real-provider-rendering/2026-05-11T18-38-33-479Z plus outside captures e2e/screenshots/outside-provider-rendering/2026-05-11T18-40-41Z/codex, 2026-05-11T18-41-08Z/claude, 2026-05-11T18-41-33Z/gemini, and 2026-05-11T18-41-58Z/opencode. This run keeps Wardian at 50x19/500x380 for initial, resized, scrolled-top, and paused states by resizing the app height without changing the terminal text geometry, waits 12000ms after injecting "render parity check", and uses the repo-local target/wardian-e2e-real-provider-home-50-stable-all home. The outside captures use the fresh Wardian session ids, font_zoom_steps: 3, initial_wait_seconds: 12, and report RawUI 50x19, ANSI CSI 18 t -> 50x19, ANSI CSI 14 t -> 500x380, and CSI 1 ; 1 R for every provider. Under that stable 50x19 surface, Codex, Claude, Gemini, and OpenCode show the same visible provider text in the resized state; renderer and chrome differences remain expected because Wardian uses WebView/xterm inside an agent card while the outside captures use native Windows Terminal chrome and rasterization. The outside pixel comparisons also show scrolled-top to paused is unchanged except cursor/scrollbar slivers or provider redraw noise, while paused to interrupted remains a separate Ctrl+C provider artifact.
auditRenderingEvidence in e2e-native/lib/rendering-audit.mjs now verifies this stable all-provider evidence chain mechanically. The verifier reads the Wardian manifest, each outside manifest, screenshot paths, PowerShell RawUI probes, ANSI CSI 14 t/CSI 18 t/CSI 6 n probes, session ids, Wardian home paths, audit input text, and Wardian parser rows. Against the stable evidence above it reported 221 checks and 0 failures: all four providers have matching session ids, matching 50x19 character geometry, matching 500x380 text-area geometry, parseable outside cursor-position replies, required initial/resized/scrolled-top/paused/interrupted screenshots, and Wardian paused parser rows identical to the scrolled-top parser rows.
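The reply shapes behind the CSI 14 t, CSI 18 t, and CSI 6 n probes can be sketched as a small parser. This is an illustrative sketch, not the code in rendering-audit.mjs: the ProbeReply type and parseProbeReply function are hypothetical names. It assumes the standard xterm reply formats, where CSI 18 t answers ESC [ 8 ; rows ; cols t, CSI 14 t answers ESC [ 4 ; height ; width t, and CSI 6 n answers ESC [ row ; col R. A malformed reply such as the ESC[[C seen at the narrow resize target falls through to the unparseable case.

```typescript
// Hypothetical types and names for illustration only.
type ProbeReply =
  | { kind: "chars"; cols: number; rows: number } // reply to CSI 18 t
  | { kind: "pixels"; width: number; height: number } // reply to CSI 14 t
  | { kind: "cursor"; row: number; col: number } // reply to CSI 6 n
  | { kind: "unparseable"; raw: string };

function parseProbeReply(raw: string): ProbeReply {
  // Window reports: ESC [ 8 ; rows ; cols t  or  ESC [ 4 ; height ; width t
  const report = /^\x1b\[(4|8);(\d+);(\d+)t$/.exec(raw);
  if (report) {
    const first = Number(report[2]); // rows or pixel height
    const second = Number(report[3]); // cols or pixel width
    return report[1] === "8"
      ? { kind: "chars", cols: second, rows: first }
      : { kind: "pixels", width: second, height: first };
  }
  // Cursor position report: ESC [ row ; col R
  const cpr = /^\x1b\[(\d+);(\d+)R$/.exec(raw);
  if (cpr) {
    return { kind: "cursor", row: Number(cpr[1]), col: Number(cpr[2]) };
  }
  // Anything else is kept verbatim so the audit can report it as evidence.
  return { kind: "unparseable", raw };
}
```

Note that the replies carry rows before columns and height before width, while this document writes geometry as columns x rows and width x height, so a parser has to swap the order when comparing against values such as 50x19 or 500x380.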
The verifier can require outside text snapshots and can compare selected outside copied-text states against Wardian parser rows after trimming trailing cell padding and leading/trailing blank rows. This verifier intentionally does not OCR Windows Terminal screenshots, so screenshots remain the visual evidence for chrome, cursor, scrollbar, and rasterization differences.
- Text-snapshot reruns exposed and then closed the Claude scrolled-top text mismatch. Fresh Codex outside capture
e2e/screenshots/outside-provider-rendering/2026-05-11T18-53-51Z/codex copied resized/scrolled-top/paused text that matches Wardian parser rows exactly after normalizing leading/trailing blank rows. Fresh Claude captures first showed Claude Code v2.1.138 plus current: 2.1.138 · latest: 2.1.139 outside while Wardian was already rendering v2.1.139; rerunning after the Claude self-update completed made resized text match, but Wardian 2026-05-11T18-59-08-927Z/claude/scrolled-top-card.png still exposed a stale top Claude Code box border through xterm scrollback. The root cause was a provider clear-by-newlines redraw sequence split across PTY chunks: per-chunk normalization missed it, so xterm treated the clear as literal newlines and promoted a transient Claude welcome row into scrollback. Wardian now normalizes fullscreen clear-by-newlines at the joined PTY batch boundary. The confirming Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T19-26-18-946Z records Claude baseY: 0 for initial, resized, scrolled-top, and paused states. The matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T19-26-57Z/claude starts at the same 65x21 text geometry and resizes to the same 36x19 wrapping surface; copied text for initial, resized, scrolled-top, and paused matches Wardian parser rows exactly after trimming trailing cell padding and leading/trailing blank rows. The outside capture script also now widens the PowerShell RawUI buffer before widening WindowSize, avoiding the "Window cannot be wider than the screen buffer" artifact when launching wider initial captures.
- A fresh all-provider stable run after the batch-clear fix is
e2e/screenshots/real-provider-rendering/2026-05-11T19-34-01-431Z, using absolute repo-local WARDIAN_HOME target/wardian-e2e-real-provider-home-stable-all-text-abs. A prior run with relative WARDIAN_E2E_REAL_RENDERING_HOME failed OpenCode before rendering because its projected habitat workspace path was resolved under the agent habitat; all real-provider rendering runs that include OpenCode should use an absolute test home. Matching outside captures with the correct home are e2e/screenshots/outside-provider-rendering/2026-05-11T19-38-45Z/codex, 2026-05-11T19-39-20Z/claude, 2026-05-11T19-39-55Z/gemini, and 2026-05-11T19-40-29Z/opencode. Codex and Claude copied text matches Wardian parser rows for initial/resized/scrolled-top/paused under this stable 50x19 surface. Gemini copied text includes whole-buffer scrollback and soft-wrapped logical rows, so auditRenderingEvidence now has an explicit visual-row comparison path: it wraps outside copied text to the audited column count and anchors to the bottom viewport for initial/resized captures or the top viewport for scrolled-top/paused captures. Against 2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-39-55Z/gemini, this visual-row verifier reports 70 checks and 0 failures for initial, resized, scrolled-top, and paused states. A mouse-drag visible-selection experiment was rejected because Windows Terminal selection copied stale or incorrect text and disturbed provider state.
- OpenCode's remaining text mismatch was provider lifecycle nondeterminism, not xterm rendering. Wardian previously obtained an OpenCode session id through a headless bootstrap path, while the outside capture launched or reattached through a different OpenCode lifecycle and exposed bootstrap/random home rows in scrollback. OpenCode now uses a Wardian-generated id for fresh visible PTY startup, like Gemini, and the outside capture only passes
--session when the captured id is a real ses_... OpenCode id. The audit path also seeds an isolated XDG_STATE_HOME under the run's WARDIAN_HOME with tips_hidden: true, suppressing OpenCode's provider-owned random home tip without changing the user's global OpenCode state. The confirming Wardian run is e2e/screenshots/real-provider-rendering/2026-05-11T20-15-17-992Z with outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T20-16-08Z/opencode; exact copied-text comparison for initial, resized, scrolled-top, and paused reports 70 checks and 0 failures.
- Provider text parity is now mechanically verified across the stable evidence set. Codex uses Wardian
2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-38-45Z/codex, Claude uses Wardian 2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-39-20Z/claude, Gemini uses Wardian 2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-39-55Z/gemini with visual-row normalization, and OpenCode uses Wardian 2026-05-11T20-15-17-992Z plus outside 2026-05-11T20-16-08Z/opencode. Each provider reports 70 audit checks and 0 failures over initial, resized, scrolled-top, and paused states, including matching session ids, 50x19 character geometry, 500x380 text-area geometry, required screenshots/text snapshots, matching audit input, and matching Wardian paused parser rows.
- Wardian's native audit no longer treats
.xterm-viewport.scrollTop = 0 as proof that a scrolled screenshot moved. With the WebGL renderer and Wardian's current xterm styling, DOM scrollHeight can equal clientHeight even while xterm's parser has scrollback. The debug-only audit hook now calls xterm's own scroll-to-top API on the parser and renderer and waits for viewportY: 0 before capturing scrolled-top. Fresh Gemini evidence in e2e/screenshots/real-provider-rendering/2026-05-11T10-57-03-613Z/gemini/scrolled-top.json records viewportY: 0 and the paired card screenshot visibly shows the top-of-buffer frame rather than repeating the resized bottom viewport.
- The deterministic Wardian rendering audit now uses the same debug scroll API and compares Wardian parser rows against a headless xterm rendered at the same
viewportY. This prevents a passing scroll assertion from relying on DOM scroll state that may not correspond to the rendered terminal viewport.
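The row comparisons described in the list above (trimming trailing cell padding and leading/trailing blank rows, then wrapping outside copied text to the audited column count and anchoring to the top or bottom viewport) can be sketched as follows. This is a minimal sketch under stated assumptions, not the auditRenderingEvidence implementation: every function and type name is hypothetical, each character is assumed to occupy one cell (which does not hold for wide or combining characters), and the copied text is assumed to end at the last viewport row.

```typescript
// Hypothetical names throughout; illustrative only.
type Anchor = "top" | "bottom";

// Drop trailing cell padding on each row, then leading and trailing blank rows.
function normalizeRows(rows: string[]): string[] {
  const trimmed = rows.map((row) => row.replace(/\s+$/, ""));
  let start = 0;
  let end = trimmed.length;
  while (start < end && trimmed[start] === "") start++;
  while (end > start && trimmed[end - 1] === "") end--;
  return trimmed.slice(start, end);
}

// Hard-wrap logical lines into visual rows of at most `cols` cells.
function wrapToColumns(lines: string[], cols: number): string[] {
  const visual: string[] = [];
  for (const line of lines) {
    if (line.length === 0) {
      visual.push(""); // keep blank logical lines as one blank visual row
      continue;
    }
    for (let i = 0; i < line.length; i += cols) {
      visual.push(line.slice(i, i + cols));
    }
  }
  return visual;
}

// Select the window of visual rows a viewport of `rows` rows would show.
function viewportRows(visual: string[], rows: number, anchor: Anchor): string[] {
  return anchor === "top"
    ? visual.slice(0, rows)
    : visual.slice(Math.max(0, visual.length - rows));
}

// Compare outside copied text against parser rows at the audited geometry.
function visualRowsMatch(
  outsideText: string,
  parserRows: string[],
  cols: number,
  rows: number,
  anchor: Anchor,
): boolean {
  // Trim trailing padding before wrapping so padding never spills into extra rows.
  const logical = outsideText.split(/\r?\n/).map((l) => l.replace(/\s+$/, ""));
  const actual = normalizeRows(viewportRows(wrapToColumns(logical, cols), rows, anchor));
  const expected = normalizeRows(parserRows);
  return expected.length === actual.length && expected.every((row, i) => row === actual[i]);
}
```

Under these assumptions, initial and resized captures would use the "bottom" anchor and scrolled-top and paused captures the "top" anchor, mirroring the anchoring rule described for the Gemini comparison.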
Consequences
- Positive: scrolling and rendering quality should improve across all providers, not just OpenCode
- Positive: terminal handling becomes less provider-specific and easier to maintain
- Positive: remount behavior becomes deterministic and resilient
- Positive: OpenCode theme/capability debugging can move onto a cleaner terminal foundation
- Negative: the terminal stack becomes more structured and therefore more code moves through shared abstractions
- Negative: parser/view divergence must be prevented by keeping resize and capability handling synchronized
- Negative: full app-restart persistence remains intentionally out of scope for this pass