Terminal Runtime Hardening

Status: Proposed
Date: 2026-04-09
Decider: Architect

Context and Problem Statement

Wardian's terminal layer is now carrying too much provider-specific behavior and still has visible quality issues across providers:

scrolling performance degrades under heavy output
remounts rely on ad hoc in-memory preservation instead of a clear replay model
terminal capability handling is partly provider-specific, especially for OpenCode
theme and color behavior remain inconsistent because Wardian does not yet present a sufficiently complete terminal capability surface

Wardian already uses the correct broad primitives:

portable-pty on the Rust side as the cross-platform PTY layer
xterm.js on the frontend as the terminal emulator

So the problem is not that Wardian chose the wrong base stack. The problem is that the runtime contract between PTY, transport, emulator, and renderer is still too thin and too ad hoc.

The current implementation also risks long-term technical debt by continuing to patch provider-specific terminal quirks in AgentTerminal.tsx.

Decision

Wardian will harden the terminal runtime in a first pass focused on in-app correctness and rendering quality, while explicitly avoiding a full app-restart persistence layer.

This pass will:

improve rendering correctness during resize and remount
preserve terminal state across UI remounts within a running app session
centralize terminal capability emulation into a provider-neutral layer
reduce provider-specific terminal branches
improve PTY output buffering so large output bursts do not degrade scrolling as severely

Architecture

1. Keep the Current Base Stack

Wardian will keep:

portable-pty for backend PTY management
xterm.js for frontend terminal emulation

This is intentionally closer to VS Code's architectural shape without trying to reproduce Electron-specific infrastructure directly.

Wardian does not need to replace portable-pty with another PTY abstraction. The equivalent of VS Code's node-pty is already present in the current Rust/Tauri architecture.

2. Separate Parsed Terminal State from Mounted Renderer State

Wardian will not treat the mounted xterm renderer as the source of truth.

The session model will be split into:

a detached parser terminal that continuously receives PTY output
a mounted view terminal used only for visible rendering and input

When a view remounts, Wardian should reuse the live renderer if it is still valid. If the renderer must be recreated, Wardian restores it from serialized parser state instead of replaying raw PTY chunks directly into a fresh xterm view.

This keeps renderer lifecycle bugs from becoming terminal-state bugs and is closer to the VS Code split between terminal state ownership and terminal rendering.

Mounted terminal views should prefer xterm's WebGL renderer when available. WebGL enables xterm's custom glyph path for block and box-drawing characters, which is required for provider TUIs that render pixel-art/status UI with block glyphs. If WebGL initialization or context retention fails, the terminal must fall back to the built-in DOM renderer without breaking the session.

3. Introduce a Terminal Capability Broker

Wardian will replace most provider-specific terminal query/reply handling with a capability broker that owns terminal emulation for standard terminal queries and responses.

This broker will be responsible for:

device status reports / cursor position replies
terminal pixel-size and character-size replies (for example CSI 14 t and CSI 18 t)
DECRQM handling
palette and standard color queries
focus in/out handling
synchronized output toggles
other standard capability negotiations that providers expect from a modern terminal

The broker should support at least:

OSC palette handling already needed by OpenCode
foreground/background color queries such as OSC 10/11 if present in real traces
normalization for terminal redraw patterns that are standard escape-sequence compositions but produce poor scrollback in embedded xterm views

Provider-specific logic should only remain where a provider genuinely departs from standard terminal behavior.
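
As an illustration of the broker's query/reply role, the sketch below answers three standard queries: cursor position (CSI 6 n), text-area pixel size (CSI 14 t), and character size (CSI 18 t). The names TerminalGeometry and reply_for_query are hypothetical and not taken from Wardian's code, and the sketch assumes the query bytes have already been isolated from the PTY stream. The reply shapes follow xterm's documented control sequences. It is written in Rust only for illustration; the same table could live on the frontend.

```rust
/// Geometry snapshot the broker answers from (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct TerminalGeometry {
    pub rows: u16,
    pub cols: u16,
    pub pixel_width: u16,
    pub pixel_height: u16,
    /// 1-based cursor position, as reported by DSR.
    pub cursor_row: u16,
    pub cursor_col: u16,
}

/// Returns the reply for a recognized standard query, or None so the caller
/// can fall through to provider-specific handling.
pub fn reply_for_query(query: &[u8], geo: &TerminalGeometry) -> Option<String> {
    match query {
        // DSR cursor position: CSI 6 n -> CSI row ; col R
        b"\x1b[6n" => Some(format!("\x1b[{};{}R", geo.cursor_row, geo.cursor_col)),
        // Text area size in pixels: CSI 14 t -> CSI 4 ; height ; width t
        b"\x1b[14t" => Some(format!("\x1b[4;{};{}t", geo.pixel_height, geo.pixel_width)),
        // Text area size in characters: CSI 18 t -> CSI 8 ; rows ; cols t
        b"\x1b[18t" => Some(format!("\x1b[8;{};{}t", geo.rows, geo.cols)),
        _ => None,
    }
}
```

Keeping the mapping in one provider-neutral function means a new provider only needs a dedicated branch when its query is not covered here.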

4. Add Explicit In-Memory Replay Ownership

Wardian will keep terminal replay only for the lifetime of the running app process in this pass.

That means:

terminal state survives pane remounts, layout changes, and view switches
terminal state does not yet survive full app restart

Replay ownership should become explicit and parser-owned:

the detached parser terminal is the canonical in-app state owner
mounted terminal views are reattached when possible and restored from serialized parser state only when recreation is required
remounting should not depend on replaying raw PTY chunks into a brand-new renderer

This is meant to make remount behavior deterministic and easier to debug.

5. Improve PTY Output Transport and Buffering

Wardian's PTY transport should be tightened so the frontend is not overly dependent on repeated tiny poll/drain cycles.

The first pass should improve:

batching of PTY output chunks
replay-friendly buffering
resistance to output bursts that currently degrade scrolling and repaint behavior
UTF-8 decoding across PTY read boundaries

This should remain compatible with the current Tauri command/event model, but the data path should become more deliberate and less fragile.
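
One possible shape for the batching step is a small coalescing buffer that flushes when either a size threshold or a maximum delay is reached. This is a minimal sketch under assumptions: OutputBatcher, its thresholds, and the caller-supplied clock are all hypothetical, and nothing here depends on Tauri or portable-pty APIs.

```rust
use std::time::{Duration, Instant};

/// Coalesces small PTY reads into larger batches (hypothetical helper).
pub struct OutputBatcher {
    buf: Vec<u8>,
    /// When the oldest unflushed byte arrived.
    first_byte_at: Option<Instant>,
    max_bytes: usize,
    max_delay: Duration,
}

impl OutputBatcher {
    pub fn new(max_bytes: usize, max_delay: Duration) -> Self {
        Self { buf: Vec::new(), first_byte_at: None, max_bytes, max_delay }
    }

    /// Adds a chunk and returns a batch if a flush threshold was reached.
    pub fn push(&mut self, chunk: &[u8], now: Instant) -> Option<Vec<u8>> {
        if !chunk.is_empty() {
            if self.buf.is_empty() {
                self.first_byte_at = Some(now);
            }
            self.buf.extend_from_slice(chunk);
        }
        if !self.buf.is_empty() && self.buf.len() >= self.max_bytes {
            return Some(self.take());
        }
        self.poll(now)
    }

    /// Flushes buffered bytes once they have waited at least `max_delay`.
    pub fn poll(&mut self, now: Instant) -> Option<Vec<u8>> {
        let started = self.first_byte_at?;
        if now.duration_since(started) >= self.max_delay {
            Some(self.take())
        } else {
            None
        }
    }

    fn take(&mut self) -> Vec<u8> {
        self.first_byte_at = None;
        std::mem::take(&mut self.buf)
    }
}
```

Passing the clock in as an argument keeps the flush policy deterministic in backend tests, which fits the requirement for Rust tests of PTY/runtime buffer logic.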

PTY reads are byte streams, not character streams. The backend must carry an incomplete UTF-8 sequence over from one read to the next instead of applying lossy per-chunk decoding. Otherwise a multi-byte glyph split across two portable-pty reads can be decoded as replacement characters (U+FFFD) before xterm sees it, producing visible Wardian-only rendering differences even when the provider emits valid terminal output.
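
The carry-over can be sketched with the standard library alone. Utf8ChunkDecoder is a hypothetical name, not Wardian's actual type; the sketch relies on std::str::from_utf8 and on Utf8Error::error_len returning None when the input ends in the middle of a sequence.

```rust
/// Decodes PTY bytes incrementally, keeping an incomplete trailing UTF-8
/// sequence for the next read (hypothetical helper).
#[derive(Default)]
pub struct Utf8ChunkDecoder {
    pending: Vec<u8>,
}

impl Utf8ChunkDecoder {
    /// Appends `chunk` and returns all text that can be decoded so far.
    pub fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        let mut out = String::new();
        let mut start = 0;
        loop {
            match std::str::from_utf8(&self.pending[start..]) {
                Ok(valid) => {
                    out.push_str(valid);
                    start = self.pending.len();
                    break;
                }
                Err(err) => {
                    let valid_end = start + err.valid_up_to();
                    let prefix = std::str::from_utf8(&self.pending[start..valid_end])
                        .expect("prefix was already validated");
                    out.push_str(prefix);
                    match err.error_len() {
                        // Genuinely invalid bytes: emit one U+FFFD and skip them.
                        Some(bad_len) => {
                            out.push('\u{FFFD}');
                            start = valid_end + bad_len;
                        }
                        // Sequence cut off by the read boundary: keep it for later.
                        None => {
                            start = valid_end;
                            break;
                        }
                    }
                }
            }
        }
        // Drop everything that was consumed; only the incomplete tail remains.
        self.pending.drain(..start);
        out
    }
}
```

With this shape, a read ending in 0xC3 followed by a read starting with 0xA9 yields "é" on the second call rather than two replacement characters.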

6. Normalize Home-Redraw TUI Scrollback

Several provider TUIs redraw by moving the cursor home and repainting the screen. In a compact embedded terminal, four patterns need explicit handling:

Some TUIs clear by writing many EL + newline sequences and then homing the cursor. Wardian should normalize that to a clear-and-home operation so resize redraws do not become duplicated scrollback.
Some synchronized-output TUIs repaint from cursor-home while leaving the cursor near the bottom of the screen. Before row-shrinking resizes, Wardian should locally home the parser and renderer cursors so xterm does not promote the old transient frame into scrollback before the provider redraws.
After a resize, a synchronized home-redraw that is mostly already present in the parser buffer should be treated as a duplicate repaint and dropped. This prevents long transcript redraws from being appended as new history when the provider is only repainting for the new geometry.
Codex's inline TUI can emit a sliding home-redraw viewport. Wardian should run Codex in its documented --no-alt-screen mode and reconstruct dropped overlapping frame lines into xterm scrollback so users can scroll through prior output.

The Codex frame journal is intentionally provider-scoped because applying the same reconstruction to every home-redraw TUI corrupts Claude's mascot/status rendering.
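
The first pattern above (many EL + newline sequences followed by cursor-home) can be sketched as a byte-level rewrite. The assumptions here are loud ones: the function name, the min_rows threshold, the exact byte forms matched (ESC [ K followed by LF or CR LF), and the choice of ESC [ 2 J ESC [ H as the replacement are all illustrative and not taken from Wardian's implementation. The sketch also only inspects the start of one buffer and ignores sequences split across chunks.

```rust
const EL: &[u8] = b"\x1b[K"; // erase to end of line
const HOME: &[u8] = b"\x1b[H"; // cursor home
const CLEAR_AND_HOME: &[u8] = b"\x1b[2J\x1b[H"; // erase display, then home

/// If `input` starts with at least `min_rows` repetitions of EL + newline
/// followed by cursor-home, replaces that prefix with clear-and-home.
/// Otherwise returns the input unchanged.
pub fn normalize_el_clear(input: &[u8], min_rows: usize) -> Vec<u8> {
    let mut pos = 0;
    let mut rows = 0;
    while input[pos..].starts_with(EL) {
        let after_el = pos + EL.len();
        let newline_len = if input[after_el..].starts_with(b"\r\n") {
            2
        } else if input[after_el..].starts_with(b"\n") {
            1
        } else {
            break; // EL not followed by a newline: not part of the clear run
        };
        pos = after_el + newline_len;
        rows += 1;
    }
    if rows >= min_rows && input[pos..].starts_with(HOME) {
        let mut out = CLEAR_AND_HOME.to_vec();
        out.extend_from_slice(&input[pos + HOME.len()..]);
        out
    } else {
        input.to_vec()
    }
}
```

A threshold such as min_rows guards against rewriting an ordinary single-line erase; a real implementation would likely tie it to the current row count.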

Scope

Included

detached parser-terminal plus mounted view-terminal lifecycle
provider-neutral terminal capability broker
in-memory replay across UI remounts within a running app
PTY buffering improvements for smoother scrolling
targeted native tests for PTY/runtime behavior

Excluded

full restart persistence like VS Code's headless replay across app relaunch
replacing portable-pty
a full dedicated external PTY host process
provider-specific terminal customization beyond what is required to bridge genuinely non-standard behavior

Testing Strategy

This work cannot be validated by browser-only Playwright tests, because they do not exercise real PTY behavior.

The required evidence for terminal claims is:

frontend unit coverage for replay/capability handling
backend Rust tests for PTY/runtime buffer logic where applicable
native Tauri runtime E2E for real PTY behavior
native rendering parity tests that compare Wardian's parser rows against a headless xterm fed the same byte-equivalent frame
real-provider native validation when a provider-specific terminal behavior is involved

Browser smoke tests remain useful for layout regressions, but they are not sufficient evidence for terminal correctness.

The current audit tooling is split by evidence type:

e2e-native/tests/terminal-rendering-native.test.mjs proves deterministic mock-provider rendering parity against headless xterm, including split UTF-8, resize, scroll, and pause.
e2e-native/tests/real-provider-rendering-native.test.mjs is opt-in and captures exact Wardian-rendered screenshots plus parser rows for Codex, Claude, Gemini, and OpenCode. It writes local artifacts under e2e/screenshots/real-provider-rendering/ and records the isolated WARDIAN_HOME plus provider config override in the run manifest. For real-provider rendering audits, the test defaults WARDIAN_HOME to target/wardian-e2e-real-provider-home when the caller has not set it. This keeps the run isolated without placing Codex's projected CODEX_HOME under the OS temp directory, which Codex release builds warn about before drawing the TUI.
Windows-specific outside-terminal capture is available through scripts/capture-outside-provider-rendering.ps1. It captures Windows Terminal screenshots under e2e/screenshots/outside-provider-rendering/ for initial, resized, scrolled, paused, and interrupted states, forces TERM=xterm-256color and COLORTERM=truecolor, records the provider invocation, and writes a terminal-size.json RawUI probe plus a terminal-ansi-query.json ANSI probe so the reported character and pixel geometry can be compared with Wardian's parser rows and DOM rectangles. The ANSI probe records CSI 6 n, CSI 14 t, and CSI 18 t responses so cursor-position handshakes can be compared alongside pixel and character geometry. When columns and rows are supplied, it also passes Windows Terminal's documented --size <columns>,<rows> launch option before the shell command so the native terminal starts at the intended character geometry. It also launches with --suppressApplicationTitle so child OSC title writes do not replace the outside capture's tab title; this makes the outside chrome closer to Wardian's stable agent-card title. The script accepts -FontZoomSteps and sends Windows Terminal zoom keys before provider startup, then records font_zoom_steps in both the size probe and manifest. It also records initial_wait_seconds so transient provider rows, such as Claude's startup effort notification, can be compared against captures taken in the same startup window. The script still gates provider startup until the parent process has resized and zoomed the native Windows Terminal window, because launching the provider before those adjustments causes providers to receive Windows Terminal's default CSI 14 t / CSI 18 t answers. 
The scrolled-top state now sends Ctrl+Home, Windows Terminal's Ctrl+Shift+Home scroll-to-top chord, repeated Ctrl+Shift+PageUp / Ctrl+Shift+Up, and repeated mouse-wheel input over the terminal content, because some provider TUIs keep keyboard focus in their input widget and do not let a single keyboard shortcut move Windows Terminal's visible scrollback. After each screenshot, the script also uses Windows Terminal select-all/copy to write initial.txt, resized.txt, scrolled-top.txt, paused.txt, and interrupted.txt machine-readable text snapshots. Those snapshots are useful for text comparison, but they can include scrollback beyond the visible viewport and therefore are not a replacement for the screenshot when proving exact visible state. The paused state captures the unchanged visible buffer before any interrupt input, matching Wardian pause's preserved-buffer behavior; the interrupted state remains a separate Ctrl+C artifact. Simple ASCII audit input is typed through Windows Terminal keystrokes instead of pasted, so Claude/Codex captures avoid bracketed-paste redraw differences; clipboard paste remains the fallback for text outside the safe SendKeys subset. When a Wardian OpenCode session id is supplied, the outside capture mirrors Wardian's interactive launch shape with opencode --session <session-id> <habitat-workspace>. When a Wardian Codex session id is supplied, the outside capture mirrors Wardian's interactive launch with the projected CODEX_HOME, resolves codex.cmd when available, and uses the same current core args Wardian logs on Windows: -c windows.sandbox="unelevated" --dangerously-bypass-approvals-and-sandbox --no-alt-screen --cd <workspace>. Codex outside captures also pass -c tui.show_tooltips=false to suppress provider-owned rotating startup tips during parity audits. 
The same outside harness now resolves claude.exe for Claude captures, accepts the Wardian session name, sets WARDIAN_SESSION_ID and CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1, writes the Wardian permission hook settings under the supplied Wardian home, passes those settings with --settings <settings-file>, includes Wardian's common and per-agent --add-dir roots when they exist, and launches with Wardian's --verbose --input-format stream-json --output-format stream-json --session-id <session-id> --name <session-name> shape. It also clears Windows Terminal's WT_SESSION and WT_PROFILE_ID before provider startup because Wardian's PTY child does not receive those identity variables in the audit harness. Gemini captures resolve gemini.cmd when available so the outside launch uses the same Windows command shim Wardian uses instead of PowerShell's .ps1 shim.
Windows-specific deterministic outside-frame capture is available through scripts/capture-outside-terminal-frame.ps1. It renders the same text frame used by the native mock-provider rendering test in Windows Terminal, launches with --size <columns>,<rows> when geometry is supplied, accepts -FontZoomSteps, gates frame rendering until parent-side resize/zoom is complete, and writes matching terminal-size.json and terminal-ansi-query.json probes, so renderer differences can be inspected without provider startup/session noise. The capture e2e/screenshots/outside-terminal-frame/2026-05-11T11-04-22Z confirms a 50x19 Windows Terminal launch without a forced pixel window size. The capture e2e/screenshots/outside-terminal-frame/2026-05-11T11-20-56Z confirms deterministic frame zoom evidence with font_zoom_steps: 2, CSI 18 t -> 41x16, and CSI 14 t -> 410x320.
Wardian-specific deterministic geometry sweep is available through e2e-native/tests/terminal-geometry-sweep-native.test.mjs with WARDIAN_E2E_TERMINAL_GEOMETRY_SWEEP=1. It uses the mock provider, forces a stacked grid, captures real Wardian WebView screenshots, and records xterm debug metrics across configurable app widths, font settings, and row heights. The sweep e2e/screenshots/terminal-geometry-sweep/2026-05-11T11-32-38-723Z proves that Wardian can reproduce Windows Terminal's default 50x19 / 500x380 text area in a wider stacked card: app window 1130x720, grid row height 480, and Cascadia Mono, Consolas, monospace at 17.5px produce xterm cols: 50, rows: 19, cssCellWidth: 10, cssCellHeight: 20, and screenRect: 500x380. Real provider cards need the same width and font profile but a taller row height because their card chrome consumes more vertical space. The real-provider run e2e/screenshots/real-provider-rendering/2026-05-11T14-12-45-718Z used app window 1130x760, grid row height 520, and the same 17.5px Cascadia stack; Codex, Claude, Gemini, and OpenCode all reported 50x19, 10x20 cells, and screenRect: 500x380 for initial and settled states. The same run then intentionally resized the app to 980x680, producing 35x19, 10x20 cells, and a 350x380 screen rect for resized, scrolled-top, and paused states. This is not the normal default grid-card geometry; it is an explicit parity surface for auditing.
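
The geometry comparisons in this list reduce to simple arithmetic on the two window reports: dividing the CSI 14 t pixel area by the CSI 18 t character area gives the cell size. The sketch below shows that derivation. The names parse_window_report, CellMetrics, and cell_metrics are hypothetical and are not the audit scripts' actual code; the reply shapes (CSI 4 ; height ; width t and CSI 8 ; rows ; cols t) follow xterm's documented conventions.

```rust
/// Parses a `CSI <kind> ; a ; b t` window report and returns (a, b).
pub fn parse_window_report(reply: &str, kind: u8) -> Option<(u32, u32)> {
    let body = reply.strip_prefix("\x1b[")?.strip_suffix('t')?;
    let mut parts = body.split(';');
    if parts.next()?.parse::<u8>().ok()? != kind {
        return None;
    }
    let a = parts.next()?.parse::<u32>().ok()?;
    let b = parts.next()?.parse::<u32>().ok()?;
    if parts.next().is_some() {
        return None; // unexpected extra parameter
    }
    Some((a, b))
}

#[derive(Debug, PartialEq)]
pub struct CellMetrics {
    pub cols: u32,
    pub rows: u32,
    pub cell_width: f64,
    pub cell_height: f64,
}

/// Derives the cell size from a CSI 14 t reply (pixels) and a CSI 18 t reply (characters).
pub fn cell_metrics(pixel_reply: &str, char_reply: &str) -> Option<CellMetrics> {
    let (pixel_height, pixel_width) = parse_window_report(pixel_reply, 4)?;
    let (rows, cols) = parse_window_report(char_reply, 8)?;
    if rows == 0 || cols == 0 {
        return None;
    }
    Some(CellMetrics {
        cols,
        rows,
        cell_width: f64::from(pixel_width) / f64::from(cols),
        cell_height: f64::from(pixel_height) / f64::from(rows),
    })
}
```

For the default Windows Terminal text area of 500x380 pixels at 50x19 characters, this yields a 10x20 cell, which is the figure Wardian's xterm metrics (cssCellWidth, cssCellHeight) have to match for visual parity.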

Real-provider sign-off is not complete until Wardian and outside captures are made under comparable terminal geometry, theme, launch context, prompt state, and lifecycle state. The current capture tools make mismatches visible; they do not by themselves prove parity.

The current real-provider audit has proven that Wardian and Windows Terminal can be captured with matching 50x19 character geometry, but exact visual parity is still not achieved. The remaining differences are explicit and reproducible:

Wardian's normal grid card does not match Windows Terminal's default text-area geometry. The audit surface now proves Wardian can match it, but only by widening the stacked card and increasing the provider card row height. Outer window dimensions remain an unreliable proxy for terminal geometry: the evidence must use xterm debug metrics in Wardian and ANSI CSI 14 t / CSI 18 t probes outside Wardian.
Wardian uses the app terminal theme, font stack, xterm Unicode handling, and WebView rendering pipeline; Windows Terminal uses its configured native theme, font, glyph shaping, cursor, and scrollbar behavior. Fresh Wardian Codex evidence in e2e/screenshots/real-provider-rendering/2026-05-11T10-42-27-332Z/codex/resized.json records xterm renderer metrics of fontFamily: Consolas, "Courier New", monospace, fontSize: 14, cssCellWidth: 7, and cssCellHeight: 17. The matching outside Windows Terminal run e2e/screenshots/outside-provider-rendering/2026-05-11T10-43-34Z/codex records CSI 18 t -> 50x19 and CSI 14 t -> 500x380, which works out to a 10x20 pixel cell (500 / 50 by 380 / 19) against Wardian's 7x17. This explains why matching 50x19 fixes wrapping but not visual parity.
The Wardian real-provider audit can now vary terminal font settings through WARDIAN_E2E_TERMINAL_FONT_SIZE and WARDIAN_E2E_TERMINAL_FONT_FAMILY, and records the selected font profile in its manifest. Two Codex probes make the current constraint explicit:
- e2e/screenshots/real-provider-rendering/2026-05-11T11-09-00-789Z/codex/resized.json uses Cascadia Mono, Consolas, monospace at 12px. It preserves 50 columns, but xterm still reports a 7x14 cell, far from Windows Terminal's ~10x20.
- e2e/screenshots/real-provider-rendering/2026-05-11T11-09-39-426Z/codex/resized.json uses the same family at 16px. It moves xterm to 9x19, much closer to Windows Terminal's native cell metrics, but the 400px card can only fit 39x17, so text wrapping no longer matches the outside 50x19 capture.
- Outside Windows Terminal font-zoom probes show the same tradeoff from the other side. e2e/screenshots/outside-provider-rendering/2026-05-11T11-15-02Z/codex records font_zoom_steps: 1, CSI 18 t -> 45x18, and CSI 14 t -> 450x360; e2e/screenshots/outside-provider-rendering/2026-05-11T11-14-35Z/codex records font_zoom_steps: 2, 41x16, and 410x320; e2e/screenshots/outside-provider-rendering/2026-05-11T11-14-02Z/codex records font_zoom_steps: 3, 37x15, and 370x300. The opposite direction, e2e/screenshots/outside-provider-rendering/2026-05-11T11-13-25Z/codex, records font_zoom_steps: -3, 64x25, and 640x500. Zooming Windows Terminal toward Wardian's pixel footprint changes the effective character geometry before Codex renders, while zooming away from it increases both the reported text area and pixel area.
- A wider Wardian stacked-card surface can match the native Windows Terminal text area exactly for deterministic content. e2e/screenshots/terminal-geometry-sweep/2026-05-11T11-32-38-723Z/width-1130.json records xterm 50x19, 10x20 cells, and a 500x380 screen rect. Provider cards require the taller real-provider surface captured in e2e/screenshots/real-provider-rendering/2026-05-11T11-38-32-112Z, where all four providers reported the same 50x19 / 500x380 metrics. Therefore, exact geometry parity is possible only when Wardian's audit surface is widened and its terminal font can use the measured fractional size; the normal default grid card still cannot simultaneously match Windows Terminal's 50-column wrapping and native default cell metrics.
Providers are not always launched from identical prompt/session contexts. Codex can show different startup text, permissions/configuration warnings, or persisted prompt state between Wardian and a direct shell launch. Even with the corrected Codex outside invocation and the same projected CODEX_HOME, Codex startup text is provider-owned and can vary between captures: the same Wardian/outside comparison can show different rotating tips and default prompt suggestions such as /fast, Codex App promotion, Summarize recent commits, Improve documentation in @filename, or Find and fix a bug in @filename.
OpenCode and Codex are especially sensitive to Wardian's habitat/config projection. OpenCode receives Wardian-generated OPENCODE_CONFIG_DIR / OPENCODE_CONFIG, a habitat cwd, a projected workspace target, and on resumed sessions --session <session-id>; Codex receives a habitat CODEX_HOME. Plain outside-terminal provider launches remain useful user-visible evidence, but they are not launch-equivalent to Wardian unless the capture explicitly supplies the same Wardian home/session context.
Fresh outside same-session captures for the 2026-05-11T11-38-32-112Z Wardian run prove the outside terminal can be placed at the same measured geometry for all providers: e2e/screenshots/outside-provider-rendering/2026-05-11T11-40-15Z/codex, 2026-05-11T11-40-38Z/claude, 2026-05-11T11-41-00Z/gemini, and 2026-05-11T11-41-22Z/opencode each report RawUI 50x19, ANSI CSI 18 t -> 50x19, and ANSI CSI 14 t -> 500x380. The same-session captures narrow the remaining differences to provider state and renderer behavior: Codex still rotates the default prompt suggestion, Claude still rotates prompt suggestions, Gemini exposes different scrollback/notice visibility, and OpenCode is the closest visual match once launched with Wardian's habitat session and config.
Fixed-input captures show which prompt-suggestion differences are controllable. Wardian Codex e2e/screenshots/real-provider-rendering/2026-05-11T11-44-57-757Z/codex/resized-card.png and outside Codex e2e/screenshots/outside-provider-rendering/2026-05-11T11-45-34Z/codex/resized.png both show the same Codex tip and the fixed input render parity check; Wardian Claude e2e/screenshots/real-provider-rendering/2026-05-11T11-46-15-038Z/claude/resized-card.png and outside Claude e2e/screenshots/outside-provider-rendering/2026-05-11T11-49-44Z/claude/resized.png likewise align on the prompt row. Wardian OpenCode e2e/screenshots/real-provider-rendering/2026-05-11T11-48-59-525Z/opencode/resized-card.png and outside OpenCode e2e/screenshots/outside-provider-rendering/2026-05-11T11-50-37Z/opencode/resized.png align on the same fixed input and visible assistant frame. These captures leave renderer/chrome differences as the primary visible delta for Codex, Claude, and OpenCode.
Gemini's fixed-input provider-state mismatch was traced to Wardian's fresh spawn path, not terminal geometry. Before the fix, Wardian called obtain_session_id, which ran Gemini headlessly with Introduce yourself, then launched the visible PTY with --resume; the visible terminal inherited that bootstrap conversation in scrollback. Gemini now uses a Wardian-generated session ID for fresh spawns and lets the visible PTY's init event populate resume_session, so fresh interactive rendering no longer preloads the bootstrap transcript. The confirming Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-01-02-691Z/gemini/scrolled-top.json records viewportY: 0, 50x19, 500x380, and top-of-buffer text beginning at the Gemini logo/version and deprecated system-configuration warning instead of Introduce yourself. The matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T12-04-14Z/gemini records RawUI 50x19, ANSI CSI 18 t -> 50x19, and ANSI CSI 14 t -> 500x380.
The real-provider Wardian audit now waits for provider-specific input readiness before injecting the fixed prompt, then waits until the fixed prompt is visible in xterm's parser rows before taking screenshots. This closed a Gemini capture artifact where the audit sent render parity check before Gemini's input widget was ready, causing Wardian to show the placeholder while Windows Terminal showed the typed prompt. The all-provider Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-18-07-758Z records 50x19 and visible fixed input for Codex, Claude, Gemini, and OpenCode. The matching outside runs e2e/screenshots/outside-provider-rendering/2026-05-11T12-19-25Z/codex, 2026-05-11T12-19-50Z/claude, 2026-05-11T12-20-14Z/gemini, and 2026-05-11T12-20-38Z/opencode all record RawUI 50x19, ANSI CSI 18 t -> 50x19, and ANSI CSI 14 t -> 500x380.
In the fixed-input evidence from 2026-05-11T12-18-07-758Z, Claude, Gemini, and OpenCode match at the terminal text-content level under the audited state, but exact rendered appearance still differs because Wardian uses WebView/xterm rendering and app card chrome while Windows Terminal uses native chrome, font rasterization, cursor rendering, scrollbar rendering, and glyph handling. Gemini is a concrete example: both sides show the extension notice, update box, and render parity check, but the information glyph, cursor, row placement inside the native window, and scrollbar/chrome treatment differ. Codex did not match at text-content level in that run because provider-owned rotating startup tips differed between Wardian and outside launches; the Wardian run 2026-05-11T12-18-07-758Z/codex/resized-card.png shows the /fast tip while the outside run 2026-05-11T12-19-25Z/codex/resized.png shows the Codex App tip.
Codex tip rotation is now controlled in the audit path rather than treated as terminal rendering evidence. The native Wardian audit passes custom_args = "-c tui.show_tooltips=false" for Codex, and the outside Windows Terminal harness passes the same Codex config. The confirming Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-34-29-300Z/codex/resized-card.png records 50x19, 500x380, and no startup tip in parser rows; the matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T12-38-21Z/codex/resized.png records RawUI 50x19, ANSI CSI 18 t -> 50x19, ANSI CSI 14 t -> 500x380, the same fixed input, and the same visible Codex startup frame without a rotating tip. This removes the Codex text-content mismatch for the audited state. It does not remove the remaining renderer/chrome deltas between WebView/xterm and Windows Terminal.
Codex's helper-alias warning is also provider-owned and can be introduced by the audit harness itself. The 2026-05-11T12-42-31-820Z Wardian run used WARDIAN_HOME under the OS temp directory. Codex's release startup path refuses to create helper aliases when CODEX_HOME is under that temp root, so the matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T12-44-29Z/codex/resized.png showed a Refusing to create helper binaries under temporary dir warning that did not appear in Wardian's parser rows. The real-provider audit now uses a repo-local ignored target/wardian-e2e-real-provider-home by default. The fresh Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T12-51-34-329Z and outside captures e2e/screenshots/outside-provider-rendering/2026-05-11T12-52-57Z/codex, 2026-05-11T12-53-22Z/claude, 2026-05-11T12-53-46Z/gemini, and 2026-05-11T12-54-10Z/opencode all record 50x19 character geometry and 500x380 pixel text areas. Under the resized and scrolled-top audited states, Codex, Gemini, and OpenCode now match at the terminal text-content level.
Claude's apparent medium · /effort text-content mismatch was traced to capture timing, not Wardian inventing text or launching Claude with a different session context. Trace-enabled Wardian evidence in e2e/screenshots/real-provider-rendering/2026-05-11T13-26-19-563Z and target/wardian-e2e-real-provider-home/debug/terminal-traces/921360a7-16c1-48f5-9a7a-4561658f9231.log proves the row is provider-emitted on the PTY output stream after Claude receives Wardian's CSI 1 ; 1 R cursor-position response. Minified Claude 2.1.138 code shows this effort row is registered as a high-priority effort-level notification with timeoutMs: 10000. The outside captures that omitted it waited 12 seconds before typing/capturing, so they landed after Claude removed the notification. The short-wait rerun e2e/screenshots/outside-provider-rendering/2026-05-11T13-50-55Z/claude used the same session, settings file, --add-dir roots, 50x19 character geometry, 500x380 pixel text area, and parseable outside CSI 1 ; 1 R; a pixel diff against the long-wait capture isolates the new text to the footer rows where the effort notification renders. The outside harness now records initial_wait_seconds in terminal-size.json and manifest.json so transient provider rows can be compared against captures taken in the same startup window.
Claude follow-up captures still matter for launch-context parity: the outside harness mirrors Wardian hook/env setup without the earlier Invalid JSON provided to --settings artifact (e2e/screenshots/outside-provider-rendering/2026-05-11T13-06-42Z/claude), clears Windows Terminal identity env (2026-05-11T13-08-22Z/claude), types rather than pastes the audit input (2026-05-11T13-16-08Z/claude), and includes Wardian's common/per-agent --add-dir roots (2026-05-11T13-42-17Z/claude). A raw JSON --settings outside launch was tested and rejected because PowerShell/Windows Terminal produced Invalid JSON provided to --settings (e2e/screenshots/outside-provider-rendering/2026-05-11T13-44-39Z/claude); valid outside captures must keep using a settings file even though Wardian can pass raw JSON directly through CommandBuilder. Two targeted Wardian DSR experiments were rejected and reverted: dropping Claude's DSR reply entirely and translating it to the observed outside ESC[[C response both stalled Claude after its initial CSI 6 n query (df2e3f3f-33b1-4a9a-9ed7-650e2bf87abc.log and 0c5e78b9-04c3-4b36-a254-86d505aedcbe evidence).
The corrected OpenCode same-session outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T10-31-30Z/opencode reaches the same assistant response/input-panel state and matching 50-column text wrapping as Wardian's e2e/screenshots/real-provider-rendering/2026-05-11T10-25-48-165Z/opencode/resized-card.png. Wardian's parser state for that capture has baseY: 0, viewportY: 0, and rows: 19, so the text is not hidden in scrollback. The remaining OpenCode delta is visual renderer behavior: Windows Terminal uses native chrome, font/glyph/cursor/scrollbar metrics, while Wardian uses the app xterm/WebView card styling.
The corrected Claude outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T10-54-07Z/claude resolves claude.exe, launches with the same stream-json session id and name used by Wardian, and reaches the same named session title line as Wardian's e2e/screenshots/real-provider-rendering/2026-05-11T10-47-25-455Z/claude/resized-card.png. The remaining Claude differences are provider-owned prompt suggestions and renderer metrics: Wardian showed Try "fix typecheck errors" while Windows Terminal showed Try "how does <filepath> work?", and Windows Terminal still renders with native cell/chrome metrics while Wardian renders through xterm/WebView.
The corrected Gemini outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T11-06-33Z/gemini resolves gemini.cmd, records CSI 18 t -> 50x19, and keeps the Windows Terminal tab title at WardianOutside-gemini-... despite child OSC title writes. Gemini still exposes provider/environment state differences: the outside capture visibly shows the extensions update notice, while Wardian may hide the same notice below or above the current xterm viewport depending on scroll position and redraw timing.
The Wardian real-provider audit can inject a fixed unsubmitted input string before capture. Outside Windows Terminal captures now type simple ASCII input through keystrokes and fall back to clipboard paste for more complex strings, but the outside capture manifest's input_text remains only evidence of the requested capture condition; the screenshot or parser rows must still show whether the TUI accepted the text.
Wardian pause now keeps the selected agent card visible so the paused terminal state is captured as user-visible evidence, and the real-provider audit requires a card screenshot for every captured state. Earlier outside captures used Ctrl+C as the closest interruption analogue, but that was not an exact lifecycle match: Wardian e2e/screenshots/real-provider-rendering/2026-05-11T12-51-34-329Z/codex/paused-card.png preserved the injected input "render parity check", while outside e2e/screenshots/outside-provider-rendering/2026-05-11T12-52-57Z/codex/interrupted.png changed the prompt row to Codex's post-interrupt suggestion text. The outside harness now captures paused.png before sending Ctrl+C, so preserved-buffer pause parity can be evaluated separately from provider-owned interruption behavior. The focused option test verifies the new initial, resized, scrolled-top, paused, and interrupted manifest sequence, and the valid outside Claude run e2e/screenshots/outside-provider-rendering/2026-05-11T14-03-31Z/claude records the paused/interrupted split with matched 50x19 / 500x380 geometry and a visible buffer that is unchanged from scrolled-top to paused. The Codex and OpenCode reruns against the older 2026-05-11T12-51-34-329Z Wardian home are invalid because that home no longer contains their projected provider habitats, so they are retained only as negative evidence for stale-session capture risk.
The fresh all-provider paused evidence is e2e/screenshots/real-provider-rendering/2026-05-11T14-12-45-718Z plus outside captures e2e/screenshots/outside-provider-rendering/2026-05-11T14-15-13Z/codex, 2026-05-11T14-15-30Z/claude, 2026-05-11T14-15-46Z/gemini, and 2026-05-11T14-16-02Z/opencode. These outside captures use the fresh Wardian session ids and repo-local target/wardian-e2e-real-provider-home, and their RawUI probes report 35x19, matching Wardian's post-resize paused parser geometry. Pixel comparisons show scrolled-top to paused is unchanged except cursor/scrollbar slivers or provider redraw noise, while paused to interrupted is a separate provider interruption artifact. They also reveal a Windows Terminal limitation at this narrow resize target: the ANSI probe reports the physical terminal area rather than the forced RawUI size (103x32 under the forced 980x680 pixel window, and 48x19 when launched without a forced pixel size) and returns ESC[[C rather than a parseable CSI row ; col R cursor-position response. Claude therefore does not show the same footer medium · /effort row at the 35x19 / 48x19 outside sizes, while the earlier 50x19 short-wait capture does. The fresh run also exposed and fixed an audit-harness instability: after one provider's resized capture, the next provider could spawn at the narrower 980x680 window and wrap Gemini's ready prompt across rows. The audit now resets the window before each provider spawn and normalizes whitespace when waiting for provider readiness text, so row wrapping does not masquerade as a missing prompt.
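The whitespace normalization used when waiting for provider readiness text can be sketched roughly as follows. This is an illustration only: the helper names and the sample readiness string are hypothetical, not taken from the audit harness, and the sketch assumes wrapping happens at whitespace rather than mid-word.

```typescript
// Hypothetical helper: collapse every whitespace run (spaces, tabs, newlines)
// into a single space so wrapped and unwrapped text compare equal.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical helper: report whether the readiness text is visible in the
// given terminal rows, even when a narrow window wrapped it across rows.
function screenContainsReadyText(rows: string[], readyText: string): boolean {
  // Joining with a space turns a row break into ordinary whitespace.
  const haystack = normalizeWhitespace(rows.join(" "));
  return haystack.includes(normalizeWhitespace(readyText));
}

// Example: a made-up ready prompt wrapped across two padded rows.
const wrappedRows = ["  Type your   ", "message here    "];
const ready = screenContainsReadyText(wrappedRows, "Type your message");
```

A prompt that is broken in the middle of a word would still fail this check, so a real harness might additionally compare with whitespace removed entirely.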
The stable all-provider comparison surface is e2e/screenshots/real-provider-rendering/2026-05-11T18-38-33-479Z plus outside captures e2e/screenshots/outside-provider-rendering/2026-05-11T18-40-41Z/codex, 2026-05-11T18-41-08Z/claude, 2026-05-11T18-41-33Z/gemini, and 2026-05-11T18-41-58Z/opencode. This run keeps Wardian at 50x19 / 500x380 for initial, resized, scrolled-top, and paused states by resizing the app height without changing the terminal text geometry, waits 12000 ms after injecting the input "render parity check", and uses the repo-local target/wardian-e2e-real-provider-home-50-stable-all home. The outside captures use the fresh Wardian session ids, font_zoom_steps: 3, initial_wait_seconds: 12, and report RawUI 50x19, ANSI CSI 18 t -> 50x19, ANSI CSI 14 t -> 500x380, and ANSI CSI 6 n -> CSI 1 ; 1 R for every provider. Under that stable 50x19 surface, Codex, Claude, Gemini, and OpenCode show the same visible provider text in the resized state; renderer and chrome differences remain expected because Wardian uses WebView/xterm inside an agent card while the outside captures use native Windows Terminal chrome and rasterization. The outside pixel comparisons also show scrolled-top to paused is unchanged except cursor/scrollbar slivers or provider redraw noise, while paused to interrupted remains a separate Ctrl+C provider artifact.
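For readers unfamiliar with the probe replies quoted above, the following sketch parses the standard xterm-style responses: CSI 18 t is answered with CSI 8 ; rows ; cols t, CSI 14 t with CSI 4 ; height ; width t, and CSI 6 n with CSI row ; col R. The function and type names are hypothetical and are not the harness's own code.

```typescript
type CharSize = { rows: number; cols: number };
type PixelSize = { height: number; width: number };
type CursorPos = { row: number; col: number };

// Reply to CSI 18 t: ESC [ 8 ; rows ; cols t
function parseTextAreaChars(reply: string): CharSize | null {
  const m = /^\x1b\[8;(\d+);(\d+)t$/.exec(reply);
  return m ? { rows: Number(m[1]), cols: Number(m[2]) } : null;
}

// Reply to CSI 14 t: ESC [ 4 ; height ; width t
function parseTextAreaPixels(reply: string): PixelSize | null {
  const m = /^\x1b\[4;(\d+);(\d+)t$/.exec(reply);
  return m ? { height: Number(m[1]), width: Number(m[2]) } : null;
}

// Reply to CSI 6 n: ESC [ row ; col R
function parseCursorPosition(reply: string): CursorPos | null {
  const m = /^\x1b\[(\d+);(\d+)R$/.exec(reply);
  return m ? { row: Number(m[1]), col: Number(m[2]) } : null;
}

// Format sizes the way this record writes them: columns first, then rows.
function formatChars(size: CharSize): string {
  return `${size.cols}x${size.rows}`;
}

const chars = parseTextAreaChars("\x1b[8;19;50t");
const pixels = parseTextAreaPixels("\x1b[4;380;500t");
const cursor = parseCursorPosition("\x1b[1;1R");
// An unparseable reply such as ESC[[C yields null instead of a position.
const broken = parseCursorPosition("\x1b[[C");
```

Returning null for a malformed reply lets a verifier count an unparseable cursor-position response as a failed check rather than silently accepting it.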
auditRenderingEvidence in e2e-native/lib/rendering-audit.mjs now verifies this stable all-provider evidence chain mechanically. The verifier reads the Wardian manifest, each outside manifest, screenshot paths, PowerShell RawUI probes, ANSI CSI 14 t / CSI 18 t / CSI 6 n probes, session ids, Wardian home paths, audit input text, and Wardian parser rows. Against the stable evidence above it reported 221 checks and 0 failures: all four providers have matching session ids, matching 50x19 character geometry, matching 500x380 text-area geometry, parseable outside cursor-position replies, required initial/resized/scrolled-top/paused/interrupted screenshots, and Wardian paused parser rows identical to the scrolled-top parser rows. The verifier can require outside text snapshots and can compare selected outside copied-text states against Wardian parser rows after trimming trailing cell padding and leading/trailing blank rows. This verifier intentionally does not OCR Windows Terminal screenshots, so screenshots remain the visual evidence for chrome, cursor, scrollbar, and rasterization differences.
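The row normalization described above (trimming trailing cell padding, then dropping leading and trailing blank rows) might look like this minimal sketch. The function names are hypothetical and the real auditRenderingEvidence implementation may differ.

```typescript
// Hypothetical helper: make two row lists comparable by removing padding
// that terminals add but that carries no provider text.
function normalizeRows(rows: string[]): string[] {
  // Strip trailing whitespace that pads each row out to the column count.
  const trimmed = rows.map((row) => row.replace(/\s+$/, ""));
  let start = 0;
  let end = trimmed.length;
  // Drop blank rows at the top and bottom; interior blank rows are kept.
  while (start < end && trimmed[start] === "") start++;
  while (end > start && trimmed[end - 1] === "") end--;
  return trimmed.slice(start, end);
}

// Hypothetical helper: exact row-by-row comparison after normalization.
function rowsMatch(outside: string[], parser: string[]): boolean {
  const a = normalizeRows(outside);
  const b = normalizeRows(parser);
  return a.length === b.length && a.every((row, i) => row === b[i]);
}

const outsideRows = ["", "hello   ", "", "world", "   ", ""];
const parserRows = ["hello", "", "world"];
const same = rowsMatch(outsideRows, parserRows);
```

Interior blank rows are deliberately preserved, because they are part of the visible layout being compared.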
Text-snapshot reruns exposed and then closed the Claude scrolled-top text mismatch. Fresh Codex outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T18-53-51Z/codex copied resized/scrolled-top/paused text that matches Wardian parser rows exactly after normalizing leading/trailing blank rows. Fresh Claude captures first showed Claude Code v2.1.138 plus current: 2.1.138 · latest: 2.1.139 outside while Wardian was already rendering v2.1.139; rerunning after the Claude self-update completed made resized text match, but Wardian 2026-05-11T18-59-08-927Z/claude/scrolled-top-card.png still exposed a stale top Claude Code box border through xterm scrollback. The root cause was a provider clear-by-newlines redraw sequence split across PTY chunks: per-chunk normalization missed it, so xterm treated the clear as literal newlines and promoted a transient Claude welcome row into scrollback. Wardian now normalizes fullscreen clear-by-newlines at the joined PTY batch boundary. The confirming Wardian run e2e/screenshots/real-provider-rendering/2026-05-11T19-26-18-946Z records Claude baseY: 0 for initial, resized, scrolled-top, and paused states. The matching outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T19-26-57Z/claude starts at the same 65x21 text geometry and resizes to the same 36x19 wrapping surface; copied text for initial, resized, scrolled-top, and paused matches Wardian parser rows exactly after trimming trailing cell padding and leading/trailing blank rows. The outside capture script also now widens the PowerShell RawUI buffer before widening WindowSize, avoiding the Window cannot be wider than the screen buffer artifact when launching wider initial captures.
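The chunk-boundary failure can be illustrated with a small sketch. It assumes, purely for illustration, that a clear-by-newlines redraw appears as a run of at least `rows` consecutive newlines and that it is rewritten to an erase-display plus cursor-home sequence; the pattern and replacement Wardian actually uses are not given here, and all names below are hypothetical.

```typescript
// Assumed replacement: erase display (ESC[2J) then cursor home (ESC[H).
const ERASE_AND_HOME = "\x1b[2J\x1b[H";

// Assumed detection rule: `rows` or more consecutive newlines clear a screen
// that is `rows` tall. `rows` must be a positive integer.
function normalizeClearByNewlines(data: string, rows: number): string {
  const run = new RegExp(`(?:\\r?\\n){${rows},}`, "g");
  return data.replace(run, ERASE_AND_HOME);
}

// Per-chunk normalization: a run split across two chunks is never long
// enough inside either chunk, so it passes through as literal newlines.
function normalizePerChunk(chunks: string[], rows: number): string {
  return chunks.map((chunk) => normalizeClearByNewlines(chunk, rows)).join("");
}

// Batch normalization: join the chunks first, then normalize once.
function normalizeBatch(chunks: string[], rows: number): string {
  return normalizeClearByNewlines(chunks.join(""), rows);
}

// A four-newline clear split evenly across two PTY chunks on a 4-row screen.
const splitChunks = ["welcome\n\n", "\n\nprompt"];
const perChunk = normalizePerChunk(splitChunks, 4);
const batched = normalizeBatch(splitChunks, 4);
```

In this sketch the per-chunk path leaves the four newlines intact, which is the behavior that would push a transient row into scrollback, while the batched path replaces them. A run that straddles two batches would still be missed and would need carry-over state.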
A fresh all-provider stable run after the batch-clear fix is e2e/screenshots/real-provider-rendering/2026-05-11T19-34-01-431Z, using absolute repo-local WARDIAN_HOME target/wardian-e2e-real-provider-home-stable-all-text-abs. A prior run with relative WARDIAN_E2E_REAL_RENDERING_HOME failed OpenCode before rendering because its projected habitat workspace path was resolved under the agent habitat; all real-provider rendering runs that include OpenCode should use an absolute test home. Matching outside captures with the correct home are e2e/screenshots/outside-provider-rendering/2026-05-11T19-38-45Z/codex, 2026-05-11T19-39-20Z/claude, 2026-05-11T19-39-55Z/gemini, and 2026-05-11T19-40-29Z/opencode. Codex and Claude copied text matches Wardian parser rows for initial/resized/scrolled-top/paused under this stable 50x19 surface. Gemini copied text includes whole-buffer scrollback and soft-wrapped logical rows, so auditRenderingEvidence now has an explicit visual-row comparison path: it wraps outside copied text to the audited column count and anchors to the bottom viewport for initial/resized captures or the top viewport for scrolled-top/paused captures. Against 2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-39-55Z/gemini, this visual-row verifier reports 70 checks and 0 failures for initial, resized, scrolled-top, and paused states. A mouse-drag visible-selection experiment was rejected because Windows Terminal selection copied stale or incorrect text and disturbed provider state.
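The visual-row comparison path can be sketched as below: copied logical rows are wrapped to the audited column count, then a viewport-sized window is taken from the bottom or the top depending on the captured state. The names are hypothetical, and the sketch assumes single-width characters and hard wrapping at the column limit.

```typescript
type CaptureState = "initial" | "resized" | "scrolled-top" | "paused";
type Anchor = "top" | "bottom";

// initial/resized captures show the bottom of the buffer;
// scrolled-top/paused captures show the top.
function anchorForState(state: CaptureState): Anchor {
  return state === "initial" || state === "resized" ? "bottom" : "top";
}

// Split each logical row into visual rows of at most `cols` characters.
// `cols` must be a positive integer.
function wrapToColumns(logicalRows: string[], cols: number): string[] {
  const visual: string[] = [];
  for (const row of logicalRows) {
    if (row.length === 0) {
      visual.push(""); // A blank logical row still occupies one visual row.
      continue;
    }
    for (let i = 0; i < row.length; i += cols) {
      visual.push(row.slice(i, i + cols));
    }
  }
  return visual;
}

// Take the `rows` visual rows that the anchored viewport would display.
function viewportRows(
  logicalRows: string[],
  cols: number,
  rows: number,
  anchor: Anchor,
): string[] {
  const visual = wrapToColumns(logicalRows, cols);
  return anchor === "top"
    ? visual.slice(0, rows)
    : visual.slice(Math.max(0, visual.length - rows));
}

const copied = ["abcdefgh", "", "xy"];
const wrapped = wrapToColumns(copied, 3);
const bottomView = viewportRows(copied, 3, 2, anchorForState("resized"));
const topView = viewportRows(copied, 3, 2, anchorForState("paused"));
```

Wide glyphs and provider-specific word wrapping would need extra handling in a real verifier; this sketch only shows the wrap-then-anchor idea.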
OpenCode's remaining text mismatch was provider lifecycle nondeterminism, not xterm rendering. Wardian previously obtained an OpenCode session id through a headless bootstrap path, while the outside capture launched or reattached through a different OpenCode lifecycle and exposed bootstrap/random home rows in scrollback. OpenCode now uses a Wardian-generated id for fresh visible PTY startup, like Gemini, and the outside capture only passes --session when the captured id is a real ses_... OpenCode id. The audit path also seeds an isolated XDG_STATE_HOME under the run's WARDIAN_HOME with tips_hidden: true, suppressing OpenCode's provider-owned random home tip without changing the user's global OpenCode state. The confirming Wardian run is e2e/screenshots/real-provider-rendering/2026-05-11T20-15-17-992Z with outside capture e2e/screenshots/outside-provider-rendering/2026-05-11T20-16-08Z/opencode; exact copied-text comparison for initial, resized, scrolled-top, and paused reports 70 checks and 0 failures.
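The rule that the outside capture only passes --session for a real OpenCode id could be expressed along these lines. The helper name is hypothetical, and the check only tests the ses_ prefix because the rest of the id format is not specified here.

```typescript
// Hypothetical helper: build the session-related CLI arguments for an
// outside OpenCode capture. Wardian-generated ids that are not real
// OpenCode ids produce no --session argument.
function openCodeSessionArgs(capturedId: string | undefined): string[] {
  const isRealOpenCodeId =
    capturedId !== undefined && /^ses_.+/.test(capturedId);
  return isRealOpenCodeId ? ["--session", capturedId] : [];
}

// Example ids are made up for illustration.
const withRealId = openCodeSessionArgs("ses_example123");
const withGeneratedId = openCodeSessionArgs("wardian-generated-id");
const withNoId = openCodeSessionArgs(undefined);
```

Omitting the flag for non-matching ids lets OpenCode start a fresh visible session instead of attempting to reattach to an id it never issued.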
Provider text parity is now mechanically verified across the stable evidence set. Codex uses Wardian 2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-38-45Z/codex, Claude uses Wardian 2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-39-20Z/claude, Gemini uses Wardian 2026-05-11T19-34-01-431Z plus outside 2026-05-11T19-39-55Z/gemini with visual-row normalization, and OpenCode uses Wardian 2026-05-11T20-15-17-992Z plus outside 2026-05-11T20-16-08Z/opencode. Each provider reports 70 audit checks and 0 failures over initial, resized, scrolled-top, and paused states, including matching session ids, 50x19 character geometry, 500x380 text-area geometry, required screenshots/text snapshots, matching audit input, and matching Wardian paused parser rows.
Wardian's native audit no longer treats .xterm-viewport.scrollTop = 0 as proof that a scrolled screenshot moved. With the WebGL renderer and Wardian's current xterm styling, DOM scrollHeight can equal clientHeight even while xterm's parser has scrollback. The debug-only audit hook now calls xterm's own scroll-to-top API on the parser and renderer and waits for viewportY: 0 before capturing scrolled-top. Fresh Gemini evidence in e2e/screenshots/real-provider-rendering/2026-05-11T10-57-03-613Z/gemini/scrolled-top.json records viewportY: 0 and the paired card screenshot visibly shows the top-of-buffer frame rather than repeating the resized bottom viewport.
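A minimal sketch of checking parser state instead of DOM scroll state follows. It uses only a structural subset of xterm.js's Terminal (scrollToTop and buffer.active.viewportY / baseY); the wrapper names and the fake terminal are hypothetical, and the sketch omits the wait for rendering that the real debug hook performs.

```typescript
// Structural subset of xterm.js's Terminal that this sketch relies on.
interface ScrollableTerminal {
  scrollToTop(): void;
  buffer: { active: { viewportY: number; baseY: number } };
}

interface DomViewportMetrics {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

// DOM scroll state only means something when the element can scroll at all.
function domScrollIsMeaningful(dom: DomViewportMetrics): boolean {
  return dom.scrollHeight > dom.clientHeight;
}

// Scroll through xterm's own API and report the parser's view of the result.
function scrollParserToTop(term: ScrollableTerminal): {
  hadScrollback: boolean;
  atTop: boolean;
} {
  const hadScrollback = term.buffer.active.baseY > 0;
  term.scrollToTop();
  return { hadScrollback, atTop: term.buffer.active.viewportY === 0 };
}

// Fake terminal standing in for a real xterm.js instance with 5 rows of scrollback.
const fakeTerminal: ScrollableTerminal = {
  buffer: { active: { viewportY: 5, baseY: 5 } },
  scrollToTop() {
    this.buffer.active.viewportY = 0;
  },
};

// DOM metrics of the kind described above: scrollTop is 0 but nothing can scroll.
const misleadingDom: DomViewportMetrics = {
  scrollTop: 0,
  scrollHeight: 380,
  clientHeight: 380,
};

const domUsable = domScrollIsMeaningful(misleadingDom);
const scrollResult = scrollParserToTop(fakeTerminal);
```

Here the DOM metrics alone would suggest the viewport is already at the top, while the parser reports scrollback that only the xterm API call actually traverses.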
The deterministic Wardian rendering audit now uses the same debug scroll API and compares Wardian parser rows against a headless xterm rendered at the same viewportY. This prevents a passing scroll assertion from relying on DOM scroll state that may not correspond to the rendered terminal viewport.

Consequences

Positive: scrolling and rendering quality should improve across all providers, not just OpenCode
Positive: terminal handling becomes less provider-specific and easier to maintain
Positive: remount behavior becomes deterministic and resilient
Positive: OpenCode theme/capability debugging can move onto a cleaner terminal foundation
Negative: the terminal stack becomes more structured and therefore more code moves through shared abstractions
Negative: parser/view divergence must be prevented by keeping resize and capability handling synchronized
Negative: full app-restart persistence remains intentionally out of scope for this pass
