How ConstructiVision defends every code change through simulated user testing, multi-VM golden baseline comparison, and an untouchable oracle.
AI-assisted development is genuinely transformative. In a single session, an AI engineering agent can SSH into multiple virtual machines, edit dozens of source files, write registry keys, commit code, and update documentation. Work that would take a human engineer days can happen in minutes.
ConstructiVision is a safety-adjacent construction tool. It generates lift-point calculations, weld connection specifications, and tilt-up panel geometry used on real job sites. An undetected regression doesn't produce a bad user review. The blast radius is measured in concrete and steel — potentially weeks after the software shipped, during a pour or a lift, with no ctrl-Z available.
The pre-committed failure mode analysis — RPN scores, S/O/D ratings, and full bug traceability — lives on the Risk Management & DFMEA page. When a product's top failure modes carry S=10 structural consequences, engineering-grade quality methods are not optional. The question here is narrower: how do you reliably detect those failures before any code leaves the build?
Conventional wisdom says: write lots of unit tests at the base of the pyramid, fewer integration tests in the middle, and a small number of end-to-end tests at the top. For most software, this is correct.
ConstructiVision runs as AutoLISP code inside AutoCAD 2000, a CAD application released in 1999. There is no test framework of any kind: no (assert), no deftest, no mock injection API — only .lsp source files.
The inability to write unit tests led directly to simulated user validation: automated scripts that drive a real copy of AutoCAD exactly the way a real user does, on real production drawings, and capture screenshots of every result. This is the highest-fidelity test type in the testing literature — and it runs on every build across a multi-OS VM matrix.
No single layer is trusted in isolation. The system requires all three to agree before a test is declared passing.
AutoIT scripts drive AutoCAD exactly as a real user would — opening production drawings, sending the exact progcont bitmask values that menu macros use, waiting for dialogs, and capturing screenshots. Not a mock. Not a stub. The real application, the real data, the real code path.
Tesseract OCR extracts text from every captured screenshot. This checks not just "did a dialog appear?" but "is the text in it correct?" — semantic validation of UI state. A dialog box for the wrong function looks structurally identical but reads differently.
OCR output is compared character-by-character against the golden baseline from VM 103 at ≥95% accuracy. A test passes only when the running build produces text that matches what the unmodified 2008 production environment produces.
Layers 1 and 2 were built. Then Bug 25 was discovered: the AutoIT validation log marks error dialogs as PASS — it checks "did a dialog appear?" not "is the dialog correct?" Layer 3 (OCR comparison) was added specifically to catch what Layer 1's log misses. This is Measurement System Analysis applied to a test harness — most projects never question whether their test framework is telling the truth.
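To make the distinction concrete, here is a minimal sketch of the two checks in Python (the production harness is AutoIT plus Tesseract; the function names and paths here are illustrative, but the ≥95% character-match rule is the one described above):

```python
# Minimal sketch of the Layer 1 vs. Layer 3 distinction.
# Hypothetical names/paths; the production harness is AutoIT + Tesseract.
import difflib
from pathlib import Path

PASS_THRESHOLD = 0.95  # character-level match against the golden baseline

def layer1_check(screenshot: Path) -> bool:
    """Layer 1 (what Bug 25 exposed): only asks 'did a dialog appear?'
    An error dialog satisfies this check just as well as the right one."""
    return screenshot.exists()

def layer3_check(ocr_text: str, golden_text: str) -> bool:
    """Layer 3: character-by-character comparison of OCR output against
    the VM 103 golden baseline. Tolerates OCR noise below the threshold;
    a wrong dialog or an error message reads very differently and fails."""
    ratio = difflib.SequenceMatcher(None, ocr_text, golden_text).ratio()
    return ratio >= PASS_THRESHOLD
```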
VM 103 is the original 2008 GSCI workstation. It runs v7.0 production exactly as it shipped. It has never been modified. It is physically isolated from all development activity. No AI agent has write access to it.
The golden OCR baselines come from VM 103. No matter what development changes, no matter how fast AI works, VM 103's state is the unchanging definition of correct. The comparison is between what AI-modified code produces and what the unmodified production environment produces. AI cannot rationalize its way around a comparison it cannot influence.
In .github/copilot-instructions.md, Critical Rule #10 makes the test suite untouchable: tests, fixtures, and golden baselines are never modified to make failing code pass — the test is the definition of correct, and fixes go in the code.
A human developer understands the spirit of a test. An AI optimizing toward "make the test pass" will take the path of least resistance — and the path of least resistance is editing the test, not fixing the code. Without an explicit prohibition, an AI could weaken an assertion, lower the OCR match threshold, or regenerate the golden baseline from its own broken output.
The product would be broken. The test suite would say it's fine. The golden baseline would be silently corrupted. Critical Rule #10 closes this escape hatch permanently.
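The documented enforcement is the written rule itself. Purely as an illustration, a pre-commit guard along these lines could back it mechanically (Python sketch; the protected paths are hypothetical, not the repository's actual layout):

```python
# Hypothetical pre-commit guard backing Critical Rule #10.
# Path prefixes are illustrative, not the repository's actual layout.
import subprocess
import sys

PROTECTED_PREFIXES = (
    "tests/",            # AutoIT scripts and validation harness
    "baselines/vm103/",  # golden OCR baselines from VM 103
)

def changed_files() -> list[str]:
    """List files staged for the current commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

violations = [f for f in changed_files()
              if f.startswith(PROTECTED_PREFIXES)]
if violations:
    print("Critical Rule #10: test/baseline files are read-only:")
    for f in violations:
        print(f"  {f}")
    sys.exit(1)  # non-zero exit aborts the commit
```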
Three bugs from the Feb–Mar 2026 test campaign illustrate why UI-level testing is not just convenient — it is the only layer at which these failures become visible:
Symptom: a "setvars Function cancelled" error on VM 103.
Cause: Missing Group1 / Pop13 entries in HKCU\...\Profiles\<<Unnamed Profile>>\Menus — an installation artifact invisible to any code-level analysis.
How it was found: AutoIT → OCR extracted the exact error text from the screenshot. A unit test of the AutoLISP source would never touch the Windows registry. This failure lives entirely in the deployment and OS integration layer.
Symptom: AutoCAD crash (0xC0000005) at LocalizeReservedPlotStyleStrings+533 on VM 104 during VLX load.
Cause: The compiled plugin loaded before the printer/plot subsystem finished initializing — a race condition between OS subsystems that no static analysis can detect.
How it was found: Screenshot captured by AutoIT showed the crash dialog. Crash dump at reports/ocr-output/vm104-feb28/acadstk.dmp preserved the full stack trace. A linter reads source code; it cannot observe process initialization order.
Symptom: "File not found" when opening CSBsite1.dwg on VM 104.
Cause: Missing Project Settings\CV\RefSearchPath registry key — a gap in the manual installation process that the automated installer would have covered.
How it was found: AutoIT ran the "Edit Existing Drawing" sequence. The resulting dialog was wrong. SSH reg query on VM 102 vs VM 104 identified the missing subkey. No code analysis can compare registry state across VMs.
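A sketch of that cross-VM comparison (Python; it assumes the SSH access described earlier, and the host aliases and key path are illustrative — the actual missing subkey was under Project Settings\CV\RefSearchPath):

```python
# Sketch: diff a registry subtree between two VMs over SSH.
# Host aliases and the key path are illustrative.
import subprocess

KEY = r"HKCU\Software\SimpleStruct\Project Settings"

def query_keys(host: str) -> set[str]:
    """Run 'reg query /s' on a VM via SSH and collect its subkey lines."""
    out = subprocess.run(
        ["ssh", host, "reg", "query", KEY, "/s"],
        capture_output=True, text=True, check=True,
    )
    return {line.strip() for line in out.stdout.splitlines()
            if line.strip().startswith("HKEY_")}

known_good = query_keys("vm102")
suspect = query_keys("vm104")
missing = known_good - suspect  # subkeys on VM 102 but absent on VM 104
for key in sorted(missing):
    print(f"missing on vm104: {key}")
```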
Registry configuration, startup timing, and cross-environment path differences are integration properties of the deployed system — not of the source code in isolation. Simulated user validation is the only test methodology that runs the complete system and observes what a user actually sees.
The same AutoIT script runs against four distinct platform configurations. The same progcont bitmask either routes to the correct dialog on all four, or the test isolates exactly which platform has the mismatch.
| VM | OS | Build | What It Validates |
|---|---|---|---|
| 102 | XP SP3 | V11 source | XP + NTFS junction + sparse checkout pipeline |
| 104 | XP SP3 | PB11 VLX mode | Known-good compiled build — the golden reference execution path |
| 108 | Win10 x32 | TB11 source-mode | Win10 registry profile, AutoCAD startup configuration |
| 109 | Win10 x64 | TB11 | Wow6432Node COM path + 64-bit registry redirection |
Zero extra test code for four configurations. The test is the same. The oracle is the same. The only variable is the platform — which is exactly what you want when validating a product that must run across Windows XP SP3 through Windows 10 x64.
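In sketch form (Python; the matrix rows are the table above, but run_validation is a hypothetical stand-in for dispatching the single au3 script to one VM):

```python
# Sketch: one test, four platforms. The matrix is data; the test never changes.
# run_validation() is hypothetical; the real driver is a single AutoIT script.
VM_MATRIX = [
    {"vm": "102", "os": "XP SP3",    "build": "V11 source"},
    {"vm": "104", "os": "XP SP3",    "build": "PB11 VLX mode"},
    {"vm": "108", "os": "Win10 x32", "build": "TB11 source-mode"},
    {"vm": "109", "os": "Win10 x64", "build": "TB11"},
]

def run_validation(vm: dict) -> bool:
    """Dispatch the same AutoIT script to one VM and compare its OCR
    output against the VM 103 golden baseline (details elided)."""
    ...

results = {cfg["vm"]: run_validation(cfg) for cfg in VM_MATRIX}
# A failure isolates the platform: the script and the oracle are constant,
# so the only variable that can differ is the VM configuration itself.
```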
86% of the bugs found in testing were predicted before coding began. Every bug filed against this build carries a mandatory DFMEA ID linking it back to a version-controlled risk entry with S/O/D ratings. The full traceability chain is at Risk Management & DFMEA →
Each validation run creates its own timestamped output directory.
Stale screenshots from a previous passing run cannot mask a current failure. This is a common failure mode in screenshot-based test suites — a test crashes before capturing any screenshots, the old screenshots are still on disk, the comparison passes against the previous run's images, and the breakage is invisible. The timestamped directory architecture prevents this at the infrastructure level, not the assertion level.
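A minimal sketch of the scheme, reusing the reports/ocr-output/ layout visible in the crash-dump path above (the naming function itself is hypothetical):

```python
# Sketch: every run gets a fresh timestamped directory, so stale screenshots
# from a previous passing run can never satisfy the current comparison.
from datetime import datetime
from pathlib import Path

def new_run_dir(vm: str, root: Path = Path("reports/ocr-output")) -> Path:
    """Create a unique output directory for this validation run."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = root / f"vm{vm}-{stamp}"
    run_dir.mkdir(parents=True, exist_ok=False)  # fail loudly on collision
    return run_dir

run_dir = new_run_dir("104")
# The comparison step reads ONLY from run_dir. If the test crashes before
# capturing screenshots, run_dir is empty and the comparison fails; it
# cannot silently fall back to a previous run's images.
```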
Each layer exists because a specific failure was found or predicted:
- Critical Rule #10: AI cannot modify test fixtures to make broken code pass. The test is the definition of correct; fixes go in the code.
- VM 103 oracle: The unmodified 2008 production environment defines what correct looks like. AI has no write access. Comparison cannot be gamed.
- AutoIT simulated user: Drives AutoCAD with exact production menu macros on real drawings. Catches deployment, registry, and runtime failures invisible to code analysis.
- Tesseract OCR: Reads what's actually on screen. Added after Bug 25 proved the AutoIT log marks error dialogs as PASS. Trust but verify — then add a second verifier.
- Golden baseline comparison: Compares against VM 103 OCR at character level. Tolerates OCR noise while catching wrong dialogs, missing content, and error messages.
DFMEA pre-commitment, RPN scores, 86% prediction coverage, and bidirectional bug traceability — on a dedicated page. Risk & DFMEA →
The Feb 26–28 sprint produced 3 bugs fixed, 5 VMs touched, and full documentation committed — in approximately 6 AI-assisted sessions. The validation rails didn't prevent that velocity. They're what made it responsible to attempt it on a product where the pre-committed DFMEA failure modes carry S=10 severity ratings.
ConstructiVision is not a monolithic program. It is 126 modules loaded dynamically at runtime, routing user intent through a bitmask dispatcher into dozens of independent dialog chains. Every global variable written by one module can be read — correctly or incorrectly — by any downstream module. This section maps those dependencies, traces the critical path from user input to panel book output, and defines which validations must run when any given module changes.
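For readers unfamiliar with the pattern, here is a minimal sketch of bitmask dispatch (Python for illustration only; the real dispatcher is AutoLISP keyed on the progcont value, and these route values are invented):

```python
# Sketch of the bitmask-dispatch pattern. The real dispatcher is AutoLISP,
# keyed on the progcont value each menu macro sets; routes here are invented.
ROUTES = {
    0b0001: "options_hub",   # default route: feature selection dialog
    0b0010: "detail_entry",  # direct data-entry route
    0b0100: "print_export",  # print routes
    0b1000: "utility_calc",  # standalone utility, outside panel workflow
}

def dispatch(route_key: int) -> str:
    """Entry Module dispatcher: the route-key written by the menu macro
    selects which dialog chain runs. An unmapped key is a routing bug."""
    try:
        return ROUTES[route_key]
    except KeyError:
        raise ValueError(f"unmapped route-key: {route_key:#06b}")
```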
The highlighted path below is the production workflow. Every node in the critical path (red) must pass its associated validation test before a build is considered releasable. Side paths (gray) have their own validation requirements defined in the serialization table below.
Each row names a module role, who calls it, what it calls, and the global state it reads vs. writes. A state value written by one module and read by the next is a dependency edge — the exact type of coupling that an AI code change can silently break.
| Module Role | Called By | Calls | State Read | State Written |
|---|---|---|---|---|
| Entry Module (startup entry point) | CAD startup suite; menu macro | environment init, all feature modules | route-key, drawing-type | working-dir, drawing-flag, drawing-type, project-name, all 58 named constants |
| Environment Init (CAD system variables) | Entry Module | CAD setvar × 40+ | install-dir, working-dir | 35+ CAD system variables (units, snap, echo, file dialogs, etc.) |
| Options Hub (feature selection dialog) | Entry Module (default route) | Detail Entry, Print/Export (via load-directive) | drawing-flag, working-dir, install-dir | selection-key, load-directive, drawing-flag, working-dir, completion-flag |
| Detail Entry (primary data input dialog) | Options Hub (via load-directive), Entry Module | State I/O, field validators, dialog utilities, Revision History, project helpers | master-state, project-name, working-dir, scale-factor, selection-key | completion-flag, drawing-name, attribute-vars (all panel fields via dynamic set) |
| State I/O (persist & load panel state) | Detail Entry, Entry Module | field validators, site dialog, project name helper | master-state, project-name, working-dir | master-state (authoritative alist — written on save, read by all downstream) |
| Geometry Engine (draw panel geometry) | Panel Inspector, Detail Entry | geometry helpers, corner mitering, scale resolver, connection placer | master-state, scale-factor, corner-points (4) | CAD drawing database entities (panel geometry) |
| Connection Module (⚠ RPN 360 — safety-critical) | Geometry Engine, Detail Entry (via selection routing) | connection placer, dialog utilities, viewport helper | master-state, connection vars, scale-factor | connection geometry entities, hardware schedule data (drives physical weld specs) |
| Feature Pages (panel book page generation) | Horizontal Frame Dialog, Vertical Frame Dialog | dialog enable/disable helpers, viewport helper, field validators | master-state, feature lists, corner-points (4) | page-completion-flag, completion-flag, feature symbol vars (dynamic set) |
| Print / Export (terminal output node) | Entry Module (print routes), Options Hub (via load-directive) | plotter discovery, style enumeration, paper size resolver, path finder, readiness check | batch config vars, working-dir, project-name | device list, paper list, style list, output-selection vars (device, paper, style, orientation) |
These are the 12 most-shared state values in the critical path: each one is a dependency edge where a write in one module must be correct before a downstream read will produce valid output. master-state is the authoritative state bundle — it flows through every dialog in the panel workflow.
| State Value | Written By (what it captures) | Read By (what it drives) |
|---|---|---|
| route-key | Menu macros — the routing bitmask set by each menu item; 13 distinct values | Entry Module dispatcher — determines which dialog chain runs |
| master-state | State I/O module (on drawing open + save); Detail Entry (on accept) | Detail Entry, Geometry Engine, Connection Module, Feature Pages, Print/Export — virtually every downstream module |
| selection-key | Options Hub — button pressed by user (encodes feature sub-type) | Options Hub dispatch, Detail Entry sub-dialog routing |
| load-directive | Options Hub — encodes target type (typical/select × panel/site) | Entry Module (determines branch: Detail Entry vs. site dialog chain) |
| drawing-flag | Entry Module (detected on file open), Options Hub (user override) | Options Hub (enables context-dependent buttons only if drawing exists) |
| working-dir | Entry Module (init), Options Hub (on project select) | Detail Entry, State I/O, Print/Export, Batch Queue — file path anchor for all I/O |
| attribute-vars | Detail Entry — dynamically set for all panel attribute fields | Geometry Engine, Panel Inspector, schedule page modules — geometry and schedule generation |
| completion-flag | Every dialog module — set on accept, cleared on cancel | Entry Module flow control — determines whether to proceed to next dialog or abort |
| scale-factor | Entry Module (read from drawing or project settings) | Geometry Engine, Detail Entry, Connection Module — all geometry scaling |
| drawing-type | Entry Module — 3-tier detection: named dictionary → layer names → nil | Entry Module dispatcher (routing predicate), Options Hub (button state) |
| page-state | Feature Pages module — page completion and zoom state | Horizontal/Vertical Frame dialogs — panel book feature page rendering |
| output-selection | Print/Export module — selected device, paper, style, orientation | Print/Export execution engine; Batch Queue — output configuration |
Not every code change requires a full regression run. This table defines the minimum required validation for each module group. FULL SUITE means all 11 screenshots × 4 VMs. CRITICAL PATH means default route + Detail Entry + Print paths only. TARGETED means the specific dialog sequence only.
| Module Group Changed | Min. Test Scope | Rationale |
|---|---|---|
| Startup chain (Entry Module, CAD startup hook, menu loader, Environment Init) | FULL SUITE | 99 modules depend on the initialization these files perform. A change here can silently affect any downstream function. All 4 VM configurations must pass: legacy OS and modern OS × 32-bit/64-bit have divergent registry paths and startup timing. |
| Options Hub (dialog + layout) | FULL SUITE | The feature routing hub. Routes to every downstream feature. All dialog buttons must produce the correct dialog chain on all platforms. A silent routing regression here affects every feature. |
| Detail Entry dialog + State I/O module | CRITICAL PATH + sub-dialogs | Primary data entry. Changes here can corrupt master-state — the authoritative state bundle read by every downstream module. Must run Detail Entry + connection + feature pages + print in sequence. |
| Connection Module (dialog + geometry + hardware spec) | FULL SUITE RPN 360 | Safety-critical. DFMEA S=10: wrong hardware specification → structural failure during panel lift. Any change requires full four-platform validation. OCR must confirm correct hardware values in all screenshot captures. |
| Geometry Engine + Panel Inspector + layout helpers | CRITICAL PATH | Panel geometry generation. Changes here affect panel book dimensions and lift-point data. Must run full panel draw sequence and verify panel book page output via OCR. |
| Feature Pages module + horizontal/vertical frame dialogs | TARGETED: feature pages | Panel book feature page iteration. Run the feature page sequence for both frame types. Verify page-state transitions via OCR screenshot comparison. |
| Print / Export module + output dialog | CRITICAL PATH: print sequences | Terminal output node. Must run Print All, Print Select, and panel book export paths on all platforms. The module enumerates OS-installed output devices — behavior differs across legacy OS / modern OS and 32-bit / 64-bit. Verify device selection UI text via OCR. |
| Shared dialog utilities (field update, validation, enable/disable) | CRITICAL PATH | Shared utilities called by Detail Entry, Feature Pages, Connection Module. Changes ripple to all dialogs that use them. Run Detail Entry dialog + first feature page as minimum coverage. |
| Materials List dialog | TARGETED: Route G | Isolated routing branch — changes affect only the materials workflow. Verify dialog content via OCR. |
| Revision History dialog | TARGETED: Route H | Isolated routing branch. Verify dialog appears and accepts input. OCR-verify form field labels against golden baseline. |
| Utility Calculator | TARGETED: Route A | Separate standalone utility, not in the panel workflow. No master-state dependency. Verify via targeted route launch. |
| Batch Queue module | TARGETED: Route E | Batch print/process queue. Shares output-selection state with Print/Export — verify no variable collision if both modules changed in same commit. |
These numbers illustrate why simulated user validation is the only tractable approach: 13 route-key values × dozens of dialog chains × 4 platform configurations is a combinatorial space too large for any exhaustive manual test plan, but the au3 script collapses it to a fixed, automated, reproducible set of execution paths (11 screenshots × 4 VMs per full run).
A change to the State I/O module (which writes the master-state bundle) demands a full critical-path run because every downstream module reads master-state. A change to an isolated utility route demands only a targeted run because that branch touches no shared state. The call graph and state map above are the mechanistic justification for each row in the serialization table — not convention, not intuition, but the actual dependency structure of the code.
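Stated as data, that rule looks like this (a Python sketch; the write/read maps abbreviate the tables above and omit most modules):

```python
# Sketch: derive minimum test scope from the state dependency map.
# The maps below abbreviate the document's tables; most modules omitted.
WRITES = {
    "state_io":     {"master-state"},
    "options_hub":  {"selection-key", "load-directive", "working-dir"},
    "utility_calc": set(),  # isolated Route A: writes no shared state
}
READS = {
    "detail_entry":    {"master-state", "working-dir", "selection-key"},
    "geometry_engine": {"master-state", "scale-factor"},
    "connection":      {"master-state", "scale-factor"},
    "print_export":    {"working-dir"},
}

def impacted_modules(changed: str) -> set[str]:
    """Every module that reads any state value the changed module writes
    inherits the validation requirement (the serialization table's logic)."""
    written = WRITES.get(changed, set())
    return {m for m, reads in READS.items() if reads & written}

print(impacted_modules("state_io"))
# -> {'detail_entry', 'geometry_engine', 'connection'}: a master-state
#    writer implicates the critical path; utility_calc implicates nothing.
```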
This is what DFSS means when it says quality is engineered in, not inspected in. The test plan was derived from the architecture. Any AI change that modifies a shared state value automatically inherits the validation requirements of every module that reads it.
This validation system is built on Design for Six Sigma (DFSS) — specifically the CDOV process (Concept → Design → Optimize → Verify) from Creveling, Slutsky & Antis (2003). DFSS treats quality as a measurable engineering property: Y = f(X), where Y is "the product generates correct output on a real job site" and X is the set of measurable parameters (registry state, dialog content, startup timing, OS version). Every AutoIT test measures Y. Every OCR comparison validates whether Y matches the golden baseline. The DFMEA maps which X values carry S ≥ 9 consequences — see Risk Management & DFMEA for the full risk analysis.