Quality & Validation

How ConstructiVision defends every code change through simulated user testing, multi-VM golden baseline comparison, and an untouchable oracle.

The Problem With Moving Fast

AI-assisted development is genuinely transformative. In a single session, an AI engineering agent can SSH into multiple virtual machines, edit dozens of source files, write registry keys, commit code, and update documentation. Work that would take a human engineer days can happen in minutes.

Speed Without a Trusted Oracle Is How You Destroy a Codebase Fast

ConstructiVision is a safety-adjacent construction tool. It generates lift-point calculations, weld connection specifications, and tilt-up panel geometry used on real job sites. An undetected regression doesn't produce a bad user review. The blast radius is measured in concrete and steel — potentially weeks after the software shipped, during a pour or a lift, with no ctrl-Z available.

The pre-committed failure mode analysis — RPN scores, S/O/D ratings, and full bug traceability — lives on the Risk Management & DFMEA page. When a product's top failure modes carry S=10 structural consequences, engineering-grade quality methods are not optional. The question here is narrower: how do you reliably detect those failures before any code leaves the build?


The Test Pyramid — Inverted By Constraint

Conventional wisdom says: write lots of unit tests at the base of the pyramid, fewer integration tests in the middle, and a small number of end-to-end tests at the top. For most software, this is correct.

ConstructiVision runs as AutoLISP code inside AutoCAD 2000, a CAD application released in 1999. There is no unit test framework for AutoLISP, no mocking or stubbing layer, and no headless execution mode — the code runs only inside a live, interactive AutoCAD session.

We Were Forced to the Top of the Pyramid — and It Turns Out to Be the Most Powerful Place to Be

The inability to write unit tests led directly to simulated user validation: automated scripts that drive a real copy of AutoCAD exactly the way a real user does, on real production drawings, and capture screenshots of every result. End-to-end testing of the real system on real data is the highest-fidelity evidence the testing literature recognizes — and we run it on every build across a multi-OS VM matrix.


Three Independent Validation Layers

No single layer is trusted in isolation. The system requires all three to agree before a test is declared passing.

LAYER 1

Simulated User Input (AutoIT)

AutoIT scripts drive AutoCAD exactly as a real user would — opening production drawings, sending the exact progcont bitmask values that menu macros use, waiting for dialogs, and capturing screenshots. Not a mock. Not a stub. The real application, the real data, the real code path.

LAYER 2

OCR Semantic Validation

Tesseract OCR extracts text from every captured screenshot. This checks not just "did a dialog appear?" but "is the text in it correct?" — semantic validation of UI state. A dialog box for the wrong function looks structurally identical but reads differently.
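The semantic check itself is simple once the OCR text exists. A minimal Python sketch of the idea (the dialog strings and phrase lists here are illustrative, not the actual test assertions; Tesseract extraction is assumed to have already produced `ocr_text`):

```python
def dialog_is_semantically_valid(ocr_text, required_phrases, forbidden_phrases=()):
    """Check that OCR-extracted dialog text contains every expected phrase
    and none of the known error strings (case-insensitive)."""
    text = ocr_text.lower()
    if any(p.lower() in text for p in forbidden_phrases):
        return False
    return all(p.lower() in text for p in required_phrases)

# A correct dialog and an error dialog are structurally similar but read differently:
good = "ConstructiVision  Panel Entry  Panel ID:  Width:  Height:  OK  Cancel"
bad = "AutoCAD Message  error: Function cancelled  OK"

dialog_is_semantically_valid(good, ["Panel Entry", "Width"], ["Function cancelled"])  # True
dialog_is_semantically_valid(bad, ["Panel Entry", "Width"], ["Function cancelled"])   # False
```

The forbidden-phrase list is what distinguishes this from "did a dialog appear?" — an error dialog satisfies presence checks but fails the read.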

LAYER 3

Golden Baseline Comparison

OCR output is compared character-by-character against the golden baseline from VM 103 at ≥95% accuracy. A test passes only when the running build produces text that matches what the unmodified 2008 production environment produces.
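A character-level comparison with a tolerance threshold can be sketched in a few lines of Python — this is an illustrative stand-in for the actual diff step, using the standard library's `difflib.SequenceMatcher` (the golden and candidate strings below are hypothetical):

```python
from difflib import SequenceMatcher

THRESHOLD = 0.95  # ≥95% character-level agreement with the golden baseline

def matches_golden(candidate, golden):
    """Compare OCR output against the golden-baseline OCR text.
    SequenceMatcher.ratio() gives a 0..1 similarity over characters,
    which tolerates OCR noise while failing on wrong or missing content."""
    ratio = SequenceMatcher(None, candidate, golden).ratio()
    return ratio >= THRESHOLD, ratio

golden = "Panel Entry  Panel ID: P-101  Width: 24'-6\"  OK  Cancel"
noisy = "Panel Entry  Panel lD: P-101  Width: 24'-6\"  OK  Cancel"   # OCR read I as l
wrong = "AutoCAD Message  error: Function cancelled  OK"

matches_golden(noisy, golden)[0]   # True  — one-character OCR misread is tolerated
matches_golden(wrong, golden)[0]   # False — a wrong dialog is rejected
```

The threshold is the design choice: tight enough that a different dialog or a missing field fails, loose enough that routine Tesseract misreads do not.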

Layers 1 and 2 were built. Then Bug 25 was discovered: the AutoIT validation log marks error dialogs as PASS — it checks "did a dialog appear?" not "is the dialog correct?" Layer 3 (OCR comparison) was added specifically to catch what Layer 1's log misses. This is Measurement System Analysis applied to a test harness — most projects never question whether their test framework is telling the truth.


VM 103 — The Untouchable Oracle

VM 103 is the original 2008 GSCI workstation. It runs v7.0 production exactly as it shipped. It has never been modified. It is physically isolated from all development activity. No AI agent has write access to it.

The golden OCR baselines come from VM 103. No matter what development changes, no matter how fast AI works, VM 103's state is the unchanging definition of correct. The comparison is between what AI-modified code produces and what the unmodified production environment produces. AI cannot rationalize its way around a comparison it cannot influence.


The Rule That Exists Specifically Because of AI

In .github/copilot-instructions.md, Critical Rule #10 reads:

// Critical Rule #10 — NON-NEGOTIABLE
NEVER modify the AutoIT validation scripts.

The .au3 files are the test fixtures — they produce known-good results
on VMs 102, 103, and PB11 on 104. If the au3 fails against a build,
the build is broken, not the test.

All fixes must be in the CSV build code.
// This rule is absolute and non-negotiable.

Why This Rule Is Written For AI, Not Humans

A human developer understands the spirit of a test. An AI optimizing toward "make the test pass" will take the path of least resistance — and the path of least resistance is editing the test, not fixing the code. Without an explicit prohibition, an AI could:

  1. Receive a failing test result
  2. Reason that "the test expectation looks wrong"
  3. Adjust the au3 to match the broken behavior
  4. Report the test as passing
  5. Commit and push

The product would be broken. The test suite would say it's fine. The golden baseline would be silently corrupted. Critical Rule #10 closes this escape hatch permanently.
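A prohibition like Critical Rule #10 can also be enforced mechanically rather than by instruction alone. A minimal Python sketch of such a pre-commit guard (the protected-path names are hypothetical; in CI the file list would come from `git diff --name-only`):

```python
PROTECTED_SUFFIXES = (".au3",)              # AutoIT test fixtures — immutable per Critical Rule #10
PROTECTED_PATHS = ("golden-baseline/",)     # hypothetical baseline directory

def violates_fixture_immutability(changed_files):
    """Return the subset of changed files a commit is forbidden to touch.
    A non-empty result fails the build before any validation run starts."""
    return [
        f for f in changed_files
        if f.endswith(PROTECTED_SUFFIXES) or f.startswith(PROTECTED_PATHS)
    ]

violates_fixture_immutability(["src/panel.lsp", "docs/notes.md"])         # [] — allowed
violates_fixture_immutability(["tests/validate-vm102.au3", "src/x.lsp"])  # flags the .au3
```

The point is defense in depth: the written rule constrains the AI's reasoning, and a guard like this would close the escape hatch even if the reasoning fails.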


What Simulated User Validation Catches That Nothing Else Can

Three bugs from the Feb–Mar 2026 test campaign illustrate why UI-level testing is not just convenient — it is the only layer at which these failures become visible:

Bug 19 — Missing Registry Keys

Symptom: setvars fails with a "Function cancelled" error on VM 103.

Cause: Missing Group1 / Pop13 entries in HKCU\...\Profiles\<<Unnamed Profile>>\Menus — an installation artifact invisible to any code-level analysis.

How it was found: AutoIT → OCR extracted the exact error text from the screenshot. A unit test of the AutoLISP source would never touch the Windows registry. This failure lives entirely in the deployment and OS integration layer.

Bug 20 — Startup Timing Crash

Symptom: AutoCAD crash (0xC0000005) at LocalizeReservedPlotStyleStrings+533 on VM 104 during VLX load.

Cause: The compiled plugin loaded before the printer/plot subsystem finished initializing — a race condition between OS subsystems that no static analysis can detect.

How it was found: Screenshot captured by AutoIT showed the crash dialog. Crash dump at reports/ocr-output/vm104-feb28/acadstk.dmp preserved the full stack trace. A linter reads source code; it cannot observe process initialization order.

Bug 21 — Missing Project Path

Symptom: "File not found" when opening CSBsite1.dwg on VM 104.

Cause: Missing Project Settings\CV\RefSearchPath registry key — a gap in the manual installation process that the automated installer would have covered.

How it was found: AutoIT ran the "Edit Existing Drawing" sequence. The resulting dialog was wrong. SSH reg query on VM 102 vs VM 104 identified the missing subkey. No code analysis can compare registry state across VMs.

All Three Bugs Are Invisible Below the UI Level

Registry configuration, startup timing, and cross-environment path differences are integration properties of the deployed system — not of the source code in isolation. Simulated user validation is the only test methodology that runs the complete system and observes what a user actually sees.


Multi-VM Platform Matrix — One Test, Four Configurations

The same AutoIT script runs against four distinct platform configurations. The same progcont bitmask either routes to the correct dialog on all four, or the test isolates exactly which platform has the mismatch.

VM  | OS        | Build            | What It Validates
102 | XP SP3    | V11 source       | XP + NTFS junction + sparse checkout pipeline
104 | XP SP3    | PB11 VLX mode    | Known-good compiled build — the golden reference execution path
108 | Win10 x32 | TB11 source-mode | Win10 registry profile, AutoCAD startup configuration
109 | Win10 x64 | TB11             | Wow6432Node COM path + 64-bit registry redirection

Zero extra test code for four configurations. The test is the same. The oracle is the same. The only variable is the platform — which is exactly what you want when validating a product that must run across Windows XP SP3 through Windows 10 x64.
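That "zero extra test code" property falls out of parameterizing one test definition over the platform matrix. A hedged Python sketch (VM labels are from the table above; the runner itself is hypothetical):

```python
# One test definition × N platforms: the cross product is generated,
# never hand-written, so adding a platform adds zero test code.
PLATFORMS = {
    "102": ("XP SP3", "V11 source"),
    "104": ("XP SP3", "PB11 VLX mode"),
    "108": ("Win10 x32", "TB11 source-mode"),
    "109": ("Win10 x64", "TB11"),
}
ROUTES = range(13)  # 1 default + 12 feature routes through the dispatcher

def build_run_matrix():
    """Enumerate every (VM, route) pair a full validation cycle must cover."""
    return [(vm, route) for vm in PLATFORMS for route in ROUTES]

len(build_run_matrix())  # 52 — the 13 × 4 routing validation cases
```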


Risk pre-commitment lives on its own page

86% of the bugs found in testing were predicted before coding began. Every bug filed against this build carries a mandatory DFMEA ID linking it back to a version-controlled risk entry with S/O/D ratings. The full traceability chain is at Risk Management & DFMEA →


One Subtle Detail That Prevents Silent Failures

Each validation run creates its own timestamped output directory:

C:\CV-Validation\VM102\run-20260228-143012\00-baseline-drawing-loaded.bmp
C:\CV-Validation\VM102\run-20260228-143012\01-csv-entry-dialog.bmp
C:\CV-Validation\VM102\run-20260228-143012\validation-log.txt
C:\CV-Validation\VM102\latest-run.txt ← always points to current run

Stale screenshots from a previous passing run cannot mask a current failure. This is a common failure mode in screenshot-based test suites — a test crashes before capturing any screenshots, the old screenshots are still on disk, the comparison passes against the previous run's images, and the breakage is invisible. The timestamped directory architecture prevents this at the infrastructure level, not the assertion level.
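The mechanism is small enough to sketch. A minimal Python illustration of the run-directory pattern (directory and file names follow the listing above; the helper itself is hypothetical):

```python
import os
from datetime import datetime

def create_run_dir(base, stamp=None):
    """Create a fresh, timestamped output directory and repoint latest-run.txt.
    Because every run writes into its own directory, screenshots from a
    previous passing run can never satisfy the current run's comparison."""
    stamp = stamp or datetime.now()
    run_dir = os.path.join(base, stamp.strftime("run-%Y%m%d-%H%M%S"))
    os.makedirs(run_dir)  # fails loudly if the run directory already exists
    with open(os.path.join(base, "latest-run.txt"), "w") as f:
        f.write(run_dir)  # the pointer the comparison step must follow
    return run_dir
```

In this sketch, the comparison step reads latest-run.txt rather than globbing for .bmp files: if a run crashes before capturing anything, its directory is empty and the comparison fails, instead of silently passing against stale images.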


The Complete Defense Stack

Each layer exists because a specific failure was found or predicted:

RULE

Critical Rule #10 — Fixture Immutability

AI cannot modify test fixtures to make broken code pass. The test is the definition of correct; fixes go in the code.

ORACLE

VM 103 — Untouchable Baseline

The unmodified 2008 production environment defines what correct looks like. AI has no write access. Comparison cannot be gamed.

LAYER 1

AutoIT — Simulated Real User

Drives AutoCAD with exact production menu macros on real drawings. Catches deployment, registry, and runtime failures invisible to code analysis.

LAYER 2

OCR — Semantic Content Check

Reads what's actually on screen. Added after Bug 25 proved the AutoIT log marks error dialogs as PASS. Trust but verify — then add a second verifier.

LAYER 3

Golden Diff — ≥95% Character Match

Compares against VM 103 OCR at character level. Tolerates OCR noise while catching wrong dialogs, missing content, and error messages.

RISK

Pre-Committed Risk Model

DFMEA pre-commitment, RPN scores, 86% prediction coverage, and bidirectional bug traceability — on a dedicated page. Risk & DFMEA →

This Framework Doesn't Slow Down AI Development — It's What Makes It Safe to Go Fast

The Feb 26–28 sprint produced 3 bugs fixed, 5 VMs touched, and full documentation committed — in approximately 6 AI-assisted sessions. The validation rails didn't prevent that velocity. They're what made it responsible to attempt it on a product where the pre-committed DFMEA failure modes carry S=10 severity ratings.


Architecture Dependency Map & Validation Serialization

ConstructiVision is not a monolithic program. It is 126 modules loaded dynamically at runtime, routing user intent through a bitmask dispatcher into dozens of independent dialog chains. Every global variable written by one module can be read — correctly or incorrectly — by any downstream module. This section maps those dependencies, traces the critical path from user input to panel book output, and defines which validations must run when any given module changes.

  - 126 source modules in active build
  - 99 modules auto-loaded at startup
  - 184 named functions defined across the codebase
  - 101 unique global variables shared across modules
  - 58 named constants defined at startup
  - 44 dialog definition files
  - 13 routing paths through the dispatcher (1 default + 12 feature routes)
  - 4 VM platform configurations in the test matrix

Critical Path: User Input → Panel Book Print/Export

The path below is the production workflow. Every node on the critical path must pass its associated validation test before a build is considered releasable. Side paths have their own validation requirements, defined in the serialization table below.

  1. User Menu Action — selects feature
  2. Entry Module (Startup & Init) — loads modules, sets environment
  3. Routing Dispatcher — bitmask dispatch, 13 branches — reads: route-key
  4. Options Hub (Feature Selection) — reads: context flags; writes: selection, directive
  5. Detail Entry (Data Input) — reads: master-state; writes: attributes, ok
  6. State I/O (Persist & Load) — reads/writes: master-state
  7. Geometry Engine (Draw Output) — reads: master-state, scale
  8. Connection Module (Structural Specs) — RPN 360 ⚠
  9. Feature Pages (Panel Book) — reads: master-state; writes: page-state
  10. Print / Export (OUTPUT) — reads: batch config; writes: device selection

Side branches from the routing dispatcher (separate validation paths):

  - Utility Calculator — Route A
  - Project Setup — Routes B / C
  - Legacy Drawing Import — Route D
  - Batch Queue — Route E
  - Layer Manager — Route F
  - Materials List — Route G
  - Revision History — Route H

Module Call Graph — Calls & Called-By for Critical Path Nodes

Each row names a module role, who calls it, what it calls, and the global state it reads vs. writes. A state value written by one module and read by the next is a dependency edge — the exact type of coupling that an AI code change can silently break.

Entry Module — startup entry point
  Called by: CAD startup suite; menu macro
  Calls: environment init, all feature modules
  Reads: route-key, drawing-type
  Writes: working-dir, drawing-flag, drawing-type, project-name, all 58 named constants

Environment Init — CAD system variables
  Called by: Entry Module
  Calls: CAD setvar × 40+
  Reads: install-dir, working-dir
  Writes: 35+ CAD system variables (units, snap, echo, file dialogs, etc.)

Options Hub — feature selection dialog
  Called by: Entry Module (default route)
  Calls: Detail Entry, Print/Export (via load-directive)
  Reads: drawing-flag, working-dir, install-dir
  Writes: selection-key, load-directive, drawing-flag, working-dir, completion-flag

Detail Entry — primary data input dialog
  Called by: Options Hub (via load-directive), Entry Module
  Calls: State I/O, field validators, dialog utilities, Revision History, project helpers
  Reads: master-state, project-name, working-dir, scale-factor, selection-key
  Writes: completion-flag, drawing-name, attribute-vars (all panel fields via dynamic set)

State I/O — persist & load panel state
  Called by: Detail Entry, Entry Module
  Calls: field validators, site dialog, project name helper
  Reads: master-state, project-name, working-dir
  Writes: master-state (authoritative alist — written on save, read by all downstream)

Geometry Engine — draw panel geometry
  Called by: Panel Inspector, Detail Entry
  Calls: geometry helpers, corner mitering, scale resolver, connection placer
  Reads: master-state, scale-factor, corner-points (4)
  Writes: CAD drawing database entities (panel geometry)

Connection Module — ⚠ RPN 360, safety-critical
  Called by: Geometry Engine, Detail Entry (via selection routing)
  Calls: connection placer, dialog utilities, viewport helper
  Reads: master-state, connection vars, scale-factor
  Writes: connection geometry entities, hardware schedule data (drives physical weld specs)

Feature Pages — panel book page generation
  Called by: Horizontal Frame Dialog, Vertical Frame Dialog
  Calls: dialog enable/disable helpers, viewport helper, field validators
  Reads: master-state, feature lists, corner-points (4)
  Writes: page-completion-flag, completion-flag, feature symbol vars (dynamic set)

Print / Export — terminal output node
  Called by: Entry Module (print routes), Options Hub (via load-directive)
  Calls: plotter discovery, style enumeration, paper size resolver, path finder, readiness check
  Reads: batch config vars, working-dir, project-name
  Writes: device list, paper list, style list, output-selection vars (device, paper, style, orientation)

Global State Map — Dependencies Across the Critical Path

These are the 12 most-shared state values in the critical path: each one is a dependency edge where a write in one module must be correct before a downstream read will produce valid output. master-state is the authoritative state bundle — it flows through every dialog in the panel workflow.

route-key
  Written by: menu macros — the routing bitmask set by each menu item; 13 distinct values
  Read by: Entry Module dispatcher — determines which dialog chain runs

master-state
  Written by: State I/O module (on drawing open + save); Detail Entry (on accept)
  Read by: Detail Entry, Geometry Engine, Connection Module, Feature Pages, Print/Export — virtually every downstream module

selection-key
  Written by: Options Hub — button pressed by user (encodes feature sub-type)
  Read by: Options Hub dispatch, Detail Entry sub-dialog routing

load-directive
  Written by: Options Hub — encodes target type (typical/select × panel/site)
  Read by: Entry Module (determines branch: Detail Entry vs. site dialog chain)

drawing-flag
  Written by: Entry Module (detected on file open), Options Hub (user override)
  Read by: Options Hub (enables context-dependent buttons only if drawing exists)

working-dir
  Written by: Entry Module (init), Options Hub (on project select)
  Read by: Detail Entry, State I/O, Print/Export, Batch Queue — file path anchor for all I/O

attribute-vars
  Written by: Detail Entry — dynamically set for all panel attribute fields
  Read by: Geometry Engine, Panel Inspector, schedule page modules — geometry and schedule generation

completion-flag
  Written by: every dialog module — set on accept, cleared on cancel
  Read by: Entry Module flow control — determines whether to proceed to next dialog or abort

scale-factor
  Written by: Entry Module (read from drawing or project settings)
  Read by: Geometry Engine, Detail Entry, Connection Module — all geometry scaling

drawing-type
  Written by: Entry Module — 3-tier detection: named dictionary → layer names → nil
  Read by: Entry Module dispatcher (routing predicate), Options Hub (button state)

page-state
  Written by: Feature Pages module — page completion and zoom state
  Read by: Horizontal/Vertical Frame dialogs — panel book feature page rendering

output-selection
  Written by: Print/Export module — selected device, paper, style, orientation
  Read by: Print/Export execution engine; Batch Queue — output configuration

Validation Serialization — Change This Module, Run These Tests

Not every code change requires a full regression run. This table defines the minimum required validation for each module group. FULL SUITE means all 11 screenshots × 4 VMs. CRITICAL PATH means default route + Detail Entry + Print paths only. TARGETED means the specific dialog sequence only.

Startup chain (Entry Module, CAD startup hook, menu loader, Environment Init) — FULL SUITE
  99 modules depend on the initialization these files perform. A change here can silently affect any downstream function. All 4 VM configurations must pass: legacy OS and modern OS × 32-bit/64-bit have divergent registry paths and startup timing.

Options Hub (dialog + layout) — FULL SUITE
  The feature routing hub. Routes to every downstream feature. All dialog buttons must produce the correct dialog chain on all platforms. A silent routing regression here affects every feature.

Detail Entry dialog + State I/O module — CRITICAL PATH + sub-dialogs
  Primary data entry. Changes here can corrupt master-state — the authoritative state bundle read by every downstream module. Must run Detail Entry + connection + feature pages + print in sequence.

Connection Module (dialog + geometry + hardware spec) — FULL SUITE (RPN 360)
  Safety-critical. DFMEA S=10: wrong hardware specification → structural failure during panel lift. Any change requires full four-platform validation. OCR must confirm correct hardware values in all screenshot captures.

Geometry Engine + Panel Inspector + layout helpers — CRITICAL PATH
  Panel geometry generation. Changes here affect panel book dimensions and lift-point data. Must run full panel draw sequence and verify panel book page output via OCR.

Feature Pages module + horizontal/vertical frame dialogs — TARGETED: feature pages
  Panel book feature page iteration. Run the feature page sequence for both frame types. Verify page-state transitions via OCR screenshot comparison.

Print / Export module + output dialog — CRITICAL PATH: print sequences
  Terminal output node. Must run Print All, Print Select, and panel book export paths on all platforms. The module enumerates OS-installed output devices — behavior differs across legacy OS / modern OS and 32-bit / 64-bit. Verify device selection UI text via OCR.

Shared dialog utilities (field update, validation, enable/disable) — CRITICAL PATH
  Shared utilities called by Detail Entry, Feature Pages, Connection Module. Changes ripple to all dialogs that use them. Run Detail Entry dialog + first feature page as minimum coverage.

Materials List dialog — TARGETED: Route G
  Isolated routing branch — changes affect only the materials workflow. Verify dialog content via OCR.

Revision History dialog — TARGETED: Route H
  Isolated routing branch. Verify dialog appears and accepts input. OCR-verify form field labels against golden baseline.

Utility Calculator — TARGETED: Route A
  Separate standalone utility, not in the panel workflow. No master-state dependency. Verify via targeted route launch.

Batch Queue module — TARGETED: Route E
  Batch print/process queue. Shares output-selection state with Print/Export — verify no variable collision if both modules changed in same commit.

Test Combination Mathematics

The numbers below illustrate why simulated user validation is the only tractable approach. The combinatorial space is too large for any exhaustive manual test plan — but the au3 script collapses it to a fixed, automated, reproducible set of execution paths.

progcont routing paths × VM platforms = 13 × 4 = 52 routing validation cases
panel sub-dialog sequences × VM platforms = 6 × 4 = 24 sub-dialog integration tests
screenshots per au3 run × VM platforms = 11 × 4 = 44 OCR-validated screenshots per cycle
global variables × routing paths = 101 × 13 = 1,313 potential variable-state interactions
functions × avg. call sites = 184 × ~3 = ~552 potential call paths modeled by the au3
DFMEA failure modes modeled = 21 failure modes, 86% predicted before testing began

Total au3 automation coverage: 52 routing + 24 sub-dialog + 44 OCR checks captured in one 572-line test script across 4 VM platforms

Why the Call Graph Determines Which Tests Must Run

A change to the State I/O module (which writes the master-state bundle) demands a full critical-path run because every downstream module reads master-state. A change to an isolated utility route demands only a targeted run because that branch touches no shared state. The call graph and state map above are the mechanistic justification for each row in the serialization table — not convention, not intuition, but the actual dependency structure of the code.
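That selection logic is mechanical enough to sketch. Assuming a write/read map like the global state table above (module and variable names are abbreviated, illustrative stand-ins), the modules needing revalidation are the downstream readers of everything the changed module writes:

```python
# Hypothetical subset of the write/read map from the global state table.
WRITES = {
    "state-io": {"master-state"},
    "options-hub": {"selection-key", "load-directive", "working-dir"},
    "util-calc": set(),  # isolated Route A — touches no shared state
}
READS = {
    "detail-entry": {"master-state", "selection-key", "working-dir"},
    "geometry-engine": {"master-state"},
    "print-export": {"working-dir"},
}

def modules_to_revalidate(changed):
    """Every module that reads any state value the changed module writes
    inherits that change's validation requirements."""
    written = WRITES.get(changed, set())
    return {m for m, reads in READS.items() if reads & written}

modules_to_revalidate("state-io")   # every reader of master-state
modules_to_revalidate("util-calc")  # empty — a targeted run is sufficient
```

Extending this one-hop query to a transitive closure over the full 101-variable map is what turns the serialization table from convention into a computed artifact.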

This is what DFSS means when it says quality is engineered in, not inspected in. The test plan was derived from the architecture. Any AI change that modifies a shared state value automatically inherits the validation requirements of every module that reads it.


Framework Alignment

This validation system is built on Design for Six Sigma (DFSS) — specifically the CDOV process (Concept → Design → Optimize → Verify) from Creveling, Slutsky & Antis (2003). DFSS treats quality as a measurable engineering property: Y = f(X), where Y is "the product generates correct output on a real job site" and X is the set of measurable parameters (registry state, dialog content, startup timing, OS version). Every AutoIT test measures Y. Every OCR comparison validates whether Y matches the golden baseline. The DFMEA maps which X values carry S ≥ 9 consequences — see Risk Management & DFMEA for the full risk analysis.



SimpleStruct — Powered by ConstructiVision | Quality-First AI-Assisted Engineering