OuiDire: an evidence workflow (sources, mechanisms, citations, encrypted vault)

Jan 1, 2026

What is OuiDire.app about?

A workflow to turn psychiatric/legal PDFs into auditable claim cards with exportable citations.

Pipeline:

PDF → OCR/layout extraction → segmentation into claim cards
Each card: source tag (provenance) + mechanism macro verdict (8) + optional fine-grained tags (~30)
Output: copyable excerpt + citation anchor (optionally a short rationale)

Sources + mechanism macros (8) + tags (~30)

Source layer (provenance):

Hearsay / Oui-dire

Mechanism macros = verdict layer (export-grade outcomes):

Narrative deviation (patient’s words rewritten)
Fabrication / extrapolation
Biographical rewrite
Recycled psychiatric antecedents (RAP)
Internal contradictions
Critical omissions
Amplification
Canonisation (false story hardens through repetition)

Tags = instrumentation layer (mechanism-level cues), used for search/filtering, concise rationales, heuristics, and ML features. Examples of tag families: attribution verbs (“reported by”), time anchoring issues, hedging vs certainty inflation, contradiction markers, recycled-history signals, omission patterns.

Macros = “what kind of failure.” Tags = “how it manifests.”
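As a rough illustration of the two layers, the macros and tag families could be modeled as closed string unions. This is only a sketch: the snake_case identifiers, the `Annotation` shape, and the helpers `isMacro` and `withTag` are hypothetical names, and only the six tag families mentioned above are included because the full ~30 tags are not listed here.

```typescript
// The eight macro verdicts from the list above. The snake_case identifiers
// are illustrative spellings, not names taken from OuiDire itself.
const MACROS = [
  "narrative_deviation",
  "fabrication_extrapolation",
  "biographical_rewrite",
  "recycled_psychiatric_antecedents",
  "internal_contradictions",
  "critical_omissions",
  "amplification",
  "canonisation",
] as const;
type Macro = (typeof MACROS)[number];

// Six tag families named in the text; the full set of ~30 tags is not listed.
type TagFamily =
  | "attribution_verb"
  | "time_anchoring"
  | "certainty_inflation"
  | "contradiction_marker"
  | "recycled_history"
  | "omission_pattern";

interface Annotation {
  macro: Macro;      // "what kind of failure"
  tags: TagFamily[]; // "how it manifests"
}

// Runtime guard for values arriving as plain strings (e.g. from JSON).
function isMacro(value: string): value is Macro {
  return (MACROS as readonly string[]).indexOf(value) !== -1;
}

// Search/filter use case: keep annotations carrying a given tag family.
function withTag(items: Annotation[], tag: TagFamily): Annotation[] {
  return items.filter((a) => a.tags.indexOf(tag) !== -1);
}
```

Keeping macros as a fixed union means an export can never carry a verdict outside the eight, while tags stay free to grow.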

Auditability (the core contract)

Each claim card has:

stable UID: ${doc_id}:${alle_id}
citation anchors (spans/offsets; ideally page/layout refs too)
optional short rationale (a terse note, not free-form prose)
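A claim card along these lines might look like the sketch below. The field names, the character-offset convention (start inclusive, end exclusive), and the rule that ids may not contain `:` are assumptions made for illustration; `alle_id` is kept as written above without guessing at its exact meaning.

```typescript
interface CitationAnchor {
  start: number; // character offset into the extracted text, inclusive (assumed convention)
  end: number;   // character offset, exclusive
  page?: number; // page reference when layout data is available
}

interface ClaimCard {
  uid: string; // `${doc_id}:${alle_id}`
  docId: string;
  alleId: string;
  anchors: CitationAnchor[];
  rationale?: string;
}

// Builds the stable UID. Rejecting ':' inside either part is an assumption
// that keeps the UID unambiguous to split later.
function makeUid(docId: string, alleId: string): string {
  if (docId.indexOf(":") !== -1 || alleId.indexOf(":") !== -1) {
    throw new Error("ids must not contain ':'");
  }
  return `${docId}:${alleId}`;
}

function makeCard(
  docId: string,
  alleId: string,
  anchors: CitationAnchor[],
  rationale?: string,
): ClaimCard {
  for (const a of anchors) {
    if (a.start < 0 || a.end <= a.start) {
      throw new Error(`invalid anchor span ${a.start}-${a.end}`);
    }
  }
  const card: ClaimCard = { uid: makeUid(docId, alleId), docId, alleId, anchors };
  if (rationale !== undefined) card.rationale = rationale;
  return card;
}

// Resolves an anchor back to the text it cites.
function excerptAt(text: string, anchor: CitationAnchor): string {
  return text.slice(anchor.start, anchor.end);
}
```

Because the UID depends only on the two ids, re-running extraction on the same document yields the same UIDs as long as those ids are assigned deterministically.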

Exports include:

excerpt text
doc_id (+ page when available)
claim UID
macro verdict (+ optional tags)
optional rationale

Goal: “where did this claim come from?” is answerable immediately.
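To make the export contract concrete, here is one possible shape for an export record and a one-line citation. The `VerifiedClaim` input type, the `toExport` and `citationLine` helpers, and the citation string layout are all hypothetical; only the list of exported fields comes from the text above.

```typescript
interface VerifiedClaim {
  uid: string; // `${doc_id}:${alle_id}`
  docId: string;
  page?: number;
  start: number; // excerpt span in the extracted text (assumed offsets)
  end: number;
  macro: string;
  tags?: string[];
  rationale?: string;
}

interface ExportRecord {
  excerpt: string;
  doc_id: string;
  page?: number;
  claim_uid: string;
  macro: string;
  tags?: string[];
  rationale?: string;
}

// Copies only the fields listed above; optional fields are omitted when absent.
function toExport(claim: VerifiedClaim, sourceText: string): ExportRecord {
  const rec: ExportRecord = {
    excerpt: sourceText.slice(claim.start, claim.end),
    doc_id: claim.docId,
    claim_uid: claim.uid,
    macro: claim.macro,
  };
  if (claim.page !== undefined) rec.page = claim.page;
  if (claim.tags && claim.tags.length > 0) rec.tags = claim.tags.slice();
  if (claim.rationale) rec.rationale = claim.rationale;
  return rec;
}

// Illustrative citation layout: "excerpt" [doc, p. N; uid; macro]
function citationLine(rec: ExportRecord): string {
  const where = rec.page !== undefined ? `${rec.doc_id}, p. ${rec.page}` : rec.doc_id;
  return `"${rec.excerpt}" [${where}; ${rec.claim_uid}; ${rec.macro}]`;
}
```

Every exported line carries the claim UID, so a reader holding only the export can trace the excerpt back to its card and source document.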

Privacy here is leverage, not virtue

In this context, “privacy-first” isn’t a preference; it’s operational control. When civil rights can be suspended, the practical risks are losing access, losing copies, losing narrative control. On-device state and an encrypted vault are continuity tools: keep a usable record, keep exports reproducible, reduce third-party exposure. The point is simple: don’t let your file become (or remain) someone else’s uninspectable story.

Cloud-first, with clear boundaries

We start cloud-first for:

speed (much faster than a local-only path)
extraction quality (OCR/layout is hard, especially on scans)
persistence + sync (multi-device reliability; future controlled sharing)

Boundaries:

explicit user action (no hidden upload)
minimization (only what’s needed for the chosen function)
storage is encrypted client-side (the vault is not a plaintext “platform dossier”)
machine suggests; humans verify (exports follow verified layer)

OCR / extraction (Azure Document Intelligence)

invoked per document, on explicit user action
returns structured OCR/layout used for segmentation and anchors
outputs can be cached for auditability (on-device and/or in the vault)

Storage (Azure Vault)

The target is an optional vault that stores encrypted blobs:

client-side encryption (WebCrypto) before upload
vault stores: ciphertext + iv + salt + version
server handles auth + scoped access (e.g., SAS); it cannot read contents
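A minimal sketch of the client-side step, using the standard WebCrypto API. The text above fixes only the stored fields (ciphertext, iv, salt, version); the choice of a passphrase with PBKDF2-SHA-256, AES-GCM with a 256-bit key, the iteration count, the base64 encoding, and the function names are all assumptions for illustration.

```typescript
// Sketch only: PBKDF2 + AES-GCM and base64 fields are assumptions,
// not OuiDire's confirmed vault format.
interface VaultRecord {
  version: number;
  salt: string;       // base64, PBKDF2 salt
  iv: string;         // base64, AES-GCM nonce (fresh per encryption)
  ciphertext: string; // base64, includes the GCM authentication tag
}

const PBKDF2_ITERATIONS = 600_000; // assumed work factor

function toB64(bytes: Uint8Array): string {
  let s = "";
  bytes.forEach((b) => { s += String.fromCharCode(b); });
  return btoa(s);
}

function fromB64(b64: string) {
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

async function deriveKey(passphrase: string, salt: BufferSource): Promise<CryptoKey> {
  const material = await crypto.subtle.importKey(
    "raw", new TextEncoder().encode(passphrase), "PBKDF2", false, ["deriveKey"]);
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    material,
    { name: "AES-GCM", length: 256 },
    false, // key is not extractable
    ["encrypt", "decrypt"]);
}

// Runs on the client before upload; the server only ever sees the record.
async function encryptForVault(plaintext: string, passphrase: string): Promise<VaultRecord> {
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const key = await deriveKey(passphrase, salt);
  const data = new TextEncoder().encode(plaintext);
  const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, data);
  return { version: 1, salt: toB64(salt), iv: toB64(iv), ciphertext: toB64(new Uint8Array(ct)) };
}

async function decryptFromVault(record: VaultRecord, passphrase: string): Promise<string> {
  const key = await deriveKey(passphrase, fromB64(record.salt));
  const pt = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: fromB64(record.iv) }, key, fromB64(record.ciphertext));
  return new TextDecoder().decode(pt);
}
```

AES-GCM is authenticated, so decryption with a wrong passphrase or a tampered blob rejects rather than returning garbage. The `version` field leaves room to change the parameters later without breaking old blobs.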

Local-only fallback (later, degraded)

A strict local-only path can exist later as a fallback:

no cloud OCR
lower extraction quality on scans
maximum offline/privacy constraints

It’s a trade-off, not the primary path.

Human vs machine (clean learning signal)

Two parallel layers:

Machine: suggestions + confidence (fast utility)
Verified: human decisions (export-grade truth)
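One way to keep the two layers from bleeding into each other is to store them separately and let exports read only the verified side. The sketch below assumes hypothetical type and function names, and uses confirm / reject / skip as the human decision values.

```typescript
type Decision = "confirm" | "reject" | "skip";

interface MachineSuggestion {
  macro: string;
  confidence: number; // 0..1, produced by heuristics or a model
}

interface VerifiedDecision {
  macro: string;
  decision: Decision; // always set by a human
}

interface ClaimLayers {
  uid: string;
  machine: MachineSuggestion[];
  verified: VerifiedDecision[];
}

// Export-grade verdicts come only from human confirmations;
// machine confidence is never consulted here.
function exportableMacros(claim: ClaimLayers): string[] {
  return claim.verified
    .filter((v) => v.decision === "confirm")
    .map((v) => v.macro);
}

// Review queue: suggestions not yet confirmed or rejected, highest confidence
// first. Treating "skip" as still pending is an assumption.
function pendingSuggestions(claim: ClaimLayers): MachineSuggestion[] {
  const decided = claim.verified
    .filter((v) => v.decision !== "skip")
    .map((v) => v.macro);
  return claim.machine
    .filter((s) => decided.indexOf(s.macro) === -1)
    .sort((a, b) => b.confidence - a.confidence);
}
```

With this split, a 0.99-confidence suggestion that nobody has reviewed still exports nothing.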

Track per claim:

confirm / reject / skip
per macro and per tag family
per document type

This yields calibration/training signal without centralizing raw dossiers by default.
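The per-claim tracking can be reduced to counts that carry no document text. A small sketch, with hypothetical names (`DecisionEvent`, `tallyBy`, `acceptanceRate`) and an assumed definition of acceptance rate as confirms over confirms plus rejects:

```typescript
type Decision = "confirm" | "reject" | "skip";

// One human decision, stripped of any excerpt or raw text.
interface DecisionEvent {
  macro: string;
  docType: string;
  decision: Decision;
}

type Tally = Record<Decision, number>;

// Groups decisions by macro or by document type and counts each outcome.
function tallyBy(events: DecisionEvent[], key: "macro" | "docType"): Record<string, Tally> {
  const out: Record<string, Tally> = {};
  for (const e of events) {
    const k = e[key];
    const t = out[k] ?? (out[k] = { confirm: 0, reject: 0, skip: 0 });
    t[e.decision] += 1;
  }
  return out;
}

// Confirms over decided items; skips are excluded. Returns null when
// nothing has been decided yet.
function acceptanceRate(t: Tally): number | null {
  const decided = t.confirm + t.reject;
  return decided === 0 ? null : t.confirm / decided;
}
```

Only these aggregates would need to leave the device for calibration, which fits the goal of not centralizing raw dossiers by default.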

v0 → v1 → v2

v0:

claim cards + batching/pagination + checkpoint
tri-state decisions (neutral → ✅ → ✕ → neutral)
stable UIDs + working-state persistence
exports (excerpt + citation)
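The tri-state cycle and working-state checkpoint in v0 could be as small as the sketch below. The state names, the checkpoint JSON layout, and the helper names are assumptions; where the checkpoint string is stored (on-device storage, vault) is left out.

```typescript
// neutral → ✅ (confirmed) → ✕ (rejected) → neutral
type TriState = "neutral" | "confirmed" | "rejected";

const NEXT: Record<TriState, TriState> = {
  neutral: "confirmed",
  confirmed: "rejected",
  rejected: "neutral",
};

function cycle(state: TriState): TriState {
  return NEXT[state];
}

// Working state keyed by claim UID; cards never touched count as neutral.
type WorkingState = Record<string, TriState>;

// Returns a new state object with one card advanced by one step.
function toggle(state: WorkingState, uid: string): WorkingState {
  const current = state[uid] ?? "neutral";
  return { ...state, [uid]: cycle(current) };
}

// Serializes the working state so a session can resume after a reload.
function saveCheckpoint(state: WorkingState): string {
  return JSON.stringify({ version: 1, state });
}

function loadCheckpoint(raw: string): WorkingState {
  const parsed = JSON.parse(raw);
  if (parsed.version !== 1) throw new Error("unsupported checkpoint version");
  return parsed.state as WorkingState;
}
```

Keying the checkpoint by stable UID is what lets decisions survive re-batching or re-pagination of the same document.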

v1:

opt-in telemetry: suggestion/confidence → verified outcome
macro confusion matrices, tag usefulness, rule lift
no raw text by default
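For the macro confusion matrix, a telemetry event only needs the suggested macro, the confidence, and what the human settled on. The sketch below is one possible layout; the event fields, the `"none"` placeholder for a rejected suggestion with no replacement, and the helper names are hypothetical. Tag usefulness and rule lift are not covered.

```typescript
// Opt-in event: labels and a score only, no excerpt or raw text.
interface TelemetryEvent {
  suggestedMacro: string;
  confidence: number;    // machine confidence, 0..1
  verifiedMacro: string; // macro the human settled on, or "none"
}

// matrix[suggested][verified] = count
type Matrix = Record<string, Record<string, number>>;

function confusionMatrix(events: TelemetryEvent[]): Matrix {
  const matrix: Matrix = {};
  for (const e of events) {
    const row = matrix[e.suggestedMacro] ?? (matrix[e.suggestedMacro] = {});
    row[e.verifiedMacro] = (row[e.verifiedMacro] ?? 0) + 1;
  }
  return matrix;
}

// Share of suggestions for `macro` that were verified as that same macro.
// Returns null if the macro was never suggested.
function precision(matrix: Matrix, macro: string): number | null {
  const row = matrix[macro];
  if (!row) return null;
  let total = 0;
  for (const k of Object.keys(row)) total += row[k];
  return total === 0 ? null : (row[macro] ?? 0) / total;
}
```

Reading a row shows where a suggested macro tends to end up after review, which is the signal needed to tune heuristics without ever seeing the underlying text.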

v2:

heuristics → reranker pipeline
explicit intra-document contradiction checks
cross-document checks only on request
model remains subordinate to verified layer

Future themes (not necessarily in that order)

how anchors/spans are represented so citations stay stable through exports
tag families (~30) and mapping to macros
Azure Vault design (client-side encryption, sync boundaries, threat model)
“human vs machine” loop: what confirmed vs rejected suggestions teach us