OuiDire: an evidence workflow (sources, mechanisms, citations, encrypted vault)
What is OuiDire.app about?
A workflow to turn psychiatric/legal PDFs into auditable claim cards with exportable citations.
Pipeline:
- PDF → OCR/layout extraction → segmentation into claim cards
- Each card: source tag (provenance) + mechanism macro verdict (one of 8) + optional fine-grained tags (~30)
- Output: copyable excerpt + citation anchor (optionally a short rationale)
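A minimal sketch of the segmentation step, assuming OCR has already produced one plain-text string per document and that a claim card is roughly one sentence. The names `CardDraft` and `segmentIntoCards` and the punctuation-based split are hypothetical; real reports (abbreviations like "Dr.", lists, headers) would need a smarter segmenter. The point it shows is that every card keeps offsets pointing back into the source text, so the excerpt is always re-derivable.

```typescript
// Hypothetical shape of a claim card before any verdict is attached.
interface CardDraft {
  docId: string;
  index: number;   // position of the card within the document
  start: number;   // character offset where the excerpt begins (inclusive)
  end: number;     // character offset where the excerpt ends (exclusive)
  excerpt: string; // always equal to text.slice(start, end)
}

// Naive segmentation (hypothetical): a run of non-terminal characters plus its
// sentence-ending punctuation. Whitespace is trimmed, but offsets still index
// into the original, untrimmed text.
function segmentIntoCards(docId: string, text: string): CardDraft[] {
  const cards: CardDraft[] = [];
  const sentence = /[^.!?]+[.!?]*/g; // never matches the empty string, so no infinite loop
  let match: RegExpExecArray | null;
  while ((match = sentence.exec(text)) !== null) {
    const raw = match[0];
    const lead = raw.length - raw.trimStart().length;
    const trimmed = raw.trim();
    if (trimmed.length === 0) continue;
    const start = match.index + lead;
    const end = start + trimmed.length;
    cards.push({ docId, index: cards.length, start, end, excerpt: text.slice(start, end) });
  }
  return cards;
}
```

For example, `segmentIntoCards("doc-001", "Patient reports insomnia. Mother says he was violent in 2019!")` yields two cards, the second starting at offset 26.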
Sources + mechanism macros (8) + tags (~30)
Source layer (provenance):
- Hearsay / Oui-dire
Mechanism macros = verdict layer (export-grade outcomes):
- Narrative deviation (patient’s words rewritten)
- Fabrication / extrapolation
- Biographical rewrite
- Recycled psychiatric antecedents (RAP)
- Internal contradictions
- Critical omissions
- Amplification
- Canonisation (false story hardens through repetition)
Tags = instrumentation layer (mechanism-level cues), used for search/filtering, concise rationales, heuristics, and ML features. Examples of tag families: attribution verbs (“reported by”), time anchoring issues, hedging vs certainty inflation, contradiction markers, recycled-history signals, omission patterns.
Macros = “what kind of failure.” Tags = “how it manifests.”
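One way to encode the layers as types, assuming a TypeScript client: macros form a closed set of exactly eight verdicts, while tags stay open strings namespaced by family. All identifiers below (`MACROS`, `TagFamily`, `Classification`) are illustrative, not the app's actual schema; the family names paraphrase the examples above.

```typescript
// The eight mechanism macros (verdict layer): a closed set, so an export can only carry one of these.
const MACROS = [
  "narrative_deviation",
  "fabrication_extrapolation",
  "biographical_rewrite",
  "recycled_psychiatric_antecedents", // RAP
  "internal_contradictions",
  "critical_omissions",
  "amplification",
  "canonisation",
] as const;
type Macro = (typeof MACROS)[number];

// Tags (instrumentation layer) are open-ended (~30), namespaced by a hypothetical family prefix.
type TagFamily =
  | "attribution"
  | "time_anchoring"
  | "certainty"
  | "contradiction"
  | "recycled_history"
  | "omission";
type Tag = `${TagFamily}:${string}`; // e.g. "attribution:reported_by"

interface Classification {
  source: string; // provenance layer, e.g. "hearsay"
  macro: Macro;   // what kind of failure
  tags: Tag[];    // how it manifests
}

// Runtime guard for values coming from storage or a model.
function isMacro(value: string): value is Macro {
  return (MACROS as readonly string[]).includes(value);
}

function tagFamily(tag: Tag): TagFamily {
  return tag.slice(0, tag.indexOf(":")) as TagFamily;
}
```

Keeping macros closed and tags open matches the split above: exports depend only on the stable eight, while the tag vocabulary can grow for search and features without breaking exports.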
Auditability (the core contract)
Each claim card has:
- stable UID: ${doc_id}:${alle_id}
- citation anchors (spans/offsets; ideally page/layout refs too)
- optional short rationale (terse, not free-form prose)
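A sketch of how the UID and a span anchor might be checked before export, assuming anchors are character offsets into one document's extracted text. `makeUid`, `parseUid`, `Anchor`, and `verifyAnchor` are hypothetical names; `alle_id` is kept as written in the UID template above. Re-slicing the source and comparing against the stored excerpt is one simple way to catch an anchor that drifted after a re-extraction.

```typescript
// Hypothetical anchor: character offsets into the extracted text, plus optional page.
interface Anchor {
  start: number; // inclusive
  end: number;   // exclusive
  page?: number; // 1-based, when layout info is available
}

interface ClaimRef {
  docId: string;
  alleId: string;
  anchor: Anchor;
  excerpt: string;
}

// UID following the ${doc_id}:${alle_id} template. ':' is rejected in doc_id
// so the UID can be split back unambiguously at the first colon.
function makeUid(docId: string, alleId: string): string {
  if (docId.includes(":")) throw new Error("doc_id must not contain ':'");
  return `${docId}:${alleId}`;
}

function parseUid(uid: string): { docId: string; alleId: string } {
  const i = uid.indexOf(":");
  if (i < 0) throw new Error(`malformed UID: ${uid}`);
  return { docId: uid.slice(0, i), alleId: uid.slice(i + 1) };
}

// Valid only if in range and re-slicing the source reproduces the excerpt exactly.
function verifyAnchor(sourceText: string, ref: ClaimRef): boolean {
  const { start, end } = ref.anchor;
  if (!Number.isInteger(start) || !Number.isInteger(end)) return false;
  if (start < 0 || end > sourceText.length || start >= end) return false;
  return sourceText.slice(start, end) === ref.excerpt;
}
```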
Exports include:
- excerpt text
- doc_id (+ page when available)
- claim UID
- macro verdict (+ optional tags)
- optional rationale
Goal: “where did this claim come from?” is answerable immediately.
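A sketch of one export record carrying the fields listed above, rendered as a quotable excerpt plus a bracketed citation line. The field names and the line format are assumptions for illustration; what it keeps from the text is that every export carries doc_id (+ page), claim UID, and macro verdict next to the excerpt.

```typescript
// Hypothetical export record mirroring the listed export fields.
interface ExportRecord {
  excerpt: string;
  docId: string;
  page?: number;
  claimUid: string;
  macro: string;
  tags?: string[];
  rationale?: string;
}

// Render: quoted excerpt, then [doc_id, p. N | claim UID | macro (tags)], then optional rationale.
function formatCitation(r: ExportRecord): string {
  const where = r.page !== undefined ? `${r.docId}, p. ${r.page}` : r.docId;
  const tagPart = r.tags && r.tags.length > 0 ? ` (${r.tags.join(", ")})` : "";
  const lines = [`“${r.excerpt}”`, `[${where} | ${r.claimUid} | ${r.macro}${tagPart}]`];
  if (r.rationale) lines.push(`Rationale: ${r.rationale}`);
  return lines.join("\n");
}
```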
Privacy here is leverage, not virtue
In this context, “privacy-first” isn’t a preference; it’s operational control. When civil rights can be suspended, the practical risks are losing access, losing copies, and losing narrative control. On-device state and an encrypted vault are continuity tools: keep a usable record, keep exports reproducible, and reduce third-party exposure. The point is simple: don’t let your file become (or remain) someone else’s uninspectable story.
Cloud-first, with clear boundaries
We start cloud-first for:
- speed (significantly faster than on-device processing)
- extraction quality (OCR/layout is hard, especially on scans)
- persistence + sync (multi-device reliability; future controlled sharing)
Boundaries:
- explicit user action (no hidden upload)
- minimization (only what’s needed for the chosen function)
- storage is encrypted client-side (the vault is not a plaintext “platform dossier”)
- machine suggests; humans verify (exports follow the verified layer)
OCR / extraction (Azure Document Intelligence)
- invoked per document, on explicit user action
- returns structured OCR/layout used for segmentation and anchors
- outputs can be cached for auditability (on-device and/or in the vault)
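A sketch of turning a cached layout result into page-aware anchors. The interfaces below are a minimal local subset whose field names are modeled on Azure Document Intelligence's layout output (`content`, `pages[].spans`, `paragraphs[].spans` with `offset`/`length`); treat them as an assumption to check against the API version in use. `pageForOffset` and `paragraphAnchors` are hypothetical helpers. Because anchors are recomputed from the cached result rather than from a fresh call, the same input gives the same anchors.

```typescript
// Minimal local subset of a layout analysis result (assumed shape, see lead-in).
interface Span { offset: number; length: number; }
interface LayoutPage { pageNumber: number; spans: Span[]; }
interface LayoutParagraph { content: string; spans: Span[]; }
interface LayoutResult { content: string; pages: LayoutPage[]; paragraphs?: LayoutParagraph[]; }

// Return the 1-based page whose spans contain the given character offset, if any.
function pageForOffset(result: LayoutResult, offset: number): number | undefined {
  for (const page of result.pages) {
    for (const s of page.spans) {
      if (offset >= s.offset && offset < s.offset + s.length) return page.pageNumber;
    }
  }
  return undefined;
}

// One anchor per paragraph span: offsets into result.content, tagged with its page.
function paragraphAnchors(
  result: LayoutResult,
): { start: number; end: number; page?: number; text: string }[] {
  return (result.paragraphs ?? []).flatMap((p) =>
    p.spans.map((s) => ({
      start: s.offset,
      end: s.offset + s.length,
      page: pageForOffset(result, s.offset),
      text: result.content.slice(s.offset, s.offset + s.length),
    })),
  );
}
```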
Storage (Azure Vault)
The target is an optional vault that stores encrypted blobs:
- client-side encryption (WebCrypto) before upload
- vault stores: ciphertext + iv + salt + version
- server handles auth + scoped access (e.g., SAS); it cannot read contents
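A minimal WebCrypto sketch of that envelope, assuming a passphrase-derived key (PBKDF2 with SHA-256) and AES-GCM. The iteration count, the base64 JSON layout, and the names `sealBlob`/`openBlob` are assumptions, not the app's actual parameters. Only the returned object would be uploaded; the passphrase and key never leave the client, and a wrong passphrase or tampered ciphertext makes decryption fail because GCM authenticates the data.

```typescript
// Envelope matching the stored fields: ciphertext + iv + salt + version (base64 strings).
interface VaultBlob { version: number; salt: string; iv: string; ciphertext: string; }

const VAULT_VERSION = 1;
const PBKDF2_ITERATIONS = 600_000; // assumption: tune to the slowest supported device

function toB64(bytes: Uint8Array): string {
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}
const fromB64 = (s: string) => Uint8Array.from(atob(s), (c) => c.charCodeAt(0));

async function deriveKey(passphrase: string, salt: BufferSource): Promise<CryptoKey> {
  const base = await crypto.subtle.importKey(
    "raw", new TextEncoder().encode(passphrase), "PBKDF2", false, ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    base,
    { name: "AES-GCM", length: 256 },
    false, // non-extractable key
    ["encrypt", "decrypt"],
  );
}

// Encrypt on the client before upload; fresh random salt and 96-bit IV per blob.
async function sealBlob(plaintext: BufferSource, passphrase: string): Promise<VaultBlob> {
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const key = await deriveKey(passphrase, salt);
  const ct = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  return { version: VAULT_VERSION, salt: toB64(salt), iv: toB64(iv), ciphertext: toB64(new Uint8Array(ct)) };
}

// Decrypt after download; throws on wrong passphrase or tampered data.
async function openBlob(blob: VaultBlob, passphrase: string): Promise<Uint8Array> {
  if (blob.version !== VAULT_VERSION) throw new Error(`unsupported vault version ${blob.version}`);
  const key = await deriveKey(passphrase, fromB64(blob.salt));
  const pt = await crypto.subtle.decrypt({ name: "AES-GCM", iv: fromB64(blob.iv) }, key, fromB64(blob.ciphertext));
  return new Uint8Array(pt);
}
```

The `version` field leaves room to change KDF or cipher parameters later while still opening older blobs.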
Local-only fallback (later, degraded)
A strict local-only path can exist later as a fallback:
- no cloud OCR
- lower extraction quality on scans
- maximum offline/privacy constraints
It’s a trade-off, not the primary path.
Human vs machine (clean learning signal)
Two parallel layers:
- Machine: suggestions + confidence (fast utility)
- Verified: human decisions (export-grade truth)
Track per claim:
- confirm / reject / skip
- per macro and per tag family
- per document type
This yields calibration/training signal without centralizing raw dossiers by default.
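A sketch of that per-claim tracking as plain counters, assuming each decision record carries the suggested macro, the model confidence, the document type, and one of confirm/reject/skip. The record shape and the helpers `tally`/`acceptanceRate` are hypothetical. Only labels and counts are aggregated, which is what lets the signal exist without moving raw excerpts.

```typescript
type Decision = "confirm" | "reject" | "skip";

// Hypothetical decision record: machine layer (suggestion) + verified layer (decision).
interface DecisionEvent {
  suggestedMacro: string;
  confidence: number; // model confidence in [0, 1]
  docType: string;    // hypothetical label, e.g. "report"
  decision: Decision;
}

interface Tally { confirm: number; reject: number; skip: number; }

// Group decisions by any key (macro, tag family, doc type, ...) and count outcomes.
function tally(events: DecisionEvent[], key: (e: DecisionEvent) => string): Map<string, Tally> {
  const out = new Map<string, Tally>();
  for (const e of events) {
    const k = key(e);
    const t = out.get(k) ?? { confirm: 0, reject: 0, skip: 0 };
    t[e.decision] += 1;
    out.set(k, t);
  }
  return out;
}

// Precision proxy: confirmed / (confirmed + rejected). Skips carry no signal and are excluded.
function acceptanceRate(t: Tally): number | undefined {
  const decided = t.confirm + t.reject;
  return decided === 0 ? undefined : t.confirm / decided;
}
```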
v0 → v1 → v2
v0:
- claim cards + batching/pagination + checkpoint
- tri-state decisions (neutral → ✅ → ✕ → neutral)
- stable UIDs + working-state persistence
- exports (excerpt + citation)
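A sketch of the tri-state toggle and its working-state checkpoint, keyed by stable claim UID so decisions survive re-pagination and reloads. Mapping neutral to "skip", ✅ to "confirm" and ✕ to "reject" is an assumption, as are the names `cycle`, `toggle`, and `checkpoint`; where the checkpoint is written (local storage, the vault) is left open.

```typescript
// Tri-state decision; cycle order follows neutral → ✅ → ✕ → neutral.
type TriState = "neutral" | "confirmed" | "rejected";

const NEXT: Record<TriState, TriState> = {
  neutral: "confirmed",
  confirmed: "rejected",
  rejected: "neutral",
};

function cycle(state: TriState): TriState {
  return NEXT[state];
}

// Working state keyed by claim UID. Neutral is the default, so it is not stored (sparse map).
function toggle(states: Map<string, TriState>, uid: string): TriState {
  const next = cycle(states.get(uid) ?? "neutral");
  if (next === "neutral") states.delete(uid);
  else states.set(uid, next);
  return next;
}

// Serialize for a checkpoint; sorted by UID (code-unit order) for reproducible output.
function checkpoint(states: Map<string, TriState>): string {
  const entries = [...states.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(entries);
}
```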
v1:
- opt-in telemetry: suggestion/confidence → verified outcome
- macro confusion matrices, tag usefulness, rule lift
- no raw text by default
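A sketch of a macro confusion matrix built from opt-in telemetry, assuming each event pairs the macro the machine suggested with the macro a human finally verified (`null` when the suggestion was rejected with no replacement). The event shape and helper names are hypothetical. Only labels are counted, in line with "no raw text by default".

```typescript
// Hypothetical telemetry event: labels only, no excerpt text.
interface OutcomeEvent {
  suggested: string;       // macro proposed by the machine layer
  verified: string | null; // macro confirmed by a human; null = rejected outright
}

const NONE = "(none)";

// matrix[suggested][verified] = count. Rows are suggestions, columns are verified outcomes.
function confusionMatrix(events: OutcomeEvent[]): Record<string, Record<string, number>> {
  const m: Record<string, Record<string, number>> = {};
  for (const e of events) {
    const col = e.verified ?? NONE;
    const row = (m[e.suggested] ??= {});
    row[col] = (row[col] ?? 0) + 1;
  }
  return m;
}

// Per-macro precision: share of suggestions of this macro verified as the same macro.
function precision(m: Record<string, Record<string, number>>, macro: string): number | undefined {
  const row = m[macro];
  if (!row) return undefined;
  const total = Object.values(row).reduce((a, b) => a + b, 0);
  return (row[macro] ?? 0) / total;
}
```

Off-diagonal cells show which macros get confused with each other (for example amplification vs canonisation), which is more actionable than a single accuracy number.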
v2:
- heuristics → reranker pipeline
- explicit intra-document contradiction checks
- cross-document checks only on request
- model remains subordinate to the verified layer
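A sketch of what "subordinate to the verified layer" could mean in code, assuming heuristics emit candidate (claim, macro) suggestions and a reranker only reorders them. A human "yes" is never demoted, a human "no" never resurfaces, and exports read only confirmed pairs. The 50/50 score blend, the key format, and all names are hypothetical; the reranker itself is stubbed as a score function.

```typescript
interface Suggestion {
  claimUid: string;
  macro: string;
  heuristicScore: number; // from rule-based cues (tags), in [0, 1]
}

type Verdict = "confirmed" | "rejected";
type VerifiedLayer = Map<string, Verdict>; // keyed by `${claimUid}|${macro}` (hypothetical)

const keyOf = (s: { claimUid: string; macro: string }) => `${s.claimUid}|${s.macro}`;

// Review order: human-confirmed first, then undecided ranked by blended score; rejected dropped.
function rankForReview(
  suggestions: Suggestion[],
  verified: VerifiedLayer,
  modelScore: (s: Suggestion) => number, // stand-in for a learned reranker
): Suggestion[] {
  const confirmed: Suggestion[] = [];
  const pending: { s: Suggestion; score: number }[] = [];
  for (const s of suggestions) {
    const v = verified.get(keyOf(s));
    if (v === "rejected") continue;
    if (v === "confirmed") confirmed.push(s);
    else pending.push({ s, score: 0.5 * s.heuristicScore + 0.5 * modelScore(s) });
  }
  pending.sort((a, b) => b.score - a.score);
  return [...confirmed, ...pending.map((p) => p.s)];
}

// Exports never read raw model output, only human-confirmed pairs.
function exportable(suggestions: Suggestion[], verified: VerifiedLayer): Suggestion[] {
  return suggestions.filter((s) => verified.get(keyOf(s)) === "confirmed");
}
```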
Future themes (not necessarily in that order)
- how anchors/spans are represented so citations stay stable through exports
- tag families (~30) and mapping to macros
- Azure Vault design (client-side encryption, sync boundaries, threat model)
- “human vs machine” loop: what confirmed vs rejected suggestions teach us