Architecture

How RecoverLand is built

A full A–Z map of every script, what it owns, how it talks to its neighbours, and which thread runs it. One page to understand the plugin without reading a single Python file.

Vision

RecoverLand replaces a server-side audit trigger with a client-side capture pipeline driven by QGIS events. Because the plugin must work where no DB server runs (Shapefile, GeoPackage, SpatiaLite, Memory) and where the user has no DBA rights (most PostgreSQL setups), the trigger’s guarantees must be rebuilt in Python: atomicity, identity, durability, schema typing, concurrent access.

The result is six functional layers, each with a single responsibility. Domain logic stays pure Python (testable in isolation). Qt and QGIS APIs are confined to the outer rings. SQLite holds the journal. A handful of background threads keep the UI fluid.

Capture
Forward-only, observation-based

Listens to QGIS commit signals. Snapshots feature state before commit, computes the diff after commit, emits one AuditEvent per change.

Storage
Local SQLite, WAL, schema v5

Single file per project. Append-only event stream with five companion tables (sessions, datasources, aliases, settings, schema version).

Restore
Plan, preflight, apply, trace

Two modes (event-based and temporal). Plan-then-execute. STRICT applies via the QGIS editing buffer with rollback. BEST_EFFORT applies directly per entity.

Health
Self-monitoring

Integrity check at startup, WAL checkpoint, pending events recovered from disk, disk-space watchdog, retention purge with VACUUM.

The six layers

Each box is a layer, each layer has one role. Dependencies flow downward only: the UI knows about workflows, workflows know about core domain, core domain knows about infrastructure. Nothing flows back up.

Layer stack, top to bottom:
L1 Entry & lifecycle: __init__.py · recover.py (plugin factory, signal wiring, backend bootstrap)
L2 UI surface (Qt): recover_dialog · journal_info_bar · journal_maintenance · status_bar_widget · widgets/ (time_slider, restore_mode_selector, restore_preflight_dialog, toggle_switch, themed_logo)
L3 Background threads: local_search_thread · journal_stats_thread · version_fetch_thread · qgs_task_support
L4 Restore orchestration (main thread, chunked): restore_runner (QTimer-driven, three runners: event, strict rewind, undo)
L5 Core domain (39 modules):
  Contracts: audit_backend · restore_contracts
  Identity & data: identity · serialization · geometry_utils · schema_drift · audit_field_policy · support_policy
  Capture: edit_tracker · edit_buffer · write_queue
  Storage: sqlite_schema · sqlite_backend · journal_manager · datasource_registry · datasource_alias · local_settings
  Read: search_service · event_stream_repository · journal_audit · layer_stats_cache
L6 Infrastructure (transverse): compat · logger · sql_safety · observability · constants · time_format · user_identity

Layer | Owns | Forbidden
L1 Entry | Plugin lifecycle, signal wiring, backend bootstrap | UI rendering, SQL writes
L2 UI | Widgets, dialogs, user interaction | Direct DB I/O, blocking work > 50 ms
L3 Threads | Async reads, stats, fetch | QGIS layer mutation (main thread only)
L4 Restore orchestration | Chunking, progress, cancellation | Feature-level matching logic
L5 Core domain | All business logic, all SQL, all snapshot logic | Qt widgets, blocking sleeps
L6 Infrastructure | Compat shims, logging, safety asserts | Domain knowledge
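The downward-only rule can be checked mechanically. The sketch below is hypothetical; nothing here says the plugin ships such a check. It gives each module a layer number from 1 (entry) to 6 (infrastructure) and reports any import that targets a shallower layer. The module-to-layer table is a small illustrative excerpt, not the full mapping.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical excerpt of a module -> layer table (1 = entry ... 6 = infrastructure).
LAYER_OF: Dict[str, int] = {
    "recover": 1,
    "recover_dialog": 2,
    "local_search_thread": 3,
    "restore_runner": 4,
    "search_service": 5,
    "edit_tracker": 5,
    "compat": 6,
    "logger": 6,
}


def upward_imports(
    imports: Iterable[Tuple[str, str]], layer_of: Dict[str, int]
) -> List[Tuple[str, str]]:
    """Return the (importer, imported) pairs that break the downward-only rule.

    Importing from the same layer or a deeper one (higher number) is allowed.
    Importing from a shallower layer (lower number) is an upward dependency.
    """
    return [
        (importer, imported)
        for importer, imported in imports
        if layer_of[imported] < layer_of[importer]
    ]


edges = [
    ("recover_dialog", "restore_runner"),  # L2 -> L4: allowed
    ("restore_runner", "search_service"),  # L4 -> L5: allowed
    ("search_service", "logger"),          # L5 -> L6: allowed
    ("edit_tracker", "recover_dialog"),    # L5 -> L2: upward, reported
]
print(upward_imports(edges, LAYER_OF))  # [('edit_tracker', 'recover_dialog')]
```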

Module dependency graph

The full plugin in one frame. Each node is a Python file, and each edge is a real import.

Figure: interactive module dependency graph (50 modules), clustered as Entry · Contracts, UI, Threads, Capture, Storage, Read, Restore, Identity, Health, Infra.

How to read this graph: arrows go from consumer to producer (caller to callee). Each cluster is one functional area. The brightness rule is simple — gold filled means "focused", gold-outlined means "directly connected", faded grey means "in the graph but not connected to the focus". Cycles are forbidden by construction.
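Because each edge is a real import, the graph can be rebuilt from source and checked for cycles. Below is a minimal sketch using the standard-library ast module. It assumes modules are passed as a name → source-text mapping and are identified by the last component of their dotted name. It is illustrative only and is not the tool that draws the figure.

```python
import ast
from typing import Dict, List, Optional, Set


def imported_names(source: str) -> Set[str]:
    """Short names (last dotted component) of every module imported by `source`."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.rsplit(".", 1)[-1] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names.add(node.module.rsplit(".", 1)[-1])
            else:  # "from . import x": the imported names are themselves modules
                names.update(alias.name for alias in node.names)
    return names


def build_graph(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """Edges go from consumer to producer; imports outside the project are ignored."""
    return {
        name: {dep for dep in imported_names(src) if dep in sources and dep != name}
        for name, src in sources.items()
    }


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Depth-first search with three colours; returns one cycle or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in sorted(graph[node]):
            if colour[dep] == GREY:  # back edge: dep is on the current path
                return path[path.index(dep):] + [dep]
            if colour[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


sources = {
    "restore_service": "from .identity import compute_entity_fingerprint\nimport logger",
    "identity": "import hashlib\nfrom . import logger",
    "logger": "import logging",
}
print(find_cycle(build_graph(sources)))  # None

sources["logger"] = "from .restore_service import build_trace"  # introduce a back edge
print(find_cycle(build_graph(sources)))  # ['identity', 'logger', 'restore_service', 'identity']
```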

Threading model

Four threads run side by side. The UI thread owns all QGIS object mutations. The writer thread owns all SQL writes. Reader threads only do SELECT. No thread is shared across responsibilities.

UI thread (Qt main loop): edit_tracker (QGIS signals) · recover_dialog (user input + state) · restore_runner (QTimer chunks) · restore_executor (apply on QGIS layer)
Worker threads (QThread / QgsTask): write_queue (writer: batch 500, retry, WAL checkpoint) · stats thread (aggregate counts) · search thread (paginated SELECT)
SQLite journal (WAL mode): N readers + 1 writer, safe concurrently; PID-based file lock against duplicate QGIS instances

Thread | Owns | Started by
UI (main) | Widgets, QGIS layer reads/writes, signal handlers, QTimer-driven restore chunks | QGIS itself
Writer | Drains the event queue, batches INSERTs, runs WAL checkpoints, recovers pending events | WriteQueue.start() from recover.py
Stats | One-shot aggregate queries for the smart bar (debounced 300 ms) | journal_stats_thread
Search | One-shot paginated SELECT with light projection (no BLOBs) | local_search_thread
Why this matters: every QGIS layer mutation (commit, restore apply, addMapLayer) must run on the UI thread. Every SQL write must run on the writer thread (single writer keeps WAL coherent). Crossing these boundaries causes the kind of race conditions the plugin spent two years debugging.
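In SQLite terms, the single-writer rule means one read-write connection owned by the writer thread and separate read-only connections for readers. WAL mode lets them run concurrently. Below is a minimal standard-library sketch. The throwaway file path and the two-column table are assumptions that stand in for the real schema, which sqlite_schema and journal_manager own.

```python
import sqlite3
import tempfile
import threading
from pathlib import Path

JOURNAL = Path(tempfile.mkdtemp()) / "journal.sqlite"


def writer_job(rows: list) -> None:
    """Runs on the single writer thread, which owns the only read-write connection."""
    conn = sqlite3.connect(JOURNAL)
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer, and vice versa
        conn.execute("CREATE TABLE IF NOT EXISTS audit_event (id INTEGER PRIMARY KEY, op TEXT)")
        with conn:  # one transaction per batch: committed on success, rolled back on error
            conn.executemany("INSERT INTO audit_event (op) VALUES (?)", rows)
    finally:
        conn.close()


def reader_count() -> int:
    """Runs on any reader thread: its own read-only connection, SELECT only."""
    conn = sqlite3.connect(JOURNAL.as_uri() + "?mode=ro", uri=True)
    try:
        return conn.execute("SELECT COUNT(*) FROM audit_event").fetchone()[0]
    finally:
        conn.close()


writer = threading.Thread(target=writer_job, args=([("INSERT",), ("UPDATE",)],), name="writer")
writer.start()
writer.join()
print(reader_count())  # 2
```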

Capture pipeline

From a user click in QGIS to a persisted row in SQLite. The path crosses three threads and seven modules. Every step has a clear contract.

1. User edits a feature in QGIS; the commit triggers Qt signals: beforeCommitChanges · editingStopped · featureAdded · geometryChanged · featureDeleted
2. edit_tracker.py (UI thread)
   Pre-commit: read the OLD state of touched features → snapshot in edit_buffer
   Post-commit: read the NEW state, compute the delta, build an AuditEvent
3a. serialization: QVariant → JSON-safe types; compute_update_delta(old, new)
3b. geometry_utils: QgsGeometry → WKB (BLOB); CRS + geometry type captured
4. identity: compute_feature_identity (PK or FID); compute_entity_fingerprint (SHA-256)
5. write_queue.enqueue(event): queue bounded at 50k; if full, pending JSON on disk via the integrity module; hands over to the writer thread
6. SQLite audit_event (batch 500, WAL): 3 retries on transient errors · checkpoint every 60 s

Step | Module | Cost | Failure mode
1 Signal | QGIS itself | O(1) | Some providers do not fire all signals (WFS-T, memory) → support_policy classifies them
2 Snapshot | edit_tracker + edit_buffer | O(N edited features) | Buffer cap 10 000 features / 200 MB → flush + WARNING log
3 Serialize | serialization + geometry_utils | O(attributes + geom size) | Unknown QVariant types log a warning, fall back to string
4 Fingerprint | identity | O(1) per feature | Weak identity on shapefile FID → classified MEDIUM, restore is best-effort
5 Enqueue | write_queue | O(1) non-blocking | Queue full → pending JSON sidecar, replayed at next startup
6 Persist | sqlite_schema + SQLite | O(batch size) | SQLite locked → 3 retries → pending JSON sidecar
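Steps 2 and 3 reduce to pure functions over plain data. Keep the first pre-commit snapshot of each feature, then compare it with the post-commit state. In the sketch below, features are dicts keyed by FID and events are small dicts. Both are simplifying assumptions. The sketch does not reproduce AuditEvent (21 fields) or the real compute_update_delta().

```python
from typing import Any, Dict, List, Optional

Attrs = Dict[str, Any]


def snapshot_once(buffer: Dict[int, Attrs], fid: int, attrs: Attrs) -> None:
    """Keep only the first snapshot of a feature: its pre-edit state."""
    buffer.setdefault(fid, dict(attrs))


def update_delta(old: Attrs, new: Attrs) -> Dict[str, tuple]:
    """{field: (old_value, new_value)} for every field whose value changed."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


def build_events(before: Dict[int, Attrs], after: Dict[int, Optional[Attrs]]) -> List[dict]:
    """Compare pre-commit snapshots with the post-commit state (None = deleted)."""
    events = []
    for fid in sorted(before.keys() | after.keys()):
        old, new = before.get(fid), after.get(fid)
        if old is None and new is not None:
            events.append({"op": "INSERT", "fid": fid, "new": new})
        elif old is not None and new is None:
            events.append({"op": "DELETE", "fid": fid, "old": old})
        elif old is not None and new is not None:
            delta = update_delta(old, new)
            if delta:  # edits that end on the original values produce no event
                events.append({"op": "UPDATE", "fid": fid, "delta": delta})
    return events


buffer: Dict[int, Attrs] = {}
snapshot_once(buffer, 1, {"name": "A", "pop": 10})
snapshot_once(buffer, 1, {"name": "A", "pop": 11})  # later edit: ignored, pre-edit state kept
snapshot_once(buffer, 2, {"name": "B", "pop": 20})
after = {1: {"name": "A", "pop": 12}, 2: None, 3: {"name": "C", "pop": 5}}
print([(e["op"], e["fid"]) for e in build_events(buffer, after)])
# [('UPDATE', 1), ('DELETE', 2), ('INSERT', 3)]
```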

Restore pipeline

Restore is the hardest part of the plugin. It must reconstruct a past state in a present world that may have drifted. Two entry modes converge on one common applier.

Mode A (event-based): the user selects rows in the table → restore_planner.plan_event_restore()
Mode B (temporal Rewind): the user picks a cutoff date → event_stream_repository → rewind_dedup
Both produce a RestorePlan (pure data, zero QGIS): List<PlannedAction> · List<Conflict> · AtomicityPolicy
Preflight: volume + schema_drift + provider caps → GO / WARN / BLOCKED
restore_preview: human-readable plan; the user confirms in the preflight dialog
restore_runner (UI thread, chunked): drives execution via QTimer, emits a progress signal
  STRICT (via editing buffer): layer.startEditing() → apply chunk → commit on success / rollback on error
  BEST_EFFORT (direct provider): per-entity apply via the provider, partial success allowed
restore_service (feature-by-feature): _find_by_snapshot → match → apply → emit trace
audit_event: restored_from_event_id (trace)
Why restore is hard: the matching function _find_by_snapshot has six fallback levels (FID, PK, attrs full, attrs+geom, lenient match, max-FID heuristic). Each level was added because one provider broke the previous one. This is where 90% of the historical bugs live (see the tech debt section).
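The fallback cascade can be written as an ordered list of matchers, where the first level that returns exactly one candidate wins. The hypothetical sketch below keeps only three of the six levels (FID, primary key, full attribute equality) and works on plain dicts. It does not reproduce the geometry comparison, the lenient match or the max-FID heuristic of _find_by_snapshot.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Feature = Dict[str, Any]  # {"fid": int, "attrs": {...}}
Matcher = Callable[[Feature, Feature], bool]


def by_fid(snap: Feature, live: Feature) -> bool:
    return snap["fid"] == live["fid"]


def by_pk(pk_field: str) -> Matcher:
    def match(snap: Feature, live: Feature) -> bool:
        value = snap["attrs"].get(pk_field)
        return value is not None and live["attrs"].get(pk_field) == value
    return match


def by_all_attrs(snap: Feature, live: Feature) -> bool:
    return snap["attrs"] == live["attrs"]


def find_by_snapshot(
    snap: Feature, live_features: List[Feature], matchers: List[Matcher]
) -> Optional[Tuple[int, Feature]]:
    """Try each level in order; the first level with exactly one hit wins.

    Zero hits and several (ambiguous) hits both fall through to the next level.
    Returns (level, feature) so the caller can log how the match was made.
    """
    for level, matcher in enumerate(matchers, start=1):
        hits = [feature for feature in live_features if matcher(snap, feature)]
        if len(hits) == 1:
            return level, hits[0]
    return None


snapshot = {"fid": 7, "attrs": {"id": 42, "name": "Quay"}}
live = [
    {"fid": 3, "attrs": {"id": 42, "name": "Quay (renamed)"}},  # hypothetical: FID changed
    {"fid": 4, "attrs": {"id": 43, "name": "Pier"}},
]
level, feature = find_by_snapshot(snapshot, live, [by_fid, by_pk("id"), by_all_attrs])
print(level, feature["fid"])  # 2 3
```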

Module catalog — Entry & lifecycle

Two files. Together they own everything that happens between "QGIS starts the plugin" and "QGIS unloads the plugin".

File | Lines | Owns
__init__.py | 30 | QGIS plugin factory. Compiles translations if needed. Returns RecoverPlugin.
recover.py | 538 | Detects duplicate installs, opens the journal, starts the writer queue, instantiates the edit tracker, wires QGIS project signals (layersAdded, cleared, readProject), schedules orphan cleanup and a periodic disk-space check, and sets up the status bar widget.

Module catalog — UI surface

Everything Qt. The dialog is the single point of entry to the plugin from a user perspective. The status bar widget is the always-visible indicator.

File | Lines | Role
recover_dialog.py | 2 797 | Monolith. Mixes widget construction, restore orchestration, geometry preview lifecycle, smart bar wiring, state machine. Largest tech debt.
journal_info_bar.py | 236 | Smart bar at the top of the dialog: per-operation tile counters, color-coded health pill.
journal_maintenance.py | 309 | Maintenance dialog: retention config, manual purge, async VACUUM, integrity check, export.
status_bar_widget.py | 93 | Persistent indicator in the QGIS status bar (left-click toggles tracking, right-click opens dialog).
themed_action_icon.py | 123 | SVG toolbar icon recoloured to match QGIS light/dark theme at runtime.
widgets/themed_logo.py | 186 | Animated themed logo for the dialog header.
widgets/time_slider.py | 161 | Cutoff date slider for the temporal Rewind mode.
widgets/restore_mode_selector.py | 80 | Switch between Mode A (event) and Mode B (temporal).
widgets/restore_preflight_dialog.py | 94 | Confirmation dialog showing the plan summary before apply.
widgets/toggle_switch.py | 64 | iOS-style toggle switch used in several panels.

Module catalog — Background threads

Each thread has a single job and lives for one operation. None of them mutate QGIS objects.

File | Lines | Role
local_search_thread.py | 74 | Runs search_events() in a worker thread. Emits result via Qt signal.
journal_stats_thread.py | 118 | Debounced (300 ms) aggregate query for the smart bar.
version_fetch_thread.py | 104 | Fetches post-cutoff events for the Rewind preview.
qgs_task_support.py | 64 | Abstraction over QThread / QgsTask. Same API on QGIS 3.40 (Qt5) and 4.x (Qt6).
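With the 300 ms debounce in journal_stats_thread, a burst of UI changes triggers a single aggregate query. In Qt this is typically a single-shot QTimer restarted on every change. The pure-Python sketch below shows the same logic with an explicit millisecond clock, so it can be tested without an event loop. The Debouncer class is hypothetical.

```python
from typing import Optional


class Debouncer:
    """Run once after `delay_ms` of silence; every new request restarts the countdown."""

    def __init__(self, delay_ms: int = 300) -> None:
        self.delay_ms = delay_ms
        self._due_ms: Optional[int] = None  # when the pending query should run

    def request(self, now_ms: int) -> None:
        # Equivalent to restarting a single-shot QTimer on every UI change.
        self._due_ms = now_ms + self.delay_ms

    def poll(self, now_ms: int) -> bool:
        """True exactly once, when the quiet period has elapsed."""
        if self._due_ms is not None and now_ms >= self._due_ms:
            self._due_ms = None
            return True
        return False


d = Debouncer()
for t in (0, 100, 200):  # a burst of three filter changes
    d.request(t)
print(d.poll(450), d.poll(500), d.poll(600))  # False True False
```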

Module catalog — Contracts & types

The vocabulary of the plugin. Pure data shapes and enums. Zero QGIS, zero Qt. Anything that crosses a layer boundary is one of these types.

File | Lines | Owns
core/constants.py | 2 | Single constant PLUGIN_NAME.
core/audit_backend.py | 76 | Defines AuditEvent (21 fields), SearchCriteria, SearchResult, RestoreReport, and the abstract AuditBackend interface.
core/restore_contracts.py | 164 | Enums (RestoreMode, ConflictPolicy, AtomicityPolicy, PreflightVerdict), PlannedAction, RestorePlan, PreflightReport, volume limits, COMPENSATORY_OPS matrix.
core/support_policy.py | 136 | Per-provider capture/restore matrix. Decides if a layer is FULL/PARTIAL/INFO support and STRONG/MEDIUM/WEAK identity.
core/audit_field_policy.py | 51 | Single source of truth for "audit metadata" field names (date_modif, updated_at, gid, etc.). Used by capture, delta, and restore so they agree on what to ignore.
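The contracts are pure data, so standard enum and dataclasses types fit them naturally. The sketch assumes that the three preflight verdicts named in the restore pipeline (GO / WARN / BLOCKED) are ordered by severity. It also assumes the overall verdict is the worst individual check. PreflightSummary and its add() method are hypothetical and do not describe the real PreflightReport.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class PreflightVerdict(IntEnum):
    # Assumed severity order, so that max() picks the worst verdict.
    GO = 0
    WARN = 1
    BLOCKED = 2


@dataclass
class PreflightSummary:  # hypothetical; not the real PreflightReport fields
    verdict: PreflightVerdict = PreflightVerdict.GO
    reasons: List[str] = field(default_factory=list)

    def add(self, check: str, verdict: PreflightVerdict) -> None:
        """Record one check result; the overall verdict is the worst one seen."""
        if verdict is not PreflightVerdict.GO:
            self.reasons.append(f"{check}: {verdict.name}")
        self.verdict = max(self.verdict, verdict)


summary = PreflightSummary()
summary.add("volume", PreflightVerdict.GO)
summary.add("schema_drift", PreflightVerdict.WARN)
summary.add("provider_caps", PreflightVerdict.GO)
print(summary.verdict.name, summary.reasons)  # WARN ['schema_drift: WARN']
```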

Module catalog — Identity & data

Everything that turns a live QGIS feature into stable bytes (and back). The reliability of every restore depends on these five modules agreeing on field and geometry comparisons.

File | Lines | Owns
core/identity.py | 165 | Datasource and feature fingerprints. Normalizes PostgreSQL / MSSQL / Oracle / OGR URIs into canonical strings. Hashes identity to a stable SHA.
core/user_identity.py | 68 | Resolves the current user name: plugin config → RECOVERLAND_USER env → OS login → QGIS profile → unknown.
core/serialization.py | 189 | QVariant ⇄ JSON-safe values. compute_update_delta() for the old/new attribute diff. iter_mapped_attributes() applies the field mapping at restore time.
core/geometry_utils.py | 188 | WKB ⇄ QgsGeometry. Comparison, feature matching, CRS extraction, provider geometry probing.
core/schema_drift.py | 142 | Compares the field schema captured at audit time with the current layer schema. Produces matched / missing / added / type-changed report.
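Fingerprints are only stable if the same datasource always produces the same canonical string. The simplified sketch below handles a PostgreSQL-style key='value' URI. It drops keys assumed not to affect identity (credentials, SSL options), sorts the rest, and hashes the result together with the primary key, falling back to the FID. The ignored-key list and the payload format are assumptions. The real identity module also covers MSSQL, Oracle and OGR URIs.

```python
import hashlib
import shlex
from typing import Optional

# Assumption: keys that describe who connects or how, not what is read.
IGNORED_KEYS = {"user", "password", "authcfg", "sslmode"}


def canonical_datasource(uri: str) -> str:
    """Normalise a key='value' style URI: lower-case keys, drop ignored keys, sort the rest."""
    pairs = {}
    for token in shlex.split(uri):  # also strips the quotes around values
        if "=" in token:
            key, value = token.split("=", 1)
            if key.lower() not in IGNORED_KEYS:
                pairs[key.lower()] = value
    return " ".join(f"{key}={pairs[key]}" for key in sorted(pairs))


def feature_fingerprint(uri: str, pk: Optional[object], fid: int) -> str:
    """SHA-256 over datasource + primary key, falling back to the (weaker) FID."""
    identity = f"pk:{pk}" if pk is not None else f"fid:{fid}"
    payload = f"{canonical_datasource(uri)}|{identity}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = "dbname='gis' host=db port=5432 user='alice' password='s3cret' table=\"public\".\"parcels\""
b = "host=db table=\"public\".\"parcels\" dbname='gis' port=5432 user='bob'"
print(canonical_datasource(a))  # dbname=gis host=db port=5432 table=public.parcels
print(feature_fingerprint(a, 42, fid=7) == feature_fingerprint(b, 42, fid=9))  # True
```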

Module catalog — Capture path

The path that turns QGIS edits into rows in SQLite. Spans two threads.

File | Lines | Role
core/edit_buffer.py | 213 | In-memory feature snapshots per session per layer. Bounded at 10 000 features / 200 MB. Only the first snapshot per feature is kept (the pre-edit state).
core/edit_tracker.py | 802 | Core of capture. Connects to six QGIS signals per layer, snapshots before commit, builds AuditEvents after commit, hands them to the write queue.
core/write_queue.py | 244 | Dedicated writer thread (RecoverLand-Writer). Bounded queue (50k events). Batch executemany of 500 rows. 3-retry policy on transient errors. Passive WAL checkpoint every 60 s. If queue overflows: pending JSON sidecar on disk.
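The writer-thread contract described for write_queue can be sketched with the standard library: a bounded queue, batched executemany, retries on transient errors and overflow to a sidecar file. Everything below is an illustrative assumption, including the class name, the single-column table, the JSON-lines sidecar format and the short back-off. Passive WAL checkpoints and startup replay are omitted.

```python
import json
import queue
import sqlite3
import tempfile
import threading
import time
from pathlib import Path
from typing import List

BATCH_SIZE = 500
MAX_RETRIES = 3


class WriteQueueSketch:
    """Hypothetical single-writer queue: the UI thread enqueues, one thread persists."""

    def __init__(self, db_path: Path, sidecar: Path, maxsize: int = 50_000) -> None:
        self.db_path, self.sidecar = db_path, sidecar
        self.q: "queue.Queue[dict]" = queue.Queue(maxsize=maxsize)
        self._stop = threading.Event()
        self._spill_lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, name="RecoverLand-Writer", daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Ask the writer to drain the queue, then wait for it."""
        self._stop.set()
        self._thread.join()

    def enqueue(self, event: dict) -> None:
        """Never blocks the caller: on overflow the event goes to the sidecar file."""
        try:
            self.q.put_nowait(event)
        except queue.Full:
            self._spill([event])

    def _spill(self, events: List[dict]) -> None:
        with self._spill_lock, self.sidecar.open("a", encoding="utf-8") as fh:
            for event in events:
                fh.write(json.dumps(event) + "\n")

    def _next_batch(self) -> List[dict]:
        batch: List[dict] = []
        try:
            batch.append(self.q.get(timeout=0.1))  # wait briefly for the first event
            while len(batch) < BATCH_SIZE:
                batch.append(self.q.get_nowait())  # then take whatever is already queued
        except queue.Empty:
            pass
        return batch

    def _persist(self, conn: sqlite3.Connection, batch: List[dict]) -> None:
        rows = [(json.dumps(event),) for event in batch]
        for attempt in range(MAX_RETRIES):
            try:
                with conn:  # one transaction per batch
                    conn.executemany("INSERT INTO audit_event (payload) VALUES (?)", rows)
                return
            except sqlite3.OperationalError:  # e.g. "database is locked"
                time.sleep(0.05 * (attempt + 1))
        self._spill(batch)  # still failing: keep the events on disk for a later replay

    def _run(self) -> None:
        conn = sqlite3.connect(self.db_path)  # created and used on the writer thread only
        conn.execute("CREATE TABLE IF NOT EXISTS audit_event (payload TEXT)")
        while not (self._stop.is_set() and self.q.empty()):
            batch = self._next_batch()
            if batch:
                self._persist(conn, batch)
        conn.close()


tmp = Path(tempfile.mkdtemp())
wq = WriteQueueSketch(tmp / "journal.sqlite", tmp / "pending.jsonl")
wq.start()
for fid in range(1_200):
    wq.enqueue({"op": "UPDATE", "fid": fid})
wq.stop()
with sqlite3.connect(tmp / "journal.sqlite") as check:
    print(check.execute("SELECT COUNT(*) FROM audit_event").fetchone()[0])  # 1200
```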

Module catalog — Storage & registry

The SQLite layer: schema, opening/closing, file location, settings persistence, datasource bookkeeping.

File | Lines | Role
core/sqlite_schema.py | 269 | DDL of the six tables, ten indexes. PRAGMAs (WAL, mmap, cache, busy_timeout). Migration ladder v1 → v5. Schema version table.
core/journal_manager.py | 324 | Locates or creates the SQLite file. Saved project → .recoverland/ next to the .qgz. Unsaved → QGIS profile under a content hash. PID-based file lock against duplicate QGIS instances. Read-only connections for worker threads.
core/sqlite_backend.py | 54 | Facade implementing AuditBackend. Delegates writes to write_queue, reads to search_service.
core/local_settings.py | 87 | Per-project settings persisted in backend_settings: retention days, max events, capture toggle, user override.
core/datasource_registry.py | 223 | Stores URI / provider / authcfg / CRS / geometry type at first commit. Used at restore time to recreate a layer when it is not currently loaded. Resolves DB credentials via QGIS saved connections (passwords never persisted).
core/datasource_alias.py | 125 | Links an old fingerprint to a new one when a layer moves (path change, renamed DB, provider switch). Transitive resolution bounded to 8 hops to prevent cycles.
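Below is a minimal sketch of bounded transitive alias resolution over an in-memory dict; the real links are stored in SQLite. The function name is hypothetical. The visited-set check is an extra safeguard added here on top of the 8-hop bound.

```python
from typing import Dict

MAX_HOPS = 8


def resolve_alias(fingerprint: str, aliases: Dict[str, str], max_hops: int = MAX_HOPS) -> str:
    """Follow old -> new links until the chain ends, a cycle appears, or max_hops is reached."""
    current, seen = fingerprint, {fingerprint}
    for _ in range(max_hops):
        nxt = aliases.get(current)
        if nxt is None or nxt in seen:  # end of chain, or a link back into it
            break
        seen.add(nxt)
        current = nxt
    return current


print(resolve_alias("old", {"old": "moved", "moved": "renamed"}))  # renamed
print(resolve_alias("x", {"x": "y", "y": "x"}))                    # y
```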

Module catalog — Read & search

The query side of the journal. All reads are paginated and bounded. The lightweight projection skips BLOBs when the UI does not need them.

File | Lines | Role
core/search_service.py | 272 | Paginated search with multi-criteria filtering. Lightweight mode strips geometry BLOBs. count_events, get_event_by_id, get_distinct_layers, get_distinct_users, summarize_scope.
core/event_stream_repository.py | 161 | Temporal queries for restore: entity stream, events after cutoff (DESC for reverse replay), count-only variants. All bounded by MAX_EVENTS_PER_RESTORE.
core/journal_audit.py | 143 | Single-query introspection: top N users, top N layers, per-operation counts, time range. Zero QGIS, safe for workers.
core/layer_stats_cache.py | 95 | Cache of min/max dates and operation types per datasource. Built in one GROUP BY. Thread-safe for reads after build.
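Pagination and a light projection keep worker queries cheap. The sketch assumes a simplified audit_event table with a geometry BLOB column and uses keyset paging (WHERE id < ? ORDER BY id DESC). The table above does not say whether search_service pages by keyset or by OFFSET, so both the schema and the paging style are assumptions.

```python
import sqlite3
from typing import List, Optional, Tuple

LIGHT_COLUMNS = "id, layer_name, op, created_at"  # no geometry BLOB
FULL_COLUMNS = LIGHT_COLUMNS + ", geom_wkb"


def search_page(
    conn: sqlite3.Connection,
    layer: str,
    page_size: int,
    before_id: Optional[int] = None,
    lightweight: bool = True,
) -> List[Tuple]:
    """One page of events for `layer`, newest first; pass the last id seen to get the next page."""
    columns = LIGHT_COLUMNS if lightweight else FULL_COLUMNS  # constants, never user input
    sql = f"SELECT {columns} FROM audit_event WHERE layer_name = ?"
    params: list = [layer]
    if before_id is not None:
        sql += " AND id < ?"
        params.append(before_id)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(page_size)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_event (id INTEGER PRIMARY KEY, layer_name TEXT, op TEXT,"
    " created_at TEXT, geom_wkb BLOB)"
)
conn.executemany(
    "INSERT INTO audit_event (layer_name, op, created_at, geom_wkb) VALUES (?, ?, ?, ?)",
    [("roads", "UPDATE", f"2024-01-0{day}", b"\x01" * 100) for day in range(1, 6)],
)
page1 = search_page(conn, "roads", page_size=2)
page2 = search_page(conn, "roads", page_size=2, before_id=page1[-1][0])
print([row[0] for row in page1], [row[0] for row in page2])  # [5, 4] [3, 2]
```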

Module catalog — Restore engine

The largest functional area. Five files share the work of turning past events into a present-day layer mutation, plus one helper for previews and one for in-canvas geometry display.

File | Lines | Role
core/rewind_dedup.py | 229 | Receives N events post-cutoff. Filters trace events and invalidated ones. Eliminates user events already compensated by a trace. Collapses INSERT+DELETE chains on the same entity. Pure deterministic logic, zero QGIS.
core/restore_planner.py | 203 | Builds the RestorePlan. Mode A iterates selected events. Mode B calls the stream repository then the dedup. Runs the preflight (volume / drift / coverage). Pure data output.
core/restore_executor.py | 624 | Applies a plan on a QGIS layer. Checks provider capabilities (AddFeatures, DeleteFeatures, ChangeAttributeValues, ChangeGeometries). Two strategies: STRICT (editing buffer + rollback) and BEST_EFFORT (per-entity direct).
core/restore_service.py | 823 | Feature-by-feature primitives: re-insert deleted, revert updated, delete inserted. Hosts _find_by_snapshot with six fallback levels. Builds restore trace events (restored_from_event_id). Undo support.
core/workflow_service.py | 181 | Groups events by datasource fingerprint, finds the target layer in the project, orchestrates per-group restore and per-group undo. Cleans up temporary layers added during restore.
core/restore_preview.py | 77 | Formats RestorePlan and PreflightReport into a human-readable summary for the confirmation dialog. Zero QGIS.
core/geometry_preview.py | 76 | Displays the captured geometry of an audit event on the QGIS canvas as a QgsRubberBand. One preview at a time, cleaned on dialog close.
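Part of rewind_dedup can be shown in isolation. The sketch assumes events are dicts with id, entity, op and an optional restored_from_event_id that marks trace events. It drops traces and the user events they already compensated. It collapses chains that start with INSERT and end with DELETE, because the entity did not exist at the cutoff and does not exist now. It returns the remaining events newest first. Handling of invalidated events is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List


def dedup_for_rewind(events: List[dict]) -> List[dict]:
    """Reduce post-cutoff events (chronological order) to the ones a rewind must undo.

    Returned newest first, ready for reverse replay.
    """
    # A trace event names the user event it already undid via restored_from_event_id.
    compensated = {
        e["restored_from_event_id"] for e in events if e.get("restored_from_event_id") is not None
    }
    user_events = [
        e for e in events
        if e.get("restored_from_event_id") is None and e["id"] not in compensated
    ]
    chains: Dict[str, List[dict]] = defaultdict(list)
    for e in user_events:
        chains[e["entity"]].append(e)
    kept = []
    for chain in chains.values():
        if chain[0]["op"] == "INSERT" and chain[-1]["op"] == "DELETE":
            continue  # created and removed after the cutoff: nothing to restore
        kept.extend(chain)
    return sorted(kept, key=lambda e: e["id"], reverse=True)


events = [
    {"id": 1, "entity": "A", "op": "UPDATE"},
    {"id": 2, "entity": "B", "op": "INSERT"},
    {"id": 3, "entity": "B", "op": "UPDATE"},
    {"id": 4, "entity": "B", "op": "DELETE"},
    {"id": 5, "entity": "C", "op": "DELETE"},
    {"id": 6, "entity": "C", "op": "INSERT", "restored_from_event_id": 5},  # trace
    {"id": 7, "entity": "A", "op": "UPDATE"},
]
print([e["id"] for e in dedup_for_rewind(events)])  # [7, 1]
```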

Module catalog — Health & maintenance

The plugin watches its own state: integrity at startup, disk space periodically, journal size against thresholds, retention purge with VACUUM. All operations are bounded and logged.

File | Lines | Role
core/health_monitor.py | 206 | Evaluates HEALTHY / INFO / WARNING / CRITICAL based on size, event count, age. Produces translated user messages and remediation suggestions.
core/disk_monitor.py | 68 | Free-disk check on the journal volume. Triggers tracking disable below the critical threshold.
core/integrity.py | 261 | Startup integrity check: PRAGMA integrity_check, WAL checkpoint, schema version verification. Reads recoverland_pending.json (events that did not reach SQLite on the last run) and replays them.
core/retention.py | 187 | Purge by age and by volume. 5k-row batches. Async VACUUM under a mutex. Defaults: 365 days, 1M events max.
core/db_maintenance.py | 71 | Periodic ANALYZE, quick integrity check, grouped WAL checkpoint. Safe with concurrent readers.
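The 5 000-row batching in retention.py keeps each delete transaction short. Below is a sketch of the age-based half of the purge. It assumes an audit_event table whose created_at column holds ISO-8601 text in a single format, so that string comparison follows time order. The volume-based purge and the mutex-guarded VACUUM are omitted.

```python
import sqlite3
from datetime import datetime, timedelta

BATCH = 5_000


def purge_older_than(conn: sqlite3.Connection, days: int, now: datetime, batch: int = BATCH) -> int:
    """Delete events older than `days`, one short transaction per batch; returns rows deleted."""
    cutoff = (now - timedelta(days=days)).isoformat()
    total = 0
    while True:
        with conn:  # commit after every batch so the write lock is held only briefly
            cur = conn.execute(
                "DELETE FROM audit_event WHERE rowid IN "
                "(SELECT rowid FROM audit_event WHERE created_at < ? LIMIT ?)",
                (cutoff, batch),
            )
        total += cur.rowcount
        if cur.rowcount < batch:  # last, partial batch: nothing older remains
            return total


now = datetime(2024, 6, 1)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_event (id INTEGER PRIMARY KEY, created_at TEXT)")
ages = [400] * 12 + [10] * 3  # 12 events older than a year, 3 recent ones
conn.executemany(
    "INSERT INTO audit_event (created_at) VALUES (?)",
    [((now - timedelta(days=age)).isoformat(),) for age in ages],
)
print(purge_older_than(conn, days=365, now=now, batch=5))  # 12
```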

Module catalog — Infrastructure

The cross-cutting layer. Every other module may use these. None of them know about the domain.

File | Lines | Role
compat.py | 254 | Single source for Qt5 / Qt6 and QGIS 3.40 / 4.x divergence. All Qt.X, Qgis.X, QgsWkbTypes.Y, QgsVectorDataProvider.Capability.Z go through here. Direct access elsewhere is forbidden.
core/logger.py | 116 | Rotating file logger (5 × 5 MB) in the QGIS profile directory + QgsMessageLog mirror. flog(), qlog(), timed_op() context manager for elapsed-ms tracking, generate_trace_id() for correlation.
core/sql_safety.py | 28 | Defense-in-depth assertion. Any f-string SQL fragment passes through assert_safe_fragment(), which rejects unsafe characters. Values are always parameterised separately.
core/observability.py | 262 | CycleStats accumulator for restore/rewind cycles (raw / deduped / planned / applied / skipped / failed / elapsed_ms). log_cycle_summary() emits one summary line plus anomaly lines. log_state_transition() tracks critical flags. assert_invariant() escalates to CRITICAL on violation.
core/time_format.py | 97 | Relative ("3 hours ago"), short absolute ("May 12 14:30"), full ISO. compute_history_span() for the smart bar.
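The exact character rules of sql_safety are not listed here. The sketch below assumes a conservative whitelist (letters, digits, underscore, dot, comma, space) to show the pattern. Validate any fragment that is interpolated into SQL, and bind every value as a parameter. This is a hypothetical reimplementation, not the module's code.

```python
import re

# Assumed whitelist: identifiers, dotted names, commas and spaces. Quotes,
# semicolons, comment markers and operators are all rejected.
_SAFE_FRAGMENT = re.compile(r"[A-Za-z0-9_., ]+")


def assert_safe_fragment(fragment: str) -> str:
    """Return the fragment unchanged, or raise ValueError if it holds any other character."""
    if not _SAFE_FRAGMENT.fullmatch(fragment):
        raise ValueError(f"unsafe SQL fragment: {fragment!r}")
    return fragment


columns = assert_safe_fragment("id, layer_name, op")
sql = f"SELECT {columns} FROM audit_event WHERE layer_name = ?"  # the value is bound separately
print(sql)  # SELECT id, layer_name, op FROM audit_event WHERE layer_name = ?
```

The sketch raises ValueError rather than using assert, because assert statements are removed when Python runs with -O. Whether the real module does the same is not stated.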

Tech debt map

The architecture is healthy on most axes. The debt is concentrated in three specific places. Naming them here makes them easier to attack later.

Debt 1
recover_dialog.py monolith

2 797 lines, five responsibilities (widget build, restore orchestration, geometry preview, smart bar, state machine). Every restore fix touches this file because the orchestration logic lives inside the UI.

Debt 2
Restore matching scattered

The single decision "which live feature matches this snapshot?" lives in both restore_service.py and restore_executor.py with subtly different rules. Most historical bugs (RW-11 to RW-19) come from this duplication.

Debt 3
core/__init__.py over-exports

191 symbols are re-exported, so any module can write from .core import X. The real dependency graph is invisible without grep.

Healthy
Everything else

Contracts, persistence, capture, compat layer, observability and 30+ small core modules pass the "one file = one responsibility" rule and stay under 300 lines.

Why the codebase is this size: replacing a 100-line PostgreSQL trigger requires re-implementing six guarantees the database gives for free — atomic transactions, server-side capture, stable row identity, durable persistence, concurrent access, typed schema. Together these cost ~3 000 lines of Python that cannot be shorter. Another ~1 500 lines are the three debt items above and are recoverable through focused refactoring.