Maintenance

Journal maintenance & diagnostics

How RecoverLand keeps its audit journal healthy, compact and reliable. Every purge is explicit, every compaction is confirmed, zero data loss.

Philosophy

The audit journal grows forever by design: every INSERT, UPDATE and DELETE on every tracked layer produces an immutable event. Without maintenance, a journal used daily grows at 1–5 MB/day depending on edit intensity.

RecoverLand takes a conservative stance:

Explicit only
No silent deletion

Events are never purged silently. Every delete requires user confirmation or an explicit auto-purge policy.

Non-destructive
Safe export first

The user can export a full backup before any purge. Export uses sqlite3.backup() — crash-safe, lock-free.

Self-diagnosing
Health on demand

Size, event count, disk space, schema version and SQLite integrity are checked and reported in one panel.

Async operations
Never blocks the UI

VACUUM runs in a dedicated thread. Purge batches avoid holding the SQLite write lock for more than milliseconds.

Journal lifecycle

From creation to compaction, the journal passes through four phases. Each phase has a clear trigger and a clear operation.

Growth Events accumulate WAL batches writes Monitoring Health thresholds Disk space check Purge Retention cutoff Batched DELETE Compaction VACUUM (async) Reclaim disk Integrity check (on demand or at startup) PRAGMA integrity_check · WAL checkpoint · schema version · pending recovery Result: is_healthy + issues[] + recovered_events + rejected Diagnostic analysis (BL-OPT-08) Entity distribution · BLOB ratios · geometry dedup · schema JSON redundancy INSERT+DELETE pairs · invalidated traces · optimization potential

Health monitoring

The health_monitor.py module evaluates journal health from two axes: size and event count. Each axis has three escalation levels. The worst of both wins.

Size axis < 50 MB — HEALTHY 50–200 MB — INFO 200–500 MB — WARN ≥ 500 MB — CRITICAL Event count axis < 100k — HEALTHY 100k–500k — INFO 500k–1M — WARNING ≥ 1M — CRITICAL Disk space > 500 MB — HEALTHY 100–500 MB — WARNING < 100 MB — CRITICAL
Disk critical: when free disk space drops below 100 MB, the health monitor sets should_disable_tracking = True. The plugin stops capturing new events to avoid filling the disk and corrupting both the journal and the QGIS project file.
LevelUI feedbackAction proposed
HEALTHYGreen label “Sain”None
INFOBlue temporal coverage messageNone
WARNINGOrange warning + size + countOpen maintenance, configure retention
CRITICALRed alert + suggestion textPurge immediately + VACUUM

Integrity check

The check_journal_integrity() function runs four sequential checks and returns an IntegrityResult NamedTuple. It can be triggered manually from the maintenance dialog or runs at startup.

1. PRAGMA integrity_check Verifies B-tree, page checksums, freelist 2. PRAGMA wal_checkpoint(PASSIVE) Flushes pending WAL pages to main DB 3. Schema version validation Detects downgrade or missing schema 4. Pending event recovery Reads recoverland_pending.json, validates each event, re-inserts valid ones IntegrityResult is_healthy: bool issues: List[str] recovered_events: int pending_rejected: int Each rejected event stays in pending file
Pending recovery: when the WriteQueue fails to persist events (SQLite locked, disk full), it saves them to recoverland_pending.json. At next startup, integrity check validates each event (required keys, known keys, valid operation_type) and re-inserts the good ones. Invalid events are kept in the file with a WARNING log.

Retention policy

The retention policy has two axes: time-based (retention_days) and count-based (max_events). Both are user-configurable in the maintenance dialog and persisted in QgsSettings.

RetentionPolicy retention_days: 365 + max_events: 1 000 000
SettingKeyDefaultRange
ConservationRecoverLand/retention_days365 jours7 – 3650
MaximumRecoverLand/max_events1 000 00010 000 – 10 000 000
Auto-purgeRecoverLand/auto_purgeFalseOn / Off
Auto-purge: when enabled, events outside the retention window are deleted at startup without user confirmation. Disabled by default. The user must activate it consciously in the maintenance dialog.

Purge engine

The purge engine in retention.py deletes events that fall outside the retention policy. It operates in batches of 5 000 to avoid holding the SQLite write lock for extended periods.

1. Compute cutoff timestamp UTC now - retention_days 2. Batch DELETE (5k rows/batch) WHERE created_at < cutoff LIMIT 5000 loop until < batch 3. Enforce max_events cap DELETE oldest if total > max_events 4. Post-purge maintenance ANALYZE + WAL checkpoint 5. Purge orphan datasources PurgeResult deleted_count: int vacuum_done: bool (always False here) error: str (empty on success)
Why batches? SQLite in WAL mode allows concurrent readers during writes, but the write lock is exclusive. Deleting 100k rows in one statement blocks all writes for seconds. By chunking to 5k, each lock hold is < 50 ms and the WriteQueue can interleave new events between batches.

Orphan trace purge

After a time-based purge, some restore trace events may reference deleted user events. The _purge_orphan_traces() function cleans these:

DELETE FROM audit_event
WHERE restored_from_event_id IS NOT NULL
  AND restored_from_event_id NOT IN
  (SELECT event_id FROM audit_event
   WHERE restored_from_event_id IS NULL)

Session-level purge

purge_by_session(session_id) deletes all events from a specific editing session. Used for test cleanup and debug scenarios.

VACUUM & compaction

SQLite DELETE does not reclaim disk space—it marks pages as free for reuse. Only VACUUM physically rebuilds the file. Because VACUUM rewrites the entire database, it must run in a dedicated thread.

UI thread Click “Compacter” Show progress (indeterminate) vacuum_async() RecoverLand-Vacuum thread WAL checkpoint (TRUNCATE) VACUUM (full rewrite) callback(success) UI callback (QTimer.singleShot) Refresh stats · hide progress bar _vacuum_lock threading.Lock() 1 VACUUM at a time
StepOperationWhy
1wal_checkpoint(TRUNCATE)Flush all WAL pages into the main DB file before VACUUM
2VACUUMRebuild the database file, releasing freed pages to the OS
3CallbackNotify UI on main thread (Qt thread safety via QTimer.singleShot(0))
Lock protection: if a second VACUUM attempt arrives while one is running, it logs a WARNING and returns immediately via _vacuum_lock.acquire(blocking=False). No crash, no deadlock.

Export (backup)

The export uses Python’s sqlite3.Connection.backup() API, which performs a page-level copy that is:

  • Crash-safe: the copy is atomic at the page level
  • Lock-free for readers: uses a separate read connection
  • Consistent: the backup is a point-in-time snapshot
create_read_connection() sqlite3.connect(dest) src_conn.backup(dst_conn) Close both
Why not shutil.copy2? A file copy while the journal is open (WAL mode active) can produce a corrupted backup if the -wal file is not copied in sync. The backup() API handles this atomically inside SQLite.

Journal analysis

The diagnostic engine (journal_diagnostics.py, BL-OPT-08) runs read-only queries to measure storage patterns and identify optimization potential. It produces a JournalDiagnosticReport displayed in a monospace text dialog.

6 diagnostic queries COUNT(*) active events GROUP BY entity_fingerprint → buckets BLOB presence: geometry_wkb / new_geom Geometry duplicates (old == new) field_schema_json: bytes vs distinct INSERT+DELETE pairs + invalidated JournalDiagnosticReport total_events / total_entities entity_distribution: {1, 2-5, 6-10, 11-50, 51+} blob_geom_count / blob_geom_null_count blob_new_geom_count / blob_new_geom_null_count geom_duplicate_count / geom_duplicate_bytes_saved schema_json_total / distinct_count / dedup_bytes invalidated_count / insert_delete_pairs elapsed_ms: query duration (< 500ms typical)

Metrics explained

MetricMeaningOptimization signal
Entity distributionHow many events per entity. “51+” bucket means entities with long update chains.High count in 51+ → UPDATE coalescing would save space (BL-OPT-01)
Geometry duplicatesEvents where geometry_wkb == new_geometry_wkb (attribute-only UPDATE stored both BLOBs)Many duplicates → skip BLOB on attr-only UPDATEs (BL-OPT-02)
Schema JSON redundancyTotal bytes in field_schema_json vs. distinct schemasHigh dedup savings → normalize to lookup table (BL-OPT-03)
Invalidated tracesRestore trace events already marked invalid> 0 → auto-purge them during maintenance (BL-OPT-07)
INSERT+DELETE pairsSame entity created and deleted in same session> 0 → offline compaction can annul them (BL-OPT-04)

Thread safety

The maintenance dialog interacts with three connection types. Each has clear threading constraints.

Read connection create_read_connection() Stats, integrity, diagnostics Write connection get_connection() Purge (batched DELETEs) Vacuum connection sqlite3.connect(path) Dedicated thread + lock Threading rules Reads: any thread (WAL allows concurrent readers) Writes: main thread only (journal.get_connection) · VACUUM: dedicated daemon thread with mutex

Invariants

The maintenance system guarantees these properties at all times:

Zero implicit loss
Every deletion is traced

No event is deleted without (a) user confirmation or (b) explicit auto-purge policy. Purge count is logged with flog() at INFO level.

Crash recovery
Pending file survives

If a crash occurs mid-write, unsaved events survive in recoverland_pending.json and are recovered at next startup.

No lock starvation
Batched operations

Purge in 5k batches. VACUUM in dedicated thread with non-blocking lock acquire. WriteQueue can always interleave.

Consistent backup
sqlite3.backup() API

Export is a page-level consistent snapshot. No manual file copy. No WAL corruption risk.

Module map: journal_maintenance.py (UI dialog) → core/retention.py (purge + VACUUM) · core/integrity.py (check + recovery) · core/health_monitor.py (thresholds + messages) · core/journal_diagnostics.py (analysis report)