"how numbers are stored and used in computers"
The Write-Ahead Log (WAL) is the core mechanism that ensures durability and crash recovery (the "D" in ACID). WAL records every change made to the database's data files in a sequential, append-only log before those changes are written to the actual heap or index storage. This logging strategy guarantees that, in the event of a crash or power failure, PostgreSQL can recover a consistent state by replaying committed changes from the WAL.
The WAL is stored as a sequence of 16MB segment files (pg_wal/
directory), each consisting of a series of records that describe low-level operations like tuple insertion, page splits in B-trees, or visibility hint updates. Every WAL record is tagged with a Log Sequence Number (LSN)—a byte-level address in the WAL stream. Each data page in the heap or index also stores the LSN of the last WAL record that modified it. This coupling allows PostgreSQL to detect whether a data page has been fully flushed to disk or if it needs to be reconstructed during recovery.
WAL provides a critical optimization: it allows deferred and batched flushing of data pages. Rather than writing each modified page immediately to disk, PostgreSQL can buffer and coalesce writes, improving throughput. The only requirement is that the corresponding WAL entries are flushed to disk (fsync
) before a transaction is considered committed. This is known as the WAL rule: "Write the log before the data." PostgreSQL's commit
operation therefore involves ensuring the WAL records for the transaction are flushed to disk, which is significantly cheaper than writing all modified data pages synchronously.
WAL also plays a central role in PostgreSQL’s crash recovery process. When the server starts after a crash, it performs redo recovery by scanning the WAL for unflushed changes and reapplying them to data pages that were not yet persisted. Because all changes are idempotent (safe to replay), this process reliably restores the database to a consistent state as of the last committed transaction. PostgreSQL does not use undo logs, so rollback is managed via MVCC and visibility rules rather than reversing changes during recovery.
The WAL system is also the foundation of replication and point-in-time recovery (PITR). In streaming replication, a primary server ships WAL records in near real-time to one or more replicas, which apply the changes to maintain an up-to-date copy of the database. PITR uses periodic base backups combined with WAL archives to restore the database to any specific moment in time, providing strong support for disaster recovery.
To manage WAL growth, PostgreSQL includes mechanisms such as WAL archiving, WAL recycling, and checkpointing. Checkpoints are periodic events where all dirty pages in memory are flushed to disk, and the system writes a checkpoint record to the WAL. This record serves as a recovery marker; during crash recovery, PostgreSQL can start scanning the WAL from the last checkpoint rather than the beginning of the log. Tuning checkpoint frequency and WAL segment retention is a key performance and reliability consideration for administrators.
Advanced features like logical replication and logical decoding also build on the WAL. Logical decoding reads WAL entries and converts them into a higher-level stream of logical changes (e.g., INSERT/UPDATE/DELETE operations), enabling fine-grained replication and change data capture. This capability is crucial for use cases like cross-version upgrades, audit logging, and data pipelines.
In essence, WAL in PostgreSQL is a high-performance, append-only write log that ensures data durability, enables crash recovery, and powers replication. It separates the act of making changes from the act of persisting them, allowing PostgreSQL to batch disk I/O while still providing strong transactional guarantees. Understanding WAL is fundamental to mastering PostgreSQL reliability, backup strategies, high availability, and performance tuning.