"how numbers are stored and used in computers"
A tuple is the fundamental representation of a row in a PostgreSQL table. Internally, each tuple is stored as a fixed-format data structure called a heap tuple in the table's heap file. While the term "row" is commonly used in SQL, "tuple" is used in PostgreSQL internals to describe the on-disk and in-memory structure that encapsulates both the data and the metadata needed to support Multi-Version Concurrency Control, indexing, and transaction visibility.
Each tuple contains several fields beyond the user-visible data. Crucial among these are the transaction ID fields—xmin
and xmax
—which indicate the inserting and deleting transactions, respectively. These fields are used to determine whether a tuple should be visible to a querying transaction. For instance, a tuple is visible if the querying transaction started after the inserting transaction (xmin
) committed, and either no deleting transaction (xmax
) is set or it belongs to a transaction that hasn’t yet committed. These checks allow PostgreSQL to present a consistent snapshot of the database to every transaction.
A tuple also stores command identifiers (cmin
, cmax
) which help determine visibility of changes within a single transaction, particularly when a row is modified multiple times. These fields are essential in maintaining correct snapshot isolation semantics, ensuring, for example, that a SELECT within a transaction does not see a tuple it has just deleted or updated unless explicitly allowed.
The user-defined columns of the row are stored immediately after the metadata fields, but not necessarily in a fixed layout. PostgreSQL uses a technique called TOAST (The Oversized-Attribute Storage Technique) to store large values such as big text or bytea fields outside the main heap tuple, replacing them with references to external TOAST tables. This helps keep tuples compact and avoids bloating the main table with large inline values.
Tuples also include system columns like ctid
, which uniquely identifies the physical location of the row within the table by indicating the block number and offset. The ctid
is particularly useful during updates, since PostgreSQL implements updates as delete-insert operations, and the ctid
allows indexes and foreign keys to point to the correct row version, even after modification.
Because tuples are never updated in place, old versions of tuples accumulate over time. These become dead tuples when they are no longer visible to any active transaction. Dead tuples still occupy disk space and can impact sequential scans and index efficiency. PostgreSQL's VACUUM
process is responsible for identifying and cleaning up these dead tuples to reclaim space and maintain performance.
From the perspective of indexes, tuples are not stored directly but are referenced via pointers like the ItemPointerData
, which contains the ctid
of the tuple. Index tuples also store visibility information to support index-only scans, which are enabled by the visibility map that tracks whether all tuples on a page are visible to all transactions.
In sum, tuples in PostgreSQL are much more than simple containers for row data. They are carefully structured records that encapsulate transaction metadata, system information, and the actual data payload in a layout optimized for MVCC, transactional integrity, and extensibility. A deep understanding of tuples is fundamental for anyone working with PostgreSQL internals, especially when diagnosing performance issues, interpreting execution plans, or developing custom storage extensions.