Database Transactions in 2026: ACID, Isolation Levels, MVCC, and Deadlock Prevention

Introduction
Transaction bugs are among the most insidious failures in production systems. They almost never appear in your local test environment. They don't produce clean stack traces. They surface as subtle data corruption — a bank balance that is off by the exact amount of a concurrent withdrawal, a cart that charges the same order twice during a flash sale, an inventory count that goes negative despite a CHECK constraint. By the time you notice something is wrong, the transaction that caused the issue finished hours ago and left no obvious trace.
What makes this category of bug especially dangerous is its concurrency dependency. A function that transfers money between two accounts can pass a thousand unit tests and still corrupt data the moment two users call it at the same millisecond. You need real concurrent load to trigger these anomalies, and real concurrent load typically means production.
The underlying mechanics that prevent these bugs — ACID properties, isolation levels, and Multi-Version Concurrency Control — have been part of relational databases for decades. But knowing that "ACID means reliable" is not enough to write correct concurrent code. You need to understand exactly what each ACID property guarantees, what it does not guarantee, and what happens when you trade one guarantee for another.
Isolation levels, in particular, are widely misunderstood. Most developers know their database has them. Few know which anomalies each level actually prevents. Fewer still understand that PostgreSQL's default, Read Committed, permits non-repeatable reads, or that Serializable is implemented with Serializable Snapshot Isolation rather than by actually executing transactions one at a time.
MVCC — the mechanism PostgreSQL uses to implement isolation — explains behaviors that otherwise seem paradoxical: why reads never block writes, why VACUUM is a correctness requirement rather than optional maintenance, and why you can observe a row version that was deleted by another transaction without holding any lock.
This post works through all of it with concrete examples. Every anomaly is demonstrated with actual SQL. Every pattern has working code you can run against a real PostgreSQL instance. The goal is not a survey — it is a working mental model you can apply to every concurrent data access pattern you write.
1. ACID: What Each Property Actually Means in Practice
ACID is four words that each hide a large amount of implementation complexity. Here is what each property actually guarantees — and what it does not.
Atomicity: All or Nothing, Not Eventually Consistent Parts
Atomicity means a transaction either commits completely or has zero effect. If your BEGIN ... COMMIT block contains twenty SQL statements and statement fifteen fails, none of the first fourteen persist. The database rolls back to the state before BEGIN.
This is not the same as eventual consistency. Atomicity is a synchronous, binary guarantee. There is no "partial commit that will be cleaned up later." If you ROLLBACK — whether explicitly or because the connection drops — every change in that transaction disappears.
The classic bank transfer shows why this matters:
-- Without atomicity, a crash here causes permanent inconsistency
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
-- If the process crashes HERE, account 1 has lost $500 and account 2 gained nothing
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;
PostgreSQL implements atomicity through its Write-Ahead Log (WAL). Before any data page is modified on disk, a WAL record describing the change is written and flushed. If the server crashes mid-transaction, recovery replays the WAL — but only for committed transactions. Uncommitted WAL records are ignored during recovery.
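The all-or-nothing contract is easy to demonstrate. The sketch below uses stdlib sqlite3 purely so it runs anywhere without a PostgreSQL instance; the atomicity guarantee it illustrates is the same one described above. Table and values mirror the bank-transfer example.

```python
import sqlite3

# Illustrative only: stdlib sqlite3 stands in for PostgreSQL here, but the
# atomicity contract is identical: a failed transaction leaves no trace.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 1000), (2, 1000)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")
        # Simulated crash before the matching credit to account 2:
        raise RuntimeError("crash between debit and credit")
except RuntimeError:
    pass

# The debit rolled back along with the rest of the transaction.
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 1000, 2: 1000}
```

No "account 1 lost $500 and account 2 gained nothing" state is observable after the rollback.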
Consistency: Application Invariants Are Your Responsibility
Consistency is the most misunderstood ACID property. The database enforces structural constraints: NOT NULL, CHECK, UNIQUE, FOREIGN KEY. It will not let you insert a row that violates a foreign key or a balance that violates a CHECK constraint.
But business rule consistency is your code's job. If you have the invariant "a user's total order value must never exceed their credit limit," the database will not enforce that automatically. Your transaction logic must read the current total, compute the new total, verify it does not exceed the limit, and only then insert the order — inside a single transaction with sufficient isolation.
This distinction matters because it defines where bugs live. A database can only be as consistent as the invariants you express. Leaving a business invariant unenforced in code and trusting the database to catch it is a common source of data corruption.
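The credit-limit invariant mentioned above can be sketched as a read-check-write sequence inside one transaction. This is illustrative only (stdlib sqlite3 for runnability; the orders table and CREDIT_LIMIT constant are hypothetical), and note that under concurrent load this read-then-decide pattern additionally needs locking or stronger isolation, as later sections discuss.

```python
import sqlite3

CREDIT_LIMIT = 1000  # hypothetical business rule, not a database constraint

def place_order(conn: sqlite3.Connection, user_id: int, amount: int) -> None:
    # The schema enforces structure; this function enforces the business invariant.
    with conn:  # single transaction: check and insert commit or roll back together
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        if total + amount > CREDIT_LIMIT:
            raise ValueError(f"order would exceed credit limit: {total + amount} > {CREDIT_LIMIT}")
        conn.execute("INSERT INTO orders (user_id, amount) VALUES (?, ?)", (user_id, amount))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
place_order(conn, 42, 800)       # accepted
try:
    place_order(conn, 42, 300)   # 800 + 300 > 1000: rejected, nothing inserted
except ValueError as e:
    print(e)
```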
Isolation: Concurrent Transactions Should Not Interfere
Isolation is the property that directly governs concurrency bugs. Ideally, each transaction executes as if it were the only transaction in the system. Realistically, enforcing full isolation has a performance cost, so databases expose isolation levels that let you trade isolation for throughput.
The degree of isolation determines which concurrency anomalies are possible. We will cover these in detail in Section 2. For now: the default PostgreSQL isolation level — Read Committed — prevents dirty reads but allows non-repeatable reads. Most of the subtle bugs described in this post are permitted at Read Committed.
Durability: What "Committed" Actually Means
Durability means once the database returns success from a COMMIT, the data survives any subsequent crash. This sounds obvious but has implementation consequences.
PostgreSQL's durability guarantee depends on fsync. When a transaction commits, PostgreSQL calls fsync() to flush WAL records to disk before returning to the client. If fsync is disabled (a dangerous but sometimes-seen optimization), a crash can lose committed transactions.
"Committed" means the WAL record reached durable storage. It does not mean the data is immediately visible in the heap files — PostgreSQL may not have flushed the actual data pages yet. Recovery will replay the WAL and reconstruct those pages if needed.
ACID vs BASE: An Intentional Trade-off
BASE (Basically Available, Soft state, Eventually consistent) is not a degraded version of ACID — it is a different contract chosen for different workload profiles. Systems like Cassandra or DynamoDB choose availability and partition tolerance over consistency. This is appropriate when you need to write across multiple data centers and can tolerate briefly stale reads.
The problem is when teams accept eventual consistency by accident — using a message queue where they needed a synchronous transaction, or reading from a replica without understanding replication lag. The choice between ACID and BASE should be explicit, documented, and made with full understanding of which anomalies become possible.
Architecture

The diagram below shows two concurrent transactions — one with isolation enforced and one without — and how their intermediate states interact:
2. Isolation Levels and the Anomalies They Permit
PostgreSQL implements four isolation levels defined by the SQL standard, though Read Uncommitted is treated identically to Read Committed internally.
Read Uncommitted: Dirty Reads
A dirty read occurs when transaction T2 reads a row that T1 has modified but not yet committed. If T1 rolls back, T2 has read data that never officially existed.
PostgreSQL never permits dirty reads. Even if you request READ UNCOMMITTED, it behaves as Read Committed. The anomaly is possible in MySQL and SQL Server at READ UNCOMMITTED, however, which is why that level should essentially never be used.
Read Committed: Non-Repeatable Reads
PostgreSQL's default. Each statement within a transaction sees a fresh snapshot of committed data. This means:
- No dirty reads (you only see committed data).
- Non-repeatable reads are possible: if you SELECT balance twice in the same transaction, and another transaction commits a change between your two reads, you see different values.
-- Session 1
BEGIN;
SELECT balance FROM accounts WHERE id = 1;
-- Returns: 1000
-- Session 2 (runs concurrently)
BEGIN;
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;
-- Session 1 (still in same transaction)
SELECT balance FROM accounts WHERE id = 1;
-- Returns: 500 ← different value! Non-repeatable read.
COMMIT;
This is the anomaly that breaks the "read then decide" pattern. If your transaction reads a value to make a business decision, and that value can change before you act on it, your decision is based on stale data.
Repeatable Read: Phantom Reads
At REPEATABLE READ, PostgreSQL takes a snapshot at the start of the transaction (not per-statement as in Read Committed). The same SELECT returns the same rows no matter how many times you run it — but new rows inserted by other transactions can still appear.
-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM orders WHERE user_id = 42;
-- Returns: 5
-- Session 2 (runs concurrently)
BEGIN;
INSERT INTO orders (user_id, amount) VALUES (42, 99.99);
COMMIT;
-- Session 1 (still in same transaction)
SELECT COUNT(*) FROM orders WHERE user_id = 42;
-- At REPEATABLE READ in PostgreSQL: returns 5 (snapshot isolation prevents phantom)
-- Note: PostgreSQL's REPEATABLE READ actually prevents phantom reads too,
-- but write skew is still possible (see Serializable section)
COMMIT;
Note: PostgreSQL's implementation of Repeatable Read is actually stronger than the SQL standard requires — it uses snapshot isolation which incidentally prevents phantom reads. However, write skew anomalies remain possible.
Serializable: Full Isolation
PostgreSQL uses Serializable Snapshot Isolation (SSI) — a technique that detects when a set of concurrent transactions would produce a result impossible under any serial execution, and aborts one of them.
-- Classic write skew: two doctors both check if someone else is on call
-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM on_call WHERE shift_id = 1;
-- Returns: 2 (two doctors on call)
-- Decides: it's safe to go off call since 2 > 1
UPDATE on_call SET status = 'off' WHERE doctor_id = 101 AND shift_id = 1;
COMMIT; -- May fail with: ERROR: could not serialize access
-- Session 2 (concurrent)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM on_call WHERE shift_id = 1;
-- Also returns: 2
-- Also decides: safe to go off call
UPDATE on_call SET status = 'off' WHERE doctor_id = 102 AND shift_id = 1;
COMMIT; -- One of these commits will succeed; the other will fail
SSI tracks read/write dependencies between transactions. When it detects a cycle (T1 read what T2 will write, T2 read what T1 will write), one transaction is aborted with a serialization failure. Your application must be prepared to retry.
Anomaly Matrix
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Write Skew |
|---|---|---|---|---|
| Read Uncommitted | Prevented (PostgreSQL treats it as Read Committed) | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Prevented in PostgreSQL (the standard allows it) | Possible |
| Serializable | Prevented | Prevented | Prevented | Prevented |
Choosing the right level: For most CRUD web applications, Read Committed is appropriate. For financial operations — balance checks, inventory deductions, anything with a "check then act" pattern — use Serializable and handle serialization failures with retries. The performance overhead of SSI is typically 5-15% in write-heavy workloads and near-zero for read-heavy workloads.
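A retry loop for these failures might look like the following sketch. It keys off SQLSTATE codes (40001 is serialization_failure, 40P01 is deadlock_detected); psycopg2 exposes the code on the exception's pgcode attribute. The helper and the stub exception below are illustrative, not part of any library.

```python
import random
import time

RETRYABLE_SQLSTATES = {"40001", "40P01"}  # serialization_failure, deadlock_detected

def run_with_retries(txn_fn, max_attempts=5, base_delay=0.05):
    """Run txn_fn(); retry when it raises an error whose pgcode is retryable."""
    for attempt in range(max_attempts):
        try:
            return txn_fn()
        except Exception as exc:
            if getattr(exc, "pgcode", None) not in RETRYABLE_SQLSTATES:
                raise  # not a transient concurrency error
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to decorrelate competing retries
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Demo with a stub standing in for psycopg2's SerializationFailure:
class FakeSerializationFailure(Exception):
    pgcode = "40001"

attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeSerializationFailure()
    return "committed"

result = run_with_retries(flaky_txn)
print(result)  # committed (on the third attempt)
```

In production, txn_fn must open a fresh transaction each call; retrying inside an aborted transaction fails immediately.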
3. MVCC: How PostgreSQL Implements Isolation Without Locking Reads
Multi-Version Concurrency Control is the mechanism that lets PostgreSQL give readers and writers independent progress without coordination. Readers never block writers. Writers never block readers. This is the fundamental reason PostgreSQL can maintain high read throughput under concurrent write load.
Row Versions: xmin and xmax
Every row in a PostgreSQL heap has two hidden system columns:
- xmin: the transaction ID that inserted this row version.
- xmax: the transaction ID that deleted or updated this row version (zero if the version is still live).
When you UPDATE a row, PostgreSQL does not modify the existing row. It marks the old row version with xmax = current_transaction_id and inserts a new row version with xmin = current_transaction_id. The old version remains on the heap until VACUUM removes it.
You can observe this directly:
-- Create a test table and observe xmin/xmax
CREATE TABLE mvcc_demo (id int, val text);
INSERT INTO mvcc_demo VALUES (1, 'original');
-- See the row's transaction metadata
SELECT id, val, xmin, xmax FROM mvcc_demo;
-- xmin shows the transaction that inserted this row
-- xmax is 0 (row is live, not deleted/updated)
-- Now update the row in a transaction
BEGIN;
UPDATE mvcc_demo SET val = 'updated' WHERE id = 1;
-- In another session, query the table:
-- SELECT id, val, xmin, xmax FROM mvcc_demo;
-- You'll see BOTH the old row (xmax = current_txid) and new row (xmin = current_txid)
COMMIT;
-- After commit, VACUUM can remove the old row version
VACUUM mvcc_demo;
Snapshot Visibility
At transaction start (or statement start for Read Committed), PostgreSQL records:
1. The current transaction ID.
2. The set of all currently active (in-progress) transaction IDs.
3. The highest transaction ID assigned so far.
A row version is visible to a snapshot if:
- Its xmin committed before the snapshot was taken (and is not in the active set).
- Its xmax is either zero, or has not committed by the time the snapshot was taken.
This is why a long-running transaction always sees the same snapshot of the data, even as other transactions commit. It does not hold any locks on those rows — it simply does not consider newer committed versions visible.
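The visibility rules above can be expressed as a small predicate. This is a deliberately simplified model with names of my own choosing; real PostgreSQL also consults commit-status hint bits, subtransactions, and the command ID within a transaction.

```python
def version_visible(xmin, xmax, snapshot_active, snapshot_next_xid, committed):
    """
    Simplified MVCC visibility check for one row version.
    snapshot_active:   set of xids in progress when the snapshot was taken
    snapshot_next_xid: first xid NOT yet assigned at snapshot time
    committed:         set of xids known to have committed
    """
    # The inserting transaction must have committed before the snapshot
    inserted_visible = (
        xmin in committed
        and xmin < snapshot_next_xid
        and xmin not in snapshot_active
    )
    if not inserted_visible:
        return False
    if xmax == 0:
        return True  # never deleted or updated: visible
    # Deleted, but the deletion only counts if the deleter committed before the snapshot
    deleter_committed = (
        xmax in committed
        and xmax < snapshot_next_xid
        and xmax not in snapshot_active
    )
    return not deleter_committed

# Snapshot taken while xid 105 was still running; xids below 110 assigned so far
active, next_xid = {105}, 110
committed = {100, 103, 105}  # 105 committed AFTER the snapshot was taken

assert version_visible(100, 0, active, next_xid, committed)      # old live row
assert version_visible(100, 105, active, next_xid, committed)    # deleter invisible to us
assert not version_visible(105, 0, active, next_xid, committed)  # inserted by in-progress xid
```

The second assertion is exactly the "observe a row version deleted by another transaction" behavior described earlier: the deleter was in progress at snapshot time, so the snapshot still sees the old version.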
VACUUM: A Correctness Requirement
Because old row versions accumulate on the heap, VACUUM is not optional maintenance — it is a correctness requirement for two reasons:
- Space reclamation: Without VACUUM, heap files grow without bound as updates accumulate dead tuples.
- Transaction ID wraparound: PostgreSQL uses 32-bit transaction IDs. After ~2 billion transactions, IDs wrap around. PostgreSQL prevents this by running aggressive autovacuum on tables approaching the wraparound threshold. If VACUUM is blocked for too long, PostgreSQL will stop accepting writes and demand manual VACUUM — a production incident.
-- Monitor dead tuple accumulation
SELECT relname, n_live_tup, n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
HOT Updates: Avoiding Index Churn
If you update a column that is not indexed, PostgreSQL can use a Heap Only Tuple (HOT) update: the new row version is placed on the same heap page as the old version, and no index entry is created for the new version. The index still points to the old row, which has a pointer to the new version.
HOT updates significantly reduce write amplification for tables with many indexes. They apply only when none of the updated columns is indexed and the new row version fits on the same heap page.

4. Deadlocks: Detection, Prevention, and Patterns
A deadlock occurs when two transactions are each waiting for a lock held by the other. Neither can proceed. Left unresolved, both would wait forever.
PostgreSQL Deadlock Detection
PostgreSQL's lock manager detects deadlocks by periodically checking the wait-for graph. The detection interval is controlled by deadlock_timeout (default: 1 second). When a cycle is detected, PostgreSQL cancels one transaction (the "victim") with:
ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 67890; blocked by process 11111.
Process 11111 waits for ShareLock on transaction 67891; blocked by process 12345.
HINT: See server log for query details.
The victim transaction receives a rollback. Your application must detect this error and retry.
Reproducing a Deadlock
-- Session 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- Locks row id=1
-- (wait before running next statement)
-- Session 2 (run while Session 1 is waiting)
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 2; -- Locks row id=2
UPDATE accounts SET balance = balance + 100 WHERE id = 1; -- Waits for Session 1's lock on id=1
-- Session 1 (now run)
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- Waits for Session 2's lock on id=2
-- Deadlock! PostgreSQL cancels one of the transactions.
COMMIT;
Deadlock Prevention: Consistent Lock Ordering
The canonical prevention technique is to always acquire locks in the same global order across all transactions. If every transfer always locks the lower account ID first, no cycle can form:
import psycopg2
from contextlib import contextmanager

@contextmanager
def get_connection(dsn: str):
    conn = psycopg2.connect(dsn)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

def transfer_funds(dsn: str, from_id: int, to_id: int, amount: float) -> None:
    """
    Transfer funds between accounts using consistent lock ordering to prevent deadlocks.
    Always acquires locks in ascending account ID order regardless of transfer direction.
    """
    # Sort IDs to ensure consistent lock ordering
    lock_first, lock_second = sorted([from_id, to_id])
    with get_connection(dsn) as conn:
        with conn.cursor() as cur:
            # Lock both rows in consistent order using SELECT FOR UPDATE
            cur.execute("""
                SELECT id, balance FROM accounts
                WHERE id IN (%s, %s)
                ORDER BY id  -- Critical: lock acquisition follows ascending ID order
                FOR UPDATE
            """, (lock_first, lock_second))
            rows = {row[0]: row[1] for row in cur.fetchall()}
            if from_id not in rows or to_id not in rows:
                raise ValueError("One or both accounts do not exist")
            if rows[from_id] < amount:
                raise ValueError(f"Insufficient funds: balance {rows[from_id]}, requested {amount}")
            cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, from_id))
            cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, to_id))
SKIP LOCKED: Queue Consumer Pattern
SELECT FOR UPDATE SKIP LOCKED is a powerful pattern for building work queues on top of PostgreSQL. Each consumer claims rows that no other consumer has locked, without waiting:
-- Queue consumer: atomically claim and process one job
-- SKIP LOCKED means: if a row is locked by another consumer, skip it entirely
-- This allows multiple consumers to work the queue concurrently without contention
WITH claimed AS (
SELECT id, payload, created_at
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED -- Skip rows locked by other consumers
)
UPDATE job_queue
SET status = 'processing', claimed_at = now(), worker_id = pg_backend_pid()
FROM claimed
WHERE job_queue.id = claimed.id
RETURNING job_queue.*;
import psycopg2
import json
import time

def consume_jobs(dsn: str, worker_id: str) -> None:
    """
    Queue consumer using SKIP LOCKED for contention-free concurrent processing.
    Multiple instances can run this function simultaneously without deadlocks.
    """
    conn = psycopg2.connect(dsn)
    while True:
        with conn.cursor() as cur:
            # Claim exactly one job atomically
            cur.execute("""
                WITH claimed AS (
                    SELECT id, payload
                    FROM job_queue
                    WHERE status = 'pending'
                    ORDER BY created_at
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED
                )
                UPDATE job_queue
                SET status = 'processing',
                    claimed_at = now(),
                    worker_id = %s
                FROM claimed
                WHERE job_queue.id = claimed.id
                RETURNING job_queue.id, job_queue.payload
            """, (worker_id,))
            job = cur.fetchone()
            conn.commit()
            if job is None:
                time.sleep(0.1)  # No work available; back off briefly
                continue
            job_id, payload = job
            try:
                process_job(json.loads(payload))
                # Mark as complete
                cur.execute("UPDATE job_queue SET status = 'done' WHERE id = %s", (job_id,))
                conn.commit()
            except Exception as e:
                # Mark as failed, allow retry
                cur.execute("""
                    UPDATE job_queue
                    SET status = 'failed', error = %s
                    WHERE id = %s
                """, (str(e), job_id))
                conn.commit()

def process_job(payload: dict) -> None:
    # Application-specific job processing
    pass
NOWAIT: Fail Fast on Lock Contention
Use NOWAIT when you want immediate failure rather than waiting:
-- Fail immediately if the row is locked rather than waiting
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
-- If locked: ERROR: could not obtain lock on row in relation "accounts"
This is useful in user-facing APIs where waiting for a lock would cause unacceptable latency. Handle the lock error at the application layer and return an appropriate response.
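One way to sketch that application-layer handling: translate the lock failure into a fast 409 response instead of a 500. SQLSTATE 55P03 is lock_not_available, and psycopg2 exposes the code on the exception's pgcode attribute; the handler and stub names below are hypothetical.

```python
LOCK_NOT_AVAILABLE = "55P03"  # SQLSTATE raised by FOR UPDATE NOWAIT on a locked row

def handle_account_update(do_update):
    """Hypothetical API-layer wrapper: map NOWAIT lock failures to HTTP 409."""
    try:
        do_update()
        return 200, "updated"
    except Exception as exc:
        if getattr(exc, "pgcode", None) == LOCK_NOT_AVAILABLE:
            # Someone else holds the row lock; fail fast and let the client retry
            return 409, "resource busy, try again"
        raise

# Stub standing in for psycopg2's LockNotAvailable error:
class FakeLockNotAvailable(Exception):
    pgcode = "55P03"

def contended_update():
    raise FakeLockNotAvailable()

print(handle_account_update(contended_update))  # (409, 'resource busy, try again')
```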
5. Optimistic vs Pessimistic Concurrency Control
Choosing between optimistic and pessimistic locking is one of the most consequential architectural decisions in concurrent data access design.
Pessimistic Concurrency Control
Pessimistic control assumes conflict is likely. You acquire a lock before reading, hold it through processing, and release at commit. SELECT FOR UPDATE is the standard mechanism:
-- Pessimistic: lock the row before reading it
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- Other transactions trying to UPDATE this row will wait here
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- Lock released at COMMIT
Pessimistic locking is correct for high-contention scenarios — financial operations, inventory deductions, anything where conflict is the common case. Its downside is serializing access: only one transaction can work on a row at a time.
Optimistic Concurrency Control
Optimistic control assumes conflict is rare. Read without locking. When you are ready to commit, verify that the data you read has not changed. If it has, abort and retry.
The standard implementation uses a version column:
-- Schema with version column
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
stock INT NOT NULL,
version INT NOT NULL DEFAULT 1
);
import psycopg2
import time

def deduct_stock_optimistic(
    dsn: str,
    product_id: int,
    quantity: int,
    max_retries: int = 5
) -> bool:
    """
    Optimistic locking with version column.
    Detects concurrent modification by checking that version hasn't changed.
    Returns True on success, raises after max_retries exhausted.
    """
    conn = psycopg2.connect(dsn)
    conn.autocommit = False
    for attempt in range(max_retries):
        try:
            with conn.cursor() as cur:
                # Step 1: Read current state without locking
                cur.execute(
                    "SELECT stock, version FROM products WHERE id = %s",
                    (product_id,)
                )
                row = cur.fetchone()
                if row is None:
                    raise ValueError(f"Product {product_id} not found")
                current_stock, current_version = row
                if current_stock < quantity:
                    raise ValueError(f"Insufficient stock: have {current_stock}, need {quantity}")
                # Step 2: Update, but only if version still matches what we read.
                # If another transaction modified the row, version will have changed
                # and this UPDATE will affect 0 rows — our signal to retry.
                cur.execute("""
                    UPDATE products
                    SET stock = stock - %s,
                        version = version + 1
                    WHERE id = %s
                      AND version = %s  -- Optimistic lock check
                """, (quantity, product_id, current_version))
                if cur.rowcount == 0:
                    # Version mismatch: another transaction modified the row.
                    # Roll back and retry with exponential backoff.
                    conn.rollback()
                    time.sleep(0.01 * (2 ** attempt))
                    continue
                conn.commit()
                return True
        except psycopg2.Error:
            conn.rollback()
            raise
    raise RuntimeError(f"Failed to deduct stock after {max_retries} attempts")
When to Use Each
| Scenario | Recommendation | Reason |
|---|---|---|
| Financial transfers | Pessimistic (SELECT FOR UPDATE) | Conflict is common; retry cost is high |
| Inventory deduction (flash sale) | Pessimistic with SKIP LOCKED | High contention during peak load |
| Profile updates | Optimistic (version column) | Low contention; clean UX without lock waits |
| Configuration changes | Optimistic | Rare writes, easy to retry |
| Long-running transactions | Optimistic | Holding locks for seconds causes cascading waits |
| Batch data processing | Pessimistic with NOWAIT | Fail-fast is preferable to queuing behind slow batches |
Optimistic locking achieves higher throughput when contention is low because no lock traffic touches the lock manager. Under high contention, it degrades badly — every transaction retries constantly, and the effective throughput drops below pessimistic locking.
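The version-check mechanic is essentially compare-and-swap: an update wins only if the version is unchanged since the read. The toy in-memory model below (my own construction, not a database client) shows the read-compare-retry loop; a threading.Lock stands in for the row-level atomicity of a single UPDATE statement.

```python
import threading

class VersionedRow:
    """In-memory stand-in for a row with a version column (toy model)."""
    def __init__(self, stock: int):
        self.stock = stock
        self.version = 1
        self._lock = threading.Lock()  # models the atomicity of one UPDATE statement

    def compare_and_update(self, expected_version: int, delta: int) -> bool:
        # Mirrors: UPDATE ... SET stock = stock + delta, version = version + 1
        #          WHERE version = expected_version
        with self._lock:
            if self.version != expected_version:
                return False  # rowcount == 0: another writer committed first
            self.stock += delta
            self.version += 1
            return True

def deduct(row: VersionedRow, qty: int, max_retries: int = 200) -> bool:
    for _ in range(max_retries):
        seen_version = row.version          # optimistic read, no lock held
        if row.stock < qty:
            return False                    # out of stock
        if row.compare_and_update(seen_version, -qty):
            return True                     # version matched: our write won
        # version changed underneath us: loop and retry with fresh state
    return False

row = VersionedRow(stock=100)
threads = [threading.Thread(target=deduct, args=(row, 1)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(row.stock, row.version)  # 0 101: all 100 deductions applied exactly once
```

Each failed compare corresponds to some other writer's success, which is exactly why throughput collapses under high contention: most attempts become retries.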
6. Distributed Transactions
Distributed transactions are qualitatively harder than single-database transactions. When your operation spans two databases — or a database and a message queue — you lose the atomicity guarantee that a single BEGIN ... COMMIT provides.
Why You Cannot Just BEGIN in Two Databases
Consider an order placement that must:
1. Deduct inventory from a PostgreSQL database.
2. Publish an order_placed event to a Kafka topic.
There is no single COMMIT that covers both. If you write to PostgreSQL and then Kafka fails, your inventory is decremented but no order event was published. If Kafka succeeds and PostgreSQL crashes, the event exists but the inventory is not decremented.
Two-Phase Commit (2PC)
2PC is the classic solution: a coordinator sends PREPARE to all participants, waits for all to acknowledge readiness, then sends COMMIT or ROLLBACK. PostgreSQL supports 2PC natively via PREPARE TRANSACTION.
-- Coordinator: Phase 1 (Prepare)
BEGIN;
UPDATE inventory SET stock = stock - 1 WHERE product_id = 42;
PREPARE TRANSACTION 'order_12345'; -- Row is locked until COMMIT PREPARED or ROLLBACK PREPARED
-- Coordinator: Phase 2 (Commit, after all participants prepared)
COMMIT PREPARED 'order_12345';
-- Or rollback if any participant failed
ROLLBACK PREPARED 'order_12345';
2PC has a critical failure mode: if the coordinator crashes after sending PREPARE but before sending COMMIT, participants remain in a prepared state holding locks — indefinitely, until an operator resolves the situation. For this reason, 2PC is often avoided in favor of the Saga pattern.
The Outbox Pattern: Atomic Database + Message Queue
The outbox pattern solves the database/queue atomicity problem without 2PC. The key insight: write the event into an outbox table in the same transaction as your business data change. A separate relay process reads the outbox and publishes to the queue. If the relay crashes, it replays from the outbox. If the database transaction rolls back, the outbox row is never written.
-- Outbox table schema
CREATE TABLE outbox (
id BIGSERIAL PRIMARY KEY,
aggregate_type TEXT NOT NULL, -- e.g., 'order'
aggregate_id TEXT NOT NULL, -- e.g., order UUID
event_type TEXT NOT NULL, -- e.g., 'order_placed'
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT now(),
published_at TIMESTAMPTZ, -- NULL = not yet published
sequence_no BIGSERIAL -- For ordered delivery
);
import psycopg2
import json
import uuid
from datetime import datetime

def place_order(dsn: str, user_id: int, product_id: int, quantity: int) -> str:
    """
    Place an order using the outbox pattern.
    The order record and the outbox event are written in the same transaction.
    If this transaction commits, the event will eventually be published.
    If it rolls back, neither the order nor the event persists.
    """
    order_id = str(uuid.uuid4())
    conn = psycopg2.connect(dsn)
    conn.autocommit = False
    try:
        with conn.cursor() as cur:
            # Business operation 1: deduct inventory
            cur.execute("""
                UPDATE inventory
                SET stock = stock - %s
                WHERE product_id = %s AND stock >= %s
            """, (quantity, product_id, quantity))
            if cur.rowcount == 0:
                raise ValueError("Insufficient inventory")
            # Business operation 2: create order record
            cur.execute("""
                INSERT INTO orders (id, user_id, product_id, quantity, status, created_at)
                VALUES (%s, %s, %s, %s, 'pending', now())
            """, (order_id, user_id, product_id, quantity))
            # Outbox entry: written in the SAME transaction as business data.
            # This is the atomicity guarantee — both commit or neither does.
            event_payload = {
                "order_id": order_id,
                "user_id": user_id,
                "product_id": product_id,
                "quantity": quantity,
                "timestamp": datetime.utcnow().isoformat()
            }
            cur.execute("""
                INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload)
                VALUES ('order', %s, 'order_placed', %s)
            """, (order_id, json.dumps(event_payload)))
        conn.commit()
        return order_id
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
def relay_outbox_to_kafka(dsn: str, kafka_producer) -> int:
    """
    Relay worker: reads unpublished outbox events and publishes them to Kafka.
    Uses SKIP LOCKED so multiple relay workers can run without contention.
    Idempotency: records published_at to prevent double-publishing.
    Returns count of events published.
    """
    conn = psycopg2.connect(dsn)
    conn.autocommit = False
    published_count = 0
    try:
        with conn.cursor() as cur:
            # Claim a batch of unpublished events
            cur.execute("""
                SELECT id, aggregate_type, aggregate_id, event_type, payload
                FROM outbox
                WHERE published_at IS NULL
                ORDER BY sequence_no
                LIMIT 100
                FOR UPDATE SKIP LOCKED
            """)
            events = cur.fetchall()
            for event_id, agg_type, agg_id, event_type, payload in events:
                # Publish to Kafka (outside the database transaction)
                kafka_producer.produce(
                    topic=f"{agg_type}.events",
                    key=agg_id,
                    value=json.dumps({
                        "event_type": event_type,
                        "payload": payload
                    })
                )
                kafka_producer.flush()
                # Mark as published — only after Kafka confirms receipt
                cur.execute("""
                    UPDATE outbox
                    SET published_at = now()
                    WHERE id = %s
                """, (event_id,))
                published_count += 1
        conn.commit()
        return published_count
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
The outbox pattern delivers at-least-once semantics: if the relay crashes after publishing to Kafka but before marking published_at, it will republish on restart. Consumers must handle duplicate events (typically via idempotency keys).
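Consumer-side deduplication can be as simple as remembering processed event IDs. Here is an in-memory sketch (my own construction); a real consumer would persist the seen IDs, for example in a processed_events table updated in the same transaction as the consumer's side effects.

```python
class IdempotentConsumer:
    """Drops duplicate deliveries by remembering event IDs already handled."""
    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()   # in production: a persistent store, not memory

    def handle(self, event_id: str, payload: dict) -> bool:
        if event_id in self.seen_ids:
            return False        # duplicate delivery: effect already applied
        self.handler(payload)
        self.seen_ids.add(event_id)
        return True

shipped = []
consumer = IdempotentConsumer(lambda p: shipped.append(p["order_id"]))

# At-least-once delivery: the relay may send the same event twice
consumer.handle("evt-1", {"order_id": "A"})
consumer.handle("evt-1", {"order_id": "A"})  # duplicate, ignored
consumer.handle("evt-2", {"order_id": "B"})
print(shipped)  # ['A', 'B']: each event's effect applied exactly once
```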
Saga Pattern for Long-Running Operations
When an operation spans multiple services and cannot be wrapped in a single transaction, the Saga pattern models it as a sequence of local transactions, each with a compensating transaction that undoes its effect if a later step fails.
# Saga: place order across three services
# Each step has a compensating action

def order_saga(order_request: dict) -> None:
    steps_completed = []
    try:
        # Step 1: Reserve inventory
        reservation = reserve_inventory(order_request["product_id"], order_request["quantity"])
        steps_completed.append(("inventory", reservation["reservation_id"]))
        # Step 2: Charge payment
        payment = charge_payment(order_request["user_id"], order_request["total"])
        steps_completed.append(("payment", payment["charge_id"]))
        # Step 3: Create shipment
        shipment = create_shipment(order_request)
        steps_completed.append(("shipment", shipment["shipment_id"]))
    except Exception:
        # Compensate in reverse order
        for service, resource_id in reversed(steps_completed):
            if service == "inventory":
                release_inventory_reservation(resource_id)
            elif service == "payment":
                refund_payment(resource_id)
            elif service == "shipment":
                cancel_shipment(resource_id)
        raise
Sagas are eventually consistent — there is a window during which the saga is partially applied. Design your system so that partially-applied states are either invisible to users or explicitly surfaced (e.g., "order pending" rather than immediate confirmation).
Conclusion
Database transactions are not a checkbox feature — they are a precise contract between your application and the database about which anomalies are and are not possible. The defaults (Read Committed isolation, an implicit transaction per statement) are sensible for most applications, but "most" is not "all."
The practical takeaways:
Choose isolation levels deliberately. Read Committed is correct for simple read-and-write operations, but any "read the current state, then decide" pattern needs at minimum Repeatable Read or Serializable. Add BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE to financial operations and be prepared to retry on serialization failure.
Understand what MVCC costs. Dead tuple accumulation is not a minor housekeeping concern — it affects query performance (table bloat), index bloat, and eventually transaction ID wraparound. Monitor n_dead_tup in pg_stat_user_tables. Ensure autovacuum is not chronically blocked by long-running transactions.
Practice deadlock prevention as discipline, not reaction. Consistent lock ordering is a team convention that should be enforced in code review. A single transaction that acquires locks in a different order from every other transaction is a latent deadlock waiting for the right concurrent load. The SKIP LOCKED pattern for queue consumers eliminates an entire class of contention by design.
Avoid distributed transactions where possible. The outbox pattern covers the most common cross-system consistency requirement (database + message queue) without 2PC's coordinator failure modes. For multi-service operations, sagas with compensating transactions are the pragmatic choice — they trade strict consistency for resilience and explicit failure handling.
Transaction correctness is not an advanced concern. It is a foundational property of any system that handles money, inventory, or any resource with finite supply. The cost of getting it right is a few hours of careful design. The cost of getting it wrong is production data corruption that may take days to discover and weeks to fully understand.
Sources
- PostgreSQL Documentation — Transaction Isolation: https://www.postgresql.org/docs/current/transaction-iso.html
- PostgreSQL Documentation — MVCC: https://www.postgresql.org/docs/current/mvcc.html
- PostgreSQL Documentation — Explicit Locking: https://www.postgresql.org/docs/current/explicit-locking.html
- PostgreSQL Documentation — PREPARE TRANSACTION: https://www.postgresql.org/docs/current/sql-prepare-transaction.html
- Pattern: Transactional Outbox — Chris Richardson, microservices.io
- Serializable Snapshot Isolation in PostgreSQL — Ports and Grittner, VLDB 2012