Database Transactions in 2026: ACID, Isolation Levels, MVCC, and Deadlock Prevention

Introduction
Transaction bugs are among the most insidious failures in production systems. They almost never appear in your local test environment. They don't produce clean stack traces. They surface as subtle data corruption — a bank balance that is off by the exact amount of a concurrent withdrawal, a cart that charges the same order twice during a flash sale, an inventory count that goes negative despite a CHECK constraint. By the time you notice something is wrong, the transaction that caused the issue finished hours ago and left no obvious trace.
What makes this category of bug especially dangerous is its concurrency dependency. A function that transfers money between two accounts can pass a thousand unit tests and still corrupt data the moment two users call it at the same millisecond. You need real concurrent load to trigger these anomalies, and real concurrent load typically means production.
The underlying mechanics that prevent these bugs — ACID properties, isolation levels, and Multi-Version Concurrency Control — have been part of relational databases for decades. But knowing that "ACID means reliable" is not enough to write correct concurrent code. You need to understand exactly what each ACID property guarantees, what it does not guarantee, and what happens when you trade one guarantee for another.
Isolation levels, in particular, are widely misunderstood. Most developers know their database has them. Few know which anomalies each level actually prevents. Fewer still understand that PostgreSQL's default, Read Committed, permits non-repeatable reads, or that Serializable is implemented with Serializable Snapshot Isolation rather than by actually executing transactions one at a time.
MVCC — the mechanism PostgreSQL uses to implement isolation — explains behaviors that otherwise seem paradoxical: why reads never block writes, why VACUUM is a correctness requirement rather than optional maintenance, and why you can observe a row version that was deleted by another transaction without holding any lock.
This post works through all of it with concrete examples. Every anomaly is demonstrated with actual SQL. Every pattern has working code you can run against a real PostgreSQL instance. The goal is not a survey — it is a working mental model you can apply to every concurrent data access pattern you write.
1. ACID: What Each Property Actually Means in Practice
ACID is four words that each hide a large amount of implementation complexity. Here is what each property actually guarantees — and what it does not.
Atomicity: All or Nothing, Not Eventually Consistent Parts
Atomicity means a transaction either commits completely or has zero effect. If your BEGIN ... COMMIT block contains twenty SQL statements and statement fifteen fails, none of the first fourteen persist. The database rolls back to the state before BEGIN.
This is not the same as eventual consistency. Atomicity is a synchronous, binary guarantee. There is no "partial commit that will be cleaned up later." If you ROLLBACK — whether explicitly or because the connection drops — every change in that transaction disappears.
The classic bank transfer shows why this matters:
-- Without atomicity, a crash here causes permanent inconsistency
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
-- If the process crashes HERE, account 1 has lost $500 and account 2 gained nothing
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;
PostgreSQL implements atomicity through its Write-Ahead Log (WAL). Before any data page is modified on disk, a WAL record describing the change is written and flushed. If the server crashes mid-transaction, recovery replays the WAL — but only for committed transactions. Uncommitted WAL records are ignored during recovery.
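The all-or-nothing contract is easy to demonstrate. The sketch below uses stdlib sqlite3 purely so it runs anywhere without a PostgreSQL instance; the atomicity guarantee it illustrates is the same one described above. Table and values mirror the bank-transfer example.

```python
import sqlite3

# Illustrative only: stdlib sqlite3 stands in for PostgreSQL here, but the
# atomicity contract is identical: a failed transaction leaves no trace.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 1000), (2, 1000)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")
        # Simulated crash before the matching credit to account 2:
        raise RuntimeError("crash between debit and credit")
except RuntimeError:
    pass

# The debit rolled back along with the rest of the transaction.
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 1000, 2: 1000}
```

No "account 1 lost $500 and account 2 gained nothing" state is observable after the rollback.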
Consistency: Application Invariants Are Your Responsibility
Consistency is the most misunderstood ACID property. The database enforces structural constraints: NOT NULL, CHECK, UNIQUE, FOREIGN KEY. It will not let you insert a row that violates a foreign key or a balance that violates a CHECK constraint.
But business rule consistency is your code's job. If you have the invariant "a user's total order value must never exceed their credit limit," the database will not enforce that automatically. Your transaction logic must read the current total, compute the new total, verify it does not exceed the limit, and only then insert the order — inside a single transaction with sufficient isolation.
This distinction matters because it defines where bugs live. A database can only be as consistent as the invariants you express. Leaving a business invariant unenforced in code and trusting the database to catch it is a common source of data corruption.
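The credit-limit invariant mentioned above can be sketched as a read-check-write sequence inside one transaction. This is illustrative only (stdlib sqlite3 for runnability; the orders table and CREDIT_LIMIT constant are hypothetical), and note that under concurrent load this read-then-decide pattern additionally needs locking or stronger isolation, as later sections discuss.

```python
import sqlite3

CREDIT_LIMIT = 1000  # hypothetical business rule, not a database constraint

def place_order(conn: sqlite3.Connection, user_id: int, amount: int) -> None:
    # The schema enforces structure; this function enforces the business invariant.
    with conn:  # single transaction: check and insert commit or roll back together
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        if total + amount > CREDIT_LIMIT:
            raise ValueError(f"order would exceed credit limit: {total + amount} > {CREDIT_LIMIT}")
        conn.execute("INSERT INTO orders (user_id, amount) VALUES (?, ?)", (user_id, amount))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
place_order(conn, 42, 800)       # accepted
try:
    place_order(conn, 42, 300)   # 800 + 300 > 1000: rejected, nothing inserted
except ValueError as e:
    print(e)
```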
Isolation: Concurrent Transactions Should Not Interfere
Isolation is the property that directly governs concurrency bugs. Ideally, each transaction executes as if it were the only transaction in the system. Realistically, enforcing full isolation has a performance cost, so databases expose isolation levels that let you trade isolation for throughput.
The degree of isolation determines which concurrency anomalies are possible. We will cover these in detail in Section 2. For now: the default PostgreSQL isolation level — Read Committed — prevents dirty reads but allows non-repeatable reads. Most of the subtle bugs described in this post are permitted at Read Committed.
Durability: What "Committed" Actually Means
Durability means once the database returns success from a COMMIT, the data survives any subsequent crash. This sounds obvious but has implementation consequences.
PostgreSQL's durability guarantee depends on fsync. When a transaction commits, PostgreSQL calls fsync() to flush WAL records to disk before returning to the client. If fsync is disabled (a dangerous but sometimes-seen optimization), a crash can lose committed transactions.
"Committed" means the WAL record reached durable storage. It does not mean the data is immediately visible in the heap files — PostgreSQL may not have flushed the actual data pages yet. Recovery will replay the WAL and reconstruct those pages if needed.
ACID vs BASE: An Intentional Trade-off
BASE (Basically Available, Soft state, Eventually consistent) is not a degraded version of ACID — it is a different contract chosen for different workload profiles. Systems like Cassandra or DynamoDB choose availability and partition tolerance over consistency. This is appropriate when you need to write across multiple data centers and can tolerate briefly stale reads.
The problem is when teams accept eventual consistency by accident — using a message queue where they needed a synchronous transaction, or reading from a replica without understanding replication lag. The choice between ACID and BASE should be explicit, documented, and made with full understanding of which anomalies become possible.
Architecture

The diagram below shows two concurrent transactions — one with isolation enforced and one without — and how their intermediate states interact:
2. Isolation Levels and the Anomalies They Permit
PostgreSQL implements four isolation levels defined by the SQL standard, though Read Uncommitted is treated identically to Read Committed internally.
Read Uncommitted: Dirty Reads
A dirty read occurs when transaction T2 reads a row that T1 has modified but not yet committed. If T1 rolls back, T2 has read data that never officially existed.
PostgreSQL never permits dirty reads. Even if you request READ UNCOMMITTED, it behaves as Read Committed. The anomaly is possible in MySQL and SQL Server at READ UNCOMMITTED, however, which is why that level should essentially never be used.
Read Committed: Non-Repeatable Reads
PostgreSQL's default. Each statement within a transaction sees a fresh snapshot of committed data. This means:
- No dirty reads (you only see committed data).
- Non-repeatable reads are possible: if you SELECT balance twice in the same transaction, and another transaction commits a change between your two reads, you see different values.
-- Session 1
BEGIN;
SELECT balance FROM accounts WHERE id = 1;
-- Returns: 1000
-- Session 2 (runs concurrently)
BEGIN;
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;
-- Session 1 (still in same transaction)
SELECT balance FROM accounts WHERE id = 1;
-- Returns: 500 ← different value! Non-repeatable read.
COMMIT;
This is the anomaly that breaks the "read then decide" pattern. If your transaction reads a value to make a business decision, and that value can change before you act on it, your decision is based on stale data.
Repeatable Read: Phantom Reads
At REPEATABLE READ, PostgreSQL takes a snapshot at the start of the transaction (not per-statement as in Read Committed). The same SELECT returns the same rows no matter how many times you run it — but new rows inserted by other transactions can still appear.
-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM orders WHERE user_id = 42;
-- Returns: 5
-- Session 2 (runs concurrently)
BEGIN;
INSERT INTO orders (user_id, amount) VALUES (42, 99.99);
COMMIT;
-- Session 1 (still in same transaction)
SELECT COUNT(*) FROM orders WHERE user_id = 42;
-- At REPEATABLE READ in PostgreSQL: returns 5 (snapshot isolation prevents phantom)
-- Note: PostgreSQL's REPEATABLE READ actually prevents phantom reads too,
-- but write skew is still possible (see Serializable section)
COMMIT;
Note: PostgreSQL's implementation of Repeatable Read is actually stronger than the SQL standard requires — it uses snapshot isolation which incidentally prevents phantom reads. However, write skew anomalies remain possible.
Serializable: Full Isolation
PostgreSQL uses Serializable Snapshot Isolation (SSI) — a technique that detects when a set of concurrent transactions would produce a result impossible under any serial execution, and aborts one of them.
-- Classic write skew: two doctors both check if someone else is on call
-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM on_call WHERE shift_id = 1;
-- Returns: 2 (two doctors on call)
-- Decides: it's safe to go off call since 2 > 1
UPDATE on_call SET status = 'off' WHERE doctor_id = 101 AND shift_id = 1;
COMMIT; -- May fail with: ERROR: could not serialize access
-- Session 2 (concurrent)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM on_call WHERE shift_id = 1;
-- Also returns: 2
-- Also decides: safe to go off call
UPDATE on_call SET status = 'off' WHERE doctor_id = 102 AND shift_id = 1;
COMMIT; -- One of these commits will succeed; the other will fail
SSI tracks read/write dependencies between transactions. When it detects a cycle (T1 read what T2 will write, T2 read what T1 will write), one transaction is aborted with a serialization failure. Your application must be prepared to retry.
Anomaly Matrix
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Write Skew |
|---|---|---|---|---|
| Read Uncommitted | Prevented (PostgreSQL treats it as Read Committed) | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Prevented in PostgreSQL (the standard allows it) | Possible |
| Serializable | Prevented | Prevented | Prevented | Prevented |
Choosing the right level: For most CRUD web applications, Read Committed is appropriate. For financial operations — balance checks, inventory deductions, anything with a "check then act" pattern — use Serializable and handle serialization failures with retries. The performance overhead of SSI is typically 5-15% in write-heavy workloads and near-zero for read-heavy workloads.
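A retry loop for these failures might look like the following sketch. It keys off SQLSTATE codes (40001 is serialization_failure, 40P01 is deadlock_detected); psycopg2 exposes the code on the exception's pgcode attribute. The helper and the stub exception below are illustrative, not part of any library.

```python
import random
import time

RETRYABLE_SQLSTATES = {"40001", "40P01"}  # serialization_failure, deadlock_detected

def run_with_retries(txn_fn, max_attempts=5, base_delay=0.05):
    """Run txn_fn(); retry when it raises an error whose pgcode is retryable."""
    for attempt in range(max_attempts):
        try:
            return txn_fn()
        except Exception as exc:
            if getattr(exc, "pgcode", None) not in RETRYABLE_SQLSTATES:
                raise  # not a transient concurrency error
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to decorrelate competing retries
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Demo with a stub standing in for psycopg2's SerializationFailure:
class FakeSerializationFailure(Exception):
    pgcode = "40001"

attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeSerializationFailure()
    return "committed"

result = run_with_retries(flaky_txn)
print(result)  # committed (on the third attempt)
```

In production, txn_fn must open a fresh transaction each call; retrying inside an aborted transaction fails immediately.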
3. MVCC: How PostgreSQL Implements Isolation Without Locking Reads
Multi-Version Concurrency Control is the mechanism that lets PostgreSQL give readers and writers independent progress without coordination. Readers never block writers. Writers never block readers. This is the fundamental reason PostgreSQL can maintain high read throughput under concurrent write load.
Row Versions: xmin and xmax
Every row in a PostgreSQL heap has two hidden system columns:
- xmin: the transaction ID that inserted this row version.
- xmax: the transaction ID that deleted or updated this row version (zero if the version is still live).
When you UPDATE a row, PostgreSQL does not modify the existing row. It marks the old row version with xmax = current_transaction_id and inserts a new row version with xmin = current_transaction_id. The old version remains on the heap until VACUUM removes it.
You can observe this directly:
-- Create a test table and observe xmin/xmax
CREATE TABLE mvcc_demo (id int, val text);
INSERT INTO mvcc_demo VALUES (1, 'original');
-- See the row's transaction metadata
SELECT id, val, xmin, xmax FROM mvcc_demo;
-- xmin shows the transaction that inserted this row
-- xmax is 0 (row is live, not deleted/updated)
-- Now update the row in a transaction
BEGIN;
UPDATE mvcc_demo SET val = 'updated' WHERE id = 1;
-- In another session, query the table:
-- SELECT id, val, xmin, xmax FROM mvcc_demo;
-- You'll see BOTH the old row (xmax = current_txid) and new row (xmin = current_txid)
COMMIT;
-- After commit, VACUUM can remove the old row version
VACUUM mvcc_demo;
Snapshot Visibility
At transaction start (or statement start for Read Committed), PostgreSQL records:
1. The current transaction ID.
2. The set of all currently active (in-progress) transaction IDs.
3. The highest transaction ID assigned so far.
A row version is visible to a snapshot if:
- Its xmin committed before the snapshot was taken (and is not in the active set).
- Its xmax is either zero, or has not committed by the time the snapshot was taken.
This is why a long-running transaction always sees the same snapshot of the data, even as other transactions commit. It does not hold any locks on those rows — it simply does not consider newer committed versions visible.
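The visibility rules above can be expressed as a small predicate. This is a deliberately simplified model with names of my own choosing; real PostgreSQL also consults commit-status hint bits, subtransactions, and the command ID within a transaction.

```python
def version_visible(xmin, xmax, snapshot_active, snapshot_next_xid, committed):
    """
    Simplified MVCC visibility check for one row version.
    snapshot_active:   set of xids in progress when the snapshot was taken
    snapshot_next_xid: first xid NOT yet assigned at snapshot time
    committed:         set of xids known to have committed
    """
    # The inserting transaction must have committed before the snapshot
    inserted_visible = (
        xmin in committed
        and xmin < snapshot_next_xid
        and xmin not in snapshot_active
    )
    if not inserted_visible:
        return False
    if xmax == 0:
        return True  # never deleted or updated: visible
    # Deleted, but the deletion only counts if the deleter committed before the snapshot
    deleter_committed = (
        xmax in committed
        and xmax < snapshot_next_xid
        and xmax not in snapshot_active
    )
    return not deleter_committed

# Snapshot taken while xid 105 was still running; xids below 110 assigned so far
active, next_xid = {105}, 110
committed = {100, 103, 105}  # 105 committed AFTER the snapshot was taken

assert version_visible(100, 0, active, next_xid, committed)      # old live row
assert version_visible(100, 105, active, next_xid, committed)    # deleter invisible to us
assert not version_visible(105, 0, active, next_xid, committed)  # inserted by in-progress xid
```

The second assertion is exactly the "observe a row version deleted by another transaction" behavior described earlier: the deleter was in progress at snapshot time, so the snapshot still sees the old version.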
VACUUM: A Correctness Requirement
Because old row versions accumulate on the heap, VACUUM is not optional maintenance — it is a correctness requirement for two reasons:
- Space reclamation: Without VACUUM, heap files grow without bound as updates accumulate dead tuples.
- Transaction ID wraparound: PostgreSQL uses 32-bit transaction IDs. After ~2 billion transactions, IDs wrap around. PostgreSQL prevents this by running aggressive autovacuum on tables approaching the wraparound threshold. If VACUUM is blocked for too long, PostgreSQL will stop accepting writes and demand manual VACUUM — a production incident.
-- Monitor dead tuple accumulation
SELECT relname, n_live_tup, n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
HOT Updates: Avoiding Index Churn
If you update a column that is not indexed, PostgreSQL can use a Heap Only Tuple (HOT) update: the new row version is placed on the same heap page as the old version, and no index entry is created for the new version. The index still points to the old row, which has a pointer to the new version.
HOT updates significantly reduce write amplification for tables with many indexes. They apply only when none of the updated columns is indexed and the new row version fits on the same heap page.

4. Deadlocks: Detection, Prevention, and Patterns
A deadlock occurs when two transactions are each waiting for a lock held by the other. Neither can proceed. Left unresolved, both would wait forever.
PostgreSQL Deadlock Detection
PostgreSQL's lock manager detects deadlocks by periodically checking the wait-for graph. The detection interval is controlled by deadlock_timeout (default: 1 second). When a cycle is detected, PostgreSQL cancels one transaction (the "victim") with:
ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 67890; blocked by process 11111.
Process 11111 waits for ShareLock on transaction 67891; blocked by process 12345.
HINT: See server log for query details.
The victim transaction receives a rollback. Your application must detect this error and retry.
Reproducing a Deadlock
-- Session 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- Locks row id=1
-- (wait before running next statement)
-- Session 2 (run while Session 1 is waiting)
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 2; -- Locks row id=2
UPDATE accounts SET balance = balance + 100 WHERE id = 1; -- Waits for Session 1's lock on id=1
-- Session 1 (now run)
UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- Waits for Session 2's lock on id=2
-- Deadlock! PostgreSQL cancels one of the transactions.
COMMIT;
Deadlock Prevention: Consistent Lock Ordering
The canonical prevention technique is to always acquire locks in the same global order across all transactions. If every transfer always locks the lower account ID first, no cycle can form:
import psycopg2
from contextlib import contextmanager

@contextmanager
def get_connection(dsn: str):
    conn = psycopg2.connect(dsn)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

def transfer_funds(dsn: str, from_id: int, to_id: int, amount: float) -> None:
    """
    Transfer funds between accounts using consistent lock ordering to prevent deadlocks.
    Always acquires locks in ascending account ID order regardless of transfer direction.
    """
    # Sort IDs to ensure consistent lock ordering
    lock_first, lock_second = sorted([from_id, to_id])
    with get_connection(dsn) as conn:
        with conn.cursor() as cur:
            # Lock both rows in consistent order using SELECT FOR UPDATE
            cur.execute("""
                SELECT id, balance FROM accounts
                WHERE id IN (%s, %s)
                ORDER BY id  -- Critical: lock acquisition follows ascending ID order
                FOR UPDATE
            """, (lock_first, lock_second))
            rows = {row[0]: row[1] for row in cur.fetchall()}
            if from_id not in rows or to_id not in rows:
                raise ValueError("One or both accounts do not exist")
            if rows[from_id] < amount:
                raise ValueError(f"Insufficient funds: balance {rows[from_id]}, requested {amount}")
            cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, from_id))
            cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, to_id))
SKIP LOCKED: Queue Consumer Pattern
SELECT FOR UPDATE SKIP LOCKED is a powerful pattern for building work queues on top of PostgreSQL. Each consumer claims rows that no other consumer has locked, without waiting:
-- Queue consumer: atomically claim and process one job
-- SKIP LOCKED means: if a row is locked by another consumer, skip it entirely
-- This allows multiple consumers to work the queue concurrently without contention
WITH claimed AS (
SELECT id, payload, created_at
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED -- Skip rows locked by other consumers
)
UPDATE job_queue
SET status = 'processing', claimed_at = now(), worker_id = pg_backend_pid()
FROM claimed
WHERE job_queue.id = claimed.id
RETURNING job_queue.*;
import psycopg2
import json
import time

def consume_jobs(dsn: str, worker_id: str) -> None:
    """
    Queue consumer using SKIP LOCKED for contention-free concurrent processing.
    Multiple instances can run this function simultaneously without deadlocks.
    """
    conn = psycopg2.connect(dsn)
    while True:
        with conn.cursor() as cur:
            # Claim exactly one job atomically
            cur.execute("""
                WITH claimed AS (
                    SELECT id, payload
                    FROM job_queue
                    WHERE status = 'pending'
                    ORDER BY created_at
                    LIMIT 1
                    FOR UPDATE SKIP LOCKED
                )
                UPDATE job_queue
                SET status = 'processing',
                    claimed_at = now(),
                    worker_id = %s
                FROM claimed
                WHERE job_queue.id = claimed.id
                RETURNING job_queue.id, job_queue.payload
            """, (worker_id,))
            job = cur.fetchone()
            conn.commit()
            if job is None:
                time.sleep(0.1)  # No work available; back off briefly
                continue
            job_id, payload = job
            try:
                process_job(json.loads(payload))
                # Mark as complete
                cur.execute("UPDATE job_queue SET status = 'done' WHERE id = %s", (job_id,))
                conn.commit()
            except Exception as e:
                # Mark as failed, allow retry
                cur.execute("""
                    UPDATE job_queue
                    SET status = 'failed', error = %s
                    WHERE id = %s
                """, (str(e), job_id))
                conn.commit()

def process_job(payload: dict) -> None:
    # Application-specific job processing
    pass
NOWAIT: Fail Fast on Lock Contention
Use NOWAIT when you want immediate failure rather than waiting:
-- Fail immediately if the row is locked rather than waiting
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
-- If locked: ERROR: could not obtain lock on row in relation "accounts"
This is useful in user-facing APIs where waiting for a lock would cause unacceptable latency. Handle the lock error at the application layer and return an appropriate response.
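One way to sketch that application-layer handling: translate the lock failure into a fast 409 response instead of a 500. SQLSTATE 55P03 is lock_not_available, and psycopg2 exposes the code on the exception's pgcode attribute; the handler and stub names below are hypothetical.

```python
LOCK_NOT_AVAILABLE = "55P03"  # SQLSTATE raised by FOR UPDATE NOWAIT on a locked row

def handle_account_update(do_update):
    """Hypothetical API-layer wrapper: map NOWAIT lock failures to HTTP 409."""
    try:
        do_update()
        return 200, "updated"
    except Exception as exc:
        if getattr(exc, "pgcode", None) == LOCK_NOT_AVAILABLE:
            # Someone else holds the row lock; fail fast and let the client retry
            return 409, "resource busy, try again"
        raise

# Stub standing in for psycopg2's LockNotAvailable error:
class FakeLockNotAvailable(Exception):
    pgcode = "55P03"

def contended_update():
    raise FakeLockNotAvailable()

print(handle_account_update(contended_update))  # (409, 'resource busy, try again')
```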
5. Optimistic vs Pessimistic Concurrency Control
Choosing between optimistic and pessimistic locking is one of the most consequential architectural decisions in concurrent data access design.
Pessimistic Concurrency Control
Pessimistic control assumes conflict is likely. You acquire a lock before reading, hold it through processing, and release at commit. SELECT FOR UPDATE is the standard mechanism:
-- Pessimistic: lock the row before reading it
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- Other transactions trying to UPDATE this row will wait here
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
-- Lock released at COMMIT
Pessimistic locking is correct for high-contention scenarios — financial operations, inventory deductions, anything where conflict is the common case. Its downside is serializing access: only one transaction can work on a row at a time.
Optimistic Concurrency Control
Optimistic control assumes conflict is rare. Read without locking. When you are ready to commit, verify that the data you read has not changed. If it has, abort and retry.
The standard implementation uses a version column:
-- Schema with version column
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
stock INT NOT NULL,
version INT NOT NULL DEFAULT 1
);
import psycopg2
import time

def deduct_stock_optimistic(
    dsn: str,
    product_id: int,
    quantity: int,
    max_retries: int = 5
) -> bool:
    """
    Optimistic locking with version column.
    Detects concurrent modification by checking that version hasn't changed.
    Returns True on success, raises after max_retries exhausted.
    """
    conn = psycopg2.connect(dsn)
    conn.autocommit = False
    for attempt in range(max_retries):
        try:
            with conn.cursor() as cur:
                # Step 1: Read current state without locking
                cur.execute(
                    "SELECT stock, version FROM products WHERE id = %s",
                    (product_id,)
                )
                row = cur.fetchone()
                if row is None:
                    raise ValueError(f"Product {product_id} not found")
                current_stock, current_version = row
                if current_stock < quantity:
                    raise ValueError(f"Insufficient stock: have {current_stock}, need {quantity}")
                # Step 2: Update, but only if version still matches what we read.
                # If another transaction modified the row, version will have changed
                # and this UPDATE will affect 0 rows — our signal to retry.
                cur.execute("""
                    UPDATE products
                    SET stock = stock - %s,
                        version = version + 1
                    WHERE id = %s
                      AND version = %s  -- Optimistic lock check
                """, (quantity, product_id, current_version))
                if cur.rowcount == 0:
                    # Version mismatch: another transaction modified the row.
                    # Roll back and retry with exponential backoff.
                    conn.rollback()
                    time.sleep(0.01 * (2 ** attempt))
                    continue
                conn.commit()
                return True
        except psycopg2.Error:
            conn.rollback()
            raise
    raise RuntimeError(f"Failed to deduct stock after {max_retries} attempts")
When to Use Each
| Scenario | Recommendation | Reason |
|---|---|---|
| Financial transfers | Pessimistic (SELECT FOR UPDATE) | Conflict is common; retry cost is high |
| Inventory deduction (flash sale) | Pessimistic with SKIP LOCKED | High contention during peak load |
| Profile updates | Optimistic (version column) | Low contention; clean UX without lock waits |
| Configuration changes | Optimistic | Rare writes, easy to retry |
| Long-running transactions | Optimistic | Holding locks for seconds causes cascading waits |
| Batch data processing | Pessimistic with NOWAIT | Fail-fast is preferable to queuing behind slow batches |
Optimistic locking achieves higher throughput when contention is low because no lock traffic touches the lock manager. Under high contention, it degrades badly — every transaction retries constantly, and the effective throughput drops below pessimistic locking.
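The version-check mechanic is essentially compare-and-swap: an update wins only if the version is unchanged since the read. The toy in-memory model below (my own construction, not a database client) shows the read-compare-retry loop; a threading.Lock stands in for the row-level atomicity of a single UPDATE statement.

```python
import threading

class VersionedRow:
    """In-memory stand-in for a row with a version column (toy model)."""
    def __init__(self, stock: int):
        self.stock = stock
        self.version = 1
        self._lock = threading.Lock()  # models the atomicity of one UPDATE statement

    def compare_and_update(self, expected_version: int, delta: int) -> bool:
        # Mirrors: UPDATE ... SET stock = stock + delta, version = version + 1
        #          WHERE version = expected_version
        with self._lock:
            if self.version != expected_version:
                return False  # rowcount == 0: another writer committed first
            self.stock += delta
            self.version += 1
            return True

def deduct(row: VersionedRow, qty: int, max_retries: int = 200) -> bool:
    for _ in range(max_retries):
        seen_version = row.version          # optimistic read, no lock held
        if row.stock < qty:
            return False                    # out of stock
        if row.compare_and_update(seen_version, -qty):
            return True                     # version matched: our write won
        # version changed underneath us: loop and retry with fresh state
    return False

row = VersionedRow(stock=100)
threads = [threading.Thread(target=deduct, args=(row, 1)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(row.stock, row.version)  # 0 101: all 100 deductions applied exactly once
```

Each failed compare corresponds to some other writer's success, which is exactly why throughput collapses under high contention: most attempts become retries.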
6. Distributed Transactions
Distributed transactions are qualitatively harder than single-database transactions. When your operation spans two databases — or a database and a message queue — you lose the atomicity guarantee that a single BEGIN ... COMMIT provides.
Why You Cannot Just BEGIN in Two Databases
Consider an order placement that must:
1. Deduct inventory from a PostgreSQL database.
2. Publish an order_placed event to a Kafka topic.
There is no single COMMIT that covers both. If you write to PostgreSQL and then Kafka fails, your inventory is decremented but no order event was published. If Kafka succeeds and PostgreSQL crashes, the event exists but the inventory is not decremented.
Two-Phase Commit (2PC)
2PC is the classic solution: a coordinator sends PREPARE to all participants, waits for all to acknowledge readiness, then sends COMMIT or ROLLBACK. PostgreSQL supports 2PC natively via PREPARE TRANSACTION.
-- Coordinator: Phase 1 (Prepare)
BEGIN;
UPDATE inventory SET stock = stock - 1 WHERE product_id = 42;
PREPARE TRANSACTION 'order_12345'; -- Row is locked until COMMIT PREPARED or ROLLBACK PREPARED
-- Coordinator: Phase 2 (Commit, after all participants prepared)
COMMIT PREPARED 'order_12345';
-- Or rollback if any participant failed
ROLLBACK PREPARED 'order_12345';
2PC has a critical failure mode: if the coordinator crashes after sending PREPARE but before sending COMMIT, participants remain in a prepared state holding locks — indefinitely, until an operator resolves the situation. For this reason, 2PC is often avoided in favor of the Saga pattern.
The Outbox Pattern: Atomic Database + Message Queue
The outbox pattern solves the database/queue atomicity problem without 2PC. The key insight: write the event into an outbox table in the same transaction as your business data change. A separate relay process reads the outbox and publishes to the queue. If the relay crashes, it replays from the outbox. If the database transaction rolls back, the outbox row is never written.
-- Outbox table schema
CREATE TABLE outbox (
id BIGSERIAL PRIMARY KEY,
aggregate_type TEXT NOT NULL, -- e.g., 'order'
aggregate_id TEXT NOT NULL, -- e.g., order UUID
event_type TEXT NOT NULL, -- e.g., 'order_placed'
payload JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT now(),
published_at TIMESTAMPTZ, -- NULL = not yet published
sequence_no BIGSERIAL -- For ordered delivery
);
import psycopg2
import json
import uuid
from datetime import datetime

def place_order(dsn: str, user_id: int, product_id: int, quantity: int) -> str:
    """
    Place an order using the outbox pattern.
    The order record and the outbox event are written in the same transaction.
    If this transaction commits, the event will eventually be published.
    If it rolls back, neither the order nor the event persists.
    """
    order_id = str(uuid.uuid4())
    conn = psycopg2.connect(dsn)
    conn.autocommit = False
    try:
        with conn.cursor() as cur:
            # Business operation 1: deduct inventory
            cur.execute("""
                UPDATE inventory
                SET stock = stock - %s
                WHERE product_id = %s AND stock >= %s
            """, (quantity, product_id, quantity))
            if cur.rowcount == 0:
                raise ValueError("Insufficient inventory")
            # Business operation 2: create order record
            cur.execute("""
                INSERT INTO orders (id, user_id, product_id, quantity, status, created_at)
                VALUES (%s, %s, %s, %s, 'pending', now())
            """, (order_id, user_id, product_id, quantity))
            # Outbox entry: written in the SAME transaction as business data.
            # This is the atomicity guarantee — both commit or neither does.
            event_payload = {
                "order_id": order_id,
                "user_id": user_id,
                "product_id": product_id,
                "quantity": quantity,
                "timestamp": datetime.utcnow().isoformat()
            }
            cur.execute("""
                INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload)
                VALUES ('order', %s, 'order_placed', %s)
            """, (order_id, json.dumps(event_payload)))
        conn.commit()
        return order_id
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
def relay_outbox_to_kafka(dsn: str, kafka_producer) -> int:
    """
    Relay worker: reads unpublished outbox events and publishes them to Kafka.
    Uses SKIP LOCKED so multiple relay workers can run without contention.
    Idempotency: records published_at to prevent double-publishing.
    Returns count of events published.
    """
    conn = psycopg2.connect(dsn)
    conn.autocommit = False
    published_count = 0
    try:
        with conn.cursor() as cur:
            # Claim a batch of unpublished events
            cur.execute("""
                SELECT id, aggregate_type, aggregate_id, event_type, payload
                FROM outbox
                WHERE published_at IS NULL
                ORDER BY sequence_no
                LIMIT 100
                FOR UPDATE SKIP LOCKED
            """)
            events = cur.fetchall()
            for event_id, agg_type, agg_id, event_type, payload in events:
                # Publish to Kafka (outside the database transaction)
                kafka_producer.produce(
                    topic=f"{agg_type}.events",
                    key=agg_id,
                    value=json.dumps({
                        "event_type": event_type,
                        "payload": payload
                    })
                )
                kafka_producer.flush()
                # Mark as published — only after Kafka confirms receipt
                cur.execute("""
                    UPDATE outbox
                    SET published_at = now()
                    WHERE id = %s
                """, (event_id,))
                published_count += 1
        conn.commit()
        return published_count
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
The outbox pattern delivers at-least-once semantics: if the relay crashes after publishing to Kafka but before marking published_at, it will republish on restart. Consumers must handle duplicate events (typically via idempotency keys).
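Consumer-side deduplication can be as simple as remembering processed event IDs. Here is an in-memory sketch (my own construction); a real consumer would persist the seen IDs, for example in a processed_events table updated in the same transaction as the consumer's side effects.

```python
class IdempotentConsumer:
    """Drops duplicate deliveries by remembering event IDs already handled."""
    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()   # in production: a persistent store, not memory

    def handle(self, event_id: str, payload: dict) -> bool:
        if event_id in self.seen_ids:
            return False        # duplicate delivery: effect already applied
        self.handler(payload)
        self.seen_ids.add(event_id)
        return True

shipped = []
consumer = IdempotentConsumer(lambda p: shipped.append(p["order_id"]))

# At-least-once delivery: the relay may send the same event twice
consumer.handle("evt-1", {"order_id": "A"})
consumer.handle("evt-1", {"order_id": "A"})  # duplicate, ignored
consumer.handle("evt-2", {"order_id": "B"})
print(shipped)  # ['A', 'B']: each event's effect applied exactly once
```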
Saga Pattern for Long-Running Operations
When an operation spans multiple services and cannot be wrapped in a single transaction, the Saga pattern models it as a sequence of local transactions, each with a compensating transaction that undoes its effect if a later step fails.
# Saga: place order across three services
# Each step has a compensating action

def order_saga(order_request: dict) -> None:
    steps_completed = []
    try:
        # Step 1: Reserve inventory
        reservation = reserve_inventory(order_request["product_id"], order_request["quantity"])
        steps_completed.append(("inventory", reservation["reservation_id"]))
        # Step 2: Charge payment
        payment = charge_payment(order_request["user_id"], order_request["total"])
        steps_completed.append(("payment", payment["charge_id"]))
        # Step 3: Create shipment
        shipment = create_shipment(order_request)
        steps_completed.append(("shipment", shipment["shipment_id"]))
    except Exception:
        # Compensate in reverse order
        for service, resource_id in reversed(steps_completed):
            if service == "inventory":
                release_inventory_reservation(resource_id)
            elif service == "payment":
                refund_payment(resource_id)
            elif service == "shipment":
                cancel_shipment(resource_id)
        raise
Sagas are eventually consistent — there is a window during which the saga is partially applied. Design your system so that partially-applied states are either invisible to users or explicitly surfaced (e.g., "order pending" rather than immediate confirmation).
Conclusion
Database transactions are not a checkbox feature — they are a precise contract between your application and the database about which anomalies are and are not possible. The defaults (Read Committed isolation, an implicit transaction per statement) are sensible for most applications, but "most" is not "all."
The practical takeaways:
Choose isolation levels deliberately. Read Committed is correct for simple read-and-write operations, but any "read the current state, then decide" pattern needs at minimum Repeatable Read or Serializable. Add BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE to financial operations and be prepared to retry on serialization failure.
Understand what MVCC costs. Dead tuple accumulation is not a minor housekeeping concern — it affects query performance (table bloat), index bloat, and eventually transaction ID wraparound. Monitor n_dead_tup in pg_stat_user_tables. Ensure autovacuum is not chronically blocked by long-running transactions.
Practice deadlock prevention as discipline, not reaction. Consistent lock ordering is a team convention that should be enforced in code review. A single transaction that acquires locks in a different order from every other transaction is a latent deadlock waiting for the right concurrent load. The SKIP LOCKED pattern for queue consumers eliminates an entire class of contention by design.
Avoid distributed transactions where possible. The outbox pattern covers the most common cross-system consistency requirement (database + message queue) without 2PC's coordinator failure modes. For multi-service operations, sagas with compensating transactions are the pragmatic choice — they trade strict consistency for resilience and explicit failure handling.
Transaction correctness is not an advanced concern. It is a foundational property of any system that handles money, inventory, or any resource with finite supply. The cost of getting it right is a few hours of careful design. The cost of getting it wrong is production data corruption that may take days to discover and weeks to fully understand.
Sources
- PostgreSQL Documentation — Transaction Isolation: https://www.postgresql.org/docs/current/transaction-iso.html
- PostgreSQL Documentation — MVCC: https://www.postgresql.org/docs/current/mvcc.html
- PostgreSQL Documentation — Explicit Locking: https://www.postgresql.org/docs/current/explicit-locking.html
- PostgreSQL Documentation — PREPARE TRANSACTION: https://www.postgresql.org/docs/current/sql-prepare-transaction.html
- Pattern: Transactional Outbox — Chris Richardson, microservices.io
- Serializable Snapshot Isolation in PostgreSQL — Ports and Grittner, VLDB 2012