Microservices vs Monolith in 2026: The Honest Decision Framework

Introduction
In 2016, the industry consensus was loud and confident: monoliths are legacy, microservices are the future. Every conference talk, every architectural review, every greenfield project brief had the same answer. Split everything into services. Deploy them independently. Scale them independently. The architecture would mirror the organization, and the organization would ship faster.
A decade later, the honest post-mortems are piling up. Amazon famously decomposed their retail monolith into services, and that decomposition genuinely enabled their growth. But Amazon also has thousands of engineers, a dedicated distributed systems platform team, and the scale to justify the overhead. Most teams that cargo-culted the pattern got the complexity without the scale that justifies it. Monoliths were rebuilt from scratch as distributed systems and became harder to understand, harder to debug, and slower to ship.
Shopify runs one of the world's largest e-commerce platforms on a Ruby on Rails monolith. Stack Overflow serves millions of developers per month on nine physical servers. Prime Video's video monitoring team made headlines in 2023 when they collapsed their microservices architecture back into a monolith and reduced costs by 90 percent. These aren't edge cases or embarrassing admissions — they're engineering teams making correct decisions for their scale and team structure.
The microservices vs. monolith debate was always the wrong framing. The right question is: what level of distribution is right for this team, at this scale, with these constraints? That question has a different answer in 2026 than it did in 2016, because the costs of getting it wrong are better understood. We have more failure data. We have more honesty about the operational tax that distributed systems impose. And we have a clearer-eyed picture of when distribution actually delivers its promised benefits versus when it just moves the complexity from code into the network.
This post is an honest decision framework — not a technology endorsement. We will look at when monoliths are the right call, when microservices are genuinely justified, how to decompose when the time comes, how to handle service communication without building a reliability nightmare, and what the real operational costs look like before you commit to them. Position taken upfront: for most teams at most stages, a well-structured modular monolith is the correct default. Extract services when you have a specific, demonstrable reason. Not because a blog post said to.
1. The Monolith Is Not the Problem
The word "monolith" has become a pejorative in engineering culture, synonymous with technical debt, deployment risk, and legacy thinking. This is a category error. A monolith is a deployment topology, not a quality judgment. The distinction that actually matters is not monolith vs. microservices — it is modular vs. tangled.
A tangled monolith is what people are actually afraid of. It is a codebase where the user service imports the billing service which imports the analytics service which imports the user service again. Every change ripples unpredictably across the system. The test suite takes 45 minutes because nothing can be tested in isolation. Deployment is a full-system rebuild, and every release is a roulette wheel because nobody knows what touched what. This is a real problem, but it is a problem of internal architecture, not deployment topology. Converting a tangled monolith into microservices does not fix the tangle — it promotes it to a distributed tangle, which is harder to observe and harder to fix.
A modular monolith is structured around clear domain boundaries with explicit interfaces between modules. The payment module exposes a PaymentService interface. The order module calls that interface. Neither module reaches into the other's internals. The modules are independently testable, internally cohesive, and externally loosely coupled. The fact that they all run in the same process is incidental. Netflix's original monolith was modular. So is the Django codebase powering Instagram, and the Rails codebase powering Shopify.
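In code, that kind of in-process boundary can be sketched with an explicit interface. This is a minimal illustration, not Shopify's or Instagram's actual API; `PaymentService` and `ChargeResult` are hypothetical names:

```python
# A sketch of an in-process module boundary. The payment module exposes
# only this interface; the order module depends on the interface, never
# on payment internals. All names here are illustrative.
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class ChargeResult:
    success: bool
    transaction_id: Optional[str]


class PaymentService(Protocol):
    """The only surface the payment module exposes to other modules."""
    def charge(self, order_id: str, amount_cents: int) -> ChargeResult: ...


class OrderModule:
    # The implementation is injected at startup. Swapping it later for a
    # network client to an extracted service requires no changes here.
    def __init__(self, payments: PaymentService):
        self._payments = payments

    def place_order(self, order_id: str, amount_cents: int) -> bool:
        return self._payments.charge(order_id, amount_cents).success
```

The injection point is what makes the same-process detail incidental: the order module never learns whether `charge` is a function call or, later, an HTTP call.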
Shopify is the canonical example worth sitting with. At the time of writing, Shopify processes more than $10 billion in GMV annually, handles traffic spikes that would buckle most architectures, and runs a global merchant and consumer platform — all on a Rails monolith they call their "modular monolith." They have invested heavily in defining module boundaries, preventing cross-module data access, and building internal tooling to enforce the rules. It is not simple, but it is significantly simpler than the alternative. Their chief architect has said publicly that the modular monolith is the right choice for Shopify at Shopify's scale, and that rewriting it as microservices would consume years of engineering effort for uncertain benefit.
Stack Overflow is the other number to keep in your head. Nine physical servers. Millions of page views per month. The team is small, the deployment is simple, and the performance is exceptional — because SQL Server, careful indexing, and in-process caching inside a single deployment unit beats the overhead of service-to-service network calls at that traffic volume.
The real signals that a monolith has a structural problem — and not that it needs to be decomposed into services — are: circular dependencies between modules, a shared database god object where every module reads every table, the inability to run any subset of the codebase in isolation, and deployment gates that require every team to sign off because every change can affect every other change. These are problems you fix through refactoring and internal boundary enforcement, not through network boundaries. Building clear module interfaces is the prerequisite to decomposition. If you cannot define a clean interface between two modules inside a monolith, extracting them as services will not create one — it will just add latency to the confusion.
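The first of those signals is mechanically detectable. A toy dependency check, with hypothetical module names matching the tangle described earlier, shows the shape of the problem:

```python
# Detect circular dependencies between modules with a depth-first search.
# Module names and edges are hypothetical.
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one module dependency cycle as a list of names, or None."""
    visiting, done = set(), set()

    def dfs(node: str, path: List[str]) -> Optional[List[str]]:
        if node in visiting:
            return path[path.index(node):] + [node]  # cycle found
        if node in done:
            return None
        visiting.add(node)
        for dep in deps.get(node, []):
            cycle = dfs(dep, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for module in deps:
        cycle = dfs(module, [])
        if cycle:
            return cycle
    return None


# The tangle from earlier: users -> billing -> analytics -> users
deps = {
    "users": ["billing"],
    "billing": ["analytics"],
    "analytics": ["users"],
    "orders": ["billing"],
}
```

Here `find_cycle(deps)` returns `["users", "billing", "analytics", "users"]`. In a real Python codebase, tools such as import-linter can enforce the absence of such cycles in CI.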
The design principle that matters most for future decomposability is domain-first module organization. Organize code by business domain (orders, payments, inventory, notifications), not by technical layer (controllers, services, repositories). Vertical slices that own their domain from API to database are far easier to extract into independent services later than horizontal layers that cut across every domain. Build the modular monolith correctly, and you have an extraction-ready architecture. Skip the modular structure in favor of early extraction, and you will be debugging distributed transactions before your user base justifies it.

2. When Microservices Are Justified
The useful question is not "should we use microservices?" but "do we have a specific problem that service extraction solves, where the solution's cost is less than the problem's cost?" Most of the time the answer is no. Some of the time — at sufficient scale, with sufficient team complexity — the answer is yes.
Independent scaling requirements are the clearest technical justification. If your payment processing workload requires 10x the compute during peak hours and your user authentication workload requires none, deploying them as separate services means you scale payment horizontally without paying for unused authentication capacity. Inside a monolith, you scale everything together. At the scale where that inefficiency costs meaningful money — typically when your infrastructure bill is in the tens of thousands per month — this math starts to matter. At startup scale, the waste from over-provisioning a single deployment is negligible compared to the engineering overhead of managing multiple services.
Different deployment cadences are the second strong technical justification. If your ML inference service needs to redeploy every hour as the model is retrained, and your core user service deploys once a month, coupling those two inside a monolith means every model refresh triggers a full system deployment, with all the associated risk, testing, and coordination. Decoupling their deployment cycles through service boundaries is a direct reduction in deployment risk, not an increase.
Team autonomy at Conway's Law scale is the organizational justification. Conway's Law states that systems reflect the communication structures of the organizations that build them. The inverse is also useful: if you have three independent teams with distinct ownership boundaries, a monolith will create constant merge conflicts, deployment coordination costs, and organizational friction that a services-based architecture resolves. This is not a technical requirement — it is an organizational one. But it is real. The signal to watch for is: are multiple teams fighting over deployment? Are you scheduling release windows to coordinate between teams? Are merge conflicts in shared modules a weekly source of delay? That is the organizational pressure that service extraction is designed to relieve.
Compliance isolation is the fourth justification, and often underweighted. PCI DSS scope is a real concern for any team handling payment card data. If you can isolate all cardholder data handling into a single service with its own infrastructure, you reduce the audit surface area from your entire system to one bounded component. The same logic applies to HIPAA compliance for health data, SOC 2 boundaries, and GDPR data residency requirements. Service extraction for compliance isolation is justified at any scale because the alternative — scoping your entire monolith under PCI DSS — is significantly more expensive in audit costs and ongoing compliance overhead.
flowchart TD
    Q1{"Teams fighting over deploys monthly?"}
    Q1 -->|No| Q2{"Wildly different scaling needs?"}
    Q1 -->|Yes| Q3{"Team size > 15?"}
    Q3 -->|No| Stay["Keep in monolith<br/>Fix process, not architecture"]
    Q3 -->|Yes| Extract["Extract service"]
    Q2 -->|No| Q4{"Different deploy cadences causing risk?"}
    Q2 -->|Yes| Q5{"Cost waste > $5k/mo?"}
    Q5 -->|No| Stay
    Q5 -->|Yes| Extract
    Q4 -->|No| Q6{"Compliance isolation required? PCI/HIPAA"}
    Q4 -->|Yes| Extract
    Q6 -->|Yes| Extract
    Q6 -->|No| Stay
    style Extract fill:#2d6a4f,color:#fff
    style Stay fill:#6b2737,color:#fff
The signal that is often mistaken for a microservices justification is team or engineer count. "We have 50 engineers, therefore we need microservices" is not valid logic. You need microservices when 50 engineers are organized into independent product teams with independent ownership, independent deployment, and independent scaling requirements. If 50 engineers are all working on the same product with shared ownership and coordinated releases, a modular monolith serves them better than services. The 2-pizza team rule from Amazon applies to the team ownership model, not to headcount alone.
The signal that actually indicates readiness for service extraction is operational maturity: do you have distributed tracing deployed? Do you have a service registry and health check infrastructure? Do you have on-call rotations capable of debugging cross-service failures at 2am? Without those foundations, extracting a service creates problems you cannot diagnose. Build the observability platform before you need it to debug a production incident in a distributed system.
3. Decomposition Patterns
When the decision to extract a service is made — based on the framework above, not on hype — the implementation matters enormously. Big-bang rewrites are the highest-risk migration path and the most common mistake. Every successful decomposition from a production system uses incremental migration patterns.
The Strangler Fig is the most battle-tested incremental migration pattern. The name comes from the strangler fig tree, which grows around an existing tree and gradually replaces it. In software, you route a subset of traffic to the new service while the old code still handles the rest. Over months, you increase the new service's traffic share, fix its bugs under production load, and eventually decommission the old code path. The monolith shrinks. The new service grows. At no point do you have a hard cutover.
# strangler_fig_router.py
# Routes requests to either the legacy monolith handler or the new payment service
# based on a feature flag. Enables gradual traffic migration with instant rollback.
import hashlib
import os
from dataclasses import dataclass
from typing import Optional

import httpx

# Feature flag threshold — increase this gradually as confidence builds
# 0.0 = all traffic to legacy, 1.0 = all traffic to new service
PAYMENT_SERVICE_TRAFFIC_PERCENT = float(os.getenv("PAYMENT_SERVICE_TRAFFIC_PCT", "0.0"))


@dataclass
class PaymentRequest:
    order_id: str
    amount_cents: int
    currency: str
    customer_id: str


@dataclass
class PaymentResult:
    success: bool
    transaction_id: Optional[str]
    error: Optional[str]


class StranglerFigPaymentRouter:
    """
    Routes payment processing requests between the legacy monolith handler
    and the new standalone payment service. Uses deterministic hashing on
    order_id so the same order always goes to the same backend during
    migration — prevents split-brain issues where one system charges but
    the other records the transaction.
    """

    def __init__(self, legacy_handler, new_service_url: str):
        self.legacy_handler = legacy_handler
        self.new_service_url = new_service_url
        self.http_client = httpx.AsyncClient(timeout=5.0)

    def _should_use_new_service(self, order_id: str) -> bool:
        """
        Deterministic routing: hash the order_id to decide which backend
        handles this request. Same order_id always routes consistently,
        regardless of when the request arrives or which server handles it.
        """
        # Stable hash of the order_id (works for any id format, not just
        # hex suffixes) mapped to a 0-9999 bucket, then compared to the
        # traffic threshold on the same 0-9999 scale.
        digest = hashlib.sha256(order_id.encode()).hexdigest()
        bucket = int(digest[:4], 16) % 10000
        threshold = int(PAYMENT_SERVICE_TRAFFIC_PERCENT * 10000)
        return bucket < threshold

    async def process_payment(self, request: PaymentRequest) -> PaymentResult:
        if self._should_use_new_service(request.order_id):
            return await self._call_new_service(request)
        return await self._call_legacy(request)

    async def _call_new_service(self, request: PaymentRequest) -> PaymentResult:
        """Call the extracted payment microservice via HTTP."""
        try:
            response = await self.http_client.post(
                f"{self.new_service_url}/v1/payments",
                json={
                    "order_id": request.order_id,
                    "amount_cents": request.amount_cents,
                    "currency": request.currency,
                    "customer_id": request.customer_id,
                },
            )
            data = response.json()
            if response.status_code == 200:
                return PaymentResult(
                    success=True,
                    transaction_id=data["transaction_id"],
                    error=None,
                )
            return PaymentResult(success=False, transaction_id=None, error=data.get("error"))
        except httpx.TimeoutException:
            # On timeout, fall back to legacy — safety net during migration
            return await self._call_legacy(request)

    async def _call_legacy(self, request: PaymentRequest) -> PaymentResult:
        """Call the original monolith payment handler."""
        return await self.legacy_handler.process_payment(request)
Branch by Abstraction works when you cannot control routing at the HTTP layer. Introduce an interface that both the old and new implementations satisfy. Initially the interface delegates to the old code. You write the new implementation behind the interface. Once the new implementation passes tests, you flip the implementation at the injection point. The calling code never changes.
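A minimal sketch of the pattern, with `NotificationSender` and the two implementations as illustrative stand-ins:

```python
# Branch by Abstraction, sketched. Both implementations satisfy the same
# abstraction; the flip happens at the injection point, not in callers.
# All names are hypothetical.
from typing import Protocol


class NotificationSender(Protocol):
    def send(self, user_id: str, message: str) -> bool: ...


class LegacyEmailSender:
    """The old in-monolith code path, wrapped behind the abstraction."""
    def send(self, user_id: str, message: str) -> bool:
        return True  # existing monolith logic would live here


class NewNotificationClient:
    """New implementation: would call the extracted service over HTTP."""
    def send(self, user_id: str, message: str) -> bool:
        return True  # the HTTP client call would live here


USE_NEW_SENDER = False  # the single flip point: a config flag or env var


def make_sender() -> NotificationSender:
    # Callers receive the abstraction and never know which branch is live.
    return NewNotificationClient() if USE_NEW_SENDER else LegacyEmailSender()
```

Because every caller goes through `make_sender`, flipping `USE_NEW_SENDER` migrates the whole codebase at once, and flipping it back is an instant rollback.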
Domain-Driven Design bounded contexts should define your service boundaries, not technical convenience. A bounded context is a subsystem with its own domain model, its own language, and its own data. The Order concept in your ordering context has different attributes and behaviors than the Order concept in your fulfillment context. Trying to share one Order model across both creates the tight coupling that makes services hard to evolve independently.
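Sketched as code, with illustrative fields, the point is that the two contexts share only the order's identity:

```python
# Two bounded contexts, two models of the same business noun.
# Field names are hypothetical illustrations.
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderingOrder:
    """'Order' inside the ordering context: pricing and payment concerns."""
    order_id: str
    customer_id: str
    total_cents: int
    payment_status: str  # e.g. "pending", "paid"


@dataclass
class FulfillmentOrder:
    """'Order' inside the fulfillment context: shipping concerns only."""
    order_id: str  # shared identity is the only overlap
    warehouse_id: str
    shipping_address: str
    tracking_number: Optional[str] = None
```

Forcing both contexts onto one shared `Order` class would give fulfillment a payment_status it never uses and ordering a tracking_number it cannot set, and every change to either context would ripple into the other.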
Database-per-service is the hardest constraint and the most important one. A shared database between two services is not a microservices architecture — it is a distributed monolith, with all the overhead of services and none of the independence. If service A and service B both read from the same table, they cannot be deployed or scaled independently, and any schema change requires coordinating both services. The independence that justifies the complexity of separate services requires separate data ownership. This means denormalization. It means eventual consistency between services. It means accepting that you cannot use a JOIN across service boundaries. Those costs are real, and they are why shared databases are so tempting. They are also why so many microservices migrations fail to deliver their promised independence.
The anti-corruption layer pattern prevents new service boundaries from being contaminated by the legacy domain model. When extracting a service from a monolith, the legacy codebase has its own internal model — often a god object with 60 fields that represents "everything about a customer." The new service has a clean, bounded model. The anti-corruption layer is a translation component at the boundary that converts the legacy model into the new service's model and back. Without it, the new service's design gets polluted by the legacy model's shape, and you have not actually established a new boundary — you have just moved the legacy model into a new process.
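A sketch of that translation layer. The legacy dict's field names (`cust_no`, `contact_email_1`, and so on) are hypothetical stand-ins for the god object's shape:

```python
# Anti-corruption layer sketch: translate a legacy god object into the
# new service's bounded model at the boundary, so the legacy shape never
# leaks inward. All names are illustrative.
from dataclasses import dataclass


@dataclass
class BillingAccount:
    """The new billing service's clean, bounded model."""
    account_id: str
    email: str
    currency: str


class BillingAntiCorruptionLayer:
    """Lives at the boundary; the only code that knows the legacy shape."""

    def to_billing_account(self, legacy: dict) -> BillingAccount:
        # legacy is the sprawling customer record; take only what billing owns
        return BillingAccount(
            account_id=str(legacy["cust_no"]),
            email=legacy["contact_email_1"] or legacy["contact_email_2"],
            currency=legacy.get("pref_currency", "USD"),
        )
```

Everything inside the new service works with `BillingAccount`; only this one class has to change if the legacy record's shape shifts during the migration.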

4. Service Communication Patterns
How services talk to each other is where distributed systems earn their complexity tax. Every communication pattern is a tradeoff between latency, reliability guarantees, operational overhead, and coupling. Getting this wrong is the most common cause of microservices failures in production.
Synchronous communication via REST or gRPC is appropriate when you need a response before proceeding. A payment authorization must succeed before you confirm an order. A user lookup must return before you render a page. REST is universal and easy to debug. gRPC is faster (binary Protocol Buffers over HTTP/2) and enforces schema via .proto files. Use gRPC for internal service-to-service calls where you control both ends. Use REST for external-facing APIs where clients are diverse.
The fundamental problem with synchronous communication in a distributed system is temporal coupling. If service A calls service B synchronously, and service B is slow or unavailable, service A is slow or unavailable. Synchronous call chains compound: 100ms at each of five service hops means 500ms minimum latency for the calling service, plus the probability of failure at each hop multiplied together. If each service has 99.9% availability, five synchronous dependencies gives you 99.5% availability for the composite operation — before accounting for network failures.
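The arithmetic in that paragraph is worth checking yourself:

```python
# Availability and latency both compound across synchronous hops.
hops = 5
availability = 0.999 ** hops  # each dependency at 99.9%
latency_ms = hops * 100       # 100ms per hop, as above

print(f"{availability:.4f}")  # 0.9950, i.e. ~99.5% composite availability
print(latency_ms)             # 500ms minimum latency for the chain
```

Adding a sixth or seventh dependency keeps multiplying those numbers down, which is why deep synchronous call chains are the first thing to hunt for in a struggling microservices system.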
Asynchronous communication via message queues (Kafka, RabbitMQ, Redis Streams) decouples services temporally. When an order is placed, the order service publishes an OrderPlaced event and returns immediately. The inventory service, notification service, and analytics service each consume that event in their own time. The order service does not know or care whether any of them are available when it publishes. This eliminates temporal coupling at the cost of eventual consistency — the inventory service will subtract stock, but not necessarily before the next request arrives.
# saga_choreography.py
# Implements the Saga pattern via event choreography for distributed transactions.
# Each service listens for events, performs its local transaction, and emits
# the next event in the saga chain. On failure, each service emits a compensating event.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PaymentDeclinedError(Exception):
    pass


class InsufficientStockError(Exception):
    pass


class SagaEventType(str, Enum):
    # Forward events — happy path
    ORDER_PLACED = "order.placed"
    PAYMENT_RESERVED = "payment.reserved"
    INVENTORY_RESERVED = "inventory.reserved"
    ORDER_CONFIRMED = "order.confirmed"
    # Compensating events — rollback path
    PAYMENT_FAILED = "payment.failed"
    INVENTORY_FAILED = "inventory.failed"
    PAYMENT_RELEASED = "payment.released"  # compensates payment.reserved
    ORDER_CANCELLED = "order.cancelled"


@dataclass
class SagaEvent:
    event_type: SagaEventType
    order_id: str
    correlation_id: str  # tracks the full saga across services
    payload: dict
    failure_reason: Optional[str] = None


class PaymentService:
    """
    On order.placed: attempt to reserve funds, then emit payment.reserved
    or payment.failed.
    On inventory.failed: emit payment.released to compensate the reservation.
    """

    async def handle_event(self, event: SagaEvent, emit):
        if event.event_type == SagaEventType.ORDER_PLACED:
            await self._reserve_payment(event, emit)
        elif event.event_type == SagaEventType.INVENTORY_FAILED:
            await self._release_payment(event, emit)

    async def _reserve_payment(self, event: SagaEvent, emit):
        order = event.payload
        try:
            # Idempotency key: use correlation_id so retries are safe
            transaction_id = await self._charge_card(
                customer_id=order["customer_id"],
                amount_cents=order["amount_cents"],
                idempotency_key=event.correlation_id,
            )
            await emit(SagaEvent(
                event_type=SagaEventType.PAYMENT_RESERVED,
                order_id=event.order_id,
                correlation_id=event.correlation_id,
                payload={"transaction_id": transaction_id, **order},
            ))
        except PaymentDeclinedError as e:
            await emit(SagaEvent(
                event_type=SagaEventType.PAYMENT_FAILED,
                order_id=event.order_id,
                correlation_id=event.correlation_id,
                payload=order,
                failure_reason=str(e),
            ))

    async def _release_payment(self, event: SagaEvent, emit):
        # Compensating transaction: reverse the reservation
        await self._refund_charge(
            transaction_id=event.payload["transaction_id"],
            idempotency_key=f"refund-{event.correlation_id}",
        )
        await emit(SagaEvent(
            event_type=SagaEventType.PAYMENT_RELEASED,
            order_id=event.order_id,
            correlation_id=event.correlation_id,
            payload=event.payload,
        ))

    async def _charge_card(self, customer_id, amount_cents, idempotency_key):
        # Stubbed: actual Stripe/Adyen call here
        return f"txn_{idempotency_key[:8]}"

    async def _refund_charge(self, transaction_id, idempotency_key):
        # Stubbed: actual refund call here
        pass


class InventoryService:
    """
    Listens for payment.reserved. Attempts to reserve stock.
    Emits inventory.reserved or inventory.failed.
    On failure, the payment service will see inventory.failed and release the charge.
    """

    async def handle_event(self, event: SagaEvent, emit):
        if event.event_type == SagaEventType.PAYMENT_RESERVED:
            await self._reserve_stock(event, emit)

    async def _reserve_stock(self, event: SagaEvent, emit):
        order = event.payload
        try:
            await self._decrement_inventory(
                sku=order["sku"],
                quantity=order["quantity"],
                idempotency_key=event.correlation_id,
            )
            await emit(SagaEvent(
                event_type=SagaEventType.INVENTORY_RESERVED,
                order_id=event.order_id,
                correlation_id=event.correlation_id,
                payload=order,
            ))
        except InsufficientStockError as e:
            await emit(SagaEvent(
                event_type=SagaEventType.INVENTORY_FAILED,
                order_id=event.order_id,
                correlation_id=event.correlation_id,
                payload=order,
                failure_reason=str(e),
            ))

    async def _decrement_inventory(self, sku, quantity, idempotency_key):
        pass  # Actual inventory update here
The Circuit Breaker prevents cascading failures. When service B is failing, service A should stop trying to call it immediately rather than queuing up requests that will timeout after five seconds each, exhausting connection pools, and propagating the failure upstream. A circuit breaker wraps a remote call and tracks failure rate. When failures exceed a threshold, the circuit "opens" and requests fail fast (immediately, without attempting the call). After a cooldown window, the circuit moves to "half-open" and tries a test request. If it succeeds, the circuit closes and normal traffic resumes.
# circuit_breaker.py
# A minimal circuit breaker with exponential backoff for remote service calls.
# States: CLOSED (normal), OPEN (failing fast), HALF_OPEN (testing recovery).
import time
from enum import Enum
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing fast — not attempting calls
    HALF_OPEN = "half_open"  # Testing if service has recovered


class CircuitBreakerOpen(Exception):
    """Raised when a call is blocked because the circuit is open."""


class CircuitBreaker:
    """
    Circuit breaker with exponential backoff on retry windows.

    failure_threshold: number of failures before circuit opens
    recovery_timeout: seconds to wait before attempting recovery (HALF_OPEN)
    success_threshold: consecutive successes in HALF_OPEN before closing
    """

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 30.0,
        success_threshold: int = 2,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self._state = CircuitState.CLOSED
        self._failure_count = 0
        self._success_count = 0
        self._last_failure_time: float = 0.0
        self._backoff_multiplier = 1.0  # Increases with each OPEN cycle

    @property
    def state(self) -> CircuitState:
        if self._state == CircuitState.OPEN:
            # Check if the recovery window has elapsed
            elapsed = time.monotonic() - self._last_failure_time
            recovery_window = self.recovery_timeout * self._backoff_multiplier
            if elapsed >= recovery_window:
                self._state = CircuitState.HALF_OPEN
                self._success_count = 0
        return self._state

    async def call(self, func: Callable[..., Awaitable[T]], *args, **kwargs) -> T:
        """Execute func through the circuit breaker."""
        if self.state == CircuitState.OPEN:
            raise CircuitBreakerOpen(
                f"Circuit is OPEN. Next retry in "
                f"{self.recovery_timeout * self._backoff_multiplier:.0f}s"
            )
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        if self._state == CircuitState.HALF_OPEN:
            self._success_count += 1
            if self._success_count >= self.success_threshold:
                # Service recovered — close the circuit and reset backoff
                self._state = CircuitState.CLOSED
                self._failure_count = 0
                self._backoff_multiplier = 1.0
        elif self._state == CircuitState.CLOSED:
            # Decay the failure count on success (sliding-window behavior)
            self._failure_count = max(0, self._failure_count - 1)

    def _on_failure(self):
        self._failure_count += 1
        self._last_failure_time = time.monotonic()
        if self._state == CircuitState.HALF_OPEN:
            # Recovery test failed: reopen and double the backoff window,
            # capped at 8x the base recovery timeout
            self._state = CircuitState.OPEN
            self._backoff_multiplier = min(self._backoff_multiplier * 2, 8.0)
        elif self._failure_count >= self.failure_threshold:
            # Threshold crossed while CLOSED: open the circuit at 1x backoff
            self._state = CircuitState.OPEN


# Usage example (payment_service_client stands in for your service client)
payment_circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=30.0)


async def get_payment_status(order_id: str) -> dict:
    try:
        return await payment_circuit.call(
            payment_service_client.get_status,
            order_id=order_id,
        )
    except CircuitBreakerOpen:
        # Return cached status or degraded response rather than failing hard
        return {"status": "unknown", "degraded": True}
A service mesh (Istio, Linkerd) handles cross-cutting concerns — mutual TLS between services, circuit breaking, retry logic, distributed tracing, traffic splitting — at the infrastructure layer without code changes. For teams with 10+ services, the investment in a service mesh pays off by removing dozens of per-service implementations of the same retry/timeout/mTLS logic. For teams with 3-5 services, the operational overhead of the mesh itself (Istio in particular is operationally demanding) likely exceeds the benefit.
5. The Distributed Systems Tax
Every microservices adoption prospectus focuses on the benefits. The tax is real and should be stated plainly before any decomposition decision is made.
Network failures are now a first-class concern. In a monolith, a function call either returns or throws an exception from the called code. In a distributed system, the call can fail because the network dropped the packet, because the remote service is restarting, because DNS resolution failed, because the TLS handshake timed out, because a load balancer returned a 502, or because the remote service returned a 200 but the response was truncated. Every cross-service call requires timeout handling, retry logic with exponential backoff and jitter, and circuit breakers. This is not optional. A service that does not handle these failure modes will eventually fail in production in a way that cascades across your entire system.
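The baseline those requirements imply can be sketched in a few lines. This is a minimal retry helper with exponential backoff and full jitter; the exception types caught here are illustrative, and a real version would catch your HTTP client's transport errors:

```python
# Retry an async call with exponential backoff and full jitter.
import asyncio
import random


async def call_with_retries(func, *, attempts=4, base_delay=0.1, max_delay=2.0):
    """Call func(); on transient failure, sleep up to base_delay * 2^n and retry."""
    for attempt in range(attempts):
        try:
            return await func()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            backoff = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random fraction of the backoff window,
            # so synchronized clients don't retry in lockstep
            await asyncio.sleep(random.uniform(0, backoff))
```

The jitter matters as much as the backoff: without it, every client that failed at the same moment retries at the same moment, and the recovering service is hit by a synchronized wave.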
Distributed tracing is a prerequisite, not an afterthought. When a user request in a monolith fails, you have one stack trace in one log. When a request traverses five services and fails, you have five partial logs in five log streams with no correlation between them — unless you have implemented distributed tracing. OpenTelemetry with Jaeger or Honeycomb, propagating trace IDs through every service call, is the minimum viable observability for a microservices architecture. Without it, debugging a production incident requires correlating timestamps across five dashboards and reconstructing the call graph manually. This is what "flying blind" looks like in practice, and it happens at 2am.
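In practice you adopt OpenTelemetry rather than rolling your own, but the core mechanic is small enough to show: an ID minted at the edge and forwarded on every outbound call. The header name below is illustrative (the W3C Trace Context standard uses `traceparent`):

```python
# Trace propagation in miniature: one ID, forwarded on every hop, lets
# logs from five services be joined on a single key.
import uuid


def incoming_request(headers: dict) -> str:
    """Reuse the caller's trace ID if present, otherwise start a new trace."""
    return headers.get("x-trace-id") or uuid.uuid4().hex


def outbound_headers(trace_id: str) -> dict:
    """Attach the trace ID to every downstream service call."""
    return {"x-trace-id": trace_id}
```

Service A calls `incoming_request({})` and gets a fresh ID; when it calls service B with `outbound_headers(trace_id)`, B's `incoming_request` returns the same ID, so both services log the same correlation key.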
Data consistency requires explicit engineering. The ACID transaction guarantee that a relational database gives you inside a monolith does not extend across service boundaries. When an order is placed and requires a payment reservation and an inventory reservation, you cannot wrap those three operations in a single database transaction. You must implement the Saga pattern, with compensating transactions for each step that can fail. You must design every operation to be idempotent so that retries do not double-charge customers. You must accept that the system will be in inconsistent intermediate states during normal operation and design the user experience around eventual consistency.
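Idempotency is the piece teams most often skip. A minimal in-memory sketch of the guard, assuming a simple key/result store; production versions persist keys in the service's own database, ideally in the same transaction as the side effect:

```python
# Idempotency guard: replay the stored result instead of re-running the
# side effect when the same key arrives twice (e.g. on a retry).
class IdempotencyStore:
    def __init__(self):
        self._results = {}  # idempotency_key -> stored result

    def run_once(self, key: str, operation):
        if key in self._results:
            return self._results[key]  # retry: replay, don't re-charge
        result = operation()
        self._results[key] = result
        return result
```

With this in place, a saga step that is delivered twice (at-least-once delivery is the norm for message brokers) charges the customer exactly once and returns the same transaction result both times.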
Operational complexity scales linearly with service count. Each new service requires: its own deployment pipeline, its own container registry entry, its own Kubernetes namespace and resource limits, its own alert policies, its own runbook, its own on-call escalation path, its own log aggregation configuration, and its own metrics dashboard. A team that manages ten services needs ten times the operational infrastructure of a team with one monolith. This overhead does not scale down when services are small — a two-function service costs nearly as much to operate as a large one.
The latency math is unforgiving. A synchronous call chain of five services, each adding 20ms of internal processing time and 10ms of network latency on a low-latency internal network, contributes 150ms of minimum latency to the end-to-end response. The same logic executed as five function calls inside a monolith takes microseconds. This only matters when latency is a user-facing concern — interactive UIs, APIs with SLAs, real-time pipelines — but it matters a lot in those contexts. The "just throw Varnish in front of it" solution does not work when the response contains personalized or real-time data.
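The arithmetic, concretely:

```python
# Minimum latency contributed by a synchronous five-service call chain.
hops = 5
per_hop_ms = 20 + 10  # internal processing + network latency per hop
chain_ms = hops * per_hop_ms
print(chain_ms)  # 150: the floor before any queueing, retries, or GC pauses
```

That 150ms is a floor, not an estimate; tail latencies at each hop compound it further.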
The Prime Video story is worth the specifics. Their video quality monitoring system was originally built as microservices on AWS Lambda and Step Functions. The system worked, but at scale the inter-service communication costs and Lambda invocation costs grew with data volume. When they collapsed it into a monolith running on a single ECS service, costs dropped by 90% and scalability improved because the bottleneck had been the orchestration layer, not the processing logic. The key insight: their data pipeline had high throughput and low latency requirements between steps — exactly the workload profile where in-process function calls vastly outperform inter-service network calls. The microservices architecture had been chosen by default, not by analysis.
6. The Majestic Monolith and Modular Approaches
The 2026 landscape has produced a clearer vocabulary for the middle ground. The "Majestic Monolith" — a term popularized by DHH and the Rails community — describes a well-structured single-deployment application that deliberately eschews distribution until the evidence demands it. The "Modular Monolith" is its more formal cousin: a monolith organized around hard domain module boundaries enforced by tooling, not just convention.
For most teams at most stages, the modular monolith is the right default. This is not a consolation prize. It is the correct engineering decision given the available evidence. A modular monolith built with clean domain boundaries, interface-based module communication, and vertical slicing by feature is a genuinely production-grade architecture. It deploys as a single unit, which means one pipeline, one deployment, one set of dashboards. It fails as a single unit, which means one stack trace, one log stream, one place to look. And it can be decomposed incrementally when and if the evidence of scaling or team pressure appears.
The escalation path for a modular monolith is well-defined. You start with vertical feature slices: the orders module owns everything from the API endpoint to the database table, with no cross-module data access. Interfaces define the contract between modules. An internal event bus handles cross-cutting concerns like notifications and analytics without creating import cycles. When a specific module shows the characteristics that justify extraction — independent scaling needs, different deployment cadence, compliance isolation, team ownership friction — you apply the Strangler Fig pattern and extract exactly that module. The rest of the system continues to run as before. The modular structure you built from day one means the extracted module already has a clean interface — you are adding a network boundary, not redesigning the module.
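The in-process event bus is a few dozen lines, not a framework. A minimal sketch, with hypothetical event names and no external library — the point is that the orders module publishes and the notifications module subscribes without either importing the other:

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal in-process pub/sub — a sketch, not a production library."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        # Synchronous dispatch: handlers run in-process, in subscription order.
        for handler in self._subscribers[event]:
            handler(payload)

# The notifications module subscribes; the orders module publishes.
# Neither imports the other — the bus is the only shared dependency.
bus = EventBus()
received: list[int] = []
bus.subscribe("order.placed", lambda payload: received.append(payload["order_id"]))
bus.publish("order.placed", {"order_id": 42})
print(received)  # [42]
```

When a module is later extracted, this bus is the seam: the `publish` call stays the same and the transport behind it changes from a function call to a message broker.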
Mini-services occupy a useful middle ground that does not appear in most architectural discussions: one-concern-per-process without full microservices overhead. A worker process that handles asynchronous email sending is a mini-service. A cron process that runs nightly batch reconciliation is a mini-service. They deploy separately, can be scaled independently, and have narrow enough scope that they do not require a full distributed systems framework. They share the main application database under a single schema owner. This pattern gives you the deployment independence that matters (email sending can go down without affecting the main API) without the data consistency complexity of full service decomposition.
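The email worker described above is structurally just a polling loop. A self-contained sketch, with an in-memory queue standing in for a jobs table and a list append standing in for the actual SMTP send (all names hypothetical):

```python
import queue
import threading

# Stand-ins: in a real mini-service the queue would be a jobs table or broker,
# and the append would be an SMTP send.
jobs: "queue.Queue[dict]" = queue.Queue()
sent: list[str] = []

def email_worker(stop: threading.Event) -> None:
    """Poll for jobs until asked to stop; the main API never blocks on this."""
    while not stop.is_set():
        try:
            job = jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        sent.append(job["to"])  # stand-in for the actual send
        jobs.task_done()

stop = threading.Event()
worker = threading.Thread(target=email_worker, args=(stop,))
worker.start()

jobs.put({"to": "user@example.com"})  # the main app just enqueues and moves on
jobs.join()                            # wait for the worker to drain the queue
stop.set()
worker.join()
print(sent)  # ["user@example.com"]
```

The deployment independence comes from running this loop as its own process: if it crashes, emails queue up, but the main API keeps serving.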
Internal service boundaries enforced by linting tools — the dependency-cruiser for JavaScript, import-linter for Python, custom Go module constraints — are the unglamorous work that makes the modular monolith actually work. Without enforcement, module boundaries drift. Engineers add a convenience import across a boundary. Then another. Within six months the modular structure exists in the documentation but not in the codebase. The architectural tests that enforce "orders module must not import from payments module" are the difference between a real modular monolith and a tangled one with aspirational documentation.
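When a dedicated tool like import-linter is not yet in place, an architectural test of this kind can be written directly against Python's standard `ast` module. A minimal sketch, assuming a hypothetical `app.orders` / `app.payments` layout:

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Return every module name imported by the given Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names

def violates_boundary(source: str, forbidden_prefix: str) -> bool:
    """True if the source imports the forbidden module or any submodule of it."""
    return any(
        m == forbidden_prefix or m.startswith(forbidden_prefix + ".")
        for m in imported_modules(source)
    )

# A convenience import across the boundary is caught; a legal one passes.
bad = "from app.payments.models import Invoice"
good = "from app.orders.models import Order"
print(violates_boundary(bad, "app.payments"))   # True
print(violates_boundary(good, "app.payments"))  # False
```

In practice you would run `violates_boundary` over every file in `app/orders/` inside a normal test suite, so a boundary-breaking import fails CI the day it is added, not six months later.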
Conclusion
The honest decision framework is this: start with a modular monolith, organized around domain boundaries, with interfaces between modules and vertical feature slices. This is the correct default for new projects and for teams under 15-20 engineers working on a single product. Build it well — enforce the module boundaries with tooling, maintain clear interfaces, resist the urge to share database tables across module lines. You will have an architecture that is easy to understand, easy to debug, cheap to operate, and ready to decompose when the time comes.
Extract a service when you have a specific, demonstrable, quantified reason: a compliance boundary that scopes the audit surface, a scaling requirement that is costing real money, a deployment cadence mismatch that is causing real risk, or a team ownership conflict that is causing real friction. Not because your architecture looks like what Netflix presented at QCon. Not because your team has hit 30 engineers. Not because the new engineer from Google says that's how they do it there.
The companies that have gotten this right — Shopify, Stack Overflow, Basecamp, and even the Prime Video team when they made their reversal — have one thing in common: they made architectural decisions based on the specific problems in front of them, not on the architectural fashion of the moment. Microservices are not a destination. They are a tool. The tool has a real cost. Use it when the problem justifies the cost, and not before.
The worst outcome is a distributed monolith: all the operational complexity of microservices with all the tight coupling of a tangled monolith. It is achievable by extracting services without establishing clean domain boundaries, by sharing a database across services, or by building synchronous call chains without fault tolerance. Avoid it by doing the hard work of module design first, inside the monolith, before any extraction happens. Get the boundaries right in code before you promote them to network boundaries.
Start structured. Extract deliberately. Measure first.
Sources
- Shopify Engineering: Deconstructing the Monolith
- Amazon Prime Video: Scaling Up the Prime Video Audio/Video Monitoring Service
- Stack Overflow: Stack Overflow Architecture
- Martin Fowler: MonolithFirst
- Sam Newman: Building Microservices, 2nd Edition
- DHH: The Majestic Monolith
- DORA Research: State of DevOps Report 2024