OpenTelemetry in Production: Distributed Tracing, Metrics, and Logs at Scale

Introduction
A request enters your API gateway, hits three microservices, writes to a database, publishes to a message queue, and returns a response. Somewhere in that chain, 4% of requests take over 10 seconds. Your monitoring dashboards show the overall p99 latency. They cannot show you which service is responsible, which database query is the bottleneck, or which code path is exercised on slow requests.
This is the distributed tracing problem. And before OpenTelemetry standardized the observability ecosystem in 2023-2026, every vendor solved it differently. Switching from Datadog to Honeycomb meant re-instrumenting your entire codebase. Jaeger and Zipkin used different wire formats. Vendor lock-in was the price of observability.
OpenTelemetry (OTel) is a CNCF project that defines vendor-neutral APIs, SDKs, and a wire protocol for traces, metrics, and logs. Instrument once. Export to any backend — Datadog, Honeycomb, Jaeger, Tempo, Prometheus. Change backends by swapping one configuration line.
This post covers production OpenTelemetry: spans and context propagation, sampling strategies for high-volume services, the Collector architecture, metric instruments and cardinality, log correlation, and the operational patterns that make OTel work reliably at scale.
The Three Pillars Problem
Before traces, metrics, and logs were unified, they were silos. Metrics showed something was wrong. Logs had the error details. Traces showed the request path. But correlating a spike in your error rate metric to the specific log lines and the exact trace that caused it required manual cross-referencing between three different tools.
OpenTelemetry's answer is correlation IDs. Every trace gets a trace_id. Every span within that trace shares the trace_id. Every log line emitted during a span can be tagged with the trace_id and span_id. When your metrics spike, you query the metric, find the time window, pull the traces for that window, and jump directly to the log lines from those traces. Three tools, one correlation key.
The unified data model also enables new analysis patterns. Instead of asking "what's the error rate?" (metric) or "what did the error say?" (log), you can ask "for every request that took over 2 seconds, which downstream service call contributed the most latency?" That question is only answerable with distributed traces.
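That trace-level question can be sketched in a few lines. This is an illustration with made-up span records — a real backend such as Honeycomb or Tempo answers it with a query, not client-side code:

```python
# Hypothetical downstream-call durations from one slow trace.
spans = [
    ("order-service", 120),
    ("payment-service", 1750),
    ("inventory-service", 90),
]

def slowest_contributor(spans):
    """Return the service whose calls account for the most total latency."""
    totals = {}
    for service, duration_ms in spans:
        totals[service] = totals.get(service, 0) + duration_ms
    return max(totals, key=totals.get)

print(slowest_contributor(spans))  # payment-service
```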

Spans: The Building Block of Distributed Traces
A trace is a tree of spans. A span represents a unit of work: an HTTP request, a database query, a function call. Each span records its start time, duration, attributes (key-value metadata), events (timestamped log entries), and a status.
The Python OpenTelemetry SDK instruments this at multiple layers. Automatic instrumentation wraps common libraries — requests, django, sqlalchemy, redis-py — and creates spans without code changes. Manual instrumentation adds spans for your business logic.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Provider setup (once at app startup)
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

def process_order(order_id: str, user_id: str) -> dict:
    with tracer.start_as_current_span("process_order") as span:
        # Attributes: indexed, searchable, queryable
        span.set_attribute("order.id", order_id)
        span.set_attribute("user.id", user_id)

        order = fetch_order(order_id)
        span.set_attribute("order.total_cents", order.total_cents)
        span.set_attribute("order.item_count", len(order.items))

        if order.total_cents > 100_000:
            # Events: timestamped log entries attached to the span
            span.add_event("high_value_order_detected", {
                "threshold_cents": 100_000,
                "actual_cents": order.total_cents,
            })

        result = charge_payment(order)
        if result.error:
            # Status communicates success/failure to trace backends
            span.set_status(trace.StatusCode.ERROR, result.error_message)
            span.record_exception(result.exception)
        return result
The BatchSpanProcessor buffers spans in memory and exports them in batches — critical for production performance. The SimpleSpanProcessor (for development) exports synchronously, which adds latency to every request.
Context Propagation: How Traces Cross Service Boundaries
A trace is only useful if it captures the entire request path across all services. When service A calls service B, service B needs to know it's part of the same trace. This happens via context propagation — injecting trace context into outgoing requests and extracting it from incoming requests.
The W3C TraceContext standard defines the traceparent HTTP header:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
The format is version-trace_id-parent_span_id-flags. Any service that extracts this header can create child spans under the same trace.
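For illustration, the header can be parsed by hand — in practice the SDK's W3C propagator does this for you, so treat this as a sketch of the format, not production code:

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four dash-separated fields."""
    version, trace_id, parent_span_id, flags = header.split("-")
    return {
        "version": version,                     # currently always "00"
        "trace_id": trace_id,                   # 16 bytes, lowercase hex
        "parent_span_id": parent_span_id,       # 8 bytes, lowercase hex
        "sampled": int(flags, 16) & 0x01 == 1,  # bit 0 of flags = sampled
    }

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["trace_id"])  # 4bf92f3577b34da6a3ce929d0e0e4736
print(ctx["sampled"])   # True
```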
OpenTelemetry auto-instrumentation handles this automatically for HTTP frameworks. For custom transports (gRPC metadata, message queue headers, custom protocols), you inject and extract manually:
import json

from opentelemetry import trace, propagate

tracer = trace.get_tracer(__name__)

# Inject context into an outgoing message queue message
def publish_event(event: dict) -> None:
    headers = {}
    # Injects trace context (the traceparent) into the headers dict
    propagate.inject(headers)
    kafka_producer.produce(
        topic="order_events",
        value=json.dumps(event),
        headers=headers,  # trace context travels with the message
    )

# Extract context from an incoming message queue message
def consume_event(message) -> None:
    # Extract trace context — the consumer span becomes a child
    context = propagate.extract(dict(message.headers()))
    with tracer.start_as_current_span(
        "process_event",
        context=context,  # attaches to the producer's trace
        kind=trace.SpanKind.CONSUMER,
    ) as span:
        span.set_attribute("messaging.system", "kafka")
        span.set_attribute("messaging.destination", message.topic())
        process(json.loads(message.value()))
Without explicit propagation through message queues, your trace would show the HTTP request to your API service, then stop. The downstream async processing would appear as a separate, disconnected trace — giving you no visibility into the full request lifecycle.
Sampling Strategies for High-Volume Services
At 10,000 requests per second, tracing every request generates 10,000 spans per second, per service. A five-service call chain produces 50,000 spans per second. Storage costs become significant. Trace backend ingestion has limits. Sampling is required.
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import (
    TraceIdRatioBased,
    ParentBased,
    ALWAYS_ON,
    ALWAYS_OFF,
)

# Head-based sampling: the decision is made at trace start.
# ParentBased respects the parent's sampling decision;
# new root traces are sampled at 10%.
sampler = ParentBased(
    root=TraceIdRatioBased(0.1),            # 10% of new traces
    remote_parent_sampled=ALWAYS_ON,        # parent sampled -> sample this too
    remote_parent_not_sampled=ALWAYS_OFF,   # parent not sampled -> don't
)
provider = TracerProvider(sampler=sampler)
Head-based sampling (decision at trace start) is simple but misses rare high-value events. A 10% sample rate means 90% of your error traces are discarded.
Tail-based sampling (decision after trace completes) is more powerful: keep all error traces, all slow traces, and sample the rest. The OpenTelemetry Collector supports tail-based sampling via the tailsamplingprocessor:
# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    policies:
      # Always keep error traces
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      # Keep slow traces (p99 visibility)
      - name: slow-traces
        type: latency
        latency: {threshold_ms: 1000}
      # Sample 5% of everything else
      - name: probabilistic
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
The Collector buffers traces for decision_wait seconds before making the sampling decision — long enough for all spans in a trace to arrive. This requires more memory than head-based sampling (buffering 100K traces × N spans each) but produces dramatically better signal for debugging.
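The buffering cost is easy to estimate. A back-of-envelope sketch — the spans-per-trace and bytes-per-span figures are assumptions, not measurements:

```python
# Back-of-envelope memory estimate for the tail-sampling buffer.
num_traces = 100_000      # num_traces from the config above
spans_per_trace = 10      # assumed average across a five-service chain
bytes_per_span = 2_000    # assumed: attributes, events, protobuf overhead

buffer_bytes = num_traces * spans_per_trace * bytes_per_span
print(f"{buffer_bytes / 1024**3:.1f} GiB")  # 1.9 GiB
```

A buffer of that size is one reason tail sampling typically runs in a centralized Collector tier rather than on every node.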

The OpenTelemetry Collector: Gateway, Processor, and Router
The Collector is the central piece of production OTel deployments. Instead of each service exporting directly to your observability backend, services send to a local Collector agent (sidecar or DaemonSet), which processes and routes telemetry to multiple backends.
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:                  # port 4317
        endpoint: 0.0.0.0:4317
      http:                  # port 4318
        endpoint: 0.0.0.0:4318

processors:
  batch:                     # batch spans for efficiency
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:            # prevent OOM under load
    check_interval: 1s
    limit_mib: 500
  resourcedetection:         # enrich with host/container metadata
    detectors: [env, system, docker]
  attributes/add_env:        # add environment tag to all telemetry
    actions:
      - key: deployment.environment
        value: production
        action: insert

exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${HONEYCOMB_API_KEY}
  prometheus:                # metrics exposed for Prometheus scrape
    endpoint: 0.0.0.0:8888
  loki:                      # logs to Grafana Loki
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter first, batch last — the recommended ordering
      processors: [memory_limiter, resourcedetection, attributes/add_env, batch]
      exporters: [otlp/honeycomb]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [loki]
The Collector architecture enables fan-out: send traces to both Honeycomb (for developer debugging) and Jaeger (for security/compliance) simultaneously. It enables backend migration: add a new exporter, verify parity, remove the old one — no application code changes.
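Fan-out is just a longer exporter list on the same pipeline. A sketch, assuming a second OTLP-capable backend — the jaeger endpoint here is a placeholder for illustration:

```yaml
exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
  otlp/jaeger:                      # placeholder endpoint for illustration
    endpoint: jaeger-collector:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/honeycomb, otlp/jaeger]  # every span goes to both
```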
Metrics: Instrument Types and the Cardinality Problem
OpenTelemetry's metric API distinguishes instrument types more strictly than most Prometheus client libraries do:
import time

from opentelemetry import metrics

meter = metrics.get_meter(__name__)

# Counter: monotonically increasing (requests served, errors)
request_counter = meter.create_counter(
    name="http.server.request_count",
    description="Total HTTP requests",
    unit="requests",
)

# Histogram: distribution of values (latency, request size).
# Use it instead of a gauge for latency — it gives you p50, p95, p99.
latency_histogram = meter.create_histogram(
    name="http.server.request_duration",
    description="HTTP request latency",
    unit="ms",
    explicit_bucket_boundaries_advisory=[5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000],
)

# UpDownCounter: can increase or decrease (active connections, queue depth)
active_requests = meter.create_up_down_counter(
    name="http.server.active_requests",
    description="Currently active HTTP requests",
)

# Observable Gauge: polled on collection (memory usage, cache size)
def observe_cache_size(options):
    yield metrics.Observation(len(cache), {"cache.name": "user_sessions"})

cache_gauge = meter.create_observable_gauge(
    name="cache.size",
    callbacks=[observe_cache_size],
    description="Current cache entry count",
)

# Usage in a request handler
def handle_request(request):
    active_requests.add(1, {"http.method": request.method})
    start = time.time()
    try:
        response = process(request)
        request_counter.add(1, {
            "http.method": request.method,
            "http.route": request.route,
            "http.status_code": response.status_code,
        })
        return response
    finally:
        latency_histogram.record(
            (time.time() - start) * 1000,
            {"http.method": request.method, "http.route": request.route},
        )
        active_requests.add(-1, {"http.method": request.method})
The cardinality trap: every unique combination of attribute values creates a new time series. Adding user_id as an attribute to a request counter creates one time series per user. At 100,000 users, that's 100,000 time series — a cardinality explosion that will OOM your Prometheus or Thanos.
High-cardinality values (user IDs, request IDs, IP addresses) belong in traces as span attributes — where they're indexed per-request. Metrics should use low-cardinality attributes: HTTP method, route template, status code, region, service name. The rule: metrics for "how many?", traces for "which one?".
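The arithmetic behind the trap is worth making explicit. A sketch with assumed cardinalities:

```python
# Worst-case time-series count = product of each attribute's cardinality.
methods = 5            # GET, POST, PUT, DELETE, PATCH
routes = 40            # route templates, not raw URLs
status_codes = 10

safe = methods * routes * status_codes
print(safe)            # 2000 series — cheap for any TSDB

users = 100_000        # adding user_id multiplies every existing series
exploded = safe * users
print(exploded)        # 200000000 series — a cardinality explosion
```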
Log Correlation: Connecting Logs to Traces
The highest-value feature of OpenTelemetry in production is log-trace correlation. When you know the trace ID of a slow or failing request, you can query your log backend for every log line emitted during that trace. This replaces manual log trawling with direct navigation.
import structlog
from opentelemetry import trace

def add_otel_context(logger, method, event_dict):
    """Inject current trace context into every log entry."""
    span = trace.get_current_span()
    if span.is_recording():
        ctx = span.get_span_context()
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
        event_dict["trace_flags"] = int(ctx.trace_flags)
    return event_dict

# Configure structlog with OTel context injection
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        add_otel_context,  # injects trace_id, span_id
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)
logger = structlog.get_logger()

def charge_payment(order_id: str, amount_cents: int) -> dict:
    with tracer.start_as_current_span("charge_payment") as span:
        span.set_attribute("payment.amount_cents", amount_cents)
        # Every log line now includes trace_id and span_id automatically
        logger.info("payment_initiated",
                    order_id=order_id,
                    amount_cents=amount_cents)
        result = payment_gateway.charge(amount_cents)
        if result.declined:
            logger.warning("payment_declined",
                           order_id=order_id,
                           decline_reason=result.reason)
        return result
In Grafana with Loki and Tempo configured together: click on a trace span, click "Related logs", and every log line with a matching trace_id appears. The correlation replaces a 10-minute manual investigation with a 10-second click.
Production Operations: Performance Overhead and Reliability
OpenTelemetry adds latency. The SDK creates spans, serializes them, and sends them over the network. Measured overhead in production:
- Auto-instrumentation (Python/Node): 0.5-2ms per request for span creation
- BatchSpanProcessor: ~0.1ms amortized (async export)
- Attribute setting: ~0.01ms per attribute
- Network to Collector: near-zero (async, batched, local sidecar)
For most services, this is acceptable. For extremely latency-sensitive paths (sub-millisecond processing), use sampling to reduce span volume and remove attributes from the hot path.
The Collector's reliability is critical: if it goes down, where do your spans go? The SDK's OTLP exporters already retry transient failures with exponential backoff per the OTLP spec, so the weak link is the hop from Collector to backend. Enable the Collector exporter's retry_on_failure and sending_queue so spans are buffered and retried rather than dropped:
# otel-collector-config.yaml — buffer and retry on backend failure
exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 300s   # give up after 5 minutes
    sending_queue:
      enabled: true
      queue_size: 5000         # batches buffered while the backend is unreachable
For the Collector itself, run it as a DaemonSet (one per node) in Kubernetes, not as a central deployment. This keeps telemetry traffic on-node, eliminates a cross-node network bottleneck, and provides node-level redundancy. Configure memory_limiter to prevent the Collector from OOMing under spike load — it will drop telemetry rather than crash.
Exemplars: Connecting Metrics to Traces
The most powerful observability pattern that OpenTelemetry enables is exemplars — specific trace IDs embedded in metric data points. When your p99 latency spikes, instead of searching for a representative slow trace, the metric data point contains the trace ID of a request that exemplifies the spike.
# Exemplar recording is handled by the SDK — no special imports needed.
# When a sampled span is active, the SDK attaches its trace_id/span_id
# to the metric data point as an exemplar.
with tracer.start_as_current_span("handle_request") as span:
    start = time.time()
    result = process_request(request)
    latency_histogram.record(
        (time.time() - start) * 1000,
        attributes={"http.route": request.route},
    )
In Grafana with Tempo as the trace backend: your Prometheus latency graph shows a spike. Click on the spike. Grafana extracts the exemplar trace ID and opens the exact trace in Tempo. You go from "something is slow" to "this specific request, this specific database query" in one click.
Prometheus has supported exemplars natively since version 2.26, but only when started with the exemplar-storage feature flag:
# Start Prometheus with exemplar storage enabled
prometheus --enable-feature=exemplar-storage --config.file=prometheus.yml

# prometheus.yml — a plain scrape job is enough; the OTel Prometheus
# exporter includes exemplars in the OpenMetrics exposition automatically
scrape_configs:
  - job_name: 'my-service'
    scrape_interval: 15s
The exemplar workflow is the concrete realization of the "three pillars" vision: metrics for detection, traces for investigation, one click connecting them.
Multi-Language Trace Propagation
In practice, your system has services in Python, Go, Java, and Node.js. OpenTelemetry's language SDKs all implement the same W3C TraceContext standard — traces propagate transparently across language boundaries.
Go instrumentation pattern:
import (
    "context"
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("payment-service")

func ProcessPayment(ctx context.Context, req PaymentRequest) (*PaymentResult, error) {
    ctx, span := tracer.Start(ctx, "process_payment",
        trace.WithAttributes(
            attribute.Int64("payment.amount_cents", req.AmountCents),
            attribute.String("payment.currency", req.Currency),
        ),
    )
    defer span.End()

    result, err := chargeCard(ctx, req)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    span.SetAttributes(attribute.String("payment.transaction_id", result.TransactionID))
    return result, nil
}

func main() {
    // HTTP server with automatic OTel instrumentation
    mux := http.NewServeMux()
    mux.HandleFunc("/payment", handlePayment)
    // otelhttp wraps the handler: auto-creates spans, propagates context
    http.ListenAndServe(":8080", otelhttp.NewHandler(mux, "payment-server"))
}
The trace flows: Python API service creates the root span → calls Go payment service over HTTP → otelhttp extracts the trace context → tracer.Start creates a child span → the complete trace visible in Honeycomb/Jaeger shows both services.
Debugging with OTel: A Production Incident Walkthrough
The practical value of OpenTelemetry becomes clear in incident response. A concrete example of the debugging workflow:
Scenario: 5% error rate spike on the /api/checkout endpoint. P99 latency increased from 340ms to 4.2 seconds.
Without OTel: Check application logs. Search for "error" across all services. Try to correlate log timestamps. Look at database slow query logs. 45-60 minute investigation.
With OTel:
Step 1: Metric alert fires. Open Grafana dashboard. The error rate metric for http.route=/api/checkout, http.status_code=500 shows the spike at 14:23 UTC.
Step 2: Filter traces for the same time window, route, and status=error. Ten error traces appear. All share a common attribute: db.statement containing a table scan pattern.
Step 3: Open the slowest trace. The span tree shows:
checkout_handler (4.1s)
├── validate_cart (12ms)
├── check_inventory (8ms)
├── calculate_tax (6ms)
└── create_order (4.07s) ← bottleneck
├── begin_transaction (1ms)
├── insert_order (2ms)
├── check_fraud_score (4.0s) ← the problem
│ └── external_api_call (4.0s, status=timeout)
└── (never reached)
Step 4: The fraud score service has a 4-second timeout on its external API call, with no circuit breaker. The external API is slow. Every checkout is blocking for 4 seconds.
Step 5: Jump to correlated logs. The fraud service log lines (matched by trace_id) show the external API returning HTTP 503 with a Retry-After: 60 header — which the service is ignoring.
Total time: 8 minutes from alert to root cause. The trace made the bottleneck structurally visible.
Kubernetes Deployment Patterns
In Kubernetes, the standard OTel deployment is a DaemonSet Collector with a sidecar pattern for resource-intensive processing:
# Collector DaemonSet — one per node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector   # must match the selector above
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.95.0
          resources:
            requests: {memory: "200Mi", cpu: "100m"}
            limits: {memory: "500Mi", cpu: "500m"}
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
# Application deployment (abridged)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: api-service
          env:
            # HOST_IP must be defined before the variables that reference it
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            # Point to the DaemonSet Collector on the same node
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://$(HOST_IP):4317"
            - name: OTEL_SERVICE_NAME
              value: "payment-api"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "deployment.environment=production,service.version=$(APP_VERSION)"
The HOST_IP environment variable resolves to the node's IP, routing telemetry to the local DaemonSet Collector pod. This avoids centralized collector bottlenecks and keeps latency minimal.
Auto-Instrumentation: Zero-Code Observability
For Python services, OpenTelemetry's auto-instrumentation agent wraps common frameworks without any code changes. A single command instruments Django, Flask, FastAPI, SQLAlchemy, Redis, and Celery simultaneously:
# Install the agent
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
# Run your app with the agent
OTEL_SERVICE_NAME=payment-api \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
opentelemetry-instrument python manage.py runserver
Every HTTP request automatically gets a span. Every SQLAlchemy query automatically gets a child span with the SQL statement, table, and duration. No code changes. The auto-instrumentation baseline gives you 80% of the observability value — traces for every inbound request and every database call — before you write a single manual span.
Manual spans layer on top: add them for business logic units (process_order, validate_cart, charge_payment) and for custom attributes that aren't captured automatically (order.total_cents, user.tier, feature_flag.value). The combination of automatic framework coverage plus targeted manual spans produces a complete picture with minimal instrumentation burden.
Conclusion
OpenTelemetry in 2026 is the observability standard for distributed systems. The value proposition is concrete: instrument once with vendor-neutral APIs, correlate logs and traces with shared trace IDs, sample intelligently with the Collector's tail-based policies, and switch backends without re-instrumenting.
The operational investment is front-loaded: Collector deployment, sampling policy configuration, and establishing attribute naming conventions across teams. Once the infrastructure is in place, every new service gets observability out of the box through auto-instrumentation.
The debugging workflow that emerges — metric spike → trace query → correlated log lines → root cause identified — replaces hours of manual investigation with minutes. That ROI justifies the setup cost in the first incident it shortens.
Sources
- OpenTelemetry Specification
- W3C TraceContext Standard
- OTel Collector Tail Sampling Processor
- CNCF Observability TAG