A monolithic app is manageable with a single log file. Following a request that hops through 20 microservices, though, requires distributed tracing. OpenTelemetry (OTel) is a CNCF project and the de facto standard for traces, metrics and logs: vendor-agnostic, with SDKs for most major languages.

Core Concepts

  • Trace: the entire journey of a request (root span plus its children)
  • Span: a single unit of work within a trace (DB call, HTTP request)
  • Trace context: ID propagated between services (W3C traceparent header)
  • Exporter: sends traces to a backend (Jaeger, Zipkin, OTLP)
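To make the trace/span relationship concrete, here is a minimal sketch that rebuilds a trace tree from a flat list of spans. The span records and the helper names (`buildTrace`, `printTrace`) are hypothetical and heavily simplified; real OTel spans carry timestamps, attributes, status and more.

```javascript
// Hypothetical, simplified span records: every span in a trace shares the
// trace ID and points at its parent span (the root has no parent).
const spans = [
  { traceId: 't1', spanId: 'a', parentSpanId: null, name: 'GET /report' },
  { traceId: 't1', spanId: 'b', parentSpanId: 'a', name: 'SELECT orders' },
  { traceId: 't1', spanId: 'c', parentSpanId: 'a', name: 'POST /billing' },
  { traceId: 't1', spanId: 'd', parentSpanId: 'c', name: 'SELECT invoices' },
];

// Group a flat list of spans into a tree rooted at the span with no parent.
function buildTrace(spanList) {
  const nodes = new Map(spanList.map(s => [s.spanId, { ...s, children: [] }]));
  let root = null;
  for (const node of nodes.values()) {
    const parent = nodes.get(node.parentSpanId);
    if (parent) parent.children.push(node);
    else root = node; // no known parent: treat as the root span
  }
  return root;
}

// Print the tree with indentation to show the parent/child structure.
function printTrace(node, depth = 0) {
  console.log(`${'  '.repeat(depth)}${node.name}`);
  node.children.forEach(child => printTrace(child, depth + 1));
}

printTrace(buildTrace(spans));
```

This is roughly the view a tracing UI renders as a timeline: one root span per request, with child spans nested under the operation that triggered them.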

Node.js Auto-Instrumentation

npm i @opentelemetry/api @opentelemetry/sdk-node \
      @opentelemetry/auto-instrumentations-node \
      @opentelemetry/exporter-trace-otlp-http \
      @opentelemetry/resources @opentelemetry/semantic-conventions
// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
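// Note: Resource and SemanticResourceAttributes are the SDK 1.x API.
// In SDK 2.x, use resourceFromAttributes() and ATTR_SERVICE_NAME instead.
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');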

const sdk = new NodeSDK({
    resource: new Resource({
        [SemanticResourceAttributes.SERVICE_NAME]: 'webapp',
        [SemanticResourceAttributes.SERVICE_VERSION]: '1.2.3'
    }),
    traceExporter: new OTLPTraceExporter({
        url: 'http://otel-collector:4318/v1/traces'
    }),
    instrumentations: [getNodeAutoInstrumentations()]
});

sdk.start();

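// Flush pending spans on shutdown, then exit explicitly: once a SIGTERM
// handler is registered, Node no longer exits on its own.
process.on('SIGTERM', () => {
    sdk.shutdown()
        .catch(err => console.error('Error shutting down tracing', err))
        .finally(() => process.exit(0));
});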
# Load tracing.js before server.js
node -r ./tracing.js server.js

Manual Spans

const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('webapp');

async function calculateReport(userId) {
    return tracer.startActiveSpan('calculate-report', async span => {
        span.setAttribute('user.id', userId);
        try {
            const data = await fetchData(userId);
            span.setAttribute('data.count', data.length);
            const result = processData(data);
            span.setStatus({ code: SpanStatusCode.OK });
            return result;
        } catch (err) {
            span.recordException(err);
            span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
            throw err;
        } finally {
            span.end();
        }
    });
}

Context Propagation

When one service makes an HTTP call to another, the traceparent header is added automatically. The receiver reads it and continues the same trace.

# Header added automatically
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
# version-trace_id-span_id-flags
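As an illustration of what the receiver does with this header, the sketch below splits a traceparent value into its four fields. The function name `parseTraceparent` is hypothetical, and it only handles the version-00 layout shown above; in practice the OTel SDK's propagator does this for you.

```javascript
// Parse a W3C traceparent header value (version-00 layout) into its fields.
// Returns null when the value is malformed.
function parseTraceparent(header) {
  const pattern = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
  const match = pattern.exec(String(header).trim());
  if (!match) return null;

  const [, version, traceId, parentId, flags] = match;
  // Version "ff" and all-zero IDs are invalid according to the spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return null;
  }

  return {
    version,
    traceId,   // shared by every span in the trace
    parentId,  // span ID of the caller; becomes the parent of the new span
    sampled: (parseInt(flags, 16) & 0x01) === 1, // lowest bit = sampled flag
  };
}

console.log(parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'));
```

The receiving service keeps the trace ID, uses the incoming span ID as the parent of its own span, and honors the sampled flag so the whole trace is either recorded or dropped consistently.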

OTel Collector

Rather than each service shipping straight to Jaeger, the Collector in the middle receives traces, samples them, enriches them and forwards them to backends. It's the fan-out point of the architecture.

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }

processors:
  batch: {}
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow
        type: latency
        latency: { threshold_ms: 500 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 1 }

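exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls: { insecure: true }   # plaintext; suitable for local testing only
  debug: {}                   # replaces the deprecated `logging` exporter

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Sample first, then batch, so only the kept traces are batched.
      # tail_sampling ships in the Collector contrib distribution.
      processors: [tail_sampling, batch]
      exporters: [otlp/jaeger, debug]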
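The three tail-sampling policies above combine roughly as "keep the trace if any policy matches". The sketch below models that decision in plain JavaScript. It is not the Collector's implementation: the span shape (`status`, `startMs`, `endMs`), the function name and the injectable `random` parameter are assumptions made for illustration, and the real probabilistic policy hashes the trace ID rather than drawing a random number.

```javascript
// Decide whether to keep a finished trace, mirroring the three policies:
// errors (status_code), slow (latency) and baseline (probabilistic).
function shouldKeepTrace(spans, options = {}) {
  const { thresholdMs = 500, baselinePercent = 1, random = Math.random } = options;

  // Policy "errors": any span in the trace ended with an ERROR status.
  const hasError = spans.some(s => s.status === 'ERROR');

  // Policy "slow": total trace duration exceeds the latency threshold.
  const start = Math.min(...spans.map(s => s.startMs));
  const end = Math.max(...spans.map(s => s.endMs));
  const isSlow = end - start > thresholdMs;

  // Policy "baseline": keep a small percentage of everything else.
  const inBaseline = random() * 100 < baselinePercent;

  return hasError || isSlow || inBaseline;
}

// Hypothetical traces for demonstration.
const failed = [{ status: 'ERROR', startMs: 0, endMs: 40 }];
const slow = [
  { status: 'OK', startMs: 0, endMs: 700 },
  { status: 'OK', startMs: 10, endMs: 650 },
];
console.log(shouldKeepTrace(failed), shouldKeepTrace(slow));
```

Because the decision needs the whole trace, the Collector has to buffer spans until the trace is complete, which is why tail sampling costs more memory than deciding up front.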

Jaeger UI

Jaeger can run as a single Docker container. Search traces by service, operation, duration or tag, and see the timeline visually.

docker run -d --name jaeger \
    -p 4317:4317 \
    -p 16686:16686 \
    jaegertracing/all-in-one:latest
# UI: http://localhost:16686

Sampling Strategy

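  • Head sampling: the decision is made at the start of the request. Cheap, but it can drop traces that later turn out to contain errors
  • Tail sampling: the collector buffers the whole trace, then decides. Error traces can always be kept with a status_code policy
  • Probabilistic: keep a fixed percentage of traces; 1-5% is usually enough for high-traffic services
  • Rate-limiting: keep at most N traces per second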
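Head sampling works across services because the decision can be derived from the trace ID itself, so every service reaches the same verdict without coordination. The sketch below shows the idea with a simplified rule; it is not the exact algorithm used by the OTel SDK's ratio-based sampler, and `shouldSample` is a hypothetical name.

```javascript
// Simplified deterministic head sampler: the same trace ID always yields
// the same decision, no matter which service evaluates it.
function shouldSample(traceId, ratio) {
  // Interpret the first 8 hex characters as an unsigned 32-bit integer.
  const bucket = parseInt(traceId.slice(0, 8), 16);
  // Keep the trace when it falls in the lowest `ratio` share of the range.
  return bucket < ratio * 0x100000000;
}

const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';
console.log(shouldSample(traceId, 0.05)); // 5% sampling
console.log(shouldSample(traceId, 0.5));  // 50% sampling
```

The trade-off named in the list still applies: this decision is made before the request has run, so it cannot know whether the trace will end in an error.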

Integrating Metrics and Logs

OTel isn't just traces: the same SDK also produces metrics (an alternative to the Prometheus client libraries) and logs. Because all three signals can carry the same trace and span IDs, you can jump from a span to its related logs and metrics, an approach sometimes called Observability 2.0.
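The link between logs and traces is simply the trace and span IDs stamped on each log record. Here is a minimal sketch, assuming a hypothetical `formatLog` helper and the field names `trace_id` and `span_id`; the `spanContext` argument mirrors the `{ traceId, spanId }` shape returned by `span.spanContext()` in `@opentelemetry/api`.

```javascript
// Build a JSON log line stamped with the IDs of the current span, so a
// log backend can link the record to the matching trace.
function formatLog(level, message, spanContext) {
  const record = {
    timestamp: new Date().toISOString(),
    level,
    message,
  };
  if (spanContext) {
    record.trace_id = spanContext.traceId;
    record.span_id = spanContext.spanId;
  }
  return JSON.stringify(record);
}

// Hypothetical span context, using the IDs from the traceparent example.
const ctx = {
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  spanId: '00f067aa0ba902b7',
};
console.log(formatLog('info', 'report calculated', ctx));
```

In a real service the context would come from the active span rather than a literal, and the logging instrumentations included in the auto-instrumentation bundle can inject these fields for supported loggers.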

Vendor Choice

  • Self-hosted: Jaeger, Tempo + Grafana, SigNoz
  • SaaS: Datadog APM, New Relic, Honeycomb, Lightstep (now ServiceNow Cloud Observability)
  • OTel's biggest advantage: vendor-agnostic — swap the exporter and you swap the vendor

Conclusion

Distributed tracing is one of the highest-ROI observability investments in a microservice architecture. It answers "which service slowed things down?" in seconds. OpenTelemetry is the lingua franca of this space, so if you're starting fresh, go with OTel.
