A monolithic app is manageable with a single log file. Following a request that hops through 20 microservices, though, requires distributed tracing. OpenTelemetry (OTel) is the CNCF project that has become the de facto standard for traces, metrics and logs: it is vendor-agnostic and has SDKs for most major languages.
Core Concepts
- Trace: the entire journey of a request (root span plus its children)
- Span: a single unit of work within a trace (DB call, HTTP request)
- Trace context: the trace ID and parent span ID propagated between services (W3C traceparent header)
- Exporter: sends traces to a backend (Jaeger, Zipkin, or any OTLP-compatible endpoint)
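To make the trace/span relationship concrete, here is a minimal sketch in plain Node.js. The span records and their field names are hypothetical stand-ins for what a backend stores, not OTel SDK types. It rebuilds the tree from parent IDs and prints it the way a timeline view would.

```javascript
// Hypothetical span records as a backend might store them (not an OTel SDK type).
const spans = [
  { traceId: 't1', spanId: 'a', parentSpanId: null, name: 'GET /report', durationMs: 120 },
  { traceId: 't1', spanId: 'b', parentSpanId: 'a', name: 'SELECT orders', durationMs: 45 },
  { traceId: 't1', spanId: 'c', parentSpanId: 'a', name: 'HTTP GET billing', durationMs: 60 },
  { traceId: 't1', spanId: 'd', parentSpanId: 'c', name: 'SELECT invoices', durationMs: 30 },
];

// Rebuild the trace tree: every span hangs under its parent;
// the span without a parent is the root span.
function buildTraceTree(spanList) {
  const nodes = new Map(spanList.map(s => [s.spanId, { ...s, children: [] }]));
  let root = null;
  for (const node of nodes.values()) {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : null;
    if (parent) parent.children.push(node);
    else root = node;
  }
  return root;
}

// Render the tree as indented lines, one line per span.
function renderTree(node, depth = 0) {
  const line = `${'  '.repeat(depth)}${node.name} (${node.durationMs} ms)`;
  return [line, ...node.children.flatMap(child => renderTree(child, depth + 1))];
}

console.log(renderTree(buildTraceTree(spans)).join('\n'));
```

All four spans share one trace ID; only the parent pointers differ, and that is all a backend needs to draw the waterfall.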
Node.js Auto-Instrumentation
npm i @opentelemetry/api @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources @opentelemetry/semantic-conventions
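// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

// Note: this is the 1.x-era API. Newer releases deprecate SemanticResourceAttributes
// in favor of ATTR_SERVICE_NAME / ATTR_SERVICE_VERSION, and SDK 2.x replaces
// `new Resource(...)` with resourceFromAttributes(...).
const sdk = new NodeSDK({
  // Resource attributes identify this service in the tracing backend
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'webapp',
    [SemanticResourceAttributes.SERVICE_VERSION]: '1.2.3'
  }),
  // Send spans over OTLP/HTTP to the Collector
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces'
  }),
  // Auto-instrument supported libraries (http, express, database clients, ...)
  instrumentations: [getNodeAutoInstrumentations()]
});

sdk.start();

// Flush pending spans, then exit; a SIGTERM handler disables Node's default exit
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});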
# Load tracing.js before server.js
node -r ./tracing.js server.js
Manual Spans
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('webapp');

async function calculateReport(userId) {
  return tracer.startActiveSpan('calculate-report', async span => {
    span.setAttribute('user.id', userId);
    try {
      const data = await fetchData(userId);
      span.setAttribute('data.count', data.length);
      const result = processData(data);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}
Context Propagation
When an instrumented service makes an HTTP call to another, the HTTP client instrumentation adds the traceparent header automatically. The receiver reads it and continues the same trace, creating its own spans as children of the caller's span.
# Header added automatically
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
# version-trace_id-parent_span_id-flags (flags 01 = sampled)
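The instrumentation does this for you, but it helps to see how little is involved. The following is a rough sketch of parsing an incoming traceparent value and building the header for an outgoing call. The function names are made up for illustration, and it only handles the strict version-00 layout.

```javascript
const { randomBytes } = require('node:crypto');

// W3C traceparent: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentSpanId, flags] = match;
  // Version "ff" and all-zero IDs are invalid per the spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Continue the trace: keep the trace ID, mint a new span ID for the outgoing call,
// and carry the sampled flag forward.
function childTraceparent(incoming) {
  const spanId = randomBytes(8).toString('hex');
  const flags = incoming.sampled ? '01' : '00';
  return `00-${incoming.traceId}-${spanId}-${flags}`;
}

const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(incoming);
console.log(childTraceparent(incoming));
```

The trace ID never changes along the request path; only the span ID segment is replaced at each hop, which is what lets the backend stitch the spans into one trace.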
OTel Collector
Rather than each service shipping straight to Jaeger, the Collector in the middle receives traces, samples them, enriches them and forwards them to backends. It's the fan-out point of the architecture.
# otel-collector-config.yaml
# tail_sampling ships with the Collector contrib distribution
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }
processors:
  batch: {}
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow
        type: latency
        latency: { threshold_ms: 500 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 1 }
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls: { insecure: true }
  debug: {}  # replaces the deprecated logging exporter
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]  # sample first, then batch what survives
      exporters: [otlp/jaeger, debug]
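The three policies are combined with OR: a trace is kept if any one of them votes to sample it. A simplified simulation of that decision in plain Node.js is shown below. The trace shape and function names are hypothetical. The baseline uses a random draw here for brevity, which simplifies how the actual processor makes its probabilistic choice.

```javascript
// Hypothetical, simplified trace shape: a list of spans with a start, duration and status.
function traceDurationMs(trace) {
  const start = Math.min(...trace.spans.map(s => s.startMs));
  const end = Math.max(...trace.spans.map(s => s.startMs + s.durationMs));
  return end - start;
}

// One entry per policy in the config; the trace is kept if ANY policy matches.
const policies = [
  { name: 'errors', matches: t => t.spans.some(s => s.status === 'ERROR') },
  { name: 'slow', matches: t => traceDurationMs(t) > 500 },
  // Baseline: 1% of everything else. `random` is injectable so the sketch is testable.
  { name: 'baseline', matches: (t, random) => random() < 0.01 },
];

function decide(trace, random = Math.random) {
  const hit = policies.find(p => p.matches(trace, random));
  return hit ? { sampled: true, policy: hit.name } : { sampled: false, policy: null };
}

const failed = { spans: [{ startMs: 0, durationMs: 40, status: 'ERROR' }] };
const slow = {
  spans: [
    { startMs: 0, durationMs: 100, status: 'OK' },
    { startMs: 100, durationMs: 650, status: 'OK' },
  ],
};
const normal = { spans: [{ startMs: 0, durationMs: 30, status: 'OK' }] };

console.log(decide(failed));            // kept by "errors"
console.log(decide(slow));              // kept by "slow"
console.log(decide(normal, () => 0.5)); // dropped: 0.5 is not below 0.01
```

The decision needs the whole trace, which is why the Collector has to buffer spans for a while before it can apply these policies.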
Jaeger UI
Jaeger can run as a single Docker container. Search traces by service, operation, duration or tag, and see the timeline visually.
docker run -d --name jaeger \
  -p 4317:4317 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest
# UI: http://localhost:16686
Sampling Strategy
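- Head sampling: decide at the start of the request. Cheap, but the decision is made before the outcome is known, so error traces can be dropped
- Tail sampling: buffer the whole trace, then decide at the collector. Error and slow traces can always be kept by policy
- Probabilistic: keep a fixed percentage of traces; 1-5% is usually enough on high-traffic services
- Rate-limiting: cap at N traces per second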
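Two of these strategies fit in a few lines each. Below is a sketch of a head sampler that derives its decision from the trace ID, and a token-bucket rate limiter. Both are illustrative helpers with made-up names, not the samplers that ship with the SDK, and the hashing used here is one simple choice among several.

```javascript
// Head sampling: derive the decision from the trace ID so every service that
// sees the same trace ID reaches the same verdict without coordination.
function ratioSampler(ratio) {
  const threshold = Math.floor(ratio * 0x100000000); // share of the 32-bit space
  return traceId => parseInt(traceId.slice(0, 8), 16) < threshold;
}

// Rate limiting: a token bucket that admits roughly `perSecond` traces per second.
// `now` (milliseconds) is injectable so the behaviour can be checked without waiting.
function rateLimitingSampler(perSecond, now = Date.now) {
  let tokens = perSecond;
  let last = now();
  return () => {
    const current = now();
    tokens = Math.min(perSecond, tokens + ((current - last) / 1000) * perSecond);
    last = current;
    if (tokens < 1) return false;
    tokens -= 1;
    return true;
  };
}

const fivePercent = ratioSampler(0.05);
console.log(fivePercent('00000000f3577b34da6a3ce929d0e0e4')); // low prefix: sampled
console.log(fivePercent('4bf92f3577b34da6a3ce929d0e0e4736')); // high prefix: not sampled

let clock = 0;
const limiter = rateLimitingSampler(2, () => clock);
console.log([limiter(), limiter(), limiter()]); // third call exceeds 2 per second
clock = 1000;
console.log(limiter()); // bucket has refilled after one second
```

Neither sampler knows how the request ends, which is the blind spot that tail sampling covers.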
Integrating Metrics and Logs
OTel isn't just traces: the same SDK also exports metrics (an alternative to the Prometheus client libraries, not to a metrics store) and logs. With all three correlated through trace context, you can jump from a span to its related logs and metrics (sometimes called Observability 2.0).
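The jump from a span to its logs works because each log record carries the IDs of the span that was active when it was written. Here is a rough sketch of that mechanism using Node's AsyncLocalStorage. The `activeSpan` store and the `trace_id` / `span_id` field names are assumptions for illustration; in a real setup the SDK manages the active context for you.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Stand-in for the tracing SDK's active context (hypothetical).
const activeSpan = new AsyncLocalStorage();

// A logger that stamps every record with the IDs of whatever span is active.
function logRecord(level, message) {
  const span = activeSpan.getStore();
  return JSON.stringify({
    level,
    message,
    trace_id: span ? span.traceId : undefined, // omitted from the JSON when undefined
    span_id: span ? span.spanId : undefined,
  });
}

async function handleRequest() {
  // Simulate async work; the stored context follows the async call chain.
  await new Promise(resolve => setTimeout(resolve, 5));
  return logRecord('info', 'report generated');
}

const spanContext = { traceId: '4bf92f3577b34da6a3ce929d0e0e4736', spanId: '00f067aa0ba902b7' };
activeSpan.run(spanContext, handleRequest).then(line => console.log(line));

// Outside any span the record simply has no trace fields.
console.log(logRecord('info', 'outside any span'));
```

With the IDs on every record, finding the logs for a slow span becomes a lookup by `trace_id` instead of a search by timestamp.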
Vendor Choice
- Self-hosted: Jaeger, Tempo + Grafana, SigNoz
- SaaS: Datadog APM, New Relic, Honeycomb, Lightstep (now ServiceNow Cloud Observability)
- OTel's biggest advantage is vendor neutrality: swap the exporter and you swap the vendor
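In practice "swap the exporter" often means changing environment variables rather than code. The sketch below resolves OTLP/HTTP trace settings from the standard `OTEL_EXPORTER_OTLP_*` variables. The helper name, the vendor URL and the header name are made up for illustration, and the OTLP exporters are designed to read these variables on their own when no `url` is set in code.

```javascript
// Resolve OTLP/HTTP trace exporter settings from the standard OTel environment variables.
function otlpTraceOptionsFromEnv(env = process.env) {
  // A signal-specific endpoint is used as-is; the generic one is a base URL
  // that gets /v1/traces appended.
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318';
  const url = env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || `${base.replace(/\/+$/, '')}/v1/traces`;

  // Headers arrive as "key1=value1,key2=value2" (for example a vendor API key).
  const headers = {};
  for (const pair of (env.OTEL_EXPORTER_OTLP_HEADERS || '').split(',')) {
    const idx = pair.indexOf('=');
    if (idx > 0) headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return { url, headers };
}

// Self-hosted: the collector hostname used in tracing.js.
console.log(otlpTraceOptionsFromEnv({ OTEL_EXPORTER_OTLP_ENDPOINT: 'http://otel-collector:4318' }));

// Hypothetical SaaS endpoint and header name, purely for illustration.
console.log(otlpTraceOptionsFromEnv({
  OTEL_EXPORTER_OTLP_ENDPOINT: 'https://otlp.vendor.example/',
  OTEL_EXPORTER_OTLP_HEADERS: 'x-api-key=secret123',
}));
```

The instrumentation code stays untouched in both cases; only the destination and credentials differ.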
Conclusion
Distributed tracing is one of the highest-ROI observability investments in a microservice architecture. It answers "which service slowed things down?" in seconds. OpenTelemetry is the lingua franca of this space, so if you're starting fresh, go with OTel.
Reach out to KEYDAL for OpenTelemetry, Jaeger/Tempo setup and microservice observability.