How do I connect Galileo to OpenTelemetry and our existing tracing pipeline (collector/exporter setup)?

Most teams already have OpenTelemetry wired into their stack long before they bring Galileo in. The question isn’t “Can Galileo work with OpenTelemetry?”—it’s “How do we plug Galileo into our existing traces and collector/exporter pipeline without rebuilding everything?” This guide walks through that integration so you can get full eval and guardrail coverage on your LLM traffic while keeping your current observability setup intact.

Quick Answer: You connect Galileo to OpenTelemetry by adding Galileo’s ingestion endpoint as an export target on your existing collector, so the relevant sessions → traces → spans reach Galileo alongside your current backends. You then instrument your LLM/agent calls with Galileo’s SDKs so evaluation signals, guardrail decisions, and Luna-2 scores become part of the same trace you already send to Datadog, New Relic, Grafana, etc.


The Quick Overview

  • What It Is: A way to integrate Galileo’s Agent Observability Platform with your existing OpenTelemetry tracing pipeline (collector/exporter) so LLM and agent behavior is evaluated, guarded, and monitored using the same telemetry backbone you already trust.
  • Who It Is For: Platform, ML, and app teams running OpenTelemetry in production who want Galileo’s evaluation, Signals, and Protect capabilities without duplicating their observability stack.
  • Core Problem Solved: You avoid building a parallel “AI-only” monitoring system. Instead, Galileo consumes and enriches your existing traces, runs evaluators (including Luna-2) on them, and feeds results back into the same observability and incident workflows that your teams already use.

How It Works

At a high level, you’re doing three things:

  1. Ensuring your LLM and agent requests are represented as OpenTelemetry sessions → traces → spans.
  2. Configuring your OpenTelemetry collector/exporter to forward relevant spans to Galileo’s ingestion endpoint.
  3. Using Galileo SDKs and the Evaluation Engine to attach eval results, guardrail scores, and actions (block, redact, override, webhook) back onto the trace, so you get one unified view.

Once wired up, the flow looks like this:

  • Your app or agent runtime emits OTel spans for user sessions, tool calls, and LLM interactions.
  • The OpenTelemetry collector receives those spans and fans them out to:
    • Your existing tracing backend(s) (e.g., Datadog, Jaeger, Grafana, Honeycomb).
    • Galileo’s ingestion service.
  • Galileo ingests traces, runs 20+ out-of-the-box evaluators and any custom evaluators you define, distills them into Luna / Luna-2 for low-latency scoring, and sends guardrail decisions back into the trace.
  • In production, Protect applies these scores with sub-200ms latency to intercept risky behavior (hallucinations, prompt injection, PII leaks, policy drift, bad tool actions) and trigger actions (block, redact, override, or webhook-based escalation) without you touching your OTel topology again.

1. Instrument sessions → traces → spans

You can’t guard what you don’t see. Your first move is making sure your agent and LLM workflows are modeled as traces:

  1. Session: A user journey (e.g., one support case, one research task, one internal workflow).
  2. Trace: A single end-to-end run of your assistant or agent for that session.
  3. Spans: The steps inside the trace:
    • Input validation span
    • Retrieval span (RAG query)
    • LLM generation span
    • Tool call spans (e.g., CRM lookup, ticket creation)
    • Post-processing / response formatting span
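
To make the hierarchy concrete, here is a minimal, stdlib-only Python sketch of one agent run modeled as a trace. It is not the OpenTelemetry SDK: the Span class, new_span, and run_agent_once are hypothetical names used only for illustration, and the span names are examples. It assumes the session is carried as a session.id attribute on the root span, since the tracing API has no session object of its own.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None
    attributes: dict = field(default_factory=dict)


def new_span(name, trace_id, parent=None, attributes=None):
    """Create a span in the given trace, optionally under a parent span."""
    return Span(
        name=name,
        trace_id=trace_id,
        span_id=secrets.token_hex(8),  # span IDs are 8 bytes (16 hex chars)
        parent_id=parent.span_id if parent else None,
        attributes=dict(attributes or {}),
    )


def run_agent_once(session_id):
    """Model one agent run (one trace) that belongs to a user session."""
    trace_id = secrets.token_hex(16)  # trace IDs are 16 bytes (32 hex chars)
    root = new_span("agent.run", trace_id, attributes={"session.id": session_id})
    steps = [
        "input_validation",
        "rag.retrieval",
        "llm.generation",
        "tool.crm_lookup",
        "response_formatting",
    ]
    children = [new_span(step, trace_id, parent=root) for step in steps]
    return [root, *children]


for span in run_agent_once("support-case-1234"):
    print(span.name, span.trace_id[:8], span.parent_id)
```

With a real SDK the same shape comes from nesting start_as_current_span calls. The point here is only that every step shares one trace ID and points at the root span as its parent, while the session ID ties separate runs together.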

Concretely:

  • Use your language’s OpenTelemetry SDK (e.g., @opentelemetry/sdk-node, opentelemetry-instrumentation-fastapi) to:
    • Create a root span for each user request.
    • Wrap each significant agent step in its own span.
  • Add attributes that Galileo will use for evaluation, such as:
    • llm.prompt
    • llm.response
    • llm.model_name
    • llm.provider
    • agent.tool.name
    • agent.tool.args
    • rag.query
    • rag.top_k
    • rag.context_docs (or references to them)
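
One way to keep these attributes consistent is to build them in a single helper. The sketch below is an illustration, not part of any Galileo SDK: llm_span_attributes and its parameters are hypothetical, and the attribute keys are the ones listed above. It relies on one fact about OpenTelemetry: attribute values must be strings, booleans, numbers, or homogeneous sequences of those, so nested tool arguments are JSON-encoded here.

```python
import json


def llm_span_attributes(prompt, response, model_name, provider,
                        tool_name=None, tool_args=None,
                        rag_query=None, rag_top_k=None, rag_doc_ids=None):
    """Build a flat, primitive-typed attribute dict for one LLM span."""
    attrs = {
        "llm.prompt": prompt,
        "llm.response": response,
        "llm.model_name": model_name,
        "llm.provider": provider,
    }
    if tool_name is not None:
        attrs["agent.tool.name"] = tool_name
        # Nested dicts are not valid attribute values, so serialize them.
        attrs["agent.tool.args"] = json.dumps(tool_args or {}, sort_keys=True)
    if rag_query is not None:
        attrs["rag.query"] = rag_query
        if rag_top_k is not None:
            attrs["rag.top_k"] = int(rag_top_k)
        if rag_doc_ids:
            # Store references rather than full documents to keep spans small.
            attrs["rag.context_docs"] = [str(doc_id) for doc_id in rag_doc_ids]
    return attrs


attrs = llm_span_attributes(
    prompt="Where is my order?",
    response="It shipped yesterday.",
    model_name="example-model",        # placeholder value
    provider="example-provider",       # placeholder value
    tool_name="crm_lookup",
    tool_args={"customer_id": 42},
    rag_query="order status policy",
    rag_top_k=3,
    rag_doc_ids=["doc-17", "doc-3"],
)
print(attrs)
```

In the OpenTelemetry Python SDK, a dict like this can be passed to span.set_attributes(attrs) on the span that wraps the LLM call.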

Galileo’s Agent Observability Platform is built around this structure. Once spans carry these details, Galileo can compute hallucination risk, retrieval quality, wrong-tool selection, and more—without your team inventing yet another logging format.

2. Configure the OpenTelemetry collector → Galileo

Next, you extend your existing collector config to send a copy of relevant telemetry to Galileo. The pattern:

  • Keep your current exporters (Datadog, OTLP, Jaeger, etc.).
  • Add Galileo as an additional OTLP/HTTP or OTLP/gRPC exporter.
  • Use processor rules to:
    • Filter: Only send spans that represent AI/LLM/agent behavior (to control volume and costs).
    • Normalize: Ensure attributes Galileo expects are present and consistently named.

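A simplified example (a fragment to merge into your existing collector config):

# Assumes the otlp receiver, batch processor, and otlp/datadog exporter
# are already defined elsewhere in your collector config.
exporters:
  otlp/galileo:
    # Confirm the endpoint against your Galileo onboarding details.
    # The otlp exporter speaks gRPC; for OTLP/HTTP use the otlphttp exporter type.
    endpoint: https://ingest.galileo.ai/otlp
    headers:
      # Read the API key from the collector's environment.
      authorization: Bearer ${env:GALILEO_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Fan out: existing backend plus Galileo.
      exporters: [otlp/datadog, otlp/galileo]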

Typical steps in detail:

  1. Get Galileo ingestion details

    • From Galileo’s UI or onboarding docs:
      • OTLP endpoint (HTTP or gRPC)
      • Required headers (API key / auth token)
      • Any tenant or environment identifiers
  2. Add a Galileo exporter

    • In your collector configuration, define an otlp/galileo exporter with the endpoint and headers.
    • Respect your network and compliance requirements (SaaS, VPC-peered, or on-prem Galileo deployment).
  3. Wire exporter into the traces pipeline

    • Add otlp/galileo to the exporters list for your traces pipeline.
    • Optionally create a dedicated traces/ai pipeline that only receives spans with service.name = "ai-service" or attributes like llm.model_name to control which traces Galileo sees.
  4. Apply selectors if you’re in a noisy environment

    • Use filter or attributes processors to:
      • Drop non-LLM spans.
      • Rename attributes to match Galileo’s recommended schema (e.g., request.body.prompt → llm.prompt).
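
Before translating the filter and rename rules into collector processor config, it can help to prototype them in plain Python against a few sample spans. The sketch below is only that: spans are represented as plain dicts, the function names are hypothetical, and the choice of llm., agent., and rag. as marker prefixes is an assumption based on the attribute names used earlier in this guide.

```python
# Legacy key -> target key, following the rename mentioned in step 4.
RENAMES = {"request.body.prompt": "llm.prompt"}
# Attribute prefixes assumed to mark AI/LLM/agent spans.
AI_PREFIXES = ("llm.", "agent.", "rag.")


def normalize(attributes, renames=RENAMES):
    """Return a copy of the attributes with legacy keys renamed."""
    return {renames.get(key, key): value for key, value in attributes.items()}


def is_ai_span(attributes, service_name=None, ai_services=("ai-service",)):
    """True if the span comes from an AI service or carries AI attributes."""
    if service_name in ai_services:
        return True
    return any(key.startswith(AI_PREFIXES) for key in attributes)


def select_for_galileo(spans):
    """Normalize every span, then keep only those describing AI behavior."""
    selected = []
    for span in spans:
        attrs = normalize(span["attributes"])
        if is_ai_span(attrs, span.get("service.name")):
            selected.append({**span, "attributes": attrs})
    return selected


sample = [
    {"name": "GET /health", "service.name": "gateway",
     "attributes": {"http.method": "GET"}},
    {"name": "chat", "service.name": "gateway",
     "attributes": {"request.body.prompt": "hi"}},
    {"name": "db.query", "service.name": "ai-service", "attributes": {}},
]
for span in select_for_galileo(sample):
    print(span["name"], span["attributes"])
```

Note the ordering: renaming runs before filtering, so a span that only carries the legacy key still matches. The same ordering matters in the collector, where processors run in the order they are listed in the pipeline.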

The result: your existing collector remains the hub. You don’t add new agents to every service—you just add Galileo as another export target for AI-related traces.

3. Attach Galileo SDKs and evaluators

With traces flowing, you want Galileo to do more than “view logs.” Galileo Evaluate + Luna-2 turn those traces into continuous evals and production guardrails.

You typically:

  1. Install the Galileo SDK

    • In your AI app or agent service (Python, Node.js, etc.).
    • The SDK:
      • Helps you standardize span attributes for LLM prompts/responses.
      • Can emit structured events (e.g., evaluation results) that Galileo links to the corresponding span.
  2. Define or select evaluators in the Evaluation Engine

    • Use 20+ out-of-the-box evaluators for:
      • RAG: context relevance, citation consistency, missing context.
      • Safety and Security: prompt injection, PII leak, jailbreaking, policy violations.
      • Agents: wrong tool action, tool failure handling, loop detection.
    • Create custom evaluators using:
      • A description and examples to generate an LLM-as-judge evaluator.
      • SME annotations and CLHF-style few-shot tuning to align with your domain.
  3. Distill evaluators to Luna / Luna-2 for production

    • Galileo takes your evaluators and distills them into compact small language models.
    • These run in Galileo’s inference stack with:
      • Sub-200ms guardrail scoring latency.
      • ~97% lower cost than GPT-style judges, even at 100% traffic coverage.
    • This is the difference between a demo (sampled evals on a subset of traces) and real reliability (always-on evaluation on all traffic).
  4. Feed eval outputs back into spans

    • Galileo attaches metrics like:
      • galileo.hallucination_score
      • galileo.prompt_injection_score
      • galileo.pii_risk_score
      • galileo.rag_context_coverage
      • galileo.agent_tool_misuse
    • These become span attributes or linked events visible in both:
      • Galileo’s Agent Insights and custom dashboards.
      • Your existing APM / observability tools via the same OTel trace.

Now, every trace doesn’t just show “what happened”; it shows “was this safe, correct, and aligned with policy?” in a machine-readable way.


Features & Benefits Breakdown

Core Feature | What It Does | Primary Benefit
OTel-native ingestion | Consumes traces from your existing OpenTelemetry collectors and exporters. | No parallel pipeline; Galileo snaps into your current tracing topology.
Eval-to-guardrail lifecycle | Turns evaluation results (offline + live) into guardrail policies enforced in production. | You don’t just observe failures; you intercept them before users see them.
Luna-2 production evaluators | Distills evaluators into compact models served on Galileo’s inference stack. | Run 10–20 guardrail metrics per request with sub-200ms latency and ~97% lower cost than heavyweight LLM judges.

Ideal Use Cases

  • Best for teams standardizing on OpenTelemetry: Because you want all agent and LLM behavior observable via the same sessions → traces → spans model, and you don’t want a special-case “AI monitoring” stack that diverges from everything else.
  • Best for production agents and RAG systems under strict SLAs: Because you must evaluate and guard 100% of traffic in real time, not just sample traces, and you need Galileo’s Protect to intercept risky outputs or tool calls without blowing your latency budget.

Limitations & Considerations

  • Attribute schema alignment: Galileo can only evaluate what your spans carry. You’ll need to standardize attributes for prompts, responses, RAG context, and tools; the simplest route is to adopt Galileo’s recommended schema and adjust via collector processors or SDK helpers.
  • Trace volume and cost management: Sending every single span from every service to Galileo may be overkill. Typically, you restrict export to AI/agent workloads. Start with targeting services and spans that touch your LLMs or agents, then increase coverage as needed.

Pricing & Plans

Pricing depends on scale (traces per month), deployment model (SaaS, VPC, on-prem), and how heavily you use Protect and Luna-2 for real-time guardrailing. The common pattern:

  • Evaluate in development, then promote those evaluators as production guardrails once they’ve proven their value and precision.

  • Cover 100% of your AI traffic in production with Luna-2-powered evaluation while keeping costs predictable.

  • Team / Growth Plan: Best for product and ML teams shipping their first serious agent or RAG system into production and needing evaluation, experimentation, and basic CI/CD quality gates integrated with their existing OpenTelemetry pipeline.

  • Enterprise Plan: Best for larger organizations with multiple AI workloads, strict compliance requirements, and a central platform team—teams that need dedicated inference resources, VPC or on-prem deployment, SSO, SOC 2 Type II posture, HIPAA-eligible infrastructure with BAAs, and deep integration into existing observability and incident tooling.

(For detailed pricing, custom SLAs, or deployment options, talk to Galileo’s sales team.)


Frequently Asked Questions

Do I have to change my existing OpenTelemetry collector or exporters to use Galileo?

Short Answer: No major changes—just add Galileo as an exporter and selectively route AI-related traces.

Details:
You keep your current receivers and exporters (e.g., OTLP to Datadog, Jaeger, Zipkin). To connect Galileo, you:

  • Add an otlp exporter pointing at Galileo’s ingestion endpoint with the appropriate auth headers.
  • Wire that exporter into the traces pipeline.
  • Optionally add filters to target services or spans that involve LLMs/agents.

This means you don’t rip out or replace anything—you extend the existing pipeline so Galileo becomes another consumer of your AI traces.


Can Galileo send evaluation and guardrail results back into my existing observability tools?

Short Answer: Yes. Evaluation scores and guardrail decisions are attached to spans, so they flow through your normal OTel path.

Details:
Once Galileo is integrated:

  • Evaluation outputs (e.g., hallucination scores, injection risk, PII detection, wrong tool usage) are added as span attributes or events.
  • Those enriched spans still go to your existing backends through your current exporters.
  • You can:
    • Build dashboards in your APM tool filtered by galileo.* attributes.
    • Trigger alerts whenever, for example, galileo.hallucination_score > 0.8 or galileo.pii_risk_score > 0.6.
    • Correlate AI failure modes with infrastructure signals (CPU, latency, error rate) on the same trace.
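
As an illustration of the alerting rule, the following Python sketch scans the spans of one trace for scores above the example thresholds. It is a stand-in for whatever query or monitor syntax your APM tool uses: spans are plain dicts, find_breaches is a hypothetical helper, and the thresholds are the example values from the bullet above, not recommended settings.

```python
# Example thresholds from the alert rule above; tune them for your workload.
THRESHOLDS = {
    "galileo.hallucination_score": 0.8,
    "galileo.pii_risk_score": 0.6,
}


def find_breaches(spans, thresholds=THRESHOLDS):
    """Return (span name, metric, score) for every score above its threshold."""
    breaches = []
    for span in spans:
        for metric, limit in thresholds.items():
            score = span["attributes"].get(metric)
            # Spans without the metric are skipped rather than treated as 0.
            if score is not None and score > limit:
                breaches.append((span["name"], metric, score))
    return breaches


trace = [
    {"name": "llm.generation",
     "attributes": {"galileo.hallucination_score": 0.91}},
    {"name": "tool.crm_lookup",
     "attributes": {"galileo.pii_risk_score": 0.2}},
]
print(find_breaches(trace))
```

Returning the span name alongside the metric keeps the alert actionable: the on-call engineer can jump to the exact step in the trace that produced the risky score.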

In parallel, Galileo’s own UI provides Agent Insights, root cause analysis, and custom dashboards specifically tailored to LLM and agent behavior.


Summary

Connecting Galileo to OpenTelemetry and your existing tracing pipeline (collector/exporter setup) means you don’t have to choose between your current observability stack and serious AI reliability. You keep the same sessions → traces → spans structure and the same collector; you just:

  1. Instrument your agents and LLM calls with the right span attributes.
  2. Add Galileo as an OTLP exporter from your OTel collector.
  3. Let Galileo Evaluate, Signals, and Protect turn those traces into continuous evaluation and real-time guardrails using Luna-2.

Instead of discovering hallucinations, prompt injection, PII leaks, and wrong tool actions after users complain, you detect and intercept them automatically—without changing how your telemetry flows.

