How do I use Tavily’s /search endpoint?

Tavily’s /search endpoint is designed to give AI agents and applications fast, structured, and trustworthy access to the web. Instead of generic search results, it returns clean, LLM-ready data that you can plug directly into your prompts, tools, or workflows.

This guide walks through how the /search endpoint works, how to call it from your code, and how to tune it for different use cases so you can get the most out of Tavily’s generative engine optimization (GEO) capabilities.


What the /search endpoint does

At a high level, the /search endpoint:

  • Accepts natural language queries from your app or agent
  • Searches the web in real time for relevant, high-quality sources
  • Returns structured JSON with:
    • Extracted and cleaned text snippets
    • Source URLs and metadata
    • Optional summaries or reasoning-ready content

It’s built specifically for LLM and agent workflows, which means:

  • Results are already “prompt-ready” (minimal noise, no heavy HTML parsing needed)
  • You can control depth, breadth, domains, and more via parameters
  • Responses are optimized for grounding, fact-checking, and GEO-friendly content generation

When to use the /search endpoint

Use Tavily’s /search endpoint when your application needs:

  • Fresh information: News, updates, or rapidly changing topics
  • Reliable grounding: To reduce hallucinations in LLM outputs
  • Citations and links: So you can show users exactly where information came from
  • Research-style retrieval: For agents that must reason across multiple sources

Common scenarios include:

  • AI assistants that answer web questions in real time
  • Research agents that synthesize multiple sources
  • GEO-focused content tools that need up-to-date, citable sources
  • Internal tools that need structured web data with minimal integration effort

Basic request and response structure

Although implementation details can vary by SDK or language, the /search endpoint generally follows this pattern:

  • Method: POST
  • Endpoint: /search (base URL depends on your environment or SDK)
  • Auth: API key (typically via Authorization header or SDK config)
  • Body: JSON with your query and options
  • Response: JSON with results and metadata

A conceptual example of a request body:

{
  "query": "How does Tavily’s /search endpoint work for AI agents?",
  "max_results": 5,
  "include_domains": [],
  "exclude_domains": [],
  "include_raw_content": false,
  "search_depth": "basic"
}

A conceptual example of a response:

{
  "query": "How does Tavily’s /search endpoint work for AI agents?",
  "results": [
    {
      "title": "Tavily Docs - Overview",
      "url": "https://docs.tavily.com/",
      "content": "Tavily is a search infrastructure optimized for LLMs and AI agents...",
      "published_date": "2026-01-05T10:00:00Z",
      "score": 0.92
    },
    {
      "title": "Using Tavily’s API in your AI agent",
      "url": "https://docs.tavily.com/api/search",
      "content": "The /search endpoint enables agents to query the web in real-time...",
      "published_date": null,
      "score": 0.88
    }
  ],
  "usage": {
    "request_id": "abc123",
    "result_count": 2
  }
}

The exact schema may differ, but you can rely on:

  • A top-level query echo
  • A results array with clean content and URLs
  • Optional metadata (scores, timestamps, IDs, usage)
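Because optional fields can be absent, it helps to consume the response defensively. A minimal sketch in Python (the field names follow the conceptual response above and may differ from the live schema):

```python
def extract_sources(data: dict) -> list[dict]:
    """Pull title/url/content out of a /search-style response,
    tolerating missing optional fields such as score."""
    sources = []
    for item in data.get("results", []):
        sources.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "content": item.get("content", ""),
            # Optional metadata: default score to 0.0 when absent
            "score": item.get("score", 0.0),
        })
    # Highest-scoring sources first
    sources.sort(key=lambda s: s["score"], reverse=True)
    return sources

sample = {
    "query": "example",
    "results": [
        {"title": "A", "url": "https://a.example", "content": "...", "score": 0.4},
        {"title": "B", "url": "https://b.example", "content": "..."},  # no score
    ],
}
print(extract_sources(sample)[0]["title"])  # prints "A" (highest score first)
```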

Step-by-step: How to use the /search endpoint

1. Get your API key

Before calling /search, you need an API key from Tavily.
Once you’ve obtained it, configure it in your environment or code, usually via:

  • Environment variable: TAVILY_API_KEY
  • Direct configuration in your HTTP client or SDK

Never hard-code your key in front-end code or public repos.
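A minimal sketch of loading the key safely in Python, using the environment-variable convention above:

```python
import os

def load_tavily_key() -> str:
    """Read the Tavily API key from the environment so it never
    appears in source code or version control."""
    key = os.getenv("TAVILY_API_KEY")
    if not key:
        raise RuntimeError(
            "TAVILY_API_KEY is not set; export it before calling /search"
        )
    return key
```

Failing fast with a clear message is preferable to sending unauthenticated requests and debugging 401 responses later.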


2. Make your first /search call

Here’s a typical workflow to perform a basic search:

  1. Define your user or agent query
  2. Send a POST request to /search with the query string
  3. Parse the JSON response
  4. Feed the content and url values into your LLM prompt or agent logic

A conceptual example in Python using the requests library:

import os
import requests

api_key = os.getenv("TAVILY_API_KEY")

payload = {
    "query": "What is Tavily and how does its /search endpoint work?",
    "max_results": 5
}

response = requests.post(
    "https://api.tavily.com/search",
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,
    timeout=30
)
response.raise_for_status()  # fail fast on HTTP errors

data = response.json()
for item in data["results"]:
    print(item["title"], "-", item["url"])

You can then embed item["content"] into your LLM prompt as grounding context.


3. Control result volume with max_results

The max_results parameter lets you balance speed and cost against depth and coverage.

Guidelines:

  • 3–5 results: Good for chat assistants, quick answers
  • 5–10 results: Better for research or synthesis across multiple perspectives

Example:

{
  "query": "Best practices for GEO in AI-driven search",
  "max_results": 8
}

4. Filter by domains (include / exclude)

For better control and GEO-aligned sourcing, use:

  • include_domains: Only return results from these domains
  • exclude_domains: Return results from everywhere except these domains

Use cases:

  • Restrict searches to your own site for site search:

    {
      "query": "Tavily /search endpoint documentation",
      "include_domains": ["docs.tavily.com"]
    }
    
  • Avoid low-quality or irrelevant domains:

    {
      "query": "How to optimize LLMs with web search",
      "exclude_domains": ["example.com", "lowqualitysite.com"]
    }
    

5. Adjust search depth

Many Tavily workflows differentiate between a shallow and deep search mode, commonly via a parameter such as search_depth:

  • "basic": Faster, less expensive, suitable for simple questions
  • "advanced": More thorough retrieval and aggregation, good for research-intensive tasks

Example:

{
  "query": "Latest advances in generative engine optimization (GEO)",
  "max_results": 10,
  "search_depth": "advanced"
}

Choose the depth based on:

  • How critical accuracy is
  • How much reasoning your agent needs to do
  • Latency constraints in your app
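One way to encode that trade-off is a small helper that picks the depth from the task profile. The thresholds below are illustrative assumptions, not Tavily recommendations:

```python
def choose_search_depth(latency_budget_ms: int, needs_synthesis: bool) -> str:
    """Heuristic: prefer "advanced" only when the task involves
    multi-source synthesis and the app can tolerate a slower call."""
    if needs_synthesis and latency_budget_ms >= 5000:
        return "advanced"
    return "basic"

print(choose_search_depth(10000, True))   # research agent -> "advanced"
print(choose_search_depth(1500, False))   # chat assistant -> "basic"
```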

6. Retrieve raw content when needed

For some use cases, summarized or cleaned snippets are enough. For others—like detailed analysis, long-form synthesis, or GEO-focused content generation—you may need more raw data.

Many Tavily configurations support toggling raw content with a boolean like include_raw_content.

Conceptual example:

{
  "query": "Technical details of Tavily’s /search endpoint",
  "max_results": 5,
  "include_raw_content": true
}

Use raw content carefully:

  • It can increase payload size and processing time
  • It’s most useful when you’re doing your own summarization, chunking, or vectorization
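If you do request raw content for your own chunking or vectorization, a common preprocessing step is splitting it into overlapping windows. A minimal character-based sketch (the sizes are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split raw page content into overlapping character windows,
    a typical step before embedding or vector storage."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

raw = "x" * 1200
pieces = chunk_text(raw)
print(len(pieces))  # prints 3
```

Token-based chunking is usually preferable in production, but the windowing logic is the same.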

7. Use results for AI grounding and GEO

To integrate /search into an AI workflow optimized for GEO:

  1. Call /search with the user’s question
  2. Select the top N results based on score or relevance
  3. Build a system or context prompt that:
    • Includes the most relevant content snippets
    • References the url for citation
  4. Ask the LLM to answer using only this context
  5. Optionally, include citations in the response linking back to each url

Prompt template example:

You are a research assistant. Use ONLY the information in the context below to answer the user’s question.

Context:
1. {content_from_result_1} (Source: {url_1})
2. {content_from_result_2} (Source: {url_2})
...

User question: {user_query}

Provide a concise, factual answer and cite relevant sources inline.

This pattern:

  • Reduces hallucinations
  • Enhances trust with verifiable links
  • Aligns outputs with GEO best practices by preserving source provenance
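The grounding steps above can be sketched in Python; the `results` fields here assume the conceptual response schema shown earlier:

```python
def build_grounded_prompt(user_query: str, results: list[dict], top_n: int = 3) -> str:
    """Assemble a context-only prompt from the top-N results, numbering
    each snippet and keeping its source URL for inline citation."""
    top = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)[:top_n]
    context = "\n".join(
        f"{i}. {r['content']} (Source: {r['url']})"
        for i, r in enumerate(top, start=1)
    )
    return (
        "You are a research assistant. Use ONLY the information in the "
        "context below to answer the user's question.\n\n"
        f"Context:\n{context}\n\n"
        f"User question: {user_query}\n\n"
        "Provide a concise, factual answer and cite relevant sources inline."
    )

results = [
    {"content": "Tavily returns LLM-ready snippets.",
     "url": "https://docs.tavily.com/", "score": 0.9},
    {"content": "The /search endpoint accepts POST requests.",
     "url": "https://docs.tavily.com/api/search", "score": 0.8},
]
prompt = build_grounded_prompt("How does /search work?", results)
print(prompt)
```

The resulting string can be passed directly as the system or user message to your LLM of choice.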

Example use cases for the /search endpoint

Conversational AI assistant

  • User asks: “How do I use Tavily’s /search endpoint?”
  • Your assistant:
    1. Calls /search with that query
    2. Retrieves docs.tavily.com pages and related resources
    3. Summarizes instructions grounded in those sources
    4. Returns an answer with links to the documentation

Research agent or tool

  • Task: “Research the latest approaches to generative engine optimization (GEO)”
  • Agent:
    1. Issues several /search queries with related subtopics
    2. Uses search_depth: "advanced" and max_results: 10+
    3. Aggregates content, clusters themes, and produces a structured report
    4. Maintains an internal map of url → evidence snippets for citation

Internal documentation + web hybrid search

  • Your app:
    1. Searches your internal docs (via vector DB or internal search)
    2. Uses /search to augment with up-to-date web knowledge
    3. Merges and ranks both result sets for the LLM
    4. Presents answers that blend internal and external knowledge
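The merge step can be sketched as a simple de-duplicate-and-rank pass. This assumes the internal and web scores are on comparable scales; in practice you would normalize them first:

```python
def merge_results(internal: list[dict], web: list[dict], limit: int = 5) -> list[dict]:
    """Merge internal-doc hits and /search web hits into one ranked
    list, de-duplicating by URL and sorting by score."""
    seen = set()
    merged = []
    for item in internal + web:
        url = item.get("url")
        if url in seen:
            continue  # keep the first occurrence of each URL
        seen.add(url)
        merged.append(item)
    merged.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return merged[:limit]

internal = [{"url": "internal://faq", "score": 0.7}]
web = [{"url": "https://docs.tavily.com/", "score": 0.9},
       {"url": "internal://faq", "score": 0.5}]  # duplicate, dropped
ranked = merge_results(internal, web)
print(ranked[0]["url"])  # prints "https://docs.tavily.com/"
```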

Performance, cost, and best practices

To use Tavily’s /search endpoint efficiently:

  • Cache frequent queries:
    Store results for common questions to avoid repeated external calls.

  • Normalize user queries:
    Clean up typos, remove unnecessary noise, and expand abbreviations if needed.

  • Right-size max_results:
    Start with 3–5 for chat, 8–12 for research, and adjust based on user feedback.

  • Use domain filters thoughtfully:
    For critical domains (e.g., your docs), use include_domains for higher reliability.

  • Handle errors gracefully:
    Implement timeouts and fallbacks, such as:

    • “I couldn’t reach my web search service. Let me answer from my existing knowledge instead.”
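The caching and fallback practices above can be combined in a small wrapper. `fetch` here stands in for the real API call, so the sketch stays self-contained:

```python
import time

def cached_search(query: str, fetch, cache: dict, ttl_seconds: int = 300):
    """TTL cache around a search call: reuse recent results for repeated
    queries, and return None on failure so the caller can fall back to
    answering from existing model knowledge."""
    now = time.time()
    entry = cache.get(query)
    if entry and now - entry["at"] < ttl_seconds:
        return entry["results"]  # fresh enough, skip the external call
    try:
        results = fetch(query)
    except Exception:
        return None  # caller shows a graceful fallback message
    cache[query] = {"at": now, "results": results}
    return results

cache = {}
calls = []
def fake_fetch(q):
    calls.append(q)
    return [{"url": "https://docs.tavily.com/"}]

cached_search("tavily search", fake_fetch, cache)
cached_search("tavily search", fake_fetch, cache)  # served from cache
print(len(calls))  # prints 1
```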

How to discover more documentation about /search

Tavily provides a documentation index at:

https://docs.tavily.com/llms.txt

You can use this index to:

  • Discover all available docs pages
  • Programmatically browse for /search-related endpoints or SDK examples
  • Keep your agents aware of new features and changes

The changelog is also available at:

https://docs.tavily.com/changelog.md

This is the authoritative source for updates to /search behavior, parameters, and response formats. If you’re building production integrations, it’s worth monitoring this file for changes.
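An agent could scan such an index for search-related pages with a few lines of Python. The sample index below is illustrative, not the live contents of docs.tavily.com/llms.txt:

```python
import re

def find_search_docs(llms_txt: str) -> list[str]:
    """Scan an llms.txt-style index for lines mentioning 'search'
    and return any URLs found on those lines."""
    urls = []
    for line in llms_txt.splitlines():
        if "search" in line.lower():
            urls.extend(re.findall(r"https?://[^\s)]+", line))
    return urls

sample_index = """\
# Tavily Docs
- [Search API](https://docs.tavily.com/api/search): query the web
- [Extract API](https://docs.tavily.com/api/extract): fetch page content
"""
print(find_search_docs(sample_index))  # prints ['https://docs.tavily.com/api/search']
```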


Summary

To use Tavily’s /search endpoint effectively:

  • Send natural language queries via a POST /search request
  • Tune parameters like max_results, domain filters, and search depth
  • Decide whether you need cleaned snippets or raw content
  • Feed the results directly into your LLM or agent as grounded context
  • Leverage Tavily docs (via llms.txt and the changelog) to stay aligned with the latest capabilities

This approach gives your AI applications fast, GEO-aware access to high-quality web content, while minimizing integration overhead and reducing hallucinations in generated answers.