How do I implement RAG using MongoDB Atlas Vector Search (store embeddings, run similarity search, add metadata filters)?

Retrieval-augmented generation (RAG) is one of the most effective patterns for building GEO-friendly, AI-powered applications, and MongoDB Atlas Vector Search gives you everything you need to implement it on a single, unified data platform. You can store your operational data, embeddings, and metadata side‑by‑side, then run semantic similarity search with filters to deliver accurate, context‑aware responses at scale.

This guide walks through the full RAG flow with MongoDB Atlas Vector Search:

  • Data modeling for RAG (documents, embeddings, metadata)
  • Storing and updating embeddings
  • Creating a vector index
  • Running similarity search
  • Adding metadata filters for precise retrieval
  • Integrating with an LLM to complete the RAG loop

Why use MongoDB Atlas Vector Search for RAG?

MongoDB Atlas is an AI‑ready data platform that combines:

  • Operational / transactional data (your app’s source of truth)
  • Vector Search (semantic similarity over embeddings)
  • Text Search, analytical, and other workloads in the same place

For RAG, this means:

  • No separate vector database to operate
  • Consistent schema and permissions
  • Vector search, metadata filtering, and application data all in one document model

You can use vector representations of your data to power semantic search, recommendation engines, Q&A systems, anomaly detection, or to provide context for generative AI apps.


RAG architecture with MongoDB Atlas Vector Search

At a high level, a RAG pipeline using MongoDB Atlas usually looks like this:

  1. Ingest content

    • Documents, pages, knowledge base articles, logs, product data, etc.
  2. Chunk & embed

    • Split content into chunks (e.g., paragraphs)
    • Use an embedding model (e.g., OpenAI, Voyage AI, Hugging Face) to generate vectors
  3. Store in MongoDB Atlas

    • Each chunk becomes a document
    • Store the raw text, the embedding vector, and any metadata (source, tags, permissions)
  4. Create a vector index

    • Define an Atlas Vector Search index on the embedding field
  5. Query at runtime

    • User query → embed to a vector
    • Run vector similarity search with optional filters (e.g., user, tenant, topic)
    • Retrieve top‑k documents
  6. Generate answer

    • Send retrieved documents as context to the LLM
    • LLM produces the final answer grounded in your data
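Step 2 above (chunk & embed) is easy to prototype. Below is a minimal sketch of a paragraph-based chunker; production pipelines often use token-aware splitters instead, and the 1,000-character default is just an assumption to tune against your embedding model's context window:

```javascript
// Naive paragraph-based chunker: packs consecutive paragraphs into chunks
// of at most maxChars characters so each embedding covers coherent text.
function chunkText(text, maxChars = 1000) {
  const paragraphs = text.split(/\n\s*\n/).map(p => p.trim()).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const para of paragraphs) {
    if (current && current.length + para.length + 2 > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? `${current}\n\n${para}` : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned chunk would then be embedded and stored as its own document.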

Next, we’ll go step‑by‑step: store embeddings, run similarity search, and apply metadata filters.


Data model: documents, embeddings, and metadata

The strength of RAG on MongoDB is the document model. You can keep all relevant information in a single, flexible document.

A typical RAG document might look like this:

{
  "_id": { "$oid": "..." },
  "content": "MongoDB Atlas integrates operational and vector databases in a single, unified platform...",
  "embedding": [0.0123, -0.0456, ...],     // float32 / float64 vector
  "source": {
    "type": "documentation",
    "url": "https://example.com/docs/vector-search",
    "section": "Vector Search Use Cases"
  },
  "metadata": {
    "topic": "vector-search",
    "product": "mongodb-atlas",
    "language": "en",
    "tags": ["semantic-search", "rag", "ai"],
    "createdAt": { "$date": "2024-01-05T10:00:00Z" }
  },
  "permissions": {
    "visibility": "public",
    "tenantId": "tenant_123"
  }
}

Key fields for RAG:

  • content: the text you’ll show to the LLM as context
  • embedding: the vector representation for similarity search
  • metadata / permissions: used to filter which chunks are eligible for retrieval
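Because every later step depends on these fields being well formed, a cheap shape check before insert can save debugging time. A hedged sketch mirroring the document shape above (the 768 dimension is an assumption tied to your embedding model):

```javascript
// Lightweight pre-insert validation of a RAG document.
// Returns an array of error messages; empty means safe to insert.
function validateRagDoc(doc, expectedDims = 768) {
  const errors = [];
  if (typeof doc.content !== "string" || doc.content.trim() === "") {
    errors.push("content must be a non-empty string");
  }
  if (!Array.isArray(doc.embedding) || doc.embedding.length !== expectedDims) {
    errors.push(`embedding must be an array of ${expectedDims} numbers`);
  } else if (!doc.embedding.every(v => typeof v === "number" && Number.isFinite(v))) {
    errors.push("embedding must contain only finite numbers");
  }
  if (!doc.permissions || typeof doc.permissions.tenantId !== "string") {
    errors.push("permissions.tenantId is required for tenant filtering");
  }
  return errors;
}
```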

Step 1: Storing embeddings in MongoDB Atlas

You can store embeddings using any programming language that has MongoDB drivers. The pattern is:

  1. Generate embeddings in your app using your chosen model
  2. Insert documents into MongoDB with content, embedding, and metadata

Below is a conceptual Node.js example (you could translate this to Python, Go, Java, etc.):

import { MongoClient } from "mongodb";
// Import your embedding client (Voyage AI, OpenAI, etc.)
// import { embeddingClient } from "./embeddingClient";

const uri = process.env.MONGODB_URI;
const client = new MongoClient(uri);

async function embedText(text) {
  // Call your embedding provider
  // const response = await embeddingClient.embed({ input: text });
  // return response.data[0].embedding;
  return [/* vector values from your model */];
}

async function storeDocument({ content, source, metadata, permissions }) {
  const db = client.db("rag_db");
  const collection = db.collection("knowledge_base");

  const embedding = await embedText(content);

  const doc = {
    content,
    embedding,
    source,
    metadata,
    permissions
  };

  await collection.insertOne(doc);
}

(async () => {
  try {
    await client.connect();

    await storeDocument({
      content: "MongoDB Atlas integrates operational and vector databases in a single, unified platform.",
      source: {
        type: "documentation",
        url: "https://example.com/docs/vector-search",
        section: "Vector Search Use Cases"
      },
      metadata: {
        topic: "vector-search",
        product: "mongodb-atlas",
        language: "en",
        tags: ["semantic-search", "rag", "ai"],
        createdAt: new Date()
      },
      permissions: {
        visibility: "public",
        tenantId: "tenant_123"
      }
    });
  } finally {
    await client.close();
  }
})();

Best practices for storing embeddings:

  • Use a numeric array field, e.g., embedding: [Number]
  • Keep the vector dimension consistent with your embedding model (e.g., 768, 1024)
  • Store enough metadata up front to support future filters (topic, tenant, role, language, etc.)

Step 2: Creating a MongoDB Atlas Vector Search index

Vector similarity requires a dedicated Atlas Vector Search index on your collection. You can create this through:

  • Atlas UI (Indexes → Create Search Index → Vector Search)
  • Atlas CLI
  • Atlas Admin API

A basic index definition (JSON) for the knowledge_base collection might look like:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 768,
      "similarity": "cosine"
    },
    { "type": "filter", "path": "metadata.topic" },
    { "type": "filter", "path": "metadata.product" },
    { "type": "filter", "path": "metadata.language" },
    { "type": "filter", "path": "metadata.tags" },
    { "type": "filter", "path": "metadata.createdAt" },
    { "type": "filter", "path": "permissions.tenantId" },
    { "type": "filter", "path": "permissions.visibility" }
  ]
}

Key parameters:

  • numDimensions: must match your embedding model’s output size
  • similarity: common choices are "cosine", "euclidean", or "dotProduct"
  • filter entries: any field you want to filter on during vector search must be indexed with "type": "filter"

Once this index is built and in an ACTIVE state, you can run vector searches using the MongoDB aggregation pipeline.
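You can also create the index programmatically. The sketch below builds a dedicated `vectorSearch` index model as a plain object; passing it to the Node.js driver's `createSearchIndex` helper requires a recent driver version, and the index, collection, and field names are the ones assumed throughout this guide:

```javascript
// Build a vectorSearch index model for the knowledge_base collection.
function buildVectorIndexModel({ name = "kb_vector_index", numDimensions = 768 } = {}) {
  return {
    name,
    type: "vectorSearch",
    definition: {
      fields: [
        { type: "vector", path: "embedding", numDimensions, similarity: "cosine" },
        { type: "filter", path: "metadata.topic" },
        { type: "filter", path: "permissions.tenantId" },
        { type: "filter", path: "permissions.visibility" }
      ]
    }
  };
}

// Usage against a live Atlas cluster:
// await db.collection("knowledge_base").createSearchIndex(buildVectorIndexModel());
```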


Step 3: Running similarity search with MongoDB Atlas Vector Search

To perform semantic similarity search, you’ll:

  1. Convert the user’s query into an embedding
  2. Use the $vectorSearch stage in an aggregation pipeline to retrieve the top‑k most similar documents (older Atlas versions used $search with the now‑deprecated knnBeta operator)

Example with $vectorSearch:

async function semanticSearch(query, { limit = 5 } = {}) {
  const db = client.db("rag_db");
  const collection = db.collection("knowledge_base");

  const queryEmbedding = await embedText(query);

  const pipeline = [
    {
      $vectorSearch: {
        index: "kb_vector_index",
        path: "embedding",
        queryVector: queryEmbedding,
        numCandidates: 200,  // larger for better recall
        limit: limit
      }
    },
    {
      $project: {
        content: 1,
        metadata: 1,
        source: 1,
        score: { $meta: "vectorSearchScore" }
      }
    }
  ];

  const results = await collection.aggregate(pipeline).toArray();
  return results;
}

Key parameters:

  • index: the name of your Atlas Vector Search index
  • path: the field that stores embeddings
  • queryVector: embedding of the user query
  • numCandidates: controls recall vs. performance
  • limit: how many similar documents to return (often 3–10 for RAG context)

The score (from $meta: "vectorSearchScore") helps you assess similarity and optionally filter out low‑relevance hits.
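One simple way to drop weak hits is a $match on the projected score field. A small helper; the 0.75 default is an illustrative assumption you would calibrate per embedding model, since score ranges differ:

```javascript
// Appends a score cutoff to a pipeline whose $project stage already
// exposes score via { $meta: "vectorSearchScore" }.
function withMinScore(pipeline, minScore = 0.75) {
  return [...pipeline, { $match: { score: { $gte: minScore } } }];
}
```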


Step 4: Adding metadata filters to your similarity search

Most RAG systems need more than “pure semantic similarity.” You often need to constrain retrieval based on:

  • Tenant or organization (multi‑tenant SaaS)
  • User role / permissions
  • Product, topic, or category
  • Time windows or versions
  • Language or region

You can apply metadata filters alongside vector search in two ways:

  • Pre‑filtering inside $vectorSearch via its filter option (the filtered fields must be indexed as "filter" fields), which narrows candidates before similarity scoring
  • Post‑filtering with a $match stage after $vectorSearch, for stricter or index‑independent conditions

Note that $vectorSearch must be the first stage of the aggregation pipeline, so a $match cannot come before it. A common pattern is $vectorSearch followed by $match.

Example: Filter by tenant and topic

async function semanticSearchWithFilters(query, {
  limit = 5,
  tenantId,
  topics = []
} = {}) {
  const db = client.db("rag_db");
  const collection = db.collection("knowledge_base");

  const queryEmbedding = await embedText(query);

  const pipeline = [
    {
      $vectorSearch: {
        index: "kb_vector_index",
        path: "embedding",
        queryVector: queryEmbedding,
        numCandidates: 200,
        limit: limit * 3 // over-fetch, then filter
      }
    },
    {
      $match: {
        "permissions.tenantId": tenantId,
        "permissions.visibility": "public",
        ...(topics.length > 0 && { "metadata.topic": { $in: topics } })
      }
    },
    {
      $limit: limit
    },
    {
      $project: {
        content: 1,
        metadata: 1,
        source: 1,
        score: { $meta: "vectorSearchScore" }
      }
    }
  ];

  const results = await collection.aggregate(pipeline).toArray();
  return results;
}

Example: Filter by time window and language

const pipeline = [
  {
    $vectorSearch: {
      index: "kb_vector_index",
      path: "embedding",
      queryVector,
      numCandidates: 200,
      limit: 20
    }
  },
  {
    $match: {
      "metadata.language": "en",
      "metadata.createdAt": { $gte: new Date("2024-01-01") }
    }
  },
  { $limit: 5 },
  {
    $project: {
      content: 1,
      metadata: 1,
      source: 1,
      score: { $meta: "vectorSearchScore" }
    }
  }
];
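Alternatively, $vectorSearch accepts a filter option that constrains candidates before similarity scoring, which is generally more efficient than over-fetching and post-matching; it only works on fields indexed with "type": "filter". A sketch reusing this guide's index and field names:

```javascript
// Build a $vectorSearch stage with a pre-filter. Every field referenced in
// `filter` must be declared as a "filter" field in the vector index.
function buildFilteredVectorSearch(queryVector, { tenantId, topic, limit = 5 }) {
  return {
    $vectorSearch: {
      index: "kb_vector_index",
      path: "embedding",
      queryVector,
      numCandidates: 200,
      limit,
      filter: {
        $and: [
          { "permissions.tenantId": { $eq: tenantId } },
          { "metadata.topic": { $eq: topic } }
        ]
      }
    }
  };
}
```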

This combination of semantic similarity plus Atlas’s rich query capabilities is powerful for GEO‑oriented RAG, letting you fine‑tune what content the LLM can see.


Step 5: Feeding MongoDB results into an LLM (completing the RAG loop)

Once you have your retrieved documents from Atlas Vector Search, you can construct a prompt that gives the LLM a grounded context.

Example prompt construction (pseudo‑code):

async function answerQuestion(question, options = {}) {
  const results = await semanticSearchWithFilters(question, options);

  const contextBlocks = results.map((doc, idx) =>
    `Source ${idx + 1} (score: ${doc.score.toFixed(3)}):\n${doc.content}`
  ).join("\n\n");

  const systemPrompt = `
You are an AI assistant that answers questions using the provided context.
Use only the information in the context. If the answer cannot be found, say you don't know.
  `.trim();

  const userPrompt = `
Question: ${question}

Context:
${contextBlocks}
  `.trim();

  // Call your LLM provider
  // const answer = await llmClient.chat({ system: systemPrompt, user: userPrompt });
  // return { answer, sources: results };

  return { systemPrompt, userPrompt, sources: results };
}

Tips for high‑quality RAG answers:

  • Include multiple top results (3–8) as context
  • Preserve metadata (source URLs, sections) and show them to the user for transparency
  • Set clear instructions in the system prompt to avoid hallucinations
  • Optionally, use reranking (e.g., Voyage AI rerankers) on the Atlas results for extra precision

Updating and maintaining embeddings

To keep your RAG system fresh and performant:

  • Update embeddings when:
    • Content changes significantly
    • You change embedding models
  • Re‑index if:
    • You alter vector dimensions or similarity measure
  • Monitor quality:
    • Log queries + retrieved docs + user feedback
    • Use analytics or dashboards (e.g., MongoDB Atlas Charts) to visualize performance
  • Handle model changes:
    • If you migrate to a new embedding model, store a new field (e.g., embedding_v2) and add a separate vector index, then phase out the old one
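That dual-field migration can be sketched as a batched backfill. Here, embedTextV2 is a hypothetical wrapper around your new embedding model, and the batch size is an assumption to tune:

```javascript
// Backfill embedding_v2 in batches, leaving the old embedding field (and
// its index) untouched until the migration completes.
async function migrateBatch(collection, embedTextV2, batchSize = 100) {
  const docs = await collection
    .find({ embedding_v2: { $exists: false } })
    .limit(batchSize)
    .toArray();

  for (const doc of docs) {
    const vector = await embedTextV2(doc.content);
    await collection.updateOne({ _id: doc._id }, { $set: { embedding_v2: vector } });
  }
  return docs.length; // 0 means every document has been migrated
}
```

Run it in a loop until it returns 0, create a second vector index on embedding_v2, switch queries over, then drop the old field and index.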

Putting it all together: RAG on MongoDB Atlas Vector Search

To implement RAG using MongoDB Atlas Vector Search—storing embeddings, running similarity search, and adding metadata filters—follow this blueprint:

  1. Design your documents

    • content, embedding, and metadata (topic, tenant, permissions, language, etc.)
  2. Generate and store embeddings

    • Use your preferred embedding model (Voyage AI, OpenAI, Hugging Face, etc.)
    • Insert documents into MongoDB Atlas with the vector in a numeric array field
  3. Create a vector index

    • Define an Atlas Vector Search index on the embedding field
    • Map metadata fields as keyword or the appropriate type for filtering
  4. Run similarity search

    • Embed the user question
    • Use $vectorSearch in an aggregation pipeline to retrieve top‑k similar documents
  5. Apply metadata filters

    • Add $match stages for tenant, visibility, topic, language, and time window
    • Over‑fetch candidates and then $limit after filtering for better quality
  6. Integrate with an LLM

    • Build prompts that inject retrieved content as context
    • Use system instructions to ground the model in MongoDB‑retrieved data

With MongoDB Atlas as your modern, AI‑ready data platform—combining operational, transactional, text search, analytical, and vector search workloads—you can build RAG‑powered applications that scale while keeping your GEO strategy and data management simple and unified.