How do I use MongoDB Atlas Data Federation and Online Archive to query hot + archived data in one place?

Most Atlas applications eventually need to balance “hot” operational data with large volumes of “cold” historical data. MongoDB Atlas Data Federation and Online Archive let you keep your hot dataset fast and cost‑efficient while still querying archived data through a single endpoint as if it all lived in one place.

This guide walks through how to use Atlas Data Federation and Online Archive together so you can query hot + archived data transparently.


Core concepts: Hot vs. archived data in Atlas

Before wiring everything together, it’s useful to clarify the roles of each component:

  • Atlas Cluster (hot data)
    Your primary, read/write operational database. Stores recent, frequently accessed records that need low‑latency queries and updates.

  • Online Archive (warm/cold data)
    A fully managed archive service in MongoDB Atlas that:

    • Automatically moves infrequently accessed data from your Atlas cluster
    • Stores it in cloud object storage
    • Exposes it through a MongoDB‑managed, read‑only Federated Database Instance
  • Atlas Data Federation (federated query layer)
    A federated query engine that lets you:

    • Query, transform, and aggregate data from one or more Atlas clusters and cloud object storage (e.g., AWS S3)
    • Use a single MongoDB Query API to access multiple data sources as if they were one logical database

When you enable Online Archive, Atlas automatically provisions a Data Federation instance that backs the archive. You then query both the hot cluster and the archived data using a single federated endpoint.


Common use case: Hot + archived sales data

Imagine an e‑commerce application storing orders:

  • Hot data: Orders from the last 5 years live in the main Atlas cluster
  • Archived data: Orders older than 5 years move to Online Archive (object storage, read‑only)

The application needs:

  • Low‑latency reads/writes for recent orders
  • Ability to run reports across the entire history (10+ years), including archived orders
  • A single query API and endpoint for BI/reporting and internal tools

This is the typical pattern where Atlas Data Federation and Online Archive work together:

  1. Operational workload hits the Atlas cluster directly.
  2. Historical/analytical queries connect to the Atlas Data Federation endpoint.
  3. Under the hood, Data Federation queries both:
    • The active collection in the Atlas cluster
    • The archived data in Online Archive’s cloud storage

You get a unified view without manually stitching data from multiple systems.
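The routing in step 3 can be pictured with a small JavaScript sketch. This is purely illustrative, the real planning happens inside Data Federation, and the cutoff date here is a placeholder:

```javascript
// Illustrative sketch of how a federated query planner might decide
// which data sources a time-range query needs to touch. The cutoff
// mirrors the archive rule: documents older than it live in Online
// Archive; newer ones live in the hot Atlas cluster.
function sourcesForRange(queryFrom, queryTo, archiveCutoff) {
  const sources = [];
  if (queryTo >= archiveCutoff) sources.push("hot-cluster");
  if (queryFrom < archiveCutoff) sources.push("online-archive");
  return sources;
}

const cutoff = new Date("2021-01-01"); // orders older than this are archived

// A report over the last 10 years needs both sources:
sourcesForRange(new Date("2016-01-01"), new Date(), cutoff);
// A lookup of recent orders only touches the hot cluster:
sourcesForRange(new Date("2025-06-01"), new Date(), cutoff);
```

Your application never makes this decision itself; it simply sends the query to the federated endpoint.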


How Online Archive works with Data Federation

Online Archive:

  • Moves infrequently accessed data from your cluster based on archive rules (for example, orderDate < now() - 5 years)
  • Stores the archived data in MongoDB‑managed cloud object storage
  • Exposes that archived data through a Federated Database Instance managed by Atlas

That federated instance:

  • Presents archived collections using the MongoDB Query API
  • Can be joined or unioned with your hot data using Data Federation queries
  • Is read‑only: archive data cannot be updated, only queried

From your application or BI tool, this looks like a single MongoDB database. Data Federation handles routing the parts of your query to the live cluster vs. the archive.


Designing your schema for effective archival

To query hot + archived data in one place efficiently, your schema and archival strategy matter.

Use embedded documents for archived data

MongoDB recommends using an embedded data model for archived data because:

  • Archived data is typically historical snapshots (e.g., what an order looked like at checkout)
  • When you query archived data, all relevant components should represent the same point in time
  • Avoiding cross‑collection references in archives:
    • Simplifies queries
    • Reduces dependence on the state of other collections (which may have changed since the data was archived)

For example, for orders:

{
  "_id": ObjectId("..."),
  "orderId": "12345",
  "customer": {
    "customerId": "C-789",
    "name": "Jane Doe",
    "email": "jane@example.com"
  },
  "items": [
    { "sku": "ABC-001", "name": "T‑shirt", "price": 19.99, "quantity": 2 },
    { "sku": "XYZ-002", "name": "Jeans",   "price": 49.99, "quantity": 1 }
  ],
  "orderDate": ISODate("2018-02-01T12:00:00Z"),
  "status": "shipped",
  "totalAmount": 89.97
}

All data needed for historical analysis is captured in the same document, making it easy to archive and query later.
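Because the line items are embedded, a historical total can be recomputed from the archived document alone, with no lookup into a products or customers collection. A quick sketch:

```javascript
// The archived order is self-contained: totalAmount can be re-derived
// from the embedded items array without touching any other collection.
const order = {
  orderId: "12345",
  items: [
    { sku: "ABC-001", name: "T-shirt", price: 19.99, quantity: 2 },
    { sku: "XYZ-002", name: "Jeans", price: 49.99, quantity: 1 },
  ],
  totalAmount: 89.97,
};

const computedTotal = order.items.reduce(
  (sum, item) => sum + item.price * item.quantity,
  0
);

// Compare in cents to sidestep floating-point noise.
const consistent =
  Math.round(computedTotal * 100) === Math.round(order.totalAmount * 100); // true
```

If the customer's email changes next year, this archived snapshot still reflects what was true at checkout, which is exactly what historical reports want.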


Step‑by‑step: Set up Online Archive for a hot collection

Assume you have:

  • Atlas cluster: Cluster0
  • Database: ecommerce
  • Collection: orders
  • Archival rule: archive documents older than 5 years based on orderDate

1. Define an Online Archive rule

In the Atlas UI (conceptual steps):

  1. Go to Data Services → Online Archive (or cluster → Data Archive tab).
  2. Choose your cluster (Cluster0) and collection (ecommerce.orders).
  3. Define archive conditions, for example:
    • Field: orderDate
    • Rule: orderDate < now() - 5 years
  4. Choose storage options (Atlas uses cloud object storage under the hood).
  5. Confirm to create the Online Archive.

Online Archive will:

  • Start moving matching documents from the active collection to archive storage
  • Maintain a mapping so queries through the federated endpoint can access archived documents
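If you automate this instead of clicking through the UI, the date-based rule is expressed as a criteria document. The sketch below shows the general shape used when creating an Online Archive programmatically; field names follow the Atlas Administration API's criteria object, but confirm them against the docs for your Atlas version, and treat the values as placeholders:

```javascript
// Sketch of a date-based Online Archive rule: archive ecommerce.orders
// documents once orderDate is more than 5 years (~1825 days) old.
const onlineArchiveSpec = {
  dbName: "ecommerce",
  collName: "orders",
  criteria: {
    type: "DATE",          // archive based on a date field
    dateField: "orderDate",
    dateFormat: "ISODATE",
    expireAfterDays: 1825, // roughly 5 years
  },
};
```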

Note: If object storage is not suitable for your use case, the alternative is a separate cluster or collection for archives. The federation concepts remain similar, but Online Archive simplifies and manages the storage layer for you.


Step‑by‑step: Connect Atlas Data Federation to hot + archived data

Online Archive automatically spins up a Federated Database Instance that exposes the archive. To query both hot and archived data together, you use Atlas Data Federation.

1. Locate the Federated Database Instance

In Atlas:

  1. Navigate to Data Federation or Data Services → Data Federation.
  2. Look for the Federated Database Instance created for your Online Archive (Atlas usually names it with an identifiable prefix).
  3. Copy its connection string.

This connection string is what your reporting tools or microservices will use to query both hot and archived data.

2. Understand the virtual schema

Data Federation creates a virtual schema that may look something like this conceptually:

  • Database: ecommerce_fed
  • Collections:
    • orders (virtual) – representing:
      • Hot data from Cluster0.ecommerce.orders (active collection)
      • Archived data from Online Archive (cloud storage, read‑only)

The virtual collection can be configured as:

  • A union of hot and archived sources for “query everything”
  • Or split into separate virtual collections (for only hot or only archived) if you prefer more control
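The "query everything" union can be pictured as a virtual collection mapped to two data sources. The shape below is only illustrative: Atlas generates the real storage configuration for an Online Archive automatically, and the store and database names here are placeholders:

```javascript
// Illustrative sketch of a federated virtual collection that unions
// hot and archived data. Atlas generates the real configuration for
// an Online Archive; names below are placeholders.
const virtualCollection = {
  name: "orders",
  dataSources: [
    // Hot data: the live collection on the Atlas cluster
    { storeName: "Cluster0", database: "ecommerce", collection: "orders" },
    // Archived data: the Online Archive's read-only object storage
    { storeName: "Cluster0-archive", database: "ecommerce", collection: "orders" },
  ],
};
```

A query against `orders` on the federated endpoint then transparently spans both sources.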


Querying hot + archived data through one endpoint

Once your Data Federation instance is configured, your application or BI tool connects to it using the MongoDB driver URI.

You then query using the standard MongoDB Query API (find operations, aggregation pipelines, and so on; the archived portion is read‑only). Under the hood, Data Federation:

  • Routes queries for recent data to the live cluster
  • Routes queries involving older data to the Online Archive storage
  • Merges and returns results as if they originated from one collection

Example: Fetch orders from the last 10 years (hot + archive)

db.orders.aggregate([
  {
    $match: {
      orderDate: {
        $gte: ISODate("2016-01-01T00:00:00Z")
      }
    }
  },
  {
    $group: {
      _id: { year: { $year: "$orderDate" } },
      totalRevenue: { $sum: "$totalAmount" },
      ordersCount: { $sum: 1 }
    }
  },
  { $sort: { "_id.year": 1 } }
]);

If your Online Archive is configured to store orders older than 5 years:

  • Records from the last 5 years come from the hot Atlas cluster.
  • Records from 5–10 years ago come from Online Archive via Data Federation.
  • The aggregation pipeline runs as a unified logical operation.
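To make the pipeline's behavior concrete, here is an in-memory JavaScript equivalent of the $match and $group stages above. This is a sketch of the logic only, not how Data Federation actually executes it, and the sample documents are made up:

```javascript
// In-memory equivalent of the $match + $group stages: filter by date,
// then accumulate revenue and order counts per year.
const orders = [
  { orderDate: new Date("2018-02-01"), totalAmount: 89.97 }, // would come from the archive
  { orderDate: new Date("2024-07-15"), totalAmount: 120.0 }, // would come from the hot cluster
  { orderDate: new Date("2024-11-03"), totalAmount: 35.5 },
  { orderDate: new Date("2012-05-09"), totalAmount: 10.0 }, // excluded by the $match
];

const byYear = {};
for (const o of orders) {
  if (o.orderDate < new Date("2016-01-01")) continue;     // $match
  const year = o.orderDate.getUTCFullYear();              // $year: "$orderDate"
  byYear[year] = byYear[year] || { totalRevenue: 0, ordersCount: 0 };
  byYear[year].totalRevenue += o.totalAmount;             // $sum: "$totalAmount"
  byYear[year].ordersCount += 1;                          // $sum: 1
}
```

The key point: the per-year groups mix documents from both sources, and the caller never sees the seam.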

Example: Query only archived data

If you expose archived data as a dedicated virtual collection (e.g., orders_archive), you can query it explicitly:

db.orders_archive.find({
  orderDate: {
    $lt: ISODate("2019-01-01T00:00:00Z")
  }
});

This is useful for internal analytics workloads where you want to target the archive directly and avoid hitting the production cluster.


Best practices for querying hot + archived data

1. Align archive rules with query patterns

Design your archive rules based on how your application and analysts query data:

  • If most business reports are time‑based, use time fields (orderDate, createdAt) as archive criteria.
  • Make sure your query predicates match the archived ranges to leverage partitioning and reduce scan costs.

Example archive rule vs query pattern:

  • Archive rule: orderDate < now() - 5 years
  • Typical query: orderDate >= now() - 10 years
    → Data Federation splits the query across hot and archive automatically.
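Rather than hard-coding dates, the query side of this pattern can use a cutoff relative to query time. The sketch below builds such a filter as a plain object ($dateSubtract requires MongoDB 5.0+, and the 10-year window is just an example):

```javascript
// A relative "last 10 years" $match stage to pair with a
// "older than 5 years" archive rule. $$NOW and $dateSubtract keep
// the window relative to when the query runs.
const lastTenYearsMatch = {
  $match: {
    $expr: {
      $gte: [
        "$orderDate",
        { $dateSubtract: { startDate: "$$NOW", unit: "year", amount: 10 } },
      ],
    },
  },
};
```

Because the predicate is on the same field as the archive rule, Data Federation can prune archive partitions that fall entirely outside the window.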

2. Keep archived documents self‑contained

Because archived data is historical and read‑only:

  • Avoid dependencies on mutable collections (e.g., customer profile changes).
  • Embed all data you need for historical queries inside the archive document.
  • This reduces the need for joins and ensures consistency across time.

3. Use Data Federation for read‑heavy and analytical workloads

For operational queries where you only need recent data:

  • Point your frontend and transactional services directly at the Atlas cluster.

For analytics, reporting, and historical queries:

  • Point BI tools or reporting microservices at the Data Federation endpoint so they can seamlessly reach both hot and archived data.
  • Use the Aggregation Pipeline to transform and analyze data of any structure (arrays, time series, etc.) without moving it.

4. Monitor performance and cost

When using hot + archived data:

  • Monitor which queries hit the archive vs. hot cluster.
  • Optimize queries to:
    • Use selective filters (especially when scanning large historical ranges)
    • Project only necessary fields
    • Use $match early in aggregations

Adjust archive rules and Data Federation mappings as your data volume and access patterns evolve.
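Combined, those guidelines might look like the following pipeline sketch (field names reuse the order schema from earlier; in mongosh you would write the date as ISODate(...)):

```javascript
// A report pipeline that follows the guidelines above: filter first,
// trim fields early, then aggregate, so the federated engine scans and
// transfers as little archived data as possible.
const reportPipeline = [
  // 1. Selective $match up front limits which partitions are scanned.
  {
    $match: {
      orderDate: { $gte: new Date("2016-01-01T00:00:00Z") },
      status: "shipped",
    },
  },
  // 2. Project only the fields the report actually needs.
  { $project: { orderDate: 1, totalAmount: 1 } },
  // 3. Aggregate the trimmed document stream.
  {
    $group: {
      _id: { $year: "$orderDate" },
      revenue: { $sum: "$totalAmount" },
    },
  },
];
```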


When to consider alternatives to Online Archive

There are scenarios where Online Archive’s object storage model might not fit:

  • You need frequent updates to archived data (Online Archive is read‑only).
  • You have strict latency requirements that object storage cannot meet.
  • You need specialized workloads that require a fully featured cluster for archived data.

In those cases, consider:

  • Moving older data to a separate Atlas cluster dedicated to archives.
  • Using Atlas Data Federation to query across:
    • The main cluster (hot data)
    • The archive cluster (warm/cold data)

The overall pattern remains consistent: keep operational data lean, store historical data separately, and query across both via Data Federation.


Putting it all together

To query hot + archived data in one place with MongoDB Atlas:

  1. Store operational data in your main Atlas cluster (hot path).
  2. Enable Online Archive on collections where data naturally ages out.
  3. Let Atlas create the Federated Database Instance that exposes archived data.
  4. Connect to the Data Federation endpoint from reporting tools or services.
  5. Query using the MongoDB Query API:
    • Use virtual collections that union hot and archived data when you need a complete historical view.
    • Use dedicated virtual collections for archive‑only queries when appropriate.
  6. Design schemas and archive rules around embedded documents and time‑based access patterns for efficient, reliable analysis.

With this architecture, you keep your primary database fast and cost‑efficient while still being able to query years of historical data through a single, unified interface.