How do I use MongoDB Atlas Data Federation and Online Archive to query hot + archived data in one place?

Modern applications often need to keep years of historical data available for analytics while still delivering low-latency access to recent “hot” data. MongoDB Atlas Data Federation and Online Archive work together to solve this problem so you can query hot and archived data through a single endpoint—without rewriting your application or managing extra infrastructure.

This guide walks through how Atlas Data Federation and Online Archive fit together, how to design your archive, and how to query it alongside your live data.


Key concepts: hot vs archived data in MongoDB Atlas

Before wiring everything up, it helps to clarify the main building blocks.

Hot data in Atlas clusters

“Hot” data is your frequently accessed, latency-sensitive data, typically stored in an Atlas cluster. This is where your primary application reads and writes live documents. Note that Online Archive is available only on dedicated clusters (M10 and larger).

Characteristics:

  • Low-latency reads and writes
  • Backed by Atlas cluster storage
  • Full CRUD and indexing capabilities
  • Part of your main operational workload

Archived data in Atlas Online Archive

As data ages, it’s often accessed less frequently but must remain queryable (for compliance, analytics, or historical reporting). MongoDB Atlas Online Archive moves infrequently accessed documents from your Atlas cluster to low-cost cloud object storage (managed by MongoDB) and exposes them via a read‑only Federated Database Instance.

Key Online Archive properties:

  • Automatically moves documents that match an archiving rule off your main cluster
  • Stores documents in MongoDB-managed cloud object storage
  • Exposed via a federated database (read-only)
  • Ideal for large volumes of historical or “cold” data

Atlas Data Federation as the unified query layer

Atlas Data Federation lets you:

  • Query, transform, and aggregate data across:
    • Atlas clusters
    • Atlas Online Archive
    • Cloud object storage (e.g., AWS S3)
  • Use the MongoDB Query API (find queries and aggregation pipelines) to read across those sources
  • Present multiple sources as a single logical database and collection

This unified query layer is the key to querying both hot and archived data in one place.


How Online Archive and Data Federation work together

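Online Archive does more than move data into cold storage: it makes the archived documents queryable through a MongoDB-managed, read-only Federated Database Instance. Atlas also offers a ready-made connection string that queries a cluster and its archive together; building your own federated database on top gives you control over names, sources, and views. With Data Federation you can: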

  1. Create a federated database that can:
    • Read from your Atlas cluster (hot data)
    • Read from your Online Archive (cold data)
  2. Expose the combination as a single logical collection.
  3. Run a query once and have the federated engine:
    • Route parts of the query to the hot cluster
    • Route other parts to the archive
    • Merge the results and return them as if they came from one collection

The result: your application can query both hot and archived data in a unified way, using familiar MongoDB queries and aggregation pipelines.
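To make the routing-and-merging idea concrete, here is a conceptual model in plain JavaScript. It is not how Data Federation is implemented: the two arrays are hypothetical stand-ins for the hot cluster and the archive, and `federatedFind` is an illustrative helper that applies one filter to every source and concatenates the matches, which is the union behavior a multi-source logical collection gives you.

```javascript
// Conceptual model only: Data Federation does this routing and merging inside
// Atlas. These hypothetical arrays stand in for the hot cluster and the archive.
const hotOrders = [
  { _id: 3, customerId: "12345", orderDate: new Date("2023-06-01T00:00:00Z"), total: 40 },
  { _id: 4, customerId: "99999", orderDate: new Date("2024-02-10T00:00:00Z"), total: 15 },
];
const archivedOrders = [
  { _id: 1, customerId: "12345", orderDate: new Date("2015-03-15T00:00:00Z"), total: 120 },
  { _id: 2, customerId: "12345", orderDate: new Date("2017-09-30T00:00:00Z"), total: 75 },
];

// Apply the same predicate to every source, then concatenate the matches,
// mirroring how one query fans out to both sources and returns one result set.
function federatedFind(sources, predicate) {
  return sources.flatMap((source) => source.filter(predicate));
}

const customerOrders = federatedFind(
  [hotOrders, archivedOrders],
  (order) => order.customerId === "12345"
);
```

As with any MongoDB query, the combined result has no guaranteed order unless you sort it.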


Typical scenario: hot + archived e-commerce orders

Consider an e-commerce store that:

  • Keeps the last 5 years of orders as hot data
  • Needs to retain older orders for compliance and reporting
  • Wants to query the full order history (10+ years) from a single logical collection

With Atlas:

  • Hot data: Orders from the last 5 years live in the primary Atlas cluster (e.g., sales.orders).
  • Archived data: Orders older than 5 years are automatically moved to Online Archive.
  • Unified queries: Data Federation exposes both as one logical collection, so analytics teams can run queries over the entire history without worrying about where the data lives.

Step-by-step: set up Online Archive for cold data

The first step is to get archiving in place so older data moves off your main cluster.

1. Choose what to archive

Decide which collection(s) and which documents should be archived. Common patterns:

  • Time-based (e.g., orderDate < now() - 5 years)
  • Status-based (e.g., completed or inactive records)
  • Combination (e.g., completed orders older than 2 years)

Example for an e-commerce orders collection:

  • Archive all orders where orderDate is more than five years old.
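The three patterns above can be expressed as ordinary MongoDB filters. This is a sketch under assumptions: the field names `orderDate` and `status`, the value `"completed"`, and the helper names are illustrative, not part of any Atlas API.

```javascript
// Archival filters for the three patterns above. The field names (orderDate,
// status) and the "completed" value are assumptions for this example schema.

// Date that lies `years` calendar years before `now` (UTC).
function yearsAgo(years, now = new Date()) {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - years);
  return cutoff;
}

// Time-based: orders placed before the cutoff.
function timeBasedFilter(years, now) {
  return { orderDate: { $lt: yearsAgo(years, now) } };
}

// Status-based: finished records, regardless of age.
function statusBasedFilter(statuses = ["completed"]) {
  return { status: { $in: statuses } };
}

// Combination: the two filters use different fields, so merging them
// into one object gives an implicit AND.
function combinedFilter(years, statuses, now) {
  return { ...timeBasedFilter(years, now), ...statusBasedFilter(statuses) };
}

const now = new Date("2025-01-01T00:00:00Z");
const fiveYearRule = timeBasedFilter(5, now);
// fiveYearRule → { orderDate: { $lt: 2020-01-01T00:00:00.000Z } }
const completedOlderThanTwo = combinedFilter(2, ["completed"], now);
```

Before enabling a rule, you could run such a filter in mongosh (for example, `db.orders.countDocuments(fiveYearRule)`) to estimate how much data it would move. For the archive itself, a purely time-based rule is better expressed as a date-based rule (a date field plus an age in days), because a fixed cutoff date does not move forward over time.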

2. Configure Online Archive in Atlas

In the Atlas UI:

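  1. Go to the cluster that holds your hot data.
  2. Open the Online Archive tab for that cluster.
  3. Choose the database and collection to archive (e.g., sales.orders).
  4. Define the archiving rule: either a date-based rule (a date field plus an age in days) or custom criteria expressed as a query, which can use:
    • A date field (e.g., orderDate)
    • A status field (e.g., status: "completed")
  5. Configure:
    • Archiving schedule (by default Atlas archives continuously in the background; you can restrict it to a daily, weekly, or monthly window)
    • Data expiration (optionally, how many days archived documents are kept before Atlas deletes them from the archive)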

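Once the archive is active, Atlas periodically copies matching documents to MongoDB-managed cloud object storage and then deletes them from the cluster. The archived documents remain queryable through a read-only Federated Database Instance.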
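If you prefer to script this setup, the Atlas Administration API can also create an archive. The sketch below only builds a request body for a date-based rule; the field names (`dbName`, `collName`, `criteria`, `dateField`, `dateFormat`, `expireAfterDays`, `partitionFields`) reflect our understanding of the API's Online Archive resource and should be checked against the current API reference. `buildDateArchiveRequest` is a hypothetical helper, and the 365-day year is a simplification.

```javascript
// Sketch of a request body for a date-based Online Archive. Field names are
// assumptions to verify against the Atlas Administration API reference;
// database, collection, and field values are examples.
const DAYS_PER_YEAR = 365; // ignores leap days; a rule of thumb, not exact

function buildDateArchiveRequest({ dbName, collName, dateField, years, partitionFields = [] }) {
  return {
    dbName,
    collName,
    criteria: {
      type: "DATE",                           // date-based rule (vs. custom criteria)
      dateField,                              // field compared against the current date
      dateFormat: "ISODATE",                  // dateField holds BSON dates
      expireAfterDays: years * DAYS_PER_YEAR, // archive once older than this many days
    },
    // Fields you expect to filter on when querying the archive, in priority order.
    partitionFields: partitionFields.map((fieldName, order) => ({ fieldName, order })),
  };
}

const body = buildDateArchiveRequest({
  dbName: "sales",
  collName: "orders",
  dateField: "orderDate",
  years: 5,
  partitionFields: ["orderDate", "customerId"],
});
// body.criteria.expireAfterDays → 1825
```

The body would be sent as JSON (`JSON.stringify(body)`) to the Online Archive endpoint for your cluster; authentication and the exact endpoint path are left out here.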

3. Design your archived schema carefully

Because archived data is served from a read-only federated database, it’s important to:

  • Prefer an embedded data model
    Your archived data should use an embedded data model, not references. When you query archived data, all relevant components should reflect the same point in time. For example, store order items and relevant customer snapshot details inside the order document rather than referencing separate collections.

  • Avoid cross-collection joins in the archive
    The archive is optimized for querying self-contained documents. Modeling related information as embedded documents makes historical queries more straightforward and efficient.
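As an illustration of the embedded model, this sketch turns an order that references a customer and products into a self-contained snapshot. All shapes and the `toArchivedSnapshot` helper are hypothetical; the point is that the customer details and product names that applied at order time are copied into the order rather than looked up later.

```javascript
// Turns a reference-based order into a self-contained snapshot. The schemas
// (customer.address.country, items[].unitPrice, ...) are hypothetical.
function toArchivedSnapshot(order, customer, productsById) {
  return {
    _id: order._id,
    orderDate: order.orderDate,
    status: order.status,
    // Customer details copied as they were when the order was placed.
    customer: {
      customerId: customer._id,
      name: customer.name,
      country: customer.address.country,
    },
    // Each item keeps the price charged at order time plus the product name.
    items: order.items.map(({ productId, qty, unitPrice }) => ({
      productId,
      name: productsById[productId].name,
      qty,
      unitPrice,
    })),
    total: order.items.reduce((sum, { qty, unitPrice }) => sum + qty * unitPrice, 0),
  };
}

const customer = { _id: "12345", name: "Ada", address: { country: "DE" } };
const productsById = { p1: { name: "Desk lamp" }, p2: { name: "Bulb" } };
const order = {
  _id: 1,
  orderDate: new Date("2018-05-04T00:00:00Z"),
  status: "completed",
  customerId: "12345",
  items: [
    { productId: "p1", qty: 1, unitPrice: 30 },
    { productId: "p2", qty: 2, unitPrice: 5 },
  ],
};

const snapshot = toArchivedSnapshot(order, customer, productsById);
// snapshot.total → 40; snapshot.customer.name → "Ada"
```

Building the snapshot when the order is written is usually simpler than reconstructing it just before archiving, because by then the customer or product records may already have changed.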


Step-by-step: create a federated database to unify hot + archive

With Online Archive configured, you now expose both hot and archived data via Atlas Data Federation.

1. Create a Federated Database Instance

Using the Atlas UI (or Atlas CLI):

  1. Navigate to Data Federation.
  2. Create a Federated Database Instance.
  3. Assign it a logical name, e.g., salesFederated.
  4. Note the following:
    • Federated connection string
    • Database name exposed by the federated instance

This instance will serve as the single endpoint through which you query both hot and archived data.

2. Add data sources (hot cluster + Online Archive)

Configure the federated instance with multiple data sources:

  1. Hot source: your Atlas cluster
    • Add a data source pointing to the live cluster and database (e.g., Cluster0, sales.orders).
  2. Archive source: the Online Archive
    • Add the Online Archive for sales.orders as a second data source in the federated database configuration.
    • Atlas manages the underlying object storage, so you select the archive rather than pointing at a storage bucket yourself.

3. Define a logical collection that spans both

Using Data Federation, create virtual collections that map to:

  • A hot collection in your cluster
  • An archive collection in Online Archive

You can define:

  • A single logical collection that unions both
  • Or multiple logical collections if you need hot-only vs all-history views

At a high level, you’re telling Data Federation:

Treat sales.orders in the cluster and the archived sales.orders documents in Online Archive as one logical collection, ordersAll.

Data Federation then sends each query to every source mapped to ordersAll and merges the results.
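Behind the visual editor, a federated database is described by a storage configuration made of stores, databases, and collections. The sketch below writes one as a JavaScript object with a union collection (`ordersAll`) and a hot-only view (`ordersRecent`). The `atlas` store and the `dataSources` entries follow the storage-configuration shape as we understand it; the archive entry is a placeholder, because Atlas generates its exact definition when you add the Online Archive. Names such as `Cluster0`, `<project-id>`, and `buildFederatedConfig` are examples.

```javascript
// Storage configuration sketch. "hotCluster", "Cluster0", "<project-id>", and
// the virtual database name "sales" are example values.
function buildFederatedConfig(archiveDataSource) {
  const hotSource = { storeName: "hotCluster", database: "sales", collection: "orders" };
  return {
    stores: [
      { name: "hotCluster", provider: "atlas", clusterName: "Cluster0", projectId: "<project-id>" },
      // The store backing the Online Archive is added by Atlas (not shown).
    ],
    databases: [
      {
        name: "sales",
        collections: [
          // Union of hot and archived orders: documents from every listed source are returned.
          { name: "ordersAll", dataSources: [hotSource, archiveDataSource] },
          // Hot-only view over the same cluster collection.
          { name: "ordersRecent", dataSources: [hotSource] },
        ],
      },
    ],
  };
}

// Placeholder for the entry Atlas generates for your archive (hypothetical).
const config = buildFederatedConfig({ storeName: "<online-archive-store>" });
```

Keeping a hot-only collection next to the union lets latency-sensitive reports avoid touching the archive at all.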


Querying hot + archived data in one place

Once your federated database is configured, your application or BI tools connect to the federated connection string instead of directly to the cluster for historical queries.

1. Connect using the MongoDB driver or CLI

Use the connection string for the Federated Database Instance:

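# Paste the connection string Atlas shows for the Federated Database Instance.
# Without --password, mongosh prompts for it, keeping it out of shell history.
mongosh "<federated-connection-string>" \
  --username <user>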

This connects you to the unified view.

2. Query using the MongoDB Query API

You can use standard MongoDB queries and aggregation pipelines. For example, to fetch every order for one customer, hot or archived:

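// Switch to the federated virtual database ("sales" is an example name).
use sales
// Every order for one customer, hot or archived, newest first.
db.ordersAll.find(
  { customerId: "12345" },
  { orderDate: 1, total: 1, status: 1 }
).sort({ orderDate: -1 });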

Atlas Data Federation will:

  • Retrieve relevant recent orders from the hot cluster
  • Retrieve older orders from Online Archive
  • Merge and return them as a single result set

3. Run aggregations across both hot and archived data

Because Data Federation supports the MongoDB Aggregation Pipeline, you can run analytics directly over the unified data.

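Example: total revenue per year over the last 10 years:

// Start of the window: exactly ten years before now.
const tenYearsAgo = new Date();
tenYearsAgo.setFullYear(tenYearsAgo.getFullYear() - 10);

db.ordersAll.aggregate([
  // Filter first so only orders in the window are read from either source.
  {
    $match: {
      orderDate: { $gte: tenYearsAgo, $lte: new Date() }
    }
  },
  // Keep only the fields the later stages need.
  { $project: { orderDate: 1, total: 1 } },
  // Sum order totals per calendar year (UTC).
  {
    $group: {
      _id: { year: { $year: "$orderDate" } },
      totalRevenue: { $sum: "$total" }
    }
  },
  { $sort: { "_id.year": 1 } }
]);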

Data Federation will transparently pull data from:

  • The hot cluster (recent years)
  • The Online Archive (older years)

and combine them in the aggregation.
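To make the semantics of that pipeline explicit, here is a plain JavaScript equivalent over an in-memory array. It is only a reference for what the `$match`, `$group`, and `$sort` stages compute (useful, for example, to check a report against a small sample); the sample orders and the `revenueByYear` name are made up.

```javascript
// In-memory equivalent of the revenue-per-year pipeline (illustrative only).
function revenueByYear(orders, from, to) {
  const totals = new Map();
  for (const { orderDate, total } of orders) {
    if (orderDate < from || orderDate > to) continue;  // $match on orderDate
    const year = orderDate.getUTCFullYear();           // $year uses UTC by default
    totals.set(year, (totals.get(year) ?? 0) + total); // $group with $sum
  }
  return [...totals.entries()]
    .sort(([a], [b]) => a - b)                         // $sort by year ascending
    .map(([year, totalRevenue]) => ({ _id: { year }, totalRevenue }));
}

const orders = [
  { orderDate: new Date("2016-04-01T00:00:00Z"), total: 100 }, // would be archived
  { orderDate: new Date("2016-11-20T00:00:00Z"), total: 50 },  // would be archived
  { orderDate: new Date("2023-07-14T00:00:00Z"), total: 30 },  // would be hot
];
const result = revenueByYear(
  orders,
  new Date("2014-01-01T00:00:00Z"),
  new Date("2025-01-01T00:00:00Z")
);
// → [ { _id: { year: 2016 }, totalRevenue: 150 }, { _id: { year: 2023 }, totalRevenue: 30 } ]
```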

4. Use the same logical collection for BI tools

You can connect BI and analytics tools (e.g., Tableau, Power BI, custom reporting services) to the federated database (SQL-based tools typically connect through the Atlas SQL interface) and point them at the unified collection (e.g., ordersAll). This simplifies reporting because:

  • Analysts see a single logical dataset
  • They don’t need to know which data is hot vs archived
  • Queries run over the full time horizon as needed

Best practices for using Data Federation + Online Archive together

To get good performance and maintainability, keep these patterns in mind.

1. Be intentional about what stays hot vs archived

Use Online Archive to:

  • Keep OLTP-style workloads focused on recent data
  • Offload rarely accessed records older than a certain threshold
  • Meet compliance requirements without bloating your main cluster

Set clear criteria (typically time-based) and review periodically as your business needs evolve.

2. Design archived documents as self-contained snapshots

Because archived data is read-only and often queried in isolation:

  • Favor embedded schemas so each archived document contains everything required for historical analysis.
  • Avoid dependency on mutable reference data (like “live” customer or product documents) for historical queries; instead, store key attributes in the archived document.

This aligns with Atlas guidance that archived data should be modeled with embedding rather than referencing other collections.

3. Use federated databases as a controlled analytics layer

Treat your federated database as the gateway for:

  • Long-running or heavy analytical queries
  • Cross-source queries that span hot and archived data
  • Data exploration and reporting

This helps keep your primary cluster optimized for operational workloads while leveraging Online Archive and Data Federation for historical access.

4. Test query patterns and performance

Even though Data Federation abstracts the sources:

  • Profile typical queries that hit both hot and archived data.
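  • Filter on the archive's date field and partition fields where possible, so Data Federation can skip archived data that cannot match, and project only the fields you need.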
  • Leverage aggregation stages like $match and $project early in the pipeline to reduce data movement.
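One way to apply these points consistently is a small helper that always puts a selective `$match` and a narrow `$project` at the front of a pipeline. `buildHistoryPipeline` and its parameters are hypothetical; note that the `fields` list must include every field later stages read, or those stages will see missing values.

```javascript
// Builds a pipeline that filters and narrows documents before any other stage.
// Helper name and parameters are illustrative, not part of a MongoDB API.
function buildHistoryPipeline({ from, to, extraMatch = {}, fields, rest = [] }) {
  // Date range on orderDate first, plus any additional equality filters.
  const match = { orderDate: { $gte: from, $lt: to }, ...extraMatch };
  // Inclusion projection built from the list of needed fields.
  const projection = Object.fromEntries(fields.map((f) => [f, 1]));
  return [{ $match: match }, { $project: projection }, ...rest];
}

const pipeline = buildHistoryPipeline({
  from: new Date("2015-01-01T00:00:00Z"),
  to: new Date("2020-01-01T00:00:00Z"),
  extraMatch: { status: "completed" },
  fields: ["orderDate", "total"],
  rest: [{ $group: { _id: null, revenue: { $sum: "$total" } } }],
});
```

In mongosh you would then run `db.ordersAll.aggregate(pipeline)` against the federated database.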

5. Plan for read-only behavior on archived data

Remember that:

  • Online Archive exposes data through a read-only Federated Database Instance.
  • Updates to archived documents are not supported; if you need frequent updates, keep those documents in the hot cluster instead.
  • Treat archived data as historical records rather than mutable entities.
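A simple application-side safeguard is to check an order's age before attempting an update. The sketch below is hypothetical: `assertUpdatable`, `archiveCutoff`, and the five-year window mirror the example archiving rule and are not an Atlas API. Because archiving runs in the background, a document past the cutoff may not be archived yet, so the check errs on the side of treating it as read-only.

```javascript
// Hypothetical guard: reject updates to orders old enough to be archived.
const HOT_WINDOW_YEARS = 5; // mirrors the example five-year archiving rule

// Orders placed before this date may already live in the read-only archive.
function archiveCutoff(now = new Date()) {
  const cutoff = new Date(now.getTime());
  cutoff.setUTCFullYear(cutoff.getUTCFullYear() - HOT_WINDOW_YEARS);
  return cutoff;
}

// Returns true when the order is still inside the hot window; throws otherwise.
function assertUpdatable(order, now = new Date()) {
  if (order.orderDate < archiveCutoff(now)) {
    throw new Error(
      `Order ${order._id} is older than ${HOT_WINDOW_YEARS} years and may be archived; archived documents cannot be updated.`
    );
  }
  return true;
}
```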

When to choose Online Archive + Data Federation vs other approaches

While Online Archive + Data Federation is powerful, there are cases where you might choose a different pattern.

Use Online Archive + Data Federation when:

  • You have large volumes of historical data that must remain queryable.
  • You want to reduce storage and compute costs on your primary cluster.
  • You prefer a fully managed archival solution over managing separate clusters or object storage directly.
  • You need a single endpoint to query both hot and archived data.

Consider alternative patterns when:

  • You need to modify historical data regularly (archived data is read-only).
  • Your use case is simple, and a separate cluster/collection is easier to manage.
  • Cloud object storage is not allowed or feasible for regulatory reasons (in which case, you might move cold data to a separate cluster instead of to Online Archive).

Putting it all together

To query hot and archived data in one place with MongoDB Atlas:

  1. Set up Online Archive to automatically move older, infrequently accessed documents from your Atlas cluster to MongoDB-managed object storage.
  2. Model archived data with embedded documents, ensuring each archived record is a self-contained snapshot.
  3. Create a Federated Database Instance in Atlas Data Federation that includes:
    • Your hot Atlas cluster as one data source
    • Your Online Archive as another data source
  4. Define unified logical collections that span both sources (e.g., ordersAll).
  5. Connect your applications and BI tools to the federated endpoint and query using the MongoDB Query API (find and aggregation pipelines) as if all data lived in one collection.

This pattern lets you maintain fast operational access to hot data, manage costs at scale for historical data, and still provide a seamless, unified view of your full data history—exactly what you need when querying hot and archived data in one place.