
Apify cost estimate: how many compute units will it take to crawl ~10,000 pages per day?
Most teams hit the same wall when they start scaling on Apify: “How many compute units is this crawler actually going to burn?” If you’re aiming for roughly 10,000 pages per day, you can get to a sensible cost estimate—but only if you break the problem down into how Apify billing actually works and what your Actor is doing per page.
Below I’ll walk through how compute units (CUs) are charged, realistic CU/page ranges from production workloads, and how I’d size a 10k‑pages‑per‑day crawler stack before putting a credit card on the line.
The Quick Overview
- What It Is: A practical, engineering-level way to estimate Apify compute unit usage for crawling ~10,000 pages per day, with example scenarios and a repeatable formula.
- Who It Is For: Developers, data teams, and PMs planning new scrapers or migrating existing crawlers (Scrapy, Playwright, Puppeteer, Selenium) to Apify.
- Core Problem Solved: You need to budget Apify costs and pick a pricing plan, but CU usage depends on page complexity, tech stack, and how your Actor is built.
How Apify compute units work (and why per‑page cost varies)
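On Apify, you don’t pay “per page”; you pay for compute units. One compute unit is 1 GB of memory allocated to an Actor for one hour, so what a run consumes comes down to:
- Memory allocated to the run (CPU share scales with the memory you allocate)
- Run duration
- Browser overhead (if you’re running Playwright/Puppeteer/Selenium), which is not billed separately but shows up as more memory and longer runs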
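A quick way to internalize the unit: Apify's platform docs define 1 CU as 1 GB of memory held for 1 hour, so a run's consumption is just memory multiplied by duration. The function name and the two example runs below are my own illustration, not anything from the Apify SDK.

```python
def compute_units(memory_gb: float, duration_seconds: float) -> float:
    """CUs for one run: allocated memory (GB) times run duration (hours)."""
    return memory_gb * (duration_seconds / 3600)


# Illustrative runs (assumed sizes, not measurements):
browser_run = compute_units(memory_gb=4, duration_seconds=30 * 60)
http_run = compute_units(memory_gb=1, duration_seconds=30 * 60)

print(browser_run)  # 2.0
print(http_run)     # 0.5
```

The same 30-minute run costs four times as much at 4 GB as at 1 GB, which is most of the reason browser-based Actors are pricier per page.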
In practice, your CU cost per page depends mostly on:
- Rendering mode
  - HTTP-only, no browser (Crawlee CheerioCrawler, basic got-scraping): cheapest per page.
  - Headless browser (Playwright/Puppeteer/Selenium): more CU per page due to browser startup, JS execution, and rendering time.
- Page complexity & blocking
  - Heavy SPAs, infinite scroll, reCAPTCHA, and aggressive bot protection mean longer run times and more retries → more CUs.
- Actor architecture
  - Reusing a single browser per run vs opening a new browser per URL.
  - Concurrency configuration (too low = wasted time; too high = CPU/memory spikes and throttling).
  - How much processing you do on each page (DOM parsing, text cleanup, AI calls, etc.).
Because of these variables, you can’t get an exact CU/page number from a doc table, but you can get realistic ranges and validate them with a small calibration run.
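To see how those variables interact before you run anything, here is a rough back-of-envelope sketch. It assumes 1 CU = 1 GB of memory for one hour and that pages are processed at a steady concurrency; the memory sizes, per-page times, and concurrency values are illustrative guesses, not measurements.

```python
def cu_per_page(memory_gb: float, seconds_per_page: float, concurrency: int) -> float:
    """Rough CU/page: wall-clock time each page adds to the run, priced at memory_gb per hour."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    wall_seconds_per_page = seconds_per_page / concurrency
    return memory_gb * wall_seconds_per_page / 3600


# Assumed profiles: a small HTTP-only Actor vs a larger browser-based one.
http_only = cu_per_page(memory_gb=1, seconds_per_page=2.0, concurrency=5)
browser = cu_per_page(memory_gb=4, seconds_per_page=5.0, concurrency=5)

print(f"HTTP-only: {http_only * 1000:.2f} CUs per 1,000 pages")
print(f"Browser:   {browser * 1000:.2f} CUs per 1,000 pages")
```

Treat the output as a prior, not a quote: retries, browser startup, and uneven page load times all push real numbers around, which is why the calibration run matters.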
A practical baseline: CU usage ranges per 10,000 pages
From running and tuning a lot of Actors on Apify (my own and marketplace ones), these are reasonable ballpark estimates for 10k pages/day:
1. Lightweight HTML pages (no JS rendering)
Stack: Crawlee CheerioCrawler / HTTP client, minimal parsing, low blocking.
- CU per 1,000 pages: ~0.1–0.3 CUs
- CU per 10,000 pages: ~1–3 CUs/day
Typical when:
- You’re crawling static sites, blogs, documentation.
- You just need titles, meta, main content.
- No login, no complex flows, minimal retries.
2. Standard JS-rendered pages (Playwright/Puppeteer)
Stack: Crawlee PlaywrightCrawler or PuppeteerCrawler, 1 browser per worker, basic navigation and extraction.
- CU per 1,000 pages: ~0.5–1.5 CUs
- CU per 10,000 pages: ~5–15 CUs/day
Typical when:
- You need JS execution for content to show.
- There is moderate blocking (occasional captcha/interstitials).
- You don’t do crazy heavy DOM processing per page.
This range covers most “normal” Actors I see in the Apify Store that scrape e‑commerce, SaaS app UIs, and JS-heavy content sites.
3. Heavy pages or complex workflows
Stack: Playwright/Puppeteer/Selenium, login, multi-step flows, infinite scroll, extra processing.
- CU per 1,000 pages: ~1.5–4 CUs
- CU per 10,000 pages: ~15–40 CUs/day
Typical when:
- You log in per session and simulate user behavior.
- You scroll long lists, paginate deep, or click multiple elements per page.
- You do additional work: full text extraction, media downloads, deduping, AI calls, etc.
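If you want these three tiers in a form you can drop into a planning script, a small lookup is enough. The ranges are copied from the tiers above; the tier keys and function name are just labels I made up for this sketch.

```python
# CU per 1,000 pages as (low, high), taken from the three tiers above.
CU_PER_1000_PAGES = {
    "static_html": (0.1, 0.3),
    "js_rendered": (0.5, 1.5),
    "heavy_workflow": (1.5, 4.0),
}


def daily_cu_range(tier: str, pages_per_day: int) -> tuple[float, float]:
    """Scale a tier's per-1,000-page range to a daily page volume."""
    low, high = CU_PER_1000_PAGES[tier]
    thousands = pages_per_day / 1000
    return (low * thousands, high * thousands)


for tier in CU_PER_1000_PAGES:
    low, high = daily_cu_range(tier, 10_000)
    print(f"{tier}: {low:g} to {high:g} CUs/day")
```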
A simple formula you can reuse
To reason about your own crawler, use this:
Daily CUs ≈ (Pages per day) × (CU per page estimate)
Monthly CUs ≈ Daily CUs × 30
Plug in rough CU/page numbers from the ranges above:
Example (standard JS-rendered site):
CU per page ≈ 0.0008 (i.e., 0.8 CU per 1,000 pages)
Pages per day = 10,000
Daily CUs ≈ 10,000 × 0.0008 = 8 CUs/day
Monthly CUs ≈ 8 × 30 = 240 CUs/month
From there you compare 240 CUs/month against Apify’s current plan quotas and CU pricing.
Important: CU prices and plan quotas can change. Always cross-check against the live pricing page in the Apify Console or on apify.com.
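The formula is simple enough to keep as two helper functions. This is a minimal sketch using the worked example's numbers; the 30-day month is the same simplification the formula makes.

```python
def daily_cus(pages_per_day: int, cu_per_page: float) -> float:
    """Daily CUs = pages per day x CU per page."""
    return pages_per_day * cu_per_page


def monthly_cus(daily: float, days_per_month: int = 30) -> float:
    """Monthly CUs = daily CUs x days in the billing month (30 by default)."""
    return daily * days_per_month


daily = daily_cus(pages_per_day=10_000, cu_per_page=0.0008)
monthly = monthly_cus(daily)
print(f"{daily:.1f} CUs/day, {monthly:.0f} CUs/month")  # 8.0 CUs/day, 240 CUs/month
```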
Three concrete scenarios for crawling 10,000 pages per day
To make this less abstract, here are three “shapes” of workloads and their compute implications.
Scenario A: Static content crawler (Website Content Crawler–style, no JS)
- Goal: Extract clean text + metadata from blog/docs for a RAG pipeline.
- Stack:
  - Crawlee CheerioCrawler or Apify Website Content Crawler configured without JS rendering.
  - Minimal per-page logic: strip boilerplate, output Markdown/HTML/text.
- Estimated cost:
  - ~0.1–0.3 CUs per 1,000 pages
  - ~1–3 CUs per 10,000 pages/day
- When this applies:
  - Websites are mostly static and don’t require JS to render content.
  - Blocking is low, no login required.
If you’re feeding a vector database and just need the text, this is where you want to be; it’s the cheapest tier of CU usage.
Scenario B: Standard headless browser crawler
- Goal: Extract product data, listings, or dashboards where JS is required.
- Stack:
  - Crawlee PlaywrightCrawler/PuppeteerCrawler.
  - Reuse browser contexts per run (don’t launch a full browser per page).
  - Extract ~20–50 fields per page; no infinite scroll.
- Estimated cost:
  - ~0.5–1.5 CUs per 1,000 pages
  - ~5–15 CUs per 10,000 pages/day
- When this applies:
  - You’re scraping modern e‑commerce, SaaS UIs, or single-page apps.
  - Blocking exists but is manageable with Apify proxies/unblocking.
This is the “middle of the road” case—what I’d assume until tests prove the site is either easier or nastier.
Scenario C: Heavy, protected app crawler
- Goal: Crawl authenticated areas, long scrolling feeds, or high-value data behind protection.
- Stack:
  - Playwright/Puppeteer/Selenium Actors with login, session handling, link discovery.
  - Infinite scroll or multi-step navigation per “page” of data.
  - Potential retries due to captchas/rate limiting.
- Estimated cost:
  - ~1.5–4 CUs per 1,000 “pages”
  - ~15–40 CUs per 10,000 pages/day
- When this applies:
  - You do a lot of clicks/scrolls per data unit.
  - Each page is slow to load or occasionally blocked; you rely on retries.
  - You may be downloading assets (images, PDFs, etc.) per page.
Here your budget needs to be conservative; overestimating is safer than underestimating.
How plan selection maps to these estimates
Apify plans mix included CUs with pay‑as‑you‑go overage. Since pricing tables can change, treat the numbers below as a decision framework rather than exact quotes.
Given a range of 5–15 CUs/day (150–450 CUs/month) for a standard JS-rendered 10k‑pages/day crawler:
- Lower bound (150 CUs/month):
- Often fits comfortably into a mid-tier plan if you’re not running many other Actors.
- Upper bound (450 CUs/month):
- May require a higher tier or accepting some pay‑as‑you‑go overage.
If you’re in Scenario A (1–3 CUs/day, ~30–90 CUs/month), you can usually start even at smaller plans unless you have lots of additional workloads.
If you’re in Scenario C (15–40 CUs/day, ~450–1,200 CUs/month), you’re in “bigger plan or custom quote” territory—at that point it’s worth talking to Apify sales to model costs, especially for enterprise usage (99.95% uptime, SOC2, GDPR, and CCPA compliance, etc.).
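To make the in-plan vs overage decision concrete, here is a small comparison sketch. Everything in the plan list is a placeholder I invented for illustration (names, prices, included CUs, overage rates); none of it is Apify's real pricing, so swap in the live numbers before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    monthly_price_usd: float
    included_cus: float
    overage_usd_per_cu: float


def monthly_cost(plan: Plan, needed_cus: float) -> float:
    """Base price plus pay-as-you-go overage for CUs beyond the included amount."""
    overage_cus = max(0.0, needed_cus - plan.included_cus)
    return plan.monthly_price_usd + overage_cus * plan.overage_usd_per_cu


def cheapest_plan(plans: list[Plan], needed_cus: float) -> Plan:
    """Pick the plan with the lowest total monthly cost for a given CU need."""
    return min(plans, key=lambda p: monthly_cost(p, needed_cus))


# HYPOTHETICAL plans: placeholder numbers only, not real Apify prices.
PLANS = [
    Plan("small", 50.0, 100, 0.40),
    Plan("medium", 200.0, 600, 0.30),
    Plan("large", 500.0, 2000, 0.25),
]

for needed in (150, 450, 1200):
    best = cheapest_plan(PLANS, needed)
    print(f"{needed} CUs/month -> {best.name} (${monthly_cost(best, needed):.2f})")
```

With placeholder numbers like these, a smaller plan plus overage can come out cheaper than the next tier up, which is why it is worth running the comparison rather than just picking the first plan that covers the quota.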
How to get a precise CU estimate for your own crawler
The only way to move from “estimate” to “confident number” is to run a calibration job and measure CU usage directly.
1. Build a realistic Actor slice
- Use the same stack you intend for production (HTTP only vs Playwright/Puppeteer).
- Include all the logic you plan to keep: parsing, post‑processing, and any integration calls.
- Avoid temporary console.log spam or artificial wait(5000) calls that inflate usage.
2. Run on a small sample (e.g., 500–1,000 pages)
- Create an input that represents your real workload:
  - Same site(s)
  - Same filters
  - Same depth (pagination, clicks, scrolling)
- In Apify Console → Actor → Runs, open the completed run:
  - Note the Total CUs consumed.
  - Note the Pages processed (you can log this or infer from dataset item count).
Calculate:
CU per page = (Total CUs in run) / (Pages processed)
3. Extrapolate to 10,000 pages/day
Once you have CU per page, plug into:
Daily CUs = 10,000 × CU per page
Monthly CUs = Daily CUs × 30
Example:
- Calibration run: 0.9 CUs for 1,200 pages.
- CU per page ≈ 0.9 / 1,200 ≈ 0.00075.
- Daily CUs at 10k pages: 10,000 × 0.00075 = 7.5 CUs/day.
- Monthly: 7.5 × 30 ≈ 225 CUs/month.
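The calibration math above, as a sketch you can rerun each time you repeat the test batch. The inputs are the two numbers you read off the completed run; the function name is my own.

```python
def cu_per_page_from_run(total_cus: float, pages_processed: int) -> float:
    """CU per page measured from one calibration run."""
    if pages_processed <= 0:
        raise ValueError("pages_processed must be positive")
    return total_cus / pages_processed


# Numbers from the example calibration run above.
rate = cu_per_page_from_run(total_cus=0.9, pages_processed=1_200)
daily = rate * 10_000
monthly = daily * 30

print(f"{rate:.5f} CU/page")      # 0.00075 CU/page
print(f"{daily:.1f} CUs/day")     # 7.5 CUs/day
print(f"{monthly:.0f} CUs/month") # 225 CUs/month
```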
4. Add headroom for retries and growth
Real workloads aren’t flat. I tend to add:
- 20–30% buffer for:
  - Temporary blocking and retries.
  - Content growth / more pages over time.
  - Additional features you might add later (extra fields, AI transforms, etc.).
Using the example above:
225 CUs/month × 1.3 ≈ 293 CUs/month effective budget
That’s the number you bring into plan selection and budget conversations.
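Since the buffer is a range rather than a single number, I find it useful to carry both ends forward. A minimal sketch, assuming the 20–30% headroom described above (the function name and default are mine):

```python
def budget_with_headroom(
    monthly_cus: float, buffers: tuple[float, float] = (0.20, 0.30)
) -> tuple[float, float]:
    """Return (low, high) monthly CU budgets after applying the buffer range."""
    low_buffer, high_buffer = buffers
    return (monthly_cus * (1 + low_buffer), monthly_cus * (1 + high_buffer))


low, high = budget_with_headroom(225)
print(f"Budget between {low:.1f} and {high:.1f} CUs/month")
```

If the low end fits a plan and the high end doesn't, that is a signal to check the overage rate before committing to the smaller tier.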
How Apify’s features affect your CU usage (and cost stability)
Even though CUs are the billing unit, the operational stack you run on top of Apify can make a big difference in how many you consume:
- Proxies and unblocking
  - Stable proxies and automated unblocking → fewer retries → fewer wasted CUs.
  - Built into Apify’s platform; you don’t need to script this yourself.
- Concurrency tuning
  - maxConcurrency in Crawlee lets you dial in parallelism, while maxRequestsPerCrawl caps the total number of requests a run can make, which acts as a safety limit on spend.
  - Too low = underutilized CPU; too high = throttling, errors, and retries.
  - Start conservative, monitor run time and CPU, then adjust.
- Actor reuse vs one-off scripts
  - The more you reuse a well-tuned Actor as a repeatable job, the more you amortize the initial optimization effort.
  - Each run produces a dataset you can export (JSON/CSV/Excel) or pull via API, Python/JavaScript clients, CLI, HTTP, or MCP.
- Avoid heavy work inside the crawler loop
  - Move AI-heavy processing (embedding, summarization) downstream when possible.
  - Let the Actor produce a “raw” dataset, then process it in a separate step or tool (e.g., Zapier, Airbyte, your own ETL) to keep CU usage predictable.
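To put a number on how retries eat CUs, here is a simple model. It assumes each attempt fails independently with the same probability and that a failed attempt costs as much as a successful one; both are simplifications, and the failure rates below are made up for illustration.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per page: attempt k+1 only happens if the first k all failed."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    # 1 + p + p^2 + ... + p^max_retries
    return sum(failure_rate ** k for k in range(max_retries + 1))


def effective_cu_per_page(base_cu_per_page: float, failure_rate: float, max_retries: int) -> float:
    """Base CU/page inflated by the expected number of attempts."""
    return base_cu_per_page * expected_attempts(failure_rate, max_retries)


base = 0.0008  # CU/page with no blocking (assumed)
for p in (0.0, 0.1, 0.3):
    inflated = effective_cu_per_page(base, failure_rate=p, max_retries=3)
    print(f"failure rate {p:.0%}: {inflated * 10_000:.2f} CUs per 10,000 pages")
```

Under this model, going from 10% to 30% blocked requests raises CU usage by roughly a quarter, which is the kind of drift good proxies and unblocking are meant to prevent.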
Example: turning a CU estimate into a plan choice
Let’s apply this step-by-step for a typical “crawl 10,000 e‑commerce product pages per day” project:
- Pilot run
  - Actor: Playwright-based, simple navigation, ~30 fields/page.
  - Sample: 1,000 product pages.
  - Observed: 1.1 CUs used.
- CU per page
  1.1 CUs / 1,000 pages = 0.0011 CU/page
- Daily & monthly at 10k pages/day
  Daily CUs = 10,000 × 0.0011 = 11 CUs/day
  Monthly CUs ≈ 11 × 30 = 330 CUs/month
- Buffer
  330 × 1.3 ≈ 429 CUs/month
- Plan decision
  - Look at the live Apify pricing page for:
    - Included CUs in each plan.
    - Overage CU price.
  - If a plan includes ~400–500 CUs/month, you’re likely covered. Otherwise, pick a tier where:
    - 429 CUs/month fits comfortably, or
    - You’re okay paying some overage at the listed CU rate.
If your calibration run came back much lower (e.g., 0.0004 CU/page), you’d re‑run this math and likely find that a smaller plan is enough.
Limitations & considerations when estimating CUs
- Numbers will vary by site: There’s no universal CU/page value; everything above is calibrated on real workloads but is still a range, not a guarantee.
- Blocking patterns change over time: Sites increase protection, add captchas, or change layout. That can increase CU usage unless you adjust your Actor and proxy strategy.
- Other Actors also consume CUs: If you’re running multiple scrapers, data transformers, or maintenance jobs, you need to sum CUs across all of them, not just the 10k‑pages/day crawler.
The safest pattern is: calibrate → extrapolate → monitor. Watch early runs closely and adjust both your Actor and your plan if usage trends away from the initial estimate.
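The "monitor" step can be as simple as comparing recent daily usage against your estimate. A minimal sketch; the 20% tolerance and the sample readings are arbitrary assumptions, and you would feed in the daily CU figures you record from your own runs.

```python
def usage_drift(observed_daily_cus: list[float], estimated_daily_cus: float) -> float:
    """Relative drift of mean observed daily CUs vs the estimate (0.10 means 10% over)."""
    if not observed_daily_cus:
        raise ValueError("need at least one observation")
    mean_observed = sum(observed_daily_cus) / len(observed_daily_cus)
    return (mean_observed - estimated_daily_cus) / estimated_daily_cus


def needs_review(observed_daily_cus: list[float], estimated_daily_cus: float,
                 tolerance: float = 0.20) -> bool:
    """True when usage has drifted from the estimate by more than the tolerance."""
    return abs(usage_drift(observed_daily_cus, estimated_daily_cus)) > tolerance


# Made-up readings against a 7.5 CUs/day estimate.
steady_week = [7.2, 7.9, 8.4, 9.6, 10.1]
blocked_week = [9.5, 10.0, 10.5]

print(needs_review(steady_week, 7.5))   # False
print(needs_review(blocked_week, 7.5))  # True
```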
Summary
To estimate how many compute units it will take to crawl ~10,000 pages per day on Apify:
- Pick a realistic CU per page range based on your stack:
  - ~0.0001–0.0003 CU/page for static HTML.
  - ~0.0005–0.0015 CU/page for standard JS-rendered pages.
  - ~0.0015–0.004 CU/page for heavy, protected, or multi-step flows.
- Run a calibration Actor on 500–1,000 real pages to get your own CU/page number.
- Extrapolate to 10,000 pages/day and 30 days/month, then add 20–30% headroom.
- Map the resulting monthly CU number to Apify’s current plans and CU quotas, deciding whether you fit in-plan or accept some pay‑as‑you‑go usage.
If you already have an example site in mind (or an existing script in Scrapy/Playwright you want to migrate), the next best step is to turn it into an Actor, run a test batch, and look at the actual CU usage in Apify Console.