
I need to track new listings on a marketplace site every hour—how do people do recurring scraping without building a whole backend?
Most teams that monitor marketplaces on an hourly (or tighter) cadence don’t build their own cron servers, proxy pools, and job scheduler anymore. They treat scraping as a managed job: define what to crawl, how often to run, where to put the dataset, and let a platform handle the backend. That’s exactly the problem Apify Actors and scheduled runs are designed to solve.
Quick Answer: You build or pick a marketplace scraper as an Apify Actor, configure it once, and then schedule it to run every hour in Apify’s cloud. Apify handles the cron jobs, proxies, unblocking, storage, and notifications, and you just consume a clean dataset or API when new listings appear.
The Quick Overview
- What It Is: A hosted way to run recurring web scraping jobs (Actors) on a schedule without building your own backend, cron, or proxy infrastructure.
- Who It Is For: Engineers, growth teams, and ops people who need hourly (or faster) updates from marketplaces—new listings, price changes, stock changes—without babysitting scripts.
- Core Problem Solved: Continuous tracking of new marketplace listings without managing servers, schedulers, or anti-bot issues yourself.
How It Works
At a high level, you define the scraping logic once, deploy it as an Apify Actor, and then let Apify run it every hour. Each run produces a dataset of listings you can query via API, export as JSON/CSV/Excel, or pipe into your own tools. To act only on genuinely new items, you can also store the IDs of listings you have already seen.
Here’s the usual lifecycle for “track new marketplace listings every hour”:
- Define your scraper as an Actor
- Schedule and monitor hourly runs
- Consume only the new listings via dataset/API
Let’s walk through each phase.
1. Define your scraper as an Actor
You have two options:
- Option A: Use an existing Actor from the Apify Store
  - Search the Apify Store for the marketplace you care about (many popular marketplaces already have scrapers).
  - Configure inputs like:
    - Category, search terms, or URL
    - Location, price range, filters
    - “Only new since X” flags if the Actor supports them
  - Test-run the Actor once in Apify Console to confirm the fields you get back (title, price, URL, listing ID, seller, timestamp, etc.).
- Option B: Build your own custom Actor
  - Use Crawlee with Playwright/Puppeteer (or plain HTTP) to:
    - Load the listing/search page.
    - Scroll/paginate through results.
    - Extract a stable listing identifier (ID/URL), price, title, and other metadata.
  - Define a clear output schema so every run outputs a consistent dataset.
  - Add light deduplication logic if useful (e.g., skip items you’ve seen before by reading an internal key‑value store or a previous dataset).
Either way, the result is the same: one Actor that, when run, outputs “all relevant listings right now” as a dataset.
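To make the "stable identifier plus consistent schema" idea from Option B concrete, here is a minimal Python sketch of a normalization step. It is not tied to any particular marketplace or to Crawlee; the `Listing` fields, the raw keys (`id`, `url`, `title`, `price`), and the helper names are assumptions chosen for illustration.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Listing:
    """Hypothetical output schema; adapt the fields to your marketplace."""
    listing_id: str
    title: str
    price: Optional[float]
    url: str


def canonical_url(url: str) -> str:
    """Drop the query string and fragment so tracking parameters do not change the ID."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), "", ""))


def parse_price(text: str) -> Optional[float]:
    """Return the first number in a price string such as '$1,250.00', or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def normalize(raw: dict) -> dict:
    """Map one raw scraped record onto the schema above."""
    url = canonical_url(raw["url"])
    listing = Listing(
        # Prefer the site's own ID; fall back to the canonical URL.
        listing_id=str(raw.get("id") or url),
        title=(raw.get("title") or "").strip(),
        price=parse_price(raw.get("price") or ""),
        url=url,
    )
    return asdict(listing)
```

Running every scraped record through one function like `normalize` keeps each run's dataset in the same shape, which is what makes comparing runs by ID practical later on.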
2. Schedule and monitor hourly runs
Once your Actor is working interactively, you set up recurring scraping without building cron:
- Create a schedule in Apify Console
  - Go to your Actor → “Schedules”.
  - Set the cron expression or pick a preset like “Every hour”.
  - Choose:
    - Input (search term, location, category).
    - Timeout, memory, and concurrency limits appropriate for your marketplace.
    - Notification settings (email/Slack/webhook when a run fails).
- Let Apify handle the backend
  - No need for:
    - VPS / container orchestration
    - System cron / Celery / Airflow
    - Proxy pool setup and rotation
  - Apify provides:
    - Proxies and unblocking to handle rate limits and bot protection.
    - Cloud execution for every hourly run.
    - Monitoring and logs with per-run details and historical trends.
    - Retries and run statuses so you can react to failures.
- Keep it reliable over time
  - Use logs and screenshots (for browser-based Actors) to debug layout changes.
  - Version your Actor: when a marketplace changes HTML or API structure, update and deploy a new version without breaking existing schedules.
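If you would rather define the hourly schedule as data than click through Console, the settings above can be sketched as a request body. The field names below (`cronExpression`, `isExclusive`, `actions`, `runInput`, `runOptions`) follow the Apify API's schedule object as best understood here, so treat them as assumptions and check them against the current API reference. The schedule name, Actor ID, and default limits are hypothetical.

```python
import json


def hourly_schedule_payload(actor_id: str, run_input: dict, *,
                            timeout_secs: int = 600,
                            memory_mbytes: int = 1024) -> dict:
    """Build a schedule definition that runs one Actor at minute 0 of every hour."""
    return {
        "name": "hourly-marketplace-listings",  # hypothetical schedule name
        "cronExpression": "0 * * * *",          # standard cron: minute 0, every hour
        "timezone": "UTC",
        "isEnabled": True,
        # Assumed meaning: do not start a new run while the previous one is still going.
        "isExclusive": True,
        "actions": [{
            "type": "RUN_ACTOR",
            "actorId": actor_id,
            "runInput": {
                "body": json.dumps(run_input),
                "contentType": "application/json; charset=utf-8",
            },
            "runOptions": {"timeoutSecs": timeout_secs, "memoryMbytes": memory_mbytes},
        }],
    }


if __name__ == "__main__":
    payload = hourly_schedule_payload("username~marketplace-scraper", {"search": "road bike"})
    print(json.dumps(payload, indent=2))
```

The cron expression `0 * * * *` is the part that matters most: it fires once per hour, on the hour. Tightening the cadence is a one-field change, for example `*/15 * * * *` for every 15 minutes.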
3. Consume only the new listings
Each hourly run creates a dataset with the latest snapshot of listings based on your search/filter.
Common patterns to get “just the new stuff”:
- Client-side diffing
  - Pull the last two runs via the Apify API.
  - Compare listing IDs in your code or workflow (e.g., in a small Python script, Zapier, or Airbyte).
  - Only act on IDs not seen in the previous run.
- Actor-side deduplication
  - Inside your Actor:
    - Read a key‑value store containing previously seen listing IDs.
    - Emit only listings that are new.
    - Update the store with the latest IDs.
  - The run’s dataset then contains only rows that are new since the last run.
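Both patterns above reduce to a set difference on listing IDs. The sketch below shows them in plain Python: `new_listings` compares two runs on the client side, and `emit_only_new` mimics the Actor-side variant with an ordinary dict standing in for a key-value store. The `listingId` key, the `seenIds` record name, and the function names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set


def _filter_unseen(seen: Set[str], items: Iterable[dict], key: str) -> List[dict]:
    """Return items whose ID is not in `seen`, adding each new ID to `seen`."""
    fresh = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])  # also drops duplicates within the same run
            fresh.append(item)
    return fresh


def new_listings(previous: Iterable[dict], current: Iterable[dict],
                 key: str = "listingId") -> List[dict]:
    """Client-side diffing: items in the current run that the previous run lacked."""
    seen = {item[key] for item in previous}
    return _filter_unseen(seen, current, key)


def emit_only_new(store: Dict[str, list], items: Iterable[dict],
                  key: str = "listingId") -> List[dict]:
    """Actor-side deduplication: read seen IDs, emit new items, update the store.

    `store` is a plain dict standing in for a persistent key-value store.
    """
    seen = set(store.get("seenIds", []))
    fresh = _filter_unseen(seen, items, key)
    store["seenIds"] = sorted(seen)
    return fresh
```

The trade-off is where the memory lives. Client-side diffing only remembers the previous run, so a listing that disappears for an hour and then returns counts as new again. The Actor-side variant remembers every ID it has ever stored, at the cost of a seen-ID record that grows over time.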
You can then:
- Export datasets as JSON, CSV, or Excel directly from Apify Console.
- Integrate via the API, using the official clients or other interfaces:
  - JavaScript/TypeScript
  - Python
  - HTTP/OpenAPI/CLI/MCP
- Connect to tools:
  - Push to Google Sheets or Google Drive
  - Trigger workflows via webhooks or Zapier
  - Sync into a database or warehouse via Airbyte
  - Feed AI workflows (e.g., into Pinecone via custom code, or as context into your RAG pipeline).
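For the plain-HTTP route, pulling a run's results is a single GET against the dataset items endpoint. The sketch below uses only the Python standard library; the dataset ID and token are placeholders you would supply, and the query parameters shown (`format`, `clean`, `limit`) should be checked against the current Apify API reference before you rely on them.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, *, limit: Optional[int] = None) -> str:
    """Build the URL for reading a dataset's items as JSON."""
    params = {"format": "json", "clean": "true"}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id: str, token: str) -> List[dict]:
    """Download all items of one dataset (performs a network request)."""
    request = Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as `fetch_items("<your-dataset-id>", "<your-api-token>")` returns a list of dicts, one per listing. The official Python or JavaScript clients wrap the same endpoint and add pagination and retries, so they are usually the better choice beyond a quick script.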
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Scheduled Actor runs | Runs your marketplace scraper automatically every hour (or custom cron) in Apify’s cloud. | Recurring scraping without cron, servers, or manual triggers. |
| Managed proxies & unblocking | Routes requests through Apify’s proxy infrastructure and unblocking stack. | Fewer bans and CAPTCHA walls, no need to own proxy pools. |
| Datasets & API access | Stores each run’s results as datasets you can browse, export, or query via API. | Easy to detect new listings, keep history, and pipe data into other systems. |
Ideal Use Cases
- Best for tracking new marketplace listings: Because it runs automatically every hour, stores results as datasets, and lets you filter new versus seen listing IDs without writing a full backend.
- Best for price and availability monitoring: Because you can schedule more frequent runs, compare attributes between datasets, and alert on changes without building a custom monitoring system.
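For the price and availability case, "compare attributes between datasets" can be as small as the following sketch. It assumes each item carries a `listingId` and a `price` field; both names, and the shape of the returned change records, are illustrative choices rather than anything Apify prescribes.

```python
from typing import Iterable, List


def attribute_changes(previous: Iterable[dict], current: Iterable[dict],
                      key: str = "listingId", field: str = "price") -> List[dict]:
    """Return one record per listing whose `field` differs between two runs.

    Listings that appear in only one of the runs are ignored here; they are
    new or removed listings rather than changed ones.
    """
    old = {item[key]: item.get(field) for item in previous}
    changes = []
    for item in current:
        listing_id = item[key]
        if listing_id in old and old[listing_id] != item.get(field):
            changes.append({key: listing_id, "old": old[listing_id], "new": item.get(field)})
    return changes
```

Passing `field="inStock"` (or whatever your Actor calls its availability flag) turns the same function into a stock-change detector.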
Limitations & Considerations
- Site terms and legal constraints: Some marketplaces restrict automated access. Always review the site’s terms of service and your legal/compliance requirements before scraping, and respect robots.txt where applicable.
- Highly dynamic / heavily protected sites: If a marketplace uses aggressive bot detection, you may need advanced unblocking strategies or a custom solution. In that case, Apify Professional Services can build and maintain a hardened Actor for you.
Pricing & Plans
You pay for runs, compute, and proxy usage, not for building your own backend. Typical flow for “hourly marketplace tracking”:
- Start on a self‑service plan, set up one or more Actors, and schedule them.
- Monitor actual usage (run duration, pages per run, proxy traffic) in Apify Console.
- Adjust schedule frequency and concurrency to fit your budget and SLAs.
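To see how schedule frequency drives cost, a rough compute estimate helps. The sketch below assumes Apify's compute unit is 1 GB of memory used for one hour, which is how the platform documents it at the time of writing. The run durations and memory sizes are made-up inputs, and proxy traffic and storage are not included.

```python
def monthly_compute_units(runs_per_day: float, minutes_per_run: float,
                          memory_gb: float, days: int = 30) -> float:
    """Estimate compute units per month as run-hours multiplied by memory in GB."""
    run_hours = runs_per_day * days * minutes_per_run / 60
    return run_hours * memory_gb


if __name__ == "__main__":
    # Hypothetical scenarios: a 2-minute run at 1 GB, hourly vs. every 15 minutes.
    print("hourly:", monthly_compute_units(24, 2, 1))
    print("every 15 min:", monthly_compute_units(96, 2, 1))
```

Under these assumptions, moving from hourly to 15-minute runs quadruples compute, and so does switching from a 1 GB HTTP scraper to a 4 GB browser-based one. Measure real run durations in Console before committing to a cadence.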
Common patterns:
- Starter / Pay-as-you-go: Best for individuals or small teams needing to track a few search URLs or categories hourly, and who want to validate the workflow without committing to a large plan.
- Team / Enterprise: Best for companies monitoring many categories, markets, or competitors, needing higher limits, predictable billing, an uptime SLA (99.95%), and SOC 2/GDPR/CCPA compliance.
For detailed, current pricing, check the pricing page on apify.com or contact sales for a demo.
Frequently Asked Questions
Can I really avoid building any backend for hourly scraping?
Short Answer: Yes. The scheduling, execution, proxies, storage, and monitoring all live on Apify.
Details: You still need scraping logic (either via an existing Actor or your own), but you don’t need:
- A server or container cluster to run it.
- A cron system to trigger it every hour.
- A database to store raw scraped data.
- Custom logging/monitoring.
Apify takes the Actor you define and runs it on a schedule, storing each result as a dataset. Your “backend” becomes a combination of:
- Actor code (scraping logic)
- Apify schedule (hourly runs)
- Apify datasets (results)
- Simple API calls or exports on your side
How do I know when a new listing appears without constantly polling?
Short Answer: Use hourly datasets plus IDs to detect new listings, and connect them to notifications via webhooks or a small script.
Details: A pragmatic setup looks like this:
- Include a stable `listingId` or URL in your Actor’s output.
- After every run, use:
  - An Apify webhook to notify your system when the run finishes.
  - Your script or workflow to:
    - Fetch the latest dataset via API.
    - Compare listing IDs against previously stored IDs.
- For genuinely new IDs:
  - Send yourself a Slack message or email.
  - Insert rows into a Google Sheet, database, or CRM.
  - Trigger an AI workflow (e.g., summarizing or classifying listings).
If you prefer zero extra code, you can also push new items directly into tools like Google Sheets or Slack using integrations and filters.
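If you do write the small script, the receiving side of the webhook can look like the sketch below. It assumes the webhook uses Apify's default payload, where `eventType` is `ACTOR.RUN.SUCCEEDED` for a finished run and `resource.defaultDatasetId` points at the run's dataset; verify both against the webhook documentation. The JSON file used to remember seen IDs and the message format are hypothetical stand-ins for your own storage and notification channel.

```python
import json
from pathlib import Path
from typing import List, Optional


def dataset_id_from_webhook(payload: dict) -> Optional[str]:
    """Return the run's default dataset ID for a successful run, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")


def record_new_ids(seen_file: Path, items: List[dict], key: str = "listingId") -> List[dict]:
    """Return items whose ID is not yet in `seen_file`, then persist the updated ID list."""
    seen = set(json.loads(seen_file.read_text())) if seen_file.exists() else set()
    fresh = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            fresh.append(item)
    seen_file.write_text(json.dumps(sorted(seen)))
    return fresh


def notification_text(fresh: List[dict]) -> str:
    """Format new listings as a plain-text message for Slack or email."""
    lines = [f"{len(fresh)} new listing(s):"]
    for item in fresh:
        lines.append(f"- {item.get('title', '(untitled)')} {item.get('url', '')}".rstrip())
    return "\n".join(lines)
```

In a real handler you would call `dataset_id_from_webhook` on the incoming request body, fetch that dataset's items, pass them through `record_new_ids`, and send `notification_text(...)` only when the result is non-empty. Because the webhook fires when a run finishes, your side never has to poll.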
Summary
To track new marketplace listings every hour without building a backend, treat scraping as a scheduled cloud job instead of a script you babysit. With Apify, you:
- Wrap your marketplace logic in an Actor (or use one from the Store).
- Schedule it to run hourly in the cloud—no cron, no servers, no proxies to manage.
- Use datasets and IDs to pick out only genuinely new listings.
- Export or pull the data via API into whatever systems or AI workflows you already use.
You stay focused on “what to scrape and what to do with it,” and Apify owns the plumbing: proxies, unblocking, scheduling, monitoring, and storage.