
How do I run an Apify Actor via API/SDK and pull results as JSON or CSV?
Most teams hit the same wall the first time they try to productionize Apify: they can run an Actor in the Console, but wiring it into an app or pipeline—and getting clean JSON or CSV out—feels fuzzy. The good news is that Apify was built to make this exact flow boring and repeatable.
This guide walks through how to:
- Run any Apify Actor programmatically via API or official SDKs
- Wait for the run to finish (or not, if you want fire‑and‑forget)
- Pull the results as JSON or CSV for downstream use
I’ll show Python and JavaScript examples using the Apify SDKs, plus raw HTTP calls for when you just want a curl-able endpoint.
The Quick Overview
- What It Is: A practical workflow for running Apify Actors via API/SDK and exporting results as JSON or CSV.
- Who It Is For: Developers, data teams, and AI engineers who need web data to land in their code, pipelines, or BI tools—not just in the Apify Console.
- Core Problem Solved: Turning “click Run in the Console” into a stable, automated pipeline where an Actor runs in the cloud and exposes its dataset over HTTP in machine‑friendly formats.
How It Works
At a high level, every Actor run on Apify follows the same pattern:
- You send an API request (or use the SDK) to start a run with some input.
- The Actor executes in Apify’s cloud, using the platform’s proxies, unblocking, and monitoring stack.
- When the run finishes, it writes records to a dataset, which you can access via:
  - Apify API (e.g., https://api.apify.com/v2/datasets/{datasetId}/items)
  - Official SDKs (Python/JavaScript clients)
  - Export formats (JSON, CSV, Excel, etc.)
  - Integrations (Google Sheets, Airbyte, Zapier, MCP clients, etc.)
From your app’s perspective, an Actor is just:
Input → run via API → dataset ID → fetch/export JSON or CSV.
1. Get your API token
You’ll need an Apify account and an API token:
- Go to Apify Console → Integrations → API tokens
- Copy your token (treat it like a password)
Store it in an environment variable such as APIFY_API_TOKEN, then pass it to the SDK clients or send it in an Authorization: Bearer header on HTTP calls.
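A minimal sketch of that wiring in Python, assuming the token lives in an environment variable named APIFY_API_TOKEN. The helper name auth_headers is illustrative and not part of any Apify library.

```python
import os


def auth_headers(env_var: str = "APIFY_API_TOKEN") -> dict[str, str]:
    """Build HTTP headers carrying the Apify token read from the environment."""
    token = os.environ.get(env_var)
    if not token:
        # Fail early instead of sending unauthenticated requests.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # The token is sent as a Bearer credential, as in the curl examples.
    return {"Authorization": f"Bearer {token}"}
```

Keeping the token out of source files also means it stays out of version control and logs.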
2. Run an Actor via SDK or HTTP
You can run any Actor you see in the Apify Store, or one you’ve built yourself. In SDK calls, an Actor is referenced as username/actor-name or by its internal ID; in raw API URLs, replace the slash with a tilde (username~actor-name).
Python example (ApifyClient)
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input (this depends on the specific Actor)
run_input = {
    "domains": [
        "twitter.com",
        "apify.com",
    ]
}

# Run the Actor and wait for it to finish
run = client.actor("curious_coder/similarweb-scraper").call(run_input=run_input)

# The run object contains the dataset ID with results
dataset_id = run["defaultDatasetId"]
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + dataset_id)
JavaScript example (ApifyClient)
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({
    token: process.env.APIFY_API_TOKEN,
});

// The Actor input (this depends on the specific Actor)
const runInput = {
    usernames: ['instagram'],
};

// call() takes the input object itself as its first argument,
// then waits for the run to finish
const run = await client.actor('crawlerbros/instagram-follower-scraper').call(runInput);

const datasetId = run.defaultDatasetId;
console.log(`💾 Dataset URL: https://console.apify.com/storage/datasets/${datasetId}`);
Raw HTTP example (curl)
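curl -X POST "https://api.apify.com/v2/acts/curious_coder~similarweb-scraper/runs?waitForFinish=60" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domains": ["twitter.com", "apify.com"]
  }'
- The request body is the Actor input itself; there is no runInput wrapper.
- waitForFinish is a query parameter (in seconds, capped at 60 by the API) that tells Apify how long to hold the HTTP connection open while the Actor runs.
- The response JSON wraps the run object in data. data.defaultDatasetId is available as soon as the run is created; check data.status to see whether the run has actually finished.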
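When the HTTP call returns before the Actor is done, one option is to poll the run object until it reaches a terminal status. The sketch below assumes the GET /v2/actor-runs/{runId} endpoint and the status names from Apify's API reference. The helper names (fetch_run, wait_for_run) and the default intervals are illustrative choices, not part of any SDK.

```python
import json
import time
import urllib.request
from typing import Callable

# Statuses after which a run will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def fetch_run(run_id: str, token: str) -> dict:
    """GET the run object; the API wraps it in a top-level "data" key."""
    req = urllib.request.Request(
        f"https://api.apify.com/v2/actor-runs/{run_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"]


def wait_for_run(
    get_run: Callable[[], dict],
    poll_seconds: float = 5.0,
    max_wait_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_run() until the run is terminal or max_wait_seconds has passed."""
    deadline = time.monotonic() + max_wait_seconds
    while True:
        run = get_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still {run['status']} after {max_wait_seconds}s")
        sleep(poll_seconds)
```

Typical usage would be run = wait_for_run(lambda: fetch_run(run_id, token)), followed by a check that run["status"] == "SUCCEEDED" before reading the dataset. Passing the fetcher in as a callable keeps the loop easy to test without network access.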
3. Fetch results as JSON or CSV
Every Actor run writes to at least one dataset. The most common one is the default dataset, whose ID you get as run["defaultDatasetId"] (Python) or run.defaultDatasetId (JS).
That dataset exposes endpoints like:
- JSON: https://api.apify.com/v2/datasets/{datasetId}/items?format=json
- CSV: https://api.apify.com/v2/datasets/{datasetId}/items?format=csv
You can also filter, paginate, and control what fields are returned, but let’s start simple.
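As a rough illustration of those options, the helper below builds an items URL from the format, clean, fields, limit, and offset query parameters. The function name dataset_items_url and the field names in the usage note are hypothetical; only the query parameter names come from the API.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def dataset_items_url(
    dataset_id: str,
    fmt: str = "json",
    clean: bool = True,
    fields: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> str:
    """Build a /items URL; only parameters that were set end up in the query."""
    params: dict = {"format": fmt}
    if clean:
        params["clean"] = "true"
    if fields:
        # The API expects a comma-separated list of field names.
        params["fields"] = ",".join(fields)
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

For example, dataset_items_url("DATASET_ID", fmt="csv", fields=["url", "title"], limit=100) would request the first 100 items with only those two columns.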
How It Works: End‑to‑End Flow
- Create or choose an Actor:
  - From Apify Store (e.g., Instagram Follower Scraper, Similarweb Scraper, Website Content Crawler).
  - Or build your own and deploy it as an Actor in Apify Console.
- Inspect input schema:
  - In the Actor’s page in the Store/Console, check “Input” to see what JSON the Actor expects.
  - This is the JSON you’ll pass as run_input (Python), as the first argument to call() (JavaScript), or as the request body (HTTP).
- Trigger a run:
  - Use the Python/JS client’s .actor('user/actor').call(...), or call the HTTP runs endpoint.
  - Optionally wait for completion with waitForFinish.
- Get the dataset ID:
  - Read run["defaultDatasetId"] or run.defaultDatasetId from the run metadata.
- Export items:
  - Call the dataset’s /items endpoint, specifying format=json or format=csv.
  - Or use the SDK’s dataset helpers to iterate records.
Let’s break down the code for JSON and CSV in more detail.
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Run Actors via SDK or API | Starts Actor runs from Python, JavaScript, or HTTP with structured input | Turns any scraper/automation into a callable cloud job |
| Dataset‑backed outputs | Stores each run’s results in a dataset with a stable ID | Gives you a consistent contract for downstream systems |
| JSON/CSV export endpoints | Exposes items endpoints with format=json or format=csv | Lets you plug Apify into data warehouses, BI, and AI/RAG |
Pulling results as JSON
Python: iterate JSON results
from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

run_input = {
    "usernames": ["instagram"],
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/instagram-follower-scraper").call(run_input=run_input)
dataset_id = run["defaultDatasetId"]
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + dataset_id)

# Iterate all items as Python dicts
for item in client.dataset(dataset_id).iterate_items():
    print(item)  # each item is a JSON-like dict
This is often the simplest path if you’re already in Python and want to process records in‑memory (e.g., transform then write to a DB).
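To make the "transform then write to a DB" step concrete, here is a small sketch that stores items in SQLite using only the standard library. The table layout and the save_items name are assumptions made for illustration; a real pipeline would likely map item fields to proper columns.

```python
import json
import sqlite3
from typing import Iterable


def save_items(conn: sqlite3.Connection, items: Iterable[dict]) -> int:
    """Store each dataset item as a JSON string; returns the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    # Serialize up front so a bad item fails before anything is inserted.
    rows = [(json.dumps(item, ensure_ascii=False),) for item in items]
    conn.executemany("INSERT INTO items (payload) VALUES (?)", rows)
    conn.commit()
    return len(rows)
```

With the client from the example above, this could be called as save_items(sqlite3.connect("results.db"), client.dataset(dataset_id).iterate_items()). Note that this version collects all rows in a list first, so very large datasets would be better written in batches.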
JavaScript: fetch JSON from the dataset
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_API_TOKEN });

// call() takes the Actor input as its first argument
const run = await client.actor('crawlerbros/instagram-follower-scraper').call({
    usernames: ['instagram'],
});
const datasetId = run.defaultDatasetId;

// Get all items as an array of objects (listItems always returns parsed JSON)
const { items } = await client.dataset(datasetId).listItems({
    clean: true, // skip hidden fields and empty items
});
console.log(items);
HTTP: JSON directly over the API
Once you know datasetId, you can get JSON without any SDK:
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json&clean=true" \
-H "Authorization: Bearer YOUR_API_TOKEN"
- clean=true removes hidden fields (those whose names start with #) and skips empty items.
- The response is a standard JSON array of objects.
Pulling results as CSV
CSV is handy when your destination is a spreadsheet, a legacy system, or a data warehouse ingestion job that expects flat files.
HTTP: dataset as CSV
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv&clean=true" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-o results.csv
You get a streaming CSV where each row is a dataset item.
Optional query parameters you’ll often use:
- fields=field1,field2 – only include specific fields/columns.
- delimiter=; – change the delimiter if needed for regional settings (URL-encode it as %3B in the query string).
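Once the CSV is downloaded, reading it back with a non-default delimiter is straightforward. The sketch below uses Python's csv module; parse_csv and the sample columns are illustrative. It also strips a leading UTF-8 byte order mark, on the assumption (per the API reference) that CSV exports are BOM-prefixed by default.

```python
import csv
import io


def parse_csv(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    # Drop a leading BOM so it does not end up inside the first column name.
    text = text.lstrip("\ufeff")
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)
```

When reading from a file instead, opening it with encoding="utf-8-sig" has the same effect as the manual strip.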
Python: download CSV file
import requests

API_TOKEN = "<YOUR_API_TOKEN>"
DATASET_ID = "<DATASET_ID>"

url = f"https://api.apify.com/v2/datasets/{DATASET_ID}/items"
params = {
    "format": "csv",
    "clean": "true",
}
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
}

# Download the whole export; raise on any non-2xx response
response = requests.get(url, headers=headers, params=params)
response.raise_for_status()

# Write raw bytes so the encoding is preserved exactly as sent
with open("results.csv", "wb") as f:
    f.write(response.content)
print("Saved CSV to results.csv")
JavaScript: pipe CSV to a file (Node.js)
import fs from 'node:fs';
import fetch from 'node-fetch';

const API_TOKEN = process.env.APIFY_API_TOKEN;
const DATASET_ID = 'YOUR_DATASET_ID';

const url = new URL(`https://api.apify.com/v2/datasets/${DATASET_ID}/items`);
url.searchParams.set('format', 'csv');
url.searchParams.set('clean', 'true');

const res = await fetch(url, {
    headers: { Authorization: `Bearer ${API_TOKEN}` },
});
if (!res.ok) {
    throw new Error(`Failed to fetch CSV: ${res.status} ${res.statusText}`);
}

// Stream the response body to disk instead of buffering it in memory
await new Promise((resolve, reject) => {
    const fileStream = fs.createWriteStream('results.csv');
    res.body.pipe(fileStream);
    res.body.on('error', reject);
    fileStream.on('finish', resolve);
});
console.log('Saved CSV to results.csv');
Ideal Use Cases
- Best for data pipelines and AI workflows: Because you can run Actors on a schedule, export datasets as JSON, and feed them straight into vector databases, RAG pipelines, or ETL tools using Apify’s Python/JS clients, HTTP API, or MCP clients.
- Best for analysts and business teams: Because any Actor’s dataset can be exported as CSV/Excel and pulled into Google Sheets, BigQuery, or BI tools—no need to understand the Actor internals.
Limitations & Considerations
- Run completion vs. timeouts: If your Actor takes longer than your waitForFinish or HTTP timeout, the “run” call will return without a finished dataset. In that case, poll run status (status field) or use webhooks instead of assuming it’s done.
- Dataset size and memory: Large datasets can be big enough that loading them entirely into memory (as a single JSON array) is risky. Use SDK iterators (iterate_items() in Python, listItems with pagination in JS) or stream CSV instead of doing a single giant fetch.
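For the dataset-size point, paging with the limit and offset query parameters keeps only one page in memory at a time. This is a sketch under stated assumptions: the helper names are hypothetical, and the fetcher requests plain items (without clean or skipEmpty), because filtered pages can come back shorter than the limit and would end the loop early.

```python
import json
import urllib.request
from typing import Callable, Iterator


def http_page_fetcher(dataset_id: str, token: str) -> Callable[[int, int], list]:
    """Return a function that fetches one page of items over HTTP."""
    def fetch(offset: int, limit: int) -> list:
        url = (
            f"https://api.apify.com/v2/datasets/{dataset_id}/items"
            f"?format=json&offset={offset}&limit={limit}"
        )
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)
    return fetch


def iter_dataset_items(
    fetch_page: Callable[[int, int], list],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield items page by page; a short page means the dataset is exhausted."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)
```

A caller would write for item in iter_dataset_items(http_page_fetcher(dataset_id, token)) and process each item as it arrives.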
Pricing & Plans
Apify charges primarily based on Actor compute units (CUs) and, to a lesser extent, storage and data transfer. Running Actors via API/SDK and exporting JSON/CSV uses the same pricing model as running them from the Console.
- Developer / Pay‑as‑you‑go: Best for individuals or small teams needing to trigger Actors from scripts, prototypes, or small production workloads without committing to enterprise contracts.
- Business / Enterprise: Best for teams wiring Apify into critical systems and AI stacks, where you’ll want higher limits, SSO, dedicated support, SLAs (99.95% uptime), and compliance (SOC2, GDPR, CCPA).
For exact numbers and limits, check Apify’s current pricing page in the Console or marketing site.
Frequently Asked Questions
Can I trigger an Actor run and immediately get the JSON results in one call?
Short Answer: Yes, for short runs. The run-sync-get-dataset-items endpoint starts the Actor and returns its dataset items in the same response, provided the run finishes within the endpoint’s 300-second limit.
Details:
POST the Actor input to https://api.apify.com/v2/acts/{actorId}/run-sync-get-dataset-items with a format=json or format=csv query parameter, and the response body is the default dataset’s items. If the run outlasts the limit, the request fails with a timeout error and you fall back to the two-step flow: start the run through the runs endpoint (optionally with the waitForFinish and webhooks query parameters), read defaultDatasetId from the returned run object, and call the dataset’s items endpoint once the run status is SUCCEEDED.
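For short runs, Apify's API reference lists a synchronous endpoint, run-sync-get-dataset-items, that starts the Actor and returns the dataset items in one response. A minimal standard-library sketch of calling it follows. The function names are illustrative, and the 310-second client timeout is an assumption chosen to sit just above the endpoint's documented 300-second limit.

```python
import json
import urllib.request


def build_sync_items_request(
    actor: str, run_input: dict, token: str, fmt: str = "json"
) -> urllib.request.Request:
    """Build a POST request that runs an Actor and returns its dataset items."""
    # API URLs use a tilde between username and Actor name.
    actor_id = actor.replace("/", "~")
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items?format={fmt}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),  # the body is the Actor input itself
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def run_and_get_items(actor: str, run_input: dict, token: str) -> list:
    """Send the request and parse the JSON array of items."""
    req = build_sync_items_request(actor, run_input, token)
    with urllib.request.urlopen(req, timeout=310) as resp:
        return json.load(resp)
```

If the run does not finish in time, urlopen raises an HTTPError, which is the signal to switch to starting the run and fetching the dataset separately.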
Do I need the SDK, or can I just use HTTP for everything?
Short Answer: You can absolutely use HTTP only; SDKs just make common patterns easier.
Details:
Everything you can do with Apify—running Actors, listing runs, fetching dataset items, exporting JSON/CSV—is available via the HTTP API. The Python and JavaScript ApifyClient SDKs wrap those endpoints with typed helpers, automatic pagination, and nicer ergonomics (iterate_items(), actor().call(), etc.). If you’re in a language without an official SDK, or you prefer a bare‑metal approach, just use the documented HTTP endpoints.
Summary
Running an Apify Actor via API/SDK and pulling results as JSON or CSV comes down to a simple, repeatable pattern:
- Trigger a run with a JSON input payload from Python, JavaScript, or raw HTTP.
- Wait for completion (or poll) and grab the defaultDatasetId.
- Export results via the dataset’s /items endpoint as JSON (format=json) or CSV (format=csv), using either SDK helpers or direct HTTP calls.
Once you internalize that “Actors produce datasets” contract, it becomes trivial to wire Apify into AI pipelines, ETL jobs, and internal tools without babysitting scrapers, proxies, and unblocking logic yourself.