How do I run an Apify Actor via API/SDK and pull results as JSON or CSV?

Most teams hit the same wall the first time they try to productionize Apify: they can run an Actor in the Console, but wiring it into an app or pipeline—and getting clean JSON or CSV out—feels fuzzy. The good news is that Apify was built to make this exact flow boring and repeatable.

This guide walks through how to:

  • Run any Apify Actor programmatically via API or official SDKs
  • Wait for the run to finish (or not, if you want fire‑and‑forget)
  • Pull the results as JSON or CSV for downstream use

I’ll show Python and JavaScript examples using the Apify SDKs, plus raw HTTP calls for when you just want a curl-able endpoint.


The Quick Overview

  • What It Is: A practical workflow for running Apify Actors via API/SDK and exporting results as JSON or CSV.
  • Who It Is For: Developers, data teams, and AI engineers who need web data to land in their code, pipelines, or BI tools—not just in the Apify Console.
  • Core Problem Solved: Turning “click Run in the Console” into a stable, automated pipeline where an Actor runs in the cloud and exposes its dataset over HTTP in machine‑friendly formats.

How It Works

At a high level, every Actor run on Apify follows the same pattern:

  1. You send an API request (or use the SDK) to start a run with some input.
  2. The Actor executes in Apify’s cloud, using the platform’s proxies, unblocking, and monitoring stack.
  3. When the run finishes, it writes records to a dataset, which you can access via:
    • Apify API (e.g., https://api.apify.com/v2/datasets/{datasetId}/items)
    • Official SDKs (Python/JavaScript clients)
    • Export formats (JSON, CSV, Excel, etc.)
    • Integrations (Google Sheets, Airbyte, Zapier, MCP clients, etc.)

From your app’s perspective, an Actor is just:

Input → run via API → dataset ID → fetch/export JSON or CSV.

1. Get your API token

You’ll need an Apify account and an API token:

  • Go to Apify Console → Integrations → API tokens
  • Copy your token (treat it like a password)

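Pass this token to the SDK clients, or send it in an Authorization: Bearer header on raw HTTP calls. The examples below either read it from the APIFY_API_TOKEN environment variable or use a <YOUR_API_TOKEN> placeholder.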
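To keep the token out of source code, a small helper can read it from the environment and build the header used by the raw HTTP examples. This is a minimal sketch; auth_headers is a hypothetical helper name, not part of any Apify SDK.

```python
import os


def auth_headers(env=os.environ):
    """Build the Authorization header from the APIFY_API_TOKEN environment variable."""
    token = env.get("APIFY_API_TOKEN")
    if not token:
        raise RuntimeError("Set the APIFY_API_TOKEN environment variable first")
    return {"Authorization": f"Bearer {token}"}
```

Failing fast on a missing token tends to be easier to debug than an authentication error from the API later on.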

2. Run an Actor via SDK or HTTP

You can run any Actor you see in the Apify Store, or one you’ve built yourself. An Actor is referenced as username/actor-name or by its internal ID.

Python example (ApifyClient)

from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input (this depends on the specific Actor)
run_input = {
    "domains": [
        "twitter.com",
        "apify.com",
    ]
}

# Run the Actor and wait for it to finish
run = client.actor("curious_coder/similarweb-scraper").call(run_input=run_input)

# The run object contains the dataset ID with results
dataset_id = run["defaultDatasetId"]
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + dataset_id)

JavaScript example (ApifyClient)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({
  token: process.env.APIFY_API_TOKEN,
});

const runInput = {
  usernames: ['instagram'],
};

// The Actor input is passed directly as the first argument of call()
const run = await client.actor('crawlerbros/instagram-follower-scraper').call(runInput);

const datasetId = run.defaultDatasetId;
console.log(`💾 Dataset URL: https://console.apify.com/storage/datasets/${datasetId}`);

Raw HTTP example (curl)

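curl -X POST "https://api.apify.com/v2/acts/curious_coder~similarweb-scraper/runs?waitForFinish=60" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "domains": ["twitter.com", "apify.com"]
  }'
  • The request body is the Actor input itself (the same JSON you pass as run_input in Python).
  • waitForFinish is a query parameter (in seconds, maximum 60) that tells Apify how long to hold the HTTP connection open while the Actor runs.
  • The response JSON wraps the run object in data. data.defaultDatasetId is present right away, but the dataset is complete only once data.status is SUCCEEDED.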
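For runs that outlast a single request, one option is to poll the run-detail endpoint until the status is terminal. The sketch below uses only the standard library and assumes the GET /v2/actor-runs/{runId} endpoint with a waitForFinish query parameter; the helper names, max_polls, and pause are illustrative choices, not Apify defaults.

```python
import json
import time
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Run statuses after which the run will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def run_status_url(run_id, wait_secs=60):
    """URL of the run-detail endpoint; waitForFinish makes the server hold the request."""
    query = urllib.parse.urlencode({"waitForFinish": wait_secs})
    return f"{API_BASE}/actor-runs/{run_id}?{query}"


def fetch_run(run_id, token, wait_secs=60):
    """GET the run object (the API wraps it in a top-level "data" key)."""
    request = urllib.request.Request(
        run_status_url(run_id, wait_secs),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=wait_secs + 30) as response:
        return json.load(response)["data"]


def wait_until_finished(run_id, token, fetch=fetch_run, max_polls=120, pause=1.0):
    """Poll until the run reaches a terminal status, then return the run object."""
    for _ in range(max_polls):
        run = fetch(run_id, token)
        if run["status"] in TERMINAL_STATUSES:
            return run
        time.sleep(pause)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

The fetch parameter is injectable so the loop can be exercised with a stub in tests. Check that the returned status is SUCCEEDED before reading the dataset, since FAILED and ABORTED are terminal too.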

3. Fetch results as JSON or CSV

Every Actor run writes to at least one dataset. The most common one is the default dataset, whose ID you get as run["defaultDatasetId"] (Python) or run.defaultDatasetId (JS).

That dataset exposes endpoints like:

  • JSON: https://api.apify.com/v2/datasets/{datasetId}/items?format=json
  • CSV: https://api.apify.com/v2/datasets/{datasetId}/items?format=csv

You can also filter, paginate, and control what fields are returned, but let’s start simple.
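As a rough illustration of those options, here is a small URL builder for the items endpoint. It assumes the format, clean, fields, limit, and offset query parameters; dataset_items_url is a hypothetical helper, not an SDK function.

```python
from urllib.parse import urlencode


def dataset_items_url(dataset_id, fmt="json", fields=None, limit=None, offset=None, clean=True):
    """Build a dataset items URL with optional field selection and pagination."""
    params = {"format": fmt}
    if clean:
        params["clean"] = "true"
    if fields:
        # Comma-separated list of the fields (columns) to keep
        params["fields"] = ",".join(fields)
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

For example, dataset_items_url("DATASET_ID", fmt="csv", fields=["url", "title"], limit=100) would request the first 100 items as CSV with only those two columns.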


How It Works: End‑to‑End Flow

  1. Create or choose an Actor:

    • From Apify Store (e.g., Instagram Follower Scraper, Similarweb Scraper, Website Content Crawler).
    • Or build your own and deploy it as an Actor in Apify Console.
  2. Inspect input schema:

    • In the Actor’s page in the Store/Console, check “Input” to see what JSON the Actor expects.
    • This is the JSON you’ll pass as run_input (Python), as the first argument of call() (JS), or as the request body (HTTP).
  3. Trigger a run:

    • Use the Python/JS client’s .actor('user/actor').call(...), or call the HTTP runs endpoint.
    • Over HTTP, optionally wait for completion with the waitForFinish query parameter (up to 60 seconds).
  4. Get the dataset ID:

    • Read run["defaultDatasetId"] or run.defaultDatasetId from the run metadata.
  5. Export items:

    • Call the dataset’s /items endpoint, specifying format=json or format=csv.
    • Or use the SDK’s dataset helpers to iterate records.
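Steps 3 to 5 can be sketched as one helper. This assumes client is an ApifyClient instance from the apify_client package and that call() returns the run as a dict, as in the examples in this guide; run_and_collect is a hypothetical name.

```python
def run_and_collect(client, actor_id, run_input):
    """Trigger a run, read its default dataset ID, and return the items as a list."""
    # Step 3: start the run and block until it finishes
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None or run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"Actor run did not succeed: {run}")
    # Step 4: the run metadata carries the default dataset ID
    dataset_id = run["defaultDatasetId"]
    # Step 5: pull every item from the dataset
    return list(client.dataset(dataset_id).iterate_items())
```

Collecting into a list is fine for small datasets; for large ones, iterate instead of materializing everything in memory.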

Let’s break down the code for JSON and CSV in more detail.


Features & Benefits Breakdown

| Core Feature | What It Does | Primary Benefit |
| --- | --- | --- |
| Run Actors via SDK or API | Starts Actor runs from Python, JavaScript, or HTTP with structured input | Turns any scraper/automation into a callable cloud job |
| Dataset‑backed outputs | Stores each run’s results in a dataset with a stable ID | Gives you a consistent contract for downstream systems |
| JSON/CSV export endpoints | Exposes items endpoints with format=json or format=csv | Lets you plug Apify into data warehouses, BI, and AI/RAG |

Pulling results as JSON

Python: iterate JSON results

from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

run_input = {
    "usernames": ["instagram"],
}

run = client.actor("crawlerbros/instagram-follower-scraper").call(run_input=run_input)

dataset_id = run["defaultDatasetId"]
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + dataset_id)

# Iterate all items as Python dicts
for item in client.dataset(dataset_id).iterate_items():
    print(item)  # each item is a JSON-like dict

This is often the simplest path if you’re already in Python and want to process records in‑memory (e.g., transform then write to a DB).
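As one possible version of the "transform then write to a DB" step, the sketch below stores each item as a JSON row in SQLite. The table layout and the save_items name are assumptions made for illustration; any iterable of dicts works as input.

```python
import json
import sqlite3


def save_items(items, connection, table="apify_items"):
    """Store each item as one JSON text row and return how many rows were written.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    connection.execute(f"CREATE TABLE IF NOT EXISTS {table} (payload TEXT NOT NULL)")
    count = 0
    for item in items:
        connection.execute(
            f"INSERT INTO {table} (payload) VALUES (?)",
            (json.dumps(item, ensure_ascii=False),),
        )
        count += 1
    connection.commit()
    return count
```

With the client from the example above, this could be called as save_items(client.dataset(dataset_id).iterate_items(), sqlite3.connect("items.db")), which processes items one at a time rather than loading the whole dataset.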

JavaScript: fetch JSON from the dataset

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_API_TOKEN });

const run = await client.actor('crawlerbros/instagram-follower-scraper').call({
  usernames: ['instagram'], // the Actor input is the first argument of call()
});

const datasetId = run.defaultDatasetId;

// Get the items as an array of already-parsed objects
const { items } = await client.dataset(datasetId).listItems({
  clean: true, // skip empty items and hidden fields (names starting with "#")
});

console.log(items);

HTTP: JSON directly over the API

Once you know datasetId, you can get JSON without any SDK:

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json&clean=true" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
  • clean=true skips empty items and hidden fields (those whose names start with #).
  • Response is a standard JSON array of objects.

Pulling results as CSV

CSV is handy when your destination is a spreadsheet, a legacy system, or a data warehouse ingestion job that expects flat files.

HTTP: dataset as CSV

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=csv&clean=true" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -o results.csv

You get a streaming CSV where each row is a dataset item.

Optional query parameters you’ll often use:

  • fields=field1,field2 – only include specific fields/columns.
  • delimiter=; – change delimiter if needed for regional settings.
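If you changed the delimiter, the consumer has to parse with the same one. Here is a minimal sketch using Python's csv module; parse_csv_export is a hypothetical helper, and the byte order mark handling is a precaution in case the export starts with one.

```python
import csv
import io


def parse_csv_export(text, delimiter=","):
    """Parse CSV text from the items endpoint into a list of dicts keyed by column name."""
    # Drop a leading UTF-8 byte order mark so the first header name stays clean.
    reader = csv.DictReader(io.StringIO(text.lstrip("\ufeff")), delimiter=delimiter)
    return list(reader)
```

All values come back as strings, so numeric columns still need converting before analysis.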

Python: download CSV file

import requests

API_TOKEN = "<YOUR_API_TOKEN>"
DATASET_ID = "<DATASET_ID>"

url = f"https://api.apify.com/v2/datasets/{DATASET_ID}/items"
params = {
    "format": "csv",
    "clean": "true",
}
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()

with open("results.csv", "wb") as f:
    f.write(response.content)

print("Saved CSV to results.csv")

JavaScript: pipe CSV to a file (Node.js)

import fs from 'node:fs';
import { pipeline } from 'node:stream/promises';
import fetch from 'node-fetch';

const API_TOKEN = process.env.APIFY_API_TOKEN;
const DATASET_ID = 'YOUR_DATASET_ID';

const url = new URL(`https://api.apify.com/v2/datasets/${DATASET_ID}/items`);
url.searchParams.set('format', 'csv');
url.searchParams.set('clean', 'true');

const res = await fetch(url, {
  headers: { Authorization: `Bearer ${API_TOKEN}` },
});

if (!res.ok) {
  throw new Error(`Failed to fetch CSV: ${res.status} ${res.statusText}`);
}

// Stream the response to disk; pipeline() rejects if either stream errors
await pipeline(res.body, fs.createWriteStream('results.csv'));

console.log('Saved CSV to results.csv');

Ideal Use Cases

  • Best for data pipelines and AI workflows: Because you can run Actors on a schedule, export datasets as JSON, and feed them straight into vector databases, RAG pipelines, or ETL tools using Apify’s Python/JS clients, HTTP API, or MCP clients.
  • Best for analysts and business teams: Because any Actor’s dataset can be exported as CSV/Excel and pulled into Google Sheets, BigQuery, or BI tools—no need to understand the Actor internals.

Limitations & Considerations

  • Run completion vs. timeouts: If your Actor takes longer than waitForFinish (capped at 60 seconds over HTTP) or your HTTP client's timeout, the run call returns while the run is still in progress and the dataset is incomplete. In that case, poll the run's status field or use webhooks instead of assuming it's done.
  • Dataset size and memory: Loading a large dataset into memory as a single JSON array can exhaust memory. Use SDK iterators (iterate_items() in Python), paginate listItems with offset and limit in JS, or stream the CSV to disk instead of doing a single giant fetch.
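For the raw HTTP route, pagination can be sketched with the offset and limit query parameters. The helper names and the page size of 1000 are illustrative assumptions; the loop stops when a page comes back shorter than requested.

```python
import json
import urllib.parse
import urllib.request


def make_fetcher(dataset_id, token):
    """Return a function that fetches one page of items as a list of dicts."""
    def fetch_page(offset, limit):
        query = urllib.parse.urlencode(
            {"format": "json", "clean": "true", "offset": offset, "limit": limit}
        )
        request = urllib.request.Request(
            f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request, timeout=60) as response:
            return json.load(response)
    return fetch_page


def iter_pages(fetch_page, page_size=1000):
    """Yield items page by page until the endpoint returns a short or empty page."""
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size
```

Usage would look like: for item in iter_pages(make_fetcher("DATASET_ID", token)): ... so that only one page is held in memory at a time.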

Pricing & Plans

Apify charges primarily based on Actor compute units (CUs) and, to a lesser extent, storage and data transfer. Running Actors via API/SDK and exporting JSON/CSV uses the same pricing model as running them from the Console.

  • Developer / Pay‑as‑you‑go: Best for individuals or small teams needing to trigger Actors from scripts, prototypes, or small production workloads without committing to enterprise contracts.
  • Business / Enterprise: Best for teams wiring Apify into critical systems and AI stacks, where you’ll want higher limits, SSO, dedicated support, SLAs (99.95% uptime), and compliance (SOC2, GDPR, CCPA).

For exact numbers and limits, check Apify’s current pricing page in the Console or marketing site.


Frequently Asked Questions

Can I trigger an Actor run and immediately get the JSON results in one call?

Short Answer: Yes, for short runs. Longer runs need the two‑step flow: start the run, then fetch the dataset.

Details:
With the standard runs endpoint, pass waitForFinish (up to 60 seconds) as a query parameter. If the run completes within that window, the response contains the final run object, from which you can get defaultDatasetId and then call the dataset’s JSON endpoint. For short runs, the API also documents a synchronous endpoint, POST /v2/acts/{actorId}/run-sync-get-dataset-items, which runs the Actor and returns the dataset items in the response body, provided the run finishes within 300 seconds. For anything longer, use the two‑step flow (run → dataset JSON) with polling or webhooks.

Do I need the SDK, or can I just use HTTP for everything?

Short Answer: You can absolutely use HTTP only; SDKs just make common patterns easier.

Details:
Everything you can do with Apify—running Actors, listing runs, fetching dataset items, exporting JSON/CSV—is available via the HTTP API. The Python and JavaScript ApifyClient SDKs wrap those endpoints with typed helpers, automatic pagination, and nicer ergonomics (iterate_items(), actor().call(), etc.). If you’re in a language without an official SDK, or you prefer a bare‑metal approach, just use the documented HTTP endpoints.


Summary

Running an Apify Actor via API/SDK and pulling results as JSON or CSV comes down to a simple, repeatable pattern:

  • Trigger a run with a JSON input payload from Python, JavaScript, or raw HTTP.
  • Wait for completion (or poll) and grab the defaultDatasetId.
  • Export results via the dataset’s /items endpoint as JSON (format=json) or CSV (format=csv), using either SDK helpers or direct HTTP calls.

Once you internalize that “Actors produce datasets” contract, it becomes trivial to wire Apify into AI pipelines, ETL jobs, and internal tools without babysitting scrapers, proxies, and unblocking logic yourself.

