CLI
Command-line interface for ScrapingBee.
Installation
Recommended — install with uv (no virtual environment needed):
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install scrapingbee-cli
Alternative — install with pip in a virtual environment:
pip install scrapingbee-cli
Verify the installation:
scrapingbee --version
Authentication
Save your API key so all commands can use it automatically.
Interactive prompt (recommended for first-time setup):
scrapingbee auth
Non-interactive (CI/CD, scripts):
scrapingbee auth --api-key YOUR_API_KEY
Environment variable (alternative — no file stored):
export SCRAPINGBEE_API_KEY=YOUR_API_KEY
The CLI also reads .env files in the current directory.
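A minimal .env file in the working directory would look like this (the variable name comes from the environment-variable example above):

```shell
# .env in the current directory; the CLI picks this up automatically
SCRAPINGBEE_API_KEY=YOUR_API_KEY
```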
Show stored key location:
scrapingbee auth --show
Remove stored key:
scrapingbee logout
Credits and Plan
Check your API credit balance and plan concurrency:
scrapingbee usage
Example response:
{
"max_api_credit": 1000000,
"used_api_credit": 42150,
"max_concurrency": 100,
"current_concurrency": 0,
"renewal_subscription_date": "2025-07-26T04:57:13.580067"
}
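Remaining credits can be derived from the two credit fields. As a sketch with jq (an external tool, not part of the CLI), using the sample response above:

```shell
# Compute remaining credits from the usage fields shown above.
# In practice you would pipe `scrapingbee usage` into the same filter.
usage='{"max_api_credit":1000000,"used_api_credit":42150}'
echo "$usage" | jq '.max_api_credit - .used_api_credit'
# → 957850
```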
scrape
The scrape command calls the HTML API to fetch any web page. CLI flags map 1:1 to API parameters (underscores become hyphens: render_js → --render-js). For predefined values like sort orders, both hyphens and underscores are accepted (e.g. --sort-by price-low and --sort-by price_low both work).
See the HTML API documentation — every code snippet includes a CLI tab with the equivalent command.
Basic Usage
Scrape a page and print HTML to stdout:
scrapingbee scrape "https://example.com"
Save output to a file (extension auto-detected):
scrapingbee scrape "https://example.com" --output-file result
Scrape with JavaScript rendering and premium proxy:
scrapingbee scrape "https://example.com" --render-js true --premium-proxy true
Extract data with AI:
scrapingbee scrape "https://example.com" --ai-query "extract the main article title and author"
Return page as markdown (great for LLM pipelines):
scrapingbee scrape "https://example.com" --return-page-markdown true
Scrape Parameters
Scrape flags correspond directly to HTML API parameters. The table below groups them by category — click any parameter name to see full documentation on the HTML API page.
Rendering
| Flag | API Parameter | Description |
|---|---|---|
--render-js | render_js | Enable/disable JavaScript rendering |
--js-scenario | js_scenario | JavaScript scenario to execute |
--wait | wait | Wait time in ms before returning |
--wait-for | wait_for | CSS/XPath selector to wait for |
--wait-browser | wait_browser | Browser event to wait for |
--block-ads | block_ads | Block ads on the page |
--block-resources | block_resources | Block images and CSS |
--window-width | window_width | Viewport width in pixels |
--window-height | window_height | Viewport height in pixels |
Proxy
| Flag | API Parameter | Description |
|---|---|---|
--premium-proxy | premium_proxy | Use premium/residential proxies (25 credits with JS) |
--stealth-proxy | stealth_proxy | Use stealth proxies for hard-to-scrape sites (75 credits) |
--country-code | country_code | Proxy country code (ISO 3166-1) |
--own-proxy | own_proxy | Use your own proxy (user:pass@host:port) |
Headers
| Flag | API Parameter | Description |
|---|---|---|
-H / --header | Custom headers | Add custom headers (repeatable: -H "Key:Value") |
--forward-headers | forward_headers | Forward custom headers to target |
--forward-headers-pure | forward_headers_pure | Forward only custom headers |
Output Format
| Flag | API Parameter | Description |
|---|---|---|
--json-response | json_response | Wrap response in JSON |
--return-page-source | return_page_source | Return original HTML before JS rendering |
--return-page-markdown | return_page_markdown | Return content as markdown |
--return-page-text | return_page_text | Return content as plain text |
Screenshots
| Flag | API Parameter | Description |
|---|---|---|
--screenshot | screenshot | Capture a screenshot |
--screenshot-selector | screenshot_selector | CSS selector for screenshot area |
--screenshot-full-page | screenshot_full_page | Capture full-page screenshot |
Extraction
| Flag | API Parameter | Description |
|---|---|---|
--extract-rules | extract_rules | CSS/XPath extraction rules as JSON |
--ai-query | ai_query | Natural language extraction (+5 credits) |
--ai-selector | ai_selector | CSS selector to focus AI extraction |
--ai-extract-rules | ai_extract_rules | AI extraction rules as JSON (+5 credits) |
Request
| Flag | API Parameter | Description |
|---|---|---|
--session-id | session_id | Session ID for sticky IP (0-10000000) |
--timeout | timeout | Timeout in ms (1000-140000) |
--cookies | cookies | Custom cookies |
--device | device | Device type: desktop or mobile |
--custom-google | custom_google | Scrape Google domains (true/false). 15 credits per request. |
--transparent-status-code | transparent_status_code | Return target's status code and body as-is (true/false) |
-X / --method | HTTP method | GET, POST, or PUT |
-d / --data | Request body | Request body for POST/PUT |
Configuration
| Flag | API Parameter | Description |
|---|---|---|
--scraping-config | scraping_config | Apply a pre-saved scraping configuration by name |
Scraping Configurations
Use --scraping-config to apply a pre-saved configuration from your ScrapingBee dashboard. This lets you reuse commonly used settings without typing them each time.
scrapingbee scrape "https://example.com" --scraping-config "My-Config"
Inline options override configuration settings — so you can use a saved config as a base and customize individual parameters per request:
scrapingbee scrape "https://example.com" --scraping-config "My-Config" --premium-proxy false
Create and manage configurations in the ScrapingBee request builder. Configuration names are case-sensitive and only accept alphanumeric characters, hyphens, and underscores.
Presets
Presets apply a predefined set of options. They only set flags you haven't already set, so you can override any preset value.
| Preset | Description |
|---|---|
screenshot | Capture a viewport screenshot (enables --screenshot and --render-js) |
screenshot-and-html | Full-page screenshot + HTML in a single JSON response |
fetch | Fast fetch without JavaScript rendering (--render-js false) |
extract-links | Extract all <a href> links from the page as JSON |
extract-emails | Extract all mailto: links from the page |
extract-phones | Extract all tel: links from the page |
scroll-page | Infinite scroll with JS rendering (loads lazy content) |
scrapingbee scrape "https://example.com" --preset screenshot --output-file page
CLI-Only Scrape Flags
These flags are specific to the CLI and do not have API parameter equivalents.
Escalate Proxy
On 403 or 429 responses, automatically retry with premium proxy, then stealth proxy. Useful for sites with aggressive bot detection.
scrapingbee scrape "https://example.com" --escalate-proxy
Chunk Size
Split text/markdown output into chunks of N characters for LLM or vector DB pipelines. Outputs NDJSON (one JSON object per chunk). Set to 0 to disable.
scrapingbee scrape "https://example.com" --return-page-markdown true --chunk-size 2000 --chunk-overlap 200
Chunk Overlap
Number of overlapping characters between consecutive chunks. Only used when --chunk-size > 0.
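Because chunked output is NDJSON (one JSON object per line), it streams cleanly through line-oriented tools such as jq. The field name below is illustrative only, not the CLI's documented chunk schema:

```shell
# NDJSON: one JSON object per line. The "chunk" field name is illustrative only.
printf '%s\n' '{"chunk":"first part"}' '{"chunk":"second part"}' | jq -r '.chunk'
# → first part
# → second part
```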
Force Extension
Force the output file extension (e.g. html, json). Skips automatic extension inference when --output-file has no extension.
scrapingbee scrape "https://example.com" --output-file result --force-extension md
crawl
The crawl command follows links across pages using Scrapy under the hood. Three modes are available:
1. Quick crawl — start from URL(s), follow same-domain links:
scrapingbee crawl "https://example.com" --max-depth 2 --max-pages 50
2. Sitemap crawl — fetch all URLs from a sitemap:
scrapingbee crawl --from-sitemap "https://example.com/sitemap.xml" --max-pages 100
3. Project spider — run a Scrapy project spider with ScrapingBee middleware:
scrapingbee crawl my_spider --project ./my_scrapy_project
All scrape rendering, proxy, and extraction flags are also available for crawl (e.g. --render-js, --premium-proxy, --ai-query). Batch utility flags are also available: -H/--header, --retries, --backoff, --verbose, --output-file, --extract-field, --fields.
Quick Crawl
Start from one or more URLs and follow same-domain links. Each page is saved as a numbered file in the output directory, with a manifest.json mapping URLs to files.
scrapingbee crawl "https://docs.example.com" \
--max-depth 3 \
--max-pages 200 \
--return-page-markdown true \
--output-dir docs_crawl
Restrict crawling with URL patterns:
scrapingbee crawl "https://example.com" \
--include-pattern "/blog/" \
--exclude-pattern "/tag/" \
--max-pages 50
Save only specific pages while crawling the full site for link discovery:
scrapingbee crawl "https://example.com" \
--save-pattern "/product/" \
--ai-query "extract the product name and price" \
--max-pages 100
Sitemap Crawl
Fetch and parse a sitemap (including sitemap indexes) then crawl all discovered URLs:
scrapingbee crawl --from-sitemap "https://example.com/sitemap.xml" \
--return-page-markdown true \
--concurrency 20
Project Spider
Run any Scrapy spider from a project directory. ScrapingBee middleware and your API key are automatically injected:
scrapingbee crawl my_spider --project /path/to/scrapy/project --concurrency 10
The crawl command also supports --scraping-config to apply a pre-saved configuration from your dashboard. All scrape parameters (rendering, proxy, extraction) are passed to each page request.
Crawl Parameters
Target
The positional argument — one or more URLs to start crawling from. In project spider mode, this is the spider name instead of a URL.
scrapingbee crawl "https://example.com"
scrapingbee crawl "https://example.com" "https://blog.example.com"
scrapingbee crawl my_spider --project ./my_project
From Sitemap
Accepts a URL to a sitemap.xml file. The CLI fetches the sitemap (through the ScrapingBee API for proxy support), parses it (handling sitemap indexes recursively up to depth 2), and starts crawling all discovered page URLs.
scrapingbee crawl --from-sitemap "https://example.com/sitemap.xml"
Max Depth
Controls how many link-hops deep the crawler will follow from the start URLs. A depth of 0 means unlimited. Depth 1 means only pages directly linked from the start URLs.
scrapingbee crawl "https://example.com" --max-depth 2
Max Pages
Limits the total number of pages fetched from the ScrapingBee API. Each page costs API credits. A value of 0 means unlimited.
scrapingbee crawl "https://example.com" --max-pages 100
Save Pattern
When set, only pages whose URL matches this regex are saved to disk. All other pages are still visited for link discovery (using lightweight HTML-only requests) but their content is not saved. This lets you crawl an entire site for structure while only saving the pages you care about.
scrapingbee crawl "https://example.com" --save-pattern "/product/" --ai-query "extract product details"
Resume
When resuming a previous crawl, the CLI reads manifest.json in the output directory to skip already-crawled URLs and continue numbering files from where the previous run left off.
scrapingbee crawl "https://example.com" --output-dir my_crawl --resume
On Complete
Requires advanced features setup. This feature executes shell commands and is disabled by default.
Run a shell command after the crawl finishes. The command receives $SCRAPINGBEE_OUTPUT_DIR, $SCRAPINGBEE_SUCCEEDED, and $SCRAPINGBEE_FAILED environment variables.
scrapingbee crawl "https://example.com" --on-complete "echo 'Done! Files in $SCRAPINGBEE_OUTPUT_DIR'"
Project
Path to a Scrapy project directory for running project spiders. The CLI injects ScrapingBee middleware and your API key into the project's Scrapy settings automatically.
scrapingbee crawl my_spider --project /path/to/scrapy/project
Allowed Domains
Comma-separated list of domains the crawler is allowed to visit. By default, the crawler only follows links on the same domain as the start URL(s). Use this to explicitly whitelist additional domains.
scrapingbee crawl "https://example.com" --allowed-domains "example.com,blog.example.com"
Allow External Domains
Follow links to any domain, not just the start URL's domain. Use with caution — the crawl can expand rapidly. Combine with --max-pages to set a hard limit.
scrapingbee crawl "https://example.com" --allow-external-domains --max-pages 50
Include Pattern
A regex pattern that URLs must match to be followed. Only links whose full URL matches this pattern will be visited. Useful for restricting crawls to specific sections of a site.
scrapingbee crawl "https://example.com" --include-pattern "/docs/" --max-pages 100
Exclude Pattern
A regex pattern for URLs to skip. Links matching this pattern will not be followed, even if they match --include-pattern. Useful for avoiding pagination, tags, or other low-value pages.
scrapingbee crawl "https://example.com" --exclude-pattern "/tag/|/page/|/author/"
Download Delay
Delay in seconds between consecutive requests. Useful for being polite to the target server or avoiding rate limits. Accepts decimal values.
scrapingbee crawl "https://example.com" --download-delay 1.5
Autothrottle
Enable Scrapy's AutoThrottle extension, which automatically adjusts the download delay based on the server's response time and load. Recommended for large crawls where you don't want to overwhelm the target.
scrapingbee crawl "https://example.com" --autothrottle --max-pages 500
Output Directory
Folder where crawl results are saved. Each page is written as a numbered file with a manifest.json mapping URLs to files. Defaults to crawl_<timestamp>.
scrapingbee crawl "https://example.com" --output-dir my_crawl
Concurrency
Maximum number of concurrent requests. Set to 0 (default) to auto-detect from your plan's concurrency limit. Higher values speed up crawls but use more credits in parallel. The CLI caps concurrency at min(--concurrency, --max-pages) to prevent overshoot.
scrapingbee crawl "https://example.com" --concurrency 20 --max-pages 100
Batch Processing
The --input-file flag enables batch mode on scrape, google, and all other scraper commands. Instead of processing a single item, the CLI reads a file of URLs (or queries, ASINs, etc.) and processes them concurrently.
Input
Batch input supports .txt (one URL per line), .csv, and .tsv files. Use --input-column for CSV files:
# Text file (one URL per line)
scrapingbee scrape --input-file urls.txt
# CSV file with a "url" column
scrapingbee scrape --input-file sites.csv --input-column url
# Pipe from stdin
cat urls.txt | scrapingbee scrape --input-file -
Output
Results are saved as numbered files in the output directory (default: batch_<timestamp>):
scrapingbee scrape --input-file urls.txt --output-dir my_results
Alternative output formats:
# Single CSV file
scrapingbee google --input-file queries.txt --output-format csv --output-dir results
# NDJSON to stdout (great for piping)
scrapingbee scrape --input-file urls.txt --output-format ndjson | jq .title
Additional Options
Deduplication and Sampling
Clean up your input before spending credits. --deduplicate normalizes URLs (lowercases domains, strips fragments and trailing slashes) and removes duplicates. --sample picks N random items for testing your configuration before committing to a full run.
scrapingbee scrape --input-file urls.txt --deduplicate --sample 10
Post-Processing
Requires advanced features setup. This feature executes shell commands and is disabled by default.
Transform each result before it's written to disk by piping it through a shell command. The result body is sent to stdin, and the command's stdout replaces it. Works with any tool: jq for JSON filtering, sed for text manipulation, or custom scripts.
# Keep only the first 3 organic results from each Google search
scrapingbee google --input-file queries.txt --post-process "jq '.organic_results[:3]'"
# Extract just the title from each scraped page
scrapingbee scrape --input-file urls.txt --post-process "jq -r '.title // empty'"
Note: --post-process applies to files and ndjson output formats, but not to --update-csv.
Update CSV In-Place
Fetch fresh data for each row and add the results as new columns directly into the original CSV. The existing columns are preserved and new data is merged alongside them. Ideal for enriching datasets with live web data — prices, stock levels, ratings, or any extracted field.
scrapingbee scrape --input-file products.csv --input-column url \
--extract-rules '{"price":".price","title":"h1"}' \
--update-csv
The CLI reads the CSV, scrapes each URL in the specified column, flattens the JSON response, and writes the enriched CSV back. Nested JSON is automatically flattened to dot-notation columns (e.g. buybox.price).
Resume After Interruption
If a batch is interrupted (Ctrl+C, network issue, credit limit), re-run with --resume and the same --output-dir. The CLI scans existing output files and skips already-completed items, continuing from where it left off.
scrapingbee scrape --input-file urls.txt --output-dir my_batch --resume
Extract Specific Fields
Pull values from JSON responses using dot-path notation. The output is one value per line, ready to pipe into another command or save as a list. If the path traverses an array, values from every item are extracted.
# Extract all URLs from Google search results
scrapingbee google "best laptops 2025" --extract-field organic_results.url
# Extract product ASINs from Amazon search, then fetch each product
scrapingbee amazon-search "headphones" --extract-field products.asin > asins.txt
scrapingbee amazon-product --input-file asins.txt --output-dir products
If the path doesn't match any data, the CLI prints a warning with all available dot-paths to help you find the correct one.
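The dot-path traversal can be mimicked with jq, as a sketch of what --extract-field organic_results.url emits (illustration only, not the CLI's implementation):

```shell
# jq equivalent of --extract-field organic_results.url (illustration only):
# the path descends into the array and emits one value per line.
echo '{"organic_results":[{"url":"https://a.example"},{"url":"https://b.example"}]}' |
  jq -r '.organic_results[].url'
# → https://a.example
# → https://b.example
```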
Run a Command After Completion
Requires advanced features setup. This feature executes shell commands and is disabled by default.
Trigger a notification, sync results to a database, or start a downstream pipeline when the batch finishes. The command receives environment variables with the results summary.
scrapingbee scrape --input-file urls.txt --on-complete "echo 'Done: $SCRAPINGBEE_SUCCEEDED ok, $SCRAPINGBEE_FAILED failed'"
Batch Parameters
Input File
Path to the file containing one item per line. Supports .txt, .csv, and .tsv formats. Use - to read from stdin. For CSV/TSV files, combine with --input-column to specify which column contains the target values.
Output Format
Controls how batch results are written:
- files (default): One file per result in the output directory, with a manifest.json index.
- csv: All results merged into a single CSV file.
- ndjson: Newline-delimited JSON streamed to stdout (ideal for piping to jq or other tools).
Update CSV
When used with a CSV input file, fetches fresh data for each row and adds the results as new columns in the original CSV. Useful for enriching existing datasets.
scrapingbee scrape --input-file products.csv --input-column url \
--extract-rules '{"price":".price"}' --update-csv
Post Process
Requires advanced features setup.
Pipe each individual result through a shell command before writing to disk. The command receives the result body on stdin. Useful for filtering or transforming JSON.
scrapingbee google --input-file queries.txt --post-process "jq '.organic_results[:5]'"
Resume
When resuming a previous batch, the CLI scans the output directory for already-completed items and skips them. Numbering continues from the previous run.
On Complete
Requires advanced features setup.
Shell command to run after batch completion. Your script receives three environment variables: $SCRAPINGBEE_OUTPUT_DIR (path to the results folder), $SCRAPINGBEE_SUCCEEDED (number of successful requests), and $SCRAPINGBEE_FAILED (number of failed requests) — so it can process the output, trigger downstream workflows, or send alerts based on results.
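To see the shape of what a handler receives, this sketch simulates the three variables and runs a minimal command against them (values here are made up; the CLI sets the real ones):

```shell
# Simulate the variables the CLI sets, then run a minimal handler command.
export SCRAPINGBEE_OUTPUT_DIR=my_batch SCRAPINGBEE_SUCCEEDED=48 SCRAPINGBEE_FAILED=2
sh -c 'echo "Done: $SCRAPINGBEE_SUCCEEDED ok, $SCRAPINGBEE_FAILED failed (results in $SCRAPINGBEE_OUTPUT_DIR)"'
# → Done: 48 ok, 2 failed (results in my_batch)
```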
Input Column
For CSV/TSV input files, specifies which column contains the target values. Accepts a column name (from the header row) or a 0-based index. When omitted, the first column is used.
scrapingbee scrape --input-file sites.csv --input-column url
scrapingbee scrape --input-file data.csv --input-column 2
Output Directory
Folder where batch results are saved. Each result is written as a numbered file (e.g. 1.html, 2.json) with a manifest.json index mapping inputs to files. Defaults to batch_<timestamp>.
Output File
Write output to a specific file instead of stdout. For single-item commands (not batch), this saves the response directly. The file extension is auto-detected from the response type (HTML, JSON, PNG, etc.) unless you include one.
Concurrency
Maximum number of concurrent requests. Set to 0 (default) to auto-detect from your plan's concurrency limit via the usage API. Higher values speed up batch processing but use more credits simultaneously.
scrapingbee scrape --input-file urls.txt --concurrency 20
Deduplicate
Normalize URLs and remove duplicates from the input before processing. URL normalization lowercases the domain, strips fragments, and removes trailing slashes. Useful when your input file may contain duplicate or near-duplicate URLs.
scrapingbee scrape --input-file urls.txt --deduplicate
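The three documented normalization steps can be sketched in plain shell (this is an illustration of the effect, not the CLI's actual code):

```shell
# Sketch of the normalization --deduplicate applies: lowercase the domain,
# strip the #fragment, drop a trailing slash. Not the CLI's implementation.
normalize() {
  printf '%s\n' "$1" |
    sed -E 's|#.*$||; s|/$||' |
    awk -F/ 'BEGIN{OFS="/"} {$1=tolower($1); $3=tolower($3); print}'
}
normalize "https://Example.COM/Path/#section"
# → https://example.com/Path
```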
Sample
Process only N random items from the input file. Useful for testing your batch configuration on a subset before running the full job. Set to 0 (default) to process all items.
scrapingbee scrape --input-file urls.txt --sample 10 --output-dir test_run
No Progress
Suppress the per-item progress counter during batch processing. Useful when piping output or running in CI/CD where the progress updates would clutter logs.
Verbose
Show HTTP status code, credit cost, resolved URL, and other response headers for each request. In verbose mode, the CLI displays exact credit costs for SERP commands (e.g. Credit Cost: 10) based on the request parameters.
Extract Field
Extract values from JSON responses using a dot-path expression, outputting one value per line. Supports nested paths and automatically iterates over arrays. The output is newline-separated, making it ideal for piping into --input-file of another command.
scrapingbee google "pizza" --extract-field organic_results.url
scrapingbee amazon-search "laptop" --extract-field products.asin > asins.txt
If the path doesn't match any data, the CLI prints a warning with all available dot-paths to help you find the correct one.
Fields
Filter JSON output to include only the specified comma-separated top-level keys. Useful for reducing output size when you only need certain parts of the response.
scrapingbee google "test" --fields "organic_results,meta_data"
Retries
Number of retry attempts on transient errors (HTTP 5xx, connection errors). Default is 3. Each retry uses exponential backoff controlled by --backoff.
Backoff
Multiplier for exponential backoff between retries. Default is 2.0, meaning delays of 2s, 4s, 8s between retries. Lower values retry faster; higher values are gentler on the API.
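The documented 2s/4s/8s sequence suggests each delay is the previous one multiplied by the backoff factor; this loop just tabulates that inferred schedule for the defaults:

```shell
# Retry delays for --retries 3 --backoff 2: each delay is the previous
# one times the backoff factor (inferred from the 2s/4s/8s sequence above).
delay=2
for attempt in 1 2 3; do
  echo "retry $attempt: wait ${delay}s"
  delay=$((delay * 2))
done
# → retry 1: wait 2s
# → retry 2: wait 4s
# → retry 3: wait 8s
```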
export
The export command merges numbered output files from a batch or crawl into a single file. It reads manifest.json (if present) to annotate each record with its source URL.
Examples
# Merge to NDJSON (default)
scrapingbee export --input-dir batch_20250101_120000 --output-file all.ndjson
# Merge to plain text
scrapingbee export --input-dir crawl_20250101 --format txt --output-file pages.txt
# Merge to CSV with flattened nested JSON
scrapingbee export --input-dir serps/ --format csv --flatten --output-file results.csv
# CSV with specific columns only
scrapingbee export --input-dir serps/ --format csv --columns "title,url,price" --output-file filtered.csv
# Deduplicate CSV rows
scrapingbee export --input-dir batch/ --format csv --deduplicate --output-file unique.csv
Export Parameters
Format
- ndjson (default): One JSON object per line. If the source file is valid JSON, it's output as-is with an added _url field. Non-JSON files are wrapped as {"content": "...", "_url": "..."}.
- txt: Plain text output. Each file's content is separated by a blank line, prefixed with # URL when the manifest is available.
- csv: Flattens JSON files into tabular rows. JSON arrays inside each file are expanded into individual rows. Use --flatten for nested objects and --columns to select specific fields.
Flatten
In CSV mode, recursively flattens nested dictionaries to dot-notation column names. For example, {"buybox": {"price": 29.99}} becomes a column named buybox.price. Lists of dictionaries are indexed: buybox.0.price, buybox.1.price, etc.
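The flattening can be previewed with jq — an illustration of the resulting dot-notation keys, not the CLI's internal code:

```shell
# Preview dot-notation flattening with jq: every leaf path becomes a
# joined key (illustration; the CLI flattens internally during export).
echo '{"buybox":{"price":29.99},"title":"Widget"}' |
  jq -c '[leaf_paths as $p | {key: ($p | map(tostring) | join(".")), value: getpath($p)}] | from_entries'
# → {"buybox.price":29.99,"title":"Widget"}
```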
Input Directory
The batch or crawl output directory to read from. The export command looks for numbered files (e.g. 1.json, 2.html) and optionally reads manifest.json to annotate each record with its source URL.
Deduplicate
In CSV mode, remove duplicate rows from the output. Two rows are considered duplicates if all their column values are identical.
Columns
In CSV mode, include only the specified comma-separated column names. Rows missing all selected columns are dropped. Useful for extracting specific fields from large JSON responses.
scrapingbee export --input-dir results/ --format csv --columns "title,url,price" --output-file filtered.csv
Output File
Write the merged output to a file instead of stdout. The default outputs to stdout, which is useful for piping to other tools.
schedule
Requires advanced features setup. The schedule command executes shell commands via cron and is disabled by default.
The schedule command creates cron jobs to run any ScrapingBee CLI command at fixed intervals.
Creating a Schedule
# Monitor a price every 5 minutes
scrapingbee schedule --every 5m --name btc-price \
scrape "https://example.com/price" --extract-rules '{"price":".amount"}'
# Scrape news headlines every hour
scrapingbee schedule --every 1h --name news \
google "breaking news" --search-type news
# Daily crawl
scrapingbee schedule --every 1d --name daily-crawl \
crawl "https://example.com" --max-pages 50 --return-page-markdown true
Managing Schedules
# List all active schedules
scrapingbee schedule --list
# Stop a specific schedule
scrapingbee schedule --stop btc-price
# Stop all schedules
scrapingbee schedule --stop all
How It Works
The CLI uses your system's cron to run commands at the specified interval. Each schedule:
- Creates a cron entry tagged with the schedule name
- Logs output to ~/.config/scrapingbee-cli/logs/<name>.log
- Tracks metadata in ~/.config/scrapingbee-cli/schedules.json
Interval syntax: 5s (seconds, converted to minutes), 5m (minutes), 1h (hours), 2d (days). Minimum interval is 1 minute.
Schedule Parameters
Every
Duration string specifying how often to run the command. Uses cron under the hood:
- 5m → runs every 5 minutes (*/5 * * * *)
- 1h → runs every hour (0 */1 * * *)
- 2d → runs every 2 days (0 0 */2 * *)
Stop
Stop a schedule by name, removing its cron entry and registry record. Use --stop all to stop all active schedules.
scrapingbee schedule --stop btc-price
Name
A human-readable name for the schedule. Used to identify it in --list output and to stop it with --stop. If omitted, a name is auto-generated from the command arguments.
scrapingbee schedule --every 1h --name hourly-news google "breaking news"
List
Display all active schedules in a table showing the name, interval, how long each has been running, and the full command. Useful for checking what's scheduled before adding or removing jobs.
scrapingbee schedule --list
Pipelines
The real power of the CLI emerges when you chain commands together. Every command is designed to compose — output from one step feeds naturally into the next. This turns the CLI into a data pipeline engine where web scraping is just the first stage.
Scrape to LLM: Building a Knowledge Base
Large language models and RAG (Retrieval-Augmented Generation) systems need clean text. The CLI can crawl an entire documentation site and convert every page to markdown — ready for embedding and indexing in a vector database.
scrapingbee crawl "https://docs.example.com" \
--return-page-markdown true --max-pages 500 --output-dir knowledge_base
For single-page ingestion, use --chunk-size on the scrape command to split content into overlapping NDJSON chunks with metadata (URL, chunk index, total chunks, timestamp) — ready to pipe directly into an embedding API.
scrapingbee scrape "https://docs.example.com/guide" \
--return-page-markdown true --chunk-size 2000 --chunk-overlap 200
Unix Piping: Composing with Standard Tools
The CLI speaks stdin and stdout fluently. Use --input-file - to read from a pipe and --output-format ndjson to stream structured results — connecting ScrapingBee to the entire Unix ecosystem.
Extract titles from a list of URLs and filter with jq:
cat urls.txt | scrapingbee scrape --input-file - \
--output-format ndjson --extract-rules '{"title":"h1"}' | jq -r '.title'
Chain two ScrapingBee commands — search Google, then scrape the top results:
scrapingbee google "best python libraries 2025" \
--extract-field organic_results.url | scrapingbee scrape --input-file - \
--return-page-markdown true --output-dir articles
Data Enrichment: Augmenting Existing Datasets
Start with a CSV of products, competitors, or leads — and enrich it with live web data. The --update-csv flag adds scraped results as new columns directly into your existing file, preserving all original data.
scrapingbee scrape --input-file products.csv --input-column url \
--extract-rules '{"price":".price","stock":".availability"}' --update-csv
This is particularly powerful for monitoring workflows: run it on a schedule and your CSV accumulates fresh data over time. Use --extract-rules to target exactly the fields you need — keeping your dataset clean and focused.
ETL: Extract, Transform, Load
For larger datasets, the batch → export → transform pattern gives you full control over each stage. Scrape in parallel, merge the results, then reshape into exactly the format your downstream system needs.
scrapingbee amazon-search --input-file queries.txt --output-dir raw_results
scrapingbee export --input-dir raw_results --format csv --flatten --output-file products.csv
The --flatten flag recursively expands nested JSON into dot-notation columns (buybox.price, seller.0.name), turning deeply nested API responses into flat CSV rows that work in any spreadsheet or database.
Monitoring: Scheduled Data Collection
Requires advanced features setup.
Combine schedule with any pipeline to run it automatically. The CLI registers a cron job that executes your command at the specified interval, with output logged for debugging.
scrapingbee schedule --every 1h --name competitor-prices \
scrape --input-file competitors.csv --input-column url \
--extract-rules '{"price":".price"}' --update-csv
Each run appends fresh data. Use --on-complete to trigger a notification, sync to a database, or kick off a downstream analysis when a batch job finishes.
scrapingbee schedule --every 6h --name news-digest \
google --input-file queries.txt --output-dir news_results \
--on-complete "python analyze.py"
Save-Pattern Crawling: Surgical Data Extraction
Sometimes you need to crawl an entire site for navigation structure but only extract data from specific pages. The --save-pattern flag crawls all pages for link discovery (using lightweight HTML requests) but only applies your expensive extraction options to pages whose URLs match the pattern.
scrapingbee crawl "https://store.example.com" \
--save-pattern "/product/" --ai-query "extract product name, price, and reviews" \
--max-pages 500
This can dramatically reduce API credit usage on large sites where only a fraction of pages contain the data you need.
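A back-of-envelope calculation shows why. The per-request credit costs below are illustrative assumptions for the sake of arithmetic, not actual ScrapingBee pricing:

```python
# Hypothetical numbers: a 500-page store where only 60 pages match /product/
total_pages = 500      # pages crawled for link discovery
product_pages = 60     # pages matching --save-pattern "/product/"
plain_cost = 1         # assumed credits for a lightweight HTML request
ai_cost = 25           # assumed credits for a request with --ai-query

without_pattern = total_pages * ai_cost
with_pattern = (total_pages - product_pages) * plain_cost + product_pages * ai_cost

print(without_pattern, with_pattern)  # 12500 vs 1940
```

Under these assumptions the pattern cuts credit spend by more than 6x; the real ratio depends on your plan's per-feature costs and how selective your pattern is.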
Scraper API Commands
These commands wrap ScrapingBee's specialized scraper APIs. Full parameter documentation lives on each API's page — select the CLI tab for command-line examples.
| Command | API Page |
|---|---|
| scrapingbee google "query" | Google Search API → |
| scrapingbee fast-search "query" | Fast Search API → |
| scrapingbee amazon-product ASIN | Amazon Product API → |
| scrapingbee amazon-search "query" | Amazon Search API → |
| scrapingbee walmart-product ID | Walmart Product API → |
| scrapingbee walmart-search "query" | Walmart Search API → |
| scrapingbee youtube-search "query" | YouTube API → |
| scrapingbee youtube-metadata VIDEO_ID | YouTube API → |
| scrapingbee chatgpt "prompt" | ChatGPT API → |
All scraper commands support --input-file for batch processing and the same output flags (--output-file, --output-format, --extract-field, --fields).
Quick Examples
# Google search
scrapingbee google "web scraping best practices" --output-file results.json
# Fast search (lightweight, 1 credit per request)
scrapingbee fast-search "python web scraping"
# Amazon product
scrapingbee amazon-product B08N5WRWNW --output-file product.json
# Amazon search
scrapingbee amazon-search "wireless headphones" --sort-by bestsellers
# Walmart product
scrapingbee walmart-product 123456789 --output-file product.json
# Walmart search
scrapingbee walmart-search "gaming laptop" --sort-by price-low
# YouTube search
scrapingbee youtube-search "python tutorial" --upload-date this-week
# YouTube metadata
scrapingbee youtube-metadata dQw4w9WgXcQ --output-file video.json
# ChatGPT query
scrapingbee chatgpt "Summarize the latest AI news" --search true
Batch Examples
# Batch Google search
scrapingbee google --input-file queries.txt --output-format csv --output-dir serps
# Batch Amazon products
scrapingbee amazon-product --input-file asins.txt --output-dir products
Advanced Features
The --post-process, --on-complete, and schedule features execute arbitrary shell commands on your machine. To prevent accidental or unauthorized use, these are disabled by default and require explicit setup.
Why Are They Gated?
In AI agent environments, scraped web content could contain prompt injection attempts that trick an AI into constructing malicious shell commands. The exec gate ensures these features can only run when a human has deliberately enabled them.
How to Enable
Three conditions must be met before these features will run:
Step 1 — Set the environment variable:
export SCRAPINGBEE_ALLOW_EXEC=1
Add this to your ~/.bashrc or ~/.zshrc to persist across sessions.
Step 2 — Run the unsafe verification command:
scrapingbee auth --unsafe
This writes a verification flag to your config file (~/.config/scrapingbee-cli/.env).
Step 3 (optional) — Restrict allowed commands:
export SCRAPINGBEE_ALLOWED_COMMANDS="jq,head,python3 /path/to/transform.py"
Comma-separated list of allowed command prefixes. When set, only commands matching these prefixes can be executed by --post-process and --on-complete. If not set, any command is allowed once the first two conditions are met.
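The prefix check can be pictured as follows. This is an assumption about how matching behaves, not the CLI's actual source:

```python
def is_allowed(command, allowed_prefixes):
    """Allow a command only when it starts with a configured prefix;
    an empty/unset list means any command is allowed."""
    if not allowed_prefixes:
        return True
    return any(command.startswith(p.strip()) for p in allowed_prefixes)

prefixes = "jq,head,python3 /path/to/transform.py".split(",")
print(is_allowed("jq '.price'", prefixes))   # True
print(is_allowed("rm -rf /", prefixes))      # False
```

Because matching is by prefix, prefer specific entries (python3 /path/to/transform.py rather than just python3) to keep the allowlist tight.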
Status and Audit
# Check if advanced features are enabled
scrapingbee unsafe --list
# View recent shell command audit log
scrapingbee unsafe --audit
# View only the last N lines of the audit log
scrapingbee unsafe --audit --audit-lines 20
Disabling
To revoke advanced features:
scrapingbee logout
This removes both the API key and the unsafe verification flag. Alternatively, unset the environment variable (unset SCRAPINGBEE_ALLOW_EXEC).
To revoke only the unsafe flag while keeping your API key stored:
scrapingbee unsafe --disable
Utility Commands
unsafe
Manage advanced features status and view the shell command audit log:
scrapingbee unsafe --list # Check status
scrapingbee unsafe --audit # View audit log
docs
Print or open the ScrapingBee documentation URL:
scrapingbee docs # Print the URL
scrapingbee docs --open # Open in browser
Version
scrapingbee --version
usage
Check your API credit balance and plan concurrency. See Credits and Plan for details.
scrapingbee usage
auth
Save or display your API key. See Authentication for details.
scrapingbee auth
logout
Remove stored API key and unsafe verification flag. See Authentication for details.
scrapingbee logout
Help
Get help for any command:
scrapingbee --help
scrapingbee scrape --help
scrapingbee crawl --help