Scrapy Cloud: Build Production-Ready Web Scrapers in 30 Minutes

12 January 2026 | 13 min read

Scrapy Cloud eliminates the need to manage your own servers while providing enterprise-grade infrastructure for your web scraping projects. If you’ve been running Scrapy spiders locally and dealing with server maintenance, uptime monitoring, and scaling challenges, you’re about to discover a much simpler approach. The Scrapy Cloud platform transforms how developers deploy and manage their scrapers, giving you everything from automated scheduling to real-time monitoring in one unified dashboard.

In this guide, you’ll learn how to move your existing spiders to the cloud, set up automated runs, and monitor performance without touching a single server configuration file. This isn’t just about convenience; it’s about building production-ready systems that scale with your data extraction needs.

Quick Answer (TL;DR)

Deploy any Scrapy spider to production in minutes: pip install shub, shub login, shub deploy, then run from the Zyte dashboard. Your spider runs in isolated containers with automatic scaling and monitoring. For advanced developers, the Scrapy framework ensures flexibility and control when building robust crawlers.

What Is Scrapy Cloud and Why Use It?

Scrapy Cloud is Zyte’s managed hosting solution that runs your Scrapy spiders in the cloud without requiring you to maintain servers or infrastructure. Think of it as Heroku for web scraping: you focus on writing spiders while the platform handles deployment, scaling, and monitoring.

The Scrapy Cloud platform provides enterprise-grade infrastructure built specifically for web scraping operations.

Among all web scraping tools Python developers use, Scrapy Cloud stands out for its managed infrastructure approach. Instead of worrying about server provisioning, Docker configurations, or monitoring setups, you get a complete solution that’s ready for production workloads. The platform runs each spider in isolated containers with 1GB RAM and 2.5GB disk space, automatically handling resource allocation and job queuing.

The real advantage becomes clear when you’re dealing with multiple projects or team collaboration. Traditional local deployments require everyone to maintain identical environments, manage dependencies, and coordinate deployments. Scrapy Cloud centralizes everything through a web interface where team members can deploy, schedule, and monitor spiders regardless of their local setup.

What makes this particularly valuable is the integration with Zyte’s broader ecosystem. Your spiders can leverage Smart Proxy Manager for IP rotation, AI-powered extraction tools, and headless browser capabilities when dealing with JavaScript-heavy sites. This creates a complete data extraction pipeline that would take months to build and maintain independently. You can even integrate with data QA tools to ensure your output remains consistent and accurate across crawls.

If you’re evaluating Scrapy Cloud alternatives, consider whether you need managed hosting or prefer the open-source framework's freedom to host on your own servers. While alternatives exist, Scrapy Cloud offers a full suite of enterprise-grade features and easy integration with the Zyte API for scalable automation.

Zyte Scrapy Cloud vs Local Deployment

Running spiders locally means you’re responsible for everything: server uptime, dependency management, log monitoring, and scaling when traffic increases. Every time your laptop goes to sleep or your internet connection drops, your scraping jobs stop. The Scrapy Cloud platform eliminates these reliability issues by providing dedicated infrastructure that runs 24/7.

Local deployments also struggle with scheduling and automation. You might set up cron jobs or task schedulers, but managing multiple spiders across different schedules quickly becomes complex. Scrapy Cloud includes built-in scheduling with a visual interface, so there’s no more debugging cron syntax or dealing with timezone issues. The platform runs your scheduled spiders automatically, without manual triggers, which simplifies maintenance.

The monitoring difference is significant, too. Local spiders require you to implement logging, error tracking, and performance monitoring from scratch. Scrapy Cloud provides real-time dashboards showing job status, runtime statistics, and error reports. When something breaks at 3 AM, you get alerts instead of discovering failed jobs hours later.

Scaling represents the biggest operational challenge with local deployments. Adding more concurrent jobs means provisioning additional servers, configuring load balancing, and managing resource allocation. The Scrapy Cloud platform handles scaling automatically: you simply adjust the number of units and the infrastructure adapts accordingly.

Scrapy Cloud Free Plan Limitations

The free tier gives you unlimited projects and team members, but restricts crawl time to one hour per job with seven-day data retention. This works well for testing and small-scale projects, but becomes limiting for production workloads that need longer runtime or historical data access.

Concurrent job execution is another free-plan restriction. You can run only one spider at a time, which creates bottlenecks when managing multiple data sources or time-sensitive scraping operations. The free plan also lacks priority queue management, so all jobs run in first-come, first-served order.

For developers evaluating different web scraping tools, the free plan provides enough functionality to test integration and deployment workflows. You can validate that your spiders work correctly in the cloud environment before committing to a paid Scrapy Cloud plan.

Scrapy Cloud Pricing Overview

The Professional plan starts at $9 per unit per month, where each unit provides 1GB RAM and 2.5GB disk space. This pricing model scales with your actual resource usage rather than charging for features you don’t need. Units can be added or removed based on current demand, making it cost-effective for variable workloads.

Scrapy Cloud pricing becomes competitive when you factor in the operational overhead of self-hosting. The Professional plan includes unlimited crawl time, 120-day data retention, concurrent job execution, and priority queue management. When you calculate server costs, monitoring tools, and maintenance time, the managed solution often costs less than DIY approaches.

Enterprise customers with high-volume requirements can negotiate custom pricing that includes dedicated resources, SLA guarantees, and premium support. The pay-as-you-go model means you’re not locked into annual contracts or paying for unused capacity during slower periods.

Step-by-Step Setup in Under 30 Minutes

The deployment process follows a straightforward workflow: create your Zyte account, install the command-line tools, prepare your spider, deploy to the cloud, and run from the dashboard. Each step takes just a few minutes, and you’ll have a production-ready scraper running before your coffee gets cold.

1. Create a Zyte Account and Project

Visit zyte.com and sign up for a free account using your email address. The registration process takes about two minutes and doesn’t require credit card information for the free tier. Once logged in, you’ll see the Scrapy Cloud dashboard where all your projects and spiders will be managed.

Now, create your first project by clicking “New Project” and giving it a descriptive name. The project acts as a container for related spiders – you might have separate projects for different clients or data sources. Each Scrapy Cloud project gets a unique identifier that you’ll use during deployment.

The dashboard provides an overview of all your projects, recent job activity, and resource usage. This becomes your central command center for monitoring and managing all scraping operations across different projects and team members.

2. Install shub and Authenticate with an API Key

The shub client is Scrapy Cloud’s command-line tool for deploying and managing spiders. Install it using pip:

pip install shub

This lightweight tool handles authentication, deployment, and basic project management from your terminal.

Get your API key from the Scrapy Cloud dashboard by clicking on your profile and selecting “API Keys”. Copy the key and run shub login in your terminal, then paste the key when prompted.

This establishes the connection between your local development environment and your cloud projects. Advanced users can also supply the API key non-interactively for scripted or CI-based authentication.

The authentication persists across terminal sessions, so you only need to log in once per machine. If you’re working across multiple environments or team members, each person needs their own API key for security and audit purposes. For more advanced HTTP operations, check out our Python Requests guide for handling authentication in your spiders.
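
For those scripted setups, a rough sketch of the two options looks like this (recent shub versions read the SHUB_APIKEY environment variable, but confirm against the shub docs for your version):

# One-time interactive login; shub stores the key in ~/.scrapinghub.yml
shub login

# Alternative for CI pipelines: let shub pick up the key from an
# environment variable instead of prompting
export SHUB_APIKEY=<your-api-key>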

3. Clone and Prepare the Booksbot Demo Spider

Start with a working example to understand the deployment process. Clone the sample repository using git clone https://github.com/zytedata/booksbot.git.

Alternatively, start from scratch by creating a new Scrapy project with scrapy startproject booksbot. Either way, you end up with the standard Scrapy directory structure containing spiders, items, and settings files. Keeping the project in a GitHub repository makes version control and collaboration easier.

Here’s a simple spider that scrapes book information:

import scrapy

class BooksSpider(scrapy.Spider):
    name = 'books'
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Each book on the listing page is wrapped in an <article> element
        for book in response.css('article.product_pod'):
            yield {
                'title': book.css('h3 a::attr(title)').get(),
                'price': book.css('p.price_color::text').get(),
                # The rating is encoded in the class name, e.g. "star-rating Three"
                'rating': book.css('p.star-rating::attr(class)').re_first(r'star-rating (\w+)'),
            }

        # Follow the "next" link until the last page is reached
        next_page = response.css('li.next a::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)

This spider demonstrates pagination handling and structured data extraction – common patterns you’ll use in production scrapers.
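
Before deploying, it’s worth running the spider locally to confirm the selectors still match the site:

# Crawl the demo site and write the extracted items to a JSON file
scrapy crawl books -o books.json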

If you encounter tricky configurations, Scrapy’s documentation and community offer detailed support for every stage of your deployment.

4. Deploy Your Spider Using shub deploy

Navigate to your project directory and run shub deploy [project_id]. The project ID comes from your Scrapy Cloud dashboard URL. The deployment process packages your spider code, uploads it to the cloud, and makes it available for execution.

You’ll see output showing the upload progress and deployment confirmation. The entire process takes 30-60 seconds, depending on your project size and internet connection. Once deployed, your spider appears in the Scrapy Cloud dashboard ready for execution.

The deployment creates a snapshot of your current code, so you can continue local development without affecting the cloud version. When you’re ready to update the production spider, simply run shub deploy again to create a new version.
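
On the first deploy, shub asks for the target project ID and records it in a scrapinghub.yml file at the project root, so later deploys need no arguments. A minimal sketch, with 12345 standing in for your own project ID:

# scrapinghub.yml -- created at the project root after the first deploy
project: 12345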

Learn more about Web Scraping with Python for deeper insights into scraping logic and best practices.

5. Run Your Spider from the Dashboard

In the Scrapy Cloud dashboard, navigate to your project and click on your deployed spider. The “Run” button starts a new job with default settings. You can also specify custom arguments, priority levels, and resource allocation before starting the job.

The job begins immediately, and you can monitor progress in real-time. The dashboard shows current status, runtime duration, items scraped, and any errors encountered. Logs stream live, giving you the same visibility you’d have running locally but with better formatting and search capabilities.
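
If you prefer the terminal, shub can also start a job with its schedule command (despite the name, the job is queued to run right away):

# Start a job for the "books" spider in the project from scrapinghub.yml
shub schedule books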

Scheduling and Monitoring Your Spiders

Production scraping requires reliable automation and monitoring. Scrapy Cloud provides built-in scheduling that eliminates cron job complexity while offering detailed monitoring that surpasses most custom solutions. The platform handles timezone management, retry logic, and failure notifications automatically.

Using Periodic Jobs for Automation

Set up recurring jobs through the dashboard’s “Periodic Jobs” section. You can schedule spiders to run hourly, daily, weekly, or on custom intervals using cron-like syntax. The visual scheduler helps you avoid common cron mistakes while providing timezone-aware execution. This simplifies scheduling jobs without needing external task runners.

Each periodic job can include custom arguments, priority settings, and resource allocation. This means your daily full-site crawl can use more units than your hourly update check, optimizing both performance and costs. The scheduler automatically handles daylight saving time transitions and leap year adjustments.

Failed periodic jobs trigger automatic retries based on your configuration. You can set retry limits, backoff intervals, and failure notification thresholds to match your data freshness requirements and operational preferences.

Job Priority and Queue Management

The priority system ensures critical scraping jobs run before routine maintenance tasks. Set high priority for time-sensitive data extraction and normal priority for bulk operations. The queue management prevents resource contention while maintaining fair scheduling across different projects.

Priority levels range from lowest to highest, with the system processing higher priority jobs first when resources become available. This prevents a long-running bulk scraper from blocking urgent data updates that business operations depend on.

Accessing Logs and Runtime Stats

Real-time logs provide detailed insight into spider execution, including request/response cycles, item extraction, and error conditions. The log viewer includes search and filtering capabilities that make debugging much easier than parsing local log files.

Runtime statistics show memory usage, request rates, response times, and success ratios. These metrics help identify performance bottlenecks and optimization opportunities. Historical data lets you track performance trends and capacity planning over time.
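
You can also pull logs from the terminal with shub, using the project/spider/job ID shown in the dashboard (the ID below is illustrative):

# Fetch the log of job 8 of spider 1 in project 12345
shub log 12345/1/8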

Exporting Data in CSV, JSON, or XML

Scraped data exports in multiple formats directly from the dashboard. Click the download button on any completed job to get CSV, JSON, or XML files containing all extracted items. Large datasets can be downloaded in compressed formats to reduce transfer time.

The export system handles character encoding automatically and provides consistent formatting across different data types.
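
For pipelines that pull data automatically rather than through the dashboard, shub can download a job’s items as JSON lines (again, the job ID is illustrative):

# Save every item scraped by job 12345/1/8 as JSON lines
shub items 12345/1/8 > books.jl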

For automated data processing, the ScrapingBee API offers similar export capabilities with additional proxy management and anti-bot features for complex scraping scenarios.

Tips for Production-Ready Scraping

Moving from development to production requires attention to reliability, monitoring, and scale. The Scrapy Cloud platform provides the infrastructure foundation, but your spider design and configuration determine long-term success. Focus on error handling, resource optimization, and monitoring integration to build systems that run reliably for months without intervention.

Using Proxies with Scrapy Cloud

Configure Zyte Smart Proxy Manager directly in your spider settings for automatic IP rotation and geo-targeting. The integration requires minimal code changes; just add the proxy middleware and authentication credentials. Smart Proxy Manager handles proxy selection, rotation timing, and failure recovery automatically.
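
A minimal settings sketch using the scrapy-zyte-smartproxy middleware package; the exact setting names can vary between package versions, so confirm them against that package’s documentation:

# settings.py -- route requests through Smart Proxy Manager
DOWNLOADER_MIDDLEWARES = {
    'scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware': 610,
}
ZYTE_SMARTPROXY_ENABLED = True
ZYTE_SMARTPROXY_APIKEY = '<your-smart-proxy-manager-api-key>'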

For external proxy services, configure them through Scrapy’s standard proxy settings. The containerized environment supports most proxy authentication methods, including username/password and IP whitelisting. Test proxy configurations thoroughly in the free tier before scaling to production volumes.

Integrating Spidermon for Error Tracking

Spidermon provides automated monitoring and alerting for production spiders. Configure it to send notifications when error rates exceed thresholds, data quality drops, or jobs fail completely. The integration works seamlessly with Scrapy Cloud’s monitoring infrastructure.

Set up data validation rules that check extracted items for completeness and accuracy. Spidermon can automatically pause spiders when data quality issues are detected, preventing bad data from entering your downstream systems.
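
As a rough sketch, a Spidermon setup that flags jobs returning too few items might look like this (module paths and setting names follow the Spidermon docs, but double-check them for your version; the 100-item threshold is arbitrary):

# settings.py
SPIDERMON_ENABLED = True
EXTENSIONS = {
    'spidermon.contrib.scrapy.extensions.Spidermon': 500,
}
SPIDERMON_SPIDER_CLOSE_MONITORS = (
    'booksbot.monitors.SpiderCloseMonitorSuite',
)

# booksbot/monitors.py
from spidermon import Monitor, MonitorSuite, monitors

@monitors.name('Item count')
class ItemCountMonitor(Monitor):
    @monitors.name('Minimum items extracted')
    def test_minimum_items_extracted(self):
        # Fail the monitor if the run produced suspiciously few items
        items = getattr(self.data.stats, 'item_scraped_count', 0)
        self.assertGreaterEqual(items, 100)

class SpiderCloseMonitorSuite(MonitorSuite):
    monitors = [ItemCountMonitor]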

Scaling with Multiple Spiders and Units

Horizontal scaling means running multiple spider instances simultaneously instead of trying to make individual spiders faster. The Scrapy Cloud platform simplifies this: just increase your unit allocation and run multiple jobs concurrently.

Design spiders to handle overlapping URLs smoothly using Scrapy’s built-in duplicate filtering. This enables multiple instances to work on different parts of large sites without conflicts or duplicated data extraction. The system gives you full control over your crawlers, with the ability to retain performance data and optimize configurations over time.
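
One common pattern is to split a large crawl by giving each concurrent job a different spider argument, set through the dashboard’s job arguments or shub. The spider and category names below are illustrative, not part of the booksbot demo:

import scrapy

class CategoryBooksSpider(scrapy.Spider):
    name = 'category_books'

    def __init__(self, category=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Each job crawls only the category it was given,
        # e.g. -a category=travel_2 or -a category=mystery_3
        self.start_urls = [
            f'http://books.toscrape.com/catalogue/category/books/{category}/index.html'
        ]

    def parse(self, response):
        for book in response.css('article.product_pod'):
            yield {'title': book.css('h3 a::attr(title)').get()}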

Avoiding Common Deployment Pitfalls

Dependency mismatches cause the most deployment failures. Use requirements.txt files to specify exact package versions and test deployments in the free tier before moving to production. The containerized environment may have different package versions than your local development setup.
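
shub can install your pinned packages during deployment when scrapinghub.yml points at a requirements file; a sketch looks like this (check the shub docs for the exact schema your stack supports):

# scrapinghub.yml
project: 12345
requirements:
  file: requirements.txt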

Keep your scrapy.cfg file updated with correct project settings and ensure all custom modules are included in your project directory. Missing imports that work locally often fail in the cloud environment due to different Python path configurations.

Build, Scale, and Monitor Your Scrapers with Confidence

Scrapy Cloud transforms web scraping from a local development activity into a production-ready operation. You’ve learned how to deploy spiders in minutes, set up automated scheduling, and monitor performance through professional dashboards. The managed infrastructure eliminates server maintenance while providing enterprise-grade reliability and scaling capabilities.

The platform’s integration with the broader Zyte ecosystem means your scraping operations can grow from simple data extraction to sophisticated AI-powered workflows. Whether you’re handling a few hundred pages or millions of URLs, the pay-as-you-go model ensures you only pay for resources you actually use.

For teams requiring even simpler scaling solutions with built-in proxy management and anti-bot handling, consider the ScrapingBee API as a complementary tool that handles the most challenging aspects of modern web scraping automatically.

Frequently Asked Questions (FAQs)

Is Scrapy Cloud suitable for beginners in web scraping?

Yes, if you know basic Scrapy development. The deployment process is straightforward, but you need existing Python and Scrapy knowledge to write effective spiders.

How does Scrapy Cloud compare to running spiders locally?

Scrapy Cloud provides managed infrastructure, automated scheduling, professional monitoring, and team collaboration features that local deployments lack. It eliminates server maintenance overhead.

What are the limitations of the Scrapy Cloud free plan?

Free accounts get one-hour job runtime limits, seven-day data retention, and single concurrent job execution. Professional plans remove these restrictions for production use.

Can I schedule automated runs for my spiders on Scrapy Cloud?

Yes, the platform includes visual scheduling tools for periodic jobs with timezone support, retry logic, and failure notifications built in.

How can I optimize my scrapers for production use on Scrapy Cloud?

Focus on error handling, proxy configuration, monitoring integration, and horizontal scaling. Use Spidermon for automated alerts and design spiders for concurrent execution.

Kevin Sahin

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.