Zyte Scrapy Cloud

The Manager for Your Robot Workers

What is Zyte?

Think of Zyte Scrapy Cloud as a manager for your robot workers.

When you write a web crawler (a "spider"), it's like building a robot that goes out to the internet to collect information for you. But running this robot on your own computer can be annoying: you have to keep your computer on, the internet might cut out, or your IP address might get blocked.

Zyte is a cloud platform where you can send your robots to live and work.

Offers

Free Forever Scrapy Cloud Unit

Zyte's free tier gives every developer one Scrapy Cloud unit, free forever, which is enough to deploy a project and run crawls in the cloud without paying anything.

Technical Examples for Developers

Here is how you actually use Zyte Scrapy Cloud with a standard Scrapy project.

1. Prerequisites

You will need shub, the Zyte command-line client that handles deployment and job scheduling:

pip install shub
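
Before deploying, authenticate once with your Zyte API key (found in your account settings in the dashboard); shub saves it locally and uses it for all later commands:

shub login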

2. Configuration (scrapinghub.yml)

In the root of your Scrapy project, create a scrapinghub.yml file. It tells shub which project on Zyte to deploy to.

First, create a project in the Zyte Dashboard and get your Project ID (e.g., 12345).

File: scrapinghub.yml

projects:
  default: 12345

stacks:
  default: scrapy:2.11  # Specify the Scrapy environment version

requirements:
  file: requirements.txt  # List extra Python dependencies here
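
The requirements file is an ordinary pip requirements list. As an illustration, if your spiders depended on BeautifulSoup and dateutil (hypothetical examples here, not requirements of Zyte itself), it might contain:

File: requirements.txt

beautifulsoup4>=4.12
python-dateutil>=2.8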

3. Deployment

Once your configuration is ready, you deploy your code to the cloud with a single command:

shub deploy

If it works, you will see output like:

Packing version 1.0
Deploying to Scrapy Cloud project "12345"
Run your spiders at: https://app.zyte.com/p/12345/
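
If you work with more than one Zyte project (say, a staging and a production copy), scrapinghub.yml can list several named targets, and shub deploy accepts the target name. The second ID below is a placeholder:

projects:
  default: 12345
  prod: 67890

shub deploy prod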

4. Running a Spider

You can run a spider from the web dashboard, or you can schedule it directly from your terminal using shub:

# shub schedule <spider_name>
shub schedule myspider
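
shub schedule can also pass spider arguments with -a and override settings with -s, mirroring what you would do locally. Here, category is a hypothetical argument that your own spider code would have to read:

shub schedule myspider -a category=books -s LOG_LEVEL=INFO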

5. Example Spider Code

Your code doesn't need to change much to run on the cloud. Here is a simple standard Scrapy spider:

File: myproject/spiders/quotes.py

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]

    def parse(self, response):
        # Extract the text and author from each quote block on the page
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }

        # Follow the "Next" link until pagination runs out
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
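
Before deploying, it's worth a quick local run to confirm the selectors work; the output file name is arbitrary:

scrapy crawl quotes -O quotes.json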

When this runs on Zyte Scrapy Cloud, every item you yield (text, author) is automatically captured and stored with the job, ready to browse in the dashboard or export as CSV or JSON.
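
You can also pull the scraped items straight from your terminal, addressing a job by its <project>/<spider>/<job> ID (the ID below is illustrative):

# shub items <project_id>/<spider_id>/<job_number>
shub items 12345/1/1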
