Airbnb web scraping sounds scary at first, but it's actually pretty chill once you know what you're doing. In this guide, we'll walk through a real, working way to scrape Airbnb listings using ScrapingBee: no guesswork, no hacks, no magic steps.
This is a practical, code-first tutorial. We'll start from a real Airbnb search results page and show how to extract structured listing data you can actually use. Descriptions, prices, ratings, and all the usual stuff you care about. We'll use Python, keep the setup simple, and focus on getting clean JSON output at the end. The same approach can be reused for other Airbnb searches with minimal changes, so once you get it, you're set.
If you're a dev who wants working code instead of screenshots, you're in the right place!

Quick answer (TL;DR)
If you just want it working, run the full script below. It fetches an Airbnb search results page with ScrapingBee, waits for real listings to load, parses listing cards with BeautifulSoup, and saves the output to a JSON file.
To customize it, change the place_id, checkin, checkout, and adults values (you can add more parameters as well, for example the number of children in the search). Also tweak the offset value to control pagination.
import os
import re
import json
from urllib.parse import urlencode

from bs4 import BeautifulSoup
from scrapingbee import ScrapingBeeClient

# --- Small helpers -----------------------------------------------------------

def extract_text_clean(tag):
    """Get visible text from a tag, removing aria-hidden junk."""
    if not tag:
        return None
    for hidden in tag.find_all(attrs={"aria-hidden": "true"}):
        hidden.decompose()
    return tag.get_text(" ", strip=True)

def save_to_json(listings, filename="airbnb_results.json"):
    """Save scraped results to a JSON file (UTF-8, pretty printed)."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(listings, f, ensure_ascii=False, indent=2)

# --- Extraction logic --------------------------------------------------------

def extract_rating_and_reviews(container):
    """
    Airbnb often includes an accessibility string like:
    "4.99 out of 5 average rating, 154 reviews"
    """
    if not container:
        return (None, None)
    rating_span = container.find(
        "span",
        string=lambda s: isinstance(s, str) and "out of 5 average rating" in s,
    )
    if not rating_span:
        return (None, None)
    text = rating_span.get_text(" ", strip=True)
    rating_match = re.search(r"([0-9]+(?:\.[0-9]+)?)\s+out of 5", text)
    reviews_match = re.search(r"(\d[\d,]*)\s+reviews", text)
    rating = float(rating_match.group(1)) if rating_match else None
    reviews = int(reviews_match.group(1).replace(",", "")) if reviews_match else None
    return (rating, reviews)

_PRICE_LABEL_RE = re.compile(
    r"\bfor\s+\d+\s+nights?\b", re.IGNORECASE
)

def extract_price(container):
    """
    Tries to grab the most likely price summary from aria-labels.
    Assumes English-ish labels like:
        "€98 for 2 nights"
        "$106 for 2 nights, originally $124"
    """
    if not container:
        return None
    # 1) collect candidate aria-labels
    candidates = []
    for tag in container.find_all(["span", "div", "button"], attrs={"aria-label": True}):
        label = tag.get("aria-label")
        if not isinstance(label, str):
            continue
        label = label.replace("\xa0", " ").strip()
        if not label:
            continue
        if _PRICE_LABEL_RE.search(label):
            candidates.append(label)
    if not candidates:
        return None
    # 2) prefer the "clean" summary:
    #    - shorter tends to be the main price, not some long combined accessibility blob
    #    - but still keep discount strings when present (they're still valid price summaries)
    candidates.sort(key=len)
    return candidates[0]

# --- URL + parsing -----------------------------------------------------------

def build_airbnb_search_url(place_id, checkin, checkout, adults=2, offset=0):
    """Build an Airbnb search URL with date + guest params."""
    base_url = "https://www.airbnb.com/s/homes"
    params = {
        "refinement_paths[]": "/homes",
        "place_id": place_id,
        "checkin": checkin,
        "checkout": checkout,
        "adults": adults,
        "items_offset": offset,
    }
    return f"{base_url}?{urlencode(params)}"

def parse_listing_card(card):
    """
    Parse a single listing card (schema.org ListItem).
    We combine stable schema meta tags (name/url/position) with visible UI bits.
    """
    item = {
        "position": None,
        "name": None,
        "url": None,
        "card_title": None,
        "subtitles": [],
        "price": None,
        "rating": None,
        "reviews": None,
    }

    # Schema meta tags (usually the most stable)
    name_meta = card.find("meta", itemprop="name")
    url_meta = card.find("meta", itemprop="url")
    pos_meta = card.find("meta", itemprop="position")

    if pos_meta and pos_meta.get("content"):
        try:
            item["position"] = int(pos_meta["content"])
        except ValueError:
            item["position"] = None

    if name_meta and name_meta.get("content"):
        item["name"] = name_meta["content"]

    # You might want to check the `url_meta["content"]` to make sure it's not
    # an absolute URL (if it is, no need to add https://... prefix)
    if url_meta and url_meta.get("content"):
        item["url"] = "https://www.airbnb.com" + url_meta["content"]

    # The actual card UI container (testid selectors are somewhat stable, but not guaranteed forever)
    container = card.find("div", attrs={"data-testid": "card-container"})
    if not container:
        return item

    # Title + subtitles
    title_tag = container.find("div", attrs={"data-testid": "listing-card-title"})
    item["card_title"] = extract_text_clean(title_tag)

    subtitle_tags = container.find_all("div", attrs={"data-testid": "listing-card-subtitle"})
    seen = set()
    for tag in subtitle_tags:
        text = extract_text_clean(tag)
        if text and text not in seen:
            seen.add(text)
            item["subtitles"].append(text)

    # Price + rating/reviews
    item["price"] = extract_price(container)
    item["rating"], item["reviews"] = extract_rating_and_reviews(container)

    return item

def parse_listings(html):
    """Find all schema.org listing cards and parse them."""
    soup = BeautifulSoup(html, "lxml")
    cards = soup.find_all(
        "div",
        attrs={
            "itemprop": "itemListElement",
            "itemtype": re.compile(r"schema\.org/ListItem$", re.I),
        },
    )
    print("Found listings:", len(cards))
    return [parse_listing_card(card) for card in cards]

# --- Main -------------------------------------------------------------------

def main():
    api_key = os.environ.get("SCRAPINGBEE_API_KEY")
    if not api_key:
        raise RuntimeError("SCRAPINGBEE_API_KEY is not set")
    client = ScrapingBeeClient(api_key=api_key)

    url = build_airbnb_search_url(
        place_id="ChIJ7T0H5bDP7kYRMP7yaM3PAAQ",
        checkin="2026-01-25",
        checkout="2026-01-27",
        adults=2,
        offset=0,
    )
    print("Requesting URL:\n", url)

    response = client.get(
        url,
        params={
            # Airbnb is very anti-bot. "stealth_proxy" is doing the heavy lifting here.
            "stealth_proxy": True,
            "render_js": True,
            "wait_browser": "load",
            "wait_for": 'div[itemprop="itemListElement"]',
            "window_width": 1365,
            "window_height": 900,
        },
    )

    # Fail fast if anything smells off
    response.raise_for_status()
    html = response.text
    print("Status code:", response.status_code)
    print("Response size:", len(response.content), "bytes")

    # Cheap sanity checks: are the elements we rely on present?
    print("itemListElement count:", html.count('itemprop="itemListElement"'))
    print("card-container count:", html.count('data-testid="card-container"'))
    print("listing-card-title count:", html.count('data-testid="listing-card-title"'))
    print("listing-card-subtitle count:", html.count('data-testid="listing-card-subtitle"'))

    listings = parse_listings(html)
    print("Parsed listings:", len(listings))

    # Save results for later processing / debugging
    save_to_json(listings, filename="airbnb_results.json")
    print("Saved to:", "airbnb_results.json")

    # Print a small sample so your terminal doesn't explode
    for item in listings[:3]:
        print(item)

if __name__ == "__main__":
    main()
Now let's dive deeper into all the dirty details.
Check out ScrapingBee's dedicated Airbnb Scraper API that does all the heavy lifting for you.
Choosing the right tools for Airbnb scraping in 2026
If you want Airbnb web scraping to work reliably in 2026, you need a stack that can handle dynamic pages, frequent layout changes, and aggressive anti-bot behavior. The good news is you don't need anything exotic.
We'll keep things pretty simple. The setup in this guide uses Python 3, ScrapingBee's Python client, and classic HTML parsing tools. This combo works well for scraping live Airbnb search pages and stays readable when you come back to it months later.
Here's what we'll use:
- Python 3.12 or newer
- ScrapingBee Python client
- BeautifulSoup for parsing HTML
- lxml for fast and reliable parsing
This setup works on macOS, Linux, and Windows. No OS-specific hacks needed.
Python 3.12+ and environment setup
First, make sure you're running a recent Python version. Airbnb pages are heavy, and newer Python releases typically give better performance and tooling.
Check your version:
python --version
or on some systems:
python3 --version
If you're below 3.12, install the latest version from python.org or your system package manager.
Next, we'll create a new project using uv. It's fast and simple:
uv init airbnb-scraper
cd airbnb-scraper
That's it. You now have an isolated environment ready for scraping work (well, technically a venv will appear after you install a package or run a script). If you've built scrapers for travel sites before, this setup will feel familiar. If not, it still stays out of your way.
Learn about the Booking.com Scraper API we offer.
Installing scrapingbee, beautifulsoup4, and lxml
Now let's install the libraries we'll actually use in the Airbnb scraper.
uv add scrapingbee beautifulsoup4 lxml
Here's what each one does:
- scrapingbee fetches Airbnb pages using residential proxies and executes JS
- beautifulsoup4 lets us select and extract elements from the HTML
- lxml makes parsing faster and more reliable, especially on large pages
Common install issues are rare, but they can happen:
- On Windows, you may need Visual C++ build tools if lxml fails to install
- On Linux, make sure libxml2 and libxslt are installed before retrying
Once these are in place, installs usually work fine.
Use ScrapingBee for scraping Airbnb data
Scraping Airbnb directly with raw HTTP requests is painful. You deal with blocked IPs, missing headers, and JavaScript-heavy pages. ScrapingBee removes most of that low-level work.
With ScrapingBee, you get:
- Premium proxies, plus an optional stealth (residential) proxy pool for harder sites
- Proper browser headers and fingerprints
- Optional JavaScript rendering (in fact, we will need it)
- Ability to wait for page load or specific render conditions
That means you can focus on selecting listing elements and saving structured data instead of fighting the network layer.
To use it, you'll need to register for a ScrapingBee account and grab your API key from the dashboard. No credit card is required, and you get 1,000 free scraping credits, which is more than enough for testing Airbnb search results.
ScrapingBee also offers Google Hotels Scraper API.
Setting up ScrapingBee API for Airbnb listings
Now that the environment is ready, let's make the first real request. The goal here is simple: fetch an Airbnb search results page and confirm we get valid HTML back. No parsing yet, no data extraction. Just a clean request that works.
We'll use a real Airbnb search URL. For example, a search for places to stay in Riga in January 2026 for two adults. (Honestly, winter in Riga is not particularly cozy, so if you're actually planning a trip, summer might be the wiser life choice.)
Our URL might look like this:
https://www.airbnb.com/s/Riga/homes?refinement_paths[]=%2Fhomes&place_id=ChIJ7T0H5bDP7kYRMP7yaM3PAAQ&checkin=2026-01-25&checkout=2026-01-27&adults=2&acp_id=example
You may notice extra parameters like acp_id. Airbnb adds these for tracking and experiments. They're not required for scraping and we'll probably ditch them later. For now, it's useful to know they exist and that you don't need to panic about every query param you see.
We'll fetch this page using ScrapingBee so we don't have to deal with blocked IPs or JavaScript quirks ourselves.
Storing your ScrapingBee API key
First rule of Scraping Club: you do not hardcode your API key. Don't commit it to Git. Don't paste it into blog posts. Don't casually drop it into Stack Overflow answers. Treat it like a password. The proper way is to store it in an environment variable called SCRAPINGBEE_API_KEY. You can also load it from a .env file (for local development), or inject it as a secret via your CI/CD or deployment platform.
On macOS or Linux:
export SCRAPINGBEE_API_KEY="your_api_key_here"
On Windows (PowerShell):
setx SCRAPINGBEE_API_KEY "your_api_key_here"
(Note that setx only affects new terminal sessions, so reopen your terminal before running the script.)
In your code, you'll read it from the environment. If the variable is missing, the script should fail early instead of silently doing the wrong thing.
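For example, a minimal early-exit check (the same one the full script uses) looks like this:

import os

api_key = os.environ.get("SCRAPINGBEE_API_KEY")
if not api_key:
    raise RuntimeError("SCRAPINGBEE_API_KEY is not set")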
Related reference if you want a dedicated API later: Airbnb scraper API.
Using proxies, rendering JS, and waiting for browser
Airbnb pages are dynamic and pretty aggressive when it comes to blocking bots, so the request parameters matter a lot here. We'll use four ScrapingBee options that work well together for search result pages.
Proxies
You'll notice two proxy-related options in ScrapingBee: premium_proxy and stealth_proxy.
My recommendation: start with premium_proxy first. It's cheaper and works fine for many sites. That said, during my own tests with Airbnb, I consistently got those infamous grey skeleton pages: the page loaded, but no real data ever appeared. The only thing that reliably fixed this for me was switching to stealth_proxy=true.
Under the hood, stealth proxy uses a newer pool of residential IPs designed specifically for very aggressive anti-bot sites (Airbnb definitely qualifies). It costs more credits and requires JavaScript rendering to be enabled, but in practice it's the difference between "this worked once" and "this works every time."
Also worth noting: stealth_proxy is still a beta feature and has certain limitations, so treat it as a powerful tool rather than a default setting; use it when premium proxies just aren't enough.
Rendering JS
render_js=true tells ScrapingBee to load the page in a real browser instead of just downloading raw HTML.
Airbnb is a heavily JavaScript-driven site: the initial HTML response mostly contains placeholders, and the actual listings are fetched and rendered dynamically after the page loads. If JavaScript rendering is disabled, you'll often see incomplete markup, grey "skeleton" cards, or no listings at all, even though the page looks fine in a normal browser.
In short: if you're scraping Airbnb search results, JavaScript rendering isn't optional, it's required.
Waiting for the page to actually load
wait_browser controls when ScrapingBee decides the page is "done loading" from the browser's point of view. It doesn't sleep for a fixed number of seconds; instead, it waits for a specific browser event.
Supported values are:
- domcontentloaded — waits until the initial HTML is parsed
- load — waits until all page resources finish loading
- networkidle0 — waits until there are no active network requests
- networkidle2 — waits until there are at most two active network requests
In theory, waiting for zero network activity sounds great, but in practice it's often a bad idea. Pages like Airbnb keep firing background requests for analytics, tracking, and experiments, which means the browser may never reach a true "no requests" state.
In my own experiments, using load actually turned out to be the most reliable option. It waits until the browser finishes loading the page and its resources, without getting stuck waiting for background noise that never really stops. Unfortunately, I can't guarantee this approach will still work in a year, so you might need to do some more experimenting.
It's also worth avoiding fixed delays like "wait 2 seconds" or "sleep 5 seconds". Page load times vary depending on network conditions, proxy speed, and how much data Airbnb decides to fetch. Sometimes 2 seconds isn't enough, and other times 10 seconds is just wasted time. Event-based waiting (load, networkidle*, or wait_for) is far more predictable than guessing how long the page might take to load.
If you want to be extra safe, wait_for is your final fallback. Instead of guessing timing, it waits for a specific element to appear in the DOM — either via a CSS selector or XPath.
For example:
"wait_for": 'div[itemprop="itemListElement"]'
Sending GET requests to Airbnb search pages
Let's send a first request and just check that it works. Here's a minimal Python example that fetches a search page and prints the HTTP status code:
import os
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key=os.environ["SCRAPINGBEE_API_KEY"])

url = "https://www.airbnb.com/s/Riga/homes?refinement_paths[]=%2Fhomes&place_id=ChIJ7T0H5bDP7kYRMP7yaM3PAAQ&checkin=2026-01-25&checkout=2026-01-27&adults=2"

response = client.get(
    url,
    params={
        "stealth_proxy": True,
        "render_js": True,
        "wait_browser": "load",
        "wait_for": 'div[itemprop="itemListElement"]',
        "window_width": 1365,
        "window_height": 900,
    },
)

print(response.status_code)
If you see 200, you're good. That means you successfully fetched the Airbnb search page.
To keep things clean, it helps to build search URLs using a small helper function. That way you can change dates, cities, or pagination without rewriting strings.
from urllib.parse import urlencode

def build_airbnb_search_url(place_id, checkin, checkout, adults=2, offset=0):
    base_url = "https://www.airbnb.com/s/homes"
    params = {
        "refinement_paths[]": "/homes",
        "place_id": place_id,
        "checkin": checkin,
        "checkout": checkout,
        "adults": adults,
        "items_offset": offset,
    }
    return f"{base_url}?{urlencode(params)}"
The items_offset parameter controls pagination. Offset is basically "skip N listings"; common page sizes are 18, 20, or 24 listings depending on the layout, so the second page usually starts at one of those offsets.
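For example, here's a minimal sketch that generates URLs for the first three pages, assuming a page size of 18 listings (verify the actual step against your own results):

PAGE_SIZE = 18  # assumed step; count the cards your layout actually returns per page

for page in range(3):
    url = build_airbnb_search_url(
        place_id="ChIJ7T0H5bDP7kYRMP7yaM3PAAQ",
        checkin="2026-01-25",
        checkout="2026-01-27",
        adults=2,
        offset=page * PAGE_SIZE,
    )
    print("Page", page + 1, "->", url)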
In this guide, we focus only on public search result pages. No logins, no private data, no user-specific pages. That keeps things simpler and much more stable.
So, here's the full code so far:
import os
from urllib.parse import urlencode

from scrapingbee import ScrapingBeeClient

def build_airbnb_search_url(place_id, checkin, checkout, adults=2, offset=0):
    base_url = "https://www.airbnb.com/s/homes"
    params = {
        "refinement_paths[]": "/homes",
        "place_id": place_id,
        "checkin": checkin,
        "checkout": checkout,
        "adults": adults,
        "items_offset": offset,
    }
    return f"{base_url}?{urlencode(params)}"

def main():
    api_key = os.environ.get("SCRAPINGBEE_API_KEY")
    if not api_key:
        raise RuntimeError("SCRAPINGBEE_API_KEY is not set")
    client = ScrapingBeeClient(api_key=api_key)

    url = build_airbnb_search_url(
        place_id="ChIJ7T0H5bDP7kYRMP7yaM3PAAQ",
        checkin="2026-01-25",
        checkout="2026-01-27",
        adults=2,
        offset=0,
    )
    print("Requesting URL:")
    print(url)

    try:
        response = client.get(
            url,
            params={
                "stealth_proxy": True,
                "render_js": True,
                "wait_browser": "load",
                "wait_for": 'div[itemprop="itemListElement"]',
                "window_width": 1365,
                "window_height": 900,
            },
        )
        print("Status code:", response.status_code)
        print("Response size:", len(response.content), "bytes")
        if response.status_code != 200:
            print("Non-200 response received")
            print(response.text[:500])
    except Exception as e:
        print("Unexpected error occurred")
        print(str(e))

if __name__ == "__main__":
    main()
Parsing Airbnb HTML with BeautifulSoup
At this point, we're getting real HTML back from Airbnb. Now we need to turn that big blob of markup into structured data we can actually use. We'll employ BeautifulSoup for this. The idea is simple: find each listing card, extract a few fields from it, and store everything in a clean Python dictionary. All parsing logic stays in one place so it's easy to debug and update when Airbnb changes something.
Here's how a typical "listing card" looks on Airbnb:

Before writing code, it helps to open the Airbnb search page in your browser, right-click a listing, and inspect the HTML. That's how you discover stable hooks like itemprop and data-testid.
Locating listing containers
Each listing card lives inside a container that looks like this:
<div itemprop="itemListElement" itemtype="http://schema.org/ListItem">
This is a great starting point because it's part of schema.org markup and tends to be more stable than random class names.

In code, we grab all listing containers like this:
from bs4 import BeautifulSoup
import re

def parse_listings(html):
    """Find all schema.org listing cards and parse them."""
    soup = BeautifulSoup(html, "lxml")
    cards = soup.find_all(
        "div",
        attrs={
            "itemprop": "itemListElement",
            # The schema.org URL might change in the future,
            # so let's use a regexp
            "itemtype": re.compile(r"schema\.org/ListItem$", re.I),
        },
    )
    print("Found listings:", len(cards))
    return [parse_listing_card(card) for card in cards]
Always print the number of cards you find. If it suddenly drops to zero, you know Airbnb changed something and you need to recheck selectors.
Extracting title, description, and beds
Each search result on Airbnb is wrapped in an itemListElement, which conveniently contains schema.org meta tags alongside the actual UI markup. We take advantage of this by extracting the listing's marketing title and URL from meta tags first — these are intended for search engines and tend to be more stable than visual selectors.
Inside the same listing card, there's a data-testid="card-container" block. This is where most of the visible, human-readable information lives.
From there, we extract:
- the marketing title from schema meta tags (itemprop="name")
- the card title shown in the UI (listing-card-title)
- subtitles such as bedrooms, beds, and cancellation info
Airbnb frequently duplicates text for accessibility purposes, so the same content may appear multiple times in the DOM. To avoid messy or repeated output, we strip out elements marked with aria-hidden="true" before reading the text.
def extract_text_clean(tag):
    """Get visible text from a tag, removing aria-hidden junk."""
    if not tag:
        return None
    for hidden in tag.find_all(attrs={"aria-hidden": "true"}):
        hidden.decompose()
    return tag.get_text(" ", strip=True)
Now we can parse text fields defensively:
def parse_listing_card(card):
    data = {}

    name_meta = card.find("meta", itemprop="name")
    url_meta = card.find("meta", itemprop="url")

    data["name"] = name_meta["content"] if name_meta else None
    data["url"] = (
        "https://www.airbnb.com" + url_meta["content"]
        if url_meta
        else None
    )

    container = card.find("div", attrs={"data-testid": "card-container"})
    if not container:
        return data

    title_tag = container.find(
        "div", attrs={"data-testid": "listing-card-title"}
    )
    data["card_title"] = extract_text_clean(title_tag)

    subtitle_tags = container.find_all(
        "div", attrs={"data-testid": "listing-card-subtitle"}
    )
    subtitles = []
    for tag in subtitle_tags:
        text = extract_text_clean(tag)
        if text and text not in subtitles:
            subtitles.append(text)
    data["subtitles"] = subtitles

    return data
If something is missing, we return None instead of crashing. This makes the scraper more robust.
Extracting price, rating, and date range
Price and ratings are the two fields that usually break first when you scrape Airbnb, mostly because the markup is deeply nested and full of duplicated accessibility text. The trick is to grab the most "meaningful" text Airbnb already exposes for humans and screen readers.
Price
Pricing on Airbnb search cards is a bit chaotic in the HTML, but the good news is that Airbnb exposes a clean, human-readable summary via accessibility attributes.
Instead of relying on brittle layout-specific elements, we look for an aria-label inside the listing card that mentions nights. These labels usually contain exactly what we want, for example:
€ 98 for 2 nights
€ 106 for 2 nights, originally € 125
That single string already includes the total price and any discount information, normalized and ready to use. We don't need to reconstruct anything from scattered DOM elements.
import re

_PRICE_LABEL_RE = re.compile(
    r"\bfor\s+\d+\s+nights?\b", re.IGNORECASE
)

def extract_price(container):
    """
    Tries to grab the most likely price summary from aria-labels.
    Assumes English-ish labels like:
        "€98 for 2 nights"
        "$106 for 2 nights, originally $124"
    """
    if not container:
        return None
    # 1) collect candidate aria-labels
    candidates = []
    for tag in container.find_all(["span", "div", "button"], attrs={"aria-label": True}):
        label = tag.get("aria-label")
        if not isinstance(label, str):
            continue
        label = label.replace("\xa0", " ").strip()
        if not label:
            continue
        if _PRICE_LABEL_RE.search(label):
            candidates.append(label)
    if not candidates:
        return None
    # 2) prefer the "clean" summary:
    candidates.sort(key=len)
    return candidates[0]
If a listing doesn't show a price (for example, if Airbnb wants you to adjust dates or guest count), the function simply returns None. That's expected, and we just skip it and move on to the next listing.
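If you later need a numeric total rather than the raw label, you can post-process the string. Here's a small sketch, assuming an English label with a leading currency symbol as in the examples above:

import re

def parse_price_total(price_label):
    """Pull the first amount out of a label like "$106 for 2 nights, originally $124"."""
    if not price_label:
        return None
    match = re.search(r"([€$£])\s*([\d,]+(?:\.\d+)?)", price_label)
    if not match:
        return None
    currency, amount = match.groups()
    return {"currency": currency, "total": float(amount.replace(",", ""))}

print(parse_price_total("$106 for 2 nights, originally $124"))
# {'currency': '$', 'total': 106.0}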
Rating and reviews
Ratings don't have nice, stable attributes on Airbnb search cards. CSS classes change all the time, and relying on them is a fast way to break your scraper. What is fairly reliable is the accessibility text Airbnb includes for screen readers. Many listings contain a sentence like:
4.99 out of 5 average rating, 154 reviews
This text usually lives inside a span, so instead of chasing classes, we search for a span that contains the phrase out of 5 average rating and then extract the numbers using regular expressions.
This logic assumes Airbnb is serving English (en) accessibility text. If the page is rendered in another language, these strings might not exist. In this case you'll either need to update the scraping logic or enforce the locale via a GET param, for example locale=en-US.
import re

def extract_rating_and_reviews(container):
    """
    Airbnb often includes an accessibility string like:
    "4.99 out of 5 average rating, 154 reviews"
    """
    if not container:
        return (None, None)
    rating_span = container.find(
        "span",
        string=lambda s: isinstance(s, str) and "out of 5 average rating" in s,
    )
    if not rating_span:
        return (None, None)
    text = rating_span.get_text(" ", strip=True)
    rating_match = re.search(r"([0-9]+(?:\.[0-9]+)?)\s+out of 5", text)
    reviews_match = re.search(r"(\d[\d,]*)\s+reviews", text)
    rating = float(rating_match.group(1)) if rating_match else None
    reviews = int(reviews_match.group(1).replace(",", "")) if reviews_match else None
    return (rating, reviews)
If a listing doesn't have any ratings yet (common for new listings), the function simply returns (None, None). That's expected behavior, nothing to worry about.
Building structured listing dictionaries
Now we just wire everything together. One listing card in, one clean Python dictionary out. Later on, we simply collect all of these dictionaries into a list. The idea here is to centralize all parsing logic in one place. We first grab the most stable data from schema meta tags (name, URL, position), then layer in the visible UI details like titles, subtitles, price, and ratings.
This approach is intentionally defensive:
- every field is optional
- missing elements don't crash the scraper
- new or unrated listings are handled gracefully
If Airbnb changes something, you only need to fix this one function instead of hunting through the entire codebase.
Here's an implementation using the helper functions we built earlier:
def parse_listing_card(card):
    """
    Parse a single listing card (schema.org ListItem).
    We combine stable schema meta tags (name/url/position) with visible UI bits.
    """
    item = {
        "position": None,
        "name": None,
        "url": None,
        "card_title": None,
        "subtitles": [],
        "price": None,
        "rating": None,
        "reviews": None,
    }

    # Schema meta tags (usually the most stable)
    name_meta = card.find("meta", itemprop="name")
    url_meta = card.find("meta", itemprop="url")
    pos_meta = card.find("meta", itemprop="position")

    if pos_meta and pos_meta.get("content"):
        try:
            item["position"] = int(pos_meta["content"])
        except ValueError:
            item["position"] = None

    if name_meta and name_meta.get("content"):
        item["name"] = name_meta["content"]

    if url_meta and url_meta.get("content"):
        item["url"] = "https://www.airbnb.com" + url_meta["content"]

    # The actual card UI container (testid selectors are somewhat stable, but not guaranteed forever)
    container = card.find("div", attrs={"data-testid": "card-container"})
    if not container:
        return item

    # Title + subtitles
    title_tag = container.find("div", attrs={"data-testid": "listing-card-title"})
    item["card_title"] = extract_text_clean(title_tag)

    subtitle_tags = container.find_all("div", attrs={"data-testid": "listing-card-subtitle"})
    seen = set()
    for tag in subtitle_tags:
        text = extract_text_clean(tag)
        if text and text not in seen:
            seen.add(text)
            item["subtitles"].append(text)

    # Price + rating/reviews
    item["price"] = extract_price(container)
    item["rating"], item["reviews"] = extract_rating_and_reviews(container)

    return item
And here's the loop that builds the final list:
from bs4 import BeautifulSoup
import re

def parse_listings(html):
    """Find all schema.org listing cards and parse them."""
    soup = BeautifulSoup(html, "lxml")
    cards = soup.find_all(
        "div",
        attrs={
            "itemprop": "itemListElement",
            "itemtype": re.compile(r"schema\.org/ListItem$", re.I),
        },
    )
    print("Found listings:", len(cards))
    return [parse_listing_card(card) for card in cards]
The output is a list of dictionaries with predictable keys. That's exactly what you want if you plan to export to JSON, CSV, or a database.
If you need extra fields later, just add them inside parse_listing_card. Keep keys short and consistent, and you'll thank yourself later.
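For example, if you also wanted the cover photo, a hypothetical addition inside parse_listing_card might look like this (the img lookup is an assumption, so verify it against your own saved HTML first):

# Hypothetical extra field, added inside parse_listing_card after the
# container lookup. Verify the <img> location in your own HTML dump.
img_tag = container.find("img")
item["image"] = img_tag.get("src") if img_tag else None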
Saving and exporting Airbnb data
Once you can parse listings into a list of dictionaries, saving the data is the easy part. In this guide we'll export to a JSON file called airbnb_results.json. It's a good default because it keeps the full structure, works with any language, and is simple to load later for analysis.
We'll also keep the export step separate from scraping and parsing. That way you can rerun the script with a new city or date range and write new files like airbnb_riga_2026_01_25.json without touching the parser. If you want to go further, you can export to CSV too. JSON is just the clean starting point.
Related API for another rentals site: apartments.com Scraper API.
Writing scraped data to a JSON file
Here's a small helper that saves your parsed listings to a JSON file.
import json

def save_to_json(listings, filename="airbnb_results.json"):
    """Save scraped results to a JSON file (UTF-8, pretty printed)."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(listings, f, ensure_ascii=False, indent=2)
A couple of details here are worth calling out:
- ensure_ascii=False keeps non-English characters readable instead of turning them into ugly \uXXXX escape sequences
- indent=2 pretty-prints the JSON so it's easy to scan and debug in a code editor
After the scrape finishes, open airbnb_results.json and search for a listing title you recognize. That's usually the fastest way to sanity-check that everything worked as expected.
If you rerun the script with different search params, save to a new file each time:
save_to_json(listings, filename="airbnb_riga_2026-01-25_2026-01-27.json")
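To avoid typing those filenames by hand, you can derive them from the search parameters. A tiny sketch, reusing save_to_json from above (the city slug is something you supply yourself, since the script only knows the place_id):

def build_output_filename(city_slug, checkin, checkout):
    """E.g. ("riga", "2026-01-25", "2026-01-27") -> "airbnb_riga_2026-01-25_2026-01-27.json"."""
    return f"airbnb_{city_slug}_{checkin}_{checkout}.json"

save_to_json(listings, filename=build_output_filename("riga", "2026-01-25", "2026-01-27"))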
Ensuring UTF-8 encoding and readable formatting
Airbnb listings often include international characters in titles, neighborhoods, and descriptions, so UTF-8 is the safest default.
If you forget about encoding, a few annoying things can happen:
- errors on Windows when writing certain characters
- strange or broken symbols in your output file
- escaped Unicode sequences if you also forget ensure_ascii=False
The fix is simple: always open files with UTF-8 when reading and writing.
with open("airbnb_results.json", "w", encoding="utf-8") as f:
    ...
Same rule applies when reading saved HTML or JSON back from disk:
with open("airbnb_results.json", "r", encoding="utf-8") as f:
    data = f.read()
Sample JSON output structure for listings
Here's what the output can look like. This matches the structure we build in the script.
[
  {
    "position": 1,
    "name": "Apartment un OLD town",
    "url": "https://www.airbnb.com/rooms/12197756?adults=2&check_in=2026-01-25&check_out=2026-01-27&search_mode=regular_search&source_impression_id=p3_1767197962_P3Nzmz-AGrRuN0WS&previous_page_section_name=1000",
    "card_title": "Apartment in Riga",
    "subtitles": [
      "Apartment un OLD town",
      "1 bedroom 2 beds",
      "Free cancellation"
    ],
    "price": "$106 for 2 nights, originally $124",
    "rating": 4.8,
    "reviews": 643
  },
  {
    "position": 2,
    "name": "Charming Loft in Riga's Historic Old Town",
    "url": "https://www.airbnb.com/rooms/1168680674742785971?adults=2&check_in=2026-01-25&check_out=2026-01-27&search_mode=regular_search&source_impression_id=p3_1767197962_P3Kn6NGQ9wC37oOo&previous_page_section_name=1000",
    "card_title": "Apartment in Riga",
    "subtitles": [
      "Charming Loft in Riga's Historic Old Town",
      "1 bedroom 1 queen bed",
      "Free cancellation"
    ],
    "price": "$90 for 2 nights",
    "rating": 4.83,
    "reviews": 116
  }
]
At this point, we've turned a messy, JavaScript-heavy page into something clean and structured.
Quick rundown of what each field means:
- position — the listing's rank in the search results
- name — the marketing title, pulled from schema meta tags
- url — the listing URL you can open later or crawl in more detail
- card_title — the shorter UI title shown on the search card
- subtitles — extra card details like bedrooms, beds, and cancellation info
- price — the human-readable price summary (often includes discounts)
- rating and reviews — may be missing for brand-new listings
This structure maps cleanly to a database table or an analytics pipeline. You can load it into pandas, insert it into PostgreSQL, or ship it straight into whatever BI tool you prefer.
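For example, converting the same dictionaries to CSV takes only the standard library. A sketch (the subtitles list is joined into a single cell, since CSV has no native list type):

import csv
import json

def save_to_csv(listings, filename="airbnb_results.csv"):
    """Flatten the listing dicts into rows of a CSV file."""
    fieldnames = ["position", "name", "url", "card_title", "subtitles", "price", "rating", "reviews"]
    with open(filename, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for item in listings:
            row = dict(item)
            row["subtitles"] = " | ".join(item.get("subtitles") or [])
            writer.writerow(row)

# Reuse the JSON file we saved earlier
with open("airbnb_results.json", encoding="utf-8") as f:
    save_to_csv(json.load(f))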
Related API for another travel platform: Tripadvisor Scraper API.
Handling website changes and anti-scraping measures
Airbnb will change things. Not because of you, just because that's how modern frontends work. New experiments, new layouts, new loading flows. The goal isn't to make your scraper "unbreakable", it's to make it easy to fix when something shifts.
This section covers what usually breaks first, how to spot it quickly, and how ScrapingBee helps reduce the boring parts like IP blocks and captchas.
Related solution if you've dealt with similar issues elsewhere: Google Flights Scraper API.
Dealing with dynamic class names and DOM changes
Airbnb relies heavily on auto-generated class names like lxq01kf or atm_mk_h2mmj6. These classes are not designed to be stable and can change at any time, often without any visible changes to the page.
That's why the parser in this guide avoids class selectors altogether and instead leans on signals that tend to be more reliable:
- data-testid attributes for major UI blocks
- schema.org attributes like itemprop for structured data
- accessibility text such as aria-label and screen-reader strings
When something eventually breaks (and it will), the fix is usually mechanical:
- Save the rendered HTML to a file again.
- Open it in a browser.
- Inspect a single listing card.
- Check which attributes still exist and which ones changed.
- Update the selector in the relevant helper function.
If you're unsure whether the page actually rendered correctly, ScrapingBee can also capture a page screenshot (set the screenshot request param to True and simply save the response content in an image file). This is especially useful when debugging skeleton pages, cookie walls, or partially rendered content: you can see exactly what the browser saw at scrape time. Note that in this case ScrapingBee will literally return an image, not HTML.
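A minimal sketch of that debugging trick, reusing the client and url from earlier (note the binary write, since the response body is image bytes):

response = client.get(
    url,
    params={
        "stealth_proxy": True,
        "render_js": True,
        "wait_browser": "load",
        "screenshot": True,  # ask ScrapingBee for a screenshot instead of HTML
    },
)

with open("airbnb_debug.png", "wb") as f:
    f.write(response.content)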
This is also why all parsing logic lives in small, focused functions. If price extraction stops working, you update extract_price(). If ratings disappear, you look at extract_rating_and_reviews(). You're never hunting through a giant, monolithic script.
Adding a bit of lightweight logging helps too. Printing how many cards were found or how many prices were parsed makes problems obvious immediately, without having to stare at raw HTML dumps.
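For instance, a couple of quick counters after parsing will surface a broken selector immediately:

listings = parse_listings(html)

with_price = sum(1 for item in listings if item["price"])
with_rating = sum(1 for item in listings if item["rating"] is not None)
print(f"Parsed {len(listings)} listings: {with_price} with prices, {with_rating} with ratings")

if listings and with_price == 0:
    print("WARNING: no prices extracted, check extract_price() selectors")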
Using ScrapingBee to avoid IP blocks
If you try to scrape Airbnb with plain requests from your own IP, you'll hit a wall pretty quickly. Blocks, empty pages, weird partial responses — the usual fun. ScrapingBee takes care of most of that pain for you.
What it gives you out of the box:
- premium proxy support (a good default starting point)
- realistic browser headers and fingerprints
- full JavaScript rendering
- event-based waiting instead of guesswork
In practice, premium_proxy=true is the first thing you should try. It works for many sites and costs fewer credits. That said, Airbnb is especially aggressive. In my own tests, premium proxies often resulted in skeleton pages with no real data. Switching to stealth_proxy=true was the only reliable fix in those cases. It costs more, but it actually works.
For larger or repeated scraping jobs, stability matters more than raw speed. A few practical tips:
- start with premium_proxy, but don't hesitate to upgrade to stealth_proxy if you see empty or partial pages
- avoid firing dozens of requests per second
- add small delays between pages when paginating
- batch jobs instead of scraping everything in one burst
Even a simple 1–2 second pause between requests can dramatically reduce errors and retries. And, as always, make sure you're respecting Airbnb's terms of service and any applicable local laws when collecting data.
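Putting those tips together, a throttled pagination loop might look like this. It's a sketch, assuming the client and helpers from earlier in this guide and an 18-listing page size:

import random
import time

all_listings = []
for page in range(5):
    url = build_airbnb_search_url(
        place_id="ChIJ7T0H5bDP7kYRMP7yaM3PAAQ",
        checkin="2026-01-25",
        checkout="2026-01-27",
        offset=page * 18,  # assumed page size, verify for your layout
    )
    response = client.get(
        url,
        params={"stealth_proxy": True, "render_js": True, "wait_browser": "load"},
    )
    all_listings.extend(parse_listings(response.text))
    time.sleep(1 + random.random())  # 1-2 second pause between pages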
Monitoring for structural updates on Airbnb
The easiest way to deal with breaking changes is to catch them early, before they quietly poison your data.
A simple maintenance routine goes a long way:
- run a small test scrape once a week
- log the number of listings found
- log how many listings have prices and titles
- alert if any of those counts suddenly drop to zero (or close to it)
You can also add basic unit tests around your parsing functions. Feed them a saved HTML file and assert that at least one title, price, or rating is extracted. This lets you debug parser changes without hitting Airbnb at all.
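A minimal test along those lines might look like this, assuming your scraper lives in scraper.py and you've saved a rendered page to tests/fixtures/search_page.html (both names are placeholders):

# test_parser.py -- run with: pytest test_parser.py
from pathlib import Path

from scraper import parse_listings  # placeholder module name, adjust to your project

def test_parses_saved_search_page():
    html = Path("tests/fixtures/search_page.html").read_text(encoding="utf-8")
    listings = parse_listings(html)
    assert len(listings) > 0
    assert any(item["card_title"] for item in listings)
    assert any(item["price"] for item in listings)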
Keeping a few sample HTML files around is extremely useful. When Airbnb changes something, you can iterate on selectors locally, fix the parser, and only then go back to live scraping. In practice, this is usually enough to keep an Airbnb scraper running smoothly, without constant firefighting or surprise breakages.
Start scraping Airbnb data with ScrapingBee
At this point, you've got everything you need, and you've also seen why the tooling matters. In this tutorial alone, we hit multiple cases where plain HTTP requests simply don't work. Without proper proxies and real JavaScript rendering, you're not scraping Airbnb, you're scraping placeholders.
ScrapingBee solves that problem. With browser-based rendering and the right proxy setup, we were able to load real search pages, wait for listings to appear, and reliably extract data that's actually useful. You can take the script from this guide, swap in your own city, dates, and guest count, and start pulling listings in minutes. From there, exporting to JSON, CSV, or pushing the data into a database is just a few extra lines of code.
If you haven't already, sign up for ScrapingBee, grab your API key from the dashboard, and run the script end to end. You get free credits to test the full workflow: more than enough to confirm that your setup works before thinking about scale.
Once this works for Airbnb search pages, the same pattern applies to other travel sites too. Change the URL, adjust a few selectors, and reuse the same fetching and parsing logic. Keep it simple. Keep it testable. And use the right tools, because scraping modern, JavaScript-heavy sites without them is just pain.
Conclusion
You now have a realistic, solid setup for scraping Airbnb search results. You've seen how to fetch pages reliably, deal with JavaScript-heavy rendering, parse listing cards, and turn messy HTML into structured data you can actually use. This approach won't prevent Airbnb from changing things, but it does make those changes manageable. When something breaks, you know exactly where to look and how to fix it, instead of starting from scratch.
Frequently asked questions (FAQs)
Is Airbnb web scraping legal?
Whether scraping Airbnb is legal depends on what data you collect and how you use it. Public listing information is generally safer than private or user-specific data. Always review Airbnb's terms of service, respect robots.txt where applicable, and make sure you're complying with local laws and regulations.
Can I use this script for Booking.com or other travel sites?
Yes — the overall approach applies to other travel sites as well. You'll need to adjust the search URLs and update the parsing selectors to match each site's HTML, but the ScrapingBee setup, request logic, and defensive parsing patterns can be reused with minimal changes.
How many Airbnb pages can I scrape safely?
There's no hard limit. Start slow and watch your error rates. For better stability, space out requests, use residential or stealth proxies when needed, and avoid sudden bursts of traffic. Scraping a few pages per minute is usually much safer than trying to pull hundreds at once.
Can I export the scraped Airbnb data to CSV or a database?
Absolutely. Once listings are stored as Python dictionaries, exporting is easy. You can write them to JSON, convert them to CSV, or insert them into a database like PostgreSQL or SQLite with just a few extra lines of code.

Ilya is an IT tutor and author, web developer, and ex-Microsoft/Cisco specialist. His primary programming languages are Ruby, JavaScript, Python, and Elixir. He enjoys coding, teaching people, and learning new things. In his free time he writes educational posts, contributes to open source projects, tweets, does sports, and plays music.

