Publicly available data from platforms like Zillow and Indeed drives critical business decisions—from real estate investment strategies to hiring trends and salary benchmarks. While these sites don’t offer free, comprehensive APIs, their data can still be accessed through ethical, technically sound scraping methods.
Why these datasets matter in 2026
Real estate data (property prices, rental rates, square footage, school ratings, and neighborhood trends) powers investment decisions and market analysis. Job market data reveals in-demand skills, hiring trends by city and role, and compensation benchmarks across industries. Both datasets are publicly viewable, but collecting them programmatically usually means dealing with anti-bot protections and complex JavaScript rendering.
For entrepreneurs, researchers, or data professionals building models, understanding how to collect this data responsibly can unlock valuable insights without crossing legal or ethical lines. The key is to use production-grade scraping pipelines that respect rate limits and terms of service.
Comparing real estate data sources: from scrapers to APIs
When sourcing real estate data, you have several options, each with trade-offs in cost, coverage, and legal risk:
- Zillow (scraping): Offers rich data and broad coverage but employs aggressive anti-bot measures and relies heavily on JavaScript rendering.
- Redfin (scraping or CSV export): Provides cleaner HTML and allows direct CSV downloads in the U.S., simplifying data collection.
- Realtor.com (scraping): Delivers solid search results but faces moderate anti-bot protection.
- Government sources (direct download): Free, accurate, and legal, though often outdated and lacking in detail.
- ATTOM Data / CoreLogic (paid APIs): Comprehensive, legal, and reliable, ideal for commercial use but expensive.
For learning or small-scale projects, scraping Zillow or Redfin can be instructive. For production systems or commercial applications, licensed data providers are recommended to avoid legal exposure.
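For the CSV export route, a downloaded file can be summarized with the standard library alone. The sketch below is illustrative only: the column names `PRICE` and `SQUARE FEET` and the helper `median_price_per_sqft` are assumptions, so check them against the header row of the file you actually download.

```python
import csv
import io
from statistics import median


def median_price_per_sqft(csv_text, price_col="PRICE", sqft_col="SQUARE FEET"):
    """Return the median price per square foot, or None if no row is usable.

    Rows with missing or non-numeric values are skipped rather than guessed.
    """
    ratios = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            price = float(row[price_col])
            sqft = float(row[sqft_col])
        except (KeyError, TypeError, ValueError):
            continue  # incomplete row: ignore it
        if sqft > 0:
            ratios.append(price / sqft)
    return median(ratios) if ratios else None


# Tiny made-up sample standing in for a real export.
SAMPLE = """PRICE,SQUARE FEET,CITY
300000,1500,Austin
450000,1800,Austin
,1200,Austin
500000,2000,Dallas
"""

if __name__ == "__main__":
    print(median_price_per_sqft(SAMPLE))
```

Passing the column names as parameters keeps the function usable if the export uses different headers.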
Building a Zillow scraper with Playwright: step-by-step
Zillow's search pages rely heavily on JavaScript rendering, so static HTML parsers such as BeautifulSoup are often not enough on their own. A browser automation tool like Playwright executes the page's JavaScript and lets you extract the data once it has loaded.
First, install Playwright and the playwright-stealth plugin, which masks common automation fingerprints to reduce detection:

```shell
pip install playwright playwright-stealth pandas
playwright install chromium
```

Next, create a Python script that handles human-like delays, gradual scrolling to trigger lazy-loaded content, and structured data extraction. The script below uses asynchronous programming for efficiency and includes stealth techniques to reduce the chance of bot detection:
```python
import asyncio
import random

import pandas as pd  # not used in this excerpt; handy for tabulating the results
from playwright.async_api import async_playwright
# stealth_async is the playwright-stealth 1.x entry point; newer releases may expose a different API.
from playwright_stealth import stealth_async


async def human_delay(min_s=1.5, max_s=4.0):
    """Pause for a random interval so requests do not follow a fixed rhythm."""
    await asyncio.sleep(random.uniform(min_s, max_s))


async def slow_scroll(page, steps=5):
    """Scroll down in small random increments to trigger lazy-loaded content."""
    for _ in range(steps):
        scroll_amount = random.randint(300, 700)
        await page.evaluate(f"window.scrollBy(0, {scroll_amount})")
        await asyncio.sleep(random.uniform(0.4, 1.0))


async def scrape_zillow_listings(search_query: str, max_pages: int = 3):
    all_listings = []
    encoded = search_query.replace(" ", "-").lower()
    # The search URL is truncated in the source, so it is left elided here.
    base_url = f"…"

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled", "--no-sandbox", "--disable-dev-shm-usage"],
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            viewport={"width": 1440, "height": 900},
            locale="en-US",
            timezone_id="America/Chicago",
        )
        page = await context.new_page()
        await stealth_async(page)

        # Skip images and fonts to cut bandwidth and load time.
        await context.route("**/*.{png,jpg,jpeg,gif,woff,woff2,ttf}", lambda route: route.abort())

        for page_num in range(1, max_pages + 1):
            url = base_url if page_num == 1 else f"{base_url}{page_num}_p/"
            await page.goto(url, wait_until="domcontentloaded")
            await human_delay(2, 4)
            await slow_scroll(page, steps=6)
            await human_delay(1, 2)

            # Preferred path: read the listing data from JSON embedded in the page.
            listings = await page.evaluate("""
                () => {
                    const scripts = document.querySelectorAll('script[type="application/json"]');
                    for (const script of scripts) {
                        try {
                            const data = JSON.parse(script.textContent);
                            const results = data?.props?.pageProps?.searchPageState?.cat1?.searchResults?.listResults;
                            if (results) return results;
                        } catch {}
                    }
                    return [];
                }
            """)

            if not listings:
                # Fallback: CSS-based card extraction (this helper is not shown in this section).
                listings = await extract_zillow_cards_css(page)

            all_listings.extend(listings)

            # Stop when there is no link to a further results page.
            next_btn = await page.query_selector("a[title='Next page']")
            if not next_btn:
                break
            await human_delay(3, 6)

        await browser.close()
    return all_listings
```

This script extracts property listings by parsing the JSON embedded in the page's HTML, which is more reliable than fragile CSS selectors. If the JSON lookup returns nothing, it falls back to extracting the visible listing cards via CSS. To reduce the chance of detection, it randomizes delays and mimics human scrolling. Note that it sets a single fixed user agent, so rotating user agents across sessions would be an additional step.
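Raw listings collected across several result pages usually need de-duplicating and flattening before analysis. The following is a minimal sketch using only the standard library. The field names in `FIELDS` are assumptions about Zillow's undocumented embedded JSON, and the helper names are hypothetical, so verify both against a real response.

```python
import csv
import io

# Assumed field names; the embedded JSON is undocumented and may change.
FIELDS = ["zpid", "address", "price", "beds", "baths", "area", "detailUrl"]


def normalize_listing(raw: dict) -> dict:
    """Keep only the assumed fields, filling missing ones with None."""
    return {field: raw.get(field) for field in FIELDS}


def dedupe_listings(listings: list[dict], key: str = "zpid") -> list[dict]:
    """Drop repeats across pages, keeping the first occurrence.

    Items without the key are kept, since they cannot be compared.
    """
    seen = set()
    unique = []
    for item in listings:
        ident = item.get(key)
        if ident is not None:
            if ident in seen:
                continue
            seen.add(ident)
        unique.append(item)
    return unique


def listings_to_csv(listings: list[dict]) -> str:
    """Serialize de-duplicated, normalized listings as CSV text."""
    rows = [normalize_listing(item) for item in dedupe_listings(listings)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Restricting the output to a fixed field list means a change in the upstream JSON shows up as empty columns instead of a crash partway through a run.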
Ethical considerations and long-term reliability
Scraping public websites like Zillow or Indeed is not inherently illegal, but it must be done responsibly. Avoid overwhelming servers with rapid requests, respect rate limits, and never use scraped data for spamming or unethical purposes. For commercial applications, consider licensing data from official providers to ensure compliance and reliability.
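Respecting a site's published crawl rules and pacing requests can be checked in code. The sketch below uses Python's standard `urllib.robotparser` on an inline, made-up robots.txt so it runs offline. `MinIntervalLimiter` and `build_robot_rules` are hypothetical helpers, and the example rules are not any real site's policy.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._min_interval_s - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


# Made-up rules for illustration only.
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"

if __name__ == "__main__":
    rules = build_robot_rules(EXAMPLE_ROBOTS)
    limiter = MinIntervalLimiter(rules.crawl_delay("my-bot") or 1.0)
    for url in ["https://example.com/listings/", "https://example.com/private/data"]:
        if rules.can_fetch("my-bot", url):
            limiter.wait()
            print("would fetch", url)
        else:
            print("skipping disallowed", url)
```

The clock and sleep functions are injectable so the pacing logic can be tested without real waiting.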
As anti-bot technologies evolve, scraping pipelines must adapt. Regularly update your scripts to handle new detection mechanisms and consider using proxies or IP rotation if scaling up. For job market data, similar principles apply—use tools like Selenium or Playwright to navigate Indeed’s job listings while maintaining human-like interaction patterns.
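One common way to make a pipeline tolerant of intermittent blocks is to retry with exponential backoff and random jitter instead of hammering the site. The sketch below is synchronous for brevity, and `backoff_delays` and `fetch_with_retry` are hypothetical helper names. In the async scraper the same idea would use `asyncio.sleep`.

```python
import random
import time


def backoff_delays(retries, base_s=1.0, cap_s=60.0, rng=random.random):
    """Yield "full jitter" delays: uniform in [0, min(cap_s, base_s * 2**attempt))."""
    for attempt in range(retries):
        yield rng() * min(cap_s, base_s * 2 ** attempt)


def fetch_with_retry(fetch, retries=4, sleep=time.sleep, **backoff_kwargs):
    """Call fetch() once, then up to `retries` more times with growing delays.

    Catching Exception is deliberately broad for this sketch; a real pipeline
    would retry only on errors it knows to be transient.
    """
    last_error = None
    for delay in [0.0, *backoff_delays(retries, **backoff_kwargs)]:
        if delay:
            sleep(delay)
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
    raise last_error
```

The cap keeps the worst-case wait bounded, and the jitter prevents several workers from retrying in lockstep.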
The future of data-driven decision-making depends on ethical access to high-quality datasets. Whether for personal projects or business intelligence, mastering responsible scraping techniques ensures you can harness these valuable resources without compromising integrity or violating terms of service.