iToverDose/Software· 31 MAY 2026 · 12:09

How to scrape Bluesky starter packs for under $2 per 1,000 profiles

Bluesky starter packs are reshaping professional networking on the platform, but extracting their membership data requires scraping. Here’s how a new tool automates the process for a fraction of a cent per profile while handling edge cases like rate limits and nested API responses.

DEV Community

Bluesky’s rapid growth in early 2024 wasn’t just about new users joining the platform—it was about how those users found each other. A key driver was the introduction of starter packs, curated lists of accounts that could be followed with a single click. According to research published in 2024, starter packs accounted for 43% of all follows during the platform’s most explosive period. For researchers, marketers, and analysts, these packs represent some of the most valuable audience segments on social media today.

Yet turning a pack’s membership into actionable data—like a spreadsheet for a CRM—isn’t straightforward. The platform’s web interface displays members one page at a time, and there’s no official way to export the entire list. That’s where a new Bluesky starter pack scraper comes in, automating the process for less than two dollars per thousand profiles while handling the platform’s unique API quirks.

What exactly is a Bluesky starter pack?

Under the hood, a starter pack is a structured record stored in the AT Protocol, the decentralized framework powering Bluesky. Each pack includes:

  • A title and optional description, often targeting a specific niche like "ML Researchers on Bluesky" or "London Founders."
  • A creator’s handle, such as pfrazee.com, which identifies who curated the list.
  • A list of member DIDs (decentralized identifiers), the protocol’s equivalent of account IDs.
  • A stable AT URI, formatted as at://did:plc:.../app.bsky.graph.starterpack/..., which acts as a permanent link to the pack across the federated network.
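
The AT URI format in the last bullet can be split into its three parts with a few lines of Python. This is a minimal sketch, not the scraper's own code: the `AtUri` type and `parse_at_uri` helper are hypothetical names introduced here for illustration, and the example URI is the placeholder used elsewhere in this article.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # DID of the repository that owns the record
    collection: str  # record type, e.g. app.bsky.graph.starterpack
    rkey: str        # record key within that collection


def parse_at_uri(uri: str) -> AtUri:
    """Split at://<authority>/<collection>/<rkey> into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    return AtUri(*parts)


pack = parse_at_uri("at://did:plc:abc123/app.bsky.graph.starterpack/xyz789")
print(pack.authority, pack.collection, pack.rkey)
```

Checking the collection segment is a cheap way to confirm that a URI really points at a starter pack before spending API calls on it.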

When a new user follows a starter pack, every listed account receives a follower instantly. This viral mechanism explains the 43% stat—packs aren’t just recommendations; they’re tools for reshaping entire professional networks overnight.

Why can’t you just download the data directly?

The AT Protocol does offer public APIs, but they’re not designed for bulk exports. The public AppView endpoint provides three relevant methods:

  • app.bsky.graph.getStarterPack – retrieves a single pack by its AT URI.
  • app.bsky.graph.getActorStarterPacks – lists all packs created by a specific user.
  • app.bsky.graph.getList – pulls the member profiles associated with a pack.
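
As a rough sketch, each of the three methods is a plain HTTP GET whose path is `/xrpc/` followed by the method name. The host below (`https://public.api.bsky.app`) and the query parameter names (`starterPack`, `actor`, `list`, `limit`, `cursor`) are assumptions taken from the public Bluesky API reference rather than from the scraper itself, and `xrpc_url` is a hypothetical helper. The code only builds URLs; it makes no network calls.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public, unauthenticated AppView host.
APPVIEW_HOST = "https://public.api.bsky.app"


def xrpc_url(method: str, **params) -> str:
    """Build an XRPC query URL, skipping parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{APPVIEW_HOST}/xrpc/{method}"
    return f"{url}?{query}" if query else url


pack_uri = "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789"
list_uri = "at://did:plc:abc123/app.bsky.graph.list/lst001"  # placeholder

pack_url = xrpc_url("app.bsky.graph.getStarterPack", starterPack=pack_uri)
packs_url = xrpc_url("app.bsky.graph.getActorStarterPacks",
                     actor="pfrazee.com", limit=50)
list_url = xrpc_url("app.bsky.graph.getList",
                    list=list_uri, limit=100, cursor=None)

# Round-trip check: the AT URI survives URL encoding intact.
decoded = parse_qs(urlsplit(pack_url).query)
print(decoded["starterPack"][0] == pack_uri)
```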

What’s missing is a straightforward way to search packs by topic or export their members in one step. The protocol includes a searchStarterPacks method, but the public AppView returns an XRPCNotSupported error (HTTP 404), meaning keyword searches aren’t available to the general public. To find relevant packs, you’ll need to identify influential curators first and then manually enumerate their packs.
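
Once a curator's packs have been enumerated, the missing keyword search can be approximated locally. The sketch below assumes each pack view carries its title and description under a `record` key, as described later in this article; `filter_packs_by_keyword` is a hypothetical helper and the sample data is invented for illustration.

```python
from typing import Iterable, List


def filter_packs_by_keyword(packs: Iterable[dict], keyword: str) -> List[dict]:
    """Return packs whose name or description contains the keyword."""
    needle = keyword.casefold()
    matches = []
    for pack in packs:
        record = pack.get("record") or {}
        text = f"{record.get('name') or ''} {record.get('description') or ''}"
        if needle in text.casefold():
            matches.append(pack)
    return matches


# Invented sample of what enumerating one curator's packs might yield.
sample_packs = [
    {"uri": "at://did:plc:abc123/app.bsky.graph.starterpack/p1",
     "record": {"name": "ML Researchers on Bluesky"}},
    {"uri": "at://did:plc:abc123/app.bsky.graph.starterpack/p2",
     "record": {"name": "London Founders", "description": "Startup people"}},
]

hits = filter_packs_by_keyword(sample_packs, "founders")
print([p["uri"] for p in hits])
```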

What does the extracted data actually look like?

The scraper outputs clean, typed JSON for every member, with pack metadata embedded directly into each row. This ensures the dataset is self-contained and ready for analysis or export to tools like spreadsheets or CRMs. A typical record includes:

{
  "pack_uri": "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
  "pack_name": "AI Researchers on Bluesky",
  "pack_description": "Curated list of ML/AI researchers who migrated from Twitter.",
  "pack_creator_handle": "alice.bsky.social",
  "member_did": "did:plc:def456",
  "member_handle": "bob.bsky.social",
  "member_display_name": "Bob Smith",
  "member_followers_count": 1204,
  "member_following_count": 380,
  "member_posts_count": 841,
  "member_indexed_at": "2024-11-14T09:22:01.000Z",
  "scraped_at": "2026-05-16T12:00:00.000Z"
}

Twelve fields per row, with most values always present. Fields like member_display_name and member_indexed_at may be null if the API omits them for a profile. The scraper preserves these rows instead of dropping them, ensuring no profiles are lost during extraction.
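
To illustrate the spreadsheet step, here is one way rows of this shape could be written to CSV with Python's standard library while keeping profiles that have null fields. It is a sketch under the assumption that rows arrive as plain dictionaries; `rows_to_csv` is a hypothetical helper, not part of the scraper. `csv.DictWriter` writes `None` as an empty cell, so incomplete rows are kept rather than dropped.

```python
import csv
import io
from typing import Iterable

FIELDS = [
    "pack_uri", "pack_name", "pack_description", "pack_creator_handle",
    "member_did", "member_handle", "member_display_name",
    "member_followers_count", "member_following_count",
    "member_posts_count", "member_indexed_at", "scraped_at",
]


def rows_to_csv(rows: Iterable[dict]) -> str:
    """Serialize rows to CSV text; missing or None values become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({field: row.get(field) for field in FIELDS})
    return buf.getvalue()


base = {
    "pack_uri": "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
    "pack_name": "AI Researchers on Bluesky",
    "pack_creator_handle": "alice.bsky.social",
    "scraped_at": "2026-05-16T12:00:00.000Z",
}
rows = [
    {**base, "member_did": "did:plc:def456", "member_handle": "bob.bsky.social",
     "member_display_name": "Bob Smith", "member_followers_count": 1204},
    {**base, "member_did": "did:plc:ghi789", "member_handle": "eve.bsky.social",
     "member_display_name": None, "member_followers_count": 12},
]

csv_text = rows_to_csv(rows)
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(len(parsed), repr(parsed[1]["member_display_name"]))
```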

How the scraper handles AT Protocol’s hidden complexities

Building a reliable scraper for Bluesky’s API isn’t just about making a few HTTP requests—it’s about navigating the protocol’s design choices and edge cases. Here’s how the tool tackles them:

Cursor-based pagination across multiple endpoints.

A single pack’s member list isn’t fetched in one call. First, the scraper queries getStarterPack to retrieve the embedded list’s AT URI. Then, it uses getList with that URI to pull a page of members, along with a cursor for the next batch. This loop continues until the response omits the cursor. For large packs, this can mean a dozen sequential calls, every one of which must succeed, with the pages reassembled in order. The scraper caps the process with a maxMembersPerPack setting to prevent runaway costs.
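
The cursor loop can be sketched as a generator. This is an illustration rather than the scraper's actual code: `iter_members` and `fake_fetch` are hypothetical, the page fetcher is injected so the example runs without network access, and the response shape (an `items` array plus an optional `cursor`) is an assumption based on the getList API reference. The `max_members` argument plays the role of the maxMembersPerPack setting.

```python
from typing import Callable, Iterator, Optional

FetchPage = Callable[[str, Optional[str]], dict]


def iter_members(list_uri: str, fetch_page: FetchPage,
                 max_members: Optional[int] = None) -> Iterator[dict]:
    """Yield list items page by page until the cursor disappears or the cap hits."""
    if max_members is not None and max_members <= 0:
        return
    cursor = None
    yielded = 0
    while True:
        page = fetch_page(list_uri, cursor)
        for item in page.get("items", []):
            yield item
            yielded += 1
            if max_members is not None and yielded >= max_members:
                return  # cap reached: stop before requesting another page
        cursor = page.get("cursor")
        if not cursor:
            return  # no cursor in the response means this was the last page


# Canned pages standing in for real getList responses.
FAKE_PAGES = {
    None: {"items": [{"subject": {"did": "did:plc:a"}},
                     {"subject": {"did": "did:plc:b"}}], "cursor": "p2"},
    "p2": {"items": [{"subject": {"did": "did:plc:c"}},
                     {"subject": {"did": "did:plc:d"}}], "cursor": "p3"},
    "p3": {"items": [{"subject": {"did": "did:plc:e"}}]},
}
calls = []


def fake_fetch(list_uri: str, cursor: Optional[str]) -> dict:
    calls.append(cursor)
    return FAKE_PAGES[cursor]


all_dids = [m["subject"]["did"] for m in iter_members("at://example/list", fake_fetch)]
calls_for_full_run = list(calls)
calls.clear()
capped = list(iter_members("at://example/list", fake_fetch, max_members=2))
calls_for_capped_run = list(calls)
print(len(all_dids), len(capped))
```

Because the cap is checked right after each yielded item, a capped run never requests a page it does not need.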

Deeply nested API responses.

The starterPack object buries key fields like name and description inside a nested record sub-object. A naive parser calling pack.get("name") would return None every time, and pack["name"] would raise a KeyError. The correct path is pack["record"]["name"]. The scraper hardcodes this structure and validates output using Pydantic to ensure consistency before writing to the dataset.
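
A small sketch of the nested lookup follows. The sample object is invented and trimmed to the fields discussed here; the `creator.handle` path is an assumption based on the API reference, and `pack_metadata` is a hypothetical helper.

```python
def pack_metadata(pack: dict) -> dict:
    """Pull pack-level fields from their nested locations, tolerating gaps."""
    record = pack.get("record") or {}
    creator = pack.get("creator") or {}
    return {
        "pack_uri": pack.get("uri"),
        "pack_name": record.get("name"),
        "pack_description": record.get("description"),
        "pack_creator_handle": creator.get("handle"),
    }


sample_pack = {
    "uri": "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
    "record": {
        "name": "AI Researchers on Bluesky",
        "description": "Curated list of ML/AI researchers who migrated from Twitter.",
    },
    "creator": {"handle": "alice.bsky.social"},
}

naive_name = sample_pack.get("name")  # None: the field is not at the top level
meta = pack_metadata(sample_pack)
print(naive_name, meta["pack_name"])
```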

Rate limit resilience.

AT Protocol doesn’t publish official rate limits, but it enforces them aggressively. The scraper implements exponential backoff for 429 (Too Many Requests) and 503 (Service Unavailable) errors, starting with a 2-second delay and doubling each attempt up to 30 seconds, with a maximum of 5 retries. It also rotates browser fingerprints (Chrome, Firefox, Safari TLS profiles) to mimic real user traffic and avoid being flagged as a bot.
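
The backoff schedule described above works out to waits of 2, 4, 8, 16, and 30 seconds (the fifth would be 32 but is capped). A minimal sketch, assuming the request is wrapped in a callable that returns a status code and body; `backoff_delays` and `call_with_retry` are hypothetical names, and fingerprint rotation is left out.

```python
import time
from typing import Callable, List, Tuple

RETRYABLE_STATUSES = {429, 503}


def backoff_delays(base: float = 2.0, cap: float = 30.0,
                   max_retries: int = 5) -> List[float]:
    """Delays that double from `base` and never exceed `cap`."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]


def call_with_retry(send: Callable[[], Tuple[int, str]],
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call `send`, retrying on 429/503 with the schedule above."""
    delays = backoff_delays()
    for attempt in range(len(delays) + 1):  # one initial try plus the retries
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < len(delays):
            sleep(delays[attempt])
    raise RuntimeError(f"gave up after {len(delays)} retries (last status {status})")


# Simulated server: two throttled responses, then success.
responses = iter([(429, ""), (503, ""), (200, "ok")])
slept: List[float] = []
result = call_with_retry(lambda: next(responses), sleep=slept.append)
print(backoff_delays(), result, slept)
```

Injecting `sleep` keeps the example instant to run and makes the waits easy to inspect.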

Validation-first data pipeline.

Every row passes through a strict validation layer using Pydantic’s ResultRow.model_validate(...) before being written to storage. If the API contract changes—say, a required field disappears—the scraper fails visibly with a clear error message instead of emitting corrupted data. No data, no charge.
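
The fail-loudly idea can be illustrated without Pydantic. The sketch below swaps in a hand-rolled check built on the standard library so it runs with no dependencies; it is a stand-in for ResultRow.model_validate(...), not a reproduction of it. Which fields count as required is an assumption, guided only by the article's note that member_display_name and member_indexed_at may be null.

```python
class RowValidationError(ValueError):
    """Raised when a row does not match the expected shape."""


# Assumption: a subset of fields treated as required, with expected types.
REQUIRED = {"pack_uri": str, "member_did": str, "member_handle": str,
            "member_followers_count": int, "scraped_at": str}
# Fields the article says may be null.
NULLABLE = {"member_display_name": str, "member_indexed_at": str}


def validate_row(row: dict) -> dict:
    """Return the row unchanged if valid; raise a descriptive error otherwise."""
    for field, expected in REQUIRED.items():
        if row.get(field) is None:
            raise RowValidationError(f"missing required field: {field}")
        if not isinstance(row[field], expected):
            raise RowValidationError(f"{field} should be {expected.__name__}")
    for field, expected in NULLABLE.items():
        value = row.get(field)
        if value is not None and not isinstance(value, expected):
            raise RowValidationError(f"{field} should be {expected.__name__} or null")
    return row


good_row = {
    "pack_uri": "at://did:plc:abc123/app.bsky.graph.starterpack/xyz789",
    "member_did": "did:plc:def456",
    "member_handle": "bob.bsky.social",
    "member_display_name": None,  # allowed: the row is kept, not dropped
    "member_followers_count": 1204,
    "scraped_at": "2026-05-16T12:00:00.000Z",
}
validated = validate_row(good_row)

broken_row = {k: v for k, v in good_row.items() if k != "member_handle"}
try:
    validate_row(broken_row)
    error_message = None
except RowValidationError as exc:
    error_message = str(exc)
print(error_message)
```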

The scraper abstracts all of this complexity into a single tool, making it possible for researchers, marketers, and analysts to turn Bluesky’s most influential audience lists into structured data—without writing a line of code.

Looking ahead, as Bluesky’s ecosystem matures, starter packs will likely become even more central to audience discovery and growth strategies. Tools that bridge the gap between curated communities and actionable data will only grow in value.

AI summary

Converting the members of Bluesky starter packs to CSV is now very easy. In this guide, learn how to collect and analyze your data in bulk.
