Twitter Scraping, Best Practices for Production in 2026

Production-grade Twitter scraping patterns: retry logic, pagination, proxy strategy, rate-limit handling, and cost optimization for any third-party API.

TwitterAPIs · Updated May 7, 2026

Twitter scraping in 2026 is a different game than it was even a year ago. The official X API moved to pay-per-use pricing ($5+ per 1,000 tweets), open-source scrapers like snscrape are mostly broken after Twitter's anti-scraping updates, and direct browser automation hits IP-level rate limits within minutes.

This guide covers the production patterns that separate a working Twitter scraper from one that gets rate-limited, blocked, or returns incomplete data. Every pattern applies whether you're using TwitterAPIs, a self-hosted scraper with rotating proxies, or any other third-party Twitter API; the principles transfer across stacks.

This is the production-scale guide, not a getting-started how-to. New to scraping? Start with how to scrape tweets or the best Twitter scraper comparison. For the scraper product and pricing, see the Twitter scraper page.

The examples use TwitterAPIs's $0.0008-per-call REST endpoints (the cheapest production-ready Twitter scraping API at $0.04 per 1,000 tweets), but the engineering patterns work with any provider.


Scraping publicly accessible Twitter/X data is generally not a federal crime in the United States. In its 2022 ruling in hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly available web data likely does not violate the Computer Fraud and Abuse Act (CFAA). That precedent is broadly applied to other public-web scraping cases.

That said, "not a CFAA violation" is not the same as "no operational friction":

| Activity | Precedent | Practical reality |
| --- | --- | --- |
| Reading public profiles, tweets, search results | Generally permitted under hiQ Labs precedent | IP blocks within hours, residential proxy costs add up |
| Behind-login content (timelines, DMs, bookmarks) | Requires authentication, different access pattern | Account-suspension exposure, browser-session fragility |
| Re-syndicating collected data | Varies by jurisdiction | Personal-data handling triggers PDPL/GDPR/CCPA review |
| Collecting for ML/AI training datasets | Currently litigated case-by-case | New territory, consult counsel for production work |

The simplest way to avoid building scraping infrastructure is to use a third-party Twitter data API (like TwitterAPIs) that runs the infrastructure layer for you. You only deal with the API provider's developer terms.

The rest of this guide focuses on the technical patterns that make Twitter scraping reliable. For legal questions about your specific use case, consult a lawyer.


1. Twitter Scraping Retry Logic with Exponential Backoff

This is rule #1 for any API integration, not just TwitterAPIs. Network blips, upstream hiccups, and rate limits happen. If you don't retry, you lose data. If you retry too aggressively, you make things worse.

Which errors to retry

| Status Code | Meaning | Retry? |
| --- | --- | --- |
| 200 | Success | No (you're done) |
| 400 | Bad request (invalid params) | No, fix your request |
| 401 | Invalid API key or auth_token | No, fix your credentials |
| 404 | User/tweet doesn't exist | No, it's gone |
| 429 | Rate limit exceeded | Yes, wait and retry |
| 502 | Bad gateway (upstream issue) | Yes, wait and retry |
| 503 | Service temporarily unavailable | Yes, wait and retry |

[Figure: Retry logic with exponential backoff]

Why jitter matters

Without jitter, if 100 clients hit a rate limit at the same time, they all retry at exactly the same moment, creating a "thundering herd" that makes the problem worse. Adding a random 0-1 second delay spreads the retries out.

Don't retry everything

This is a common mistake. Retrying a 401 (bad API key) 3 times just wastes 3 API calls. Retrying a 404 (deleted tweet) won't bring it back. Only retry transient errors: 429, 502, 503, and network timeouts.
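The retry rules above can be sketched in a few lines. This is a minimal illustration, not the provider's own client: `send` is a hypothetical zero-argument callable that performs one HTTP request and returns `(status_code, parsed_body)`, and it is assumed to translate your HTTP client's network errors into the built-in `TimeoutError` or `ConnectionError`. The base delay, cap, and attempt count are illustrative defaults.

```python
import random
import time
from typing import Callable

RETRYABLE_STATUS = {429, 502, 503}  # transient statuses from the table above


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential delay (base * 2**attempt, capped) plus 0-1 s of random jitter."""
    return min(base * 2 ** attempt, cap) + random.uniform(0.0, 1.0)


def call_with_retry(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it returns 200, retrying only transient failures.

    send is a hypothetical callable (see the lead-in); it is not part of
    any real SDK.
    """
    for attempt in range(max_attempts):
        try:
            status, body = send()
        except (TimeoutError, ConnectionError):
            status, body = None, {}  # network blip: treat as transient
        if status == 200:
            return body
        if status is not None and status not in RETRYABLE_STATUS:
            # 400/401/404 and similar: retrying cannot fix the request.
            raise ValueError(f"Non-retryable status {status}")
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))  # no sleep after the final attempt
    raise RuntimeError(f"Gave up after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function easy to test: a unit test can substitute `list.append` and inspect the delays without actually waiting.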


2. Proxy Strategy for Twitter Scraping (Write Endpoints)

Write endpoints like Create Tweet and DM Send execute actions on Twitter using your auth_token. By default, these requests originate from TwitterAPIs's servers, which means Twitter sees TwitterAPIs's IP, not yours.

For higher reliability and to avoid detection patterns, pass your own proxy so the request appears to come from your IP or a residential proxy.

[Figure: Proxy architecture for read vs write endpoints]

Which endpoint supports proxy

Currently only POST /twitter/tweet/create supports the proxy parameter. Pass your residential proxy URL in the request body so the tweet is posted from your IP instead of TwitterAPIs's servers.
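As a rough sketch of what passing the proxy in the request body might look like, the helper below validates a proxy URL and builds a body for POST /twitter/tweet/create. Only the existence of a proxy parameter and the auth_token come from this guide; the exact JSON field names, in particular `tweet_text`, are assumptions for illustration, so check the API documentation for the real schema.

```python
from urllib.parse import urlparse

ALLOWED_PROXY_SCHEMES = {"http", "https", "socks5"}  # assumed set, adjust as needed


def build_create_tweet_body(text: str, auth_token: str, proxy_url: str) -> dict:
    """Build a request body for tweet/create that routes through a proxy.

    Field names are hypothetical ("tweet_text" especially); only the
    presence of a proxy parameter is stated in this guide.
    """
    parsed = urlparse(proxy_url)
    if parsed.scheme not in ALLOWED_PROXY_SCHEMES or not parsed.hostname:
        raise ValueError("proxy_url must look like scheme://user:pass@host:port")
    return {"tweet_text": text, "auth_token": auth_token, "proxy": proxy_url}
```

Validating the proxy URL locally catches a malformed value before it costs an API call or, worse, silently falls back to posting without the proxy.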

Proxy best practices

  1. Use residential proxies; datacenter IPs get flagged faster
  2. Rotate proxies if you're posting from multiple accounts
  3. Match geography: if your Twitter account is based in the US, use a US proxy
  4. Test before scaling: verify your proxy works with a single tweet before running bulk operations
  5. Never share proxies across accounts that shouldn't be linked

When you don't need a proxy

Read endpoints (search, user info, followers, etc.) don't need proxies. They fetch public data and don't write to any account. Save your proxy budget for write operations only.


3. Pagination Patterns for Twitter Scrapers

Most TwitterAPIs endpoints return roughly 20 results per call and use cursor-based pagination: each response includes a next_cursor string and a has_more boolean, and you pass next_cursor back as the cursor parameter on the next call. Stopping when has_more is false is the only correct termination condition; hardcoding a page count will silently truncate results on variable-size datasets.

[Figure: Cursor-based pagination flow]

Which endpoints support pagination

| Endpoint | Results per Page | Cursor Field |
| --- | --- | --- |
| tweet/advanced_search | ~20 tweets | next_cursor |
| tweet/replies | ~20 replies | next_cursor |
| user/search | ~20 users | next_cursor |
| user/followers | up to 200 | next_cursor |
| user/followers_v2 | ~70 | next_cursor |
| user/following | up to 200 | next_cursor |
| user/following_v2 | ~70 | next_cursor |
| user/verified_followers | ~20 | next_cursor |
| user/media | ~20 posts | next_cursor |
| user/tweets | ~20 tweets | next_cursor |
| user/tweets_and_replies | ~20 tweets | next_cursor |
| user/likes | ~20 tweets | next_cursor |
| user/home_timeline | ~20 tweets | next_cursor |
| user/bookmark_search | ~20 tweets | next_cursor |
| user/followers_you_know | ~20 | next_cursor |
| list/members | ~20 members | next_cursor |

Advanced Search pagination

Pass cursor=<next_cursor> from the previous response to fetch the next page. Consecutive pages have been verified clean (no duplicate tweet IDs, and results descend monotonically by snowflake ID), so you can rely on cursor pagination for deep pulls without the duplicate-results issue that affected this endpoint earlier in 2026.
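If you want to re-check that property on your own queries, a small sanity check over the pages you have collected is enough. The sketch below assumes each tweet dict carries its ID under an "id" key (as in the code samples in this guide) and that results were requested newest-first; `check_pages` is an illustrative helper name, not part of any SDK.

```python
def check_pages(pages: list[list[dict]]) -> None:
    """Raise ValueError if tweet IDs repeat or stop descending across pages."""
    seen: set[int] = set()
    previous = None
    for page_number, page in enumerate(pages, start=1):
        for tweet in page:
            tweet_id = int(tweet["id"])  # compare snowflake IDs as integers
            if tweet_id in seen:
                raise ValueError(f"duplicate id {tweet_id} on page {page_number}")
            if previous is not None and tweet_id >= previous:
                raise ValueError(f"id {tweet_id} out of order on page {page_number}")
            seen.add(tweet_id)
            previous = tweet_id
```

IDs are converted to `int` before comparison because string comparison would order "9" after "10".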

For very deep pulls (50+ pages on a high-volume query), it can still be cheaper and more parallelizable to split the query into date-range chunks using since: and until: operators instead of relying on a single deep cursor chain:

  • q=AI lang:en since:2026-01-01 until:2026-01-07
  • q=AI lang:en since:2026-01-07 until:2026-01-14
  • q=AI lang:en since:2026-01-14 until:2026-01-21
  • ...and so on

Each chunk gets its own fresh cursor chain. This is also useful when you want to parallelize across workers: different chunks can be fetched concurrently from different processes.

If results are changing by the minute or second (e.g., trending topics, breaking news), add time precision to the since: and until: operators:

  • q=from:elonmusk since:2026-01-01_12:00:00_UTC until:2026-01-01_18:00:00_UTC

This gives you hourly or even minute-level control over which tweets you fetch.
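Generating the chunked queries by hand gets tedious for long ranges. A minimal sketch of a generator follows; `date_chunks` is an illustrative helper, and it assumes since: is inclusive and until: is exclusive, so adjacent windows can share a boundary date without overlapping. If your results show otherwise, deduplicate by tweet ID.

```python
from datetime import date, timedelta


def date_chunks(base_query: str, start: date, end: date, days: int = 7) -> list[str]:
    """Split [start, end) into consecutive windows, one search query per window."""
    if days < 1:
        raise ValueError("days must be at least 1")
    queries = []
    window_start = start
    while window_start < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(window_start + timedelta(days=days), end)
        queries.append(
            f"{base_query} since:{window_start.isoformat()} until:{window_end.isoformat()}"
        )
        window_start = window_end
    return queries


for q in date_chunks("AI lang:en", date(2026, 1, 1), date(2026, 1, 15)):
    print(q)
# AI lang:en since:2026-01-01 until:2026-01-08
# AI lang:en since:2026-01-08 until:2026-01-15
```

Each returned query can be handed to a separate worker, since every chunk starts its own cursor chain.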

[Figure: Date range chunking for Advanced Search]

For a full reference of all Advanced Search operators (from:, to:, min_faves:, filter:, lang:, etc.), see twitter-advanced-search on GitHub.

Pagination mistakes to avoid

  1. Don't ignore has_more; always check it. If you just check next_cursor, you might make one extra unnecessary call.
  2. Don't hardcode page counts: use has_more as the stop condition, but set a maxPages safety limit.
  3. Add a delay between pages if you're paginating aggressively (e.g., 200ms between calls) to avoid hitting rate limits.
  4. Store cursors if your job might crash mid-pagination, so you can resume from where you left off instead of starting over.
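The four rules above fit into one small generator. This is a sketch under stated assumptions: `fetch_page` is a hypothetical callable that takes a cursor (or None for the first page) and returns a dict with "tweets", "has_more", and "next_cursor"; the checkpoint is a local JSON file, which you would likely swap for a database row in a real pipeline.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    checkpoint: Path,
    max_pages: int = 50,
    delay: float = 0.2,
) -> Iterator[dict]:
    """Yield items page by page, saving the cursor after each page for resume."""
    # Resume from a saved cursor if a previous run left one behind.
    cursor = json.loads(checkpoint.read_text())["cursor"] if checkpoint.exists() else None
    for _ in range(max_pages):  # safety limit, not the normal stop condition
        data = fetch_page(cursor)
        yield from data.get("tweets", [])
        if not data.get("has_more") or not data.get("next_cursor"):
            checkpoint.unlink(missing_ok=True)  # finished: nothing to resume
            return
        cursor = data["next_cursor"]
        checkpoint.write_text(json.dumps({"cursor": cursor}))
        time.sleep(delay)  # brief pause between pages
```

The cursor is written only after a page has been fully yielded, so a crash mid-page replays that page on resume. That gives at-least-once delivery, which is why pairing this with ID-based deduplication is a sensible default.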

4. Choose the Right Endpoint for Each Twitter Scraping Task

TwitterAPIs has 52+ endpoints and several pairs look similar but serve different purposes. Using the wrong endpoint costs the same but returns incomplete data: user/info gives basic profile fields while user/about adds creation date and username history; user/followers returns 200 per page for bulk export while user/followers_v2 returns 70 per page but includes richer DM-eligibility signals.

User Info vs User About

| | user/info | user/about |
| --- | --- | --- |
| Basic profile | Yes | Yes |
| Extended metadata | No | Yes (creation date, location, username history) |
| Cost | $0.0008 | $0.0008 |

Rule of thumb: Use user/info for quick lookups (name, bio, follower count). Use user/about when you need full account history.



5. Auth Tokens for Twitter Scraping (Write & Private Endpoints)

A subset of TwitterAPIs endpoints require an auth_token, a Twitter session token tied to a specific logged-in account. This is separate from your TwitterAPIs API key. Write endpoints (tweet create, follow, like) and private-data endpoints (home timeline, bookmarks) all need it. You get the token either by extracting the auth_token cookie from a logged-in browser session or by calling POST /twitter/user_login with your credentials.

[Figure: Auth token flow, two ways to get and use tokens]

Which endpoints need auth_token

| Endpoint | Needs auth_token | Why |
| --- | --- | --- |
| tweet/create | Yes | Posts as a specific user |
| tweet/favorite | Yes | Likes as a specific user |
| tweet/retweet | Yes | Retweets as a specific user |
| user/home_timeline | Yes | User's personalized timeline |
| user/bookmark_search | Yes | User's private bookmarks |
| user/likes | Yes | User's liked tweets |
| user/followers_you_know | Yes | Mutual followers context |

Token handling best practices

  1. Never log auth tokens; treat them like passwords
  2. Store tokens in environment variables, not in code
  3. Tokens expire: if you get a 401, re-authenticate
  4. One token per account: don't share tokens across different Twitter accounts
  5. TwitterAPIs never stores your tokens; they're used in-flight and discarded
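The first two rules are easy to enforce in code. The sketch below loads the token from the environment and provides a log-safe representation; the variable name `TWITTER_AUTH_TOKEN` and both helper names are illustrative choices, not something the API prescribes.

```python
import os


def load_auth_token(env_var: str = "TWITTER_AUTH_TOKEN") -> str:
    """Read the session token from the environment; fail loudly if missing."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"{env_var} is not set")
    return token


def redact(token: str, visible: int = 4) -> str:
    """Return a log-safe form that keeps only the last few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

Logging `redact(token)` instead of the token itself still lets you tell two accounts apart in logs without exposing a usable credential.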

6. Cost Optimization for Twitter Scraping

Every API call costs $0.0008 and returns roughly 20 tweets. The six most impactful cost-reduction practices are: caching tweet IDs and user profiles to avoid re-fetching, using user/followers v1 (200/page) instead of v2 (70/page) for bulk exports, narrowing searches with operators like min_faves:100 so fewer low-signal tweets consume page slots, stopping pagination once you have what you need, batching multi-account jobs to share pagination state, and never calling tweet/detail for tweets already returned by advanced_search.

  1. Don't re-fetch data you already have. Cache tweet IDs and user profiles locally. Check your cache before making an API call.

  2. Use tweet/detail sparingly. If you already got tweet data from advanced_search, don't call tweet/detail for the same tweet.

  3. Use v1 followers for bulk, v2 for DM outreach. v1 returns 200/page vs v2's 70/page, so you make fewer calls for the same follower list.

  4. Use search operators to narrow results. min_faves:100 filters out low-engagement tweets before they consume a page slot.

  5. Paginate with a purpose. If you only need the first 100 tweets, set maxPages = 5. Don't paginate to the end unless you need everything.

  6. Batch your work. Instead of checking one user at a time, design your pipeline to process users in batches with shared pagination state.

Cost math at scale

| Volume | API Calls | Cost | What You Get |
| --- | --- | --- | --- |
| 1K tweets | 50 calls | $0.04 | Quick analysis |
| 10K tweets | 500 calls | $0.40 | Small dataset |
| 100K tweets | 5,000 calls | $4.00 | Research project |
| 1M tweets | 50,000 calls | $40.00 | Full-scale pipeline |
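The table follows directly from the per-call price and the approximate page size, so it is easy to recompute for your own volumes. The helper below is a back-of-the-envelope sketch: it uses the $0.0008 price and ~20 results per call quoted in this guide, and real page sizes vary by endpoint and query.

```python
import math
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.0008")  # per-call price quoted in this guide
TWEETS_PER_CALL = 20                # approximate page size; varies by endpoint


def estimate_cost(items: int, per_call: int = TWEETS_PER_CALL) -> tuple[int, Decimal]:
    """Return (calls needed, cost in USD) for a target number of items."""
    calls = math.ceil(items / per_call)
    return calls, calls * PRICE_PER_CALL


calls, cost = estimate_cost(1_000_000)
print(calls, cost)  # 50000 40.0000
```

The same function shows why page size matters for follower exports: `estimate_cost(100_000, per_call=200)` needs 500 calls, while `estimate_cost(100_000, per_call=70)` needs 1,429. `Decimal` is used so the dollar amounts do not pick up floating-point noise.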

Twitter Scraping in Python

Python with requests against TwitterAPIs is the recommended stack for Twitter scraping in 2026. It costs $0.0008 per call with no OAuth setup, works with any Python HTTP client, and returns all fields inline without field expansion parameters. The alternatives all have material tradeoffs: tweepy adds OAuth complexity only needed for user-delegated flows, snscrape is largely broken after Twitter's 2023-2026 anti-scraping updates, and browser automation hits IP bans within hours.

| Tool | When to use it | Reality check |
| --- | --- | --- |
| requests + TwitterAPIs | Production scraping at any scale | Single Bearer header, no OAuth, $0.0008 per call returning ~20 tweets |
| tweepy + Official X API | When you need OAuth user-delegated flows | Rare for scraping; pay-per-use makes it ~100x more expensive than TwitterAPIs |
| snscrape | Historical projects only | Largely broken in 2026, most endpoints fail after Twitter's anti-scraping updates |
| Self-hosted browser automation (Selenium / Playwright) | Edge cases not covered by APIs | IP bans within hours, proxy costs often exceed third-party API pricing |

For a complete Python tutorial covering search, user profiles, followers, replies, DMs, pagination, retries, async patterns with httpx, a tweepy migration guide, and a drop-in SDK class, see How to Use the Twitter API with Python, 2026 Tutorial.

A minimal Twitter scraping loop in Python with retry + pagination:

import os
import random
import time
import requests

API_KEY = os.environ["TWITTERAPIS_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
URL = "https://api.twitterapis.com/twitter/tweet/advanced_search"

def scrape_tweets(query: str, max_pages: int = 10) -> list[dict]:
    all_tweets = []
    cursor = None

    for _ in range(max_pages):  # safety cap; has_more is the real stop condition
        params = {"q": query, "product": "Latest"}
        if cursor:
            params["cursor"] = cursor

        for attempt in range(3):
            r = requests.get(URL, params=params, headers=HEADERS, timeout=15)
            if r.status_code == 200:
                break
            if r.status_code in (429, 502, 503):
                # exponential backoff plus 0-1 s of jitter
                time.sleep(2 ** attempt + random.random())
                continue
            r.raise_for_status()  # 400/401/404: fail fast, don't retry
        else:
            # every attempt hit a transient error; don't parse an error body
            raise RuntimeError(f"Gave up after 3 attempts (last status {r.status_code})")

        data = r.json()
        all_tweets.extend(data.get("tweets", []))

        if not data.get("has_more"):
            break
        cursor = data.get("next_cursor")

    return all_tweets

tweets = scrape_tweets("AI min_faves:100 lang:en since:2026-01-01")
print(f"Scraped {len(tweets)} tweets")

That's the entire production loop: pagination, retry, and backoff in under 40 lines.


Quick Reference Cheat Sheet

Five production rules for Twitter scraping with TwitterAPIs: retry only on 429, 502, and 503 (never on 400/401/404); use proxies only for write endpoints, not reads; check has_more and next_cursor for pagination rather than hardcoding page counts; store auth tokens in environment variables and rotate on a 401 response; and cache results to avoid re-fetching data you already have.

| Practice | Do | Don't |
| --- | --- | --- |
| Retry logic | Retry 429, 502, 503 with backoff | Retry 400, 401, 404 |
| Proxy | Use for write endpoints (currently only tweet/create accepts the proxy parameter) | Use for read endpoints |
| Pagination | Check has_more + next_cursor | Hardcode page counts |
| Auth token | Store in env vars, rotate on 401 | Hardcode in source |
| Cost | Cache results, use search operators | Re-fetch data you already have |

Start Scraping Twitter the Right Way

TwitterAPIs gives you $0.50 in free credits at signup: ~625 API calls (~12,500 tweets) with no credit card. That is enough to test every pattern in this guide and build a working scraper before committing.

  1. Sign up at twitterapis.com
  2. Get your API key from the dashboard
  3. Read the full API documentation for endpoint-specific parameters and response schemas

For deeper context, see Twitter API v2 vs TwitterAPIs, the Twitter API cost guide, our Python Twitter API tutorial, and the Twitter advanced search operators guide for query construction patterns that pair with the scraping loops in this guide.



7. Deduplication at Scale

At high volume, the same tweet or user record can appear across multiple paginated responses or parallel workers. Without deduplication, you overcount engagement, inflate user stats, and store redundant rows. The simplest approach is a seen-IDs set in memory or a Redis SET for multi-worker jobs.

import os
import requests

API_KEY = os.environ["TWITTERAPIS_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def scrape_unique(query: str, max_pages: int = 20) -> list[dict]:
    """Paginate search and deduplicate by tweet ID."""
    seen_ids: set[str] = set()
    unique_tweets: list[dict] = []
    cursor = None

    for _ in range(max_pages):
        params = {"q": query, "product": "Latest"}
        if cursor:
            params["cursor"] = cursor

        r = requests.get(
            "https://api.twitterapis.com/twitter/tweet/advanced_search",
            params=params,
            headers=HEADERS,
            timeout=15,
        )
        r.raise_for_status()
        data = r.json()

        for tweet in data.get("tweets", []):
            tid = tweet.get("id")
            if tid and tid not in seen_ids:
                seen_ids.add(tid)
                unique_tweets.append(tweet)

        if not data.get("has_more"):
            break
        cursor = data.get("next_cursor")

    return unique_tweets

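For parallel workers, replace seen_ids with a shared Redis set so IDs are visible across processes. SADD returns 1 only when the member is new, so a single SADD doubles as an atomic check-and-insert; a separate SISMEMBER followed by SADD leaves a window in which two workers can both treat the same ID as new.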
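A minimal sketch of the multi-worker variant follows. It assumes `client` behaves like a redis-py client, whose `sadd(key, value)` returns the number of members newly added (1 for a new ID, 0 for one already present); the key name `seen_tweet_ids` and the helper name are illustrative.

```python
def filter_new(client, tweets: list[dict], key: str = "seen_tweet_ids") -> list[dict]:
    """Keep only tweets whose ID no worker has recorded yet.

    client is assumed to expose a redis-py style sadd(key, value) that
    returns 1 for a newly added member and 0 for an existing one.
    """
    fresh = []
    for tweet in tweets:
        tweet_id = tweet.get("id")
        # One SADD both records the ID and reports whether it was new.
        if tweet_id and client.sadd(key, tweet_id) == 1:
            fresh.append(tweet)
    return fresh
```

Because the check and the insert happen in one Redis command, two workers that fetch the same tweet at the same moment cannot both keep it.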


8. Caching User Profiles to Save Credits

User profile data changes slowly: follower counts, bios, and verification status shift daily, not by the minute. If you re-fetch a profile on every tweet you encounter, you pay $0.0008 per lookup when you could pay once per cache window (the example below uses a one-hour TTL).

import os
import time
import requests

API_KEY = os.environ["TWITTERAPIS_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# username -> (profile, fetch time as a Unix timestamp)
_profile_cache: dict[str, tuple[dict, float]] = {}
PROFILE_TTL = 3600  # 1 hour; raise it if slightly stale profiles are acceptable

def get_user_cached(username: str) -> dict:
    """Fetch user profile, cached for PROFILE_TTL seconds."""
    entry = _profile_cache.get(username)
    if entry:
        data, fetched_at = entry
        if time.time() - fetched_at < PROFILE_TTL:
            return data  # cache hit: no API call, no cost

    r = requests.get(
        "https://api.twitterapis.com/twitter/user/info",
        params={"userName": username},
        headers=HEADERS,
        timeout=15,
    )
    r.raise_for_status()
    profile = r.json()["data"]
    _profile_cache[username] = (profile, time.time())
    return profile

For production, replace the in-process dict with Redis or Memcached and set the TTL at the cache layer. As an illustration, a team doing influencer enrichment that encounters 100,000 tweet authors per day, many of them repeatedly, can go from 100,000 user lookups to under 20,000 per day just by adding a 12-hour cache layer: a 5x reduction in credit spend with no change to data freshness requirements. The actual saving depends on how often the same authors recur.


9. Endpoint Selection Quick Reference

TwitterAPIs has 52+ endpoints, and the most common selection mistakes cost credits without returning the right data. The six highest-impact rules: use tweet/advanced_search for keyword searches rather than looping user/tweets across accounts; use user/tweets for one user's recent tweets rather than a from:user search; use user/followers v1 for bulk export (200/page); use user/followers_v2 for DM outreach (richer DM signals); use user/about when you need account creation date or username history rather than the lighter user/info; and use tweet/replies for replies to a specific tweet rather than a conversation_id: search.

| Task | Correct endpoint | Common mistake |
| --- | --- | --- |
| Search tweets by keyword | tweet/advanced_search | Using user/tweets and looping multiple accounts |
| Get one user's recent tweets | user/tweets | Running a from:user search query |
| Bulk follower export | user/followers (v1, 200/page) | Using user/followers_v2 (70/page, costs the same) |
| DM outreach list | user/followers_v2 (DM metadata richer) | Using v1 and missing DM-eligibility signals |
| Full account history | user/about | Using user/info (missing creation date, history) |
| Replies to a specific tweet | tweet/replies | Search conversation_id:ID (less complete) |

For a side-by-side endpoint map against the official X API, see the Twitter API v2 vs TwitterAPIs comparison.


10. Error Classification and Alerting

Production scrapers should classify errors into three buckets: retryable transient errors, permanent request errors, and account-level errors. Mixing them causes either silent data loss (not retrying when you should) or wasted API budget (retrying when you should not).

import os
import time
import logging
import requests

API_KEY = os.environ["TWITTERAPIS_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

RETRYABLE = {429, 500, 502, 503, 504}  # transient: retry with backoff
PERMANENT = {400, 404, 422}            # bad request or missing resource: fix or skip
ACCOUNT = {401, 403}                   # key or account problem: alert, don't retry

logger = logging.getLogger(__name__)


def classified_fetch(url: str, params: dict, max_retries: int = 3) -> dict:
    """
    Classify HTTP errors into three buckets and retry only transient ones.
    Raises on permanent and account-level errors. Returns data on success.
    """
    delay = 1.0
    for attempt in range(max_retries):
        r = requests.get(url, params=params, headers=HEADERS, timeout=15)

        if r.status_code == 200:
            return r.json()

        if r.status_code in ACCOUNT:
            logger.critical("Account-level error %s on %s: %s", r.status_code, url, r.text[:200])
            r.raise_for_status()  # don't retry; hook alerting to this log line

        if r.status_code in PERMANENT:
            logger.error("Permanent error %s on %s: %s", r.status_code, url, r.text[:200])
            r.raise_for_status()  # don't retry

        if r.status_code in RETRYABLE:
            try:
                wait = float(r.headers.get("Retry-After", delay))
            except ValueError:
                wait = delay  # Retry-After can be an HTTP date; fall back to backoff
            logger.warning("Transient %s on attempt %d, retrying in %.1fs", r.status_code, attempt + 1, wait)
            time.sleep(wait)
            delay = min(delay * 2, 60)  # cap at 60s
            continue

        # Unknown status: log and raise
        logger.error("Unexpected status %s: %s", r.status_code, r.text[:200])
        r.raise_for_status()

    raise RuntimeError(f"Exhausted {max_retries} retries for {url}")

This pattern is safe to wrap around any TwitterAPIs endpoint. The key insight is that 401 (bad API key) and 404 (user/tweet not found) should never be retried, while 429 and 502 almost always resolve on the next attempt. Add alerting on permanent errors to catch key expiry or account suspension before they silently kill your data pipeline.


11. Complete Production Scraper Template

The TwitterAPIsScraper class below combines the core patterns from this guide into one reusable template: exponential-backoff retry on transient errors, cursor-based pagination with has_more termination, tweet-ID deduplication via a seen-IDs set, per-user profile caching with lru_cache, and fail-fast handling of non-retryable statuses. Drop it into any project as a starting point.

import os
import time
import requests
from functools import lru_cache

class TwitterAPIsScraper:
    BASE = "https://api.twitterapis.com"
    RETRYABLE = {429, 500, 502, 503, 504}

    def __init__(self):
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {os.environ['TWITTERAPIS_KEY']}"
        self._seen_tweet_ids: set[str] = set()

    def _get(self, path: str, params: dict, max_retries: int = 3) -> dict:
        delay = 1.0
        for attempt in range(max_retries):
            r = self.session.get(f"{self.BASE}{path}", params=params, timeout=15)
            if r.status_code == 200:
                return r.json()
            if r.status_code in self.RETRYABLE:
                time.sleep(min(delay * 2 ** attempt, 60))
                continue
            r.raise_for_status()
        raise RuntimeError(f"Exhausted retries: {path}")

    def search(self, query: str, max_pages: int = 10) -> list[dict]:
        """Paginate search with deduplication."""
        results, cursor = [], None
        for _ in range(max_pages):
            params = {"q": query, "product": "Latest"}
            if cursor:
                params["cursor"] = cursor
            data = self._get("/twitter/tweet/advanced_search", params)
            for t in data.get("tweets", []):
                if t["id"] not in self._seen_tweet_ids:
                    self._seen_tweet_ids.add(t["id"])
                    results.append(t)
            if not data.get("has_more"):
                break
            cursor = data.get("next_cursor")
        return results

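    # lru_cache on a method keys on (self, username) and never expires entries,
    # so it suits short-lived jobs; use a TTL cache for long-running workers.
    @lru_cache(maxsize=1000)
    def user(self, username: str) -> dict:
        """Cached user profile lookup."""
        return self._get("/twitter/user/info", {"userName": username})["data"]

# Usage
scraper = TwitterAPIsScraper()
tweets = scraper.search("AI min_faves:500 lang:en since:2026-01-01", max_pages=5)
print(f"{len(tweets)} unique tweets")
if tweets:  # the search may legitimately return nothing
    author_profile = scraper.user(tweets[0]["author"]["userName"])
    print(f"First author: {author_profile['name']}")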

For sentiment analysis on top of this data collection pattern, see the Twitter sentiment analysis guide. For the full operator syntax that goes into the query parameter, see the Twitter search operators reference.



12. Environment Setup and Key Rotation

Production API keys must never be hardcoded in source. Store your TWITTERAPIS_KEY in environment variables or a secrets manager, verify the key works with a lightweight test call before deploying, rotate it on any 401 response or if the key is exposed in logs or version control, and never commit a .env file containing credentials to git.

# Set in shell profile or .env file (never commit .env to git)
export TWITTERAPIS_KEY="your-api-key-here"

# Verify the key works before deploying
python3 -c "
import os, requests
r = requests.get(
    'https://api.twitterapis.com/twitter/user/info',
    params={'userName': 'elonmusk'},
    headers={'Authorization': f\"Bearer {os.environ['TWITTERAPIS_KEY']}\"},
    timeout=10
)
print('OK' if r.status_code == 200 else f'ERROR: {r.status_code}')
"

Key rotation checklist:

  1. Generate a new key from the TwitterAPIs dashboard
  2. Deploy the new key to all environments (staging first, then production)
  3. Verify the new key works in production via a test call
  4. Revoke the old key from the dashboard
  5. Update any monitoring alerts that check for 401 responses (a spike indicates a key rotation gap)

If you use a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager), store the key there and inject it at runtime via the secrets manager SDK rather than environment variables. This enables centralized rotation without a redeploy.

For the complete guide to setting up a production Python scraper from scratch, including the SDK wrapper class, async patterns, and tweepy migration guide, see the Python Twitter API tutorial. For advanced search query patterns that feed into the q parameter throughout this guide, see the Twitter search operators reference. For a full breakdown of what each API call costs at your volume, including the owned-read vs standard-read distinction and the real monthly bills at light, medium, heavy, and enterprise volumes, see the Twitter API cost guide.

Key rotation should happen quarterly at minimum, or immediately whenever a key is shared outside the team, committed to source control by accident, or exposed in logs. A practical rotation policy: set a calendar reminder for the 1st of every quarter, generate a new key, deploy it to all environments in a rolling window starting with development, verify in production, then revoke the old key. Keep the last two keys active for a 24-hour overlap window during rotation to avoid downtime if a stale key is cached somewhere in your stack.

Document each rotation event in your ops runbook with the date, reason, and who completed it. A completed rotation that is not documented creates ambiguity the next time you need to rotate under pressure.


Best-practice patterns verified against live TwitterAPIs endpoints May 2026. hiQ Labs v. LinkedIn precedent sourced from the Ninth Circuit opinion (2022). Rate-limit and pricing data from the official X API pricing page as of May 2026.

Frequently Asked Questions

Scraping public Twitter/X data is generally not a federal crime in the US under *hiQ Labs v. LinkedIn* precedent (the Ninth Circuit ruled scraping public web data isn't a CFAA violation). The simpler path for most production use cases is a third-party Twitter data API (like TwitterAPIs) that runs the infrastructure layer for you. For specific legal questions about your use case, consult a lawyer.

Scraping Twitter without an API is technically possible: browser automation with Puppeteer or Playwright can scrape Twitter's web UI. In practice, it's increasingly unreliable in 2026: Twitter's anti-scraping defenses detect headless browsers, fingerprint requests, and rate-limit by IP within hours. Self-hosted scrapers also incur rotating residential-proxy costs ($5-$15 per GB) that frequently exceed third-party API pricing for the same data volume.

Direct browser scraping hits per-IP rate limits within minutes. The official X API enforces 15-minute and 24-hour windows per endpoint with `429 Too Many Requests` responses. TwitterAPIs has no platform-level rate caps for normal-volume workloads; see the [Twitter API rate limits comparison](/twitter-api-rate-limits) for endpoint-by-endpoint detail.

A Twitter API (official or third-party) returns structured JSON via documented HTTP endpoints; the provider handles auth, retries, anti-bot defenses, and rate limits on your behalf. Scraping refers to extracting data directly from the rendered Twitter web UI using browser automation or HTML parsing. APIs are far more reliable; scraping breaks every time Twitter ships a UI change.

For most production workloads, **TwitterAPIs** is the most cost-effective at $0.04 per 1,000 tweets ($0.0008 per call returning ~20 tweets), about 100x cheaper than the official X API standard read rate. Open-source tools like `snscrape` are largely broken in 2026 due to Twitter's anti-scraping updates. Self-hosted browser automation (Selenium, Playwright) hits IP-level rate limits within hours and the rotating-proxy costs often exceed third-party API pricing.

Costs vary significantly by approach: $5-$10 per 1,000 tweets on the official X API standard read rate, $0.04 per 1,000 tweets on TwitterAPIs, $0.15 per 1,000 on twitterapi.io, $0.25-$0.40 per 1,000 on Apify scrapers, and $0 plus proxy costs for self-hosted scrapers (which typically work out to $1-$5 per 1,000 tweets in practice). See the [Twitter API cost guide](/blogs/twitter-api-cost) for the full pricing breakdown.

snscrape largely no longer works in 2026. The maintainers paused active development in 2023, and most endpoints (search, user timelines, followers) are unreliable or fully broken after Twitter's tightened anti-scraping defenses. For working Python alternatives, use `requests` against TwitterAPIs; see our [Python Twitter API tutorial](/blogs/python-twitter-api-tutorial) for working code.
