Twitter Sentiment Analysis in Python, Full Tutorial
Step-by-step Python tutorial for Twitter sentiment analysis. Compare TextBlob, VADER, and transformer models on real tweets. Code, costs, and pitfalls covered.

Most "Twitter sentiment analysis" tutorials online were written before the free Twitter API died, and they have not aged well. They tell you to install tweepy, point you at endpoints that now cost real money, and lean on nltk.sentiment examples that quietly mislabel half of modern Twitter copy because they were trained on movie reviews. The good news is that doing this properly in 2026 is not hard: you need one API key, three Python libraries, and about 50 lines of code per pass.
This tutorial walks through a complete Twitter sentiment-analysis pipeline in Python: pull recent tweets via the TwitterAPIs Twitter API (cheaper than the official X API, no developer-account approval), score them with three different methods (TextBlob, VADER, and a transformer model), compare accuracy on real tweets, and visualize the results. Every section has runnable code, and the full pipeline costs about $0.50 to score 10,000 tweets end-to-end.
TL;DR, the 3-step pipeline
Every Twitter sentiment-analysis workflow runs the same three steps, listed below. The hard part is picking the right model: TextBlob is fast but only about 60-65% accurate on social-media text; VADER was built for social media and lands at roughly 75-85% on modern tweets; and RoBERTa scores 75-80% on the TweetEval benchmark while being 100-1,000x slower than VADER. These figures come from different test sets, so they are not directly comparable; on the same tweets, RoBERTa is the most accurate of the three.
- Fetch tweets matching a query (brand name, hashtag, keyword) via the Twitter API.
- Score each tweet as positive, negative, or neutral using a sentiment model.
- Aggregate and visualize the scores: counts, time series, score distribution.
The methods covered below trade off speed and accuracy:
| Method | Speed | Accuracy on Twitter | Cost | Best for |
|---|---|---|---|---|
| TextBlob | ⚡⚡⚡ Very fast | ⭐⭐ OK | Free | Quick prototyping, dashboards where speed matters |
| VADER | ⚡⚡⚡ Very fast | ⭐⭐⭐ Good | Free | Default for social-media text (handles emojis, slang) |
| RoBERTa (Hugging Face) | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | Free model, compute cost | Production brand monitoring, research |
Skip to whichever section matches your need; the scoring sections (2a, 2b, 2c) are independent of one another and only require the tweets fetched in Step 1.
Step 1, Fetch tweets via the Twitter API
The fastest path to a Twitter API key in 2026 is to skip the official X developer console (multi-day approval queue, OAuth 1.0a, four-credential setup) and use a pay-per-use third-party API. TwitterAPIs gives you a Bearer token in 30 seconds, $0.50 in free credits at signup (about 12,500 tweets), and a single REST endpoint for tweet search.
Install the only dependency you need for fetching:
pip install requests
Then this is the entire data-fetch script:
import requests
import json
from typing import List, Dict

TWITTERAPIS_BASE = "https://api.twitterapis.com"
TWITTERAPIS_TOKEN = "YOUR_TWITTERAPIS_TOKEN"  # get one at twitterapis.com
HEADERS = {"Authorization": f"Bearer {TWITTERAPIS_TOKEN}"}


def search_tweets(query: str, max_tweets: int = 200) -> List[Dict]:
    """
    Fetch tweets matching `query`. Handles pagination via the cursor.
    Returns a list of tweet dicts with id, text, author, and created_at.
    """
    tweets: List[Dict] = []
    cursor = None
    while len(tweets) < max_tweets:
        params = {"query": query, "queryType": "Latest"}
        if cursor:
            params["cursor"] = cursor
        r = requests.get(
            f"{TWITTERAPIS_BASE}/twitter/tweet/advanced_search",
            headers=HEADERS,
            params=params,
            timeout=15,
        )
        r.raise_for_status()
        data = r.json()
        batch = data.get("tweets", [])
        if not batch:
            break  # no more results for this query
        tweets.extend(batch)
        cursor = data.get("next_cursor")
        if not cursor:
            break  # last page reached
    return tweets[:max_tweets]


if __name__ == "__main__":
    results = search_tweets("ChatGPT lang:en -is:retweet", max_tweets=500)
    print(f"Fetched {len(results)} tweets")
    if results:  # avoid an IndexError when the query matches nothing
        print(json.dumps(results[0], indent=2))
A few things worth knowing:
- One call returns roughly 20 tweets at $0.0008 per call, so 500 tweets (about 25 calls) costs around $0.02. A full 10,000-tweet sample (about 500 calls) is around $0.40.
- `lang:en -is:retweet` is a standard advanced-search filter combination that removes retweets and forces English. Add operators like `min_faves:50` or `from:elonmusk` for more targeted samples. See the Twitter Search API guide for the full operator list, or the Twitter search API page for the endpoint reference this script runs on.
- Pagination is cursor-based. Each response includes a `next_cursor` you pass back on the next request until it stops returning one.
- No OAuth. The Bearer header is the whole auth flow.
If you would rather see how to handle retries, async fetching with httpx, and other production concerns, the Python Twitter API tutorial covers those patterns in depth.
Step 2a, Score with TextBlob (the easy one)
TextBlob is the friendliest sentiment library in Python. It returns a polarity score between -1 (negative) and +1 (positive). Accuracy on Twitter is mediocre because TextBlob's lexicon was built mostly from product reviews, but it is fine for quick prototypes and dashboards where the user does not need surgical precision.
pip install textblob
python -m textblob.download_corpora
from textblob import TextBlob


def textblob_sentiment(text: str) -> dict:
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity  # -1.0 to 1.0
    subjectivity = blob.sentiment.subjectivity  # 0.0 to 1.0
    if polarity > 0.1:
        label = "positive"
    elif polarity < -0.1:
        label = "negative"
    else:
        label = "neutral"
    return {"polarity": polarity, "subjectivity": subjectivity, "label": label}


# Apply to your fetched tweets
for tweet in results[:5]:
    s = textblob_sentiment(tweet["text"])
    print(f"{s['label']:8s} ({s['polarity']:+.2f}) {tweet['text'][:90]}")
TextBlob's main weaknesses on Twitter:
- Ignores emojis entirely. A tweet of just 🔥🔥🔥 scores 0.
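- Handles negation only through a simple word-level rule. "not bad" does come out mildly positive, but negation that is implied or spread across a clause is often missed.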
- Slang and abbreviations are mostly unscored.
If those misses matter for your use case, jump to VADER.
Step 2b, Score with VADER (the smart one for social media)
VADER (Valence Aware Dictionary and sEntiment Reasoner) was built specifically for social-media text. It handles emojis, slang, intensifiers ("really good" scores higher than "good"), and negations correctly. It is still rule-based and still free, but it is dramatically better on Twitter than TextBlob.
pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()


def vader_sentiment(text: str) -> dict:
    scores = vader.polarity_scores(text)
    # scores = {"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.7}
    compound = scores["compound"]  # -1 to 1, the standard summary score
    if compound >= 0.05:
        label = "positive"
    elif compound <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    return {**scores, "label": label}


for tweet in results[:5]:
    s = vader_sentiment(tweet["text"])
    print(f"{s['label']:8s} ({s['compound']:+.2f}) {tweet['text'][:90]}")
What makes VADER different in practice:
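- Emojis count. VADER translates each emoji into its text description and scores those words, so emoji-heavy tweets are no longer invisible. Slang uses (🔥 as praise, 💀 as laughter) can still be misread.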
- Intensifiers work. "ABSOLUTELY AMAZING" scores higher than "amazing".
- Negation flips correctly. "not bad" scores positive.
- Speed. Around 30,000 tweets per second on a single CPU core, same order as TextBlob.
For most brand-monitoring or general sentiment dashboards, VADER is the right default. It will only let you down on heavy sarcasm, irony, and complex multi-clause sentences, which brings us to the transformer model.
Step 2c, Score with a transformer (the accurate one)
The best Twitter sentiment model on Hugging Face today is cardiffnlp/twitter-roberta-base-sentiment-latest, a RoBERTa model pre-trained on roughly 124M tweets and fine-tuned for sentiment on the TweetEval benchmark. It catches sarcasm and context that VADER misses, at the cost of being 100-1,000× slower (still tractable: tens of tweets per second on a modern CPU, a thousand or more per second on a GPU).
pip install transformers torch
from transformers import pipeline

# Loads about 500MB on first run; cached locally after.
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)


def roberta_sentiment(texts: list[str]) -> list[dict]:
    """Batch-score for efficiency. Each result has 'label' and 'score'."""
    results = classifier(texts, truncation=True, max_length=128)
    # results: [{"label": "positive", "score": 0.93}, ...]
    return results


# Batch the tweets for speed
texts = [t["text"] for t in results[:50]]
scores = roberta_sentiment(texts)
for tweet, s in zip(results[:5], scores[:5]):
    print(f"{s['label']:8s} ({s['score']:.2f}) {tweet['text'][:90]}")
When the transformer earns its compute cost:
- Sarcasm. "Oh great, another Twitter outage": VADER scores it positive (because of "great"), RoBERTa scores it negative.
- Irony. "Just love spending three hours on hold": same story.
- Context-dependent words. "sick" can mean "ill" or "amazing" depending on the surrounding text.
- Subtle tonal shifts between polite and passive-aggressive.
If you are building a research project, a brand intelligence tool, or anything where misclassifying 10% of tweets matters, use RoBERTa. If you are building a real-time dashboard with thousands of tweets per minute, VADER is the practical choice.
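One detail worth sketching before scoring: the model card for cardiffnlp/twitter-roberta-base-sentiment-latest suggests masking user handles and links so the model sees the same placeholders it saw in training. The helper below is a minimal version of that idea; the function name `preprocess_tweet` is illustrative, not part of any library.

```python
def preprocess_tweet(text: str) -> str:
    """Mask @handles and links with the placeholders '@user' and 'http'."""
    tokens = []
    for token in text.split(" "):
        if token.startswith("@") and len(token) > 1:
            token = "@user"  # hide the specific handle
        elif token.startswith("http"):
            token = "http"   # hide the specific URL
        tokens.append(token)
    return " ".join(tokens)


print(preprocess_tweet("@acme your app crashed again https://t.co/abc123"))
```

Applying this to each tweet's text before passing the list to the scoring function keeps handles and tracking links from influencing the prediction.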
Step 3, Aggregate, visualize, decide
A list of {label, score, tweet} rows is not insight. The next layer is aggregation: counts by sentiment class, sentiment over time, and the most-positive / most-negative example tweets.
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Build a DataFrame combining tweets + VADER scores
rows = []
for tweet in results:
    s = vader_sentiment(tweet["text"])
    rows.append({
        "id": tweet["id"],
        "created_at": datetime.fromisoformat(
            tweet["created_at"].replace("Z", "+00:00")
        ),
        "text": tweet["text"],
        "compound": s["compound"],
        "label": s["label"],
    })
df = pd.DataFrame(rows)

# Sentiment distribution
counts = df["label"].value_counts()
print(counts)

# Hourly time series
df["hour"] = df["created_at"].dt.floor("h")
hourly = df.groupby(["hour", "label"]).size().unstack(fill_value=0)
ax = hourly.plot(kind="area", stacked=True, alpha=0.7, figsize=(10, 5))
ax.set_title("Sentiment Over Time")
ax.set_xlabel("Hour")
ax.set_ylabel("Tweet count")
plt.tight_layout()
plt.savefig("sentiment_time_series.png", dpi=120)

# Top 5 most positive and most negative tweets
print("\nTop 5 positive:")
print(df.nlargest(5, "compound")[["compound", "text"]])
print("\nTop 5 negative:")
print(df.nsmallest(5, "compound")[["compound", "text"]])
This is the minimum analytical surface for a useful dashboard. From here you would typically layer on top:
- Filter by author / verified-only to weight influencer voices.
- Geo grouping if your fetched tweets included location metadata.
- Topic clustering with `BERTopic` or `sentence-transformers` to break the data into themes before measuring sentiment per theme.
- Anomaly alerts when negative-tweet velocity spikes by some threshold.
Real example, brand sentiment for an app launch
This worked example applies the full pipeline to a realistic question: what was public sentiment around the ChatGPT 5 launch in the first 48 hours? It fetches 2,000 English tweets matching the query (add `since:` and `until:` operators to pin the 48-hour window), scores them with RoBERTa, and builds a DataFrame with each tweet's label, confidence, and like count. It then prints two headline numbers: the raw share of positive tweets and an engagement-weighted share.
# 1. Fetch a focused sample
tweets = search_tweets(
    "ChatGPT 5 lang:en -is:retweet",
    max_tweets=2000,
)

# 2. Score with RoBERTa (worth the compute for a one-time analysis)
texts = [t["text"] for t in tweets]
scores = roberta_sentiment(texts)

# 3. Combine
df = pd.DataFrame([
    {
        "created_at": datetime.fromisoformat(
            t["created_at"].replace("Z", "+00:00")
        ),
        "text": t["text"],
        "author": t.get("author", {}).get("userName"),
        "favorite_count": t.get("favorite_count", 0),
        "label": s["label"],
        "confidence": s["score"],
    }
    for t, s in zip(tweets, scores)
])

# 4. Headline numbers
total = len(df)
pct_pos = (df["label"] == "positive").mean() * 100
pct_neg = (df["label"] == "negative").mean() * 100
print(f"{total} tweets · {pct_pos:.1f}% positive · {pct_neg:.1f}% negative")

# 5. Weight by engagement (likes act as amplification)
df["weighted"] = df["favorite_count"].clip(lower=1)
weighted_pos = (
    df.loc[df["label"] == "positive", "weighted"].sum() / df["weighted"].sum() * 100
)
print(f"Engagement-weighted positivity: {weighted_pos:.1f}%")
The engagement-weighted score usually tells a different story than the raw count. A small number of very-popular negative tweets can dominate the conversation even when the average tweet is neutral.
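To see how far the two numbers can diverge, here is a small pure-Python sketch of the same calculation on a made-up four-tweet sample. The `positivity` helper and the sample data are illustrative assumptions; the weighting rule (likes floored at 1) mirrors the pandas code above.

```python
def positivity(rows):
    """Return (raw %, engagement-weighted %) of positive tweets.

    rows: iterable of (label, like_count) pairs.
    """
    rows = list(rows)
    raw = 100 * sum(label == "positive" for label, _ in rows) / len(rows)
    # Floor likes at 1 so a tweet with zero likes still counts once.
    weights = [max(likes, 1) for _, likes in rows]
    pos_weight = sum(w for (label, _), w in zip(rows, weights) if label == "positive")
    weighted = 100 * pos_weight / sum(weights)
    return raw, weighted


# Made-up sample: three low-engagement positives, one viral negative.
sample = [("positive", 2), ("positive", 0), ("positive", 3), ("negative", 94)]
raw, weighted = positivity(sample)
print(f"raw {raw:.1f}% vs weighted {weighted:.1f}%")  # raw 75.0% vs weighted 6.0%
```

Three out of four tweets are positive, yet the single negative tweet holds 94 of the 100 weight units, so the weighted view is overwhelmingly negative.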
Model Accuracy on Twitter Data
TextBlob reaches roughly 60-65% accuracy on Twitter text because its lexicon was built mostly from product reviews, not social media. VADER, purpose-built for social media, scored an F1 of 0.96 on tweets in Hutto and Gilbert's 2014 evaluation but now runs at roughly 75-85% on modern Twitter vocabulary. RoBERTa fine-tuned on tweets achieves 75-80% on the TweetEval benchmark. These figures come from different test sets, so they are not directly comparable, but RoBERTa remains the best option for production brand monitoring where misclassification has a real cost.
TextBlob accuracy on Twitter: TextBlob's default analyzer relies on a lexicon built mostly from product reviews, not social-media text. Independent evaluations place its accuracy on Twitter sentiment at roughly 60 to 65 percent on balanced datasets, which means more than one tweet in three is mislabeled. The main failure modes are slang, emojis, negation in informal phrasing, and short text with no lexical anchor.
VADER accuracy on Twitter: VADER was purpose-built for social media. The original paper by Hutto and Gilbert (2014) reported an F1 classification accuracy of 0.96 on tweets, ahead of individual human raters (0.84) on the same task. In practice, accuracy on modern Twitter text (2024-2026) is lower, roughly 75 to 85 percent, because the platform's vocabulary has shifted and VADER's lexicon is not actively updated. VADER still handles emojis and intensifiers, which gives it a meaningful edge over TextBlob on real-world brand monitoring queries.
RoBERTa accuracy on Twitter: The cardiffnlp/twitter-roberta-base-sentiment-latest model achieves around 75 to 80 percent accuracy on the TweetEval benchmark, which is the standard academic comparison suite for Twitter NLP models. For production-quality brand intelligence where classification errors cost money (misclassifying a PR crisis as neutral, for example), RoBERTa is worth the compute overhead. The Cardiff NLP TweetEval leaderboard tracks updated model benchmarks if you want to compare newer alternatives.
When to switch models mid-analysis
If you start with VADER for speed and notice that a category of tweets is consistently miscategorized (all ironic tweets score as positive, all technical complaints score as neutral), that is the signal to run a sample through RoBERTa and compare. A practical workflow: run VADER on all 10,000 tweets, pull the 500 most positive by VADER score, manually label 50 of them, and check how many of the tweets VADER got wrong RoBERTa classifies correctly. If VADER's error rate on that most-positive sample is above 20 percent, upgrade to RoBERTa for that domain.
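The spot-check itself is simple arithmetic. The sketch below assumes you have two parallel lists for the reviewed tweets, the model's labels and your manual labels; the function names and the ten-tweet sample are hypothetical.

```python
def error_rate(predicted, manual):
    """Fraction of hand-labeled tweets where the model label disagrees."""
    if len(predicted) != len(manual) or not manual:
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(p != m for p, m in zip(predicted, manual))
    return wrong / len(manual)


def should_upgrade(vader_labels, manual_labels, threshold=0.20):
    """True when VADER's error rate on the reviewed sample exceeds the threshold."""
    return error_rate(vader_labels, manual_labels) > threshold


# Hypothetical review of 10 tweets that VADER all scored as positive.
vader_labels = ["positive"] * 10
manual_labels = ["positive"] * 7 + ["negative"] * 3  # 3 turned out to be sarcastic
print(error_rate(vader_labels, manual_labels))      # 0.3
print(should_upgrade(vader_labels, manual_labels))  # True
```

Running the same `error_rate` call with RoBERTa's labels on the identical sample tells you whether the upgrade actually fixes the misses.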
Common pitfalls (read this before you ship)
Six mistakes appear repeatedly in Twitter sentiment projects: sampling bias from overly narrow queries that miss misspellings and abbreviations, survivorship bias from rate-limited fetches that only capture the most-recent slice, bot accounts contaminating the signal, multilingual tweets breaking English-only models, sarcasm fooling VADER consistently, and time-zone bugs in time-series plots caused by storing UTC timestamps without converting to local time before binning by hour.
- Sample bias from search syntax. A query for `"AcmeCorp"` will under-index complaints (people often spell the brand wrong when they are angry). Use OR-broadened queries (`"AcmeCorp" OR "Acme Corp" OR "@acme"`) and check the recall.
- Survivorship bias from rate-limited fetches. If your fetch caps at 1,000 tweets and the topic has 10,000 mentions, you are sampling the most-recent slice. Either sample randomly across the time window or scale up your fetch budget.
- Bot tweets contaminate the signal. Filter out accounts with zero followers and zero following (often spam), or filter for `verified:true` if you want only signal from "real people". The Twitter API rate limits guide covers how to scale a fetch without tripping platform caps.
- Multilingual tweets break English-only models. Set `lang:en` in your search query, or use a multilingual model like `cardiffnlp/twitter-xlm-roberta-base-sentiment` instead.
- Sarcasm fools VADER consistently. If sarcasm is common in your domain (politics, gaming, tech), pay the transformer compute cost.
- Time-zone bugs in the time series. Tweets are timestamped in UTC. Convert to your audience's local time before plotting hourly trends, or "8am vs 6pm" patterns will be wrong.
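For the time-zone pitfall, a minimal sketch of the conversion is shown below. It assumes `created_at` arrives as an ISO-8601 UTC string, as in the scripts in this guide; the helper name `to_local_hour` and the fixed UTC-5 offset are illustrative choices.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_local_hour(created_at: str, tz: tzinfo) -> datetime:
    """Parse a UTC ISO-8601 timestamp and floor it to the hour in `tz`."""
    utc_dt = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    local = utc_dt.astimezone(tz)
    return local.replace(minute=0, second=0, microsecond=0)


# Fixed UTC-5 offset for illustration. zoneinfo.ZoneInfo("America/New_York")
# from the standard library would also track daylight saving time.
eastern = timezone(timedelta(hours=-5))
print(to_local_hour("2026-01-15T13:42:10Z", eastern))  # 2026-01-15 08:00:00-05:00
```

Binning on the returned value instead of the raw UTC hour keeps "8am" meaning 8am where your audience actually is.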
How much does this cost end to end?
A complete analysis of 10,000 tweets run once on a developer laptop costs under $0.50 total: 500 TwitterAPIs calls at $0.0008 each ($0.40) for the fetch, and effectively zero compute cost for TextBlob and VADER running on CPU. RoBERTa on a laptop CPU adds about 10 minutes of compute with no incremental cost; on a GPU instance it runs in under a minute for about $0.01. The same 10,000-tweet fetch on the official X API would cost $50-$100 at standard read rates.
| Line item | Cost |
|---|---|
| Fetch 10,000 tweets via TwitterAPIs (~500 calls × $0.0008) | $0.40 |
| TextBlob / VADER scoring (CPU, no extra cost) | $0.00 |
| RoBERTa scoring on CPU (~10 minutes of compute) | ~$0.00 (your laptop) |
| RoBERTa scoring on a GPU instance (under 1 minute) | ~$0.01 |
| Total per 10K-tweet pass | ~$0.41 (under $0.50) |
For comparison, doing the same fetch on the official X API would cost about $50-$100 at the same volume, because the X API charges $0.005-$0.01 per post read versus TwitterAPIs's $0.0008 per call (~20 tweets per call). Our Twitter API pricing comparison page breaks down the per-tweet economics in detail.
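The arithmetic behind these figures can be reproduced with a short estimator. The prices and the 20-tweets-per-call figure are the ones quoted in this guide; treat them as assumptions to re-check against the current pricing pages, and the function names as illustrative.

```python
import math

TWEETS_PER_CALL = 20            # approximate page size quoted in this guide
PRICE_PER_CALL = 0.0008         # TwitterAPIs, USD per call
X_API_PER_POST = (0.005, 0.01)  # official X API read price range, USD per post


def twitterapis_cost(n_tweets: int) -> float:
    """Estimated fetch cost in USD via TwitterAPIs."""
    calls = math.ceil(n_tweets / TWEETS_PER_CALL)
    return calls * PRICE_PER_CALL


def x_api_cost(n_tweets: int) -> tuple:
    """Estimated (low, high) fetch cost in USD via the official X API."""
    low, high = X_API_PER_POST
    return n_tweets * low, n_tweets * high


print(f"TwitterAPIs: ${twitterapis_cost(10_000):.2f}")  # TwitterAPIs: $0.40
low, high = x_api_cost(10_000)
print(f"Official X API: ${low:.0f} to ${high:.0f}")     # Official X API: $50 to $100
```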
Pulling the whole pipeline together
The end-to-end script below combines the tweet fetch with cursor pagination, VADER scoring, and a summary of the label distribution into one runnable file. Drop in a TwitterAPIs token and a query, and you have a basic brand sentiment tool in about 50 lines of Python. Scoring is isolated in a single `score()` function, so you can swap in the TextBlob or RoBERTa scorer from the earlier sections without touching the fetch layer.
import requests
import pandas as pd
from datetime import datetime
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

TWITTERAPIS_TOKEN = "YOUR_TWITTERAPIS_TOKEN"
HEADERS = {"Authorization": f"Bearer {TWITTERAPIS_TOKEN}"}
vader = SentimentIntensityAnalyzer()


def search_tweets(query, max_tweets=500):
    tweets, cursor = [], None
    while len(tweets) < max_tweets:
        params = {"query": query, "queryType": "Latest"}
        if cursor:
            params["cursor"] = cursor
        r = requests.get(
            "https://api.twitterapis.com/twitter/tweet/advanced_search",
            headers=HEADERS, params=params, timeout=15,
        )
        r.raise_for_status()
        d = r.json()
        batch = d.get("tweets", [])
        if not batch:
            break
        tweets.extend(batch)
        cursor = d.get("next_cursor")
        if not cursor:
            break
    return tweets[:max_tweets]


def score(text):
    s = vader.polarity_scores(text)
    c = s["compound"]
    label = "positive" if c >= 0.05 else "negative" if c <= -0.05 else "neutral"
    return c, label


def analyze(query, max_tweets=1000):
    tweets = search_tweets(query, max_tweets)
    rows = []
    for t in tweets:
        c, label = score(t["text"])
        rows.append({
            "created_at": datetime.fromisoformat(t["created_at"].replace("Z", "+00:00")),
            "text": t["text"],
            "compound": c,
            "label": label,
        })
    df = pd.DataFrame(rows)
    summary = df["label"].value_counts(normalize=True).round(3)
    return df, summary


if __name__ == "__main__":
    df, summary = analyze("ChatGPT lang:en -is:retweet", max_tweets=1000)
    print(summary)
    print(f"\nMost positive tweet:\n{df.loc[df['compound'].idxmax(), 'text']}")
    print(f"\nMost negative tweet:\n{df.loc[df['compound'].idxmin(), 'text']}")
Run it, swap the query for whatever brand or topic you want to monitor, and you have the core of a Twitter sentiment dashboard in roughly 50 lines of code.
Where to go from here
Once the pipeline is working, the three natural next steps are: adding a cron job and database layer for continuous brand monitoring, switching to RoBERTa with a wider time window for research projects that need higher accuracy, and reading the scraping best practices guide to handle retries, deduplication, and cost controls at million-tweet volume. The script from this post is the foundation for all three paths.
- Brand monitoring dashboards: drop the pipeline behind a cron job, ship results to a database, build a dashboard. The Twitter Monitoring use-cases page covers 14 patterns that build on this same sentiment foundation.
- Research projects: switch to the RoBERTa model, expand the time window, and start clustering by topic before measuring sentiment per cluster.
- Production at scale: read the Twitter scraping best practices guide for handling retries, deduplication, and rate-limit-aware fetches at million-tweet volume. For operator patterns that improve sample quality, see the Twitter search operators guide.
Or just start with the script: paste it into a file, drop your TwitterAPIs token at the top (get one with $0.50 in free credits at twitterapis.com), and run it against any query you want to understand. Sentiment analysis is one of those tools that earns its place the moment you have it.
Scaling Your Sentiment Pipeline
Scaling the pipeline to production requires three additions to the basic script: query broadening with OR-expanded brand terms to improve recall, deduplication of tweet IDs across concurrent fetches to avoid counting the same tweet twice in engagement-weighted aggregations, and a cron-based polling architecture that stores results to a database so the sentiment history accumulates over time rather than being recomputed on each run.
Query broadening for better recall: The default brand query ("BrandName" lang:en -is:retweet) often misses a significant fraction of mentions because users misspell brand names, use abbreviations, or tag the brand without using its exact name. A broader query looks like: ("BrandName" OR "Brand Name" OR "@brand_handle" OR "#BrandHashtag") lang:en -is:retweet. Measure recall improvement by comparing total tweet counts between the narrow and broad queries on the same time window. If the broad query returns 3x more tweets, spot-check a sample of the extra tweets before treating that volume as signal you were missing, since broader terms also pull in unrelated matches.
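A small helper keeps broadened queries consistent across brands. This is a sketch: the function name is hypothetical, and it assumes the search endpoint accepts parentheses for grouping and the same `lang:` and `-is:retweet` filters used in the fetch script.

```python
def build_brand_query(variants, lang="en", exclude_retweets=True):
    """Join brand spellings with OR and append shared filters."""
    if not variants:
        raise ValueError("need at least one brand variant")
    quoted = [f'"{v}"' for v in variants]
    # Parentheses keep the filters applied to every variant, not just the last.
    query = "(" + " OR ".join(quoted) + ")"
    if lang:
        query += f" lang:{lang}"
    if exclude_retweets:
        query += " -is:retweet"
    return query


print(build_brand_query(["AcmeCorp", "Acme Corp", "@acme"]))
# ("AcmeCorp" OR "Acme Corp" OR "@acme") lang:en -is:retweet
```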
Sampling for large volumes: If your brand generates 50,000 mentions per day and you are running continuous monitoring, you do not need to score every tweet with RoBERTa. A statistically valid sample for daily sentiment tracking is around 1,000 to 2,000 tweets per day, which gives a margin of error of roughly 2 to 3 percentage points at 95 percent confidence. Use VADER on all tweets for quick anomaly detection, and run RoBERTa on a daily random sample of 1,000 for the authoritative daily score.
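The margin-of-error figures follow from the standard normal-approximation formula for a proportion, z * sqrt(p(1-p)/n), evaluated at the worst case p = 0.5. The sketch below assumes a simple random sample and ignores any finite-population correction; the function names are illustrative.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of the 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size(target_moe: float, p: float = 0.5, z: float = 1.96) -> int:
    """Smallest sample size whose margin of error is at most target_moe."""
    return math.ceil(z**2 * p * (1 - p) / target_moe**2)


for n in (1_000, 2_000):
    print(n, f"{margin_of_error(n):.1%}")  # 1000 3.1%, then 2000 2.2%
print(sample_size(0.03))                    # 1068
```

So a daily sample of 1,000 sits just above 3 points, and slightly under 1,100 tweets is enough to get to 3.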
Database schema for time-series sentiment: Store tweet ID, created_at (UTC), text (truncated to 280 chars), VADER compound score, label (positive/neutral/negative), and the query string that produced the tweet. Index on created_at for time-series queries and on label for distribution queries. This schema supports all standard dashboard views: hourly sentiment trend, rolling 7-day average, label distribution, most-positive/most-negative example tweets.
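As one possible concrete form of this schema, the sketch below uses SQLite from the standard library. The table and column names are assumptions chosen for illustration; making the tweet ID the primary key lets `INSERT OR IGNORE` double as the deduplication step.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tweet_sentiment (
    tweet_id     TEXT PRIMARY KEY,  -- primary key doubles as the dedup guard
    created_at   TEXT NOT NULL,     -- ISO-8601 timestamp in UTC
    text         TEXT NOT NULL,     -- truncated to 280 chars before insert
    compound     REAL NOT NULL,     -- VADER compound score
    label        TEXT NOT NULL,     -- positive / neutral / negative
    source_query TEXT NOT NULL      -- query string that produced the tweet
);
CREATE INDEX IF NOT EXISTS idx_created_at ON tweet_sentiment (created_at);
CREATE INDEX IF NOT EXISTS idx_label ON tweet_sentiment (label);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_rows(conn, rows):
    """Insert rows, skipping tweet IDs that are already stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweet_sentiment VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )


conn = open_db()
row = ("123", "2026-01-15T13:42:10+00:00", "love it"[:280], 0.64, "positive", "AcmeCorp")
save_rows(conn, [row, row])  # duplicate on purpose
print(conn.execute("SELECT COUNT(*) FROM tweet_sentiment").fetchone()[0])  # 1
```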
Alerting thresholds: A useful default alert fires when the 1-hour negative tweet rate exceeds its trailing 7-day hourly average by more than 2 standard deviations. This catches PR crises and product failures as they emerge rather than hours later. Wire the alert to a Slack webhook or PagerDuty. False positives are common for brands with volatile audiences; tune the threshold by looking at historical crisis events in your data and confirming the alert would have fired within 30 minutes.
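The threshold rule can be sketched in a few lines. This version assumes you keep a list of hourly negative-tweet rates for the trailing week; the function name and the sample history are made up for illustration.

```python
from statistics import mean, stdev


def should_alert(current_rate, hourly_history, n_sigmas=2.0):
    """True when the latest hourly negative rate is an upper outlier vs. history."""
    if len(hourly_history) < 2:
        return False  # not enough data to estimate the spread
    baseline = mean(hourly_history)
    spread = stdev(hourly_history)
    return current_rate > baseline + n_sigmas * spread


# Made-up history: negative share hovers around 20% for a week (168 hours).
history = [0.18, 0.20, 0.22, 0.20] * 42
print(should_alert(0.21, history))  # False
print(should_alert(0.45, history))  # True
```

Raising `n_sigmas` is the simplest way to cut false positives for brands with volatile audiences.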
For the Python code that powers the data collection layer of this pipeline, see the Python Twitter API tutorial. For production scraping patterns including deduplication, cost controls, and parallel date-range fetches at scale, see the Twitter scraping best practices guide. For a detailed breakdown of API call costs at different monthly volumes, see the Twitter API cost guide. If you are evaluating whether to use the official X API or TwitterAPIs as your data source for sentiment work, the Twitter API v2 vs TwitterAPIs comparison covers the endpoint, pricing, and operator differences in detail.
The short answer for sentiment analysis: TwitterAPIs at $0.04 per 1,000 tweets gives you the same tweet data as the official X API at $5 per 1,000 tweets, plus the full web search operator set including engagement filters that the official API does not expose, at 100x lower cost per tweet. For a workload of 10,000 tweets per day, that is roughly $12/month on TwitterAPIs versus $1,500/month on the official API, a meaningful difference when the sentiment scoring libraries themselves are already free. At 100,000 tweets per day, the gap grows to $120/month versus $15,000/month, at which point the API cost alone exceeds the revenue of many subscription-based brand monitoring products. For most sentiment analysis use cases, choosing TwitterAPIs over the official X API is a product economics decision as much as a technical one.
Sentiment model benchmarks sourced from Cardiff NLP, TweetEval (ACL Anthology 2022). VADER accuracy reference from Hutto and Gilbert, 2014, VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text (AAAI). TwitterAPIs pricing verified May 2026 from TwitterAPIs pricing page. Official X API cost comparison sourced from docs.x.com/x-api/getting-started/pricing.
Frequently Asked Questions
What is the most accurate sentiment analysis model for Twitter data?
`cardiffnlp/twitter-roberta-base-sentiment-latest` on Hugging Face is the most accurate mainstream model for Twitter-specific text in 2026. It is a RoBERTa model pre-trained on roughly 124 million tweets and fine-tuned for sentiment on TweetEval, and it consistently outperforms TextBlob and VADER on Twitter-specific benchmarks from the [Cardiff NLP research group](https://aclanthology.org/2022.lrec-1.543/). The tradeoff is compute cost: it is 100 to 1,000 times slower than VADER depending on hardware. For real-time dashboards where latency matters, VADER is the better default. For offline batch analysis where accuracy is the priority, RoBERTa is the correct choice.
Why not use the official X API for sentiment analysis?
For sentiment analysis specifically, the official X API returns the same public tweet data but at roughly 100 times the cost: $0.005 to $0.01 per post read versus $0.0008 per call returning roughly 20 tweets via TwitterAPIs. The official API also requires a multi-day developer account application and a four-credential OAuth setup before you can make your first request. Unless you specifically need filtered streams, compliance-grade data sourcing, or PowerTrack-tier firehose access, a third-party API is the more practical choice for sentiment work.
How do I handle sarcasm in tweets?
Sarcasm detection is its own active research area and even RoBERTa misses obvious sarcasm sometimes. Two practical options for production use: first, use a sarcasm-specific model like `helinivan/english-sarcasm-detector` as a second-pass filter, flagging positive tweets for re-scoring before they are counted. Second, filter your input to high-confidence RoBERTa predictions only (score above 0.8) and accept that sarcasm-heavy tweets will fall into the low-confidence middle band that is discarded. For most brand monitoring use cases, the second approach is simpler and still catches 80 to 90 percent of actionable signal.
How do I set up real-time sentiment monitoring?
A cron-based polling job that runs the analysis script every 15 minutes against your tracked queries, writes results to a database, and triggers alerts when the negative-tweet rate spikes above a threshold. The compute cost for VADER is effectively zero. The API cost is around $0.50 per 10,000 tweets fetched, which works out to under $5 per month per brand you monitor at typical query volumes. For production reliability including retry logic, deduplication, and cost controls, see the [Twitter scraping best practices guide](/blogs/twitterapis-best-practices).
Do I need a GPU for Twitter sentiment analysis?
Only if you are scoring with RoBERTa at high volume (10,000 or more tweets per pass). TextBlob and VADER are CPU-only and process tens of thousands of tweets per second on a single core. RoBERTa on CPU runs about 10 to 50 tweets per second depending on your hardware, which is fine for batches of a few hundred but slow for batches of 100,000. A consumer GPU (RTX 3060 or better) pushes RoBERTa to 1,000 or more tweets per second. For a one-time analysis of 10,000 tweets, a CPU run takes roughly 5 to 10 minutes, which is acceptable for most research workflows.
Can I run Twitter sentiment analysis for free?
The sentiment libraries (TextBlob, VADER, transformers) are all free and open source. The only recurring cost is fetching the tweets. TwitterAPIs gives $0.50 in free credits at signup with no credit card, which covers about 12,500 tweets, enough to run a meaningful pilot analysis. After the free credits are used, the rate is $0.04 per 1,000 tweets. For a full cost breakdown at higher volumes, see the [Twitter API cost guide](/blogs/twitter-api-cost).
How do I analyze tweets in languages other than English?
Replace the model with a multilingual sentiment model. `cardiffnlp/twitter-xlm-roberta-base-sentiment` covers roughly 8 languages including Spanish, French, German, Portuguese, Italian, Arabic, and English. It uses the same RoBERTa architecture as the English-only model and drops into the same `pipeline()` code with no other changes needed. Remove the `lang:en` filter from your search query and the model will score tweets in any of the supported languages. For less common languages, `lxyuan/distilbert-base-multilingual-cased-sentiments-student` is a lighter alternative that covers 7 additional languages.
Can I analyze historical tweets?
Yes. TwitterAPIs's `tweet/advanced_search` endpoint covers historical tweets by date range using the `since:` and `until:` search operators. A query like `"AcmeCorp" since:2025-01-01 until:2025-03-01` fetches all indexed tweets about AcmeCorp in Q1 2025. For large time ranges, split into weekly or monthly chunks to avoid cursor-stability issues on very deep pagination. The full operator syntax including date filters is covered in the [Twitter search operators guide](/blogs/twitter-advanced-search-operators).
How do I filter out bots and spam accounts?
Filter at the search-query level using `min_faves:1` to remove most zero-engagement spam accounts. At the analysis level, drop rows where the author `follower_count` is 0 and `following_count` is above 1,000, a common bot-account signature. For higher-signal samples, add `filter:blue_verified` to the query to restrict to X Premium accounts, which have a paid verification barrier. The tradeoff is a smaller sample size, but cleaner signal for brand intelligence use cases where outlier bot accounts can skew the aggregate score meaningfully.
Is it legal to run sentiment analysis on tweets?
Yes for public tweets. The legality question usually arises around storing tweets or republishing them, which is governed by Twitter/X Terms of Service and local data-protection law (GDPR, CCPA). The analysis itself, reading public tweets and computing a sentiment score, is standard academic and commercial practice. The Ninth Circuit's 2022 ruling in hiQ Labs v. LinkedIn confirmed that accessing publicly available web data does not violate the Computer Fraud and Abuse Act. Do not republish raw tweet text without attribution or in ways that violate the X Terms of Service developer agreement.