What is Sentiment Analysis?
Sentiment analysis is a text analytics technique that automatically determines whether a piece of text
expresses a positive, negative, or neutral opinion. It's widely used in marketing to analyze customer reviews,
social media posts, survey responses, and any other text-based feedback at scale.
Instead of reading thousands of comments manually, sentiment analysis lets you quickly answer questions like:
"Are customers happy with our new product?" or "How does sentiment differ between our brand and competitors?"
How VADER Works
This lab uses VADER (Valence Aware Dictionary and sEntiment Reasoner), a rule-based sentiment
analysis tool specifically designed for social media and informal text. Unlike supervised machine learning approaches that
require labeled training data, VADER works "out of the box" using a curated dictionary of words with pre-assigned sentiment scores.
The VADER process:
- Tokenization: The text is split into individual words (tokens)
- Lexicon lookup: Each word is checked against a dictionary of ~7,500 words rated for sentiment on a -4 to +4 scale
(e.g., "great" = +3.1, "horrible" = -2.5, "okay" = +0.9)
- Context rules: VADER applies smart adjustments:
  - Intensifiers boost sentiment: "very good" is more positive than "good"
  - Negations flip sentiment: "not good" becomes negative
  - Punctuation adds emphasis: "Great!!!" is more positive than "Great"
  - CAPS increase intensity: "AMAZING" is stronger than "amazing"
  - Contrastive conjunctions: "but" shifts focus to the second clause
- Score normalization: The adjusted word scores are summed, and the total is normalized to a -1 to +1 scale to give the compound score
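The steps above can be sketched in a few lines of Python. This is a toy illustration, not the VADER library itself: the four-word lexicon is a tiny subset chosen for the example, and the intensifier boost (0.293), negation multiplier (-0.74), and normalization constant (15) are assumptions based on the published reference implementation. Punctuation, capitalization, and "but" handling are left out to keep the sketch short.

```python
import math
import re

# Toy lexicon: a tiny illustrative subset (the real lexicon has about 7,500 entries).
TOY_LEXICON = {"okay": 0.9, "good": 1.9, "great": 3.1, "horrible": -2.5}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 0.293, "extremely": 0.293}  # assumed boost per intensifier
NEGATION_SCALAR = -0.74                             # assumed negation multiplier
ALPHA = 15                                          # assumed normalization constant


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def raw_valences(tokens):
    """Look up each token and apply simplified intensifier and negation rules."""
    scores = []
    for i, tok in enumerate(tokens):
        if tok not in TOY_LEXICON:
            continue
        valence = TOY_LEXICON[tok]
        prev = tokens[i - 1] if i > 0 else ""
        if prev in INTENSIFIERS:
            boost = INTENSIFIERS[prev]
            # Push the score further from zero, whichever sign it has.
            valence += boost if valence > 0 else -boost
        # Simplification: any negation within the three preceding tokens flips the score.
        if any(w in NEGATIONS for w in tokens[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR
        scores.append(valence)
    return scores


def normalize(total, alpha=ALPHA):
    """Squash an unbounded sum into the open interval (-1, +1)."""
    return total / math.sqrt(total * total + alpha)


def toy_compound(text):
    """Tokenize, score, sum, and normalize a piece of text."""
    return round(normalize(sum(raw_valences(tokenize(text)))), 4)


print(toy_compound("good"))       # 0.4404
print(toy_compound("very good"))  # 0.4927
print(toy_compound("not good"))   # -0.3412
```

The normalization step is why extreme raw totals never quite reach -1 or +1: the denominator always grows slightly faster than the numerator.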
Understanding the Four Scores
VADER produces four sentiment scores for each text record:
| Score | Range | What It Means |
| --- | --- | --- |
| Compound | -1 to +1 | The overall sentiment. This is the most useful single metric. Scores ≥ 0.05 are positive, scores ≤ -0.05 are negative, and anything in between is neutral. |
| Positive | 0 to 1 | Proportion of the text that is positive. Example: 0.45 means about 45% of the scored text is positive. |
| Neutral | 0 to 1 | Proportion of the text that is neutral (factual statements, filler words). |
| Negative | 0 to 1 | Proportion of the text that is negative. Example: 0.20 means about 20% of the scored text is negative. |
Note: Positive + Neutral + Negative sum to 1.0 (100%), apart from small rounding differences, while Compound is calculated separately.
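To see why the three proportions sum to 1.0 while the compound does not enter into it, here is a simplified sketch of how per-token scores could be turned into shares. It assumes, following the reference implementation as best understood, that each sentiment-bearing token contributes its score magnitude plus one and each neutral token contributes one. The function name `proportions` is made up for this example, and punctuation emphasis is ignored.

```python
def proportions(valences):
    """Turn per-token scores into neg/neu/pos shares of the text.

    Assumption: each scored token counts as |score| + 1 and each
    neutral token (score 0) counts as 1, so longer neutral stretches
    dilute the positive and negative shares.
    """
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(-v + 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + neg_sum + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    return {
        "neg": round(neg_sum / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
    }


# Four tokens such as "the food is good": three neutral, one scored +1.9.
print(proportions([0.0, 0.0, 0.0, 1.9]))  # {'neg': 0.0, 'neu': 0.508, 'pos': 0.492}
```

Because each share is rounded to three decimals, the three values can occasionally add up to 0.999 or 1.001.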
💡 Which score should I use?
For most marketing analyses, focus on the Compound score, which summarizes the overall sentiment in one number.
The pos/neu/neg proportions are useful for understanding why the compound is what it is. For example,
a compound of 0.15 with pos=0.35, neu=0.60, neg=0.05 tells you "mildly positive, but mostly factual text with little emotion."
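Turning a compound score into a category takes only the ±0.05 thresholds from the table. A minimal sketch, where the function name `label_from_compound` is a placeholder for this example:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, +1] to a sentiment category."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print(label_from_compound(0.15))   # positive
print(label_from_compound(-0.40))  # negative
print(label_from_compound(0.02))   # neutral
```

Keeping the threshold as a parameter makes it easy to widen the neutral band if spot checks show too many weak scores being labeled positive or negative.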
When to Use Sentiment Analysis
Great use cases:
- Monitoring brand perception across social media
- Analyzing customer reviews to identify pain points
- Comparing sentiment across product lines, time periods, or customer segments
- Prioritizing which feedback to investigate first (focus on negative comments)
- Tracking sentiment trends over time (before/after a campaign or product launch)
Limitations to keep in mind:
- Sarcasm and irony are often missed ("Oh great, another update that breaks everything")
- Domain-specific language may not be in the lexicon (technical jargon, slang)
- Context matters: "This phone is sick!" is positive in slang but is likely to be scored as negative
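- Mixed sentiment can cancel out: opposing statements in one comment may net out near neutral, and when they are joined by "but" the score leans toward the second clause ("The food was amazing but the service was terrible" tends to lean negative)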
Always spot-check results and use sentiment analysis as a starting point for deeper investigation, not a final answer.
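One way to combine prioritization and spot-checking is to pull the most negative records for immediate review and draw a random sample of the rest for a manual sanity check. The sketch below assumes each record is a dict with `text` and `compound` keys. That record shape and the function name `triage` are made up for this example.

```python
import random


def triage(records, n_worst=3, n_spot=2, seed=0):
    """Return (most negative records, random spot-check sample of the rest)."""
    ranked = sorted(records, key=lambda r: r["compound"])
    worst = ranked[:n_worst]
    rest = ranked[n_worst:]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    spot = rng.sample(rest, min(n_spot, len(rest)))
    return worst, spot


# Made-up scores purely for illustration.
records = [
    {"text": "Love it", "compound": 0.64},
    {"text": "Broke after a week", "compound": -0.42},
    {"text": "Arrived on Tuesday", "compound": 0.0},
    {"text": "Worst purchase ever", "compound": -0.62},
]
worst, spot = triage(records, n_worst=2, n_spot=1)
for r in worst:
    print(r["compound"], r["text"])
```

Reading the spot-check sample by hand is where sarcasm, slang, and domain jargon problems usually show up.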
What This Lab Shows You
Per-record sentiment: Each row of your data receives all four sentiment scores,
plus a categorical label (positive / neutral / negative) based on the compound value.
Summary statistics: See the overall distribution of sentiment across all records,
including averages, standard deviations, and the proportion falling into each category.
Group comparisons: If your data has a grouping column (like brand or category),
compare sentiment across groups to identify leaders and laggards.
Worked examples: For randomly chosen positive and negative samples, the lab shows a
token-by-token breakdown so you can see exactly how VADER computed the final scores.
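The summary statistics and group comparisons described above can be reproduced with the Python standard library. This is a rough sketch, not the lab's own code: the row shape (dicts with `brand` and `compound` keys), the function names, and the choice of population standard deviation are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

CATEGORIES = ("positive", "neutral", "negative")


def label(compound, threshold=0.05):
    """Categorize a compound score using the +/-0.05 thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def summarize(compounds):
    """Mean, standard deviation, and category shares for a list of scores."""
    if not compounds:
        raise ValueError("need at least one score")
    labels = [label(c) for c in compounds]
    n = len(compounds)
    return {
        "n": n,
        "mean": mean(compounds),
        "std": pstdev(compounds),  # population std; a sample std is also reasonable
        "share": {name: labels.count(name) / n for name in CATEGORIES},
    }


def summarize_by_group(rows, group_key="brand"):
    """Group rows by a column and summarize each group's compound scores."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row["compound"])
    return {name: summarize(scores) for name, scores in groups.items()}


# Made-up rows purely for illustration.
rows = [
    {"brand": "A", "compound": 0.6},
    {"brand": "A", "compound": 0.2},
    {"brand": "B", "compound": -0.4},
    {"brand": "B", "compound": 0.0},
]
for brand, stats in summarize_by_group(rows).items():
    print(brand, round(stats["mean"], 3), stats["share"])
```

Comparing the category shares alongside the means helps distinguish a group that is uniformly lukewarm from one that is split between strong fans and strong critics.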