Sentiment Analysis Lab

Text Analysis

Upload text data and explore how lexicon-based sentiment analysis scores each record. This lab uses a VADER-style approach and walks through both overall scores and a detailed, token-by-token breakdown for a sample sentence.

OVERVIEW & CONCEPTS

What is Sentiment Analysis?

Sentiment analysis is a text analytics technique that automatically determines whether a piece of text expresses a positive, negative, or neutral opinion. It's widely used in marketing to analyze customer reviews, social media posts, survey responses, and any other text-based feedback at scale.

Instead of reading thousands of comments manually, sentiment analysis lets you quickly answer questions like: "Are customers happy with our new product?" or "How does sentiment differ between our brand and competitors?"

How VADER Works

This lab uses VADER (Valence Aware Dictionary and sEntiment Reasoner), a rule-based sentiment analysis tool specifically designed for social media and informal text. Unlike machine learning approaches that require training data, VADER works "out of the box" using a curated dictionary of words with pre-assigned sentiment scores.

The VADER process:

  1. Tokenization: The text is split into individual words (tokens)
  2. Lexicon lookup: Each word is checked against a dictionary of ~7,500 words rated for sentiment on a scale from -4 (most negative) to +4 (most positive) (e.g., "excellent" = +3.2, "terrible" = -2.9, "okay" = +0.9)
  3. Context rules: VADER applies smart adjustments:
    • Intensifiers boost sentiment: "very good" is more positive than "good"
    • Negations flip sentiment: "not good" becomes negative
    • Punctuation adds emphasis: "Great!!!" is more positive than "Great"
    • CAPS increase intensity: "AMAZING" is stronger than "amazing"
    • Contrastive conjunctions: "but" shifts focus to the second clause
  4. Score normalization: Raw scores are normalized to a -1 to +1 scale
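The four steps above can be sketched in a few lines of Python. This is a simplified illustration, not the lab's actual implementation: the toy lexicon reuses the example values quoted above plus one extra word, the word lists are hypothetical, and the rules for capitalization, punctuation, and "but" are left out. The three constants are the ones used in the reference VADER code; whether this lab uses the same values is an assumption.

```python
import math
import re

# Toy lexicon: the example values quoted above plus "good".
# A real VADER-style lexicon holds thousands of entries.
LEXICON = {"excellent": 3.2, "terrible": -2.9, "okay": 0.9, "good": 1.9}
INTENSIFIERS = {"very", "extremely", "really"}  # hypothetical short list
NEGATIONS = {"not", "never", "no"}              # hypothetical short list

BOOST = 0.293            # intensifier increment in the reference VADER code
NEGATION_SCALAR = -0.74  # negation multiplier in the reference VADER code
ALPHA = 15               # normalization constant in the reference VADER code


def normalize(raw, alpha=ALPHA):
    """Step 4: squash an unbounded raw score into the interval (-1, 1)."""
    return raw / math.sqrt(raw * raw + alpha)


def score_tokens(text):
    """Steps 1-3: return (token, adjusted_score) pairs for a piece of text."""
    tokens = re.findall(r"[A-Za-z']+", text)           # step 1: tokenize
    scored = []
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token.lower(), 0.0)       # step 2: lexicon lookup
        if valence != 0.0 and i > 0:                    # step 3: context rules
            previous = tokens[i - 1].lower()
            if previous in INTENSIFIERS:
                # Push the score further from zero, in its own direction.
                valence += BOOST if valence > 0 else -BOOST
            elif previous in NEGATIONS:
                # Flip the sign and dampen the magnitude.
                valence *= NEGATION_SCALAR
        scored.append((token, valence))
    return scored


def compound(text):
    """Sum the adjusted token scores and normalize the total."""
    raw = sum(score for _, score in score_tokens(text))
    return normalize(raw)
```

With this sketch, compound("very good") is higher than compound("good"), and compound("not good") is negative, which mirrors the intensifier and negation rules listed above.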

Understanding the Four Scores

VADER produces four sentiment scores for each text record:

Score | Range | What It Means
Compound | -1 to +1 | The overall sentiment and the most useful single metric. Scores ≥ 0.05 are positive, scores ≤ -0.05 are negative, and anything in between is neutral.
Positive | 0 to 1 | Proportion of the text that is positive. Example: 0.45 means about 45% of the text's weighted content is positive.
Neutral | 0 to 1 | Proportion of the text that is neutral (factual statements, filler words).
Negative | 0 to 1 | Proportion of the text that is negative. Example: 0.20 means about 20% of the text's weighted content is negative.

Note: Positive + Neutral + Negative sum to 1.0 (100%), apart from small rounding differences, while Compound is calculated separately from the normalized sum of the token scores.
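The compound thresholds can be expressed as a small function. This is a minimal sketch; the function name and the threshold parameter are illustrative and not part of the lab.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a categorical label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.62, 0.05, 0.0, -0.049, -0.31):
    print(score, label_from_compound(score))
```

Note that the boundaries are inclusive: a compound of exactly 0.05 is labeled positive and exactly -0.05 is labeled negative.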

💡 Which score should I use?

For most marketing analyses, focus on the Compound score, which summarizes the overall sentiment. The pos/neu/neg proportions are useful for understanding why the compound is what it is. For example, a compound of 0.15 with pos=0.35, neu=0.60, neg=0.05 tells you "mildly positive, but mostly factual text with little emotion."

When to Use Sentiment Analysis

Great use cases:

  • Monitoring brand perception across social media
  • Analyzing customer reviews to identify pain points
  • Comparing sentiment across product lines, time periods, or customer segments
  • Prioritizing which feedback to investigate first (focus on negative comments)
  • Tracking sentiment trends over time (before/after a campaign or product launch)

Limitations to keep in mind:

  • Sarcasm and irony are often missed ("Oh great, another update that breaks everything")
  • Domain-specific language may not be in the lexicon (technical jargon, slang)
  • Context matters: "This phone is sick!" is positive in slang but appears negative
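  • Mixed sentiment can be hidden by a single score: "The food was amazing but the service was terrible" may score near neutral or lean toward the clause after "but", even though it contains both strong praise and a strong complaint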

Always spot-check results and use sentiment analysis as a starting point for deeper investigation, not a final answer.

What This Lab Shows You

Per-record sentiment: Each row of your data receives all four sentiment scores, plus a categorical label (positive / neutral / negative) based on the compound value.

Summary statistics: See the overall distribution of sentiment across all records, including averages, standard deviations, and the proportion falling into each category.
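As a rough sketch of how such a summary could be computed from a list of compound scores, using only the standard library. The function name and output keys are hypothetical, and the use of the sample standard deviation is an assumption; the lab may compute it differently.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(compounds, threshold=0.05):
    """Summarize a non-empty list of compound scores."""
    def label(score):
        if score >= threshold:
            return "positive"
        if score <= -threshold:
            return "negative"
        return "neutral"

    counts = Counter(label(c) for c in compounds)
    n = len(compounds)
    return {
        "n": n,
        "mean": mean(compounds),
        # Sample standard deviation needs at least two records.
        "std_dev": stdev(compounds) if n > 1 else 0.0,
        "min": min(compounds),
        "max": max(compounds),
        "share": {name: counts[name] / n
                  for name in ("positive", "neutral", "negative")},
    }
```

Reading the mean together with the category shares helps distinguish a dataset of mostly neutral records from one where strong positive and strong negative records cancel each other out.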

Group comparisons: If your data has a grouping column (like brand or category), compare sentiment across groups to identify leaders and laggards.
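A group comparison of this kind amounts to averaging the compound score within each group and ranking the groups. The sketch below assumes the records are already available as (group, compound) pairs; the function name and the sample brands are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_compound_by_group(records):
    """Return (group, mean_compound, n) tuples, highest mean first.

    `records` is an iterable of (group, compound) pairs.
    """
    grouped = defaultdict(list)
    for group, compound in records:
        grouped[group].append(compound)
    summary = [(group, mean(scores), len(scores))
               for group, scores in grouped.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


sample = [("Acme", 0.6), ("Acme", 0.2), ("Zenith", -0.3), ("Zenith", 0.1)]
for group, avg, n in mean_compound_by_group(sample):
    print(f"{group}: mean compound {avg:.2f} over {n} records")
```

Keep the record count next to each mean: a group with only a handful of records can look like a leader or laggard purely by chance.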

Worked examples: For randomly chosen positive and negative samples, the lab shows a token-by-token breakdown so you can see exactly how VADER computed the final scores.

CASE STUDIES

Use these case studies to practice reading sentiment outputs before uploading your own data. Presets include simulated Reddit-style posts about a university’s online enrollment system and detailed reviews of a new influencer swimwear brand.

LOAD TEXT DATA

Upload or paste text

Choose which column contains the text you want to analyze.

Drop a CSV/TSV file here or click to browse.

The first row should contain column names; text should be in one of the columns.

Optional: group results by a categorical column (e.g., brand, source, rating).

Optional: select a column to use as row identifiers in the output.

If you paste text here, the tool will ignore any uploaded file and use these lines instead.
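For readers preparing data outside the lab, the expected file layout can be checked with a short script. This is a sketch under assumptions: the function name, the output keys, and the simple tab-versus-comma detection are illustrative and do not describe how the lab itself parses uploads.

```python
import csv
import io


def load_records(raw_text, text_column, group_column=None, id_column=None):
    """Parse CSV/TSV text with a header row into a list of record dicts."""
    lines = raw_text.splitlines()
    first_line = lines[0] if lines else ""
    # Treat the file as TSV if the header row contains a tab.
    delimiter = "\t" if "\t" in first_line else ","
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    if text_column not in (reader.fieldnames or []):
        raise ValueError(f"column {text_column!r} not found in header")

    records = []
    for row_number, row in enumerate(reader, start=1):
        text = (row.get(text_column) or "").strip()
        if not text:
            continue  # skip rows with no text to score
        records.append({
            "id": row[id_column] if id_column else str(row_number),
            "group": row[group_column] if group_column else None,
            "text": text,
        })
    return records
```

Calling load_records(data, "comment", group_column="brand", id_column="id") on a file with those three columns returns one dictionary per non-empty comment.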

SUMMARY & VISUALS

Sentiment summary

Average compound score:
Records labeled positive:
Records labeled neutral:
Records labeled negative:

Run the analysis to see overall sentiment across your text records.

Labels distribution

This bar chart shows how many records are classified as positive, neutral, or negative based on VADER's compound score thresholds.

DETAILED RESULTS & WORKED EXAMPLE

Export Per-Record Results

Download the sentiment scores for all analyzed records as a CSV file.
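If you want to produce a comparable file from your own scripts, the standard csv module is enough. The column names below are hypothetical; the lab's actual export may use different headers or column order.

```python
import csv
import io

# Hypothetical column layout for a per-record export.
FIELDS = ["id", "text", "compound", "pos", "neu", "neg", "label"]


def results_to_csv(rows):
    """Serialize a list of per-record result dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"id": "1", "text": "Great, really", "compound": 0.6,
         "pos": 0.5, "neu": 0.5, "neg": 0.0, "label": "positive"}]
print(results_to_csv(rows))
```

The writer quotes any text that contains a comma, so free-form comments survive a round trip through spreadsheet software.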

How VADER scored two examples

Run the analysis to see how each token in a relatively positive record contributes to its final sentiment score.

Run the analysis to see how each token in a relatively negative record contributes to its final sentiment score.

How this example is computed

The analyzer first tokenizes the sentence and looks up each token in a sentiment lexicon. Tokens with known sentiment (e.g., "great", "terrible") get a base score; others are neutral.

Next, local rules adjust those base scores. Intensifiers like "very" or "extremely" boost nearby sentiment words; negations like "not" flip or attenuate scores; and all-caps or repeated punctuation can add emphasis.

Finally, the adjusted token scores are combined into overall positive, neutral, and negative proportions and a single compound score in the range [-1, 1]. The example above shows each token's contribution so you can see how the final numbers arise from the text.
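The final combination step can be sketched as follows. The weighting shown here follows the reference VADER implementation, where each sentiment-bearing token counts as its absolute score plus one and each neutral token counts as one; whether this lab uses exactly the same weighting is an assumption, and the punctuation adjustment is omitted. The function name is illustrative.

```python
def proportions(token_scores):
    """Turn adjusted token scores into pos/neu/neg proportions."""
    pos_sum = sum(score + 1 for score in token_scores if score > 0)
    neg_sum = sum(-score + 1 for score in token_scores if score < 0)
    neu_count = sum(1 for score in token_scores if score == 0)
    total = pos_sum + neg_sum + neu_count
    if total == 0:
        return {"pos": 0.0, "neu": 0.0, "neg": 0.0}
    return {
        "pos": round(pos_sum / total, 3),
        "neu": round(neu_count / total, 3),
        "neg": round(neg_sum / total, 3),
    }


# "not good": "not" is neutral, "good" (1.9) is negated to 1.9 * -0.74.
print(proportions([0.0, 1.9 * -0.74]))
```

Because every token contributes to the total, long neutral passages pull the positive and negative proportions down even when the sentiment-bearing words themselves are strong.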

🔗 Connecting Examples to Patterns

These examples show how VADER processes individual texts token-by-token. To understand patterns across ALL your records, scroll down to the Sentiment Distribution (Histogram) and Box Plot. They reveal whether your entire dataset skews positive or negative, or shows mixed opinions across many records.

ANALYSIS REPORT

APA-Style Statistical Reporting

Run the analysis to generate an APA-style report of your sentiment results.

Managerial Interpretation

Run the analysis to generate a managerial interpretation of your sentiment results.

Summary of Estimates

Measure | Estimate | Std. Dev. | Min | Max

Sentiment Distribution: Histogram

📊 How to read bar colors: Each bar represents a range of compound scores (e.g., 0.10 to 0.21). Bars are colored based on where the bin center falls: green for positive territory (≥0.05), red for negative (≤−0.05), and gray for neutral. The bar's height shows how many records fall within that score range.
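The coloring rule can be written out as a small function. This is a sketch of the rule as described above; the function name and the color strings are illustrative.

```python
def bin_color(low, high, threshold=0.05):
    """Color a histogram bin by where its center falls."""
    center = (low + high) / 2
    if center >= threshold:
        return "green"
    if center <= -threshold:
        return "red"
    return "gray"


for low, high in [(-0.30, -0.20), (-0.04, 0.06), (0.10, 0.21)]:
    print(f"{low:+.2f} to {high:+.2f}: {bin_color(low, high)}")
```

Because the color depends on the bin center, a bin that straddles a threshold is colored as a whole, so a few records inside a green or red bar may individually carry a neutral label.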

Run the analysis to see how compound sentiment scores are distributed across your text records.