Jan 2024 - May 2024

Sentiment Analysis using Naive Bayes

NLP classification pipeline for movie review sentiment.

This project builds a text classification pipeline that predicts review sentiment using preprocessing, tokenization, word frequency distributions, Laplace smoothing, and log-probability scoring.

Python, NLP, Machine Learning, Text Processing, Naive Bayes

Conceptual Visual

NLP Classification Pipeline

Raw Review → Preprocessing → Tokenization → Naive Bayes → Positive / Negative

Highlights

NLP Pipeline
Naive Bayes Model
Laplace Smoothing
CLI Workflow

Problem Statement

Raw text needs structured preprocessing and robust probability scoring before it can be classified reliably. This project demonstrates a foundational NLP workflow from data processing to prediction.

What I Built

Text preprocessing

Tokenization

Laplace smoothing

Configurable datasets

CLI execution

How It Works

A conceptual workflow showing how the project moves from input to processing and output.

Step 1: Dataset
Step 2: Cleaning
Step 3: Tokenization
Step 4: Word Frequency Training
Step 5: Log Probability Scoring
Step 6: Sentiment Prediction
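
The word frequency training step can be sketched as a pass over tokenized reviews that counts how often each word appears per class. This is a minimal illustration and not the project's actual code: the `train` function, the `(tokens, label)` input shape, and the toy reviews are all assumptions made for the example.

```python
from collections import Counter


def train(examples):
    """Count word frequencies and review counts per class.

    `examples` is an iterable of (tokens, label) pairs (an assumed format).
    """
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training reviews
    for tokens, label in examples:
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
    return word_counts, doc_counts


# Toy, made-up training data for illustration only.
toy = [
    (["great", "acting", "great", "story"], "positive"),
    (["boring", "plot"], "negative"),
    (["great", "fun"], "positive"),
]
word_counts, doc_counts = train(toy)
print(word_counts["positive"]["great"])  # 3
print(doc_counts["positive"])            # 2
```

The per-class counters are the only state the later scoring step needs, which keeps training and prediction easy to separate.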

Architecture / System Design

A simplified system view of the major project components and how responsibilities connect.

Step 1: Text Input
Step 2: Preprocessor
Step 3: Feature Extractor
Step 4: Naive Bayes Classifier
Step 5: Prediction Output

Technical Implementation

Preprocessing

Lowercasing
Punctuation removal
Tokenization
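
The three preprocessing steps above can be sketched in a few lines. This is one possible version, assuming ASCII punctuation and whitespace tokenization; the function name `preprocess` is hypothetical, and the project may handle cases such as contractions differently.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> list[str]:
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return cleaned.split()


tokens = preprocess("The movie was surprisingly emotional and well acted.")
print(tokens)
# ['the', 'movie', 'was', 'surprisingly', 'emotional', 'and', 'well', 'acted']
```

Note that stripping punctuation this way turns "don't" into "dont", which is usually acceptable for a bag-of-words model.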

Model

Word frequency distributions
Laplace smoothing
Log-probability scoring
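
The model components listed above can be combined into a small scoring function. The sketch below assumes add-alpha (Laplace) smoothing over the training vocabulary and a class prior taken from review counts; `log_score`, `predict`, and the toy counts are hypothetical names and data, not the project's own.

```python
import math
from collections import Counter


def log_score(tokens, label, word_counts, doc_counts, vocab, alpha=1.0):
    """Return log P(label) + sum of log P(token | label) with add-alpha smoothing."""
    counts = word_counts[label]
    total_words = sum(counts.values())
    denom = total_words + alpha * len(vocab)
    # Class prior estimated from the share of training reviews with this label.
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    for tok in tokens:
        # Counter returns 0 for unseen tokens, so they get the pseudo-count alpha.
        # (Skipping out-of-vocabulary tokens is another common choice.)
        score += math.log((counts[tok] + alpha) / denom)
    return score


def predict(tokens, word_counts, doc_counts, vocab):
    """Pick the label with the highest log score."""
    return max(
        doc_counts,
        key=lambda label: log_score(tokens, label, word_counts, doc_counts, vocab),
    )


# Toy, made-up counts for illustration only.
word_counts = {
    "positive": Counter({"great": 3, "acting": 1, "fun": 1}),
    "negative": Counter({"boring": 2, "plot": 1}),
}
doc_counts = Counter({"positive": 2, "negative": 2})
vocab = set(word_counts["positive"]) | set(word_counts["negative"])

print(predict(["great", "fun"], word_counts, doc_counts, vocab))   # positive
print(predict(["boring", "zzz"], word_counts, doc_counts, vocab))  # negative
```

Because only the ranking of scores matters, the log scores are compared directly and never converted back to probabilities.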

Workflow

Configurable datasets
CLI execution
Positive/negative classification
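
A command-line entry point for this kind of workflow could look like the sketch below, built on the standard-library `argparse` module. The flag names (`--train`, `--alpha`), the positional `review` argument, and the file name `reviews.tsv` are all hypothetical; the project's real CLI options are not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Classify a movie review as positive or negative."
    )
    # Hypothetical options, chosen only for illustration.
    parser.add_argument("--train", required=True, help="path to labeled training data")
    parser.add_argument("--alpha", type=float, default=1.0, help="Laplace smoothing constant")
    parser.add_argument("review", help="review text to classify")
    return parser


def main(argv=None):
    """Parse arguments and return a summary; a real tool would train and predict here."""
    args = build_parser().parse_args(argv)
    return f"dataset={args.train} alpha={args.alpha} review={args.review!r}"


# Passing an explicit argument list keeps the example runnable without a shell.
print(main(["--train", "reviews.tsv", "A surprisingly good film."]))
```

Keeping the dataset path and smoothing constant as arguments is one way to make the dataset configurable without editing code.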

Tools

Python
NLP fundamentals
Probabilistic modeling

Visual Showcase

Conceptual preview panels for the project experience. These are intentional placeholders, not fake screenshots.

NLP Pipeline Diagram

Conceptual flow from raw text to sentiment prediction.

Tokenization Preview

Placeholder panel showing cleaned tokens prepared for modeling.

Probability Score Panel

Visual concept for comparing class-level log scores.

Classification Output Card

Clean result card for positive or negative prediction output.

Classification Preview

Input:
"The movie was surprisingly emotional and well acted."

Prediction:
Positive Review

Challenges & Solutions

Challenge

Raw text is noisy and cannot be modeled directly.

Solution

Built a preprocessing pipeline for lowercasing, punctuation removal, and tokenization.

Challenge

A word never seen with a class during training gets a probability of zero, which wipes out the whole score for that class.

Solution

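Used Laplace smoothing so unseen words keep a small nonzero probability, and summed log-probabilities instead of multiplying raw probabilities to keep scores numerically stable on long reviews.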
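
A short numeric illustration shows why both techniques matter. The numbers below (1,000 words at probability 1e-4, a class with 50 word occurrences, and a 200-word vocabulary) are made up for the example and are not taken from the project.

```python
import math

# Hypothetical per-word likelihoods for a long review.
probs = [1e-4] * 1000

product = math.prod(probs)                 # underflows to 0.0 in 64-bit floats
log_sum = sum(math.log(p) for p in probs)  # stays finite

print(product)  # 0.0
print(log_sum)

# A word with zero count in a class (made-up totals).
count, total_words, vocab_size = 0, 50, 200
unsmoothed = count / total_words                  # 0.0, and math.log(0.0) raises ValueError
smoothed = (count + 1) / (total_words + vocab_size)  # add-one (Laplace) estimate

print(unsmoothed, smoothed)  # 0.0 0.004
```

With the raw product, every class would score 0.0 on a long review and could not be ranked; the log sum keeps the classes comparable.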

Results / Impact

The project keeps preprocessing, training, and scoring as separate, modular components, with a readable CLI workflow and clear technical documentation.

It shows how course and research concepts translate into a working system under real implementation constraints such as noisy text and unseen words.
