PORTFOLIO
Technical Skills
Programming Languages
MATLAB
Web Technologies
Libraries & Frameworks
NLTK
Visualizations & Tools
Tableau
Featured Researches
Decoding Human Dialogue: Sentiment to Deception
It all started with a pair of Nike shoes. Moments after casually mentioning them in a private chat to a friend, an ad for those exact sneakers appeared on my social media feed. That uncanny moment sparked a deep curiosity: How do machines read and understand human conversation?
Decoding the "Why" Behind the Reviews
Determined to understand how algorithms process text, I began in R with a dataset of over 14,000 Amazon earphone reviews. I wanted to move beyond basic binary classification (labeling a review as simply positive or negative) and uncover the specific reasons consumers felt the way they did. After training the system to recognize conversational nuances, such as the fact that "not good" conveys negative rather than positive sentiment, I applied Latent Semantic Analysis (LSA) to automatically cluster thousands of reviews into hidden thematic categories.
The Finding: The algorithm revealed that while customers highly praised the sound quality and battery life, their primary sources of frustration were poor physical fit and unreliable Bluetooth connectivity.
sentimentr
quanteda
Topic Modeling (LSA)
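The portfolio project itself was built in R with sentimentr and quanteda, but the LSA step can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions: the toy reviews, vocabulary handling, and the choice of two latent topics are all hypothetical stand-ins for the real 14,000-review dataset.

```python
import numpy as np

# Toy review snippets standing in for the Amazon earphone dataset (hypothetical).
reviews = [
    "great sound quality and battery life",
    "sound quality is great battery lasts long",
    "poor fit keeps falling out of my ear",
    "bluetooth keeps dropping poor connectivity",
]

# Build a simple term-document count matrix.
vocab = sorted({w for r in reviews for w in r.split()})
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(reviews)))
for j, r in enumerate(reviews):
    for w in r.split():
        X[index[w], j] += 1

# LSA: a truncated SVD of the term-document matrix.
# Columns of U are latent "topics"; keep the top k of them.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
topics = U[:, :k]

# Inspect the highest-loading terms per latent topic.
for t in range(k):
    top = np.argsort(-np.abs(topics[:, t]))[:3]
    print("topic", t, [vocab[i] for i in top])
```

In the real pipeline the documents would be TF-IDF weighted before the decomposition, and the topic loadings would be read off for thousands of reviews rather than four.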
Teaching Machines Context
With time, it became clear that human language is far too complex for simple keyword dictionaries. I transitioned to Python to build a more advanced predictive pipeline capable of understanding nuance. I developed a "hybrid" feature engineering approach that taught the machine to read multi-word phrases rather than isolated words. This allowed the algorithm to mathematically weigh the true meaning of a phrase based on its surrounding context, completely changing how it processed customer feedback.
The Finding: By equipping the algorithm to recognize contextual cues and negations (such as realizing "not terrible" is a compliment), the Support Vector Machine (SVM) model successfully learned to interpret nuanced sentiment, achieving a robust 91% predictive accuracy.
Hybrid Feature Engineering
NLTK
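The core of the hybrid approach, reading multi-word phrases rather than isolated words, can be shown with a tiny stdlib-only sketch. The function name and tokenization are hypothetical simplifications; the actual pipeline used NLTK tokenization and fed these features to an SVM.

```python
def hybrid_features(text):
    """Extract unigram + bigram features so a negation like
    'not good' survives as a single signal instead of being
    split into the unrelated tokens 'not' and 'good'."""
    tokens = text.lower().split()
    unigrams = tokens
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return unigrams + bigrams

feats = hybrid_features("battery is not good")
print(feats)
# → ['battery', 'is', 'not', 'good', 'battery_is', 'is_not', 'not_good']
```

With unigrams alone, "good" would pull the review toward a positive score; the bigram feature "not_good" gives the classifier a distinct signal it can learn to weight negatively.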
Memory and Massive Scale
While processing a massive, noisy dataset of over 1.5 million social media posts, I hit the computational limits of traditional Machine Learning. I transitioned to Deep Learning, specifically sequence models such as Long Short-Term Memory (LSTM) networks. Instead of reading isolated phrases, this neural network works like human memory: it remembers how a sentence starts so it can accurately interpret how it ends. This allowed the system to capture semantic nuance across millions of unstructured records.
The Finding: By utilizing deep sequence memory to process complex language patterns and word embeddings, the LSTM architecture successfully captured context at an unprecedented scale, achieving an exceptional 98% predictive accuracy.
LSTM Architecture
Word Embeddings
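The "memory" the paragraph describes lives in the LSTM's cell state. A minimal NumPy sketch of a single LSTM step makes the mechanism concrete; the dimensions, weight initialization, and random inputs here are hypothetical, and the real model was of course trained in a deep learning framework rather than written by hand.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: gates decide what the cell state keeps,
    forgets, and emits, so early words can influence later ones."""
    H = h.shape[0]
    z = W @ x + U @ h + b            # pre-activations for all four gates
    i = sigmoid(z[:H])               # input gate: admit new information
    f = sigmoid(z[H:2*H])            # forget gate: discard stale memory
    o = sigmoid(z[2*H:3*H])          # output gate: expose part of the state
    g = np.tanh(z[3*H:])             # candidate cell update
    c = f * c + i * g                # cell state carries long-range memory
    h = o * np.tanh(c)               # hidden state passed to the next step
    return h, c

# Run a toy 5-step sequence of 8-dim embeddings through a 16-unit cell.
rng = np.random.default_rng(0)
D, H, T = 8, 16, 5
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape)  # → (16,)
```

The final hidden state h summarizes the whole sequence; in the sentiment pipeline a state like this feeds a classification layer.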
Detecting Deception in Low-Resource Languages
Equipped with a strong foundation in deep learning, I applied these advanced neural networks to a much harder, real-world problem: detecting deceptive fake news in Bengali. Working with a "low-resource" language means operating without the vast, pre-built dictionaries available for English. It required building a system that could inherently understand the structural flow of the language to spot manipulative and misleading patterns without relying on existing translational tools.
The Finding: By designing a custom Bidirectional Gated Recurrent Unit (BiGRU) architecture that mathematically processes text both forwards and backwards, the system successfully analyzed 50,000 news articles to separate truth from deception with an astounding 99.16% accuracy.
BiGRU Architecture
Low-Resource NLP
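The forwards-and-backwards reading that the BiGRU performs can be sketched with a hand-rolled GRU cell applied in both directions. Everything here is a simplified, hypothetical stand-in (NumPy instead of a training framework, random weights, no bias terms); it shows only the bidirectional mechanic, not the trained Bengali model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Wr, Wh, Uz, Ur, Uh):
    """One GRU step: the update gate z blends the old state
    with a freshly computed candidate state."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

def bigru_encode(xs, params_f, params_b):
    """Read the sequence left-to-right and right-to-left with two
    independent GRUs, then concatenate the two final states."""
    H = params_f[3].shape[0]
    hf = np.zeros(H)
    for x in xs:                 # forward pass over the words
        hf = gru_step(x, hf, *params_f)
    hb = np.zeros(H)
    for x in reversed(xs):       # backward pass over the words
        hb = gru_step(x, hb, *params_b)
    return np.concatenate([hf, hb])

# Encode a toy 6-word sequence of 8-dim embeddings with 12 units per direction.
rng = np.random.default_rng(1)
D, H, T = 8, 12, 6
def make_params():
    return tuple(rng.normal(0, 0.1, (H, D)) for _ in range(3)) + \
           tuple(rng.normal(0, 0.1, (H, H)) for _ in range(3))
enc = bigru_encode([rng.normal(size=D) for _ in range(T)],
                   make_params(), make_params())
print(enc.shape)  # → (24,)
```

Because the backward pass sees the end of the sentence first, each half of the concatenated encoding carries context the other half cannot, which is what lets a BiGRU judge a word by what comes both before and after it.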