1. Sentiment Analysis
What is Sentiment Analysis? Sentiment analysis determines whether a piece of text expresses a positive, negative, or neutral sentiment.
Example using scikit-learn:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
# Sample movie reviews
reviews = [
"This movie was amazing! I loved the storyline and the acting was superb.",
"The movie was a complete waste of time. The plot was boring, and the characters were flat.",
"It was an okay movie, nothing too special but not terrible either.",
]
# Create a TF-IDF representation
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)
# Define the sentiment labels (0: negative, 1: positive, 2: neutral)
y = [1, 0, 2]
# Train a Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X, y)
# Test the classifier on a new review
new_review = "The acting was great, but the storyline was a bit confusing."
new_review_vec = vectorizer.transform([new_review])
sentiment = clf.predict(new_review_vec)
print("Predicted Sentiment:", sentiment[0])
2. Named Entity Recognition (NER)
What is Named Entity Recognition? NER identifies and categorizes named entities in text, such as names of people, organizations, and locations.
Example using spaCy:
import spacy
# Load the pre-trained NER model
nlp = spacy.load("en_core_web_sm")
# Sample text
text = "Apple Inc. is an American multinational technology company headquartered in Cupertino, California."
# Perform named entity recognition
doc = nlp(text)
# Print the named entities and their labels
for ent in doc.ents:
print(f"{ent.text}: {ent.label_}")
Output (exact entity spans and labels can vary slightly by spaCy model version):
Apple Inc.: ORG
American: NORP
Cupertino: GPE
California: GPE
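To inspect or visualize the entities further, spaCy's displacy module can highlight them in context. A minimal sketch, assuming the same doc object from above (displacy.render returns HTML markup when run outside a notebook):
from spacy import displacy
# Keep only organization and location entities from the parsed document
orgs_and_places = [ent.text for ent in doc.ents if ent.label_ in ("ORG", "GPE")]
print("Organizations and locations:", orgs_and_places)
# Render the entities with highlighted spans as an HTML string
html = displacy.render(doc, style="ent", jupyter=False)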
3. Text Classification
What is Text Classification? Text classification assigns predefined categories or labels to text based on its content.
Example using scikit-learn:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
# Sample texts with labels
texts = [
("This is a news article about politics.", "news"),
("I need to buy a new laptop for work.", "tech"),
("What's the best restaurant in the city?", "food"),
("The stock market crashed today due to economic turmoil.", "business"),
("This movie has amazing special effects!", "entertainment"),
]
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    [text for text, _ in texts], [label for _, label in texts], test_size=0.2
)
# Create a TF-IDF representation
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
# Train a Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)
# Test the classifier on new texts
new_texts = [
"I'm looking for a good Italian restaurant nearby.",
"The latest iPhone was just released with new features.",
]
new_texts_vec = vectorizer.transform(new_texts)
predictions = clf.predict(new_texts_vec)
print("Predicted Labels:", predictions)
These examples illustrate how to perform common NLP tasks using fundamental techniques and libraries in Python. Sentiment analysis, named entity recognition, and text classification are essential for applications such as customer feedback analysis, information extraction, and content categorization. Understanding these tasks makes it easier to preprocess text data effectively and apply machine learning models to extract meaningful insights.
[…]
A- Text preprocessing (tokenization, stemming, lemmatization)
B- Feature extraction (bag-of-words, TF-IDF, word embeddings)
C- NLP tasks (sentiment analysis, named entity recognition, text classification)
[…]