0 likes | 16 Views
https://projectgurukul.org/python-sentiment-analysis-machine-learning/
E N D
Python Sentiment Analysis using Machine Learning Sentiment analysis is a natural language processing technique that determines whether the data is positive, negative, or neutral. Basically, sentiment analysis is performed on textual data. This is a very useful technique that is used to help businesses monitor brands and products according to customer feedback and understand customer needs. What is Sentiment Analysis? Sentiment analysis is nothing, it just recognizes the sentiment behind the text. It is often used by businesses to detect sentiment in social data, product reputation, and understanding customers. Nowadays, customers share their thoughts and feelings more openly (In feedback forms and in comments on a particular product). So that’s why sentiment analysis has become very essential to monitor and understand the sentiment of customers. There are basically two types of sentiment analysis: ■ Emotion detection: In this sentiment analysis technique we generally detect emotions, like happiness, frustration, anger, sadness, and so on. ■ Aspect-based Sentiment Analysis: In this type of sentiment analysis we basically analyze the sentiments behind the texts. For example, aspect-based sentiment analysis can be used when you want particular aspects or features of people giving a product review as positive, neutral, and negative.
So these are the basic two types of sentiment analysis and in this python sentiment analysis project, we will perform Aspect-based Sentiment Analysis. Before discussing the Project that we are going to implement, let’s first understand what natural language processing is? About Natural language Processing: Natural language processing (NLP) enables computers to communicate with humans. With the help of NLP, it is possible for computers to read text, hear speech, interpret it, measure sentiments, and also with the help of NLP we can determine which part of text is important. NLP allows machines to break down and interpret human language. Natural Language processing techniques are widely used for Projects like chatbot creation, spam filters, Social media monitoring, etc. About Sentiment Analysis Project: This is a Machine Learning project, in which with the help of machine learning algorithms and techniques we will classify the sentiment of text that is positive, negative, or neutral. Sentiment Analysis Dataset The dataset we will be using to develop this machine learning sentiment analysis project is Twitter US Airline Sentiment which is available on the following link: Sentiment Analysis Dataset
The dataset is basically a CSV file that consists of 30 columns. With the help of this data, we will train our ml model that will predict the sentiment of the text as positive, neutral, or negative. Sentiment Analysis Machine Learning Project Code Please download the sentiment analysis project code from the following link: Sentiment Analysis Project Code Required Libraries: You need to install certain libraries in your system to implement the python sentiment analysis project. The required libraries are: ■ Numpy (pip install numpy) ■ Pandas (pip install pandas) ■ Matplotlib (pip install matplotlib) ■ Natural language Processing toolkit (NLTK) (pip install nltk) ■ Sklearn (pip install sklearn) In this ml project, we will require these libraries. To install this on your system you can use pip installer. So open your command prompt and type pip install numpy, pip install pandas, etc. Now let’s start implementing and understanding Sentiment Analysis by ProjectGurukul:
Follow the steps to effectively understand the process to implement sentiment analysis project: 1.) Import libraries: Basically, we will be importing libraries at the time we require to use it. So in the first step we will import only two libraries that are pandas and nltk. import pandas as pd import nltk 2.) Read the dataset: Using pandas method read_csv() we are going to read the dataset that we have downloaded above: sentiment_data = pd.read_csv('/content/sentiment_analysis.csv') 3.) Analysing Dataset: Now we will analyze our data, sentiment_data.head(10) To check shape of data we will use shape method: sentiment_data.shape 4.) Filtering our data: Now in this step, we drop all the rows whose airline_sentiment_confidence is less than 0.5.
sentiment_df = sentiment_data.drop(sentiment_data[sentiment_data['airline_sentiment_confidence']<0.5] .index, axis= 0) We have almost removed 200 to 300 rows. sentiment_df.shape 5.) Creating X and Y data: Now we are ready to create X and Y data so that we can create the model. So, X consists of a text column. With the help of NLP we are going to find the parts of text which are important. And Y consists of its corresponding sentiment that is positive, negative or neutral. X = sentiment_df['text'] Y = sentiment_df['airline_sentiment'] 6.) Cleaning our text data: In this step, we will clean our text data using NLP. Let’s understand this. ■ First, we will import some libraries in python sentiment analysis project which will be required to clean our text data. from nltk.corpus import stopwords nltk.download('stopwords') import string from nltk.stem import WordNetLemmatizer lemmatizer = WordNetLemmatizer() ■ Initializing stopwords and punctuation so that we can remove them from our text data. stop_words = stopwords.words('english')
punctuations = string.punctuation ■ Import regular expressions- with the help of this we will consider only the text part and remove all the numbers, special characters from the text data. import re nltk.download('wordnet') ■ Now we will first take our text data only and convert it into the lower case using lower() method and also lemmatize our text (lemmatizing basically means to reduce a word to its lower form, for ex: completing is lemmatized to complete) and then store that preprocessed data to clean_data variable. clean_data = [] for i in range(len(X)): text = re.sub('[^a-zA-Z]', ' ',X.iloc[i]) text = text.lower().split() text = [lemmatizer.lemmatize(word) for word in text if (word not in stop_words) and (word not in punctuations)] text = ' '.join(text) clean_data.append(text) Now let’s see that how clean_data look like: clean_data And also let’s see Y data: 7.) Defining sentiments and converting data:
In this step of python sentiment analysis project, we will define sentiments and give them a number corresponding to their index. So that we can create a model which will classify text as positive, negative, or neutral. ■ 0 stands for ‘negative’, ■ 1 stands for ‘neutral’ and, ■ 2 stands for ‘positive’. # defining Sentiments: sentiments = ['negative' , 'neutral', 'positive'] Y = Y.apply(lambda x: sentiments.index(x)) Let’s check Y that it has correctly changed or not: Y.head() 8.) Vectorizing text data: In this step, we will vectorize our text data (X data) using the CountVectorizer method given by sklearn. Vectorizing a text basically means we will put 1 where we find a word and rest with 0 value. from sklearn.feature_extraction.text import CountVectorizer count_vectorizer = CountVectorizer(max_features = 5000, stop_words = ['virginamerica','united']) X_fit = count_vectorizer.fit_transform(clean_data).toarray() X_fit.shape 9.) Create a model: We will be using the MultinomialNB model to classify text sentiments.
MultinomialNB is a type of Naive Bayes Classifier. The Naive Bayes model works very well in text classification problems, as it is based on probabilities, it calculates the probability of sentiment based on the probability of words. from sklearn.naive_bayes import MultinomialNB from sklearn.model_selection import train_test_split model = MultinomialNB() 10.) Splitting of training and testing data: Now we will split our data into training and testing so that we can train our model. X_train, X_test, Y_train, Y_test = train_test_split(X_fit,Y, test_size = 0.3) 11.) Fit the Model: Now we will train our model on training data and check the accuracy. model.fit(X_train,Y_train) 12.) Prediction: Now the last step is to evaluate the model that we have created on test data. y_pred = model.predict(X_test) from sklearn.metrics import classification_report classification = classification_report(Y_test,y_pred) print(classification) Let’s also plot the confusion matrix: from sklearn.metrics import plot_confusion_matrix import matplotlib.pyplot as plt
plot_confusion_matrix(history, X_test, Y_test) plt.show() Summary We have successfully created the machine learning model which is able to predict the sentiment of text whether the text is positive, negative, or neutral. In this python sentiment analysis project, we have learned how to perform operations and data preprocessing on text data using natural language processing and also how to use the nltk library.