Sentiment Analysis with Logistic Regression
Natural Language Processing

Last Updated on February 8, 2021 by Editorial Team

Author(s): Buse Yaren Tekin

Photo by Markus Spiske on Unsplash

📌 Logistic regression is a classification algorithm used to solve binary classification problems. In models with two possible outcomes, the result is usually encoded as 0 or 1.

Image by Wikipedia [1]

🩸 Logistic regression is applied here as a binary classifier to a data set that has been split into training data and test data. First, standardization is applied as a pre-processing step. The model is then trained on the training data with fit( ) and used to estimate the labels of the test data with the predict( ) method.

Training on the training data
Using test data to predict
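The steps above (standardize, fit, predict) can be sketched as follows. This is a minimal illustration with scikit-learn on synthetic two-dimensional data, not the article's original code; the helper `make_two_classes` and the cluster parameters are assumptions made only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_two_classes(n_per_class):
    """Hypothetical helper: two Gaussian clusters labelled 0 and 1."""
    features = np.vstack([
        rng.normal(-2.0, 1.0, size=(n_per_class, 2)),
        rng.normal(2.0, 1.0, size=(n_per_class, 2)),
    ])
    labels = np.array([0] * n_per_class + [1] * n_per_class)
    return features, labels


# Synthetic training and test data (illustrative only).
X_train, y_train = make_two_classes(150)
X_test, y_test = make_two_classes(50)

# Standardization: learn mean and variance on the training data only,
# then apply the same transformation to the test data.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

# Train with fit(), estimate the test labels with predict().
model = LogisticRegression()
model.fit(X_train_std, y_train)
y_pred = model.predict(X_test_std)

accuracy = float((y_pred == y_test).mean())
print(f"test accuracy: {accuracy:.2f}")
```

Fitting the scaler on the training data alone keeps information from the test set out of the pre-processing step.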

What is Sentiment Analysis with Logistic Regression?

Sentiment analysis is a method for judging or making sense of someone's feelings about a particular subject. It is essentially a text-processing task that aims to determine the emotional class that a given text expresses.

✨ It refers to mining opinions from texts based on the frequencies of words, such as the overall word count or the counts of nouns, adjectives, adverbs, and verbs. (Word2vec, TF-IDF)

✨ In frequency-based opinion mining, noun phrases are found first and are then classified according to their length, usage requirements, and positive-negative polarity.
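As a rough illustration of frequency-based features, the sketch below turns three invented sentences into TF-IDF vectors with scikit-learn's `TfidfVectorizer`. The sentences are assumptions for the example, and the article does not state which vectorizer settings it used, so the defaults are kept.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented example sentences (not taken from the data set).
docs = [
    "i love this phone",
    "i hate this phone",
    "what a lovely day",
]

# With default settings, tokens shorter than two characters ("i", "a")
# are dropped, and each row is L2-normalized.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# vocabulary_ maps each word to its column index in the matrix.
vocab = vectorizer.vocabulary_
first_doc = tfidf[0].toarray().ravel()
love_weight = first_doc[vocab["love"]]
phone_weight = first_doc[vocab["phone"]]

print(sorted(vocab))
print(f"love: {love_weight:.3f}, phone: {phone_weight:.3f}")
```

In the first sentence, "love" receives a higher weight than "phone" because "phone" also appears in another sentence and is therefore less distinctive.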

💣 The data set I use in this project is 'sentiment140', which was collected through the Twitter API and is available free of charge. Its contents and attributes are described below.

📌 The data set created using the Twitter API includes 1,600,000 tweets.

Data set attributes

target: Tweet polarity (0-negative, 2-neutral, 4-positive)
ids: Tweet id
date: Date of the tweet
flag: Query. If there is no query, it is NO_QUERY.
user: User who posted
text: Tweet text (content)

Data Set Feature Classes Representation

Obtaining the Data Set with Pandas 🐼

Reading the data

✔️ The encoding argument is passed to read_csv( ) because the file is stored in a single-byte, ASCII-based Latin character encoding, so it has to be decoded with that encoding.

βœ”οΈ There are 3 types of classes to be used in sentiment analysis: negative, neutral and positive. The key-value values in the Dataframe, for which the target property is specified, as 0, 2 and 4 tags below, are reduced to two in logistic regression. Because it works with binary classification logic, the neutral class isΒ ignored.

Label preprocessing

The sum of positive and negative classes in the data set is 800000 +Β 800000.

Neutral classΒ ignoring
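The label step can be sketched as below, on a five-row toy DataFrame. The name `decode_map` appears later in the article, but its contents here are an assumption, as are the `label` and `sentiment` column names.

```python
import pandas as pd

# Assumed mapping from the numeric target to a readable sentiment name.
decode_map = {0: "NEGATIVE", 2: "NEUTRAL", 4: "POSITIVE"}

# Toy data standing in for the real tweets.
df = pd.DataFrame({
    "target": [0, 4, 2, 4, 0],
    "text": ["bad", "good", "okay", "great", "awful"],
})

# Ignore the neutral class so that only two classes remain.
binary_df = df[df["target"] != 2].copy()

# Binary label for logistic regression: 0 = negative, 1 = positive.
binary_df["label"] = (binary_df["target"] == 4).astype(int)
binary_df["sentiment"] = binary_df["target"].map(decode_map)

counts = binary_df["sentiment"].value_counts()
print(counts)
```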

Stop Words ✋🏻

The process of converting data into a form the computer can work with is called preprocessing. One of the main forms of preprocessing is filtering out unnecessary data. In NLP, common words that are filtered out because they carry little information are called 'stop words'.

Stop words are common words that a search engine is programmed to ignore, both when indexing entries for search and when retrieving them.

Stop words detection with NLTK
English Stop Words

Reading Words with NLTK (Stemming)

📌 The regular expression library (re) needs to be imported before the stemming step. Then split( ) divides the sentences into individual words, and sub( ) searches for the given regular expression pattern and replaces its matches, up to the given number of replacements.

Stemming Example, Conjugated Words Express the Same Meaning
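A small sketch of stemming with NLTK follows. The article does not say which stemmer it used, so `PorterStemmer` is an assumption here, and the word list is invented for the example.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Different forms of the same word (invented example list).
words = ["connect", "connected", "connecting", "connection", "connections"]
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(f"{word} -> {stem}")
```

All five forms reduce to the same stem, so a frequency-based model counts them as one feature instead of five.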

Stop Words Filtering with NLTK 🥅

Stop words should be removed as part of data cleaning, together with the text matched by the regular expression patterns. The sub( ) function of the regular expression library searches the provided string for a pattern and replaces each matching substring with the given repl value.

sub( ) function syntax
Stop Words Filtering with NLTK
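The sketch below shows the `re.sub(pattern, repl, string, count=0)` call on an invented tweet, followed by a simple stop-word filter. The patterns and the tiny `STOP_WORDS` set are assumptions for illustration; with NLTK, the English list from `nltk.corpus.stopwords` would normally be used instead, which needs a one-time download of the corpus.

```python
import re

tweet = "Check this out: https://example.com !!! So, so good; really."

# re.sub(pattern, repl, string): replace every match of pattern with repl.
no_urls = re.sub(r"https?://\S+", " ", tweet)
no_punct = re.sub(r"[^A-Za-z0-9\s]", " ", no_urls)
collapsed = re.sub(r"\s+", " ", no_punct).strip()

# The optional count argument limits how many matches are replaced.
one_less = re.sub(r"!", "", "wow!!!", count=1)

# Hypothetical stand-in for the NLTK English stop-word list.
STOP_WORDS = {"this", "so", "out"}
tokens = [word for word in collapsed.lower().split() if word not in STOP_WORDS]

print(collapsed)
print(one_less)
print(tokens)
```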

Data Preprocessing

  1. Remove URLs: strip unwanted links, such as addresses starting with http or https, from the text.
  2. Remove punctuation marks such as { , . : ; }.
  3. Tokenization: separating a string of input characters into parts and classifying them. Here it means separating each word in the sentence.
  4. Remove stop words such as "a", "most", and "and", because these words do not contain useful information.

Regex substring transformations
Regex path rules

Following the regex rules, tweets are first converted to lowercase with lower( ). The substitutions from the function described above are then applied, and their results are stored in the stripped and tokens variables. The words are recombined into a sentence with join( ), with spaces between them. In the negations variable, a regex substitution replaces the contraction n't with the word not, so that negated forms are written out in full.

Text changes and regex corrections
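These text changes can be sketched as one small function. The function name `clean_tweet` and the exact patterns are assumptions; only the variable names `negations`, `stripped`, and `tokens` come from the description above.

```python
import re


def clean_tweet(text):
    """Hypothetical helper: lowercase, expand n't, strip symbols, re-join."""
    lowered = text.lower()
    # Replace the contraction n't (straight or curly apostrophe) with " not".
    negations = re.sub(r"n['\u2019]t\b", " not", lowered)
    # Replace everything except lowercase letters, digits, and whitespace.
    stripped = re.sub(r"[^a-z0-9\s]", " ", negations)
    # split() without arguments also collapses repeated whitespace.
    tokens = stripped.split()
    return " ".join(tokens)


cleaned = clean_tweet("I am so bored. I don't feel today. I don't know why...")
print(cleaned)
```

This simple substitution is only approximate: "don't" becomes "do not", but irregular contractions such as "can't" and "won't" become "ca not" and "wo not", so a fuller implementation would handle those cases separately.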

Splitting the Data Set into Training and Test Data

The data in this example were split using a fixed random_state seed and a training ratio of TRAIN_SIZE = 0.75, which gives a training set of 1,200,000 tweets (75% of the data set). The remaining 400,000 tweets (25% of the data set) form the test set on which the trained model is evaluated.

Train and Test Dataset
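A minimal sketch of the split with scikit-learn's `train_test_split` is shown below, on a 16-row toy DataFrame. `TRAIN_SIZE = 0.75` comes from the text; the `random_state` value of 42 is an assumption, since the article does not state the seed it used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

TRAIN_SIZE = 0.75

# Toy data: 16 rows, alternating negative (0) and positive (4) targets.
df = pd.DataFrame({
    "text": [f"tweet {i}" for i in range(16)],
    "target": [0, 4] * 8,
})

# random_state fixes the shuffle so the same split is produced on every run.
df_train, df_test = train_test_split(df, train_size=TRAIN_SIZE, random_state=42)

print("TRAIN size:", len(df_train))
print("TEST size:", len(df_test))
```

The same ratio applied to the full data set gives 0.75 × 1,600,000 = 1,200,000 training rows and 400,000 test rows.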

Target Class (Tweet Polarity)

Below are some users' tweets. For example, let's examine the tweet with the id 1994986495.

I am so bored. I don't feel today. I don't know why…

As you can see, the target class is specified as 0 because it is a tweet with negative content.

Negative (0) Detection on Tweet ☹️

Next, let's examine the tweet with the id 1960159696:

What Beautiful Friday!' Happy Friday Yaa!!!

Since the content of this tweet is positive, its target class is specified as 4.

Positive (4) Detection on Tweet 🙂

The target class of the data, that is, the sentiment label, has now been created. Alternatively, the negative and positive data can be selected directly, without decode_map, and analyzed separately.

Negative and Positive Data Collection
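Selecting the two classes directly from the numeric target can be sketched as follows. The two rows reuse the example tweets and ids quoted above; the variable names are assumptions for the example.

```python
import pandas as pd

# The two example tweets discussed above.
tweets = pd.DataFrame({
    "target": [0, 4],
    "ids": [1994986495, 1960159696],
    "text": [
        "I am so bored. I don't feel today. I don't know why...",
        "What Beautiful Friday! Happy Friday Yaa!!!",
    ],
})

# Filter on the numeric target instead of mapping it through decode_map.
negative_tweets = tweets[tweets["target"] == 0]
positive_tweets = tweets[tweets["target"] == 4]

print(len(negative_tweets), "negative,", len(positive_tweets), "positive")
```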

It has been very useful for me. I hope it was very productive for you as well. Have a nice day 😇

References

  1. https://de.wikipedia.org/wiki/Datei:Sigmoid-function-2.svg
  2. https://www.geeksforgeeks.org/python-stemming-words-with-nltk/
  3. https://www.geeksforgeeks.org/removing-stop-words-nltk-python/
  4. https://kavita-ganesan.com/news-classifier-with-logistic-regression-in-python/
  5. https://www.kaggle.com/kazanova/sentiment140
  6. https://machinelearningmastery.com/logistic-regression-for-machine-learning/
  7. Liza Wikarsa, Rinaldo Turang. Using logistic regression method to classify tweets into the selected topics. Conference paper, October 2016.


Sentiment Analysis with Logistic Regression was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
