Analyzing The Presidential Debates
Last Updated on January 6, 2023 by Editorial Team
Author(s): Lawrence Alaso Krukrubo
Data Science
Exploring Sentiments, Key-Phrase-Extraction, and Inferences…
2020 has been one "hell-of-a-year", and we're almost in the eleventh month.
It's that time again for Americans to take to the polls.
If you've lived long enough, you recognize the patterns…
Each political side shades the other, scandals and leaks may pop up, shortcomings are magnified, critics make the news, promises are doled out "rather-convincingly", and there's an overwhelming sense of "nationality and togetherness" touted by both sides…
But for the most part, we're not buying the BS! And often, we simply choose the "lesser of the two evils", because candidly, the one is not significantly better than the other.
So today, I'm going to analyze the presidential debates of President Trump and Vice-President Biden…
Disclaimer:
The entire analysis is done by the Author, using scientific methods that do not assume faultlessness. This is a personal project devoid of any political affiliations, sentiments or undertones. The inferences expressed from this scientific process are entirely the Author's, based on the data.
Intro:
Trump and Biden faced off twice.
- The first debate was on September 29, 2020. It was moderated by Chris Wallace of Fox News.
- The second debate was originally scheduled for October 15th, but was cancelled due to Trump's bout of COVID19 and held a week later, after his "rather-theatrical-and-spectacular-recovery". This debate was moderated by Kristen Welker of NBC News.
1. The Data:
After watching both debates, as a Data Professional, I got really curious, wondering what I could learn from analyzing the responses of these two contestants.
I figured I might find something interesting by digging a little deeper into the way they answered questions bordering on the lives of millions of Americans…
That was my only motivation: curiosity. So I set out looking for the data.
Luckily, I stumbled on rev.com, which had both debates up in full, so I employed my data skills and scraped them off the website into a Jupyter notebook. That was the easy part. The hard part was preparing the data for each specific format required by the different libraries and tools for my analysis.
I scraped the website with the method I defined below…
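The scraping code itself is not reproduced in this text. As a rough sketch of the step right after the download, the snippet below groups transcript lines by speaker. It assumes the page text has already been fetched and that every turn starts with a `Speaker: (mm:ss)` header line followed by the spoken text. That layout, the `parse_transcript` name and the sample text are all assumptions for illustration, not the method used for this article.

```python
import re
from collections import defaultdict

# Assumed header layout: "Speaker Name: (mm:ss)" alone on a line.
TURN_HEADER = re.compile(
    r"^(?P<speaker>[^:()]+):\s*\((?P<stamp>\d{1,2}:\d{2}(?::\d{2})?)\)\s*$"
)

def parse_transcript(raw_text):
    """Group spoken lines by speaker from raw transcript text."""
    turns = defaultdict(list)
    speaker = None
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = TURN_HEADER.match(line)
        if header:
            # A header line switches the current speaker.
            speaker = header.group("speaker").strip()
        elif speaker is not None:
            # Any other line belongs to whoever spoke last.
            turns[speaker].append(line)
    return dict(turns)

# Made-up sample text, only to show the expected input shape.
sample = """Moderator: (00:05)
First made-up line.
Speaker A: (00:09)
Second made-up line.
"""
print(parse_transcript(sample))
```

Counting `len(turns[name])` per speaker would then give the number of times each person spoke.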
2. Gentlemen You Have Two Minutes:
If you watched the first debate, you'd have noticed it was a hard task for Chris to keep both men within the 2-minute limit. Trump made it particularly hard, and quite often, there were exchanges between Trump and Biden.
Let's look at what the data says…
Of the total responses during the first debate, Trump had 56% while Biden had 44%. It got worse for Joe during the second debate, as Trump's share of the responses rose to 60%, leaving 40% to Joe.
Trump spoke 314 times in debate one and 193 times in debate two.
Biden spoke 250 times in debate one and 131 times in debate two.
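The percentages follow directly from these counts. A small sketch of the arithmetic (the `response_shares` helper is a name made up for this illustration):

```python
def response_shares(counts):
    """Turn per-speaker response counts into percentage shares."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

debate_one = response_shares({"Trump": 314, "Biden": 250})
debate_two = response_shares({"Trump": 193, "Biden": 131})
print(debate_one)  # {'Trump': 55.7, 'Biden': 44.3}
print(debate_two)  # {'Trump': 59.6, 'Biden': 40.4}
```

Rounded to whole numbers, these are the 56/44 and 60/40 splits quoted above.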
Note to Self: Trump may not be the brightest, but he sure gets his voice heard…
2. Lexical-Diversity:
This simply means the cardinality or variety of words used in a conversation or document. In this case, it checks the number of unique words as a percentage of total words spoken by Trump and Biden.
The data shows that Joe Biden is more creative with his words. He's lexically richer than Donald Trump, even though he consistently speaks fewer words than Trump.
Biden speaks 7,936 total words with 2,020 unique words and a lexical-diversity score of 25%
Trump speaks 9,209 total words with 1,894 unique words and a lexical-diversity score of 21%
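As a quick illustration of the measure, here is a minimal sketch. The `lexical_diversity` function is a name chosen for this example; the word counts are the ones reported above.

```python
def lexical_diversity(tokens):
    """Unique tokens as a percentage of all tokens."""
    if not tokens:
        return 0.0
    return 100 * len(set(tokens)) / len(tokens)

# On a token list: 3 unique words out of 4.
print(lexical_diversity(["we", "built", "we", "won"]))  # 75.0

# The same ratio, applied to the reported totals.
biden_score = 100 * 2020 / 7936
trump_score = 100 * 1894 / 9209
print(round(biden_score), round(trump_score))  # 25 21
```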
Note to self: Biden may be a man of fewer words, but he's got a heart of creativity…
3. TFIDF:
Term-Frequency-Inverse-Document-Frequency is one of the most popular text-processing techniques. It tells us the importance of certain words to a document in comparison to other documents.
Simply put, TF-IDF shows the relative importance of a word or words to a document, given a collection of documents.
So, in this case, I choose to lemmatize the words of Trump and Biden, rather than stemming them…
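from nltk.stem import WordNetLemmatizer

def lemmatize_words(word_list):
    # Reduce each word to its dictionary form (lemma), e.g. "jobs" -> "job"
    lemma = WordNetLemmatizer()
    lemmatized = [lemma.lemmatize(i) for i in word_list]
    return lemmatized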
Then I tokenize the words, remove punctuation and remove stopwords…
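A minimal sketch of that cleaning step, using only the standard library. The regular expression and the tiny stopword set are assumptions for illustration; a real run would more likely use a fuller list such as NLTK's English stopwords.

```python
import re

# Tiny illustrative stopword list (an assumption, not the list used in the article).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "it", "that", "we", "i"}

def clean_tokens(text, stopwords=STOPWORDS):
    """Lowercase, split into word tokens, drop punctuation and stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [t.strip("'") for t in tokens]  # remove stray quote marks
    return [t for t in tokens if t and t not in stopwords]

print(clean_tokens("We need to create jobs, and we need them now!"))
# ['need', 'create', 'jobs', 'need', 'them', 'now']
```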
Then I build a simple TFIDF class to compute the TFIDF scores for both men.
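The class itself is not shown in this text, so here is one way such a class might look. It is a sketch under my own assumptions: the `SimpleTfidf` name, the dict-of-token-lists input and the smoothed IDF formula are all choices made for this example. The smoothing matters with only two documents, because a plain log(N/df) would score every shared word as zero.

```python
import math
from collections import Counter

class SimpleTfidf:
    """Minimal TF-IDF over a dict of {document_name: list_of_tokens}."""

    def __init__(self, docs):
        self.docs = docs
        self.n_docs = len(docs)
        # Document frequency: in how many documents each term appears.
        self.df = Counter(t for tokens in docs.values() for t in set(tokens))

    def idf(self, term):
        # Smoothed IDF, so terms present in every document keep a non-zero weight.
        return math.log((1 + self.n_docs) / (1 + self.df[term])) + 1

    def scores(self, name):
        tokens = self.docs[name]
        tf = Counter(tokens)
        return {t: (c / len(tokens)) * self.idf(t) for t, c in tf.items()}

    def top_terms(self, name, k=10):
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(self.scores(name).items(), key=lambda kv: (-kv[1], kv[0]))
        return [t for t, _ in ranked[:k]]

# Toy documents, made up for the example.
toy = SimpleTfidf({"a": ["jobs", "jobs", "tax"], "b": ["tax", "china"]})
print(toy.top_terms("a", k=1))  # ['jobs']
print(toy.top_terms("b", k=1))  # ['china']
```

The top-k terms per speaker are what a word-cloud would then be drawn from.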
So let's see the words peculiar to Donald Trump using a word-cloud…
It's pretty interesting, or "uninteresting", that Trump's top-10 TFIDF words include "ago", "built" and "Chris", which is the Moderator's name; as we saw, he made it a hard task for Chris. Others are "disaster", "called", "cage", "nobody"…
Let's see for Joe Biden…
With words like "create", "federal", "serious", "Americans", "folk", "situation"… it appears Biden put more effort into his debate than Team-Trump, in terms of structure and theme.
4. Some Questions Asked:
We have to commend Chris Wallace and Kristen Welker for being great moderators during the debates.
In the first debate, Chris asked some interesting questions, some of which bordered on…
- Supreme Court
- Obama-Care
- Economy
- Race / Justice
- Law Enforcement
- Election Integrity
- COVID
And during the second debate, Kristen held it down with questions on…
- COVID
- National-Security
- America / American-Families
- Minimum-Wage
- Immigration
- Race / Black-Lives-Matter
- Leadership
5. Some Answers and Inferences:
In this section, I shall analyze Trump's and Biden's responses to questions on three important topics:
- Jobs, Wages and Taxes
- Racism
- The US Economy
The analysis for this section is quite interesting, involving a few libraries and tools:
- For Sentiment Analysis: AzureML Text-Analytics-Client SDK for Python
- For Key-Phrase Extraction: AzureML Text-Analytics-Client SDK for Python
- For Parts-Of-Speech-Tagging: spaCy
- For Visualization: PyWaffle, Matplotlib, Seaborn
After signing up on the Microsoft Azure ML portal and obtaining my key and endpoint, I created two methods: one for sentiment analysis and one for key-phrase extraction.
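Those two methods are not reproduced here. The sketch below shows one way to wrap both calls, assuming the `azure-ai-textanalytics` package, whose `TextAnalyticsClient` offers `analyze_sentiment` and `extract_key_phrases`. The `analyze_sentences` helper, the row layout and the batch size of 10 are my own assumptions; the client is passed in, so nothing here needs a key to be imported.

```python
def chunks(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def analyze_sentences(client, sentences, batch_size=10):
    """Return one dict per sentence: sentiment label, confidence scores, key phrases."""
    rows = []
    # The service limits documents per request; 10 is assumed here.
    for batch in chunks(sentences, batch_size):
        sentiments = client.analyze_sentiment(documents=batch)
        phrases = client.extract_key_phrases(documents=batch)
        for text, sent, phr in zip(batch, sentiments, phrases):
            if sent.is_error or phr.is_error:
                continue  # skip documents the service could not process
            rows.append({
                "sentence": text,
                "sentiment": sent.sentiment,
                "positive": sent.confidence_scores.positive,
                "neutral": sent.confidence_scores.neutral,
                "negative": sent.confidence_scores.negative,
                "key_phrases": list(phr.key_phrases),
            })
    return rows

# Usage, with your own ENDPOINT and KEY (placeholders, not real values):
# from azure.core.credentials import AzureKeyCredential
# from azure.ai.textanalytics import TextAnalyticsClient
# client = TextAnalyticsClient(endpoint=ENDPOINT, credential=AzureKeyCredential(KEY))
# rows = analyze_sentences(client, ["An example sentence about jobs."])
```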
Next, I define the method for extracting the Parts-Of-Speech (POS) tags, using the spaCy library. This is really important in understanding how Trump and Biden often construct their sentences.
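A minimal sketch of such a method, assuming a parsed spaCy `Doc` as input. The `pos_tag_counts` name is made up for this example, while `pos_`, `is_punct` and `is_space` are standard spaCy token attributes.

```python
from collections import Counter

def pos_tag_counts(doc):
    """Count coarse part-of-speech tags, skipping punctuation and whitespace."""
    return Counter(tok.pos_ for tok in doc if not (tok.is_punct or tok.is_space))

# Usage, assuming spaCy and its small English model are installed:
# import spacy
# nlp = spacy.load("en_core_web_sm")
# counts = pos_tag_counts(nlp("An example sentence about the economy."))
# counts.most_common(5)
```

The resulting counts per tag are the kind of values a bubble plot could size its bubbles by.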
At this point, I've defined my work structure. Now I need a couple of helper functions to process the debates into the required formats and to find sentences that match my queries.
The first helper function is a search function. Given query words like "jobs" or "wages", it searches through Trump's and Biden's corpus respectively, to extract sentences containing these query words…
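A sketch of what such a search helper could look like. The `find_sentences` name, the list-of-sentences input and the case-insensitive substring matching are assumptions made for this illustration.

```python
import re

def find_sentences(corpus, query_words):
    """Return the sentences in `corpus` (a list of strings) containing any query word."""
    if not query_words:
        return []
    # Substring match, so "job" also finds "jobs"; escape words to treat them literally.
    pattern = re.compile("|".join(re.escape(w) for w in query_words), re.IGNORECASE)
    return [s for s in corpus if pattern.search(s)]

# Made-up sentences, only to show the behaviour.
corpus = [
    "We will create new jobs.",
    "The weather is fine.",
    "Minimum WAGES should rise.",
]
print(find_sentences(corpus, ["job", "wage"]))
# ['We will create new jobs.', 'Minimum WAGES should rise.']
```

Substring matching is deliberately loose: it catches plurals, but a short query like "tax" would also match "taxi", so results are worth a quick read-through.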
There are two others. One converts the sentiments received from the AzureML Client into a DataFrame. The other applies the above methods together on a corpus and returns a DataFrame with all sentiments and key-phrases intact, plus a dictionary of overall sentiment scores.
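How the overall scores are computed is not spelled out here. One plausible reading, sketched below, is to average the per-sentence confidence scores. The `overall_sentiment` name and the row layout (one dict per sentence with `positive`, `neutral` and `negative` keys) are assumptions for this example.

```python
def overall_sentiment(rows):
    """Average per-sentence confidence scores and express them as percentages."""
    labels = ("positive", "neutral", "negative")
    if not rows:
        return {label: 0.0 for label in labels}
    n = len(rows)
    return {label: round(100 * sum(r[label] for r in rows) / n, 1) for label in labels}

# Made-up scores for two sentences.
rows = [
    {"positive": 0.8, "neutral": 0.1, "negative": 0.1},
    {"positive": 0.2, "neutral": 0.1, "negative": 0.7},
]
print(overall_sentiment(rows))
# {'positive': 50.0, 'neutral': 10.0, 'negative': 40.0}
```

With pandas installed, `pandas.DataFrame(rows)` would turn the same list of dicts into the per-sentence DataFrame.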
With just a couple of extra plotting functions, we're good to go!
A. Trump and Biden on Jobs/Wages/Taxes:
Trump responds with 93 sentences with an overall sentiment score of 21% positive, 72% negative and 7% neutral.
Biden responds with 127 sentences with an overall sentiment score of 33% positive, 60.3% negative and 6.7% neutral.
In both Pie-charts above, we can see the huge red portions indicating negative sentiments.
A2: Note that in a debate, negative sentiments should never be taken at face value, but should be explored to understand the context. This can be done by exploring the sentences and key-phrases extracted. For example, Biden may start a sentence by criticizing Trump's approach severely, in order to buttress his point. But doing so will cause the sentiments-analysis-client to record that sentence as overly negative. Therefore, negative sentiments may only be taken at face value in a review/feedback session, where negativity may indicate dissatisfaction or unhappy customers.
Given A2 above, Trump's sentiment score is still kinda unexpected… We would expect him to paint a good picture of the work he's been doing if he believes he's been doing good work. I mean, it's expected for Biden to criticize Trump, but since Trump is the sitting President, in charge of the present Government, it's expected that his responses be more positive.
Let's see a word-cloud of Trump's key-phrases on Jobs/Wages/Taxes
Trump talks about "Country, job, tax, companies, taxes, depression"…
Let's see a few of his positive-sentiment responses on Jobs/Taxes/Wages…
Trump talks about "helping small business by raising the minimum wage", plus "being on the road to success", amongst other things. He also dismisses the claim that he paid $750 in taxes as untrue, saying he paid millions in taxes. When challenged by Biden for exploiting the tax-bill, he claimed the bill was passed by Biden and it only gave "certain individuals" the privileges for depreciation and tax credits.
And for Trump's negative-sentiment responses…
For the negatives, Trump talks about people dying, committing suicide and losing their jobs. He says there is depression, alcohol and drug use at a level nobody's seen before, and that's why he wants to open up the schools and economy.
Let's see the word-cloud of Biden's key-phrases on Jobs/Wages/Taxes
Biden talks about "tax, job, people, millions, fact, economy, significant"…
Let's see a few of Biden's positive-sentiment replies on Jobs/Taxes/Wages…
Biden talks about creating millions of jobs and investing in 50,000 charging stations on highways so as to own the electric car market of the future. He talks about taking 4 million existing buildings and 2 million existing homes and retrofitting them so they don't leak as much energy, saving hundreds of millions of barrels of oil in the process and creating millions of jobs…
On Biden's negative-sentiment responses…
Here he criticizes the Trump administration, saying people who have lost their jobs have been those on the front-lines. He also says that almost half the states in America have seen a significant increase in COVID deaths because Trump rushed to open the economy…
Generally, Biden's negative sentiment scores come from his criticism of Trump's administration, which is expected. Trump's negative sentiments are a mix of sour remarks and unfriendly remarks at Biden, Obama and Hillary Clinton… He called Hillary crooked and a disgrace.
Let's see the Parts-Of-Speech tags for both Trump and Biden.
Bigger bubbles represent the most frequent part-of-speech tags used.
B. Trump and Biden on Racism:
Trump never said the word "Racism" during the debates. He called Biden a Racist though, and said people accuse him (Trump) of being a Racist, but they're wrong…
Trump responds with 47 sentences with an overall sentiment score of 10% positive, 87% negative and 3% neutral.
Biden responds with 89 sentences with an overall sentiment score of 27.5% positive, 67% negative and 5.5% neutral.
Trump's sentences again appear overly negative at 87%, while Biden's are negative at 67%.
Let's see a word-cloud of Trump's key-phrases used in describing Racism
Trump uses terms like "people, person, horrible, country, china, black, racist, terrible"…
For some positive-sentiment responses from Trump…
And for some negative-sentiment responses from Trump…
Trump calls Biden a racist, calls Hillary Clinton crooked and says the first time he heard about Black-Lives-Matter, they were chanting "pigs in a blanket" and "fry them like bacon" at the police, and Trump says, "that's a horrible thing"…
Then Trump goes on to say he's the least racist person in the room and that he's been taking care of Black colleges and universities.
Note to self: Trump finds it hard to address racism constructively. Often he thinks it's about him; he doesn't realize it's about the entire American system.
Let's see the Racism word-cloud for Joe Biden…
Here we have Biden using words like "people, president, character, racist, racism, suburbs"… to tackle racism.
Some of Biden's positive-sentiment responses are…
On his positives, Biden talks about how most people don't wanna hurt nobody and how he's going to provide for economic opportunities, better education and better health-care…
And when expressing negative sentiments, Biden talks like this…
Biden reminds Trump that when George Floyd was killed, Trump asked the military to use tear-gas on peaceful protesters at the White House so that he could pose at the church with a Bible. Biden states there's systemic racism in America, calls Trump a racist and reminds him that it's not 1950 no more…
Note to self: As a Black man, I'm happy that Biden openly agrees that there's systemic racism in America… This assertion is the only true route to a solution.
Now, let's see the Parts-Of-Speech-Tags for Trump and Biden on Racism…
C. Trump and Biden on The US Economy:
Trump responds with 44 sentences with an overall sentiment score of 16% positive, 80% negative and 4% neutral.
Biden responds with 56 sentences with an overall sentiment score of 45% positive, 50% negative and 5% neutral.
And for the "third-time-running", Trump seems overly negative with his responses on The US Economy…
Let's see a word-cloud of Trump's key-phrases about the Economy
Trump uses terms like "greatest economy in history, country, china, administration, spike, massive, world…"
Let's see some of Trump's responses with positive sentiments.
On his positives, Trump says that due to COVID he had to close "The greatest economy of the history of our country", which, by the way, is being built again and is going up so fast. He ends by saying they had the lowest unemployment numbers before the pandemic.
Let's see some of Trump's responses with negative sentiments.
Trump talks about the negative effect of closing down the economy because of the "China-plague". He accuses Biden of planning to shut down the economy again. He said if not for his efforts, there'd be 2.2 million Americans dead from the virus and not the current 220k…
Let's see the word-cloud of Biden's key-phrases about the Economy.
Biden talks about "economy, jobs, fact, people, energy, covid, number, Putin…"
Let's see some of his remarks with positive sentiments about the Economy
Biden talks repeatedly about creating millions of new jobs by making sure the economy is being run, moved and motivated by clean energy. He talks specifically about curbing energy leaks and saving millions of barrels of oil, which leads to a significant number of new jobs.
On Biden's negative-sentiment responses about the Economy…
From his negative-sentiment responses, Biden talks to the families who've lost loved ones to the pandemic. He challenges Trump that he can't fix the economy unless he first fixes the pandemic. He mentions systemic racism affecting the US economy. He accuses Trump of mismanaging the economy, stating the Obama administration handed him a booming economy which he's blown.
Finally, for this section, let's see the bubble-plot of the Parts-Of-Speech tags for Trump and Biden on The US Economy.
6. Bayesian Inference:
So, our task here is to find the conditional probability (P) of Trump and Biden mentioning the words we care most about, given the debates.
We will build a Naive-Bayes classifier from scratch and use it to tell the conditional likelihood of Trump and Biden saying the words we care most about.
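The classifier rests on Bayes' rule. For two events A and B, with P(B) > 0, it can be written as:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

In this setting, A would be the speaker and B the words observed in the debates (my reading of the setup).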
This simply means that the Conditional P of event A, given event B, is the Conditional P of event B, given event A, multiplied by the Marginal P of event A, all divided by the Marginal P of event B (which is actually the Total P of event B occurring at all).
First, let's define the prior. This is simply the P of Trump and Biden participating in the debates. I say it's 50% each.
p_trump_speech = 0.5
p_biden_speech = 0.5
Now, I get a list of some of the words we care about (some may be stemmed):
['job','wage','tax','raci','race','economy','drugs','covid',
'pandemic','vaccine','virus','health','care','dr','doc','citizen',
'america','black','african','white','latin','hispanic','asian',
'minorit','immigra']
Next, I define a function that computes the individual conditional P of Trump and Biden saying each word, given the debates. It returns a DataFrame with these intact.
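The function is not reproduced in this text. A rough sketch of the idea follows, assuming each speaker's words are available as a token list and that a stem counts as "said" whenever a token starts with it. The `word_likelihoods` name and the add-one smoothing are my own choices, not necessarily what was used for the article.

```python
def word_likelihoods(tokens_by_speaker, stems):
    """Estimate P(stem | speaker) as the smoothed share of tokens starting with the stem."""
    table = {}
    for speaker, tokens in tokens_by_speaker.items():
        total = len(tokens)
        table[speaker] = {
            # Add-one smoothing keeps unseen stems from zeroing out later products.
            stem: (sum(t.startswith(stem) for t in tokens) + 1) / (total + len(stems))
            for stem in stems
        }
    return table

# Made-up token lists, only to show the shape of the result.
toy_tokens = {
    "A": ["jobs", "tax", "taxes", "china"],
    "B": ["job", "covid"],
}
print(word_likelihoods(toy_tokens, ["job", "tax"]))
```

With pandas installed, `pandas.DataFrame(table)` would give one column per speaker and one row per stem.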
So I get the DataFrame, scale it up uniformly by multiplying each value by a power of 10, then normalize the values, and it looks like this…
Finally, I define a Bayes-Inference method for computing the conditional probability of Trump and Biden given these words.
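As a sketch of what that inference step could look like under the naive independence assumption: multiply each speaker's prior by their per-word likelihoods, then normalize across speakers. The `posterior` name and the log-space arithmetic are my own choices, and since the scaling and normalization details are not shown here, this sketch is not expected to reproduce the exact figures in the article.

```python
import math

def posterior(priors, likelihoods):
    """P(speaker | words) via Bayes' rule, treating the words as independent."""
    # Work in log space so many small probabilities do not underflow.
    log_joint = {
        s: math.log(priors[s]) + sum(math.log(p) for p in likelihoods[s].values())
        for s in priors
    }
    top = max(log_joint.values())
    unnorm = {s: math.exp(v - top) for s, v in log_joint.items()}
    evidence = sum(unnorm.values())  # plays the role of P(words)
    return {s: u / evidence for s, u in unnorm.items()}

# Made-up likelihoods for two words and two speakers.
toy_priors = {"A": 0.5, "B": 0.5}
toy_likelihoods = {"A": {"x": 0.2, "y": 0.1}, "B": {"x": 0.1, "y": 0.3}}
print(posterior(toy_priors, toy_likelihoods))  # roughly {'A': 0.4, 'B': 0.6}
```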
So I get 46.5% for Trump and 53.5% for Biden.
So from these debates and given the topics we care about, who's more likely to discuss them… and, hopefully, address them and proffer solutions? Bayes' Rule says Biden is more likely, and the margin is tight: 53.5% - 46.5% = 7% in favor of Joe Biden…
This is by no means a prediction of the result of the election nor a means to influence voter decisions. It's just my opinion, inferred solely from the Presidential debates.
But of course, we know there's more to life, and to America, than just two debates.
God Bless America, God Bless Africa, God Bless The World…
Cheers!!
About Me:
Lawrence is a Data Specialist at Tech Layer, passionate about fair and explainable AI and Data Science. I believe that sharing knowledge and experiences is the best way to learn. I hold both the Data Science Professional and Advanced Data Science Professional certifications from IBM and the IBM Data Science Explainability badge. I have conducted several projects using ML and DL libraries, and I love to code up my functions as much as possible. Finally, I never stop learning and experimenting and yes, I have written several highly recommended articles.
Analyzing The Presidential Debates was originally published in Towards AI on Medium.
Published via Towards AI