Data Analysis Project with Pandas — Step-by-Step Guide (Ted Talks Data)

Last Updated on July 25, 2023 by Editorial Team

Author(s): Fares Sayah

Originally published on Towards AI.

Data Analysis Project Guide — Use Pandas power to get valuable information from your data

The Pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the Pandas library make it challenging to discover the best way to accomplish any given task.

In this article, you’ll use pandas to answer questions about multiple real-world datasets. Through each exercise, you’ll learn important data science skills as well as “best practices” for using pandas. By the end of the tutorial, you’ll be more fluent at using pandas to correctly and efficiently answer your own data science questions.

Read the Data

The first step is to get the data and load it to memory. You can download data from this link: TED Talks. We are using Pandas for data manipulation. Matplotlib, Seaborn, and hvPlot for Data Visualization.

After reading the data successfully, we need to check our data. Pandas have a lot of function that allows us to discover our data effectively and quickly. The goal here is to find out more about the data and become a subject matter expert on the dataset you are working with.

In general, we need to know the following questions:

What kind of data do we have, and how do we treat different types?
What’s missing from the data, and how do you deal with it?
How can you add, change or remove features to get more out of your data?

The Shape of the Data

Use ‘.shape’ to see DataFrame shape: rows, columns

(2550, 17)

Pandas Data Types

To check the types of your data, you can use .dtypes and it will return a pandas series of columns associated with there dtype. Pandas have 7 types in general:

object, int64, float64, category, and datetime64 are going to be covered in this article.
bool: True/False values. Can be a NumPy datetime64[ns].
timedelta[ns]: Differences between two datetimes.

comments int64
description object
duration int64
event object
film_date int64
languages int64
main_speaker object
name object
num_speaker int64
published_date int64
ratings object
related_talks object
speaker_occupation object
tags object
title object
url object
views int64
dtype: object

Information About the Data

.info() method prints information about a DataFrame including the index dtype and columns, non-null values, and memory usage.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2550 entries, 0 to 2549
Data columns (total 17 columns):
 # Column Non-Null Count Dtype 
--- ------ -------------- ----- 
 0 comments 2550 non-null int64 
 1 description 2550 non-null object
 2 duration 2550 non-null int64 
 3 event 2550 non-null object
 4 film_date 2550 non-null int64 
 5 languages 2550 non-null int64 
 6 main_speaker 2550 non-null object
 7 name 2550 non-null object
 8 num_speaker 2550 non-null int64 
 9 published_date 2550 non-null int64 
 10 ratings 2550 non-null object
 11 related_talks 2550 non-null object
 12 speaker_occupation 2544 non-null object
 13 tags 2550 non-null object
 14 title 2550 non-null object
 15 url 2550 non-null object
 16 views 2550 non-null int64 
dtypes: int64(7), object(10)
memory usage: 338.8+ KB

Missing Values

Missing data (or missing values) is defined as the data value that is not stored for a variable in the observation of interest. In Machine Learning, we need to handle missing values. There are many types of missing values:

Standard Missing Values: These are missing values that Pandas can detect.
Non-Standard Missing Values: Sometimes it might be the case where there are missing values that have different formats.
Unexpected Missing Values: For example, if our feature is expected to be a string, but there’s a numeric type, then technically, this is also a missing value.

It’s important to understand these different types of missing data from a statistics point of view. The type of missing data will influence how you deal with filling in the missing values.

Sometimes you’ll simply want to delete those rows. Other times you’ll replace them.
A very common way to replace missing values is using a median (for objects) or mean (for numeric values).

But those are weak approaches, some times we need domain knowledge about the data and statistical study to fill in the missing values.

comments 0
description 0
duration 0
event 0
film_date 0
languages 0
main_speaker 0
name 0
num_speaker 0
published_date 0
ratings 0
related_talks 0
speaker_occupation 6
tags 0
title 0
url 0
views 0
dtype: int64

comments 0
description 0
duration 0
event 0
film_date 0
languages 0
main_speaker 0
name 0
num_speaker 0
published_date 0
ratings 0
related_talks 0
speaker_occupation 0
tags 0
title 0
url 0
views 0
dtype: int64

Descriptive Statistics about the Data

.describe() generates descriptive statistics. Descriptive statistics summarize the central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

Which talks provoke the most online discussion?

From the heatmap, the number of views correlates well with language and comments.

Limitations of this approach

Sub comments (nested comments).
How long has it been online?

To correct this behavior, one solution is to normalize comments by views.

Lessons:

Consider the limitations and biases of your data when analyzing it
Make your results understandable

Visualize the distribution of comments

A line plot is not appropriate here (use it to measure something over time)

Check how many observations we removed from the plot:

(32, 19)

After filtering the data we lose only a small amount of data. This process is called excluding outliers.

Lessons:

Choose your plot type based on the question you are answering and the data type(s) you are working with
Use Pandas one-liners to iterate through plots quickly
Try modifying the plot defaults
Creating plots involves decision-making

Plot the number of talks that took place each year

The event column does not always include the year

1280 TEDGlobal 2012
163 TED2005
1593 TEDGlobal 2013
2027 TEDWomen 2015
197 TED2008
1286 TEDxAustin
1115 TEDxBrussels
1381 TEDxCHUV
159 TED2007
193 TED2007
Name: event, dtype: object

We can’t rely on event Feature, because most of the events don't have a year:

0 1140825600
1 1140825600
2 1140739200
3 1140912000
4 1140566400
Name: film_date, dtype: int64

Results don’t look right:

0 1970-01-01 00:00:01.140825600
1 1970-01-01 00:00:01.140825600
2 1970-01-01 00:00:01.140739200
3 1970-01-01 00:00:01.140912000
4 1970-01-01 00:00:01.140566400
Name: film_date, dtype: datetime64[ns]

Now the results look right:

0 2006-02-25
1 2006-02-25
2 2006-02-24
3 2006-02-26
4 2006-02-22
Name: film_date, dtype: datetime64[ns]

The new column uses the DateTime datatype (this was an automatic conversion):

comments int64
description object
duration int64
event object
film_date int64
languages int64
main_speaker object
name object
num_speaker int64
published_date int64
ratings object
related_talks object
speaker_occupation object
tags object
title object
url object
views int64
comments_per_view float64
views_per_comment float64
film_datetime datetime64[ns]
dtype: object

DateTime columns have convenient attributes under the dt namespace:

0 2006
1 2006
2 2006
3 2006
4 2006
Name: film_datetime, dtype: int64

Count the number of talks each year using value_counts():

2013 270
2011 270
2010 267
2012 267
2016 246
2015 239
2014 237
2009 232
2007 114
2017 98
2008 84
2005 66
2006 50
2003 33
2004 33
2002 27
1998 6
2001 5
1984 1
1983 1
1991 1
1994 1
1990 1
1972 1
Name: film_datetime, dtype: int64

Points are plotted and connected in the order you give them to Pandas:

Need to sort the index before plotting:

Timestamp('2017-08-27 00:00:00')

Lessons:

Read the documentation
Use the DateTime data type for dates and times
Check your work as you go
Consider excluding data if it might not be relevant

What were the “best” events in TED history to attend?

355
274

Count the number of talks (great if you value variety, but they may not be great talks):

TED_2009 199
TED_2010 184
TED_2011 184
TED_2013 183
TED_2012 178
Name: event, dtype: int64

Use views as a proxy for “quality of talk”:

event
AORN Congress 149818.0
Arbejdsglaede Live 971594.0
BBC TV 521974.0
Bowery Poetry Club 676741.0
Business Innovation Factory 304086.0
Name: views, dtype: float64

Find the largest values, but we don’t know how many talks are being averaged:

event
TEDxNorrkoping 6569493.0
TEDxCreativeCoast 8444981.0
TEDxBloomington 9484259.5
TEDxHouston 16140250.5
TEDxPuget Sound 34309432.0
Name: views, dtype: float64

Show the number of talks along with the mean (events with the highest means had only 1 or 2 talks):

Calculate the total views per event:

Lessons:

Think creatively about how you can use the data you have to answer your question
Watch out for small sample sizes

Unpack the rating data

Previously, users could tag talks on the TED website (funny, inspiring, confusing, etc.)

0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...
1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...
2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...
3 [{'id': 3, 'name': 'Courageous', 'count': 760}...
4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...
Name: ratings, dtype: object

Two ways to examine the rating data for the first talk:

"[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, {'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]"

str

Convert this into something useful using Python’s ast module (Abstract Syntax Tree):

list[{'id': 7, 'name': 'Funny', 'count': 19645},
 {'id': 1, 'name': 'Beautiful', 'count': 4573},
 {'id': 9, 'name': 'Ingenious', 'count': 6073},
 {'id': 3, 'name': 'Courageous', 'count': 3253},
 {'id': 11, 'name': 'Longwinded', 'count': 387},
 {'id': 2, 'name': 'Confusing', 'count': 242},
 {'id': 8, 'name': 'Informative', 'count': 7346},
 {'id': 22, 'name': 'Fascinating', 'count': 10581},
 {'id': 21, 'name': 'Unconvincing', 'count': 300},
 {'id': 24, 'name': 'Persuasive', 'count': 10704},
 {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},
 {'id': 25, 'name': 'OK', 'count': 1174},
 {'id': 26, 'name': 'Obnoxious', 'count': 209},
 {'id': 10, 'name': 'Inspiring', 'count': 24924}]list

Define a function to convert an element in the rating Series from
string to list:

Test the function:

[{'id': 7, 'name': 'Funny', 'count': 19645},
 {'id': 1, 'name': 'Beautiful', 'count': 4573},
 {'id': 9, 'name': 'Ingenious', 'count': 6073},
 {'id': 3, 'name': 'Courageous', 'count': 3253},
 {'id': 11, 'name': 'Longwinded', 'count': 387},
 {'id': 2, 'name': 'Confusing', 'count': 242},
 {'id': 8, 'name': 'Informative', 'count': 7346},
 {'id': 22, 'name': 'Fascinating', 'count': 10581},
 {'id': 21, 'name': 'Unconvincing', 'count': 300},
 {'id': 24, 'name': 'Persuasive', 'count': 10704},
 {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},
 {'id': 25, 'name': 'OK', 'count': 1174},
 {'id': 26, 'name': 'Obnoxious', 'count': 209},
 {'id': 10, 'name': 'Inspiring', 'count': 24924}]

Series apply method applies a function to every element in a Series and returns a Series:

0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...
1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...
2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...
3 [{'id': 3, 'name': 'Courageous', 'count': 760}...
4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...
Name: ratings, dtype: object

lambda is a shorter alternative:

0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...
1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...
2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...
3 [{'id': 3, 'name': 'Courageous', 'count': 760}...
4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...
Name: ratings, dtype: object

An even shorter alternative is to apply the function directly (without lambda):

0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...
1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...
2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...
3 [{'id': 3, 'name': 'Courageous', 'count': 760}...
4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...
Name: ratings, dtype: object

[{'id': 7, 'name': 'Funny', 'count': 19645},
 {'id': 1, 'name': 'Beautiful', 'count': 4573},
 {'id': 9, 'name': 'Ingenious', 'count': 6073},
 {'id': 3, 'name': 'Courageous', 'count': 3253},
 {'id': 11, 'name': 'Longwinded', 'count': 387},
 {'id': 2, 'name': 'Confusing', 'count': 242},
 {'id': 8, 'name': 'Informative', 'count': 7346},
 {'id': 22, 'name': 'Fascinating', 'count': 10581},
 {'id': 21, 'name': 'Unconvincing', 'count': 300},
 {'id': 24, 'name': 'Persuasive', 'count': 10704},
 {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},
 {'id': 25, 'name': 'OK', 'count': 1174},
 {'id': 26, 'name': 'Obnoxious', 'count': 209},
 {'id': 10, 'name': 'Inspiring', 'count': 24924}]

listdtype('O')comments int64
description object
duration int64
event object
film_date int64
languages int64
main_speaker object
name object
num_speaker int64
published_date int64
ratings object
related_talks object
speaker_occupation object
tags object
title object
url object
views int64
comments_per_view float64
views_per_comment float64
film_datetime datetime64[ns]
ratings_list object
dtype: object

Lessons:

Pay attention to data types in Pandas
Use apply any time it is necessary

Count the total number of ratings received by each talk

Bonus exercises:

For each talk, calculate the percentage of ratings that were negative
For each talk, calculate the average number of ratings it received per day since it was published

[{'id': 7, 'name': 'Funny', 'count': 19645},
 {'id': 1, 'name': 'Beautiful', 'count': 4573},
 {'id': 9, 'name': 'Ingenious', 'count': 6073},
 {'id': 3, 'name': 'Courageous', 'count': 3253},
 {'id': 11, 'name': 'Longwinded', 'count': 387},
 {'id': 2, 'name': 'Confusing', 'count': 242},
 {'id': 8, 'name': 'Informative', 'count': 7346},
 {'id': 22, 'name': 'Fascinating', 'count': 10581},
 {'id': 21, 'name': 'Unconvincing', 'count': 300},
 {'id': 24, 'name': 'Persuasive', 'count': 10704},
 {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},
 {'id': 25, 'name': 'OK', 'count': 1174},
 {'id': 26, 'name': 'Obnoxious', 'count': 209},
 {'id': 10, 'name': 'Inspiring', 'count': 24924}]

{'id': 7, 'name': 'Funny', 'count': 19645}

[{'id': 7, 'name': 'Funny', 'count': 544},
 {'id': 3, 'name': 'Courageous', 'count': 139},
 {'id': 2, 'name': 'Confusing', 'count': 62},
 {'id': 1, 'name': 'Beautiful', 'count': 58},
 {'id': 21, 'name': 'Unconvincing', 'count': 258},
 {'id': 11, 'name': 'Longwinded', 'count': 113},
 {'id': 8, 'name': 'Informative', 'count': 443},
 {'id': 10, 'name': 'Inspiring', 'count': 413},
 {'id': 22, 'name': 'Fascinating', 'count': 132},
 {'id': 9, 'name': 'Ingenious', 'count': 56},
 {'id': 24, 'name': 'Persuasive', 'count': 268},
 {'id': 23, 'name': 'Jaw-dropping', 'count': 116},
 {'id': 26, 'name': 'Obnoxious', 'count': 131},
 {'id': 25, 'name': 'OK', 'count': 203}]

0 93850
1 2936
2 2824
3 3728
4 25620
Name: ratings_list, dtype: int64

0 93850
1 2936
2 2824
3 3728
4 25620
Name: ratings_list, dtype: int64

0 93850
1 2936
2 2824
3 3728
4 25620
Name: ratings_list, dtype: int64

count 2550.000000
mean 2436.408235
std 4226.795631
min 68.000000
25% 870.750000
50% 1452.500000
75% 2506.750000
max 93850.000000
Name: num_ratings, dtype: float64

Lessons:

Write your code in small chunks, and check your work as you go
Lambda is best for simple functions

Which occupations deliver the funniest TED talks on average?

Bonus exercises:

for each talk, calculate the most frequent rating
for each talk, clean the occupation data so that there’s only one occupation per talk

Step 1: Count the number of funny ratings

0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...
1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...
2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...
3 [{'id': 3, 'name': 'Courageous', 'count': 760}...
4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...
Name: ratings_list, dtype: object

True 2550
Name: ratings, dtype: int64

[{'id': 3, 'name': 'Courageous', 'count': 760},
 {'id': 1, 'name': 'Beautiful', 'count': 291},
 {'id': 2, 'name': 'Confusing', 'count': 32},
 {'id': 7, 'name': 'Funny', 'count': 59},
 {'id': 9, 'name': 'Ingenious', 'count': 105},
 {'id': 21, 'name': 'Unconvincing', 'count': 36},
 {'id': 11, 'name': 'Longwinded', 'count': 53},
 {'id': 8, 'name': 'Informative', 'count': 380},
 {'id': 10, 'name': 'Inspiring', 'count': 1070},
 {'id': 22, 'name': 'Fascinating', 'count': 132},
 {'id': 24, 'name': 'Persuasive', 'count': 460},
 {'id': 23, 'name': 'Jaw-dropping', 'count': 230},
 {'id': 26, 'name': 'Obnoxious', 'count': 35},
 {'id': 25, 'name': 'OK', 'count': 85}]

0 19645
1 544
2 964
3 59
4 1390
Name: funny_ratings, dtype: int64

Step 2: Calculate the percentage of funny ratings

1849 Science humorist
337 Comedian
124 Performance poet, multimedia artist
315 Expert
1168 Social energy entrepreneur
1468 Ornithologist
595 Comedian, voice artist
1534 Cartoon editor
97 Satirist
2297 Actor, writer
568 Comedian
675 Data scientist
21 Humorist, web artist
194 Jugglers
2273 Comedian and writer
2114 Comedian and writer
173 Investor
747 Comedian
1398 Comedian
685 Actor, comedian, playwright
Name: speaker_occupation, dtype: object

2549 Game designer
1612 Biologist
612 Sculptor
998 Penguin expert
593 Engineer
284 Space activist
1041 Biomedical engineer
1618 Spinal cord researcher
2132 Computational geneticist
442 Sculptor
426 Author, thinker
458 Educator
2437 Environmental engineer
1491 Photojournalist
1893 Forensic anthropologist
783 Marine biologist
195 Kenyan MP
772 HIV/AIDS fighter
788 Building activist
936 Neuroengineer
Name: speaker_occupation, dtype: object

Step 3: Analyze the funny rate by occupation

speaker_occupation
Comedian 0.512457
Actor, writer 0.515152
Actor, comedian, playwright 0.558107
Jugglers 0.566828
Comedian and writer 0.602085
Name: funny_rate, dtype: float64

count 2550
unique 1458
top Writer
freq 51
Name: speaker_occupation, dtype: object

Step 4: Focus on occupations that are well-represented in the data

Writer 51
Designer 34
Artist 34
Journalist 33
Entrepreneur 31
 ..
Neuroanatomist 1
Foreign policy strategist 1
Mythologist 1
Designer, illustrator, typographer 1
Computational geneticist 1
Name: speaker_occupation, Length: 1458, dtype: int64

pandas.core.series.Series

Index(['Writer', 'Designer', 'Artist', 'Journalist', 'Entrepreneur',
 'Architect', 'Inventor', 'Psychologist', 'Photographer', 'Filmmaker', 'Educator', 'Economist', 'Author', 'Neuroscientist', 'Philosopher', 'Roboticist', 'Biologist', 'Physicist', 'Marine biologist', 'Musician', 'Technologist', 'Activist', 'Global health expert; data visionary', 'Astronomer', 'Poet', 'Oceanographer', 'Graphic designer', 'Philanthropist', 'Singer/songwriter', 'Behavioral economist',
 'Historian', 'Social psychologist', 'Novelist', 'Futurist', 'Engineer',
 'Computer scientist', 'Astrophysicist', 'Mathematician', 'Comedian',
 'Photojournalist', 'Reporter', 'Evolutionary biologist',
 'Techno-illusionist', 'Writer, activist', 'Legal activist',
 'Social entrepreneur', 'Performance poet, multimedia artist',
 'Singer-songwriter', 'Climate advocate', 'Producer', 'Paleontologist',
 'Environmentalist, futurist', 'Science writer', 'Sound consultant',
 'Investor and advocate for moral leadership', 'Game designer',
 'Cartoonist', 'Tech visionary', 'Sculptor', 'Social Media Theorist',
 'Surgeon', 'Data scientist', 'Physician', 'Researcher', 'Chemist',
 'Musician, activist', 'Violinist', 'Chef'],
 dtype='object')

Step 5: Re-analyze the funny rate by occupation (for top occupations only)

(792, 24)

Lessons:

Check your assumptions about your data
Check whether your results are reasonable
Take advantage of the fact that pandas operations often output a DataFrame or a Series
Watch out for small sample sizes
Consider the impact of missing data
Data scientists are hilarious

Links and Resources:

Link to data used in this tutorial: TED Talks
Link to Full Notebook
Part II: Comprehensive Data Analysis with Pandas — Part II
This blog follows the talk from PyCon 2019 of Kevin Markham. PyCon: Full Conference. Kevin Markham Youtube channel: Data School

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Publication

Data Analysis Project with Pandas — Step-by-Step Guide (Ted Talks Data)

Author(s): Fares Sayah

Data Analysis Project Guide — Use Pandas power to get valuable information from your data

Table of Contents:

Read the Data

The Shape of the Data

Pandas Data Types

Information About the Data

Missing Values

Descriptive Statistics about the Data

Which talks provoke the most online discussion?

Limitations of this approach

Lessons:

Visualize the distribution of comments

Lessons:

Plot the number of talks that took place each year

Lessons:

What were the “best” events in TED history to attend?

Lessons:

Unpack the rating data

Lessons:

Count the total number of ratings received by each talk

Lessons:

Which occupations deliver the funniest TED talks on average?

Step 1: Count the number of funny ratings

Step 2: Calculate the percentage of funny ratings

Step 3: Analyze the funny rate by occupation

Step 4: Focus on occupations that are well-represented in the data

Step 5: Re-analyze the funny rate by occupation (for top occupations only)

Lessons:

Links and Resources:

Related posts

Feedback ↓ Cancel reply

Popular posts

Updates

Recent Posts

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement

Subscribe to our AI newsletter!

🔥 Recommended Articles 🔥