Python Prior Machine Learning Part 2 & Data Analysis
Last Updated on January 6, 2023 by Editorial Team
Last Updated on July 16, 2022 by Editorial Team
Author(s): Gencay I.
Originally published on Towards AI.
Machine Learning Prior Part 2 & Data Analysis
Data Frame Analysis with Python
Content Table
· Introduction
∘ Installation
· How to gather your data?
∘ Example
· How long your data frame is? What are the column data types? How can I look at a little bit of my data?
∘ Info
∘ Shape
∘ Sample
∘ Head
∘ Tail
∘ Describe
∘ Value Counts
· How to select your pre-defined row?
· How to select multiple columns?
∘ First Two Columns
∘ Select Columns with Name
∘ Select Column with their Indexes
· How can I sort the values?
· How can I look at the mean/standard deviation/max of one column per its categories?
· How can I drop the NA Values?
· Conclusion
Introduction
Hello from another machine learning tutorial. I want to keep this one brief, and I thought a lot about how to do that; reading a lot of articles may have helped. In this article, I will explain the pandas library through questions and their answers.
Installation
Now let's begin with the installation process.
Here is the main page of the pandas library.
Use pip or conda, depending on your setup.
pip install pandas
conda install pandas
Now it's time to import the package.
import pandas as pd
How to gather your data?
Now it is time to download your data.
CSV is the most common file type you will deal with when using pandas.
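url = " "  # put the URL (or local path) of your CSV file here
col = []   # put the names of the columns you want to load here
df = pd.read_csv(url, usecols=col)
- url: the URL you will download your data from.
- col: the columns you want to select and see (passed here through the usecols argument).
- df: the name of your data frame.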
For the full list of arguments, see the documentation of the read_csv method.
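As a minimal sketch of the read_csv pattern above, the block below reads a tiny CSV from an in-memory string instead of a real URL, so it runs without a network connection. The CSV content, the column names, and the choice of the usecols argument are all assumptions made for illustration; in practice you would pass your own URL or file path.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live behind a URL or file path.
csv_text = """name,age,city
Ada,36,London
Linus,28,Helsinki
Grace,45,New York
"""

# read_csv accepts a URL, a path, or any file-like object such as StringIO.
source = io.StringIO(csv_text)

# Hypothetical selection: load only these two columns.
col = ["name", "age"]

df = pd.read_csv(source, usecols=col)
print(df)
```

Only the columns listed in col end up in df; leaving usecols out would load every column.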
Example
Now let's look at a real-life example.
The Iris data set is a really famous one. You can download it by using the sklearn datasets module, seaborn, or a URL.
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Predicted attribute: class of iris plant.
Here are the remaining details of this dataset.
Now, let's run our code on that dataset.
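Here is one possible way to load Iris-style data, as a sketch only. It assumes the headerless, comma-separated layout used by the UCI file, and it reads six hand-typed rows from a string so that it runs offline. The column names are my own choice, not something the file provides.

```python
import io

import pandas as pd

# Six sample rows typed in the UCI "iris.data" layout (no header row).
# The full data set has 150 rows; this short sample is only for illustration.
iris_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# Assumed column names, since the file itself has no header.
col = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# header=None tells pandas the first line is data; names supplies the labels.
df = pd.read_csv(io.StringIO(iris_text), header=None, names=col)
print(df.head())
```

To read the real file, you would replace the StringIO object with the URL or path of your copy of the data.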
How long your data frame is? What are the column data types? How can I look at a little bit of my data?
Info
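It gives you the column data types, along with the non-null count of each column.
df.info()
Your column data types.
Shape
It gives you the dimensions of your data frame as a (rows, columns) tuple. Note that shape is an attribute, not a method, so it takes no parentheses.
df.shape
Shape of your df.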
Sample
It gives you "n" random rows from your data frame.
df.sample(5)
5 random samples of your df.
Head
Shows the first "n" rows of your data frame.
df.head(5)
The first 5 rows of your df.
Tail
Shows the last "n" rows of your data frame.
df.tail(5)
The last 5 rows of your df.
Describe
Shows summary statistics of the numerical features.
df.describe()
A summary of the numerical features.
Value Counts
Counts how many times each value occurs in a categorical column.
df["Column"].value_counts()
The number of rows for each value in the column.
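To try the inspection methods above in one place, here is a small sketch. The data frame is a hand-made, Iris-like sample with column names I picked for illustration, so the numbers it prints are not those of the full data set.

```python
import pandas as pd

# Hypothetical Iris-like sample; the column names are assumptions.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "class": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

df.info()                              # column dtypes and non-null counts
print(df.shape)                        # attribute, not a method: (rows, columns)
print(df.sample(2, random_state=0))    # 2 random rows; random_state makes it repeatable
print(df.head(3))                      # first 3 rows
print(df.tail(3))                      # last 3 rows
print(df.describe())                   # summary of the numerical columns only

counts = df["class"].value_counts()    # rows per category
print(counts)
```

Note that describe() skips the text column by default, while value_counts() is the natural tool for it.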
How to select your pre-defined row?
By using the loc method.
Say you want to see the Iris-virginica class and the Iris-virginica class only: pass a condition on the class column to loc.
In addition to that, if you want the sepal length to be bigger than five and the petal length to be smaller than five, combine the conditions with &, wrapping each one in parentheses.
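The two selections described above could look like the sketch below. The sample rows and the column names (class, sepal_length, petal_length) are assumptions for illustration, so adapt them to the names in your own data frame.

```python
import pandas as pd

# Hypothetical Iris-like sample; the column names are assumptions.
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 6.2, 4.9],
    "petal_length": [1.4, 4.7, 6.0, 4.8, 4.5],
    "class": [
        "Iris-setosa", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica", "Iris-virginica",
    ],
})

# Rows of the Iris-virginica class only.
virginica = df.loc[df["class"] == "Iris-virginica"]
print(virginica)

# Same class, plus sepal length > 5 and petal length < 5.
# Each condition needs its own parentheses when combined with &.
filtered = df.loc[
    (df["class"] == "Iris-virginica")
    & (df["sepal_length"] > 5)
    & (df["petal_length"] < 5)
]
print(filtered)
```

Using the Python keyword "and" instead of & would raise an error here, because the conditions are whole columns of True/False values rather than single booleans.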
How to select multiple columns?
First Two Columns
Use iloc with two slices. The first ":" means all rows, and 0:2 means start from the first column and stop at the third column without selecting it.
Select Columns with Name
By using two brackets, that is, by passing a list of column names inside the indexing brackets.
Select Column with their Indexes
Select the first and third columns by passing their positions to iloc as a list.
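The three ways of selecting columns can be sketched as follows. The data frame and its column names are hypothetical; only iloc and the double-bracket syntax come from pandas itself.

```python
import pandas as pd

# Hypothetical Iris-like sample; the column names are assumptions.
df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_length": [1.4, 4.7, 6.0],
    "class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# First two columns: all rows (:), column positions 0 and 1 (2 is excluded).
first_two = df.iloc[:, 0:2]

# Columns by name: the inner brackets are a Python list of names.
by_name = df[["sepal_length", "class"]]

# Columns by position: first (0) and third (2).
first_and_third = df.iloc[:, [0, 2]]

print(first_two.columns.tolist())
print(by_name.columns.tolist())
print(first_and_third.columns.tolist())
```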
How can I sort the values?
By using the sort_values method.
- by = The column you want to sort by.
By default the values are sorted in ascending order. If you want descending order instead, add the argument ascending=False.
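As a minimal sketch of sorting, assuming a small hand-made data frame with a sepal_length column (both the data and the column name are hypothetical), sort_values can be called like this:

```python
import pandas as pd

# Hypothetical sample; the column names are assumptions.
df = pd.DataFrame({
    "sepal_length": [6.3, 5.1, 7.0, 4.9],
    "class": ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"],
})

# Default order is ascending (smallest first).
ascending = df.sort_values(by="sepal_length")

# ascending=False flips the order (largest first).
descending = df.sort_values(by="sepal_length", ascending=False)

print(ascending)
print(descending)
```

sort_values returns a new data frame and leaves df unchanged, so assign the result if you want to keep it.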
For more, see the documentation of the sort_values method.
How can I look at the mean/standard deviation/max of one column per its categories?
Now, if you want to be a good programmer, you should start reading documentation today.
Here is how the pandas documentation explains the groupby method:
A group by operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
To look up the arguments of this method, visit its documentation page, and make a habit of reading the documentation of this library.
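To answer the question in the heading, one possible approach is groupby followed by agg. This is a sketch on a hypothetical Iris-like sample; the column names class and sepal_length are assumptions.

```python
import pandas as pd

# Hypothetical Iris-like sample; the column names are assumptions.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "class": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

# Split by class, pick one column, then compute several statistics per group.
stats = df.groupby("class")["sepal_length"].agg(["mean", "std", "max"])
print(stats)
```

The result has one row per category and one column per statistic. A single statistic is also available directly, for example df.groupby("class")["sepal_length"].mean().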
How can I drop the NA Values?
There are many approaches to handling NA values.
If your dataset is small and you do not want to lose data, you can fill the NA values with the mean of the column. Otherwise, you can drop the rows that contain NA values:
df.dropna()
I cannot give a real-life example here because my dataset does not contain NA values. However, as I mentioned earlier, it will be good for you to read the library documentation.
You can find other examples in the official documentation.
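Since the Iris data has no NA values, here is a sketch on a made-up data frame with a few missing entries. It shows both approaches mentioned above: dropping rows and filling with the column mean. The data and column names are hypothetical.

```python
import pandas as pd

nan = float("nan")

# Made-up sample with missing values; the column names are assumptions.
df = pd.DataFrame({
    "sepal_length": [5.1, nan, 7.0, 6.4],
    "petal_length": [1.4, 1.4, nan, 4.5],
})

# Approach 1: drop every row that contains at least one NA value.
dropped = df.dropna()

# Approach 2: fill each NA with the mean of its own column.
# df.mean() ignores NA values when computing the means.
filled = df.fillna(df.mean())

print(dropped)
print(filled)
```

Dropping keeps only complete rows, so it shrinks the data. Filling keeps every row but replaces the gaps with an estimate.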
Conclusion
I tried to be as brief as I can.
Although there are many other methods that may help you along your machine learning journey, I think this prior knowledge is enough to launch your first machine learning model.
In addition, thank you for your support of my previous articles. Your reactions really motivate me to keep writing tutorials and articles.
I mentioned before that I am preparing an e-book. In it, I plan to explain all of these concepts in detail, not briefly this time, with real-life explanations and datasets.
"Machine learning is the last invention that humanity will ever need to make." - Nick Bostrom
Thanks.