Python Prior Machine Learning Part 2 & Data Analysis
Last Updated on July 16, 2022 by Editorial Team
Author(s): Gencay I.
Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.
Machine Learning Prior Part 2 & Data Analysis
Data Frame Analysis with Python
· How to gather your data?
· How long your data frame is? What are the column data types? How can I look at a little bit of my data?
∘ Value Counts
· How to select your pre-defined row?
· How to select multiple columns?
∘ First Two Columns
∘ Select Columns with Name
∘ Select Column with their Indexes
· How can I sort the values?
· How can I look at the mean/standard deviation/max of one column per its categories?
· How can I drop the NA Values?
Hi from another Machine Learning Tutorial. I want to explain this to you guys briefly here and really think about that, how can I explain it really briefly? Reading too many articles may have helped me. I want to explain to you guys the pandas library with questions and their answers.
Now let's begin with the installation process.
Here is the main page of the panda's library.
Pip or conda, this will depend on your set-up.
pip install pandas
conda install pandas
Now it's time to import your package.
import pandas as pd
How to gather your data?
Now it is time to download your Data.
CSV is mostly used file type when you will deal with pandas.
url = " "
df = pd.read_csv(“”)
- The URL you will download your data.
- The column you want to select to see.
- Define your data frame as df.
Here is the documentation of this method, and you can see the following codes.
Now let's look up real-life examples.
Iris data set is really famous one, you can download it by using the sklearn datasets module or seaborn or via URL.
This is perhaps the best known database to be found in the pattern recognition literature. Fisher’s paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Predicted attribute: class of iris plant.
Here are the remaining details of this Dataset.
Now, let's implement our codes in that Dataset;
How long your data frame is? What are the column data types? How can I look at a little bit of my data?
It will give your column data types.
Your column data types.
It will give the dimension from your Dataframe.
Shape of your df.
It will give “n” random samples from your Dataframe.
5 random samples of your df
Looking first “n” rows of your Data frame.
Looking first 5 rows of your df.
Looking at the last “n” rows of your Data frame.
Looking last 5 rows of your df.
Shows a summary of numerical features.
Shows a summary of a numerical features.
Looking at your categorical column types.
Looking this values data types.
How to select your pre-defined row?
By using the loc method.
Now you want to see Iris-virginica class and Iris-virginica class only.
In addition to that, if you want your sepal length to be bigger than five and petal length to be smaller than five, then your code will be like that;
How to select multiple columns?
First Two Columns
Now first “:” means all rows, and 0:2 means start from the first column, end from the third column but do not select the third one.
Select Columns with Name
By using two brackets.
Select Column with their Indexes
Selecting the first and third columns by using the index method;
How can I sort the values?
- by = The column you want to sort
If you want that order to be different, then you should add the following argument:
For more, visit here
How can I look at the mean/standard deviation/max of one column per its categories?
Now, if you want to be a good programmer, you should start reading documents today.
Here is the explanation;
A group by operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
You can look up the arguments of this method, visit here and start reading documents from this library.
How can I drop the NA Values?
Now there are too many approaches to do that.
You can fill the mean of the column to the NA Values if your dataset is small and you do not want to lose your data.
Now, I can not give a real-life explanation to you because my dataset does not contain NA values, however, as I mentioned earlier, it will be good for you to read library documents.
Here you can find other examples here, official document.
I try to be brief as much as I can.
Although there are too many other methods that may have helped you along the machine learning journey, I think this prior knowledge would be okay to launch your first Machine Learning Model.
In addition to all of these, thank you for your support of my previous articles, your reactions really motivate me to keep writing tutorials and articles.
If you want to be noticed in my upcoming articles via e-mail, here ;
I actually mentioned to you guys before about my preparation for E-Book, in this one, I will plan to explain to you guys all concepts in detail, not briefly this time, and with real-life explanations and datasets.
Machine learning is the last invention that humanity will ever need to make.” Nick Bostrom
Python Prior Machine Learning Part 2 & Data Analysis was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Join thousands of data leaders on the AI newsletter. It’s free, we don’t spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI