Soar your Bet in Data Science Using Unix Cmds
Last Updated on July 28, 2021 by Editorial Team
Author(s): Karthik Bhandary
Data Science
When I ask you, "For what purpose do you use the command line/terminal?", you will probably tell me, "To run scripts, obviously!" I know that we use it to run scripts, but what else can we do with it? If you have been in the programming field for some time, you know where I am going. But if you are a newbie who just got into the coding world, you probably don't have much of an idea. That's not a problem at all; it's quite natural for a newbie.
If you don't know the answer to the question above, here it is:
greater control of an OS or application; faster management of many operating systems; ability to store scripts to automate regular tasks; basic command–line interface knowledge to help with troubleshooting, such as network connection issues.
We are obviously not going to talk about troubleshooting, but we will definitely talk about file management and a few other tasks. If you are an aspiring Data Scientist or Data Analyst, I want you to know that the command line is going to be a very important, common, and powerful tool for managing files and modifying the data in them.
In this blog, I am not going to be talking about every single command available. I am going to be talking about some of the important commands and techniques available.
Commands
cat:
This is used to view the contents of a file.
syntax: cat filename
Example: cat food/burger.csv
It will let you view the content of the burger.csv file.
less:
This is used to view a file one page at a time, and we can pass more than one file to this command. There are some special keys that we can use while viewing.
syntax: less filename filename ...
While viewing, press the space bar to page down.
- :n to go to the next file.
- :p to go to the previous file.
- :q to quit.
Sometimes we want to view just the first few or the last few lines of a file. We have commands to take care of that as well.
head:
This is used to look at the first few lines of a file (10 by default).
syntax: head filename
We can even specify the number of lines we want with "-n" followed by that number. For example:
head -n 3 filename
tail:
It should be intuitive by now that this selects the last few lines of a file.
syntax: tail filename
It accepts the same -n flag as the head command.
tail -n 3 filename
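To see head and tail side by side, here is a minimal sketch. The file menu.csv and its contents are made up for illustration; they do not come from any dataset mentioned in this post.

```shell
# Create a small hypothetical CSV in a scratch directory.
workdir=$(mktemp -d)
printf 'dish,price\nburger,5\nroti,1\ndosa,2\nidli,2\nbiriyani,6\n' > "$workdir/menu.csv"

# First 3 lines: the header plus the first two records.
head -n 3 "$workdir/menu.csv"

# Last 2 lines: the final two records.
tail -n 2 "$workdir/menu.csv"
```

Note that head counts the header row as a line, so "-n 3" gives you the header and two records.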
If you want to know what a command does, all you have to do is use the "man" command.
syntax: man cmd_name
While the head and tail commands select rows, the "cut" command selects columns.
cut:
This is used to select data by column.
It has some flags that we can use.
- -c: To cut by character, use the -c option.
- -b: To extract specific bytes, use the -b option.
- -d: cut uses tab as the default field delimiter, but it can work with other delimiters through the -d option.
- -f: To cut by field, use the -f option.
Example:
cut -f 2-5,8 -d " " filename.csv
Here we are selecting fields 2 to 5 and field 8, with " " (a space) as the delimiter/separator.
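For a comma-separated file, the same idea can be sketched as follows. The file menu.csv and its three columns are hypothetical, chosen only to make the output easy to check.

```shell
# Create a small hypothetical CSV in a scratch directory.
workdir=$(mktemp -d)
printf 'dish,price,city\nburger,5,Delhi\ndosa,2,Chennai\n' > "$workdir/menu.csv"

# Fields 1 and 3, with a comma as the delimiter.
cut -d , -f 1,3 "$workdir/menu.csv"

# A range works too: fields 2 through 3.
cut -d , -f 2-3 "$workdir/menu.csv"
```

Keep in mind that cut splits on every delimiter it sees, so a comma inside a quoted value would be treated as a field boundary as well.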
Consider the following situation: we run a command and it returns an error because we are in the wrong directory. We want to get to the right directory but don't want to retype the whole command. We use "!" followed by the start of a command to rerun the most recent command that began that way.
Rerun(!):
Example:
head roti.csv # Throws an error because we are in the wrong directory.
cd food # we go to the correct directory
!head # reruns the above head command.
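When we want to select lines containing a specific value, we use the "grep" command.
grep:
It takes a piece of text followed by one or more filenames and prints all the lines in those files that contain the text.
Example:
grep 'Mamma mia!!' dosa.txt
The single quotes matter here: inside double quotes, an interactive bash session would treat "!!" as a request to rerun the previous command.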
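Here is a small sketch of grep on a made-up file (notes.txt and its lines are assumptions for illustration). It also shows the -v flag, which inverts the match.

```shell
# Create a small hypothetical text file in a scratch directory.
workdir=$(mktemp -d)
printf 'crispy dosa\nsoft idli\nmasala dosa\n' > "$workdir/notes.txt"

# Print every line that contains the text "dosa".
grep 'dosa' "$workdir/notes.txt"

# -v inverts the match: print the lines that do NOT contain it.
grep -v 'dosa' "$workdir/notes.txt"
```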
Sometimes we want to store the result of a command. We can do that easily.
Storing Data:
">" is used to store the output in a file. It is used as follows.
syntax:
head -n 5 food/idli.csv > top.csv
This stores the first 5 lines of idli.csv in the top.csv file. If top.csv already exists, it is overwritten.
Combining Commands
This is very important because we use this method all the time when working with data in the terminal.
We can combine two or more commands by pipelining, that is, by placing "|" (the pipe character, usually found above the Enter key) between them. The output of the command on the left becomes the input of the command on the right.
Example:
head -n 5 food/biriyani.csv | tail -n 3 > top_bottom.csv
This takes the first 5 lines, keeps the last 3 of those (lines 3 to 5), and stores them in top_bottom.csv. This is a very important concept that you should remember. It is one of the concepts that will make your life easier.
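To check what such a pipeline keeps, here is a minimal sketch on a file of ten numbered lines. The file names numbers.txt and middle.txt are hypothetical.

```shell
# Create a file with the numbers 1 to 10, one per line.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/numbers.txt"

# Keep the first 5 lines, then the last 3 of those: lines 3, 4 and 5.
head -n 5 "$workdir/numbers.txt" | tail -n 3 > "$workdir/middle.txt"

# Show what was stored.
cat "$workdir/middle.txt"
```

Combining head and tail like this is a handy way to pull a slice out of the middle of a file.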
There are times when we want to count characters, words, or lines. For this, we use the "wc" command.
wc:
It is used to count (wc is short for "word count"). Flags specify what to count.
- -c: bytes (the same as characters in plain ASCII text; -m counts characters)
- -w: words
- -l: lines
Example:
grep "2017-07" seasonal/spring | wc -l
In the above example, I am selecting every line that contains "2017-07" and counting how many such lines there are.
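The same counting pattern can be sketched on a small made-up file (orders.csv and its dates are assumptions for illustration):

```shell
# Create a small hypothetical CSV in a scratch directory.
workdir=$(mktemp -d)
printf '2017-07-01,dosa\n2017-08-02,idli\n2017-07-15,roti\n' > "$workdir/orders.csv"

# Count all lines (reading from stdin so that only the number is printed).
wc -l < "$workdir/orders.csv"

# Count only the lines that mention July 2017.
grep '2017-07' "$workdir/orders.csv" | wc -l
```

Some systems pad the number printed by wc with leading spaces, so trim it if you compare it in a script.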
Until now, if we wanted to work with more than one file, we typed out every name. We can avoid that by using the wildcard character "*".
Wildcard (*):
Used to select more than one file at a time.
Example:
cut -d, -f 1 seasonal/*.csv
I am getting the first field, with a comma as the delimiter, from every .csv file in the seasonal directory.
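A quick sketch of how the wildcard picks files. The three files below are hypothetical and exist only to show which names match.

```shell
# Create two CSV files and one text file in a scratch directory.
workdir=$(mktemp -d)
printf 'burger,5\n' > "$workdir/a.csv"
printf 'dosa,2\n' > "$workdir/b.csv"
printf 'not a csv\n' > "$workdir/readme.txt"

# The shell expands *.csv to a.csv and b.csv before cut runs;
# readme.txt does not match the pattern and is skipped.
cut -d , -f 1 "$workdir"/*.csv
```

The expansion is done by the shell, not by cut, so the same pattern works with any command that accepts filenames.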
We can use ">" with a pipeline, but it should appear at the end. Here is what goes wrong otherwise:
cut -d, -f 2 seasonal/*.csv > teeth-only.txt | grep -v tooth
In the above example, all the output of cut is stored in teeth-only.txt, so nothing is left for grep to read and filter. Instead, you can do it like this:
cut -d, -f 2 seasonal/*.csv | grep -v tooth > teeth-only.txt
This is how you should combine redirection with pipelining.
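Here is a minimal sketch of the working form on a made-up file. The file winter.csv and its rows are assumptions; only the shape of the pipeline matters.

```shell
# Create a small hypothetical CSV with a header row.
workdir=$(mktemp -d)
printf 'date,tooth\n2017-01-25,wisdom\n2017-02-19,canine\n' > "$workdir/winter.csv"

# Redirect at the end: the file receives the filtered result,
# with the header line "tooth" removed by grep -v.
cut -d , -f 2 "$workdir/winter.csv" | grep -v tooth > "$workdir/teeth-only.txt"

# Show what was stored.
cat "$workdir/teeth-only.txt"
```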
When you run a command, nothing happens, and you are not able to run another command, press Ctrl + C to stop it.
Environment Variables:
They are used to store data. Some common variables are HOME and USER.
Use uppercase when defining a variable. It is the common convention.
You can print the variable using the following command.
echo $var_name
We use $ because it lets the shell tell the value of a variable apart from a plain word such as a filename.
We store data as we do in any programming language, by using the "=" sign. Unlike in most languages, there must be no spaces around it.
Example:
TRAINING=seasonal/summer.csv
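A short sketch of defining and reading a variable, reusing the TRAINING name from the example. Note that a variable defined this way belongs to the current shell; the export line, added here for illustration, is what makes it visible to programs started from that shell.

```shell
# Assign a value: no spaces are allowed around "=".
TRAINING=seasonal/summer.csv

# $TRAINING expands to the stored value.
echo "$TRAINING"

# Without the $, echo just prints the literal word TRAINING.
echo TRAINING

# Optional: make the variable available to child processes too.
export TRAINING
```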
for loop:
If we want to repeat a command a certain number of times, we can use loops. We particularly use the "for loop".
Syntax:
for ..var.. in ..list..; do ..body..; done
Example:
for filename in "$@" # "$@" stands for the filenames passed to the script.
do
head -n 2 "$filename" | tail -n 1
tail -n 1 "$filename"
done
You can indent the body if you want, for better readability.
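Outside a script, the list can come from a wildcard instead. The sketch below loops over two hypothetical CSV files and prints the first data row of each.

```shell
# Create two small hypothetical CSV files, each with a header row.
workdir=$(mktemp -d)
printf 'dish,price\nburger,5\nroti,1\n' > "$workdir/a.csv"
printf 'dish,price\ndosa,2\nidli,2\n' > "$workdir/b.csv"

# Loop over every CSV and print its first data row (line 2).
for filename in "$workdir"/*.csv
do
    head -n 2 "$filename" | tail -n 1
done
```

Quoting "$filename" keeps the loop working even when a file name contains spaces.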
We can use files to store commands. These types of files are called scripts.
Editing a Script:
You should know the following shortcuts to work with these scripts.
To open a script for editing, use the "nano" command.
nano filename
Inside we can use these shortcuts.
Ctrl + K: Delete a line.
Ctrl + U: Un-delete a line.
Ctrl + O: Save the file ("O" for output).
Ctrl + X: Exit the editor.
As I said earlier, we can use these scripts to store commands and reuse them as we please. To make this concrete, here is an example.
nano header.sh
# inside header.sh
grep -v "Tooth" seasonal/spring.csv
Press Ctrl + O and then Enter. # saves the contents.
Press Ctrl + X to exit the editor.
bash header.sh # runs the commands stored in the file.
Passing Filenames to Script:
Earlier, in the for loop example, I used the special variable "$@". It acts as a placeholder for all the filenames passed to the script.
Let's say we have "unique-dish.sh", which contains the following command.
sort "$@" | uniq
When we run
bash unique-dish.sh food/burger.csv
the $@ inside the file gets replaced with the filename passed, i.e. food/burger.csv, so the script prints each distinct line of that file once. We can give it more than one filename at once.
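Here is a sketch of a script that receives its filenames through "$@". The script name unique-lines.sh, the two sample files, and the use of uniq to drop repeated lines are all assumptions made for illustration.

```shell
# Create two small hypothetical input files.
workdir=$(mktemp -d)
printf 'dosa\nidli\ndosa\n' > "$workdir/monday.txt"
printf 'roti\nidli\n' > "$workdir/tuesday.txt"

# Write a tiny script; the quoted 'EOF' keeps "$@" literal in the file.
cat > "$workdir/unique-lines.sh" <<'EOF'
sort "$@" | uniq
EOF

# "$@" inside the script becomes the two filenames given here.
bash "$workdir/unique-lines.sh" "$workdir/monday.txt" "$workdir/tuesday.txt"
```

Sorting first matters because uniq only removes duplicates that sit on adjacent lines.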
These are the Unix commands that I covered. Mind you, these are not all there is. I mentioned these because they are important when it comes to Data Science.
Many of the other commands have counterparts in the Windows command line, so you don't need to worry about them.
CONCLUSION
The key takeaways are
- We learned some important commands.
- We learned some important techniques.
- We learned how to apply them with examples.
I hope you found this blog helpful. If you liked it, I suggest you follow me on Medium and YouTube for more content on productivity, self-improvement, coding, and tech.
Soar your Bet in Data Science Using Unix Cmds was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.