Soar your Bet in Data Science Using Unix Cmds

Last Updated on July 28, 2021 by Editorial Team

Author(s): Karthik Bhandary

If I asked you what you use the command line (terminal) for, you would probably answer, “To run scripts, obviously! 🙄” That is true, but what else can we do with it? If you have been programming for a while, you know where I am going with this. If you are a newbie who has just entered the coding world, you probably don’t have much of an idea yet, and that is perfectly natural.

If you don’t know the answer, here it is:

  • greater control over an operating system or application;
  • faster management of many operating systems;
  • the ability to store scripts that automate regular tasks;
  • basic command-line knowledge that helps with troubleshooting, such as network connection issues.

We are not going to talk about troubleshooting here, but we will definitely talk about managing files and controlling other tasks. If you are an aspiring Data Scientist or Data Analyst, I want you to know that the command line is going to be an important, common, and powerful 💪 tool for managing files and modifying the data in them.

In this blog, I am not going to cover every available command, only some of the important commands and techniques.

Commands

cat:

This is used to view the contents of a file.

syntax: cat filename

Example: cat food/burger.csv

This prints the contents of the burger.csv file to the terminal.

less:

This is used to view a file one page at a time, and we can pass it more than one file. There are some special commands that we can use while viewing.

syntax: less filename filename …

While viewing, press the space bar to page down.

  • :n to go to the next file.
  • :p to go to the previous file.
  • :q to quit.

Sometimes we want to view just the first few or the last few lines of a file. We have commands to take care of that as well.

head:

This is used to look at the first few lines of a file (10 by default).

syntax: head filename

We can specify the number of lines we want with “-n” followed by that number. For example:

head -n 3 filename

tail:

As you may have guessed by now, this selects the last few lines of a file.

syntax: tail filename

Like head, it accepts “-n” to set the number of lines.

tail -n 3 filename
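As a quick check of how head and tail behave, here is a minimal sketch. The file menu.csv and its contents are made up for illustration, and the temporary directory is only there to keep the example self-contained.

```shell
# Hypothetical sample data: a header row plus four records.
workdir=$(mktemp -d)
printf 'dish,price\nidli,30\ndosa,50\nvada,25\nroti,15\n' > "$workdir/menu.csv"

# First two lines: the header and the first record.
first_two=$(head -n 2 "$workdir/menu.csv")

# Last two lines: the final two records.
last_two=$(tail -n 2 "$workdir/menu.csv")

echo "$first_two"
echo "$last_two"
```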

If you want to know what a command does, all you have to do is use the “man” (manual) command.

syntax: man cmd_name

While the head and tail commands select rows, the “cut” command selects columns.

cut:

This is used to select data by column.

It has some flags that we can use.

  • -c: select by character position.
  • -b: select by byte position.
  • -d: set the field delimiter. cut uses tab as the default delimiter, but -d lets it work with other delimiters.
  • -f: select by field.

Example:

cut -f 2-5,8 -d " " filename.csv

Here we are selecting fields 2 to 5 and field 8, treating " " (a space) as the delimiter/separator.
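To see the field and character flags side by side, here is a small sketch on a comma-separated file. The file name, the column names, and the values are all hypothetical.

```shell
# Hypothetical sample data with three comma-separated columns.
workdir=$(mktemp -d)
printf 'dish,price,city\nidli,30,Chennai\ndosa,50,Mysuru\n' > "$workdir/menu.csv"

# -d, makes the comma the delimiter; -f 1,3 keeps the first and third fields.
by_field=$(cut -d, -f 1,3 "$workdir/menu.csv")

# -c 1-4 keeps the first four characters of every line.
by_char=$(cut -c 1-4 "$workdir/menu.csv")

echo "$by_field"
echo "$by_char"
```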

Consider the following situation: we run a command and it returns an error because we are in the wrong directory. After moving to the right directory, we don’t want to retype the whole command. We can use “!” followed by the command name to rerun the most recent command that started with that name.

Rerun(!):

Example:

head roti.csv # Throws an error because we are in the wrong directory.

cd food # we go to the correct directory

!head # reruns the above head command.

When we want to select lines containing specific text, we use the “grep” command.

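grep:

It takes a piece of text followed by one or more filenames and prints all the lines in those files that contain the text.

Example:

grep 'Mamma mia!!' dosa.txt

The single quotes matter here: inside double quotes, an interactive bash session would treat “!!” as a history (rerun) shortcut rather than as literal text.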
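Here is a minimal sketch of grep on a tiny file. The file contents are invented for illustration. It also shows the -v flag, which prints the lines that do not match.

```shell
# Hypothetical sample data: three lines of text.
workdir=$(mktemp -d)
printf 'masala dosa\nplain idli\nonion dosa\n' > "$workdir/dosa.txt"

# Lines that contain the text "dosa".
matches=$(grep 'dosa' "$workdir/dosa.txt")

# -v inverts the match: lines that do NOT contain "dosa".
others=$(grep -v 'dosa' "$workdir/dosa.txt")

echo "$matches"
echo "$others"
```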

Sometimes we want to store the result of a command. We can do that easily.

Storing Data:

“>” redirects the output of a command into a file. It is used as follows.

syntax:

head -n 5 food/idli.csv > top.csv

This stores the first 5 lines of idli.csv in the top.csv file, replacing anything the file already contained.
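The sketch below redirects output into a file and then redirects into the same file again, to show that “>” overwrites rather than appends. The sample data is hypothetical.

```shell
# Hypothetical sample data.
workdir=$(mktemp -d)
printf 'dish,price\nidli,30\ndosa,50\nvada,25\n' > "$workdir/idli.csv"

# Store the first two lines in top.csv.
head -n 2 "$workdir/idli.csv" > "$workdir/top.csv"
first_save=$(cat "$workdir/top.csv")

# Redirecting into the same file again replaces its previous contents.
head -n 1 "$workdir/idli.csv" > "$workdir/top.csv"
second_save=$(cat "$workdir/top.csv")

echo "$first_save"
echo "$second_save"
```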

Combining Commands

This is very important because we use this method all the time when working with data in the terminal.

We can combine two or more commands by pipelining, which means using “|” (the pipe character, found above the Enter key on most keyboards) to send the output of one command to the next as its input.

Example:

head -n 5 food/biriyani.csv | tail -n 3 > top_bottom.csv

Here head takes the first 5 lines of the file, tail keeps the last 3 of those (lines 3 to 5), and the result is stored in top_bottom.csv. This is one of the concepts that will make your life easier, so it is worth remembering.
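To make the head and tail pipeline concrete, here is a sketch on a six-line file. The file contents are made up, and the paths point into a temporary directory so the example can run anywhere.

```shell
# Hypothetical sample data: a header row plus five records (six lines).
workdir=$(mktemp -d)
printf 'dish,price\nidli,30\ndosa,50\nvada,25\nroti,15\npuri,20\n' > "$workdir/biriyani.csv"

# head keeps lines 1 to 5; tail then keeps the last three of those (lines 3 to 5).
head -n 5 "$workdir/biriyani.csv" | tail -n 3 > "$workdir/top_bottom.csv"

middle=$(cat "$workdir/top_bottom.csv")
echo "$middle"
```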

There are times when we want to count the characters, words, or lines in our data. For this, we use the “wc” command.

wc:

It is used to count (the name is short for “word count”). By using flags you can specify what you want to count.

  • -c: bytes (the same as characters for plain ASCII text)
  • -w: words
  • -l: lines

Example:

grep "2017-07" seasonal/spring | wc -l

In the above example, I am selecting every line that contains “2017-07” and counting the number of matching lines.
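A small sketch of counting matching lines follows. The dates and values are invented, and the file lives in a temporary directory.

```shell
# Hypothetical sample data: two of the three records are from July 2017.
workdir=$(mktemp -d)
printf '2017-07-01,molar\n2017-08-03,canine\n2017-07-19,incisor\n' > "$workdir/spring.csv"

# grep keeps the matching lines; wc -l counts how many lines it receives.
count=$(grep '2017-07' "$workdir/spring.csv" | wc -l)

# wc -w counts words instead; this file has one word per line.
words=$(wc -w < "$workdir/spring.csv")

echo "$count"
echo "$words"
```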

Until now, if we wanted to work with more than one file, we typed out each name. We can save that effort by using the wildcard character “*”.

Wildcard (*):

Used to select more than one file at a time. “*” matches any sequence of characters in a filename.

Example:

cut -d, -f 1 seasonal/*.csv

Here I am getting the first comma-delimited field from every .csv file in the seasonal directory.
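The following sketch shows the wildcard picking up only the files that match the pattern. The directory layout and file contents are hypothetical.

```shell
# Hypothetical directory with two CSV files and one text file.
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
printf '2017-01-25,wisdom\n2017-02-19,canine\n' > "$workdir/seasonal/spring.csv"
printf '2017-07-10,molar\n' > "$workdir/seasonal/summer.csv"
printf 'ignore,me\n' > "$workdir/seasonal/notes.txt"

# *.csv expands to spring.csv and summer.csv; notes.txt does not match.
dates=$(cut -d, -f 1 "$workdir"/seasonal/*.csv)

echo "$dates"
```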

We can use “>” with a pipeline, but it should appear at the end. For example:

cut -d, -f 2 seasonal/*.csv > teeth-only.txt | grep -v tooth

In the above example, all of cut’s output is stored in teeth-only.txt, so nothing travels down the pipe and grep has no input to filter. Instead, you can do it like this:

cut -d, -f 2 seasonal/*.csv | grep -v tooth > teeth-only.txt

This is how you should use redirection with pipelining.
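Here is a runnable sketch of the recommended form, with the redirection at the end of the pipeline. The CSV files, their lowercase “tooth” header, and the values are assumptions made for illustration.

```shell
# Hypothetical data: each file has a header line followed by one record.
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
printf 'date,tooth\n2017-01-25,wisdom\n' > "$workdir/seasonal/spring.csv"
printf 'date,tooth\n2017-07-10,molar\n' > "$workdir/seasonal/summer.csv"

# Take column 2 from every CSV, drop the header lines, then save what is left.
cut -d, -f 2 "$workdir"/seasonal/*.csv | grep -v tooth > "$workdir/teeth-only.txt"

teeth=$(cat "$workdir/teeth-only.txt")
echo "$teeth"
```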

When you run a command, nothing seems to happen, and you cannot run another command, press Ctrl + C to stop it.

Environment Variables:

They are used to store data. Some common variables are HOME and USER.

Use uppercase when defining a variable. It is the common convention.

You can print a variable using the following command.

echo $var_name

We use $ because it lets the shell differentiate between a filename and the value of a variable.

We store data as we do in most programming languages, by using the “=” sign. Unlike in most languages, there must be no spaces around it.

Example:

TRAINING=seasonal/summer.csv
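The sketch below shows the assignment and the effect of the “$” sign. It is written without spaces around “=”, which is what the shell expects. The variable name and the path are only examples.

```shell
# Assign without spaces around "="; TRAINING is just an example name.
TRAINING=seasonal/summer.csv

# With "$", the shell substitutes the stored value.
with_dollar=$(echo "$TRAINING")

# Without "$", the shell sees an ordinary word and prints it unchanged.
without_dollar=$(echo TRAINING)

echo "$with_dollar"
echo "$without_dollar"
```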

for loop:

If we want to repeat a command several times, we can use loops. In particular, we use the “for loop”.

Syntax:

for ..var.. in ..list..; do ..body..; done

Example:

for filename in "$@" # "$@" stands for the filenames passed to the script.

do

head -n 2 "$filename" | tail -n 1

tail -n 1 "$filename"

done

You can indent the body if you want better readability.
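As a self-contained variation on the loop, the sketch below loops over files matched by a wildcard instead of over script arguments, and prints the first record (the second line) of each one. The files and their contents are hypothetical.

```shell
# Hypothetical data: each file has a header line followed by records.
workdir=$(mktemp -d)
mkdir "$workdir/seasonal"
printf 'date,tooth\n2017-01-25,wisdom\n2017-02-19,canine\n' > "$workdir/seasonal/spring.csv"
printf 'date,tooth\n2017-07-10,molar\n' > "$workdir/seasonal/summer.csv"

# For every CSV file, take the first two lines and keep only the second one.
first_records=$(
  for filename in "$workdir"/seasonal/*.csv
  do
    head -n 2 "$filename" | tail -n 1
  done
)

echo "$first_records"
```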

We can use files to store commands. These types of files are called scripts.

Editing a Script:

You should know the following shortcuts to work with these scripts.

To create or edit a script, open it with the “nano” command.

nano filename

Inside the editor, we can use these shortcuts.

Ctrl + K: Delete a line.

Ctrl + U: Un-delete a line.

Ctrl + O: save the file (‘O’ for output)

Ctrl + X: exit the editor.

As I said earlier, we can use these scripts to store commands and reuse them as we please. To make this concrete, here is an example.

nano header.sh

# inside header.sh

grep -v "Tooth" seasonal/spring.csv

Press Ctrl + O and then Enter. # saves the contents.

Press Ctrl + X to exit the editor.

bash header.sh # runs the commands stored inside the script.

Passing Filenames to Script:

Earlier, in the for loop example, I used the special expression “$@”. It is a placeholder for the filenames (more precisely, all the arguments) passed to the script.

Let’s say we have “unique-dish.sh”, which contains the following command.

sort "$@" | uniq

When we run

bash unique-dish.sh food/burger.csv

the "$@" inside the file gets replaced with the filename passed, i.e. food/burger.csv. We can give it more than one filename at once.
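Here is a sketch of a script that receives two filenames through "$@". It assumes the script is meant to list each distinct line once, which is why the sorted output is piped into uniq. The input files are invented, and the script is written with a here-document instead of nano so that the example can run unattended.

```shell
# Hypothetical input files with some repeated lines.
workdir=$(mktemp -d)
printf 'idli\ndosa\nidli\n' > "$workdir/a.txt"
printf 'dosa\nvada\n' > "$workdir/b.txt"

# Create the script; the quoted 'EOF' keeps "$@" literal inside the file.
cat > "$workdir/unique-dish.sh" <<'EOF'
sort "$@" | uniq
EOF

# Both filenames are substituted for "$@" when the script runs.
unique=$(bash "$workdir/unique-dish.sh" "$workdir/a.txt" "$workdir/b.txt")

echo "$unique"
```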

These are the Unix commands that I covered. Mind you, there are many more. I mentioned these because they are important when it comes to Data Science.

Many of the other basic commands, such as cd and mkdir, work much like their counterparts in the Windows command line, so you don’t need to worry about them.

CONCLUSION

The key takeaways are

  • We learned some important commands.
  • We learned some important techniques.
  • We learned how to apply them with examples.

I hope you found this blog helpful. If you liked it, I suggest you follow me on Medium and YouTube for more content on productivity, self-improvement, coding, and tech.

Soar your Bet in Data Science Using Unix Cmds was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
