How to Build an End-to-End Deep Learning Portfolio Project
Last Updated on March 4, 2021 by Editorial Team
Author(s): Yash Prakash
Deep Learning
A complete guide to the steps I used to build a significant, real-world project to showcase proudly on my profile.
It was late December 2020 when, one evening, casually scrolling through my Twitter timeline, I paused at a tweet from a famous YouTuber I followed. He had tweeted about what a pain it was to go through the huge number of comments each of his videos received, and how, too often, good comments he would have loved to reply to got lost in the sheer volume.
Being a data science practitioner, I was intrigued by the idea of efficiently handling such a huge inflow of comments on videos. After thinking about it for a few hours, I was convinced that it really was a genuine problem.
It was then that the idea of doing a project based on that particular use case was born. I wanted to do something to make sifting through the massive number of comments easier.
In this article, I will go over how I built a full portfolio project on this idea, and how you can find a particular problem to solve and build a project around it too.
So, hanging on to this thought, my point is:
Focus on a real problem that you might be able to solve through your skills.
It is really important to focus on a problem that a colleague, a family member, a friend, or anyone who has chosen to share their experiences in a field you are passionate about has actually faced.
Forget about deep learning for a while; here is the series of nine questions you should ask yourself first:
1. What problem are you going to solve?
2. Why do you need to solve it?
3. What kind of data will the problem require, and can you access it somehow?
4. If yes, then how and from where do you access it? Through a public API, some web scraping, Kaggle, Google Datasets, GitHub? Where?
5. Once you have the data, how do you clean it and make it usable to solve that problem?
6. How do you go about deciding on a modelling approach?
7. How will you know which is a good solution? Or is there a good solution? Can you define it?
8. What tools and libraries available can you use to model?
and finally,
9. What will your results look like?
When you decide on a problem, you automatically start thinking about the next steps from the above list. It's very natural to go down that route and start converting your idea into a full-fledged application.
Therefore, it all begins with a problem and how you will attempt to solve it.
Go ahead and jot down that list now. It is the one I use when doing my own projects too!
Once you have, let's move on.
From here on, I will go over each of those nine points and describe how I built a whole project around them.
Let's go!
1. What problem am I going to solve?
As I mentioned in the little anecdote earlier, I like to define this particular point in a single line.
That makes it simple, brief, easy to understand, and hence actionable.
So here was my answer to the question:
I want to make it easier to analyse the comments on a YouTube video.
That's it. That's my motivation for this project. Now go ahead and define yours.
2. Why do I need to solve that problem?
As soon as I began to think about that tweet, I had a feeling that this problem could certainly be solved through some deep learning approach.
It would be fantastic if a deep learning technique could filter the comments so that only the relevant ones are showcased first, and as a group, making it easier for the user to read through them and choose which ones to reply to.
So my end goal became: I want to use my skills to help provide a solution to this problem, both because it would be an appropriate use of natural language processing and because it would challenge my ability to put together a viable solution.
The main takeaway from this question is that you need to know why you want to solve this problem. Is there a need? Is no good solution available? And if something is available, how do you go about making yours different?
3. What kind of data will the problem require, and can I access it somehow?
This question was easier to answer: I would need to access comments from the videos somehow.
Through research, I came to know about the convenient YouTube Data API, which allows us to do just that: fetch comments. Now I only needed to write a script around it to build my own comments dataset.
For my project, I'd also answered question no. 4 within this one.
This step is crucial in making sure you can actually move on to thinking about a possible way to model the problem through deep learning, once you have a source of data.
Now that you know how and where to get access to the data, go ahead and obtain it, or even better, try to build your own dataset like I did!
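To make the data-collection step concrete, here is a minimal sketch of turning a YouTube Data API commentThreads response into flat rows. It is not the project's actual script: the sample payload below is hand-written to mirror the API's response shape, and in a real run the response would come from google-api-python-client (roughly `build("youtube", "v3", developerKey=API_KEY).commentThreads().list(part="snippet", videoId=VIDEO_ID, maxResults=100).execute()`).

```python
def parse_comment_threads(response):
    """Flatten a commentThreads API response into (id, text, author, likes) rows."""
    rows = []
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        rows.append({
            "comment_id": item["id"],
            "text": snippet["textDisplay"],
            "author": snippet["authorDisplayName"],
            "like_count": snippet["likeCount"],
        })
    return rows

# Hand-written stand-in with the same nesting as a real API response.
sample_response = {
    "items": [
        {
            "id": "abc123",
            "snippet": {
                "topLevelComment": {
                    "snippet": {
                        "textDisplay": "Loved the editing in this one, thank you!",
                        "authorDisplayName": "viewer42",
                        "likeCount": 7,
                    }
                }
            },
        }
    ]
}

rows = parse_comment_threads(sample_response)
print(rows[0]["author"])  # viewer42
```

Looping over the API's `nextPageToken` with repeated `list` calls is how you would grow this into a dataset of thousands of comments.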
5. Once I have the data, how do I clean it and make it usable to solve that problem?
I'd collected all the comments I wanted to use for the modelling later. Now came the task of actually transforming the data so I could feed it into whatever model I spun up later.
Every project requires a different set of steps and practices to transform the data. Hence, spending quality time with your data is very important.
In my case, I had a CSV file of approximately 5,000 comments from a video I'd selected. I'd saved it with four columns for every comment row: the text, the author, the comment ID, and the like count.
Looking closely at the comments, I found that the creator I'd fetched the comments from typically received two kinds of comments:
One, in which a commenter thanked him for the video or generally applauded his film-making, editing skills, etc. These were typically very short, concise comments.
And two, in which the commenter remarked on a particular thing they liked most in the video and took some time to really write in depth about it. These comments often ended with a thank-you message too. Typically, they were on the longer side of the spectrum.
These two types of comments gave me the idea of segregating them into two categories and performing different operations on each, according to the features I would later decide to include in the project.
Therefore, looking back, the main point to take away from this question is:
Make sure to spend a while cleaning, slicing, and transforming your data according to your needs. Research, think, and write about what features you'd want to have, and transform your data accordingly.
Now, let's go over the brain of the application: the modelling steps.
6. How do I go about deciding on a modelling approach?
It was clear to me from the start that I would need to experiment with various ways of implementing NLP techniques in my project. It was important that I spend time studying and researching exactly what I would need to reach my desired results.
But first, I wanted to define what I wanted the app to do.
I settled on the following:
- Find a way to surface the important topics talked about in the comments, and cluster them together
- Implement a semantic retrieval algorithm to query comments from the corpus that are similar to a given search topic
These were the two main features of my project. Later, I included two other things too:
- Display the top emotes used by people in the comments (something diverted from the mainstream and a bit more fun)
- Display the comments as some neat, pretty word clouds! (something aesthetic)
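To give a flavour of the first feature, here is one simple way to group comments whose embeddings point in similar directions. In the project the vectors would come from a sentence-transformers model (something like `model.encode(comments)`); here tiny hand-made vectors stand in so the sketch runs with numpy alone, and the greedy threshold scheme is just one illustrative clustering strategy, not necessarily the one the project uses.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_cluster(embeddings, threshold=0.8):
    """Assign each vector to the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of indices; the first is the seed
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(embeddings[cluster[0]], emb) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy "embeddings": comments 0 and 1 point the same way, comment 2 differs.
embs = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]])
print(greedy_cluster(embs))  # [[0, 1], [2]]
```

With real sentence embeddings, each resulting cluster corresponds to a recurring topic in the comment section.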
It took me weeks to study, research, and settle on the things I wanted to implement in the project.
Remember this: not everything that comes to your mind instinctively can really be included in the final application, and not everything that comes to you late in building the final product must be left out.
There is no hard and fast rule to abide by here. It is your project. You can include and exclude things as and when you like. So spend time doing just that.
Let's go over the next question.
7. How will I know which is a good solution? Or is there a good solution? Can I define it?
This step is almost as important as the previous one.
The features I'd decided to include in the app directly answered this question.
If I can make looking through thousands of comments and picking out some good ones really easy for the user (specifically, just a tap of a button), that would be the ideal solution, the one I'm looking for.
Making it as simple as possible from the user's perspective is crucial to getting a good project done. Not everyone has the eyes of the developer who sits behind the computer screen for hours and days writing the project.
This is the definition of a good project for me. Accomplishing your goal by implementing the required features is as important as making sure those features are incredibly simple to understand.
The solution should be crisp, clean, and easy to interpret and analyse. Also, if you can make it easy AND fun, that would be fantastic.
That was my thought as I began to search for ways to make sure I could also, a bit later if possible, build out a frontend for the project.
Now come the last two questions. These are the final stepping stones to getting a good product rolled out, so do read them through.
8. and 9. What tools and libraries can I use to model? And what will my results look like?
There was a variety of NLP libraries available for my use case. According to the features I'd settled upon, I decided to use the following:
- For clustering comments with similar topics, I used sentence-transformers to model semantic similarity for more efficient clustering. I modelled this on the longer set of comments only.
- For finding the top emotes, I used the emoji library, looking up each emote and storing its frequency. I used this only on the shorter comments.
- For retrieving comments matching a search query, I used sentence-transformers again, but this time on the entire set of comments.
- Once that was built, I came across the amazing Streamlit library, which enabled me to make a beautiful frontend in mere hours of work, and by writing code in Python itself!
- Once the UI was done, the next step was to deploy/serve the project in a convenient way. Since this is an open-source project, I decided to use Docker for it.
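The retrieval feature boils down to ranking comments by cosine similarity between a query embedding and precomputed comment embeddings. In the project both sets of vectors would come from the same sentence-transformers model; toy vectors stand in here so the sketch runs with numpy alone, and the function name is mine, not the project's.

```python
import numpy as np

def top_k_similar(query_emb, comment_embs, k=2):
    """Return indices of the k comments closest to the query, best first."""
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    m = comment_embs / np.linalg.norm(comment_embs, axis=1, keepdims=True)
    scores = m @ q
    return [int(i) for i in np.argsort(-scores)[:k]]

# Toy "embeddings": comment 0 is closest to the query, comment 2 farthest.
comment_embs = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
query_emb = np.array([1.0, 0.2])
print(top_k_similar(query_emb, comment_embs))  # [0, 1]
```

Wired into a Streamlit text input, the returned indices are all you need to display the matching comments for a searched topic.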
It is important to make sure that someone who comes across your project on GitHub (I'm assuming you'll be displaying it there) can easily run your app for themselves.
It is therefore necessary to include a clear set of instructions in the README for the project. Explain everything you can think of and more. Not everything that appears obvious to you is as apparent to them. So do it. Make it as thorough as possible.
And that is it. We're finally done.
If you've followed along all the way, congratulations! You're one step closer to making a cool portfolio project you can be proud of!
Go, follow the steps, and DO IT!
One last thing.
In case you were as excited by the idea of this project as I was and want to learn even more, I have some good news for you.
I am giving away a step-by-step guide to my whole workflow for building this project from scratch, as a FREE eBook of course!
All you have to do is sign up for it here.
Learning Data Science isn't that hard; follow me and let's make it fun together.
Weeks of hard work yielded a result. Check out the whole project on GitHub; it is called Insight.
Feel free to get in touch with ideas to improve this project, if you really want to. I appreciate any feedback you might have. Also get in touch if you want to build a frontend for the app in React/Vue; it would be fun to collab!
Thank you for reading, and I hope you gained some good insights from this article. See you in the next one!
How to Build an End-to-End Deep Learning Portfolio Project was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.