How to Build an End-to-End Deep Learning Portfolio Project
Last Updated on March 4, 2021 by Editorial Team
Author(s): Yash Prakash
A complete guide to the steps I used to build a significant, real-world project that I can proudly showcase on my profile.
It was late December 2020. One evening, I was casually scrolling through my Twitter timeline when a tweet from a famous YouTuber I followed made me pause. He had tweeted about how painful it was to go through the huge number of comments that each of his videos received, and how, all too often, good comments that he would have loved to reply to got lost in the sheer volume.
Being a data science practitioner, I was intrigued by the idea of efficiently handling such a huge inflow of comments on videos. Upon thinking about it for a few hours, I was ready to believe that it really was a genuine problem.
It was then that the idea of doing a project based on that particular use case was born. I wanted to do something to make sifting through the massive number of comments easier.
In this article, I will be going over how I built a full portfolio project on this idea and how you can find a particular problem to solve and build a project around it too.
So, hanging on to this thought, my point is:
Focus on a real problem that you might be able to solve through your skills.
It is really necessary to focus on a problem that a colleague, a family member, a friend, or anyone who has chosen to share their experiences in a field you are passionate about has faced.
Forget about deep learning for a while. These are the nine questions that you should ask yourself first:
1. What problem are you going to solve?
2. Why do you need to solve it?
3. What kind of data will the problem require and can you access it somehow?
4. If yes, then how and from where do you access it? Through a public API, some web scraping, Kaggle, Google datasets, GitHub, where?
5. Once you have the data, how do you clean it and make it usable to solve that problem?
6. How do you go about deciding on a modelling approach?
7. How will you know which is the good solution? Or is there a good solution? Can you define it?
8. What available tools and libraries can you use to model?
9. What will your results look like?
When you decide on a problem, you automatically start thinking about the next steps from the above list. It’s very natural to go down that route and start converting your idea into a full-fledged application.
Therefore, it all begins with a problem and how you will be attempting to solve it.
Go ahead and jot down that list now. It is the one I use when doing my own projects too!
Once you have, let’s move on.
From here on, I will go over each of those nine points and describe how I built a whole project around it.
1. What problem am I going to solve?
Like I mentioned in the little anecdote earlier, I like to define this particular point in a single line.
It makes it simple, brief, easy to understand and hence, actionable.
So here went my answer to the question:
I want to make it easier to analyse the comments on a YouTube video.
That’s it. That’s my motivation to make this project. Now go ahead and define yours.
2. Why do I need to solve that problem?
As soon as I began to think about that tweet, I had a feeling that this problem could certainly be solved through some deep learning approach.
It would be fantastic if a deep learning technique could filter out the comments and make sure only the relevant ones are showcased first and as a group so that it is easier for the user to read through them, and hence choose to reply to them as well.
So my end goal became: I want to use my skills to help provide a solution to this problem. It would be an appropriate use of natural language processing, as well as a challenge to my ability to put together a viable solution.
The main takeaway from this question is: you need to know why you want to solve this problem. Is there a need? Is no good solution available? And if something is available, how do you go about making yours different?
3. What kind of data will the problem require and can I access it somehow?
This question was easier to answer — I will need to access comments from the videos somehow.
Through research, I came to know about the convenient YouTube Data API that allows us to do just that — fetch comments. Now I only needed to write a script around it to build my own comments dataset.
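As a rough sketch of what such a script might look like, the snippet below builds requests to the YouTube Data API v3 `commentThreads` endpoint and flattens each response into rows of text, author, comment ID and like count. It is not the project's actual script: the function names, the `limit` default and the use of the standard library instead of a client package are all assumptions made for illustration, and you would need your own API key.

```python
import json
import urllib.parse
import urllib.request

# Public REST endpoint of the YouTube Data API v3 for top-level comment threads.
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_url(video_id, api_key, page_token=None):
    """Build the request URL for one page of comment threads (hypothetical helper)."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,  # the API returns at most 100 threads per page
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_page(payload):
    """Flatten one API response into rows of text, author, comment id, like count."""
    rows = []
    for item in payload.get("items", []):
        top = item["snippet"]["topLevelComment"]
        snippet = top["snippet"]
        rows.append({
            "text": snippet["textDisplay"],
            "author": snippet["authorDisplayName"],
            "comment_id": top["id"],
            "like_count": snippet["likeCount"],
        })
    return rows


def fetch_comments(video_id, api_key, limit=5000):
    """Page through the endpoint until `limit` rows are collected or pages run out."""
    rows, token = [], None
    while len(rows) < limit:
        with urllib.request.urlopen(build_url(video_id, api_key, token)) as resp:
            payload = json.load(resp)
        rows.extend(parse_page(payload))
        token = payload.get("nextPageToken")
        if not token:
            break
    return rows[:limit]
```

Keeping the parsing separate from the network call makes it easy to check the row format against a saved response before spending any API quota.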
For my project, answering this question also answered question no. 4.
This step is crucial in making sure you can actually move on to think about a possible way to model the problem through deep learning once you have a source of data.
Now that you know how and where to get access to the data, go ahead and obtain it or even better, try to build your own dataset like I did!
5. Once I have the data — how do I clean it and make it usable to solve that problem?
I’d collected all the comments I wanted to use for the modelling later. Now I had to decide how to transform the data so that I could feed it into whatever model I would spin up later.
Every project requires a different set of steps and practices to transform the data. Hence, spending quality time with your data is very important.
In my case, I had a CSV file of approximately 5000 comments from a video I’d selected. Each row was one comment, saved in four columns: the text, the author, the comment ID, and the like count.
Looking closely at the comments, I found that typically, the creator I’d fetched the comments from received two kinds of comments:
One, in which a commenter made sure to thank him for the video or generally applaud his film-making, editing skills, etc. These were typically very short, concise comments.
And two, in which the commenter made a remark about a particular thing he liked the most in the video and took some time to really write in depth about it. These comments often ended with a thank you message too. Typically, these comments were on the longer side of the spectrum.
These two types of comments gave me the idea of segregating them into two categories and performing different operations on each, according to the features I would later decide to include in the project.
Therefore, looking back, the main point to take away from this question is:
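One simple way to sketch this split is by word count. The cut-off of 20 words and the column name `text` below are assumptions for illustration, not values taken from the project.

```python
import csv


def load_comments(path):
    """Read the comments CSV into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def split_by_length(rows, min_long_words=20):
    """Separate rows into short and long comments by word count.

    `min_long_words` is an arbitrary cut-off; tune it after reading your data.
    """
    short, long_ = [], []
    for row in rows:
        n_words = len(row["text"].split())
        (long_ if n_words >= min_long_words else short).append(row)
    return short, long_
```

With a file on disk, usage would look something like `short, long_ = split_by_length(load_comments("comments.csv"))`, where the file name is again hypothetical.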
Make sure to spend a while cleaning, slicing, transforming your data according to your needs. Research, think and write about what features you’d want to have and go about transforming your data according to those.
Now, let’s go over the brain of the application: the modelling steps.
6. How do I go about deciding on a modelling approach?
It was clear to me from the start that I would need to experiment with various ways of bringing NLP techniques into my project. It was important that I spent time studying and researching exactly what I would need to reach my desired results.
But first, I wanted to define what I wanted to do in the app.
I settled on the following:
- Make a way to bring out the important topics talked about in the comments — cluster them together
- Implement a semantic retrieval algorithm to query similar comments from the corpus with respect to a given searched topic
These were the two main features of my project. Later, I included two other things too:
- Display top emotes used by people in the comments (something diverted from mainstream and a bit more fun)
- Display the comments as some neat, pretty word clouds! (something aesthetic 🙂 )
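The emote counting feature can be sketched with the standard library alone. Note the assumption: this version approximates "emoji" with a couple of Unicode code point ranges rather than a dedicated emoji package, so multi-character emoji (skin tones, flags, joined sequences) are counted piece by piece. The function names are hypothetical.

```python
from collections import Counter

# Rough code point ranges covering most pictographic emoji (an approximation).
EMOJI_RANGES = (
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport, supplemental symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
)


def is_emoji_char(ch):
    """Return True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def top_emotes(comments, n=5):
    """Count emoji characters across all comments and return the n most common."""
    counts = Counter(ch for text in comments for ch in text if is_emoji_char(ch))
    return counts.most_common(n)
```

Calling `top_emotes(list_of_comment_texts, n=10)` returns `(character, count)` pairs that can be fed straight into a chart or table.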
It took me weeks to study, research and settle on things that I wanted to implement in the project.
Remember this: not everything that comes to your mind instinctively can make it into the final application, and not every idea that comes to you late in the build has to be left out.
There is no hard and fast rule to abide by here. It is your project. You can include and exclude things as and when you like. So spend time doing just that.
Let’s go over the next question.
7. How will I know which is the good solution? Or is there a good solution? Can I define it?
This step is almost as important as the previous one.
The features I’d decided to include in the app directly answered this question.
If I can make looking through thousands of comments and picking out some good ones really easy for the user — specifically, just a tap of a button — that would be the ideal solution, the one I’m looking for.
Making it as simple as possible from the user’s perspective is crucial to getting a good project done. Not everyone has the eyes of the developer who sits behind the computer screen for hours and days to write the project.
This is the definition of a good project for me. Accomplishing your goal by implementing the required features is as important as making sure those features are incredibly simple to understand as well.
The solution should be crisp, clean and easy to interpret and analyse. Also, if you can make it easy AND fun, it would be fantastic.
That was my thought as I began to search for ways to make sure I could also, a bit later if possible, build out a frontend for the project as well.
Now come the last two questions. These are the final stepping stones to getting a good product rolled out, so do read them through.
8. and 9. What available tools and libraries can I use to model? and, What will my results look like?
There were a variety of NLP libraries available for my use case. According to the features I’d settled upon, I decided to use the following:
- for clustering comments with similar topics, I used sentence-transformers to model semantic similarity for more efficient clustering. I modelled this on the longer set of comments only.
- for finding the top emotes, I used the emoji library and made sure to look up each emote and store their frequencies. I used this only on the shorter comments.
- for retrieving comments from a searched query, I used sentence-transformers again but this time, on the entire set of comments.
- Once the modelling was built, I came across the amazing Streamlit library, which enabled me to make a beautiful frontend in mere hours of work. And that too by writing code in Python itself!
- Once the UI was done, the next step was to deploy/serve the project in a convenient way. Since this is an open-source project I’m building, I decided to use Docker for it.
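The retrieval feature boils down to ranking comment embeddings by cosine similarity to the query embedding. The sketch below shows only that ranking logic in plain Python. The `embed` argument is a hypothetical placeholder for whatever turns text into a vector (in this project, a sentence-transformers model), and the toy word-count embedder exists solely so the example runs without downloading a model.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, comments, embed, top_k=3):
    """Return the top_k (score, comment) pairs most similar to the query."""
    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(c)), c) for c in comments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def make_toy_embedder(vocabulary):
    """Build a bag-of-words embedder over a fixed vocabulary (stand-in only)."""
    def embed(text):
        counts = Counter(text.lower().split())
        return [float(counts[word]) for word in vocabulary]
    return embed


# Example usage with made-up comments and a made-up vocabulary.
comments = [
    "the editing was amazing",
    "great music choice",
    "amazing editing and pacing",
]
embed = make_toy_embedder(["editing", "music", "amazing", "pacing"])
results = search("amazing editing", comments, embed, top_k=2)
```

Swapping the toy embedder for a real sentence embedding model changes the quality of the vectors but not the ranking code, which is why the embedder is passed in as a parameter here.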
It is important that you make sure that someone who comes across your project on GitHub (I’m assuming you’ll be displaying it there) can easily run your app for themselves.
It is therefore necessary to include a clear set of instructions in the README for the project. Explain everything you can think of and more. Not everything that appears obvious to you is apparent to them. So do it, and make it as thorough as possible.
And that is it. We’re finally done.
If you’ve followed along all the way, congratulations! You’re one step closer to making a cool portfolio project that you can be proud of!
Go, follow the steps and DO IT!
One last thing.
In case you are as excited by the idea of this project as I am and want to learn even more, I have some good news for you.
I am giving away a step-by-step guide to my whole workflow for building this project from scratch, as a FREE eBook of course!
All you have to do is sign up for it here.
Learning Data Science isn’t that hard, but follow me and let’s make it fun together.
Weeks of hard work yielded a result. Check out this whole project on GitHub — it is called: Insight.
Feel free to get in touch with ideas to improve this project, if you really want to. I appreciate any feedback you might have. Also get in touch if you want to build a frontend for the app in React/Vue — it will be fun to collab!
Thank you for reading and I hope you learned some good insights from this article. See you in the next one!
How to Build an End-to-End Deep Learning Portfolio Project was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.