Honing my data science skills
Last Updated on July 20, 2023 by Editorial Team
Author(s): Bram Bregman
Originally published on Towards AI.
The start of a journey to become proficient in data management
Looking back, I guess it had been lingering in the back of my mind for a long time, but the urge to become an entrepreneur and create a business that I could truly call my own grew steadily stronger over the last few years.
If it weren’t for the economic and societal impact of the COVID pandemic, I would probably still be working at my former employer (if I had been lucky enough not to lose my job in the first place). After all, quitting a full-time job to start a business of your own requires you to step out of your comfort zone and give up what I presumed to be the securities in life.
During the summer of 2020 I decided to take the plunge into the unknown. I quit my job and decided to go for it. But for what exactly? I knew that I had to start doing something challenging, inspiring and fulfilling. Something that I could be truly passionate about. After all, as they say: “Choose a job you love and you’ll never have to work a day in your life!”
The company I worked for provided great opportunities: it was a large international corporation that enabled me to switch jobs every now and then, moving both up and sideways on the career ladder. I stumbled from one interesting and challenging project to another without ever having to wonder whether what I was doing was what I truly loved doing. Now the time had come for me to ask myself that question, and it was not long before the answer came naturally.
Finding my Ikigai.
“Do what you’re passionate about and be passionate about what you’re doing.”
I’ve always found joy in scrutinizing large Excel files and pivoting them to derive meaningful information from what I considered at the time to be substantial amounts of data. I am visually oriented and I love to visualize data. I am used to doodling meeting notes and prefer to use visuals in presentations. I often turn rather complex processes, IT architectures and data flows into easier-to-understand pictures that generally tell stories better than words.
When I combined all of this with my natural curiosity, my creativity, and my professional interests and experience, it soon became clear to me that my Ikigai is data management, and I decided to start learning about data science!
Data science: isn’t that complicated?
“I decided to spend at least one year learning the basics of data science.”
Algorithms nowadays are readily available in countless open source libraries. It doesn’t take too much effort to figure out how to use them. But I soon learned that this is not the essence of practicing data science. It is about knowing which algorithm to use for the challenge at hand. It is about knowing how algorithms work and how to interpret their output. It is about being able to optimize them for specific tasks. Only then can one apply them successfully to descriptive and predictive analytics, and only then can one derive meaningful information from vast amounts of structured and unstructured data. That’s what it takes to solve problems and add value using data science. After doing my desk research I came to the conclusion that pursuing a career in data science requires a solid foundation in statistics, mathematics and programming.
That sounded pretty complicated to me, but I was willing to start learning. Luckily I was also able to, since I had some savings that allowed me to go without work for a while. So I decided to invest in my own development and spend at least one year learning the basics of data science.
Getting ready for take-off.
“Four online courses and a virtually proctored exam at a similar pace and level of rigor as an on-campus course at MIT. Challenging!”
Due to the restrictions that the COVID pandemic caused, I was forced to look for online resources to start learning. After experimenting with Python programming courses on Kaggle, Udemy and Udacity, I realized that this was not quite what I was looking for. I have a bachelor’s degree in business informatics and a graduate education in business economics and business studies, and I was looking for something more challenging. Narrowing my search to high-quality MOOCs led me to Coursera and edX. Both online learning platforms have their roots in renowned universities and offer fairly priced, high-quality content. I started out with IBM Data Science on edX, but was disappointed that the course felt like watching one big commercial for Big Blue, so I stopped and looked for something else. Finally, I found that the Massachusetts Institute of Technology (MIT) offers a MicroMasters in Statistics and Data Science on edX. This is a professional and academic credential for online learners. It consists of four online courses and a virtually proctored exam at a similar pace and level of rigor as an on-campus course at MIT. Challenging!
Dedication, motivation and perseverance.
“I failed to obtain a high enough score to pass and felt discouraged.”
In September 2020, I started the MicroMasters in Statistics and Data Science. The first module was about probability theory, the science of uncertainty and data. After three months of hard work I didn’t make it. I failed to obtain a high enough score to pass and felt discouraged. But I was committed to succeeding, and after a short while I realized that the setback motivated me to learn even more and work even harder.
Since the next module, on machine learning and algorithms, would not start for another two months, I decided to use the time to work on my programming skills. I simultaneously started the “Introduction to Computer Science and Programming Using Python” and “Introduction to Computational Thinking and Data Science” courses from MIT through edX, as well as the “Applied Data Science with Python Specialization” from the University of Michigan through Coursera. The first two provide a solid introduction to writing object-oriented code and teach the basics of computational thinking. They introduce concepts like recursion, classes and inheritance, depth-first versus breadth-first search, debugging and big-O notation, as well as greedy algorithms, random walks and Monte Carlo simulations.
The Applied Data Science with Python Specialization introduces Pandas for working with dataframes, so that you can start data wrangling and apply techniques like regular expressions. It also provides a gentle introduction to a variety of machine learning algorithms, text mining, and network analysis using NetworkX. It touches on data visualization as well, not only by introducing Matplotlib and related libraries, but also by covering topics like Edward Tufte’s principles for creating a good data visualization. Numerous hands-on exercises and peer-reviewed assignments make the Coursera specialization a very practical one.
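To give a flavor of two of the topics named above, random walks and Monte Carlo simulation, here is a minimal sketch. It is not taken from any of the courses; the function names, the grid walk and the parameters are my own illustrative assumptions. It estimates how far a walker on a 2D grid ends up from the origin by simply averaging over many simulated walks.

```python
import random


def random_walk_squared_distance(steps, rng):
    """Take `steps` unit moves on a 2D grid; return the squared distance from the origin."""
    x = y = 0
    for _ in range(steps):
        # Each move goes one unit east, west, north or south with equal probability.
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x += dx
        y += dy
    return x * x + y * y


def estimate_mean_squared_distance(steps, trials, seed=0):
    """Monte Carlo estimate: average the outcome over many independent walks."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    total = sum(random_walk_squared_distance(steps, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Hypothetical settings: 2000 walks of 100 steps each.
    print(estimate_mean_squared_distance(steps=100, trials=2000))
```

For this kind of walk the expected squared distance equals the number of steps, so the printed estimate should land close to 100. More trials bring it closer, which is the essence of a Monte Carlo simulation.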
Mastering Mathematics.
“Mastering mathematics, and especially working with vectors, matrices and calculus, is essential when you want to truly comprehend the workings of Machine Learning and Artificial Intelligence algorithms.”
In early 2021 I was learning full-time and following three to four courses in parallel. My hard work paid off, and I started to finish courses successfully. Slowly but surely I qualified for one certificate after another. This boosted my self-confidence and increased my appetite for learning even more. During the machine learning module I decided to dust off my mathematical skills. Mastering mathematics, and especially working with vectors, matrices and calculus, is essential when you want to truly comprehend machine learning concepts like clustering, classification, regression and regularization, or concepts like reinforcement learning, back-propagation, principal component analysis, feed-forward neural networks and stochastic gradient descent. The Mathematics for Machine Learning Specialization from Imperial College London through Coursera was a lifesaver. Linear algebra, multivariate calculus and principal component analysis are explained in an intuitive and very accessible way. Furthermore, edX courses on differentiation and on multivariable calculus (vectors and derivatives) were very helpful in internalizing the underlying mathematical theory and solidified what I had learned.
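To illustrate why calculus matters here, the following is a minimal sketch of stochastic gradient descent fitting a straight line. It is my own toy example rather than course material; the function name, learning rate, number of epochs and the made-up data are all assumptions. The update rule inside the loop is nothing more than the derivative of the squared error for a single sample.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.1, epochs=200, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a fresh random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # The loss for one sample is 0.5 * error**2, so by the chain rule:
            #   d(loss)/dw = error * x    and    d(loss)/db = error
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


if __name__ == "__main__":
    # Made-up, noise-free toy data on the line y = 2x + 1.
    xs = [i / 49 for i in range(50)]
    ys = [2.0 * x + 1.0 for x in xs]
    w, b = sgd_linear_fit(xs, ys)
    print(round(w, 3), round(b, 3))
```

On this toy data the fitted slope and intercept should end up very close to 2 and 1. Real data is noisy, and in that case the learning rate usually needs to shrink over time for the estimates to settle.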
Big data.
“Querying a billion records using Databricks and a distributed computing environment.”
Halfway through spring 2021 it was time for another attempt at the probability course. I re-enrolled and took it together with the Fundamentals of Statistics module. I thought I already knew a thing or two about statistics, until I took this course. It provides in-depth teaching of inference and estimation, hypothesis testing, Bayesian statistics, linear regression and generalized linear models. It was perhaps one of the most difficult courses I have ever taken! Hard work and perseverance paid off though, and I succeeded. In early summer I started refreshing my SQL skills by querying big data (literally a billion records) using Databricks and a distributed computing environment. At the same time I worked on my data analytics and visualization skills by becoming a Tableau certified author, analyst and data scientist.
At the time of writing this post the leaves are falling. It is autumn 2021 and I am busy with the last module of the MicroMasters program (Statistical Modeling and Computation in Applications) and with the fourth of the five courses in the University of Michigan’s Sports Performance Analytics Specialization, which I started during the summer just for the fun of it.
Specialized, but not a specialist
“Although I specialized in data science, I realize that I am not a true specialist at heart.”
Fifteen months into my learning journey I started realizing a few things. The more I know about data science, the more I realize how much I have yet to learn. I believe one of the best ways to keep learning is to put my newly acquired skills into practice and start working on projects.
Applied data science is even more valuable when combined with data engineering skills and knowledge of data architectures and data management. Although I specialized in data science, I realize that I am not a true specialist at heart. Ultimately, I enjoy using this knowledge in context, and I believe I have more value to add when I combine my knowledge of data science, artificial intelligence and machine learning with the IT and data knowledge and experience that I already possess. Hence I decided to offer my services as a self-employed data management consultant.
The journey has only just begun.
“Continuing my data management learning journey ‘on the job’.”
Before I even got a chance to start acquiring projects as planned, my professional network proved to be one step ahead of me. Business relations got word of my ambition, and I now provide data management consultancy services to two large companies, continuing my learning journey “on the job”.
If you decide to start learning about data science, I hope the course path I took sparks some ideas and maybe even provides inspiration. In my opinion there is no single fixed blueprint for learning data science. Instead, it is often a lengthy journey of experimenting, learning and doing. I am profoundly enjoying it and I’ve only just begun…
Feel free to reach out. I’m on LinkedIn.
List of recommended courses I took in 2020/2021:
- 6.419x: Data Analysis: Statistical Modeling and Computation in Applications, edX (Massachusetts Institute of Technology).
- 18.6501x: Fundamentals of Statistics, edX (Massachusetts Institute of Technology).
- 6.86x: Machine Learning with Python-From Linear Models to Deep Learning, edX (Massachusetts Institute of Technology).
- 6.431x: Probability - The Science of Uncertainty and Data, edX (Massachusetts Institute of Technology).
- 18.02.1x: Multivariable Calculus 1: Vectors and Derivatives, edX (Massachusetts Institute of Technology).
- 18.01.1x: Calculus 1A: Differentiation, edX (Massachusetts Institute of Technology).
- 6.00.2x: Introduction to Computational Thinking and Data Science, edX (Massachusetts Institute of Technology).
- 6.00.1x: Introduction to Computer Science and Programming Using Python, edX (Massachusetts Institute of Technology).
- SQL for Data Science, Coursera (University of California, Davis).
- Distributed Computing with Spark SQL, Coursera (University of California, Davis).
- Mathematics for Machine Learning: PCA, Coursera (Imperial College London).
- Mathematics for Machine Learning: Multivariate Calculus, Coursera (Imperial College London).
- Mathematics for Machine Learning: Linear Algebra, Coursera (Imperial College London).
- Wearable Technologies and Sports Analytics, Coursera (University of Michigan).
- Prediction Models with Sports Data, Coursera (University of Michigan).
- Moneyball and Beyond, Coursera (University of Michigan).
- Foundations of Sports Analytics: Data, Representation and Models in Sports, Coursera (University of Michigan).
- Applied Social Network Analysis in Python, Coursera (University of Michigan).
- Applied Text Mining in Python, Coursera (University of Michigan).
- Applied Machine Learning in Python, Coursera (University of Michigan).
- Applied Plotting, Charting & Data Representation in Python, Coursera (University of Michigan).
- Introduction to Data Science in Python, Coursera (University of Michigan).
- Tableau Data Scientist (Tableau Learning Center).
- Tableau Analyst (Tableau Learning Center).
- Tableau Author (Tableau Learning Center).
- Intro to Machine Learning (Kaggle).
- Python (Kaggle).