How to install Hadoop on MacBook M1 or M2 without Homebrew or Virtual Machine
Author(s): Mala Deep Originally published on Towards AI. Step 1: Check and install the Java JDK (if needed) using the terminal. Hadoop localhost User Interface. Image by the author. In …
Big Data, IoT and AI, Part One: Three Sides of the Same Coin
Author(s): Charles Towers-Clark Originally published on Towards AI. Big Data, AI and IoT are all parts of the same system, and nothing will improve unless we think of them holistically | YUKOKUSAMURAI, SHUTTERSTOCK. Since computers were first invented, people have been …
Big Data, AI & IoT, Part Two: Driving Industry 4.0 One Step At A Time
Author(s): Charles Towers-Clark Originally published on Towards AI. Factories, refineries, utilities and all manner of industrial environments will benefit from AI, Big Data, and IoT, but what will it take to get there? UNSPLASH What comes to mind when you think of …
Big Data, AI & IoT, Part Three: What's Stopping Us?
Author(s): Charles Towers-Clark Originally published on Towards AI. The progress of AI, Big Data and IoT has been well documented, but there are still major hurdles to clear before they achieve their full potential. JOHN CAMERON, UNSPLASH This series of articles has looked …
How You Should Save the Output of Your Spark ETL Jobs (If You Are Not Writing to a Database)
Author(s): ___ Originally published on Towards AI. In this article, I will share my thoughts on the best way to save the output of Spark ETL jobs so that it is easier to do analytical work later. The code to reproduce the …
Planning Better Cities With AI And Big Data, Part One
Author(s): Charles Towers-Clark Originally published on Towards AI. 3D models like this one of Adelaide can help city planners visualize how a development will look in situ, making them a useful tool for alleviating the increasing congestion of urban spaces. Our cities …
Will Your Education Pay You Well?
Author(s): Harsh Darji Originally published on Towards AI. Wage analysis using Random Forest. https://pixabay.com/photos/woman-adult-people-money-3261425/ Wage analysis is the process of comparing salaries based on the attributes of the employee. Of course, there are several factors like the company, location which …
Clash Royale API: Looping Query for Data Collection
Author(s): Michelangiolo Mazzeschi Originally published on Towards AI. Data Science A few days ago I had the idea of applying factor analysis to the decks of Clash Royale players in order to classify them into hierarchies. Unfortunately, I realized that I …
Small -> Big -> Massive: VM to BM to Serverless Spark-based Data Science
Author(s): Deepak Sekar Originally published on Towards AI. Cloud Computing We have heard about big data platforms supporting ML workloads with distributed computing. But do you always need a big data platform for your data science workloads? How about the flexibility to …
Stock Downloader API with Alpha Vantage
Author(s): Michelangiolo Mazzeschi Originally published on Towards AI. Finance Full code available on my GitHub repository. In the past few weeks, I scoured the internet in search of reliable ways to download historical stock prices. Unfortunately, it is not easy to find …
Using AI to control AI: How to Prevent Creating Biased Datasets
Author(s): Michelangiolo Mazzeschi Originally published on Towards AI. What are labels? In the last few days, MIT took down the widely cited 80 Million Tiny Images dataset (32×32 images) because it contained labels (if you do not know what that means, I will clarify …
PySpark: Process Multi-Char Delimiter Dataset
Author(s): Vivek Chaudhary Originally published on Towards AI. Programming The objective of this article is to process multiple delimited files using Apache Spark with the Python programming language. This is a real-time scenario where an application can share multiple delimited files, and the …
5 Steps to Tackle Real-World Imbalanced Data
Author(s): Snehal Nair Originally published on Towards AI. Baseline Model without resampling Working with imbalanced data can be very challenging. Imbalanced data refers to data in which the classes are not equally represented. Some examples of imbalanced datasets include fraud detection, churn prediction, …
Supercharge Your Data Engineering Skills with This Machine Learning Pipeline
Author(s): Mike Shakhomirov Originally published on Towards AI. Data modeling, Python, DAGs, Big Data file formats, costs… It covers everything. Photo by Peter Olexa on Unsplash This is a real-life scenario from when I was tasked to create a highly scalable machine learning …
Large-Scale Sentiment Analysis with PySpark
Author(s): Clément Delteil Originally published on Towards AI. Comparative study of classification algorithms and feature extraction functions implemented in PySpark on 1,600,000 tweets. Photo by Nik on Unsplash As entities become more interconnected, the volume of data to be processed grows exponentially. …