4 Tips To Write Scalable Apache Spark Code
Author(s): ___ Originally published on Towards AI. In this article, I will share some tips on how to write scalable Apache Spark code. The examples presented here are based on code I encountered in the real world. So, by sharing …
A Practical Tip When Working With Random Samples On Spark
Author(s): ___ Originally published on Towards AI. In this article, I will share a crucial tip for using Spark to analyze a random sample of a data frame. The code to reproduce the results can be found here. It's an HTML version …
Small -> Big -> Massive: VM to BM to Serverless Spark-based Data Science
Author(s): Deepak Sekar Originally published on Towards AI. We have heard about big data platforms supporting ML workloads with distributed computing. But do you always need a big data platform for your data science workloads? How about the flexibility to …
This Is How You Can Build a Churn Prediction Model Using Apache Spark
Author(s): Paul Iusztin Originally published on Towards AI. An end-to-end tutorial on how to build a churn prediction pipeline using only Apache Spark. …
How to Set Up Your Environment for Spark
Author(s): Hao Cai Originally published on Towards AI. Spark is a popular open-source big data framework used by many companies in the industry. Here I want to show you how …
Billions of Rows, Milliseconds of Time: PySpark Starter Guide
Author(s): Ravi Shankar Originally published on Towards AI. Intended Audience: Data Scientists with a working knowledge of Python, SQL, and Linux. How often do we see the error below, followed by a terminal shutdown and then despair over lost work: Memory Error- …