4 Tips To Write Scalable Apache Spark Code
Last Updated on July 20, 2023 by Editorial Team

Originally published on Towards AI.

In this article, I will share four tips on how to write scalable Apache Spark code. The examples presented here are based on code I encountered in the real world. By sharing these tips, I hope to help newcomers write performant Spark code without needlessly increasing their cluster’s resources.

The cluster I used to run the code in this article is hosted on Databricks with the following configuration:

Cluster Mode: Standard
Databricks Runtime Version: 5.5 LTS ML (includes Apache Spark 2.4.3, Scala 2.11)

There are 8 workers, and both the workers and the driver are m4.xlarge instances (16.0 GB, 4…

