Pyspark Kafka Structured Streaming Data Pipeline
Author(s): Vivek Chaudhary. Originally published on Towards AI. Programming. The objective of this article is to understand how to build a data pipeline that processes data using Apache Spark Structured Streaming and Apache Kafka. Source: Kafka-Spark streaming. Business Case Explanation: Let us …
PySpark process Multi char Delimiter Dataset
Author(s): Vivek Chaudhary. Originally published on Towards AI. Programming. The objective of this article is to process multiple delimited files using Apache Spark with the Python programming language. This is a real-time scenario where an application can share multiple delimited files, and the …
Handle Missing Data in Pyspark
Author(s): Vivek Chaudhary. Originally published on Towards AI. Programming, Python. The objective of this article is to understand the various ways to handle missing or null values in a dataset. A null represents an unknown, missing, or irrelevant value, but with …
Exploratory Data Analysis (EDA) using Pyspark
Author(s): Vivek Chaudhary. Originally published on Towards AI. Data Analytics, Python. The objective of this article is to analyze the dataset and answer some questions to gain insight into the data. We will learn how to connect to Oracle DB …