Data Engineering using Kafka and Spark Structured Streaming
Data Engineering using Kafka and Spark Structured Streaming, available at $74.99, has an average rating of 4.5 based on 216 reviews, includes 113 lectures, and has 3,898 subscribers.
You will learn to set up a self-support lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka; ingest data into Kafka topics using Kafka Connect with a File Source; ingest data into HDFS using Kafka Connect with the HDFS 3 Connector Plugin; and process data incrementally with Spark Structured Streaming, including reading data from Kafka topics. This course is ideal for experienced ETL developers, experienced PL/SQL developers, and beginner or experienced data engineers who want to learn Kafka and Spark to build streaming pipelines.
Enroll now: Data Engineering using Kafka and Spark Structured Streaming
Summary
Title: Data Engineering using Kafka and Spark Structured Streaming
Price: $74.99
Average Rating: 4.5
Number of Lectures: 113
Number of Published Lectures: 113
Number of Curriculum Items: 113
Number of Published Curriculum Objects: 113
Original Price: $22.99
Quality Status: approved
Status: Live
What You Will Learn
- Setting up self support lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka
- Overview of Kafka to build streaming pipelines
- Data Ingestion to Kafka topics using Kafka Connect using File Source
- Data Ingestion to HDFS using Kafka Connect using HDFS 3 Connector Plugin
- Overview of Spark Structured Streaming to process data as part of Streaming Pipelines
- Incremental Data Processing using Spark Structured Streaming using File Source and File Target
- Integration of Kafka and Spark Structured Streaming – Reading Data from Kafka Topics
Who Should Attend
- Experienced ETL Developers who want to learn Kafka and Spark to build streaming pipelines
- Experienced PL/SQL Developers who want to learn Kafka and Spark to build streaming pipelines
- Beginner or Experienced Data Engineers who want to learn Kafka and Spark to build streaming pipelines
As part of this course, you will be learning to build streaming pipelines by integrating Kafka and Spark Structured Streaming. Let us go through the details about what is covered in the course.
First of all, we need to have the proper environment to build streaming pipelines using Kafka and Spark Structured Streaming on top of Hadoop or any other distributed file system. As part of the course, you will start with setting up a self-support lab with all the key components such as Hadoop, Hive, Spark, and Kafka on a single node Linux-based system.
Once the environment is set up, you will go through the details related to getting started with Kafka. As part of that process, you will create a Kafka topic, produce messages into the topic, and consume messages from the topic.
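The create/produce/consume cycle can be sketched programmatically as well. The course itself uses the Kafka CLI tools; the following is a minimal stand-in using the kafka-python client, assuming a broker is running on localhost:9092 (as in the single-node lab) and that kafka-python is installed. The topic name is illustrative.

```python
# Sketch of create/produce/consume against a local broker using kafka-python.
# Assumes a Kafka broker on localhost:9092 and `pip install kafka-python`.
from kafka import KafkaConsumer, KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

BOOTSTRAP = "localhost:9092"

# 1. Create a topic (equivalent to kafka-topics.sh --create).
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
admin.create_topics([NewTopic(name="retail_logs", num_partitions=1, replication_factor=1)])

# 2. Produce a few messages (equivalent to kafka-console-producer.sh).
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
for line in ["log line 1", "log line 2"]:
    producer.send("retail_logs", line.encode("utf-8"))
producer.flush()

# 3. Consume the messages back (equivalent to kafka-console-consumer.sh).
consumer = KafkaConsumer(
    "retail_logs",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop polling after 5 seconds of inactivity
)
for message in consumer:
    print(message.value.decode("utf-8"))
```

The same three steps map one-to-one onto the CLI sessions covered in the Getting Started with Kafka chapter.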
You will also learn how to use Kafka Connect to ingest data from web server logs into a Kafka topic, as well as how to ingest data from a Kafka topic into HDFS as a sink.
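Both legs of that ingestion are driven by Kafka Connect configuration rather than code. As a rough sketch, a standalone file source and an HDFS 3 sink look like the following; the connector names, file path, topic name, and HDFS URL are illustrative placeholders, not the course's exact values.

```properties
# Illustrative Kafka Connect source config: tail a log file into a topic.
name=web-logs-file-source
connector.class=FileStreamSource
tasks.max=1
file=/opt/gen_logs/logs/access.log
topic=retail_logs
```

```properties
# Illustrative Confluent HDFS 3 Sink Connector config: topic to HDFS files.
name=retail-logs-hdfs-sink
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
tasks.max=1
topics=retail_logs
hdfs.url=hdfs://localhost:9000
flush.size=1000
```

The sink's `flush.size` controls how many records are accumulated before a file is rolled in HDFS.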
Once you understand Kafka from the perspective of data ingestion, you will get an overview of some of the key concepts related to Spark Structured Streaming.
After learning Kafka and Spark Structured Streaming separately, you will build a streaming pipeline that consumes data from a Kafka topic using Spark Structured Streaming, then processes the data and writes it to different targets.
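The shape of such a pipeline can be sketched in PySpark as follows. This is a minimal sketch, assuming a local Kafka broker, HDFS on localhost:9000, and a Spark session launched with the spark-sql-kafka integration package on the classpath; topic, paths, and names are illustrative, not the course's exact values.

```python
# Sketch of a Kafka -> Spark Structured Streaming -> HDFS pipeline.
# Assumes a running Kafka broker, HDFS, and the Kafka integration package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("KafkaToHDFS").getOrCreate()

# Read the Kafka topic as a streaming DataFrame; key and value arrive as binary.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "retail_logs")
    .load()
)

# Cast the message payload to a string column before writing it out.
lines = df.select(col("value").cast("string").alias("message"))

# Write to HDFS in CSV format; the checkpoint directory is how Spark tracks
# which Kafka offsets have already been processed across restarts.
query = (
    lines.writeStream
    .format("csv")
    .option("path", "hdfs://localhost:9000/user/itversity/retail_logs")
    .option("checkpointLocation", "hdfs://localhost:9000/user/itversity/retail_logs_ckpt")
    .start()
)
query.awaitTermination()
```

Swapping the sink format (`console`, `memory`, `parquet`, and so on) is how the course explores different targets from the same source.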
You will also learn how to take care of incremental data processing using Spark Structured Streaming.
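Conceptually, Spark's file source achieves incremental processing by remembering which input files it has already seen (via the checkpoint) and reading only new files on each trigger, optionally capped by the `maxFilesPerTrigger` option. The bookkeeping can be illustrated with a small plain-Python stand-in (this is not Spark itself; file names are illustrative, GHArchive-style):

```python
# Plain-Python illustration of incremental file processing: remember which
# files have been seen, and on each "trigger" pick up only the new ones,
# capped like Spark's maxFilesPerTrigger option.
def plan_batch(all_files, processed, max_files_per_trigger=2):
    """Return the next batch of unprocessed files, oldest-first."""
    new_files = sorted(f for f in all_files if f not in processed)
    return new_files[:max_files_per_trigger]

processed = set()
files = ["2021-01-13-0.json.gz", "2021-01-13-1.json.gz", "2021-01-13-2.json.gz"]

# Trigger 1: picks up the first two new files.
batch1 = plan_batch(files, processed)
processed.update(batch1)

# Trigger 2: only the remaining file is new.
batch2 = plan_batch(files, processed)
processed.update(batch2)

print(batch1)  # ['2021-01-13-0.json.gz', '2021-01-13-1.json.gz']
print(batch2)  # ['2021-01-13-2.json.gz']
```

Re-running a trigger after adding files to the input directory picks up exactly the delta, which is the behavior the incremental-load chapter validates step by step.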
Course Outline
Here is a brief outline of the course. You can choose either Cloud9 or GCP to provision a server to set up the environment.
- Setting up Environment using AWS Cloud9 or GCP
- Setup Single Node Hadoop Cluster
- Setup Hive and Spark on top of Single Node Hadoop Cluster
- Setup Single Node Kafka Cluster on top of Single Node Hadoop Cluster
- Getting Started with Kafka
- Data Ingestion using Kafka Connect – Web server log files as a source to Kafka Topic
- Data Ingestion using Kafka Connect – Kafka Topic to HDFS as a sink
- Overview of Spark Structured Streaming
- Kafka and Spark Structured Streaming Integration
- Incremental Loads using Spark Structured Streaming
Udemy-based Support
If you run into technical challenges while taking the course, feel free to raise your concerns using Udemy Messenger. We will make sure the issue is resolved within 48 hours.
Course Curriculum
Chapter 1: Introduction
Lecture 1: Introduction to Data Engineering using Kafka and Spark Structured Streaming
Lecture 2: Important Note for first time Data Engineering Customers
Lecture 3: Important Note for Data Engineering Essentials (Python and Spark) Customers
Lecture 4: How to get 30 days complimentary lab access?
Lecture 5: How to access material used for this course?
Chapter 2: Getting Started with Kafka
Lecture 1: Overview of Kafka
Lecture 2: Managing Topics using Kafka CLI
Lecture 3: Produce and Consume Messages using CLI
Lecture 4: Validate Generation of Web Server Logs
Lecture 5: Create Web Server using nc
Lecture 6: Produce retail logs to Kafka Topic
Lecture 7: Consume retail logs from Kafka Topic
Lecture 8: Clean up Kafka CLI Sessions to produce and consume messages
Lecture 9: Define Kafka Connect to produce
Lecture 10: Validate Kafka Connect to produce
Chapter 3: Data Ingestion using Kafka Connect
Lecture 1: Overview of Kafka Connect
Lecture 2: Define Kafka Connect to Produce Messages
Lecture 3: Validate Kafka Connect to produce messages
Lecture 4: Cleanup Kafka Connect to produce messages
Lecture 5: Write Data to HDFS using Kafka Connect
Lecture 6: Setup HDFS 3 Sink Connector Plugin
Lecture 7: Overview of Kafka Consumer Groups
Lecture 8: Configure HDFS 3 Sink Properties
Lecture 9: Run and Validate HDFS 3 Sink
Lecture 10: Cleanup Kafka Connect to consume messages
Chapter 4: Overview of Spark Structured Streaming
Lecture 1: Understanding Streaming Context
Lecture 2: Validate Log Data for Streaming
Lecture 3: Push log messages to Netcat Webserver
Lecture 4: Overview of built-in Input Sources
Lecture 5: Reading Web Server logs using Spark Structured Streaming
Lecture 6: Overview of Output Modes
Lecture 7: Using append as Output Mode
Lecture 8: Using complete as Output Mode
Lecture 9: Using update as Output Mode
Lecture 10: Overview of Triggers in Spark Structured Streaming
Lecture 11: Overview of built-in Output Sinks
Lecture 12: Previewing the Streaming Data
Chapter 5: Kafka and Spark Structured Streaming Integration
Lecture 1: Create Kafka Topic
Lecture 2: Read Data from Kafka Topic
Lecture 3: Preview data using console
Lecture 4: Preview data using memory
Lecture 5: Transform Data using Spark APIs
Lecture 6: Write Data to HDFS using Spark
Lecture 7: Validate Data in HDFS using Spark
Lecture 8: Write Data to HDFS using Spark using Header
Lecture 9: Cleanup Kafka Connect and Files in HDFS
Chapter 6: Incremental Loads using Spark Structured Streaming
Lecture 1: Overview of Spark Structured Streaming Triggers
Lecture 2: Steps for Incremental Data Processing
Lecture 3: Create Working Directory in HDFS
Lecture 4: Logic to Upload GHArchive Files
Lecture 5: Upload GHArchive Files to HDFS
Lecture 6: Add new GHActivity JSON Files
Lecture 7: Read JSON Data using Spark Structured streaming
Lecture 8: Write in Parquet File Format
Lecture 9: Analyze GHArchive Data in Parquet files using Spark
Lecture 10: Add New GHActivity JSON files
Lecture 11: Load Data Incrementally to Target Table
Lecture 12: Validate Incremental Load
Lecture 13: Add New GHActivity JSON files
Lecture 14: Using maxFilesPerTrigger and latestFirst
Lecture 15: Validate Incremental Load
Lecture 16: Add New GHActivity JSON files
Lecture 17: Incremental Load using Archival Process
Lecture 18: Validate Incremental Load
Chapter 7: Setting up Environment using AWS Cloud9
Lecture 1: Getting Started with Cloud9
Lecture 2: Creating Cloud9 Environment
Lecture 3: Warming up with Cloud9 IDE
Lecture 4: Overview of EC2 related to Cloud9
Lecture 5: Opening ports for Cloud9 Instance
Lecture 6: Associating Elastic IPs to Cloud9 Instance
Lecture 7: Increase EBS Volume Size of Cloud9 Instance
Lecture 8: Setup Jupyter Lab on Cloud9
Lecture 9: [Commands] Setup Jupyter Lab on Cloud9
Chapter 8: Setting up Environment – Overview of GCP and Provision Ubuntu VM
Lecture 1: Signing up for GCP
Lecture 2: Overview of GCP Web Console
Lecture 3: Overview of GCP Pricing
Lecture 4: Provision Ubuntu VM from GCP
Lecture 5: Setup Docker
Lecture 6: Validating Python
Lecture 7: Setup Jupyter Lab
Lecture 8: Setup Jupyter Lab locally on Mac
Chapter 9: Setup Single Node Hadoop Cluster
Lecture 1: Introduction to Single Node Hadoop Cluster
Lecture 2: Material related to setting up the environment
Lecture 3: Setup Prerequisites
Lecture 4: Setup Passwordless Login
Lecture 5: Download and Install Hadoop
Lecture 6: Configure Hadoop HDFS
Lecture 7: Start and Validate HDFS
Lecture 8: Configure Hadoop YARN
Lecture 9: Start and Validate YARN
Lecture 10: Managing Single Node Hadoop
Instructors
- Durga Viswanatha Raju Gadiraju (CEO at ITVersity and CTO at Analytiqs, Inc)
- Madhuri Gadiraju
- Pratik Kumar
- Sathvika Dandu
- Phani Bhushan Bozzam
Rating Distribution
- 1 star: 9 votes
- 2 stars: 10 votes
- 3 stars: 18 votes
- 4 stars: 59 votes
- 5 stars: 120 votes
Frequently Asked Questions
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!