Data Engineering using Kafka and Spark Structured Streaming
Data Engineering using Kafka and Spark Structured Streaming, available at $74.99, has an average rating of 4.5 based on 216 reviews, includes 113 lectures, and has 3,898 subscribers.
You will learn to set up a self-support lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka; ingest data into Kafka topics using Kafka Connect with a File Source; ingest data into HDFS using Kafka Connect with the HDFS 3 Connector Plugin; and process data incrementally with Spark Structured Streaming, including reading data from Kafka topics. This course is ideal for experienced ETL developers, experienced PL/SQL developers, and beginner or experienced data engineers who want to learn Kafka and Spark to build streaming pipelines.
Enroll now: Data Engineering using Kafka and Spark Structured Streaming
Summary
Title: Data Engineering using Kafka and Spark Structured Streaming
Price: $74.99
Average Rating: 4.5
Number of Lectures: 113
Number of Published Lectures: 113
Number of Curriculum Items: 113
Number of Published Curriculum Objects: 113
Original Price: $22.99
Quality Status: approved
Status: Live
What You Will Learn
- Setting up self support lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka
- Overview of Kafka to build streaming pipelines
- Data Ingestion to Kafka topics using Kafka Connect using File Source
- Data Ingestion to HDFS using Kafka Connect using HDFS 3 Connector Plugin
- Overview of Spark Structured Streaming to process data as part of Streaming Pipelines
- Incremental Data Processing using Spark Structured Streaming using File Source and File Target
- Integration of Kafka and Spark Structured Streaming – Reading Data from Kafka Topics
Who Should Attend
- Experienced ETL Developers who want to learn Kafka and Spark to build streaming pipelines
- Experienced PL/SQL Developers who want to learn Kafka and Spark to build streaming pipelines
- Beginner or Experienced Data Engineers who want to learn Kafka and Spark to build streaming pipelines
As part of this course, you will be learning to build streaming pipelines by integrating Kafka and Spark Structured Streaming. Let us go through the details about what is covered in the course.
First of all, we need to have the proper environment to build streaming pipelines using Kafka and Spark Structured Streaming on top of Hadoop or any other distributed file system. As part of the course, you will start with setting up a self-support lab with all the key components such as Hadoop, Hive, Spark, and Kafka on a single node Linux-based system.
Once the environment is set up, you will go through the details related to getting started with Kafka. As part of that process, you will create a Kafka topic, produce messages into the topic, and consume messages from the topic.
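The create/produce/consume cycle can be sketched programmatically as well. The course itself uses the Kafka CLI tools; the following is a minimal stand-in using the kafka-python client, assuming a broker is running on localhost:9092 (as in the single-node lab) and that kafka-python is installed. The topic name is illustrative.

```python
# Sketch of create/produce/consume against a local broker using kafka-python.
# Assumes a Kafka broker on localhost:9092 and `pip install kafka-python`.
from kafka import KafkaConsumer, KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic

BOOTSTRAP = "localhost:9092"

# 1. Create a topic (equivalent to kafka-topics.sh --create).
admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
admin.create_topics([NewTopic(name="retail_logs", num_partitions=1, replication_factor=1)])

# 2. Produce a few messages (equivalent to kafka-console-producer.sh).
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
for line in ["log line 1", "log line 2"]:
    producer.send("retail_logs", line.encode("utf-8"))
producer.flush()

# 3. Consume the messages back (equivalent to kafka-console-consumer.sh).
consumer = KafkaConsumer(
    "retail_logs",
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop polling after 5 seconds of inactivity
)
for message in consumer:
    print(message.value.decode("utf-8"))
```

The same three steps map one-to-one onto the CLI sessions covered in the Getting Started with Kafka chapter.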
You will also learn how to use Kafka Connect to ingest data from web server logs into a Kafka topic, as well as how to ingest data from a Kafka topic into HDFS as a sink.
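Both legs of that ingestion are driven by Kafka Connect configuration rather than code. As a rough sketch, a standalone file source and an HDFS 3 sink look like the following; the connector names, file path, topic name, and HDFS URL are illustrative placeholders, not the course's exact values.

```properties
# Illustrative Kafka Connect source config: tail a log file into a topic.
name=web-logs-file-source
connector.class=FileStreamSource
tasks.max=1
file=/opt/gen_logs/logs/access.log
topic=retail_logs
```

```properties
# Illustrative Confluent HDFS 3 Sink Connector config: topic to HDFS files.
name=retail-logs-hdfs-sink
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
tasks.max=1
topics=retail_logs
hdfs.url=hdfs://localhost:9000
flush.size=1000
```

The sink's `flush.size` controls how many records are accumulated before a file is rolled in HDFS.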
Once you understand Kafka from the perspective of data ingestion, you will get an overview of some of the key concepts related to Spark Structured Streaming.
After learning Kafka and Spark Structured Streaming separately, you will build a streaming pipeline that consumes data from a Kafka topic using Spark Structured Streaming, then processes the data and writes it to different targets.
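The shape of such a pipeline can be sketched in PySpark as follows. This is a minimal sketch, assuming a local Kafka broker, HDFS on localhost:9000, and a Spark session launched with the spark-sql-kafka integration package on the classpath; topic, paths, and names are illustrative, not the course's exact values.

```python
# Sketch of a Kafka -> Spark Structured Streaming -> HDFS pipeline.
# Assumes a running Kafka broker, HDFS, and the Kafka integration package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("KafkaToHDFS").getOrCreate()

# Read the Kafka topic as a streaming DataFrame; key and value arrive as binary.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "retail_logs")
    .load()
)

# Cast the message payload to a string column before writing it out.
lines = df.select(col("value").cast("string").alias("message"))

# Write to HDFS in CSV format; the checkpoint directory is how Spark tracks
# which Kafka offsets have already been processed across restarts.
query = (
    lines.writeStream
    .format("csv")
    .option("path", "hdfs://localhost:9000/user/itversity/retail_logs")
    .option("checkpointLocation", "hdfs://localhost:9000/user/itversity/retail_logs_ckpt")
    .start()
)
query.awaitTermination()
```

Swapping the sink format (`console`, `memory`, `parquet`, and so on) is how the course explores different targets from the same source.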
You will also learn how to take care of incremental data processing using Spark Structured Streaming.
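Conceptually, Spark's file source achieves incremental processing by remembering which input files it has already seen (via the checkpoint) and reading only new files on each trigger, optionally capped by the `maxFilesPerTrigger` option. The bookkeeping can be illustrated with a small plain-Python stand-in (this is not Spark itself; file names are illustrative, GHArchive-style):

```python
# Plain-Python illustration of incremental file processing: remember which
# files have been seen, and on each "trigger" pick up only the new ones,
# capped like Spark's maxFilesPerTrigger option.
def plan_batch(all_files, processed, max_files_per_trigger=2):
    """Return the next batch of unprocessed files, oldest-first."""
    new_files = sorted(f for f in all_files if f not in processed)
    return new_files[:max_files_per_trigger]

processed = set()
files = ["2021-01-13-0.json.gz", "2021-01-13-1.json.gz", "2021-01-13-2.json.gz"]

# Trigger 1: picks up the first two new files.
batch1 = plan_batch(files, processed)
processed.update(batch1)

# Trigger 2: only the remaining file is new.
batch2 = plan_batch(files, processed)
processed.update(batch2)

print(batch1)  # ['2021-01-13-0.json.gz', '2021-01-13-1.json.gz']
print(batch2)  # ['2021-01-13-2.json.gz']
```

Re-running a trigger after adding files to the input directory picks up exactly the delta, which is the behavior the incremental-load chapter validates step by step.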
Course Outline
Here is a brief outline of the course. You can choose either Cloud9 or GCP to provision a server to set up the environment.
- Setting up Environment using AWS Cloud9 or GCP
- Setup Single Node Hadoop Cluster
- Setup Hive and Spark on top of Single Node Hadoop Cluster
- Setup Single Node Kafka Cluster on top of Single Node Hadoop Cluster
- Getting Started with Kafka
- Data Ingestion using Kafka Connect – Web server log files as a source to Kafka Topic
- Data Ingestion using Kafka Connect – Kafka Topic to HDFS as a sink
- Overview of Spark Structured Streaming
- Kafka and Spark Structured Streaming Integration
- Incremental Loads using Spark Structured Streaming
Udemy-based Support
If you run into technical challenges while taking the course, feel free to raise your concerns using Udemy Messenger. We will make sure the issue is resolved within 48 hours.
Course Curriculum
Chapter 1: Introduction
Lecture 1: Introduction to Data Engineering using Kafka and Spark Structured Streaming
Lecture 2: Important Note for first time Data Engineering Customers
Lecture 3: Important Note for Data Engineering Essentials (Python and Spark) Customers
Lecture 4: How to get 30 days complimentary lab access?
Lecture 5: How to access material used for this course?
Chapter 2: Getting Started with Kafka
Lecture 1: Overview of Kafka
Lecture 2: Managing Topics using Kafka CLI
Lecture 3: Produce and Consume Messages using CLI
Lecture 4: Validate Generation of Web Server Logs
Lecture 5: Create Web Server using nc
Lecture 6: Produce retail logs to Kafka Topic
Lecture 7: Consume retail logs from Kafka Topic
Lecture 8: Clean up Kafka CLI Sessions to produce and consume messages
Lecture 9: Define Kafka Connect to produce
Lecture 10: Validate Kafka Connect to produce
Chapter 3: Data Ingestion using Kafka Connect
Lecture 1: Overview of Kafka Connect
Lecture 2: Define Kafka Connect to Produce Messages
Lecture 3: Validate Kafka Connect to produce messages
Lecture 4: Cleanup Kafka Connect to produce messages
Lecture 5: Write Data to HDFS using Kafka Connect
Lecture 6: Setup HDFS 3 Sink Connector Plugin
Lecture 7: Overview of Kafka Consumer Groups
Lecture 8: Configure HDFS 3 Sink Properties
Lecture 9: Run and Validate HDFS 3 Sink
Lecture 10: Cleanup Kafka Connect to consume messages
Chapter 4: Overview of Spark Structured Streaming
Lecture 1: Understanding Streaming Context
Lecture 2: Validate Log Data for Streaming
Lecture 3: Push log messages to Netcat Webserver
Lecture 4: Overview of built-in Input Sources
Lecture 5: Reading Web Server logs using Spark Structured Streaming
Lecture 6: Overview of Output Modes
Lecture 7: Using append as Output Mode
Lecture 8: Using complete as Output Mode
Lecture 9: Using update as Output Mode
Lecture 10: Overview of Triggers in Spark Structured Streaming
Lecture 11: Overview of built-in Output Sinks
Lecture 12: Previewing the Streaming Data
Chapter 5: Kafka and Spark Structured Streaming Integration
Lecture 1: Create Kafka Topic
Lecture 2: Read Data from Kafka Topic
Lecture 3: Preview data using console
Lecture 4: Preview data using memory
Lecture 5: Transform Data using Spark APIs
Lecture 6: Write Data to HDFS using Spark
Lecture 7: Validate Data in HDFS using Spark
Lecture 8: Write Data to HDFS using Spark using Header
Lecture 9: Cleanup Kafka Connect and Files in HDFS
Chapter 6: Incremental Loads using Spark Structured Streaming
Lecture 1: Overview of Spark Structured Streaming Triggers
Lecture 2: Steps for Incremental Data Processing
Lecture 3: Create Working Directory in HDFS
Lecture 4: Logic to Upload GHArchive Files
Lecture 5: Upload GHArchive Files to HDFS
Lecture 6: Add new GHActivity JSON Files
Lecture 7: Read JSON Data using Spark Structured streaming
Lecture 8: Write in Parquet File Format
Lecture 9: Analyze GHArchive Data in Parquet files using Spark
Lecture 10: Add New GHActivity JSON files
Lecture 11: Load Data Incrementally to Target Table
Lecture 12: Validate Incremental Load
Lecture 13: Add New GHActivity JSON files
Lecture 14: Using maxFilesPerTrigger and latestFirst
Lecture 15: Validate Incremental Load
Lecture 16: Add New GHActivity JSON files
Lecture 17: Incremental Load using Archival Process
Lecture 18: Validate Incremental Load
Chapter 7: Setting up Environment using AWS Cloud9
Lecture 1: Getting Started with Cloud9
Lecture 2: Creating Cloud9 Environment
Lecture 3: Warming up with Cloud9 IDE
Lecture 4: Overview of EC2 related to Cloud9
Lecture 5: Opening ports for Cloud9 Instance
Lecture 6: Associating Elastic IPs to Cloud9 Instance
Lecture 7: Increase EBS Volume Size of Cloud9 Instance
Lecture 8: Setup Jupyter Lab on Cloud9
Lecture 9: [Commands] Setup Jupyter Lab on Cloud9
Chapter 8: Setting up Environment – Overview of GCP and Provision Ubuntu VM
Lecture 1: Signing up for GCP
Lecture 2: Overview of GCP Web Console
Lecture 3: Overview of GCP Pricing
Lecture 4: Provision Ubuntu VM from GCP
Lecture 5: Setup Docker
Lecture 6: Validating Python
Lecture 7: Setup Jupyter Lab
Lecture 8: Setup Jupyter Lab locally on Mac
Chapter 9: Setup Single Node Hadoop Cluster
Lecture 1: Introduction to Single Node Hadoop Cluster
Lecture 2: Material related to setting up the environment
Lecture 3: Setup Prerequisites
Lecture 4: Setup Passwordless Login
Lecture 5: Download and Install Hadoop
Lecture 6: Configure Hadoop HDFS
Lecture 7: Start and Validate HDFS
Lecture 8: Configure Hadoop YARN
Lecture 9: Start and Validate YARN
Lecture 10: Managing Single Node Hadoop
Instructors
- Durga Viswanatha Raju Gadiraju (CEO at ITVersity and CTO at Analytiqs, Inc)
- Madhuri Gadiraju
- Pratik Kumar
- Sathvika Dandu
- Phani Bhushan Bozzam
Rating Distribution
- 1 star: 9 votes
- 2 stars: 10 votes
- 3 stars: 18 votes
- 4 stars: 59 votes
- 5 stars: 120 votes
Frequently Asked Questions
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!