Apache Spark 3 for Data Engineering & Analytics with Python
Apache Spark 3 for Data Engineering & Analytics with Python is available at $69.99. It has 89 lectures, an average rating of 4.33 based on 606 reviews, and 7,423 subscribers.
You will learn the Spark architecture and Spark execution concepts; Spark transformations and actions using both the Structured API and the RDD (Resilient Distributed Datasets) API; how to set up your own local PySpark environment; how to interpret the Spark Web UI and the DAG (Directed Acyclic Graph) for Spark execution; the Spark DataFrame API (Structured APIs); Spark SQL; and Spark on Databricks, including visualizing data with graphs and dashboards.
This course is ideal for Python developers who wish to learn how to use the language for data engineering and analytics with PySpark, aspiring data engineering and analytics professionals, data scientists and analysts who wish to learn an analytical processing strategy that can be deployed over a big data cluster, and data managers who want to gain a deeper understanding of managing data over a cluster.
Summary
Title: Apache Spark 3 for Data Engineering & Analytics with Python
Price: $69.99
Average Rating: 4.33
Number of Lectures: 89
Number of Published Lectures: 89
Number of Curriculum Items: 89
Number of Published Curriculum Objects: 89
Original Price: $19.99
Quality Status: approved
Status: Live
What You Will Learn
- Learn the Spark Architecture
- Learn Spark Execution Concepts
- Learn Spark Transformations and Actions using the Structured API
- Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API
- Learn how to set up your own local PySpark Environment
- Learn how to interpret the Spark Web UI
- Learn how to interpret the DAG (Directed Acyclic Graph) for Spark Execution
- Learn the RDD (Resilient Distributed Datasets) API (Crash Course)
- Learn the Spark DataFrame API (Structured APIs)
- Learn Spark SQL
- Learn Spark on Databricks
- Learn to Visualize Data (Graphs and Dashboards) on Databricks
Who Should Attend
- Python Developers who wish to learn how to use the language for Data Engineering and Analytics with PySpark
- Aspiring Data Engineering and Analytics Professionals
- Data Scientists / Analysts who wish to learn an analytical processing strategy that can be deployed over a big data cluster
- Data Managers who want to gain a deeper understanding of managing data over a cluster
The key objectives of this course are as follows:
- Learn the Spark Architecture
- Learn Spark Execution Concepts
- Learn Spark Transformations and Actions using the Structured API
- Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API
- Learn how to set up your own local PySpark Environment
- Learn how to interpret the Spark Web UI
- Learn how to interpret the DAG (Directed Acyclic Graph) for Spark Execution
- Learn the RDD (Resilient Distributed Datasets) API (Crash Course)
  - RDD Transformations
  - RDD Actions
- Learn the Spark DataFrame API (Structured APIs)
  - Create Schemas and Assign DataTypes
  - Read and Write Data using the DataFrame Reader and Writer
  - Read Semi-Structured Data such as JSON
  - Create and Add New Data Columns to the DataFrame using Expressions
  - Filter the DataFrame using the “Filter” and “Where” Transformations
  - Ensure that the DataFrame has unique rows
  - Detect and Drop Duplicates
  - Augment the DataFrame by Adding New Rows
  - Combine 2 or More DataFrames
  - Order the DataFrame by Specific Columns
  - Rename and Drop Columns from the DataFrame
  - Clean the DataFrame by Detecting and Removing Missing or Bad Data
  - Create User-Defined Spark Functions
  - Read from and Write to Parquet Files
  - Partition the DataFrame and Write to Parquet File
  - Aggregate the DataFrame using Spark SQL functions (count, countDistinct, max, min, sum, sumDistinct, avg)
  - Perform Aggregations with Grouping
- Learn Spark SQL and Databricks
  - Create a Databricks Account
  - Create a Databricks Cluster
  - Create Databricks SQL and Python Notebooks
  - Learn Databricks shortcuts
  - Create Databases and Tables using Spark SQL
  - Use DML, DQL, and DDL with Spark SQL
  - Use Spark SQL Functions
  - Learn the differences between Managed and Unmanaged Tables
  - Read CSV Files from the Databricks File System
  - Learn to write Complex SQL
  - Create Visualisations with Databricks
  - Create a Databricks Dashboard
The Python Spark projects that we are going to do together:
Sales Data
- Create a Spark Session
- Read a CSV file into a Spark DataFrame
- Learn to Infer a Schema
- Select data from the Spark DataFrame
- Produce analytics that show the top sales orders per Region and Country
Convert Fahrenheit to Degrees Centigrade
- Create a Spark Session
- Read and Parallelize data using the Spark Context into an RDD
- Create a Function to Convert Fahrenheit to Degrees Centigrade
- Use the Map Function to convert data contained within an RDD
- Filter temperatures greater than or equal to 13 degrees Celsius
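The steps of this project can be sketched roughly as follows. This is a minimal illustration, not the course's own solution: the function names, the application name, and the sample readings are hypothetical, and the only fixed ingredient is the standard conversion formula C = (F - 32) × 5 / 9. PySpark is imported inside the pipeline function so that the conversion function can be tried on its own without a Spark installation.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius (Centigrade)."""
    return (temp_f - 32.0) * 5.0 / 9.0


def warm_readings_celsius(readings_f, threshold_c=13.0):
    """Parallelize Fahrenheit readings, map them to Celsius, and keep
    those greater than or equal to threshold_c.

    Requires PySpark; the import is local so the converter above stays
    usable without it. All names here are illustrative assumptions.
    """
    from pyspark.sql import SparkSession

    # Create (or reuse) a local Spark session.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("fahrenheit-to-celsius")  # hypothetical application name
        .getOrCreate()
    )
    try:
        # Distribute the Python list as an RDD via the Spark context.
        rdd = spark.sparkContext.parallelize(readings_f)
        # Transformation: convert every element with map().
        celsius = rdd.map(fahrenheit_to_celsius)
        # Transformation: keep readings at or above the threshold.
        warm = celsius.filter(lambda c: c >= threshold_c)
        # Action: bring the results back to the driver.
        return warm.collect()
    finally:
        spark.stop()
```

On a machine with PySpark installed, calling `warm_readings_celsius([59.0, 50.0, 68.0])` would keep the 59.0 °F and 68.0 °F readings (15 °C and 20 °C) and drop the 50.0 °F reading (10 °C). Nothing is computed until `collect()` runs, because `map` and `filter` are lazy transformations and `collect` is the action that triggers execution.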
XYZ Research
- Create a set of RDDs that hold Research Data
- Use the union transformation to combine RDDs
- Learn to use the subtract transformation to remove values from an RDD
- Use the RDD API to answer the following questions
  - How many research projects were initiated in the first three years?
  - How many projects were completed in the first year?
  - How many projects were completed in the first two years?
Sales Analytics
- Create the Sales Analytics DataFrame from a set of CSV Files
- Prepare the DataFrame by applying a Structure
- Remove bad records from the DataFrame (Cleaning)
- Generate New Columns from the DataFrame
- Write a Partitioned DataFrame to a Parquet Directory
- Answer the following questions and create visualizations using Seaborn and Matplotlib
  - What was the best month in sales?
  - What city sold the most products?
  - What time should the business display advertisements to maximize the likelihood of customers buying products?
  - What products are often sold together in the state “NY”?
Technology Spec
- Python
- Jupyter Notebook
- JupyterLab
- PySpark (Spark with Python)
- Pandas
- Matplotlib
- Seaborn
- Databricks
- SQL
Course Curriculum
Chapter 1: Introduction to Spark and Installation
Lecture 1: Introduction
Lecture 2: The Spark Architecture
Lecture 3: The Spark Unified Stack
Lecture 4: Windows – Download Java
Lecture 5: Windows – Install Java
Lecture 6: Windows – Set up Java environment variables
Lecture 7: Windows – Download Python Installer
Lecture 8: Windows – Install Python
Lecture 9: Windows – Set up PATH variable for Python
Lecture 10: Windows – Install Spark for Python
Lecture 11: Windows – PySpark Test Program
Lecture 12: Hadoop Installation
Lecture 13: Install Microsoft Build Tools
Lecture 14: Mac OS – Java Installation
Lecture 15: Mac OS – Python Installation
Lecture 16: Mac OS – PySpark Installation
Lecture 17: Mac OS – Testing the Spark Installation
Lecture 18: Install Jupyter Notebooks
Lecture 19: The Spark Web UI
Lecture 20: Section Summary
Chapter 2: Spark Execution Concepts
Lecture 1: Section Introduction
Lecture 2: Spark Application and Session
Lecture 3: Spark Transformations and Actions Part 1
Lecture 4: Spark Transformations and Actions Part 2
Lecture 5: DAG Visualisation
Chapter 3: RDD Crash Course
Lecture 1: Introduction to RDDs
Lecture 2: Data Preparation
Lecture 3: Distinct and Filter Transformations
Lecture 4: Map and Flat Map Transformations
Lecture 5: SortByKey Transformations
Lecture 6: RDD Actions
Lecture 7: Challenge – Convert Fahrenheit to Centigrade
Lecture 8: Challenge – XYZ Research
Lecture 9: XYZ Research
Lecture 10: Challenge – XYZ Research Part 1
Lecture 11: Challenge – XYZ Research Part 2
Chapter 4: Structured API – Spark DataFrame
Lecture 1: Structured APIs Introduction
Lecture 2: Preparing the Project Folder
Lecture 3: PySpark DataFrame, Schema and DataTypes
Lecture 4: DataFrame Reader and Writer
Lecture 5: Challenge Part 1 – Brief
Lecture 6: Challenge Part 1
Lecture 7: Challenge Part 1 – Data Preparation
Lecture 8: Working with Structured Operations
Lecture 9: Managing Performance Errors
Lecture 10: Reading a JSON File
Lecture 11: Columns and Expressions
Lecture 12: Filter and Where Conditions
Lecture 13: Distinct Drop Duplicates Order By
Lecture 14: Rows and Union
Lecture 15: Adding, Renaming and Dropping Columns
Lecture 16: Working with Missing or Bad Data
Lecture 17: Working with User Defined Functions
Lecture 18: Challenge Part 2 – Brief
Lecture 19: Challenge Part 2
Lecture 20: Challenge Part 2 – Remove Null Row and Bad Records
Lecture 21: Challenge Part 2 – Get the City and State
Lecture 22: Challenge Part 2 – Rearrange the Schema
Lecture 23: Challenge Part 2 – Write Partitioned DataFrame to Parquet
Lecture 24: Aggregations
Lecture 25: Aggregations – Setting up Flight Summary Data
Lecture 26: Aggregations – Count and Count Distinct
Lecture 27: Aggregations – Min Max Sum SumDistinct AVG
Lecture 28: Aggregations with Grouping
Lecture 29: Challenge Part 3 – Brief
Lecture 30: Challenge Part 3
Lecture 31: Challenge Part 3 – Prepare 2019 Data
Lecture 32: Challenge Part 3 – Q1 Get the Best Sales Month
Lecture 33: Challenge Part 3 – Q2 Get the City that sold the most products
Lecture 34: Challenge Part 3 – Q3 When to advertise
Lecture 35: Challenge Part 3 – Q4 Products Bought Together
Chapter 5: Introduction to Spark SQL and Databricks
Lecture 1: Introduction to Databricks
Lecture 2: Spark SQL Introduction
Lecture 3: Register Account on Databricks
Lecture 4: Create a Databricks Cluster
Lecture 5: Creating our First 2 Databricks Notebooks
Lecture 6: Reading CSV Files into DataFrame
Lecture 7: Creating a Database and Table
Lecture 8: Inserting Records into a Table
Lecture 9: Exposing Bad Records
Lecture 10: Figuring out how to remove bad records
Lecture 11: Extract the City and State
Lecture 12: Inserting Records to Final Sales Table
Lecture 13: What was the best month in sales?
Lecture 14: Get the City that sold the most products
Lecture 15: Get the right time to advertise
Lecture 16: Get the most products sold together
Lecture 17: Create a Dashboard
Lecture 18: Summary
Instructors
- David Charles Academy (Senior Big Data Engineer / Consultant at ABN AMRO)
Rating Distribution
- 1 star: 4 votes
- 2 stars: 7 votes
- 3 stars: 77 votes
- 4 stars: 224 votes
- 5 stars: 294 votes
Frequently Asked Questions
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!