PySpark & AWS: Master Big Data With PySpark and AWS
PySpark & AWS: Master Big Data With PySpark and AWS costs $84.99 and has an average rating of 4.51 based on 2,184 reviews. It contains 207 lectures and 12 quizzes and has 15,398 subscribers.
Summary
Title: PySpark & AWS: Master Big Data With PySpark and AWS
Price: $84.99
Average Rating: 4.51
Number of Lectures: 207
Number of Quizzes: 12
Number of Published Lectures: 190
Number of Published Quizzes: 12
Number of Curriculum Items: 219
Number of Published Curriculum Objects: 202
Original Price: $199.99
Quality Status: approved
Status: Live
What You Will Learn
- An introduction to Big Data and why it matters
- Practical explanation and live coding with PySpark
- Spark applications
- Spark ecosystem
- Spark architecture
- Hadoop ecosystem
- Hadoop architecture
- PySpark RDDs
- PySpark RDD transformations
- PySpark RDD actions
- PySpark DataFrames
- PySpark DataFrame transformations
- PySpark DataFrame actions
- Collaborative filtering in PySpark
- Spark Streaming
- ETL pipelines
- Change Data Capture (CDC) and ongoing replication
Who Should Attend
- Beginners with no prior knowledge of PySpark or AWS
- People who want to develop intelligent solutions
- People who want to learn PySpark and AWS
- People who prefer to learn theoretical concepts before implementing them in Python
- People who want to learn PySpark through realistic projects
- Big Data scientists
- Big Data engineers
Comprehensive Course Description:
Python and Apache Spark are two of the most in-demand technologies in Big Data analytics, and PySpark, the Python API for Apache Spark, brings them together. In this course, you'll start from the basics and progress to advanced data analysis. From cleaning data to building features and implementing machine learning (ML) models, you'll learn how to execute end-to-end workflows using PySpark.
Throughout the course, you'll use PySpark to perform data analysis. You'll explore Spark RDDs, DataFrames, and some Spark SQL queries, along with the transformations and actions that can be performed on data using RDDs and DataFrames. You'll also explore the Spark and Hadoop ecosystems and their underlying architectures. You'll run your Spark scripts in the Databricks environment and explore it as well.
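To make the distinction between transformations and actions concrete, here is a minimal plain-Python sketch. It does not use PySpark: the `LazyDataset` class is hypothetical and only mimics the idea that transformations such as `map` and `filter` are recorded without running, while actions such as `collect` and `count` trigger the actual computation. In PySpark, a comparable pipeline would be written as `sc.parallelize(range(10)).filter(...).map(...).collect()`.

```python
# Plain-Python illustration (not the PySpark API): transformations are
# recorded lazily and only executed when an action is called.
class LazyDataset:
    """Hypothetical toy class mimicking how Spark defers transformations."""

    def __init__(self, data, ops=None):
        self._data = list(data)   # source records
        self._ops = ops or []     # recorded transformations, not yet run

    # --- transformations: return a new dataset, compute nothing ---
    def map(self, fn):
        return LazyDataset(self._data, self._ops + [("map", fn)])

    def filter(self, pred):
        return LazyDataset(self._data, self._ops + [("filter", pred)])

    # --- actions: execute the recorded plan and return a result ---
    def collect(self):
        out = self._data
        for kind, fn in self._ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

    def count(self):
        return len(self.collect())


ds = LazyDataset(range(10))
# Nothing is computed on this line; the two steps are only recorded.
squares_of_evens = ds.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())  # [0, 4, 16, 36, 64]
print(squares_of_evens.count())    # 5
```

Spark itself goes further than this toy. It builds a lineage graph, splits work into stages across partitions, and can recompute lost partitions from that lineage.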
Finally, you'll get a taste of Spark on the AWS cloud. You'll see how to leverage AWS storage, database, and compute services, and how Spark can communicate with different AWS services to get the data it needs.
How Is This Course Different?
In this Learning by Doing course, every theoretical explanation is followed by practical implementation.
The course ‘PySpark & AWS: Master Big Data With PySpark and AWS’ is crafted to reflect the most in-demand workplace skills. It will help you understand all the essential concepts and methodologies related to PySpark. The course is:
• Easy to understand.
• Expressive.
• Exhaustive.
• Practical with live coding.
• Up to date with state-of-the-art knowledge in this field.
Because this course covers all the basics in detail, it will help you progress quickly and go beyond what is taught. At the end of each concept, you will be assigned homework, tasks, activities, or quizzes, along with solutions, to evaluate and reinforce what you have learned. Most of these activities are coding-based, as the aim is to get you up and running with implementations.
High-quality video content, in-depth course material, evaluation questions, detailed course notes, and informative handouts are some of the perks of this course. You can contact our friendly team with any course-related questions, and we aim to respond quickly.
The course tutorials are divided into 140+ brief videos. You’ll learn the concepts and methodologies of PySpark and AWS along with a lot of practical implementation. The total runtime of the HD videos is around 16 hours.
Why Should You Learn PySpark and AWS?
PySpark is the Python API for Apache Spark, which lets you write Spark applications in Python.
PySpark is worth learning because of the high demand for Spark professionals and the high salaries they command. PySpark is being adopted rapidly for Big Data processing.
AWS, launched in 2006, is one of the leading public cloud platforms. Now is the right time to build cloud computing skills, and AWS skills in particular.
Course Content:
The all-inclusive course consists of the following topics:
1. Introduction:
a. Why Big Data?
b. Applications of PySpark
c. Introduction to the Instructor
d. Introduction to the Course
e. Projects Overview
2. Introduction to Hadoop, Spark EcoSystems, and Architectures:
a. Hadoop EcoSystem
b. Spark EcoSystem
c. Hadoop Architecture
d. Spark Architecture
e. PySpark Databricks setup
f. PySpark local setup
3. Spark RDDs:
a. Introduction to PySpark RDDs
b. Understanding underlying Partitions
c. RDD transformations
d. RDD actions
e. Creating Spark RDD
f. Running Spark Code Locally
g. RDD Map (Lambda)
h. RDD Map (Simple Function)
i. RDD FlatMap
j. RDD Filter
k. RDD Distinct
l. RDD GroupByKey
m. RDD ReduceByKey
n. RDD (Count and CountByValue)
o. RDD (saveAsTextFile)
p. RDD (Partition)
q. Finding Average
r. Finding Min and Max
s. Mini project on student data set analysis
t. Total Marks by Male and Female Student
u. Total Passed and Failed Students
v. Total Enrollments per Course
w. Total Marks per Course
x. Average marks per Course
y. Finding Minimum and Maximum marks
z. Average Age of Male and Female Students
4. Spark DFs:
a. Introduction to PySpark DFs
b. Understanding underlying RDDs
c. DFs transformations
d. DFs actions
e. Creating Spark DFs
f. Spark Infer Schema
g. Spark Provide Schema
h. Create DF from RDD
i. Select DF Columns
j. Spark DF withColumn
k. Spark DF withColumnRenamed and Alias
l. Spark DF Filter rows
m. Spark DF (Count, Distinct, Duplicate)
n. Spark DF (sort, orderBy)
o. Spark DF (Group By)
p. Spark DF (UDFs)
q. Spark DF (DF to RDD)
r. Spark DF (Spark SQL)
s. Spark DF (Write DF)
t. Mini project on Employees data set analysis
u. Project Overview
v. Project (Count and Select)
w. Project (Group By)
x. Project (Group By, Aggregations, and Order By)
y. Project (Filtering)
z. Project (UDF and With Column)
aa. Project (Write)
5. Collaborative filtering:
a. Understanding collaborative filtering
b. Developing recommendation system using ALS model
c. Utility Matrix
d. Explicit and Implicit Ratings
e. Expected Results
f. Dataset
g. Joining Dataframes
h. Train and Test Data
i. ALS model
j. Hyperparameter tuning and cross-validation
k. Best model and evaluate predictions
l. Recommendations
6. Spark Streaming:
a. Understanding the difference between batch and streaming analysis.
b. Hands-on with spark streaming through word count example
c. Spark Streaming with RDD
d. Spark Streaming Context
e. Spark Streaming Reading Data
f. Spark Streaming Cluster Restart
g. Spark Streaming RDD Transformations
h. Spark Streaming DF
i. Spark Streaming Display
j. Spark Streaming DF Aggregations
7. ETL Pipeline
a. Understanding the ETL
b. ETL pipeline Flow
c. Data set
d. Extracting Data
e. Transforming Data
f. Loading data (Creating RDS)
g. Load data (Creating RDS)
h. RDS Networking
i. Downloading Postgres
j. Installing Postgres
k. Connect to RDS through pgAdmin
l. Loading Data
8. Project – Change Data Capture / Ongoing Replication
a. Introduction to Project
b. Project Architecture
c. Creating RDS MySQL Instance
d. Creating S3 Bucket
e. Creating DMS Source Endpoint
f. Creating DMS Destination Endpoint
g. Creating DMS Instance
h. MySQL Workbench
i. Connecting with RDS and Dumping Data
j. Querying RDS
k. DMS Full Load
l. DMS Replication Ongoing
m. Stopping Instances
n. Glue Job (Full Load)
o. Glue Job (Change Capture)
p. Glue Job (CDC)
q. Creating Lambda Function and Adding Trigger
r. Checking Trigger
s. Getting S3 file name in Lambda
t. Creating Glue Job
u. Adding Invoke for Glue Job
v. Testing Invoke
w. Writing Glue Shell Job
x. Full Load Pipeline
y. Change Data Capture Pipeline
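The full-load and change-data-capture pipelines at the end of this outline come down to one idea: take a snapshot of the source table, then replay ordered change records on top of it. The plain-Python sketch below shows that idea only. The record layout, with an `I`/`U`/`D` operation flag, is an assumption loosely modeled on how AWS DMS marks change rows. `apply_changes` and the sample data are hypothetical and are not the course's Glue job code.

```python
# Hypothetical CDC records in arrival order: (op, id, name).
# "I" = insert, "U" = update, "D" = delete (assumed layout).
changes = [
    ("I", 1, "Alice"),
    ("I", 2, "Bob"),
    ("U", 1, "Alicia"),
    ("D", 2, None),
    ("I", 3, "Carol"),
]


def apply_changes(table, records):
    """Replays change records, in order, onto a dict keyed by primary key."""
    for op, key, name in records:
        if op in ("I", "U"):
            table[key] = name        # insert a new row or overwrite an existing one
        elif op == "D":
            table.pop(key, None)     # delete the row if it exists
        else:
            raise ValueError(f"unknown op: {op!r}")
    return table


full_load = {10: "Zed"}              # rows produced by the initial full load
print(apply_changes(full_load, changes))  # {10: 'Zed', 1: 'Alicia', 3: 'Carol'}
```

Order matters here. Replaying the same records out of sequence, for example applying the delete before the insert, would produce a different table. This is why CDC pipelines keep change files ordered.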
After the successful completion of this course, you will be able to:
● Relate Spark and AWS concepts and practices to real-world problems
● Implement any project that requires PySpark knowledge from scratch
● Know the theory and practical aspects of PySpark and AWS
Enroll in this comprehensive PySpark and AWS course now to master the essential skills in Big Data analytics, data processing, and cloud computing.
Whether you’re a beginner or looking to expand your knowledge, this course offers a hands-on learning experience with practical projects. Don’t miss this opportunity to advance your career and tackle real-world challenges in the world of data analytics and cloud computing. Join us today and start your journey towards becoming a Big Data expert with PySpark and AWS!
List of keywords:
- Big Data analytics
- Data analysis
- Data cleaning
- Machine learning (ML)
- Spark RDDs
- DataFrames
- Spark SQL queries
- Spark ecosystem
- Hadoop
- Databricks
- AWS cloud
- Spark scripts
- AWS services
- PySpark and AWS collaboration
- PySpark tutorial
- PySpark hands-on
- PySpark projects
- Spark architecture
- Hadoop ecosystem
- PySpark Databricks setup
- Spark local setup
- Spark RDD transformations
- Spark RDD actions
- Spark DF transformations
- Spark DF actions
- Spark Infer Schema
- Spark Provide Schema
- Spark DF Filter rows
- Spark DF (Count, Distinct, Duplicate)
- Spark DF (sort, orderBy)
- Spark DF (Group By)
- Spark DF (UDFs)
- Spark DF (Spark SQL)
- Collaborative filtering
- Recommendation system
- ALS model
- Spark Streaming
- ETL pipeline
- Change Data Capture (CDC)
- Replication
- AWS Glue Job
- AWS Lambda function
- RDS
- S3 bucket
- MySQL instance
- AWS Database Migration Service (DMS)
- pgAdmin
- Glue Shell Job
- Full Load Pipeline
- Change Data Capture Pipeline
Course Curriculum
Chapter 1: Introduction
Lecture 1: Why Big Data
Lecture 2: Applications of PySpark
Lecture 3: Introduction to Instructor
Lecture 4: Introduction to Course
Lecture 5: Projects Overview
Lecture 6: Request for Your Honest Review
Lecture 7: Links for the Course's Materials and Codes
Chapter 2: Introduction to Hadoop, Spark EcoSystems and Architectures
Lecture 1: Links for the Course's Materials and Codes
Lecture 2: Why Spark
Lecture 3: Hadoop EcoSystem
Lecture 4: Spark Architecture and EcoSystem
Lecture 5: DataBricks SignUp
Lecture 6: Create DataBricks Notebook
Lecture 7: Download Spark and Dependencies
Lecture 8: Java Setup on Windows
Lecture 9: Windows Setup Python Spark Hadoop
Lecture 10: Running Spark on Windows
Lecture 11: Java Download on Mac
Lecture 12: Installing JDK on Mac
Lecture 13: Setting Java Home on Mac
Lecture 14: Java Check on Mac
Lecture 15: Installing Python on Mac
Lecture 16: Setup Spark on Mac
Chapter 3: Spark RDDs
Lecture 1: Links for the Course's Materials and Codes
Lecture 2: Spark RDDs
Lecture 3: Creating Spark RDD
Lecture 4: Running Spark Code Locally
Lecture 5: RDD Map (Lambda)
Lecture 6: RDD Map (Simple Function)
Lecture 7: Quiz (Map)
Lecture 8: Solution 1 (Map)
Lecture 9: Solution 2 (Map)
Lecture 10: RDD FlatMap
Lecture 11: RDD Filter
Lecture 12: Quiz (Filter)
Lecture 13: Solution (Filter)
Lecture 14: RDD Distinct
Lecture 15: RDD GroupByKey
Lecture 16: RDD ReduceByKey
Lecture 17: Quiz (Word Count)
Lecture 18: Solution (Word Count)
Lecture 19: RDD (Count and CountByValue)
Lecture 20: RDD (saveAsTextFile)
Lecture 21: RDD (Partition)
Lecture 22: Finding Average-1
Lecture 23: Finding Average-2
Lecture 24: Quiz (Average)
Lecture 25: Solution (Average)
Lecture 26: Finding Min and Max
Lecture 27: Quiz (Min and Max)
Lecture 28: Solution (Min and Max)
Lecture 29: Project Overview
Lecture 30: Total Students
Lecture 31: Total Marks by Male and Female Student
Lecture 32: Total Passed and Failed Students
Lecture 33: Total Enrollments per Course
Lecture 34: Total Marks per Course
Lecture 35: Average marks per Course
Lecture 36: Finding Minimum and Maximum marks
Lecture 37: Average Age of Male and Female Students
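The word-count and averaging lectures in this chapter rely on the `flatMap` → `map` → `reduceByKey` pattern. The sketch below emulates that pattern in plain Python so the logic can be followed without a Spark install. The helpers `flat_map` and `reduce_by_key` are hypothetical stand-ins, not PySpark functions. In PySpark, word count is typically written as `rdd.flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`.

```python
from collections import defaultdict
from functools import reduce


def flat_map(fn, records):
    """Like RDD.flatMap: apply fn to each record and flatten the results."""
    return [item for rec in records for item in fn(rec)]


def reduce_by_key(fn, pairs):
    """Like RDD.reduceByKey: merge all values sharing a key using fn."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


lines = ["spark makes big data simple", "big data needs spark"]

# Word count: flatMap into words -> map to (word, 1) -> reduceByKey(add)
words = flat_map(lambda line: line.split(), lines)
word_counts = reduce_by_key(lambda a, b: a + b, [(w, 1) for w in words])
print(word_counts["spark"], word_counts["big"])  # 2 2

# Per-key average: map to (key, (value, 1)), sum both parts, then divide
marks = [("math", 80), ("math", 60), ("physics", 90)]
sums = reduce_by_key(lambda a, b: (a[0] + b[0], a[1] + b[1]),
                     [(course, (mark, 1)) for course, mark in marks])
averages = {course: total / n for course, (total, n) in sums.items()}
print(averages)  # {'math': 70.0, 'physics': 90.0}
```

The average is computed by carrying `(sum, count)` pairs rather than averaging partial averages. Spark may combine values within each partition before merging across partitions, so the reducing function must be associative. Averaging averages would give wrong results when partitions hold different numbers of records.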
Chapter 4: Spark DFs
Lecture 1: Links for the Course's Materials and Codes
Lecture 2: Introduction to Spark DFs
Lecture 3: Creating Spark DFs
Lecture 4: Spark Infer Schema
Lecture 5: Spark Provide Schema
Lecture 6: Create DF from RDD
Lecture 7: Rectifying the Error
Lecture 8: Select DF Columns
Lecture 9: Spark DF withColumn
Lecture 10: Spark DF withColumnRenamed and Alias
Lecture 11: Spark DF Filter rows
Lecture 12: Quiz (select, withColumn, filter)
Lecture 13: Solution (select, withColumn, filter)
Lecture 14: Spark DF (Count, Distinct, Duplicate)
Lecture 15: Quiz (Distinct, Duplicate)
Lecture 16: Solution (Distinct, Duplicate)
Lecture 17: Spark DF (sort, orderBy)
Lecture 18: Quiz (sort, orderBy)
Lecture 19: Solution (sort, orderBy)
Lecture 20: Spark DF (Group By)
Lecture 21: Spark DF (Group By – Multiple Columns and Aggregations)
Lecture 22: Spark DF (Group By – Visualization)
Lecture 23: Spark DF (Group By – Filtering)
Lecture 24: Quiz (Group By)
Lecture 25: Solution (Group By)
Lecture 26: Quiz (Word Count)
Lecture 27: Solution (Word Count)
Lecture 28: Spark DF (UDFs)
Lecture 29: Quiz (UDFs)
Lecture 30: Solution (UDFs)
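The DataFrame lectures combine filtering, grouping with aggregations, and ordering. As a rough plain-Python picture of what such a query computes, the sketch below filters some rows, groups them by department, aggregates count, total, and average salary, and sorts by the average in descending order. The employee rows and the `group_by_agg` helper are hypothetical. In PySpark, this corresponds roughly to `filter`, then `groupBy("dept").agg(...)` using functions from `pyspark.sql.functions`, then `orderBy`.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for an employees DataFrame
employees = [
    {"name": "Ana", "dept": "IT", "salary": 5000},
    {"name": "Bo", "dept": "HR", "salary": 3000},
    {"name": "Cy", "dept": "IT", "salary": 7000},
    {"name": "Di", "dept": "HR", "salary": 4000},
    {"name": "Ed", "dept": "Ops", "salary": 3500},
]


def group_by_agg(rows, key):
    """Emulates groupBy(key) with count, sum, and avg of salary."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["salary"])
    return [
        {key: k, "count": len(v), "total": sum(v), "avg": sum(v) / len(v)}
        for k, v in groups.items()
    ]


# filter -> groupBy/agg -> orderBy(avg, descending)
well_paid = [r for r in employees if r["salary"] >= 3500]
summary = sorted(group_by_agg(well_paid, "dept"),
                 key=lambda r: r["avg"], reverse=True)
for row in summary:
    print(row)
```

In Spark, the same logical steps are planned by the Catalyst optimizer and executed in a distributed way. Only the result is illustrated here.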
Instructors
- AI Sciences: AI Experts & Data Scientists | 4+ Rated | 168+ Countries
- AI Sciences Team: Support Team AI Sciences
Rating Distribution
- 1 stars: 34 votes
- 2 stars: 31 votes
- 3 stars: 191 votes
- 4 stars: 838 votes
- 5 stars: 1091 votes
Frequently Asked Questions
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!