Spark SQL & Hadoop (For Data Science)
Spark SQL & Hadoop (For Data Science) is available at $69.99. It has 85 lectures and 5,567 subscribers, with an average rating of 4.15 based on 70 reviews.
Summary
Title: Spark SQL & Hadoop (For Data Science)
Price: $69.99
Average Rating: 4.15
Number of Lectures: 85
Number of Published Lectures: 85
Number of Curriculum Items: 85
Number of Published Curriculum Objects: 85
Original Price: $39.99
Quality Status: approved
Status: Live
What You Will Learn
- Students will get hands-on experience working in a Spark Hadoop environment that’s free and downloadable as part of this course.
- Students will have opportunities to solve Data Engineering and Data Analysis problems using Spark on a Hadoop cluster in the sandbox environment that comes with the course.
- Issuing HDFS commands.
- Converting a set of data values in a given format stored in HDFS into new data values or a new data format and writing them into HDFS.
- Loading data from HDFS for use in Spark applications & writing the results back into HDFS using Spark.
- Reading and writing files in a variety of file formats.
- Performing standard extract, transform, load (ETL) processes on data using the Spark API.
- Using metastore tables as an input source or an output sink for Spark applications.
- Applying the understanding of the fundamentals of querying datasets in Spark.
- Filtering data using Spark.
- Writing queries that calculate aggregate statistics.
- Joining disparate datasets using Spark.
- Producing ranked or sorted data.
Who Should Attend
- This course has been designed specifically for data scientists, big data analysts and data engineers looking to leverage the power of Hadoop and Apache Spark to make sense of big data.
- This course is also well suited for university students and recent graduates who are keen to land a job with a company that's looking to fill big data-related positions, or anyone who simply wants to apply their SQL skills in a big data environment using Spark-SQL.
- Software engineers & developers who are looking to break into the Data Engineering field will also find this course helpful.
Apache Spark is currently one of the most popular systems for processing big data.
Apache Hadoop continues to be used by many organisations that look to store data locally on premises. Hadoop allows these organisations to efficiently store big datasets ranging in size from gigabytes to petabytes.
As the number of vacancies for data science, big data analysis and data engineering roles continues to grow, so too will the demand for individuals who possess knowledge of Spark and Hadoop technologies to fill these vacancies.
This course has been designed specifically for data scientists, big data analysts and data engineers looking to leverage the power of Hadoop and Apache Spark to make sense of big data.
This course will help individuals who are looking to interactively analyse big data or to begin writing production applications that prepare data for further analysis using Spark SQL in a Hadoop environment.
The course is also well suited for university students and recent graduates who are keen to gain exposure to Spark & Hadoop, or anyone who simply wants to apply their SQL skills in a big data environment using Spark-SQL.
This course has been designed to be concise and to give students just enough theory to use Hadoop & Spark, without getting bogged down in older low-level APIs such as RDDs.
By solving the problems in this course, students will begin to develop the skills and confidence needed to handle real-world scenarios that come their way in a production environment.
(a) There are just under 30 problems in this course. These cover HDFS commands, basic data engineering tasks and data analysis.
(b) Fully worked out solutions to all the problems.
(c) Also included is the Verulam Blue virtual machine, an environment with a Spark Hadoop cluster already installed so that you can practice working on the problems.
- The VM contains a Spark Hadoop environment which allows students to read and write data to and from the Hadoop file system, as well as to store tables in the Hive metastore.
- All the datasets students will need for the problems are already loaded onto HDFS, so there is no need for students to do any extra work.
- The VM also has Apache Zeppelin installed. This is a web-based notebook with Spark support and is similar to a Jupyter notebook.
This course will allow students to get hands-on experience working in a Spark Hadoop environment as they practice:
- Converting a set of data values in a given format stored in HDFS into new data values or a new data format and writing them into HDFS.
- Loading data from HDFS for use in Spark applications & writing the results back into HDFS using Spark.
- Reading and writing files in a variety of file formats.
- Performing standard extract, transform, load (ETL) processes on data using the Spark API.
- Using metastore tables as an input source or an output sink for Spark applications.
- Applying the understanding of the fundamentals of querying datasets in Spark.
- Filtering data using Spark.
- Writing queries that calculate aggregate statistics.
- Joining disparate datasets using Spark.
- Producing ranked or sorted data.
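As a rough illustration of the querying skills listed above (filtering, aggregating, joining and ranking), the sketch below runs a single SQL query against two small in-memory tables. It is not taken from the course: the table names, column names and sample rows are invented for illustration, and Python's built-in sqlite3 module is swapped in for Spark so the example runs without a cluster. In a Spark session, a query of this shape would typically be passed to spark.sql(...) instead, assuming tables with these names had been registered there.

```python
import sqlite3

# Hypothetical sample data, invented purely for illustration.
ORDERS = [
    (1, "alice", 120.0),
    (2, "bob", 35.5),
    (3, "alice", 60.0),
    (4, "carol", 250.0),
    (5, "bob", 14.5),
]
CUSTOMERS = [("alice", "UK"), ("bob", "US"), ("carol", "UK")]

# One query covering four of the listed skills:
#   WHERE     -> filtering
#   JOIN      -> joining two datasets
#   GROUP BY  -> aggregate statistics (SUM)
#   RANK()    -> ranked, sorted output
# Window functions such as RANK() need SQLite 3.25 or newer.
QUERY = """
WITH spend AS (
    SELECT c.country     AS country,
           o.customer    AS customer,
           SUM(o.amount) AS total_spent
    FROM orders AS o
    JOIN customers AS c ON c.name = o.customer
    WHERE o.amount >= 20
    GROUP BY c.country, o.customer
)
SELECT country,
       customer,
       total_spent,
       RANK() OVER (ORDER BY total_spent DESC) AS spend_rank
FROM spend
ORDER BY spend_rank
"""


def run_query():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
        )
        conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
        conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Prints one row per customer, highest total spend first.
    for row in run_query():
        print(row)
```

Orders below 20 are filtered out before the totals are computed, so each customer's rank reflects only their larger orders.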
Course Curriculum
Chapter 1: Introduction
Lecture 1: Introduction
Lecture 2: The Udemy Environment
Chapter 2: Introduction to Hadoop & Spark
Lecture 1: Section Introduction
Lecture 2: Big Data
Lecture 3: Distributed Storage & Processing
Lecture 4: Introduction to Hadoop
Lecture 5: Introduction to Spark
Lecture 6: Spark Applications
Lecture 7: Spark's Interactive Shell
Lecture 8: Distributed Processing on a Hadoop Cluster using Spark
Chapter 3: Our Working Environment
Lecture 1: Section Introduction
Lecture 2: Install Oracle VM VirtualBox
Lecture 3: The Verulam Blue VM – Zipped Files for Downloading
Lecture 4: Loading the Verulam Blue VM
Lecture 5: Booting up the VM
Lecture 6: Spin Up Cluster
Lecture 7: spark-shell
Lecture 8: Run Zeppelin Notebook
Lecture 9: Problems & practice test questions
Chapter 4: HDFS Basic File Management
Lecture 1: Interacting with HDFS
Lecture 2: The File System Shell (FS Shell)
Lecture 3: Commands and operations -help
Lecture 4: Commands and operations -ls
Lecture 5: Commands and operations -find
Lecture 6: Commands and operations -mkdir
Lecture 7: Commands and operations -put
Lecture 8: Commands and operations -cp -mv
Lecture 9: Commands and operations -cat -tail -text
Lecture 10: Commands and operations -rmdir -rm
Lecture 11: Commands and operations -get
Lecture 12: Health warning
Lecture 13: HDFS Basic File Management – Problems & Solutions
Chapter 5: Data Structures
Lecture 1: Section Introduction
Lecture 2: DataFrames
Lecture 3: Tables
Lecture 4: Temp Views
Chapter 6: Spark SQL & Creating Data Structures
Lecture 1: Section Introduction
Lecture 2: Querying Data Structures using SQL via Spark SQL
Lecture 3: Creating DataFrames with Spark SQL
Lecture 4: Creating Databases & Tables with Spark SQL
Lecture 5: Creating Temporary Views with Spark SQL
Chapter 7: Basic Operations on Data Structures
Lecture 1: Section Introduction
Lecture 2: Operations on DataFrame columns
Lecture 3: Operations on DataFrame rows
Lecture 4: Basic SQL queries for Tables
Chapter 8: Data Engineering
Lecture 1: Section Introduction
Lecture 2: The ETL Process
Lecture 3: The Extract Phase of an ETL process
Lecture 4: The Extract Phase – Loading CSV and Text files
Lecture 5: The Extract Phase – Loading JSON and Parquet files
Lecture 6: The Extract Phase – Loading Avro and ORC files
Lecture 7: The Transform Phase of an ETL process
Lecture 8: The Transform Phase – String Transformations
Lecture 9: The Transform Phase – Numerical Transformations
Lecture 10: The Transform Phase – Date & Time Transformations
Lecture 11: The Transform Phase – Data Type Transformations
Lecture 12: The Transform Phase – Transformations of Nulls
Lecture 13: The Load Phase of an ETL process
Lecture 14: The Load Phase – Saving DataFrame data to Files I
Lecture 15: The Load Phase – Saving DataFrame data to Files II
Lecture 16: The Load Phase – Saving DataFrame data to Tables
Lecture 17: Data Engineering – Solutions to Problems
Chapter 9: Data Analysis
Lecture 1: Section Introduction
Lecture 2: Metastore Tables as Input Sources or Output Sinks
Lecture 3: Querying datasets in Spark
Lecture 4: Math Functions in SQL
Lecture 5: Filtering
Lecture 6: Sorting & Ranking
Lecture 7: Aggregation
Lecture 8: Grouping
Lecture 9: Multi Table Queries
Lecture 10: Multi Table Queries – Joins
Lecture 11: Multi Table Queries – Types of Joins
Lecture 12: Multi Table Queries – Unions
Lecture 13: Data Analysis – Solutions to Problems
Chapter 10: End of Course Test Solutions
Lecture 1: End of Course Test Solutions
Chapter 11: Appendix – Hadoop Theory
Lecture 1: HDFS Architecture
Lecture 2: YARN Architecture
Chapter 12: Appendix – Spark Theory
Lecture 1: Components of a Spark application
Lecture 2: The Driver Process
Lecture 3: The Executor Process
Lecture 4: The Master Process
Lecture 5: The Spark Application Execution Model
Lecture 6: Deploying Spark Applications on Hadoop clusters
Chapter 13: ** BONUS SECTION **
Lecture 1: ** BONUS **
Instructors
- Matthew Barr: Data Scientist | Founder of Verulam Blue
Rating Distribution
- 1 star: 1 vote
- 2 stars: 1 vote
- 3 stars: 12 votes
- 4 stars: 12 votes
- 5 stars: 44 votes
Frequently Asked Questions
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!