Learning PySpark
Learning PySpark, available at $49.99, has an average rating of 3.95 based on 184 reviews, comprises 49 lectures, and has 625 subscribers.
You will learn about Apache Spark and the Spark 2.0 architecture; understand RDD schemas, lazy execution, and transformations; explore how to sort and save the elements of an RDD; build and interact with Spark DataFrames using Spark SQL; create and explore various APIs to work with Spark DataFrames; learn how to change the schema of a DataFrame programmatically; and explore how to aggregate, transform, and sort data with DataFrames. This course is ideal for Python developers keen to master hands-on techniques using the Apache Spark 2.x ecosystem.
Summary
Title: Learning PySpark
Price: $49.99
Average Rating: 3.95
Number of Lectures: 49
Number of Published Lectures: 49
Number of Curriculum Items: 49
Number of Published Curriculum Objects: 49
Original Price: $109.99
Quality Status: approved
Status: Live
What You Will Learn
- Learn about Apache Spark and the Spark 2.0 architecture.
- Understand RDD schemas, lazy execution, and transformations.
- Explore how to sort and save the elements of an RDD.
- Build and interact with Spark DataFrames using Spark SQL.
- Create and explore various APIs to work with Spark DataFrames.
- Learn how to change the schema of a DataFrame programmatically.
- Explore how to aggregate, transform, and sort data with DataFrames.
Who Should Attend
- If you are a Python developer keen to master hands-on techniques using the Apache Spark 2.x ecosystem in the best possible manner, this video is for you.
Apache Spark is an open-source distributed engine for querying and processing data. In this tutorial, we provide a brief overview of Spark and its stack and present effective, time-saving techniques for leveraging Python in the Spark ecosystem. You will start by getting a firm understanding of the Apache Spark architecture and learning how to set up a Python environment for Spark.
You’ll learn about different techniques for collecting data and how to understand and distinguish between techniques for processing data. Next, we provide an in-depth review of RDDs and contrast them with DataFrames. We provide examples of how to read data from files and from HDFS, and how to specify schemas using reflection or programmatically (in the case of DataFrames). We describe the concept of lazy execution and outline various transformations and actions specific to RDDs and DataFrames.
Finally, we show you how to use SQL to interact with DataFrames. By the end of this tutorial, you will have learned how to process data using Spark DataFrames and mastered data collection techniques through distributed data processing.
About the Author
Tomasz Drabas is a Data Scientist working for Microsoft and currently residing in the Seattle area. He has over 12 years’ international experience in data analytics and data science in numerous fields: advanced technology, airlines, telecommunications, finance, and consulting.
Tomasz started his career in 2003 with LOT Polish Airlines in Warsaw, Poland, while finishing his Master’s degree in strategy management. In 2007, he moved to Sydney to pursue a doctoral degree in operations research at the University of New South Wales, School of Aviation; his research crossed boundaries between discrete choice modeling and airline operations research. During his time in Sydney, he worked as a Data Analyst for Beyond Analysis Australia and as a Senior Data Analyst/Data Scientist for Vodafone Hutchison Australia, among others. He has also published scientific papers, attended international conferences, and served as a reviewer for scientific journals.
In 2015, he relocated to Seattle to begin working for Microsoft, where he has worked on numerous projects involving problems in high-dimensional feature spaces.
Course Curriculum
Chapter 1: A Brief Primer on PySpark
Lecture 1: The Course Overview
Lecture 2: Brief Introduction to Spark
Lecture 3: Apache Spark Stack
Lecture 4: Spark Execution Process
Lecture 5: Newest Capabilities of PySpark 2.0+
Lecture 6: Cloning GitHub Repository
Chapter 2: Resilient Distributed Datasets
Lecture 1: Brief Introduction to RDDs
Lecture 2: Creating RDDs
Lecture 3: Schema of an RDD
Lecture 4: Understanding Lazy Execution
Lecture 5: Introducing Transformations – .map(…)
Lecture 6: Introducing Transformations – .filter(…)
Lecture 7: Introducing Transformations – .flatMap(…)
Lecture 8: Introducing Transformations – .distinct(…)
Lecture 9: Introducing Transformations – .sample(…)
Lecture 10: Introducing Transformations – .join(…)
Lecture 11: Introducing Transformations – .repartition(…)
Chapter 3: Resilient Distributed Datasets and Actions
Lecture 1: Introducing Actions – .take(…)
Lecture 2: Introducing Actions – .collect(…)
Lecture 3: Introducing Actions – .reduce(…) and .reduceByKey(…)
Lecture 4: Introducing Actions – .count()
Lecture 5: Introducing Actions – .foreach(…)
Lecture 6: Introducing Actions – .aggregate(…) and .aggregateByKey(…)
Lecture 7: Introducing Actions – .coalesce(…)
Lecture 8: Introducing Actions – .combineByKey(…)
Lecture 9: Introducing Actions – .histogram(…)
Lecture 10: Introducing Actions – .sortBy(…)
Lecture 11: Introducing Actions – Saving Data
Lecture 12: Introducing Actions – Descriptive Statistics
Chapter 4: DataFrames and Transformations
Lecture 1: Introduction
Lecture 2: Creating DataFrames
Lecture 3: Specifying Schema of a DataFrame
Lecture 4: Interacting with DataFrames
Lecture 5: The .agg(…) Transformation
Lecture 6: The .sql(…) Transformation
Lecture 7: Creating Temporary Tables
Lecture 8: Joining Two DataFrames
Lecture 9: Performing Statistical Transformations
Lecture 10: The .distinct(…) Transformation
Chapter 5: Data Processing with Spark DataFrames
Lecture 1: Schema Changes
Lecture 2: Filtering Data
Lecture 3: Aggregating Data
Lecture 4: Selecting Data
Lecture 5: Transforming Data
Lecture 6: Presenting Data
Lecture 7: Sorting DataFrames
Lecture 8: Saving DataFrames
Lecture 9: Pitfalls of UDFs
Lecture 10: Repartitioning Data
Instructors
- Packt Publishing (Tech Knowledge in Motion)
Rating Distribution
- 1 star: 19 votes
- 2 stars: 11 votes
- 3 stars: 46 votes
- 4 stars: 63 votes
- 5 stars: 45 votes
Frequently Asked Questions
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!