Learn By Example: Hadoop, MapReduce for Big Data problems
Learn By Example: Hadoop, MapReduce for Big Data problems is available at $89.99. It has 74 lectures, an average rating of 4.75 based on 1110 reviews, and 9904 subscribers.
In this course you will learn to develop advanced MapReduce applications to process Big Data; master the art of "thinking parallel", that is, how to break up a task into Map/Reduce transformations; set up your own mini-Hadoop cluster, whether it's a single node, a physical cluster or in the cloud; use Hadoop + MapReduce to solve a wide variety of problems, from NLP to inverted indices to recommendations; understand HDFS, MapReduce and YARN and how they interact with each other; and understand the basics of performance tuning and managing your own cluster. The course is ideal for analysts who want to leverage the power of HDFS where traditional databases don't cut it anymore, engineers who want to develop complex distributed computing applications to process lots of data, and data scientists who want to add MapReduce to their bag of tricks for processing data.
Summary
Title: Learn By Example: Hadoop, MapReduce for Big Data problems
Price: $89.99
Average Rating: 4.75
Number of Lectures: 74
Number of Published Lectures: 73
Number of Curriculum Items: 74
Number of Published Curriculum Objects: 73
Original Price: $89.99
Quality Status: approved
Status: Live
What You Will Learn
- Develop advanced MapReduce applications to process Big Data
- Master the art of "thinking parallel" – how to break up a task into Map/Reduce transformations
- Self-sufficiently set up your own mini-Hadoop cluster, whether it's a single node, a physical cluster or in the cloud
- Use Hadoop + MapReduce to solve a wide variety of problems : from NLP to Inverted Indices to Recommendations
- Understand HDFS, MapReduce and YARN and how they interact with each other
- Understand the basics of performance tuning and managing your own cluster
Who Should Attend
- Yep! Analysts who want to leverage the power of HDFS where traditional databases don't cut it anymore
- Yep! Engineers who want to develop complex distributed computing applications to process lots of data
- Yep! Data Scientists who want to add MapReduce to their bag of tricks for processing data
Taught by a four-person team including two Stanford-educated ex-Googlers and two ex-Flipkart Lead Analysts. This team has decades of practical experience in working with Java and with billions of rows of data.
This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel.
Let’s parse that.
Zoom-in, Zoom-Out: This course is both broad and deep. It covers the individual components of Hadoop in great detail, and also gives you a higher level picture of how they interact with each other.
Hands-on workout involving Hadoop, MapReduce: This course will get you hands-on with Hadoop very early on. You’ll learn how to set up your own cluster using both VMs and the Cloud. All the major features of MapReduce are covered – including advanced topics like Total Sort and Secondary Sort.
The art of thinking parallel: MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art. The examples in this course will train you to “think parallel”.
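To make "thinking parallel" concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps for a word count. It is not taken from the course materials; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and plain Python stands in for the Hadoop framework.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Each call to `map_phase` looks at one line only, and each call to `reduce_phase` looks at one key only. That independence is what would let a real cluster run many of them at the same time.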
What’s Covered:
Lots of cool stuff:
- Using MapReduce to
  - Recommend friends in a Social Networking site: Generate Top 10 friend recommendations using a Collaborative filtering algorithm.
  - Build an Inverted Index for Search Engines: Use MapReduce to parallelize the humongous task of building an inverted index for a search engine.
  - Generate Bigrams from text: Generate bigrams and compute their frequency distribution in a corpus of text.
- Build your Hadoop cluster:
  - Install Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes
  - Set up a Hadoop cluster using Linux VMs.
  - Set up a cloud Hadoop cluster on AWS with Cloudera Manager.
  - Understand HDFS, MapReduce and YARN and their interaction
- Customize your MapReduce Jobs:
  - Chain multiple MR jobs together
  - Write your own Customized Partitioner
  - Total Sort: Globally sort a large amount of data by sampling input files
  - Secondary sorting
  - Unit tests with MRUnit
  - Integrate with Python using the Hadoop Streaming API
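As a rough illustration of two items from the list above (bigram counts and the Hadoop Streaming style of writing jobs in Python), the sketch below follows the Streaming convention of tab-separated `key<TAB>value` text lines, with the reducer receiving its input sorted by key. It is an assumption-laden sketch, not the course's code: the `mapper` and `reducer` names are hypothetical, and in a real Streaming job they would be separate scripts reading standard input, whereas here they take lists so the example runs locally.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'first second<TAB>1' for each adjacent word pair in a line."""
    for line in lines:
        words = line.lower().split()
        for first, second in zip(words, words[1:]):
            yield f"{first} {second}\t1"


def reducer(sorted_lines):
    """Sum the counts per bigram; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local stand-in for the cluster: map, sort by key, then reduce.
    sample = ["the quick brown fox", "the quick red fox"]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```

The `sorted(...)` call plays the role of the framework's shuffle and sort. Because the reducer relies on equal keys arriving next to each other, it never has to hold more than one bigram's counts at a time.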
... and of course all the basics:
- MapReduce: Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort
- HDFS & YARN: Namenode, Datanode, Resource manager, Node manager, the anatomy of a MapReduce application, YARN Scheduling, Configuring HDFS and YARN to performance tune your cluster.
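The partitioning and shuffle-and-sort steps named above can be sketched as follows. Hadoop's default partitioner assigns a key to a reducer by hashing it modulo the number of reducers; this sketch assumes `zlib.crc32` as a stand-in for the Java hash, and the `partition` and `shuffle_and_sort` names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer index from a stable hash of the key (assumed stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Route each (key, value) pair to a partition, then sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return {index: sorted(items) for index, items in partitions.items()}


if __name__ == "__main__":
    pairs = [("banana", 1), ("apple", 2), ("banana", 3), ("cherry", 4)]
    for index, items in sorted(shuffle_and_sort(pairs, 2).items()):
        print(index, items)
```

Because the partition depends only on the key, every value for a given key ends up at the same reducer. A custom partitioner changes only the `partition` function, for example to keep related keys together.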
Course Curriculum
Chapter 1: Introduction
Lecture 1: You, this course and Us
Chapter 2: Why is Big Data a Big Deal
Lecture 1: The Big Data Paradigm
Lecture 2: Serial vs Distributed Computing
Lecture 3: What is Hadoop?
Lecture 4: HDFS or the Hadoop Distributed File System
Lecture 5: MapReduce Introduced
Lecture 6: YARN or Yet Another Resource Negotiator
Chapter 3: Installing Hadoop in a Local Environment
Lecture 1: Hadoop Install Modes
Lecture 2: Hadoop Standalone mode Install
Lecture 3: Hadoop Pseudo-Distributed mode Install
Chapter 4: The MapReduce "Hello World"
Lecture 1: The basic philosophy underlying MapReduce
Lecture 2: MapReduce – Visualized And Explained
Lecture 3: MapReduce – Digging a little deeper at every step
Lecture 4: "Hello World" in MapReduce
Lecture 5: The Mapper
Lecture 6: The Reducer
Lecture 7: The Job
Chapter 5: Run a MapReduce Job
Lecture 1: Get comfortable with HDFS
Lecture 2: Run your first MapReduce Job
Chapter 6: Juicing your MapReduce – Combiners, Shuffle and Sort and The Streaming API
Lecture 1: Parallelize the reduce phase – use the Combiner
Lecture 2: Not all Reducers are Combiners
Lecture 3: How many mappers and reducers does your MapReduce have?
Lecture 4: Parallelizing reduce using Shuffle And Sort
Lecture 5: MapReduce is not limited to the Java language – Introducing the Streaming API
Lecture 6: Python for MapReduce
Chapter 7: HDFS and Yarn
Lecture 1: HDFS – Protecting against data loss using replication
Lecture 2: HDFS – Name nodes and why they're critical
Lecture 3: HDFS – Checkpointing to backup name node information
Lecture 4: Yarn – Basic components
Lecture 5: Yarn – Submitting a job to Yarn
Lecture 6: Yarn – Plug in scheduling policies
Lecture 7: Yarn – Configure the scheduler
Chapter 8: MapReduce Customizations For Finer Grained Control
Lecture 1: Setting up your MapReduce to accept command line arguments
Lecture 2: The Tool, ToolRunner and GenericOptionsParser
Lecture 3: Configuring properties of the Job object
Lecture 4: Customizing the Partitioner, Sort Comparator, and Group Comparator
Chapter 9: The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
Lecture 1: The heart of search engines – The Inverted Index
Lecture 2: Generating the inverted index using MapReduce
Lecture 3: Custom data types for keys – The Writable Interface
Lecture 4: Represent a Bigram using a WritableComparable
Lecture 5: MapReduce to count the Bigrams in input text
Lecture 6: Setting up your Hadoop project
Lecture 7: Test your MapReduce job using MRUnit
Chapter 10: Input and Output Formats and Customized Partitioning
Lecture 1: Introducing the File Input Format
Lecture 2: Text And Sequence File Formats
Lecture 3: Data partitioning using a custom partitioner
Lecture 4: Make the custom partitioner real in code
Lecture 5: Total Order Partitioning
Lecture 6: Input Sampling, Distribution, Partitioning and configuring these
Lecture 7: Secondary Sort
Chapter 11: Recommendation Systems using Collaborative Filtering
Lecture 1: Introduction to Collaborative Filtering
Lecture 2: Friend recommendations using chained MR jobs
Lecture 3: Get common friends for every pair of users – the first MapReduce
Lecture 4: Top 10 friend recommendation for every user – the second MapReduce
Chapter 12: Hadoop as a Database
Lecture 1: Structured data in Hadoop
Lecture 2: Running an SQL Select with MapReduce
Lecture 3: Running an SQL Group By with MapReduce
Lecture 4: A MapReduce Join – The Map Side
Lecture 5: A MapReduce Join – The Reduce Side
Lecture 6: A MapReduce Join – Sorting and Partitioning
Lecture 7: A MapReduce Join – Putting it all together
Chapter 13: K-Means Clustering
Lecture 1: What is K-Means Clustering?
Lecture 2: A MapReduce job for K-Means Clustering
Lecture 3: K-Means Clustering – Measuring the distance between points
Lecture 4: K-Means Clustering – Custom Writables for Input/Output
Lecture 5: K-Means Clustering – Configuring the Job
Lecture 6: K-Means Clustering – The Mapper and Reducer
Lecture 7: K-Means Clustering : The Iterative MapReduce Job
Chapter 14: Setting up a Hadoop Cluster
Lecture 1: Manually configuring a Hadoop cluster (Linux VMs)
Lecture 2: Getting started with Amazon Web Services
Lecture 3: Start a Hadoop Cluster with Cloudera Manager on AWS
Chapter 15: Appendix
Lecture 1: Setup a Virtual Linux Instance (For Windows users)
Lecture 2: [For Linux/Mac OS Shell Newbies] Path and other Environment Variables
Instructors
- Loony Corn: An ex-Google, Stanford and Flipkart team
Rating Distribution
- 1 star: 19 votes
- 2 stars: 34 votes
- 3 stars: 126 votes
- 4 stars: 427 votes
- 5 stars: 504 votes
Frequently Asked Questions
How long do I have access to the course materials?
You can view and review the lecture materials indefinitely, like an on-demand channel.
Can I take my courses with me wherever I go?
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!