Big Data Analysis with Scala and Spark 

  • Offered by Coursera
  • Public/Government Institute

Overview

  • Duration: 28 hours
  • Total fee: Free
  • Mode of learning: Online
  • Difficulty level: Intermediate
  • Credential: Certificate

Highlights

  • This Course Plus the Full Specialization.
  • Self-Paced Learning Option.
  • Graded Programming Assignments.

Course details

More about this course
  • Manipulating big data distributed over a cluster using functional concepts is common in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop and, most recently, Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data-parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, such as shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when important issues related to distribution, such as latency and network communication, should be considered and how they can be addressed effectively for improved performance.
  • Learning outcomes. By the end of this course you will be able to:
      - read data from persistent storage and load it into Apache Spark,
      - manipulate data with Spark and Scala,
      - express algorithms for data analysis in a functional style, and
      - recognize how to avoid shuffles and recomputation in Spark.
  • Recommended background: You should have at least one year of programming experience. Proficiency with Java or C# is ideal, but experience with other languages such as C/C++, Python, JavaScript, or Ruby is also sufficient. You should have some familiarity with using the command line. This course is intended to be taken after Parallel Programming: https://www.coursera.org/learn/parprog1.

Curriculum

Getting Started + Spark Basics

  • Introduction, Logistics, What You'll Learn
  • Data-Parallel to Distributed Data-Parallel
  • Latency
  • RDDs, Spark's Distributed Collection
  • RDDs: Transformation and Actions
  • Evaluation in Spark: Unlike Scala Collections!
  • Cluster Topology Matters!
  • Tools setup
  • sbt tutorial
  • IntelliJ IDEA tutorial
  • Eclipse tutorial
  • Submitting solutions

Reduction Operations & Distributed Key-Value Pairs

  • Reduction Operations
  • Pair RDDs
  • Transformations and Actions on Pair RDDs
  • Joins

Partitioning and Shuffling

  • Shuffling: What it is and why it's important
  • Partitioning
  • Optimizing with Partitioners
  • Wide vs Narrow Dependencies

Structured Data: SQL, DataFrames, and Datasets

  • Structured vs Unstructured Data
  • Spark SQL
  • DataFrames (1)
  • DataFrames (2)
  • Datasets

Students Ratings & Reviews

4/5 (2 ratings)

Chitroju Narayanacharyulu (rating: 4)
Learning Experience: The learning experience was good.
Faculty: They teach from the basics for good understanding. Yes, it's updated, and the practice exercises are well organized.
Course Support: No career support provided.
Reviewed on 1 Jun 2022
