University of Washington - Data Manipulation at Scale: Systems and Algorithms

Offered by Coursera
Public/Government Institute

Overview

Duration	20 hours
Total fee	Free
Mode of learning	Online
Credential	Certificate

Table of contents

Overview
Highlights
Course Details
Curriculum

Highlights

Shareable Certificate: Earn a certificate upon completion.
100% online: Start instantly and learn on your own schedule.
Course 1 of 4 in the Data Science at Scale Specialization
Flexible deadlines: Reset deadlines in accordance with your schedule.
Approx. 20 hours to complete
English (subtitles: Arabic, French, Portuguese (European), Italian, Vietnamese, German, Russian, English, Spanish)

Course details

Skills you will learn

Python, HBase, Cloud Computing, Data Analysis, Statistics, Big Data, Spark, Data Modeling, NoSQL, Hadoop, Data Science, CouchDB

More about this course

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making; we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but also the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.
In this course, you will learn the landscape of relevant systems, the principles on which they rely, their tradeoffs, and how to evaluate their utility against your requirements. You will learn how practical systems were derived from the frontier of research in computer science and what systems are coming on the horizon. Cloud computing, SQL and NoSQL databases, MapReduce and the ecosystem it spawned, Spark and its contemporaries, and specialized systems for graphs and arrays will be covered.
You will also learn the history and context of data science, the skills, challenges, and methodologies the term implies, and how to structure a data science project.
Learning Goals: At the end of this course, you will be able to:
1. Describe common patterns, challenges, and approaches associated with data science projects, and what makes them different from projects in related fields.
2. Identify and use the programming models associated with scalable data manipulation, including relational algebra, mapreduce, and other data flow models.
3. Use database technology adapted for large-scale analytics, including the concepts driving parallel databases, parallel query processing, and in-database analytics.
4. Evaluate key-value stores and NoSQL systems, describe their tradeoffs with comparable systems, the details of important examples in the space, and future trends.
5. "Think" in MapReduce to effectively write algorithms for systems including Hadoop and Spark. You will understand their limitations, design details, their relationship to databases, and their associated ecosystem of algorithms, extensions, and languages. Write programs in Spark.
6. Describe the landscape of specialized Big Data systems for graphs, arrays, and streams.
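
To give a sense of what "thinking in MapReduce" in the learning goals means, here is a minimal single-machine sketch of a word-length histogram, a topic named in the curriculum. It is not taken from the course materials: the function names (map_word_length, reduce_count, run_mapreduce) and the sample documents are assumptions made for illustration, and the in-memory grouping step only stands in for the shuffle that a system such as Hadoop or Spark would perform across machines.

```python
from collections import defaultdict


def map_word_length(doc_id, text):
    """Map: emit (word_length, 1) for every word in one document."""
    for word in text.split():
        yield len(word), 1


def reduce_count(length, ones):
    """Reduce: sum the counts collected for one word length."""
    yield length, sum(ones)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # stands in for the shuffle
    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, groups[out_key]))
    return results


# Made-up input: (document id, document text) pairs.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "jumps over the lazy dog"),
]

histogram = dict(run_mapreduce(documents, map_word_length, reduce_count))
print(histogram)  # {3: 4, 4: 2, 5: 3}
```

The mapper and reducer never share state, which is what lets a real framework run many copies of each in parallel; only the grouping step needs to see all the keys.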

Curriculum

Data Science Context and Concepts

Appetite Whetting: Politics

Appetite Whetting: Extreme Weather

Appetite Whetting: Digital Humanities

Appetite Whetting: Bibliometrics

Appetite Whetting: Food, Music, Public Health

Appetite Whetting: Public Health cont'd, Earthquakes, Legal

Characterizing Data Science

Characterizing Data Science, cont'd

Distinguishing Data Science from Related Topics

Four Dimensions of Data Science

Tools vs. Abstractions

Desktop Scale vs. Cloud Scale

Hackers vs. Analysts

Structs vs. Stats

Structs vs. Stats cont'd

A Fourth Paradigm of Science

Data-Intensive Science Examples

Big Data and the 3 Vs

Big Data Definitions

Big Data Sources

Course Logistics

Twitter Assignment: Getting Started

Supplementary: Three-Course Reading List

Supplementary: Resources for Learning Python

Supplementary: Class Virtual Machine

Supplementary: Github Instructions

Relational Databases and the Relational Algebra

Data Models, Terminology

From Data Models to Databases

Pre-Relational Databases

Motivating Relational Databases

Relational Databases: Key Ideas

Algebraic Optimization Overview

Relational Algebra Overview

Relational Algebra Operators: Union, Difference, Selection

Relational Algebra Operators: Projection, Cross Product

Relational Algebra Operators: Cross Product cont'd, Join

Relational Algebra Operators: Outer Join

Relational Algebra Operators: Theta-Join

From SQL to RA

Thinking in RA: Logical Query Plans

Practical SQL: Binning Timeseries

Practical SQL: Genomic Intervals

User-Defined Functions

Support for User-Defined Functions

Optimization: Physical Query Plans

Optimization: Choosing Physical Plans

Declarative Languages

Declarative Languages: More Examples

Views: Logical Data Independence

Indexes

MapReduce and Parallel Dataflow Programming

What Does Scalable Mean?

A Sketch of Algorithmic Complexity

A Sketch of Data-Parallel Algorithms

"Pleasingly Parallel" Algorithms

More General Distributed Algorithms

MapReduce Abstraction

MapReduce Data Model

Map and Reduce Functions

MapReduce Simple Example

MapReduce Simple Example cont'd

MapReduce Example: Word Length Histogram

MapReduce Examples: Inverted Index, Join

Relational Join: Map Phase

Relational Join: Reduce Phase

Simple Social Network Analysis: Counting Friends

Matrix Multiply Overview

Matrix Multiply Illustrated

Shared Nothing Computing

MapReduce Implementation

MapReduce Phases

A Design Space for Large-Scale Data Systems

Parallel and Distributed Query Processing

Teradata Example, MR Extensions

RDBMS vs. MapReduce: Features

RDBMS vs. Hadoop: Grep

RDBMS vs. Hadoop: Select, Aggregate, Join

NoSQL: Systems and Concepts

NoSQL Context and Roadmap

NoSQL Roundup

Relaxing Consistency Guarantees

Two-Phase Commit and Consensus Protocols

Eventual Consistency

CAP Theorem

Types of NoSQL Systems

ACID, Major Impact Systems

Memcached: Consistent Hashing

Consistent Hashing, cont'd

DynamoDB: Vector Clocks

Vector Clocks, cont'd

CouchDB Overview

CouchDB Views

BigTable Overview

BigTable Implementation

HBase, Megastore

Spanner

Spanner cont'd, Google Systems

MapReduce-based Systems

Bringing Back Joins

NoSQL Rebuttal

Almost SQL: Pig

Pig Architecture and Performance

Data Model

Load, Filter, Group

Group, Distinct, Foreach, Flatten

CoGroup, Join

Join Algorithms

Skew

Other Commands

Evaluation Walkthrough

Review

Context

Spark Examples

RDDs, Benefits

Graph Overview

Structural Analysis

Degree Histograms, Structure of the Web

Connectivity and Centrality

PageRank

PageRank in More Detail

Traversal Tasks: Spanning Trees and Circuits

Traversal Tasks: Maximum Flow

Pattern Matching

Querying Edge Tables

Relational Algebra and Datalog for Graphs

Querying Hybrid Graph/Relational Data

Graph Query Example: NSA

Graph Query Example: Recursion

Evaluation of Recursive Programs

Recursive Queries in MapReduce

The End-Game Problem

Representation: Edge Table, Adjacency List

Representation: Adjacency Matrix

PageRank in MapReduce

PageRank in Pregel
