CPSC 436C Cloud Computing for Data Science

Fall 2024

COURSE DESCRIPTION

This course is an introduction to cloud computing for students who wish to use the cloud for data science applications. It covers how cloud computing can support data science workflows, including data storage, processing, analysis, and visualization, as well as security considerations for the entire pipeline. Overall, the course gives students the skills and knowledge needed to design, implement, test, and deploy data science applications in the cloud.


LECTURES & CLASSROOMS

Lecture time: Tuesday/Thursday, 11AM - 12:30PM

Classroom: CEME-Floor 1-Room 1202

Office hours: Tuesday, 1PM - 3PM


TEXTBOOKS

The course will rely mainly on the following textbook.

  • Learning Spark: Lightning-Fast Data Analytics, by Jules Damji, Brooke Wenig, and Tathagata Das


TOPICS

  • Cloud service delivery models
  • Cloud storage systems
  • Batch processing
  • Stream processing
  • Cloud security



TA TEAM

Mahmoud Alkhatib, Email: makhatib@students.cs.ubc.ca 

Arman Moztarzade, Email: arman88@student.ubc.ca

Sopuruchi Chisom, Email: schisom@student.ubc.ca 

Chen Zhao, Email: laozc@student.ubc.ca


SYLLABUS

Download the syllabus (v1.0)

 

HANDOUT

Lecture 1

Introduction to Data Centres and Cloud [SLIDES]

Lecture 2

Function as a Service & Containerization [SLIDES1] , [SLIDES2]

Lecture 3

Virtualization [SLIDES]

Lecture 4

Big Data [SLIDES]

Lecture 5

Data Stores [SLIDES1] , [SLIDES2] , [SLIDES3]

Lecture 6

Data Management Systems [SLIDES]

Lecture 7

Data Processing [SLIDES1] , [SLIDES2]

Lecture 8

Structured Data Processing [SLIDES]

Machine Learning [SLIDES]

Lecture 9

Stream Processing [SLIDES1] , [SLIDES2]

Lecture 10

Graph Processing [SLIDES]

Lecture 11

Resource Management [SLIDES]

Lecture 12

Cloud Security [SLIDES1] , [SLIDES2]

Lecture 13

Guest speaker for Cloud Security

Lecture 14

Guest speaker for Advanced Topics

Lecture 15

Course wrap up

Assignments

 

  • Assignment 0: Go Serverless (5%) [AWS] [Azure] [Rubric]
  • Assignment 1: Containerization vs. Serverless (5%) [AWS] [Azure] [Rubric]
  • Assignment 2: Running Image Recognition on a Virtual Machine (5%) [AWS] [Azure] [Rubric]
  • Assignment 3: Running Image Recognition in a VM Using an Object Store (5%) [AWS] [Azure] [Rubric]
  • Assignment 4: Building a Machine Learning Pipeline with Jupyter Notebook (5%) [AWS] [Azure] [Rubric]

 

Lab Tutorials

 

  • Tutorial 1: Setting up a Cloud Account [AWS] [Azure] 
  • Tutorial 2: Cloud Computing Services and Setup [AWS] [Azure] 
  • Tutorial 3: Cloud Storage Services and Setup [AWS] [Azure] 
  • Tutorial 4: Setting up Spark and Spark Streaming on a local machine [AWS] [Azure]
  • Tutorial 5: Comparing Single Node vs Cluster Performance in Image Classification [AWS] [Azure]

 

Project

In teams of up to five, students will design and implement a cloud-based data processing pipeline that uses one storage system, one computing engine, and one database to store or visualize the results. Each team chooses its own topic, but the project must cover the entire pipeline, so that students apply the practical skills learned in the course to a real-world problem. Evaluation will focus on the quality of the architecture, cost-efficiency, performance, and security. The project culminates in a final presentation at the end of the semester. [Project Description Document]

Resources