DSC 232R: Big Data Analytics Using Spark
Course Information
Course Type
Core
Course Description
(Prerequisite: DSC 255R)
Techniques for achieving scalability in data analysis using tools such as MapReduce, Hadoop, and Spark. Topics include minimizing bottlenecks in massively parallel computations with the Spark framework; performing supervised and unsupervised machine learning on massive datasets with the Machine Learning Library (MLlib); programming Spark using PySpark; identifying the computational tradeoffs in a Spark application; performing data loading and cleaning with Spark and Parquet; and modeling data with statistical and machine learning methods.