DSC 208R: Data Management for Analytics

Course Information

Course Type
Core
Course Description

Prerequisite: DSC 207R.

Principles, techniques, and tools for organizing, storing, querying, transforming, and using data for analytics and machine learning computations at scale, including the basics of data storage, acquisition, governance, and organization; principles of the relational data model; relational algebra and its relationship to DataFrames; the Structured Query Language (SQL); relational database system features for faster querying and analytics; and the basics of non-relational data systems. Coverage of major data quality issues and methodologies for cleaning data. An introduction to cluster and cloud computing, MapReduce and Spark, and the use of these tools and SQL to transform data at scale for ML feature engineering. Methodologies to critically evaluate analytics results, including debugging and reasoning about bias and fairness in the data science pipeline.