| August 19 | Welcome • Why DSLs? • Why HPC? • Course Policies • Course Overview | Readings: • Programming Pearls: Little Languages • A New Golden Age for Computer Architecture • How to Read a Paper Discussion: • A Performance-Optimizing Compiler for Cyber-Physical Digital Microfluidic Biochips • Compiling Functions onto Digital Microfluidics |
| August 21 | Getting Started with DSLs • What is a DSL? • Abstract Syntax Trees (ASTs) • Lexical Analysis and Parsing • Embedded DSLs (Lazy Evaluation, Functions, and Macros) | Readings: • Compilers: Principles, Techniques, and Tools, Chapter 4.1-4.3 • finch-tensor-lite embedded parser • Building domain-specific embedded languages Discussion: • Terra: A Multi-Stage Language for High-Performance Computing |
| August 26 | Measurement • What is Fast? • How to measure • Benchmarking Compilers | Readings: • Sigplan Guidelines for Empirical Measurement • Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results Discussion: • Producing wrong data without doing anything obviously wrong! • PandasBench: A Benchmark for the Pandas API |
| August 28 | Performance Models and Engineering • Performance Engineering: an Overview • Models: Amdahl's Law, Uniprocessor Model, Roofline | Readings: • Roofline: an insightful visual performance model for multicore architectures Discussion: • Synthesis of High-Performance Parallel Programs for a Class of Ab Initio Quantum Chemistry Models |
| September 2 | Collection-Oriented Languages • How do we represent collections of data? • Relational, set, array models | Readings: • Collection-Oriented Languages Discussion: • An introduction to the set theoretical language SETL • Notation as a tool of thought |
| September 4 | Dense Array Programming • Classical Optimizations: Loop Fusion, Loop Unrolling, Vectorization • Mechanism vs. Policy • Halide | Readings: • Compilers: Principles, Techniques, and Tools, Chapter 10.4, 11.1, 11.2, 11.3, 11.7.8, 11.11 Discussion: • Halide: decoupling algorithms from schedules for high-performance image processing |
| September 9 | Dataflow Analysis and Optimization • Program Analysis • Dataflow Analysis • Static vs. Dynamic Analysis • Interval Analysis • Heap Modeling | Readings: • Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9 • Lecture 4, Dataflow Analysis, Cornell CS 6120 Discussion: • Representing Data Collections in an SSA Form |
| September 11 | Rewriting and Transformation • Rewriting Systems • E-graphs | Readings: • Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies • Software Design for Flexibility, Chapter 4 • SymbolicUtils.jl Discussion: • Spiral: Extreme Performance Portability |
| September 16 | E-graphs • What is an E-graph? • E-graph Representation • Saturation, Search | Readings: • egg: Fast and extensible equality saturation • egg talk Discussion: • Guided Equality Saturation • Caviar: an e-graph based TRS for automatic code optimization |
| September 18 | Sparse Array Programming • Domain: Sparse Arrays, Graphs, Meshes, Databases • Representation (Sparse Matrix Formats, Columnar storage, Fibertree) | Readings: • Automatic Performance Tuning of Sparse Matrix Kernels • A relational model of data for large shared data banks Discussion: • The tensor algebra compiler • Format abstraction for sparse tensor algebra compilers |
| September 23 | Sparse Array Programming Revisited • Coiteration (Merge Strategies, Looplets) • Loops and Iteration • Three Major Algorithms for Matrix Multiplication | Readings: • Gamma: Leveraging Gustavson’s Algorithm to Accelerate Sparse Matrix Multiplication Discussion: • Looplets: A Language for Structured Coiteration • Functional Collection Programming with Semi-ring Dictionaries |
| September 25 | Autoscheduling • What is Autoscheduling? • Three Ingredients of Autoscheduling (Search Space, Search Strategy, Cost Model) • Autotuning vs. Autoscheduling • Search Space Characterization • Search Strategies | Readings: • Automatically tuned linear algebra software • OpenTuner: An Extensible Framework for Program Autotuning • Autotuning in High-Performance Computing Applications Discussion: • ATF: A Generic Auto-Tuning Framework • BaCO: A Fast and Portable Bayesian Compiler Optimization Framework |
| September 30 | Final Project Proposals Session • Bring a 5-min Slideshow on your crazy idea that just might work • Peer Feedback • Thinking about Final Projects | TBD |
| October 2 | Autoscheduling Revisited • Cost Modeling (Feature-based/ML, Sparse Cost Models/Cardinality Estimation) | Readings: • How to Architect a Query Compiler • Autoscheduling for sparse tensor algebra with an asymptotic cost model Discussion: • Galley: Modern Query Optimization for Sparse Tensor Programs • Learning to Optimize Halide with Tree Search and Random Programs |
| October 9 | Vectorization • Auto-Vectorization • Guest Lecture from Tom Chen on “All you need is SLP” | Readings: • All you need is superword-level parallelism: systematic control-flow vectorization with SLP • Vectorization in Halide Discussion: • Vectorizing Sparse Matrix Computations with Partially-Strided Codelets |
| October 14 | Final Project Work Session • Use this time to get a head start on your final project | TBD |
| October 16 | Final Project Work Session • Use this time to get a head start on your final project | TBD |
| October 21 | Multicore Parallelism • Multicore Architectures • Parallel Programming Models • Cilk | Readings: • The implementation of the Cilk-5 multithreaded language • Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation Discussion: • Heartbeat Scheduling: Provable Efficiency for Nested Parallelism |
| October 23 | Distributed Memory Parallelism • Distributed Memory Architectures • Message Passing Interface (MPI) • MapReduce • UPC • Legion • Communication-avoiding Matrix Multiply | Readings: • MapReduce: Simplified Data Processing on Large Clusters • Legion: Expressing Locality and Independence with Logical Regions • Cyclops Tensor Framework Discussion: • DISTAL: The Distributed Tensor Algebra Compiler |
| October 28 | Accelerators and GPUs • GPU Programming Languages • Data Parallel Languages | Readings: • Task-Based Tensor Computations on Modern GPUs Discussion: • Taichi: a language for high-performance computation on spatially sparse data structures |
| October 30 | Guest Lecture: Michel Steuwer • Talk Title: “How to design the next 700 optimizing compilers” • Website | Readings: • Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies |
| November 4 | Staged Programming • What is Staged Programming? • Abstract Interpretation • Implementations | Readings: • Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs Discussion: • BuildIt: A Type-Based Multi-stage Programming Framework for Code Generation in C++ |
| November 6 | Dense Array Programming Revisited • Revisiting Dense Array Programming | Readings: • A Practical Automatic Polyhedral Parallelizer and Locality Optimizer • The Pochoir Stencil Compiler Discussion: • Exocompilation for Productive Programming of Hardware Accelerators |
| November 11 | Final Project Feedback • Set up a time to meet with me to discuss final project progress | TBD |
| November 13 | Final Project Feedback • Set up a time to meet with me to discuss final project progress | TBD |
| November 18 | Guest Lecture: Fredrik Kjolstad • Talk Title: “Portable Compilation in an Accelerated World” • Website | Readings: • Portal: Portable Accelerated Learning • DISTAL: The Distributed Tensor Algebra Compiler |
| November 20 | Guest Lecture: Jacob Laurel • Talk Title: “Differentiable Programming” • Website | Readings: • Synthesizing Sound and Precise Abstract Transformers for Nonlinear Hyperbolic PDE Solvers |
| November 25 | Final Project Session • Presentations on final projects • Peer feedback • You did it! | TBD |
| December 2 | Final Project Session • Presentations on final projects • Peer feedback • You did it! | TBD |