August 19 | Welcome • Why DSLs? • Why HPC? • Course Policies • Course Overview | Readings: • Programming Pearls: Little Languages • A New Golden Age for Computer Architecture • How to Read a Paper Discussion: • A Performance-Optimizing Compiler for Cyber-Physical Digital Microfluidic Biochips • Compiling Functions onto Digital Microfluidics |
August 21 | Getting Started with DSLs • What is a DSL? • Abstract Syntax Trees (ASTs) • Lexical Analysis and Parsing • Embedded DSLs (Lazy Evaluation, Functions, and Macros) | Readings: • Compilers: Principles, Techniques, and Tools, Chapter 4.1-4.3 • finch-tensor-lite embedded parser • Building domain-specific embedded languages Discussion: • Terra: A Multi-Stage Language for High-Performance Computing |
August 26 | Measurement • What is Fast? • How to measure • Benchmarking Compilers | Readings: • Sigplan Guidelines for Empirical Measurement • Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results Discussion: • Producing wrong data without doing anything obviously wrong! • PandasBench: A Benchmark for the Pandas API |
August 28 | Performance Models and Engineering • Performance Engineering: an overview • Models: Amdahl's Law, Uniprocessor Model, Roofline | Readings: • Roofline: an insightful visual performance model for multicore architectures Discussion: • Synthesis of High-Performance Parallel Programs for a Class of Ab Initio Quantum Chemistry Models |
September 2 | Collection-Oriented Languages • How do we represent collections of data? • Relational, set, array models | Readings: • Collection-Oriented Languages Discussion: • An introduction to the set theoretical language SETL • Notation as a tool of thought |
September 4 | Dense Array Programming • Classical Optimizations: Loop Fusion, Loop Unrolling, Vectorization • Mechanism vs. Policy • Halide | Readings: • Compilers: Principles, Techniques, and Tools, Chapters 10.4, 11.1, 11.2, 11.3, 11.7.8, 11.11 Discussion: • Halide: decoupling algorithms from schedules for high-performance image processing |
September 9 | Dataflow Analysis and Optimization • Program Analysis • Dataflow Analysis • Static vs. Dynamic Analysis • Interval Analysis • Heap Modeling | Readings: • Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9 • Lecture 4, Dataflow Analysis, Cornell CS 6120 Discussion: • Representing Data Collections in an SSA Form |
September 11 | Rewriting and Transformation • Rewriting Systems • E-graphs | Readings: • Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies • Software Design for Flexibility, Chapter 4 • SymbolicUtils.jl Discussion: • Spiral: Extreme Performance Portability |
September 16 | E-graphs • What is an E-graph? • E-graph Representation • Saturation, Search | Readings: • egg: Fast and extensible equality saturation Discussion: • Guided Equality Saturation • Caviar: an e-graph based TRS for automatic code optimization |
September 18 | Sparse Array Programming • Domain: Sparse Arrays, Graphs, Meshes, Databases • Representation (Sparse Matrix Formats, Columnar storage, Fibertree) | Readings: • Automatic Performance Tuning of Sparse Matrix Kernels • A relational model of data for large shared data banks Discussion: • The tensor algebra compiler • Format abstraction for sparse tensor algebra compilers |
September 23 | Sparse Array Programming Revisited • Coiteration (Merge Strategies, Looplets) • Loops and Iteration • Three Major Algorithms for Matrix Multiplication | Readings: • Gamma: Leveraging Gustavson’s Algorithm to Accelerate Sparse Matrix Multiplication Discussion: • Looplets: A Language for Structured Coiteration • Functional Collection Programming with Semi-ring Dictionaries |
September 25 | Autoscheduling • What is Autoscheduling? • Three Ingredients of Autoscheduling (Search Space, Search Strategy, Cost Model) • Autotuning vs. Autoscheduling • Search Space Characterization • Search Strategies | Readings: • Automatically tuned linear algebra software • OpenTuner: An Extensible Framework for Program Autotuning • Autotuning in High-Performance Computing Applications Discussion: • ATF: A Generic Auto-Tuning Framework • BaCO: A Fast and Portable Bayesian Compiler Optimization Framework |
September 30 | Final Project Proposals Session • Bring a 5-min Slideshow on your crazy idea that just might work • Peer Feedback • Thinking about Final Projects | TBD |
October 2 | Autoscheduling Revisited • Cost Modeling (Feature-based/ML, Sparse Cost Models/Cardinality Estimation) | Readings: • How to Architect a Query Compiler • Autoscheduling for sparse tensor algebra with an asymptotic cost model Discussion: • Galley: Modern Query Optimization for Sparse Tensor Programs • Learning to Optimize Halide with Tree Search and Random Programs |
October 9 | Vectorization • SSA Form • Cleanup Strategies in Halide • Auto-Vectorization | Readings: • All you need is superword-level parallelism: systematic control-flow vectorization with SLP • Vectorization in Halide Discussion: • Vectorizing Sparse Matrix Computations with Partially-Strided Codelets |
October 14 | Multicore Parallelism • Multicore Architectures • Parallel Programming Models • Cilk | Readings: • The implementation of the Cilk-5 multithreaded language Discussion: • Heartbeat Scheduling: Provable Efficiency for Nested Parallelism • Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation |
October 16 | Final Project Work Session • Use this time to get a head start on your final project | TBD |
October 21 | Distributed Memory Parallelism • Distributed Memory Architectures • Message Passing Interface (MPI) • MapReduce • UPC • Legion • Communication-avoiding Matrix Multiply | Readings: • MapReduce: Simplified Data Processing on Large Clusters • Legion: Expressing Locality and Independence with Logical Regions • Cyclops Tensor Framework Discussion: • DISTAL: The Distributed Tensor Algebra Compiler |
October 23 | Accelerators and GPUs • GPU Programming Languages • Data Parallel Languages | Discussion: • Taichi: a language for high-performance computation on spatially sparse data structures • Task-Based Tensor Computations on Modern GPUs |
October 28 | Staged Programming • What is Staged Programming? • Abstract Interpretation • Implementations | Readings: • Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs Discussion: • BuildIt: A Type-Based Multi-stage Programming Framework for Code Generation in C++ |
October 30 | Dense Array Programming Revisited • Revisiting Dense Array Programming | Readings: • A Practical Automatic Polyhedral Parallelizer and Locality Optimizer • The Pochoir Stencil Compiler Discussion: • Exocompilation for Productive Programming of Hardware Accelerators |
November 4 | Compiling Programs Faster • Compilation Time vs. Execution Time • Prepared Queries • Interpreters vs. Compilers • Turning off Compilation | Readings: • Firefox's Baseline WebAssembly Compiler Discussion: • Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode |
November 6 | Guest Lecture: Fredrik Kjolstad | TBD |
November 11 | Guest Lecture: Charith Mendis | TBD |
November 13 | Guest Lecture: Jacob Laurel | TBD |
November 18 | Guest Lecture: Felix Herman | TBD |
November 20 | Final Project Session • Presentations on final projects • Peer feedback • You did it! | TBD |
November 25 | Final Project Session • Presentations on final projects • Peer feedback • You did it! | TBD |
December 2 | Final Project Session • Presentations on final projects • Peer feedback • You did it! | TBD |