CS8803-DSL: Domain Specific Languages for High Performance Computing
CRN-93621
This course will empower students to design and implement domain-specific languages (DSLs) to solve problems in high-performance computing (HPC) contexts. Students will learn the basics of compiler construction and language design as they relate to the challenges of high-performance computing and performance engineering. The course will cover topics such as architectural modeling, optimization techniques, and program analysis in the context of contemporary DSLs and research in the field. It is my hope that any student taking the course will leave with the skills to build compilers that solve programming problems in their own HPC research areas. The course will be project-focused, with a few assignments and a final project to design and implement a DSL for an HPC problem.
Learning Objectives
By the end of this course, students will be able to:
- Measure and analyze the performance of compilers and HPC programs.
- Design and implement languages to capture concepts in data/scientific domains.
- Embed/integrate DSLs within existing programming languages and frameworks.
- Adapt languages to model and target HPC architectures.
- Apply optimization techniques to improve the performance of generated programs.
- Communicate technical concepts clearly in academic and professional contexts.
We will accomplish these objectives through a combination of homework assignments, discussions, group peer-review activities, and a final project.
Policies
See the Course Policies for more information on grading, assignments, and other course logistics.
Note: This course website is currently under construction. Please check back later for more information.
Course Information
- Course Code: CS8803-DSL
- CRN: 93621
- Credits: 3
- Instructor: Willow Ahrens (ahrens@gatech.edu)
- Office Hours: Wednesday, 2:30–3:30 PM, KACB 3144
- TA: Vickrant Sreekanth (vickrant@gatech.edu)
- TA Office Hours: Friday, 4:00–5:00 PM, KACB 2335 (CRNCH Lounge), Teams Meeting Room: Arch 107
- Meeting Time: Tuesday & Thursday, 12:30–1:45 PM
Course Materials
- Textbook: Compilers: Principles, Techniques, and Tools (also known as the Dragon Book). I highly recommend you purchase a physical copy of this book, as it is a classic reference text in the field. Two physical copies are on course reserve at the library; you can ask for them at the library INFOdesk. Please let me know if you have trouble accessing the book, or if cost is a concern.
- Additional Readings: Additional readings will be recommended throughout the course. You will need to authenticate to the Georgia Tech Library Proxy to access the official versions of these readings. For convenience, try adding the papers you read to a citation manager, such as Zotero or Mendeley.
Assignments
There will be three assignments:
- Write an essay about a problem you’ve been thinking about.
- Write a compiler for APL. We'll likely have options for whether you wish to compile to C or to LLVM.
- Write Halide schedules (a sketch of what a schedule looks like follows this list).
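For those who haven't seen Halide before, here is a minimal sketch of a pipeline and schedule, adapted from the separable 3×3 box blur that serves as the running example in the Halide paper. The tile and vector sizes are illustrative, not part of the assignment spec.

```cpp
// Minimal Halide sketch (illustrative, not the assignment spec):
// the separable 3x3 box blur from the Halide paper.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(UInt(16), 2);  // a 2-D grayscale image
    Func blur_x("blur_x"), blur_y("blur_y");
    Var x("x"), y("y"), xi("xi"), yi("yi");

    // Algorithm: *what* to compute.
    blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

    // Schedule: *how* to compute it. Tile the output, vectorize the
    // inner tile dimension, run rows of tiles in parallel, and
    // materialize blur_x per tile rather than for the whole image.
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
    blur_x.compute_at(blur_y, x).vectorize(x, 8);

    blur_y.compile_jit();  // or compile_to_file(...) for ahead-of-time use
    return 0;
}
```

Note that the two Func definitions fix the algorithm once, and all performance tuning happens in the schedule; this mechanism-vs.-policy split is exactly the topic of the September 4 lecture.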
Schedule
| Date | Topics | Readings |
|---|---|---|
| August 19 | Welcome • Why DSLs? • Why HPC? • Course Policies • Course Overview | Readings: • Programming Pearls: Little Languages • A New Golden Age for Computer Architecture • How to Read a Paper Discussion: • A Performance-Optimizing Compiler for Cyber-Physical Digital Microfluidic Biochips • Compiling Functions onto Digital Microfluidics |
| August 21 | Getting Started with DSLs • What is a DSL? • Abstract Syntax Trees (ASTs) • Lexical Analysis and Parsing • Embedded DSLs (Lazy Evaluation, Functions, and Macros) | Readings: • Compilers: Principles, Techniques, and Tools, Chapter 4.1-4.3 • finch-tensor-lite embedded parser • Building domain-specific embedded languages Discussion: • Terra: A Multi-Stage Language for High-Performance Computing |
| August 26 | Measurement • What is Fast? • How to Measure • Benchmarking Compilers | Readings: • SIGPLAN Guidelines for Empirical Measurement • Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results Discussion: • Producing wrong data without doing anything obviously wrong! • PandasBench: A Benchmark for the Pandas API |
| August 28 | Performance Models and Engineering • Performance Engineering: an overview • Models: Amdahl's Law, Uniprocessor Model, Roofline | Readings: • Roofline: an insightful visual performance model for multicore architectures Discussion: • Synthesis of High-Performance Parallel Programs for a Class of Ab Initio Quantum Chemistry Models |
| September 2 | Collection-Oriented Languages • How do we represent collections of data? • Relational, set, array models | Readings: • Collection-Oriented Languages Discussion: • An introduction to the set theoretical language SETL • Notation as a tool of thought |
| September 4 | Dense Array Programming • Classical Optimizations: Loop Fusion, Loop Unrolling, Vectorization • Mechanism vs. Policy • Halide | Readings: • Compilers: Principles, Techniques, and Tools, Chapters 10.4, 11.1, 11.2, 11.3, 11.7.8, 11.11 Discussion: • Halide: decoupling algorithms from schedules for high-performance image processing |
| September 9 | Dataflow Analysis and Optimization • Program Analysis • Dataflow Analysis • Static vs. Dynamic Analysis • Interval Analysis • Heap Modeling | Readings: • Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9 • Lecture 4, Dataflow Analysis, Cornell CS 6120 Discussion: • Representing Data Collections in an SSA Form |
| September 11 | Rewriting and Transformation • Rewriting Systems • E-graphs | Readings: • Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies • Software Design for Flexibility, Chapter 4 • SymbolicUtils.jl Discussion: • Spiral: Extreme Performance Portability |
| September 16 | E-graphs • What is an E-graph? • E-graph Representation • Saturation, Search | Readings: • egg: Fast and extensible equality saturation • egg talk Discussion: • Guided Equality Saturation • Caviar: an e-graph based TRS for automatic code optimization |
| September 18 | Sparse Array Programming • Domain: Sparse Arrays, Graphs, Meshes, Databases • Representation (Sparse Matrix Formats, Columnar storage, Fibertree) | Readings: • Automatic Performance Tuning of Sparse Matrix Kernels • A relational model of data for large shared data banks Discussion: • The tensor algebra compiler • Format abstraction for sparse tensor algebra compilers |
| September 23 | Sparse Array Programming Revisited • Coiteration (Merge Strategies, Looplets) • Loops and Iteration • Three Major Algorithms for Matrix Multiplication | Readings: • Gamma: Leveraging Gustavson’s Algorithm to Accelerate Sparse Matrix Multiplication Discussion: • Looplets: A Language for Structured Coiteration • Functional Collection Programming with Semi-ring Dictionaries |
| September 25 | Autoscheduling • What is Autoscheduling? • Three Ingredients of Autoscheduling (Search Space, Search Strategy, Cost Model) • Autotuning vs. Autoscheduling • Search Space Characterization • Search Strategies | Readings: • Automatically tuned linear algebra software • OpenTuner: An Extensible Framework for Program Autotuning • Autotuning in High-Performance Computing Applications Discussion: • ATF: A Generic Auto-Tuning Framework • BaCO: A Fast and Portable Bayesian Compiler Optimization Framework |
| September 30 | Final Project Proposals Session • Bring a 5-min Slideshow on your crazy idea that just might work • Peer Feedback • Thinking about Final Projects | TBD |
| October 2 | Autoscheduling Revisited • Cost Modeling (Feature-based/ML, Sparse Cost Models/Cardinality Estimation) | Readings: • How to Architect a Query Compiler • Autoscheduling for sparse tensor algebra with an asymptotic cost model Discussion: • Galley: Modern Query Optimization for Sparse Tensor Programs • Learning to Optimize Halide with Tree Search and Random Programs |
| October 9 | Vectorization • Auto-Vectorization • Guest Lecture from Tom Chen on “All you need is SLP” | Readings: • All you need is superword-level parallelism: systematic control-flow vectorization with SLP • Vectorization in Halide Discussion: • Vectorizing Sparse Matrix Computations with Partially-Strided Codelets |
| October 14 | Final Project Work Session • Use this time to get a head start on your final project | TBD |
| October 16 | Final Project Work Session • Use this time to get a head start on your final project | TBD |
| October 21 | Multicore Parallelism • Multicore Architectures • Parallel Programming Models • Cilk | Readings: • The implementation of the Cilk-5 multithreaded language • Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation Discussion: • Heartbeat Scheduling: Provable Efficiency for Nested Parallelism |
| October 23 | Distributed Memory Parallelism • Distributed Memory Architectures • Message Passing Interface (MPI) • MapReduce • UPC • Legion • Communication-avoiding Matrix Multiply | Readings: • MapReduce: Simplified Data Processing on Large Clusters • Legion: Expressing Locality and Independence with Logical Regions • Cyclops Tensor Framework Discussion: • DISTAL: The Distributed Tensor Algebra Compiler |
| October 28 | Accelerators and GPUs • GPU Programming Languages • Data Parallel Languages | Readings: • Task-Based Tensor Computations on Modern GPUs Discussion: • Taichi: a language for high-performance computation on spatially sparse data structures |
| October 30 | Guest Lecture: Michel Steuwer • Talk Title: “How to design the next 700 optimizing compilers” • Website | Readings: • Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies |
| November 4 | Staged Programming • What is Staged Programming? • Abstract Interpretation • Implementations | Readings: • Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs Discussion: • BuildIt: A Type-Based Multi-stage Programming Framework for Code Generation in C++ |
| November 6 | Dense Array Programming Revisited • Revisiting Dense Array Programming | Readings: • A Practical Automatic Polyhedral Parallelizer and Locality Optimizer • The Pochoir Stencil Compiler Discussion: • Exocompilation for Productive Programming of Hardware Accelerators |
| November 11 | Final Project Feedback • Set up a time to meet with me to discuss final project progress | TBD |
| November 13 | Final Project Feedback • Set up a time to meet with me to discuss final project progress | TBD |
| November 18 | Guest Lecture: Fredrik Kjolstad | TBD |
| November 20 | Guest Lecture: Jacob Laurel | TBD |
| November 25 | Final Project Session • Presentations on final projects • Peer feedback • You did it! | TBD |
| December 2 | Final Project Session • Presentations on final projects • Peer feedback • You did it! | TBD |
Inspired by
- Stanford CS343D: Domain-Specific Programming Models and Compilers, Fredrik Kjolstad.
- MIT 6.172: Performance Engineering of Software Systems
- Cornell CS 6120: Advanced Compilers
- UC Berkeley CS267: Applications of Parallel Computers, Aydin Buluc, Kathy Yelick, James Demmel.
- UC Berkeley CS294: Building User-Centered Programming Tools
- MIT 6.S894: Accelerated Computing