CS8803-DSL: Domain Specific Languages for High Performance Computing
CRN-93621
This course will empower students to design and implement Domain-Specific Languages (DSLs) to solve problems in High-Performance Computing (HPC) contexts. Students will learn the basics of compiler construction and language design as they relate to the challenges of high-performance computing and performance engineering. The course will cover topics such as architectural modeling, optimization techniques, and program analysis in the context of contemporary DSLs and research in the field. My hope is that every student leaves the course with the skills to build compilers that solve programming problems in their own HPC research areas. The course will be project-focused, with a few assignments and a final project to design and implement a DSL for an HPC problem.
Learning Objectives
By the end of this course, students will be able to:
- Measure and analyze the performance of compilers and HPC programs.
- Design and implement languages to capture concepts in data/scientific domains.
- Embed/integrate DSLs within existing programming languages and frameworks.
- Adapt languages to model and target HPC architectures.
- Apply optimization techniques to improve the performance of generated programs.
- Communicate technical concepts effectively in academic and professional contexts.
We will accomplish these objectives through a combination of homework problems, discussions, group peer-review activities, and a final project.
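To give a concrete flavor of the second and third objectives, here is a minimal sketch of an embedded DSL of the kind we will build early in the course. It is a hypothetical illustration, not taken from any particular framework: operator overloading in the host language (Python 3.10+ here, for the `match` statement) captures expressions lazily as an abstract syntax tree (AST), and a separate stage interprets the tree.

```python
# A minimal embedded-DSL sketch (all names hypothetical): host-language
# operator overloading builds an AST lazily; a later stage interprets it.
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    def __add__(self, other): return Add(self, lift(other))
    def __radd__(self, other): return Add(lift(other), self)
    def __mul__(self, other): return Mul(self, lift(other))
    def __rmul__(self, other): return Mul(lift(other), self)

@dataclass(frozen=True)
class Lit(Expr):
    value: float

@dataclass(frozen=True)
class Var(Expr):
    name: str

@dataclass(frozen=True)
class Add(Expr):
    lhs: Expr
    rhs: Expr

@dataclass(frozen=True)
class Mul(Expr):
    lhs: Expr
    rhs: Expr

def lift(x):
    # Wrap host-language numbers so they can appear in DSL expressions.
    return x if isinstance(x, Expr) else Lit(float(x))

def evaluate(e, env):
    # A straightforward recursive interpreter over the AST.
    match e:
        case Lit(v): return v
        case Var(n): return env[n]
        case Add(l, r): return evaluate(l, env) + evaluate(r, env)
        case Mul(l, r): return evaluate(l, env) * evaluate(r, env)

x, y = Var("x"), Var("y")
program = x * x + 2 * x * y + y * y    # builds an AST; nothing runs yet
print(evaluate(program, {"x": 3.0, "y": 4.0}))  # 49.0
```

The point of the indirection is that the same AST could instead be optimized, compiled to C, or offloaded to an accelerator; separating what a program means from how it executes is a recurring theme of the course.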
Policies
See the Course Policies for details on grading, assignments, and other administrative matters.
Note: This course website is currently under construction. Please check back later for more information.
Course Information
- Course Code: CS8803-DSL
- CRN: 93621
- Credits: 3
- Instructor: Willow Ahrens (ahrens@gatech.edu)
- Office Hours: TBA
- Meeting Room: Arch 107
- Meeting Time: Tuesday & Thursday, 12:30–1:45 PM
Course Materials
- Textbook: Compilers: Principles, Techniques, and Tools (also known as the Dragon Book). I highly recommend purchasing a physical copy, as it is a classic reference text in the field. Please let me know if you have trouble accessing the book or if cost is a concern.
- Additional Readings: Additional readings will be recommended throughout the course. You will need to authenticate to the Georgia Tech Library Proxy to access the official versions of these readings. For convenience, try adding the papers you read to a citation manager, such as Zotero or Mendeley.
Schedule
Date | Topics | Readings | Notes |
---|---|---|---|
August 19 | Welcome • Why DSLs? • Course Policies • Course Overview | Readings: • Programming Pearls: Little Languages • A New Golden Age for Computer Architecture • How to Read a Paper | Slides (from Stanford CS343D) |
August 21 | Getting Started with DSLs • What is a DSL? • Abstract Syntax Trees (ASTs) • Lexical Analysis and Parsing • Embedded DSLs (Lazy Evaluation, Functions, and Macros) | Readings: • Compilers: Principles, Techniques, and Tools, Sections 4.1–4.3 • finch-tensor-lite embedded parser Discussion: • Terra: A Multi-Stage Language for High-Performance Computing | Slides (from Stanford CS343D) |
August 26 | Measurement • What is Fast? Benchmarking and Performance Metrics • Amdahl’s Law • Roofline Model (see the sketch after the schedule) • Benchmarking Compilers | Readings: • Roofline: an insightful visual performance model for multicore architectures • Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results Discussion: • Producing wrong data without doing anything obviously wrong! • PandasBench: A Benchmark for the Pandas API | Slides (from UC Berkeley CS267) |
August 28 | Dense Array Programming / Loop Optimization • What is High-Performance Computing? An Overview • Dense Array Programming • Classical Optimizations: Loop Fusion, Loop Unrolling, Vectorization • Mechanism vs. Policy • Halide | Readings: • Compilers: Principles, Techniques, and Tools, Sections 10.4, 11.1–11.3, 11.7.8, and 11.11 Discussion: • Halide: decoupling algorithms from schedules for high-performance image processing • Synthesis of High-Performance Parallel Programs for a Class of Ab Initio Quantum Chemistry Models | Slides (from Stanford CS343D, Cornell CS 6120) |
TBA | Dataflow Analysis and Optimization • Program Analysis • Dataflow Analysis • Static vs. Dynamic Analysis • Interval Analysis • Heap Modeling | Readings: • Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9 • Lecture 4, Dataflow Analysis, Cornell CS 6120 Discussion: • Representing Data Collections in an SSA Form | Slides (from Cornell CS 6120) |
TBA | Rewriting and Transformation • Rewriting Systems • E-graphs | Readings: • Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies • Software Design for Flexibility, Chapter 4 • SymbolicUtils.jl Discussion: • Spiral: Extreme Performance Portability | |
TBA | E-graphs • What is an E-graph? • E-graph Representation • Saturation, Search | Readings: • egg: Fast and extensible equality saturation Discussion: • Guided Equality Saturation • Caviar: an e-graph based TRS for automatic code optimization | Slides TBD |
TBA | Autotuning • What is Autotuning? • Autotuning Frameworks • Search Space Exploration | Readings: • Automatically tuned linear algebra software • OpenTuner: An Extensible Framework for Program Autotuning • ATF: A Generic Auto-Tuning Framework • Autotuning in High-Performance Computing Applications | |
TBA | Sparse Array Programming • Domain: Sparse Arrays, Graphs, Meshes, Databases • Representation (Sparse Matrix Formats, Columnar storage, Fibertree) | Readings: • Automatic Performance Tuning of Sparse Matrix Kernels • A relational model of data for large shared data banks Discussion: • The tensor algebra compiler • Format abstraction for sparse tensor algebra compilers | Slides (from Stanford CS343D) |
TBA | Sparse Array Programming Revisited • Coiteration (Merge Strategies, Looplets) • Loops and Iteration • Three Major Algorithms for Matrix Multiplication | Readings: • Gamma: Leveraging Gustavson’s Algorithm to Accelerate Sparse Matrix Multiplication • Looplets: A Language for Structured Coiteration • Functional Collection Programming with Semi-ring Dictionaries | |
TBA | Wild and Crazy Ideas Session • Bring a 5-minute slideshow on your crazy idea that just might work • Peer Feedback • Thinking about Final Projects | | |
TBA | Vectorization • SSA Form • Cleanup Strategies in Halide • Auto-Vectorization | Readings: • All you need is superword-level parallelism: systematic control-flow vectorization with SLP • Vectorization in Halide Discussion: • Vectorizing Sparse Matrix Computations with Partially-Strided Codelets | |
TBA | Multicore Parallelism • Multicore Architectures • Parallel Programming Models • Cilk | Readings: • The implementation of the Cilk-5 multithreaded language Discussion: • Heartbeat Scheduling: Provable Efficiency for Nested Parallelism • Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation | Slides (from UC Berkeley CS267, MIT 6.172) |
TBA | Distributed Memory Parallelism • Distributed Memory Architectures • Message Passing Interface (MPI) • MapReduce • UPC • Legion • Communication-avoiding Matrix Multiply | Readings: • MapReduce: Simplified Data Processing on Large Clusters • DISTAL: The Distributed Tensor Algebra Compiler • Legion: Expressing Locality and Independence with Logical Regions • Cyclops Tensor Framework | Slides (from UC Berkeley CS267) |
TBA | Accelerators and GPUs • GPU Programming Languages • Data Parallel Languages | Discussion: • Taichi: a language for high-performance computation on spatially sparse data structures • Task-Based Tensor Computations on Modern GPUs | Slides (from UC Berkeley CS267) |
TBA | Staged Programming • What is Staged Programming? • Abstract Interpretation • Implementations | Readings: • Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs Discussion: • BuildIt: A Type-Based Multi-stage Programming Framework for Code Generation in C++ | |
TBA | Dense Array Programming Revisited • Revisiting Dense Array Programming | Readings: • A Practical Automatic Polyhedral Parallelizer and Locality Optimizer • The Pochoir Stencil Compiler Discussion: • Exocompilation for Productive Programming of Hardware Accelerators | |
TBA | Autoscheduling • What is Autoscheduling? • Three Ingredients of Autoscheduling (Search Space, Search Strategy, Cost Model) • Autotuning vs. Autoscheduling • Search Space Characterization • Search Strategies | Readings: • Learning to Optimize Halide with Tree Search and Random Programs | Slides TBD |
TBA | Autoscheduling Revisited • Cost Modeling (Feature-based/ML, Sparse Cost Models/Cardinality Estimation) | Readings: • How to Architect a Query Compiler • Autoscheduling for sparse tensor algebra with an asymptotic cost model Discussion: • Galley: Modern Query Optimization for Sparse Tensor Programs | Slides TBD |
TBA | Compiling Programs Faster • Compilation Time vs. Execution Time • Prepared Queries • Interpreters vs. Compilers • Turning Off Compilation | Readings: • Firefox’s Baseline WebAssembly Compiler Discussion: • Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode | Slides (from Stanford CS343D) |
TBA | Guest Lecture: Fredrik Kjolstad | | |
TBA | Guest Lecture: Charith Mendis | | |
TBA | Guest Lecture: Jacob Laurel | | |
TBA | Final Project Session • Bring a 5-minute slideshow on your final project • Peer Feedback | | |
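As a taste of the measurement material above, here is a minimal sketch of the roofline bound referenced in the August 26 session: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity. The machine numbers below are hypothetical.

```python
# A back-of-the-envelope roofline bound (hypothetical machine numbers).
def roofline(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s = min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

peak, bw = 1000.0, 100.0  # hypothetical: 1 TFLOP/s peak, 100 GB/s DRAM

# Double-precision dot product: 2 flops (multiply + add) per 16 bytes
# loaded (two 8-byte operands), so arithmetic intensity is 0.125 flop/byte.
print(roofline(peak, bw, 2 / 16))  # 12.5 GFLOP/s -> memory-bound
print(roofline(peak, bw, 64.0))    # 1000.0 GFLOP/s -> compute-bound
```

A kernel whose arithmetic intensity falls below the ridge point (peak divided by bandwidth, 10 flop/byte in this sketch) is memory-bound, so data movement, not compute, is what the compiler must optimize.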
Inspired by
- Stanford CS343D: Domain-Specific Programming Models and Compilers, Fredrik Kjolstad.
- MIT 6.172: Performance Engineering of Software Systems.
- Cornell CS 6120: Advanced Compilers.
- UC Berkeley CS267: Applications of Parallel Computers, Aydin Buluc, Kathy Yelick, James Demmel.
- UC Berkeley CS294: Building User-Centered Programming Tools.
- MIT 6.S894: Accelerated Computing.