CS8803-DSL: Domain Specific Languages for High Performance Computing

CRN-93621

This course will empower students to design and implement Domain-Specific Languages (DSLs) to solve problems in High-Performance Computing (HPC) contexts. Students will learn the basics of compiler construction and language design as they relate to the challenges of high-performance computing and performance engineering. The course will cover topics such as architectural modeling, optimization techniques, and program analysis through the lens of contemporary DSLs and research in the field. It is my hope that any student taking the course will leave with the skills to build compilers that solve programming problems in their own HPC research areas. The course will be project-focused, with a few assignments and a final project to design and implement a DSL for an HPC problem.

Learning Objectives

By the end of this course, students will be able to:

  • Measure and analyze the performance of compilers and HPC programs.
  • Design and implement languages to capture concepts in data/scientific domains.
  • Embed/integrate DSLs within existing programming languages and frameworks.
  • Adapt languages to model and target HPC architectures.
  • Apply optimization techniques to improve the performance of generated programs.
  • Communicate technical concepts effectively to build a career in academic contexts.

We will accomplish these objectives through a combination of homework problems, discussions, group peer-review activities, and a final project.

Policies

See the Course Policies for more information on grading, assignments, and other course policies.

Note: This course website is currently under construction. Please check back later for more information.

Course Information

  • Course Code: CS8803-DSL
  • CRN: 93621
  • Credits: 3
  • Instructor: Willow Ahrens (ahrens@gatech.edu)
  • Office Hours: TBA
  • Meeting Room: Arch 107
  • Meeting Time: Tuesday & Thursday, 12:30–1:45 PM

Course Materials

  • Textbook: Compilers: Principles, Techniques, and Tools (also known as the Dragon Book). I highly recommend you purchase a physical copy, as it is a classic reference text in the field. Please let me know if you have any trouble accessing the book, or if cost is a concern.
  • Additional Readings: Additional readings will be recommended throughout the course. You will need to authenticate to the Georgia Tech Library Proxy to access the official versions of these readings. For convenience, try adding the papers you read to a citation manager, such as Zotero or Mendeley.

Schedule

August 19: Welcome
• Why DSLs?
• Course Policies
• Course Overview
Readings:
Programming Pearls: Little Languages
A New Golden Age for Computer Architecture
How to Read a Paper
Slides (from Stanford CS343D)

August 21: Getting Started with DSLs
• What is a DSL?
• Abstract Syntax Trees (ASTs)
• Lexical Analysis and Parsing
• Embedded DSLs (Lazy Evaluation, Functions, and Macros)
Readings:
Compilers: Principles, Techniques, and Tools, Sections 4.1-4.3
finch-tensor-lite embedded parser
Discussion:
Terra: A Multi-Stage Language for High-Performance Computing
Slides (from Stanford CS343D)
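
As a preview of the embedded-DSL idea above, here is a minimal Python sketch of my own (the class names are illustrative, not from finch-tensor-lite): overloading operators so that ordinary expressions build an AST lazily instead of computing values.

    import dataclasses

    @dataclasses.dataclass(frozen=True)
    class Expr:
        def __add__(self, other): return Call("+", self, lift(other))
        def __mul__(self, other): return Call("*", self, lift(other))

    @dataclasses.dataclass(frozen=True)
    class Var(Expr):
        name: str

    @dataclasses.dataclass(frozen=True)
    class Lit(Expr):
        value: float

    @dataclasses.dataclass(frozen=True)
    class Call(Expr):
        op: str
        lhs: Expr
        rhs: Expr

    def lift(x):
        # Promote plain Python numbers to DSL literals.
        return x if isinstance(x, Expr) else Lit(float(x))

    x, y = Var("x"), Var("y")
    ast = x * 2 + y   # ordinary Python syntax, but nothing is computed;
    print(ast)        # instead we get a Call/Var/Lit tree to analyze or compile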

August 26: Measurement
• What is Fast? Benchmarking and Performance Metrics
• Amdahl’s Law
• Roofline Model
• Benchmarking Compilers
Readings:
Roofline: an insightful visual performance model for multicore architectures
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results
Discussion:
Producing wrong data without doing anything obviously wrong!
PandasBench: A Benchmark for the Pandas API
Slides (from UC Berkeley CS267)
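
For a first look at two of the models above, here is a small sketch of mine in Python; the machine numbers are invented for the example.

    def amdahl_speedup(parallel_fraction, processors):
        # Amdahl's Law: the serial fraction bounds achievable speedup.
        serial = 1.0 - parallel_fraction
        return 1.0 / (serial + parallel_fraction / processors)

    def roofline(flops_per_byte, peak_gflops, peak_gbs):
        # Roofline: attainable GFLOP/s is the lesser of peak compute and
        # arithmetic intensity times peak memory bandwidth.
        return min(peak_gflops, flops_per_byte * peak_gbs)

    print(amdahl_speedup(0.95, 1024))     # ~19.6: 5% serial work caps 1024 cores
    print(roofline(0.25, 1000.0, 100.0))  # 25.0: this kernel is memory-bound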

August 28: Dense Array Programming / Loop Optimization
• What is High-Performance Computing? An Overview
• Dense Array Programming
• Classical Optimizations: Loop Fusion, Loop Unrolling, Vectorization
• Mechanism vs. Policy
• Halide
Readings:
Compilers: Principles, Techniques, and Tools, Sections 10.4, 11.1-11.3, 11.7.8, and 11.11
Discussion:
Halide: decoupling algorithms from schedules for high-performance image processing
Synthesis of High-Performance Parallel Programs for a Class of Ab Initio Quantum Chemistry Models
Slides (from Stanford CS343D, Cornell CS 6120)
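
A tiny illustration of loop fusion from the list above (mine, in plain Python): the unfused version makes two passes and materializes a temporary; the fused version makes one pass and none. Halide's key move is to treat such choices as a schedule, separate from the algorithm.

    def unfused(a, b):
        t = [ai * 2.0 for ai in a]                 # pass 1: temporary of size n
        return [ti + bi for ti, bi in zip(t, b)]   # pass 2: reads it back

    def fused(a, b):
        return [ai * 2.0 + bi for ai, bi in zip(a, b)]  # one pass, no temporary

    print(unfused([1.0, 2.0], [3.0, 4.0]) == fused([1.0, 2.0], [3.0, 4.0]))  # True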

TBA: Dataflow Analysis and Optimization
• Program Analysis
• Dataflow Analysis
• Static vs. Dynamic Analysis
• Interval Analysis
• Heap Modeling
Readings:
Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9
Lecture 4, Dataflow Analysis, Cornell CS 6120
Discussion:
Representing Data Collections in an SSA Form
Slides (from Cornell CS 6120)
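
To make the worklist idea concrete before the reading, here is a minimal liveness analysis, a standard backward dataflow problem; the three-block CFG below is invented for illustration.

    def liveness(succ, use, defs):
        # live_in[b] = use[b] | (live_out[b] - defs[b]), where live_out[b]
        # unions live_in over b's successors; iterate to a fixed point.
        live_in = {b: set() for b in succ}
        worklist = list(succ)
        while worklist:
            b = worklist.pop()
            live_out = set().union(*(live_in[s] for s in succ[b]))
            new_in = use[b] | (live_out - defs[b])
            if new_in != live_in[b]:
                live_in[b] = new_in
                worklist.extend(p for p in succ if b in succ[p])  # predecessors
        return live_in

    succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
    use  = {"entry": set(), "loop": {"i", "n"}, "exit": {"s"}}
    defs = {"entry": {"i", "n", "s"}, "loop": {"i", "s"}, "exit": set()}
    print(liveness(succ, use, defs))  # loop keeps {'i', 'n'} live; exit keeps {'s'}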

TBA: Rewriting and Transformation
• Rewriting Systems
• E-graphs
Readings:
Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies
Software Design for Flexibility, Chapter 4
SymbolicUtils.jl
Discussion:
Spiral: Extreme Performance Portability
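
A bare-bones sketch of rule-based rewriting (illustrative only, far simpler than SymbolicUtils.jl): apply algebraic rules bottom-up over a tuple-encoded AST.

    def rewrite(e, rules):
        if isinstance(e, tuple):                    # ("op", lhs, rhs)
            e = (e[0], rewrite(e[1], rules), rewrite(e[2], rules))
        for rule in rules:                          # try each rule at this node
            e = rule(e)
        return e

    def mul_one(e):   # x * 1 -> x
        return e[1] if isinstance(e, tuple) and e[0] == "*" and e[2] == 1 else e

    def add_zero(e):  # x + 0 -> x
        return e[1] if isinstance(e, tuple) and e[0] == "+" and e[2] == 0 else e

    print(rewrite(("+", ("*", "x", 1), 0), [mul_one, add_zero]))  # prints x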

TBA: E-graphs
• What is an E-graph?
• E-graph Representation
• Saturation, Search
Readings:
egg: Fast and extensible equality saturation
Discussion:
Guided Equality Saturation
Caviar: an e-graph based TRS for automatic code optimization
Slides TBD
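
A toy e-graph of my own to fix the vocabulary: hash-consed e-nodes plus a union-find over e-class ids. Real systems such as egg also restore congruence after merges ("rebuilding"), which this sketch omits.

    class EGraph:
        def __init__(self):
            self.parent = []   # union-find over e-class ids
            self.table = {}    # canonical e-node -> e-class id

        def find(self, a):
            while self.parent[a] != a:
                a = self.parent[a]
            return a

        def add(self, op, *children):
            node = (op,) + tuple(self.find(c) for c in children)
            if node not in self.table:               # hash-consing
                self.parent.append(len(self.parent)) # fresh e-class
                self.table[node] = len(self.parent) - 1
            return self.find(self.table[node])

        def merge(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    g = EGraph()
    x = g.add("x")
    mul = g.add("*", x, g.add("2"))    # x * 2
    shl = g.add("<<", x, g.add("1"))   # x << 1
    g.merge(mul, shl)                  # assert both forms are equal
    print(g.find(mul) == g.find(shl))  # True: one e-class holds both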

TBA: Autotuning
• What is Autotuning?
• Autotuning Frameworks
• Search Space Exploration
Readings:
Automatically tuned linear algebra software
OpenTuner: An Extensible Framework for Program Autotuning
ATF: A Generic Auto-Tuning Framework
Autotuning in High-Performance Computing Applications
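
The core autotuning loop fits in a few lines; the kernel and search space below are stand-ins of mine, and real frameworks such as OpenTuner explore far larger spaces with smarter search strategies.

    import itertools, time

    def benchmark(kernel, *args, reps=5):
        best = float("inf")
        for _ in range(reps):
            t0 = time.perf_counter()
            kernel(*args)
            best = min(best, time.perf_counter() - t0)  # best of several runs
        return best

    def autotune(make_kernel, space, *args):
        points = [dict(zip(space, p)) for p in itertools.product(*space.values())]
        return min(points, key=lambda params: benchmark(make_kernel(params), *args))

    def make_kernel(params):                 # stand-in for a generated kernel
        tile = params["tile"]
        def kernel(xs):
            return sum(sum(xs[i:i + tile]) for i in range(0, len(xs), tile))
        return kernel

    xs = list(range(1 << 16))
    print(autotune(make_kernel, {"tile": [64, 256, 1024, 4096]}, xs))  # e.g. {'tile': 1024}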

TBA: Sparse Array Programming
• Domain: Sparse Arrays, Graphs, Meshes, Databases
• Representation (Sparse Matrix Formats, Columnar storage, Fibertree)
Readings:
Automatic Performance Tuning of Sparse Matrix Kernels
A relational model of data for large shared data banks
Discussion:
The tensor algebra compiler
Format abstraction for sparse tensor algebra compilers
Slides (from Stanford CS343D)
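
To fix ideas, a minimal sketch of one classic format from the list above, CSR (compressed sparse row), together with its sparse matrix-vector multiply.

    def to_csr(dense):
        pos, idx, val = [0], [], []
        for row in dense:
            for j, x in enumerate(row):
                if x != 0.0:
                    idx.append(j)      # column index of each stored entry
                    val.append(x)
            pos.append(len(val))       # row i occupies val[pos[i]:pos[i+1]]
        return pos, idx, val

    def spmv(pos, idx, val, x):
        return [sum(val[k] * x[idx[k]] for k in range(pos[i], pos[i + 1]))
                for i in range(len(pos) - 1)]

    pos, idx, val = to_csr([[1.0, 0.0], [0.0, 2.0]])
    print(spmv(pos, idx, val, [3.0, 4.0]))   # [3.0, 8.0]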

TBA: Sparse Array Programming Revisited
• Coiteration (Merge Strategies, Looplets)
• Loops and Iteration
• Three Major Algorithms for Matrix Multiplication
Readings:
Gamma: Leveraging Gustavson’s Algorithm to Accelerate Sparse Matrix Multiplication
Looplets: A Language for Structured Coiteration
Functional Collection Programming with Semi-ring Dictionaries
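
Coiteration in its simplest form, as a sketch of mine: a two-pointer merge over two sorted sparse vectors, the basic pattern that merge strategies and Looplets generalize.

    def sparse_dot(a, b):
        # a and b are sorted lists of (index, value) pairs.
        i = j = 0
        total = 0.0
        while i < len(a) and j < len(b):
            if a[i][0] == b[j][0]:            # indices match: multiply
                total += a[i][1] * b[j][1]
                i += 1
                j += 1
            elif a[i][0] < b[j][0]:
                i += 1                        # skip an index only in a
            else:
                j += 1                        # skip an index only in b
        return total

    print(sparse_dot([(0, 1.0), (5, 2.0)], [(5, 3.0), (9, 4.0)]))  # 6.0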

TBA: Wild and Crazy Ideas Session
• Bring a 5-minute slideshow on your crazy idea that just might work
• Peer Feedback
• Thinking about Final Projects

TBA: Vectorization
• SSA Form
• Cleanup Strategies in Halide
• Auto-Vectorization
Readings:
All you need is superword-level parallelism: systematic control-flow vectorization with SLP
Vectorization in Halide
Discussion:
Vectorizing Sparse Matrix Computations with Partially-Strided Codelets
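
One recurring shape in this lecture, sketched in Python for intuition (the "vector width" is pretend): a strip-mined loop plus a scalar cleanup tail for trip counts that are not a multiple of the width.

    VEC = 4  # pretend vector width

    def vec_add(a, b):
        n = len(a)
        main = n - n % VEC
        out = [0.0] * n
        for i in range(0, main, VEC):   # "vector" body: VEC lanes per step
            out[i:i + VEC] = [p + q for p, q in zip(a[i:i + VEC], b[i:i + VEC])]
        for i in range(main, n):        # scalar cleanup for the leftover tail
            out[i] = a[i] + b[i]
        return out

    print(vec_add([1.0] * 10, [2.0] * 10))  # ten 3.0s; the last 2 ran scalar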

TBA: Multicore Parallelism
• Multicore Architectures
• Parallel Programming Models
• Cilk
Readings:
The implementation of the Cilk-5 multithreaded language
Discussion:
Heartbeat Scheduling: Provable Efficiency for Nested Parallelism
Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation
Slides (from UC Berkeley CS267, MIT 6.172)
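
A sketch of Cilk-style fork-join control flow using Python futures. This is illustrative only: it shows the spawn/sync structure, but CPython's GIL means real speedups need processes or native code, and the pool size and cutoff below are arbitrary.

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=16)

    def psum(xs, lo, hi, cutoff=1 << 14):
        if hi - lo <= cutoff:
            return sum(xs[lo:hi])               # serial base case
        mid = (lo + hi) // 2
        left = pool.submit(psum, xs, lo, mid)   # "spawn" the left half
        right = psum(xs, mid, hi)               # keep working on the right half
        return left.result() + right            # "sync", then combine

    xs = list(range(1 << 16))
    print(psum(xs, 0, len(xs)) == sum(xs))      # True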

TBA: Distributed Memory Parallelism
• Distributed Memory Architectures
• Message Passing Interface (MPI)
• MapReduce
• UPC
• Legion
• Communication-avoiding Matrix Multiply
Readings:
MapReduce: Simplified Data Processing on Large Clusters
DISTAL: The Distributed Tensor Algebra Compiler
Legion: Expressing Locality and Independence with Logical Regions
Cyclops Tensor Framework
Slides (from UC Berkeley CS267)
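
The MapReduce pattern from the readings, reduced to its essence in single-process Python; word count is the classic example from the paper.

    from collections import defaultdict

    def map_phase(docs):
        for doc in docs:
            for word in doc.split():
                yield word, 1                  # emit (key, value) pairs

    def shuffle(pairs):
        groups = defaultdict(list)
        for k, v in pairs:
            groups[k].append(v)                # group values by key
        return groups

    def reduce_phase(groups):
        return {k: sum(vs) for k, vs in groups.items()}

    docs = ["to be or not to be", "to go"]
    print(reduce_phase(shuffle(map_phase(docs))))
    # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'go': 1}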

TBA: Accelerators and GPUs
• GPU Programming Languages
• Data Parallel Languages
Discussion:
Taichi: a language for high-performance computation on spatially sparse data structures
Task-Based Tensor Computations on Modern GPUs
Slides (from UC Berkeley CS267)

TBA: Staged Programming
• What is Staged Programming?
• Abstract Interpretation
• Implementations
Readings:
Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs
Discussion:
BuildIt: A Type-Based Multi-stage Programming Framework for Code Generation in C++
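
Staging in miniature (a sketch of mine, much simpler than LMS or BuildIt): the first stage runs now and emits Python source specialized to n; the second stage is the compiled result.

    def stage_power(n):
        body = "1.0"
        for _ in range(n):
            body = f"x * ({body})"          # unroll the loop at staging time
        src = f"def power(x): return {body}"
        namespace = {}
        exec(src, namespace)                # compile the generated code
        return namespace["power"]

    power5 = stage_power(5)   # power(x) = x*(x*(x*(x*(x*(1.0)))))
    print(power5(2.0))        # 32.0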

TBA: Dense Array Programming Revisited
• Revisiting Dense Array Programming
Readings:
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer
The Pochoir Stencil Compiler
Discussion:
Exocompilation for Productive Programming of Hardware Accelerators

TBA: Autoscheduling
• What is Autoscheduling?
• Three Ingredients of Autoscheduling (Search Space, Search Strategy, Cost Model)
• Autotuning vs. Autoscheduling
• Search Space Characterization
• Search Strategies
Readings:
Learning to Optimize Halide with Tree Search and Random Programs
Slides TBD
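
To contrast with the autotuning lecture: an autoscheduler ranks candidates with a cost model instead of timing each one. In this sketch the model is invented purely for illustration.

    import itertools

    def toy_cost(tile, unroll, n=4096):
        traffic = n * n / tile           # bigger tiles: better locality...
        pressure = tile * unroll * 64    # ...but more register/cache pressure
        return traffic + pressure

    space = {"tile": [8, 32, 128, 512], "unroll": [1, 2, 4, 8]}
    best = min(itertools.product(*space.values()), key=lambda p: toy_cost(*p))
    print(dict(zip(space, best)))        # {'tile': 512, 'unroll': 1} here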

TBA: Autoscheduling Revisited
• Cost Modeling (Feature-based/ML, Sparse Cost Models/Cardinality Estimation)
Readings:
How to Architect a Query Compiler
Autoscheduling for sparse tensor algebra with an asymptotic cost model
Discussion:
Galley: Modern Query Optimization for Sparse Tensor Programs
Slides TBD

TBA: Compiling Programs Faster
• Compilation Time vs. Execution Time
• Prepared Queries
• Interpreters vs. Compilers
• Turning Compilation Off
Readings:
Firefox’s Baseline WebAssembly Compiler
Discussion:
Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode
Slides (from Stanford CS343D)
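
One way to see the tradeoff in this lecture, as a small sketch: an interpreter re-walks the AST on every call, while even a simple closure-based "compiler" pays once up front so that each call is cheaper.

    def interp(e, env):
        if e[0] == "var": return env[e[1]]
        if e[0] == "lit": return e[1]
        return interp(e[1], env) + interp(e[2], env)   # ("add", lhs, rhs)

    def compile_expr(e):
        if e[0] == "var":
            name = e[1]
            return lambda env: env[name]
        if e[0] == "lit":
            v = e[1]
            return lambda env: v
        lhs, rhs = compile_expr(e[1]), compile_expr(e[2])
        return lambda env: lhs(env) + rhs(env)

    e = ("add", ("var", "x"), ("lit", 1))
    f = compile_expr(e)                      # pay compilation once...
    print(interp(e, {"x": 2}), f({"x": 2}))  # ...then each call skips the walk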

TBA: Guest Lecture: Fredrik Kjolstad

TBA: Guest Lecture: Charith Mendis

TBA: Guest Lecture: Jacob Laurel

TBA: Final Project Session
• Bring a 5-minute slideshow on your crazy idea that just might work
• Peer Feedback
• Thinking about Final Projects
