CS8803-DSL: Domain Specific Languages for High Performance Computing

CRN-93621

This course will empower students to design and implement Domain-Specific Languages (DSLs) to solve problems in High-Performance Computing (HPC) contexts. Students will learn the basics of compiler construction and language design as they relate to the challenges of high-performance computing and performance engineering. The course will cover topics such as architectural modeling, optimization techniques, and program analysis through the lens of contemporary DSLs and research in the field. It is my hope that any student taking the course will leave with the skills to build compilers that solve programming problems in their own HPC research areas. The course will be project-focused, with a few assignments and a final project to design and implement a DSL for an HPC problem.

Learning Objectives

By the end of this course, students will be able to:

  • Measure and analyze the performance of compilers and HPC programs.
  • Design and implement languages to capture concepts in data/scientific domains.
  • Embed/integrate DSLs within existing programming languages and frameworks.
  • Adapt languages to model and target HPC architectures.
  • Apply optimization techniques to improve the performance of generated programs.
  • Communicate technical concepts effectively to build a career in academic contexts.

We will accomplish these objectives through a combination of homework problems, discussions, group peer-review activities, and a final project.

Policies

See the Course Policies for more information on grading, assignments, and other course policies.

Note: This course website is currently under construction. Please check back later for more information.

Course Information

  • Course Code: CS8803-DSL
  • CRN: 93621
  • Credits: 3
  • Instructor: Willow Ahrens (ahrens@gatech.edu)
  • Office Hours: TBA
  • Meeting Room: Arch 107
  • Meeting Time: Tuesday & Thursday, 12:30–1:45 PM

Course Materials

  • Textbook: Compilers: Principles, Techniques, and Tools (also known as the Dragon Book). I highly recommend you purchase a physical copy, as it is a classic reference text in the field. Please let me know if you have any trouble accessing the book, or if cost is a concern.
  • Additional Readings: Additional readings will be recommended throughout the course. You will need to authenticate to the Georgia Tech Library Proxy to access the official versions of these readings. For convenience, try adding the papers you read to a citation manager, such as Zotero or Mendeley.

Schedule

August 19: Welcome
• Why DSLs?
• Course Policies
• Course Overview
Readings:
Programming Pearls: Little Languages
A New Golden Age for Computer Architecture
How to Read a Paper
Slides (from Stanford CS343D)

August 21: Getting Started with DSLs
• What is a DSL?
• Abstract Syntax Trees (ASTs)
• Lexical Analysis and Parsing
• Embedded DSLs (Lazy Evaluation, Functions, and Macros)
Readings:
Compilers: Principles, Techniques, and Tools, Sections 4.1-4.3
finch-tensor-lite embedded parser
Discussion:
Terra: A Multi-Stage Language for High-Performance Computing
Slides (from Stanford CS343D)
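
As a preview of the embedded-DSL idea above, here is a minimal Python sketch of my own (the class names are illustrative, not from finch-tensor-lite): overloading operators so that ordinary expressions build an AST lazily instead of computing values.

    import dataclasses

    @dataclasses.dataclass(frozen=True)
    class Expr:
        def __add__(self, other): return Call("+", self, lift(other))
        def __mul__(self, other): return Call("*", self, lift(other))

    @dataclasses.dataclass(frozen=True)
    class Var(Expr):
        name: str

    @dataclasses.dataclass(frozen=True)
    class Lit(Expr):
        value: float

    @dataclasses.dataclass(frozen=True)
    class Call(Expr):
        op: str
        lhs: Expr
        rhs: Expr

    def lift(x):
        # Promote plain Python numbers to DSL literals.
        return x if isinstance(x, Expr) else Lit(float(x))

    x, y = Var("x"), Var("y")
    ast = x * 2 + y   # ordinary Python syntax, but nothing is computed;
    print(ast)        # instead we get a Call/Var/Lit tree to analyze or compile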

August 26: Measurement
• What is Fast? Benchmarking and Performance Metrics
• Amdahl’s Law
• Roofline Model
• Benchmarking Compilers
Readings:
Roofline: an insightful visual performance model for multicore architectures
Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results
Discussion:
Producing wrong data without doing anything obviously wrong!
PandasBench: A Benchmark for the Pandas API
Slides (from UC Berkeley CS267)
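
For a first look at two of the models above, here is a small sketch of mine in Python; the machine numbers are invented for the example.

    def amdahl_speedup(parallel_fraction, processors):
        # Amdahl's Law: the serial fraction bounds achievable speedup.
        serial = 1.0 - parallel_fraction
        return 1.0 / (serial + parallel_fraction / processors)

    def roofline(flops_per_byte, peak_gflops, peak_gbs):
        # Roofline: attainable GFLOP/s is the lesser of peak compute and
        # arithmetic intensity times peak memory bandwidth.
        return min(peak_gflops, flops_per_byte * peak_gbs)

    print(amdahl_speedup(0.95, 1024))     # ~19.6: 5% serial work caps 1024 cores
    print(roofline(0.25, 1000.0, 100.0))  # 25.0: this kernel is memory-bound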

August 28: Dense Array Programming / Loop Optimization
• What is High-Performance Computing? An Overview
• Dense Array Programming
• Classical Optimizations: Loop Fusion, Loop Unrolling, Vectorization
• Mechanism vs. Policy
• Halide
Readings:
Compilers: Principles, Techniques, and Tools, Sections 10.4, 11.1-11.3, 11.7.8, and 11.11
Discussion:
Halide: decoupling algorithms from schedules for high-performance image processing
Synthesis of High-Performance Parallel Programs for a Class of Ab Initio Quantum Chemistry Models
Slides (from Stanford CS343D, Cornell CS 6120)
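
A tiny illustration of loop fusion from the list above (mine, in plain Python): the unfused version makes two passes and materializes a temporary; the fused version makes one pass and none. Halide's key move is to treat such choices as a schedule, separate from the algorithm.

    def unfused(a, b):
        t = [ai * 2.0 for ai in a]                 # pass 1: temporary of size n
        return [ti + bi for ti, bi in zip(t, b)]   # pass 2: reads it back

    def fused(a, b):
        return [ai * 2.0 + bi for ai, bi in zip(a, b)]  # one pass, no temporary

    print(unfused([1.0, 2.0], [3.0, 4.0]) == fused([1.0, 2.0], [3.0, 4.0]))  # True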

TBA: Dataflow Analysis and Optimization
• Program Analysis
• Dataflow Analysis
• Static vs. Dynamic Analysis
• Interval Analysis
• Heap Modeling
Readings:
Compilers: Principles, Techniques, and Tools, Second Edition, Chapter 9
Lecture 4, Dataflow Analysis, Cornell CS 6120
Discussion:
Representing Data Collections in an SSA Form
Slides (from Cornell CS 6120)
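
To make the worklist idea concrete before the reading, here is a minimal liveness analysis, a standard backward dataflow problem; the three-block CFG below is invented for illustration.

    def liveness(succ, use, defs):
        # live_in[b] = use[b] | (live_out[b] - defs[b]), where live_out[b]
        # unions live_in over b's successors; iterate to a fixed point.
        live_in = {b: set() for b in succ}
        worklist = list(succ)
        while worklist:
            b = worklist.pop()
            live_out = set().union(*(live_in[s] for s in succ[b]))
            new_in = use[b] | (live_out - defs[b])
            if new_in != live_in[b]:
                live_in[b] = new_in
                worklist.extend(p for p in succ if b in succ[p])  # predecessors
        return live_in

    succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
    use  = {"entry": set(), "loop": {"i", "n"}, "exit": {"s"}}
    defs = {"entry": {"i", "n", "s"}, "loop": {"i", "s"}, "exit": set()}
    print(liveness(succ, use, defs))  # loop keeps {'i', 'n'} live; exit keeps {'s'}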

TBA: Rewriting and Transformation
• Rewriting Systems
• E-graphs
Readings:
Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies
Software Design for Flexibility, Chapter 4
SymbolicUtils.jl
Discussion:
Spiral: Extreme Performance Portability
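
A bare-bones sketch of rule-based rewriting (illustrative only, far simpler than SymbolicUtils.jl): apply algebraic rules bottom-up over a tuple-encoded AST.

    def rewrite(e, rules):
        if isinstance(e, tuple):                    # ("op", lhs, rhs)
            e = (e[0], rewrite(e[1], rules), rewrite(e[2], rules))
        for rule in rules:                          # try each rule at this node
            e = rule(e)
        return e

    def mul_one(e):   # x * 1 -> x
        return e[1] if isinstance(e, tuple) and e[0] == "*" and e[2] == 1 else e

    def add_zero(e):  # x + 0 -> x
        return e[1] if isinstance(e, tuple) and e[0] == "+" and e[2] == 0 else e

    print(rewrite(("+", ("*", "x", 1), 0), [mul_one, add_zero]))  # prints x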

TBA: E-graphs
• What is an E-graph?
• E-graph Representation
• Saturation, Search
Readings:
egg: Fast and extensible equality saturation
Discussion:
Guided Equality Saturation
Caviar: an e-graph based TRS for automatic code optimization
Slides TBD
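
A toy e-graph of my own to fix the vocabulary: hash-consed e-nodes plus a union-find over e-class ids. Real systems such as egg also restore congruence after merges ("rebuilding"), which this sketch omits.

    class EGraph:
        def __init__(self):
            self.parent = []   # union-find over e-class ids
            self.table = {}    # canonical e-node -> e-class id

        def find(self, a):
            while self.parent[a] != a:
                a = self.parent[a]
            return a

        def add(self, op, *children):
            node = (op,) + tuple(self.find(c) for c in children)
            if node not in self.table:               # hash-consing
                self.parent.append(len(self.parent)) # fresh e-class
                self.table[node] = len(self.parent) - 1
            return self.find(self.table[node])

        def merge(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    g = EGraph()
    x = g.add("x")
    mul = g.add("*", x, g.add("2"))    # x * 2
    shl = g.add("<<", x, g.add("1"))   # x << 1
    g.merge(mul, shl)                  # assert both forms are equal
    print(g.find(mul) == g.find(shl))  # True: one e-class holds both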

TBA: Autotuning
• What is Autotuning?
• Autotuning Frameworks
• Search Space Exploration
Readings:
Automatically tuned linear algebra software
OpenTuner: An Extensible Framework for Program Autotuning
ATF: A Generic Auto-Tuning Framework
Autotuning in High-Performance Computing Applications
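
The core autotuning loop fits in a few lines; the kernel and search space below are stand-ins of mine, and real frameworks such as OpenTuner explore far larger spaces with smarter search strategies.

    import itertools, time

    def benchmark(kernel, *args, reps=5):
        best = float("inf")
        for _ in range(reps):
            t0 = time.perf_counter()
            kernel(*args)
            best = min(best, time.perf_counter() - t0)  # best of several runs
        return best

    def autotune(make_kernel, space, *args):
        points = [dict(zip(space, p)) for p in itertools.product(*space.values())]
        return min(points, key=lambda params: benchmark(make_kernel(params), *args))

    def make_kernel(params):                 # stand-in for a generated kernel
        tile = params["tile"]
        def kernel(xs):
            return sum(sum(xs[i:i + tile]) for i in range(0, len(xs), tile))
        return kernel

    xs = list(range(1 << 16))
    print(autotune(make_kernel, {"tile": [64, 256, 1024, 4096]}, xs))  # e.g. {'tile': 1024}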

TBA: Sparse Array Programming
• Domain: Sparse Arrays, Graphs, Meshes, Databases
• Representation (Sparse Matrix Formats, Columnar storage, Fibertree)
Readings:
Automatic Performance Tuning of Sparse Matrix Kernels
A relational model of data for large shared data banks
Discussion:
The tensor algebra compiler
Format abstraction for sparse tensor algebra compilers
Slides (from Stanford CS343D)
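
To fix ideas, a minimal sketch of one classic format from the list above, CSR (compressed sparse row), together with its sparse matrix-vector multiply.

    def to_csr(dense):
        pos, idx, val = [0], [], []
        for row in dense:
            for j, x in enumerate(row):
                if x != 0.0:
                    idx.append(j)      # column index of each stored entry
                    val.append(x)
            pos.append(len(val))       # row i occupies val[pos[i]:pos[i+1]]
        return pos, idx, val

    def spmv(pos, idx, val, x):
        return [sum(val[k] * x[idx[k]] for k in range(pos[i], pos[i + 1]))
                for i in range(len(pos) - 1)]

    pos, idx, val = to_csr([[1.0, 0.0], [0.0, 2.0]])
    print(spmv(pos, idx, val, [3.0, 4.0]))   # [3.0, 8.0]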

TBA: Sparse Array Programming Revisited
• Coiteration (Merge Strategies, Looplets)
• Loops and Iteration
• Three Major Algorithms for Matrix Multiplication
Readings:
Gamma: Leveraging Gustavson’s Algorithm to Accelerate Sparse Matrix Multiplication
Looplets: A Language for Structured Coiteration
Functional Collection Programming with Semi-ring Dictionaries
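
Coiteration in its simplest form, as a sketch of mine: a two-pointer merge over two sorted sparse vectors, the basic pattern that merge strategies and Looplets generalize.

    def sparse_dot(a, b):
        # a and b are sorted lists of (index, value) pairs.
        i = j = 0
        total = 0.0
        while i < len(a) and j < len(b):
            if a[i][0] == b[j][0]:            # indices match: multiply
                total += a[i][1] * b[j][1]
                i += 1
                j += 1
            elif a[i][0] < b[j][0]:
                i += 1                        # skip an index only in a
            else:
                j += 1                        # skip an index only in b
        return total

    print(sparse_dot([(0, 1.0), (5, 2.0)], [(5, 3.0), (9, 4.0)]))  # 6.0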

TBA: Wild and Crazy Ideas Session
• Bring a 5-minute slideshow on your crazy idea that just might work
• Peer Feedback
• Thinking about Final Projects

TBA: Vectorization
• SSA Form
• Cleanup Strategies in Halide
• Auto-Vectorization
Readings:
All you need is superword-level parallelism: systematic control-flow vectorization with SLP
Vectorization in Halide
Discussion:
Vectorizing Sparse Matrix Computations with Partially-Strided Codelets
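
One recurring shape in this lecture, sketched in Python for intuition (the "vector width" is pretend): a strip-mined loop plus a scalar cleanup tail for trip counts that are not a multiple of the width.

    VEC = 4  # pretend vector width

    def vec_add(a, b):
        n = len(a)
        main = n - n % VEC
        out = [0.0] * n
        for i in range(0, main, VEC):   # "vector" body: VEC lanes per step
            out[i:i + VEC] = [p + q for p, q in zip(a[i:i + VEC], b[i:i + VEC])]
        for i in range(main, n):        # scalar cleanup for the leftover tail
            out[i] = a[i] + b[i]
        return out

    print(vec_add([1.0] * 10, [2.0] * 10))  # ten 3.0s; the last 2 ran scalar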

TBA: Multicore Parallelism
• Multicore Architectures
• Parallel Programming Models
• Cilk
Readings:
The implementation of the Cilk-5 multithreaded language
Discussion:
Heartbeat Scheduling: Provable Efficiency for Nested Parallelism
Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation
Slides (from UC Berkeley CS267, MIT 6.172)
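
A sketch of Cilk-style fork-join control flow using Python futures. This is illustrative only: it shows the spawn/sync structure, but CPython's GIL means real speedups need processes or native code, and the pool size and cutoff below are arbitrary.

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=16)

    def psum(xs, lo, hi, cutoff=1 << 14):
        if hi - lo <= cutoff:
            return sum(xs[lo:hi])               # serial base case
        mid = (lo + hi) // 2
        left = pool.submit(psum, xs, lo, mid)   # "spawn" the left half
        right = psum(xs, mid, hi)               # keep working on the right half
        return left.result() + right            # "sync", then combine

    xs = list(range(1 << 16))
    print(psum(xs, 0, len(xs)) == sum(xs))      # True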

TBA: Distributed Memory Parallelism
• Distributed Memory Architectures
• Message Passing Interface (MPI)
• MapReduce
• UPC
• Legion
• Communication-avoiding Matrix Multiply
Readings:
MapReduce: Simplified Data Processing on Large Clusters
DISTAL: The Distributed Tensor Algebra Compiler
Legion: Expressing Locality and Independence with Logical Regions
Cyclops Tensor Framework
Slides (from UC Berkeley CS267)
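
The MapReduce pattern from the readings, reduced to its essence in single-process Python; word count is the classic example from the paper.

    from collections import defaultdict

    def map_phase(docs):
        for doc in docs:
            for word in doc.split():
                yield word, 1                  # emit (key, value) pairs

    def shuffle(pairs):
        groups = defaultdict(list)
        for k, v in pairs:
            groups[k].append(v)                # group values by key
        return groups

    def reduce_phase(groups):
        return {k: sum(vs) for k, vs in groups.items()}

    docs = ["to be or not to be", "to go"]
    print(reduce_phase(shuffle(map_phase(docs))))
    # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'go': 1}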

TBA: Accelerators and GPUs
• GPU Programming Languages
• Data Parallel Languages
Discussion:
Taichi: a language for high-performance computation on spatially sparse data structures
Task-Based Tensor Computations on Modern GPUs
Slides (from UC Berkeley CS267)

TBA: Staged Programming
• What is Staged Programming?
• Abstract Interpretation
• Implementations
Readings:
Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs
Discussion:
BuildIt: A Type-Based Multi-stage Programming Framework for Code Generation in C++
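
Staging in miniature (a sketch of mine, much simpler than LMS or BuildIt): the first stage runs now and emits Python source specialized to n; the second stage is the compiled result.

    def stage_power(n):
        body = "1.0"
        for _ in range(n):
            body = f"x * ({body})"          # unroll the loop at staging time
        src = f"def power(x): return {body}"
        namespace = {}
        exec(src, namespace)                # compile the generated code
        return namespace["power"]

    power5 = stage_power(5)   # power(x) = x*(x*(x*(x*(x*(1.0)))))
    print(power5(2.0))        # 32.0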

TBA: Dense Array Programming Revisited
• Revisiting Dense Array Programming
Readings:
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer
The Pochoir Stencil Compiler
Discussion:
Exocompilation for Productive Programming of Hardware Accelerators

TBA: Autoscheduling
• What is Autoscheduling?
• Three Ingredients of Autoscheduling (Search Space, Search Strategy, Cost Model)
• Autotuning vs. Autoscheduling
• Search Space Characterization
• Search Strategies
Readings:
Learning to Optimize Halide with Tree Search and Random Programs
Slides TBD
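
To contrast with the autotuning lecture: an autoscheduler ranks candidates with a cost model instead of timing each one. In this sketch the model is invented purely for illustration.

    import itertools

    def toy_cost(tile, unroll, n=4096):
        traffic = n * n / tile           # bigger tiles: better locality...
        pressure = tile * unroll * 64    # ...but more register/cache pressure
        return traffic + pressure

    space = {"tile": [8, 32, 128, 512], "unroll": [1, 2, 4, 8]}
    best = min(itertools.product(*space.values()), key=lambda p: toy_cost(*p))
    print(dict(zip(space, best)))        # {'tile': 512, 'unroll': 1} here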

TBA: Autoscheduling Revisited
• Cost Modeling (Feature-based/ML, Sparse Cost Models/Cardinality Estimation)
Readings:
How to Architect a Query Compiler
Autoscheduling for sparse tensor algebra with an asymptotic cost model
Discussion:
Galley: Modern Query Optimization for Sparse Tensor Programs
Slides TBD

TBA: Compiling Programs Faster
• Compilation Time vs. Execution Time
• Prepared Queries
• Interpreters vs. Compilers
• Turning Compilation Off
Readings:
Firefox’s Baseline WebAssembly Compiler
Discussion:
Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode
Slides (from Stanford CS343D)
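
One way to see the tradeoff in this lecture, as a small sketch: an interpreter re-walks the AST on every call, while even a simple closure-based "compiler" pays once up front so that each call is cheaper.

    def interp(e, env):
        if e[0] == "var": return env[e[1]]
        if e[0] == "lit": return e[1]
        return interp(e[1], env) + interp(e[2], env)   # ("add", lhs, rhs)

    def compile_expr(e):
        if e[0] == "var":
            name = e[1]
            return lambda env: env[name]
        if e[0] == "lit":
            v = e[1]
            return lambda env: v
        lhs, rhs = compile_expr(e[1]), compile_expr(e[2])
        return lambda env: lhs(env) + rhs(env)

    e = ("add", ("var", "x"), ("lit", 1))
    f = compile_expr(e)                      # pay compilation once...
    print(interp(e, {"x": 2}), f({"x": 2}))  # ...then each call skips the walk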

TBA: Guest Lecture: Fredrik Kjolstad

TBA: Guest Lecture: Charith Mendis

TBA: Guest Lecture: Jacob Laurel

TBA: Final Project Session
• Bring a 5-minute slideshow on your crazy idea that just might work
• Peer Feedback
• Thinking about Final Projects
