Software Optimizations for the Memory Subsystem

Upcoming workshop, May 18th to May 21st:

  • US West Coast: 8 AM to 12 PM
  • US East Coast: 11 AM to 3 PM
  • Europe Central: 5 PM to 9 PM

To express interest in participating in this workshop, use the Contact page.

Why is Memory Performance Important?

Modern CPUs are incredibly fast, but they spend a surprising amount of time waiting for data. If your program is limited by memory bandwidth or latency, adding more cores or a faster CPU won’t help — the memory subsystem becomes the bottleneck. Optimizing memory performance often gives bigger speedups than micro-optimizing instructions, especially in data-intensive applications. In short: faster memory access = faster software.

This workshop was inspired by a series of blog posts about software optimizations for the memory subsystem. The full list of links is available here.

Who is This Workshop For?

This workshop is designed for:

  • C and C++ developers who want to understand and fix memory bottlenecks.
  • Performance engineers looking to push systems closer to hardware limits.
  • Teams working on high-performance software in areas like algorithmic trading, machine learning, telecom, embedded, or scientific computing.
  • Anyone who has profiled their application and found that “the CPU is waiting for memory.”

Workshop Details

  • Prerequisites: intermediate-level knowledge of C or C++.
  • Duration: 2 days or 4 half-days
  • Agenda: see below
  • Hardware and software prerequisites: see below
  • Training materials:
    • Slides for the workshop participants.
    • Practical labs after most units of the workshop.
    • Source code for all examples mentioned in the workshop.
  • Written language: English
  • Spoken language of the workshop: English

Types of Sessions

Public Online Sessions

  • Public online sessions are organized when enough participants register.
  • Number of participants: at most 12.
  • We offer public sessions in two formats:
    • Two 8-hour sessions: 9 AM – 5 PM CET (Germany, France), 8 AM – 4 PM GMT (United Kingdom), 1:30 PM – 9:30 PM IST (India), and 4 PM – 12 AM CST (China).
    • Four 4-hour sessions: 5 PM – 9 PM CET (Germany, France), 11 AM – 3 PM (New York), and 8 AM – 12 PM (California).
  • Registration
    • To express interest in participating in this workshop, use the Contact page.
    • We will inform you about the dates of the workshop once enough people register. You will wait at most three months; after that, we will hold the workshop regardless of the number of participants.
  • Price
    • International wire transfer:
      • Full price: €749, $799
      • Discounted price: €599, $639, for people who register up to 30 days before the workshop starts.
    • PayPal (you can pay with Visa or Mastercard):
      • Full price: €809, $859
      • Discounted price: €649, $689, for people who register up to 30 days before the workshop starts.

Agenda

Memory Performance Workshop
Introduction to Memory Subsystem + exercise
Decreasing Total Memory Accesses + exercise
Changing the Data Access Pattern + exercise
Changing the Data Layout: Classes + exercise
Changing the Data Layout: Data Structures
Decreasing the Dataset Size
Changing the Memory Layout + exercise
Increasing Instruction-Level Parallelism
Software Prefetching for Random Data Accesses + exercise
Decreasing TLB Cache Misses
Saving the Memory Subsystem Bandwidth
Branch Prediction and Data Caches
Remaining topics + exercise
Multithreading and the Memory Subsystem
Low-Latency Applications

Hardware and Software Prerequisites

We highly recommend running the code with the LIKWID library enabled. LIKWID lets you collect metrics such as cycles, instruction count, instructions per cycle, data cache miss rates, and data throughput. This information will help you develop a better intuition about the memory subsystem.

Prerequisites:

  • If using LIKWID: the LIKWID library (available in most distros' repositories), Linux running on a native x86-64 system, and Clang
  • If using the LIKWID stub: Linux, macOS, or Windows with Clang or MSVC