Upcoming workshop: May 18th until May 21st:
- US West Coast: 8 AM to 12 PM
- US East Coast: 11 AM to 3 PM
- Europe Central: 5 PM to 9 PM
To express interest in participating in this workshop, use the Contact page.
Why is Memory Performance Important?
Modern CPUs are incredibly fast, but they spend a surprising amount of time waiting for data. If your program is limited by memory bandwidth or latency, adding more cores or a faster CPU won’t help — the memory subsystem becomes the bottleneck. Optimizing memory performance often gives bigger speedups than micro-optimizing instructions, especially in data-intensive applications. In short: faster memory access = faster software.
This workshop was inspired by a series of blog posts about software optimizations for the memory subsystem. The full list of links is available here.
Who is This Workshop For?
This workshop is designed for:
- C and C++ developers who want to understand and fix memory bottlenecks.
- Performance engineers looking to push systems closer to hardware limits.
- Teams working on high-performance software in areas like algorithmic trading, machine learning, telecom, embedded, or scientific computing.
- Anyone who has profiled their application and found that “the CPU is waiting for memory.”
Workshop Details
- Prerequisites: intermediate knowledge of C or C++.
- Duration: 2 days or 4 half-days
- Agenda: see below
- Hardware and software prerequisites: see below
- Training materials:
  - Slides for the workshop participants.
  - Practical labs after most units of the workshop.
  - Source code for all examples mentioned in the workshop.
- Written language: English
- Spoken language of the workshop: English
Types of Sessions
- Public online sessions: see below for details on available dates and pricing.
- Private online sessions: on request
- Private onsite sessions: on request
Public Online Sessions
- Public online sessions are organized when enough participants register.
- Number of participants: at most 12.
- We offer public sessions in two formats:
  - Two 8-hour sessions, each running 9 AM – 5 PM CET (Germany, France), 8 AM – 4 PM GMT (United Kingdom), 1:30 PM – 9:30 PM IST (India), and 4 PM – 12 AM CST (China).
  - Four 4-hour sessions, each running 5 PM – 9 PM CET (Germany, France), 11 AM – 3 PM EST (New York), and 8 AM – 12 PM PST (California).
- Registration
  - To express interest in participating in this workshop, use the Contact page.
  - We will inform you about the workshop dates once enough people register. You will wait at most three months: after that, we will organize the workshop regardless of the number of participants.
- Price
  - International wire transfer:
    - Full price: € 749, $799
    - Discounted price: € 599, $639, if you register at least 30 days before the workshop starts.
  - PayPal (you can pay with Visa or Mastercard):
    - Full price: € 809, $859
    - Discounted price: € 649, $689, if you register at least 30 days before the workshop starts.
Agenda
| Memory Performance Workshop |
|---|
| Introduction to the Memory Subsystem + exercise |
| Decreasing Total Memory Accesses + exercise |
| Changing the Data Access Pattern + exercise |
| Changing the Data Layout: Classes + exercise |
| Changing the Data Layout: Data Structures |
| Decreasing the Dataset Size |
| Changing the Memory Layout + exercise |
| Increasing Instruction-Level Parallelism |
| Software Prefetching for Random Data Accesses + exercise |
| Decreasing TLB Cache Misses |
| Saving Memory Subsystem Bandwidth |
| Branch Prediction and Data Caches |
| Remaining Topics + exercise |
| Multithreading and the Memory Subsystem |
| Low-Latency Applications |
Hardware and Software Prerequisites
We highly recommend running the code with the LIKWID library enabled. LIKWID lets you collect hardware performance metrics such as cycles, instruction count, instructions per cycle (IPC), data cache miss rates, and data throughput. This information will help you develop a better intuition for the memory subsystem.
Prerequisites:
- If using LIKWID: the LIKWID library (available in most Linux distributions' repositories), Linux running natively on an x86-64 system, and Clang.
- If using the LIKWID stub: Linux, macOS, or Windows with Clang or MSVC.
