On this blog I have written 18 posts about memory subsystem optimizations, i.e. optimizations that aim to make software faster by using the memory subsystem more efficiently. Most of them apply to software that works with large datasets, but some apply to software working with data of any size.
Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us
Or follow us on LinkedIn , Twitter or Mastodon and get notified as soon as new content becomes available.
Here is a list of all the posts in the series on Johnny’s Software Lab:
Topic | Description | Link |
---|---|---|
Decreasing Total Memory Accesses | We speed up software by keeping data in registers instead of reloading it from the memory subsystem several times. | Decreasing the Number of Memory Accesses 1/2 · Decreasing the Number of Memory Accesses: The Compiler’s Secret Life 2/2 |
Changing the Data Access Pattern to Increase Locality | By changing our data access pattern, we increase the likelihood that our data is in the fastest level of the data cache. | For Software Performance, the Way Data is Accessed Matters! |
Changing the Data Layout: Classes | Selecting proper class data layout can improve software performance. | Software Performance and Class Layout |
Changing the Data Layout: Data Structures | By changing the data layout of common data structures, such as linked lists, trees, or hash maps, we can improve their performance. | Faster hash maps, binary trees etc. through data layout modification |
Decreasing the Dataset Size | Memory efficiency can be improved by decreasing the dataset size. This results in speed improvements as well. | Memory consumption, dataset size and performance: how does it all relate? |
Changing the Memory Layout | Whereas data layout is determined at compile time, memory layout is determined by the system allocator at runtime. We examine how changing the memory layout using custom allocators influences software performance. | Performance Through Memory Layout |
Increasing instruction-level parallelism | Some code cannot utilize the memory subsystem fully because of instruction dependencies. Here we investigate techniques that break dependencies and improve performance. | Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP · Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code |
Software prefetching for random data accesses | Explicit software prefetches tell hardware that you will be accessing a certain piece of data soon. When used smartly, they can improve software performance. | The pros and cons of explicit software prefetching |
Decreasing TLB cache misses | The TLB is a small cache that speeds up the translation of virtual to physical memory addresses. In some cases, it can be the reason for poor performance. We investigate techniques for decreasing TLB cache misses. | Speeding Up Translation of Virtual To Physical Memory Addresses: TLB and Huge Pages |
Saving the memory subsystem bandwidth | In some cases, we don’t care about software performance, but we do care about being a good neighbor. We investigate techniques that make our software consume the least possible amount of memory subsystem resources. | Frugal Programming: Saving Memory Subsystem Bandwidth |
Branch prediction and data caches | We investigate the delicate interplay of the branch prediction and the memory subsystem. | Unexpected Ways Memory Subsystem Interacts with Branch Prediction |
Multithreading and the Memory Subsystem | Here we investigate how the memory subsystem behaves in the presence of multithreading and how that affects software speed. | Multithreading and the Memory Subsystem |
Low-latency applications | In some cases we care more about low latency than high throughput. We investigate techniques aimed at improving latency, either by modifying our programs or by reconfiguring the system. | Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache · Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms |
Measuring Memory Subsystem Performance | We talk about tools and metrics you can use to understand what is going on with the memory subsystem. | Measuring Memory Subsystem Performance |
Other topics | A few remaining topics related to memory subsystem optimizations that didn’t fit any of the other categories. | Memory Subsystem Optimizations – The Remaining Topics |
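To give a flavor of the first technique in the table, here is a minimal sketch of decreasing the number of memory accesses by accumulating in a local variable instead of repeatedly writing through a pointer. The function names are illustrative, not taken from the posts; the exact codegen depends on the compiler and optimization level.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: accumulating through *out forces the accumulator to
// live in memory (the compiler must assume the pointer may alias), so each
// iteration performs a load and a store in addition to reading v[i].
double sum_slow(const std::vector<double>& v, double* out) {
    *out = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        *out += v[i];   // load + store through memory every iteration
    return *out;
}

// Accumulating in a local lets the compiler keep the sum in a register,
// so the only memory accesses are the reads of v[i].
double sum_fast(const std::vector<double>& v) {
    double acc = 0.0;   // stays in a register
    for (std::size_t i = 0; i < v.size(); ++i)
        acc += v[i];
    return acc;
}
```

Both functions compute the same sum; the point is purely the number of memory accesses the compiler is forced to emit.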
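The data access pattern topic can be sketched with the classic loop-interchange example: for a row-major 2D array, an inner loop over the column index walks contiguous memory, while an inner loop over the row index strides by a full row and wastes most of each cache line. Names and the array size are illustrative assumptions.

```cpp
constexpr int N = 512;
double a[N][N];   // row-major: a[i][j] and a[i][j+1] are adjacent in memory

// Column-major traversal: the inner loop jumps N*sizeof(double) bytes per
// step, touching a different cache line on every access.
double sum_columnwise() {
    double s = 0.0;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            s += a[i][j];
    return s;
}

// Row-major traversal: the inner loop reads consecutive addresses, so each
// fetched cache line is fully used before the next one is needed.
double sum_rowwise() {
    double s = 0.0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += a[i][j];
    return s;
}
```

Both loops visit the same elements and return the same result; only the order of accesses, and therefore cache behavior, differs.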
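And the software prefetching row can be sketched as follows: when an array is accessed through a random index vector, issuing a prefetch a few iterations ahead can hide some of the load latency. This sketch assumes a GCC/Clang compiler (`__builtin_prefetch` is their builtin, not standard C++), and the prefetch distance of 8 is an arbitrary tuning assumption.

```cpp
#include <cstddef>
#include <vector>

// Sum data[idx[i]] for all i, where idx makes the accesses effectively
// random. We prefetch the element DIST iterations ahead so its cache line
// is (hopefully) in flight by the time we actually need it.
double sum_indirect(const std::vector<double>& data,
                    const std::vector<std::size_t>& idx) {
    constexpr std::size_t DIST = 8;   // prefetch distance: tuning assumption
    double s = 0.0;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        if (i + DIST < idx.size())
            __builtin_prefetch(&data[idx[i + DIST]]);  // GCC/Clang builtin
        s += data[idx[i]];
    }
    return s;
}
```

As the linked post discusses, prefetching has costs as well as benefits, so whether this helps must be measured on the target workload.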
Any feedback on the material covered in these posts will be highly appreciated.