Memory Subsystem Optimizations

On this blog, I have written 18 posts about memory subsystem optimizations, by which I mean optimizations that make software faster by using the memory subsystem more efficiently. Most of them apply to software that works with large datasets, but some apply to software working with data of any size.

Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us.
Or follow us on LinkedIn, Twitter, or Mastodon and get notified as soon as new content becomes available.

Here is a list of all the posts on this topic published on Johnny’s Software Lab:

| Topic | Description | Posts |
| --- | --- | --- |
| Decreasing Total Memory Accesses | We speed up software by keeping data in registers instead of reloading it from the memory subsystem several times (sketch below). | Decreasing the Number of Memory Accesses 1/2; Decreasing the Number of Memory Accesses: The Compiler’s Secret Life 2/2 |
| Changing the Data Access Pattern to Increase Locality | By changing our data access pattern, we increase the likelihood that our data is in the fastest level of the data cache (sketch below). | For Software Performance, the Way Data is Accessed Matters! |
| Changing the Data Layout: Classes | Selecting a proper class data layout can improve software performance (sketch below). | Software Performance and Class Layout |
| Changing the Data Layout: Data Structures | By changing the data layout of common data structures, such as linked lists, trees, or hash maps, we can improve their performance. | Faster hash maps, binary trees etc. through data layout modification |
| Decreasing the Dataset Size | Memory efficiency can be improved by decreasing the dataset size, which brings speed improvements as well. | Memory consumption, dataset size and performance: how does it all relate? |
| Changing the Memory Layout | Whereas data layout is determined at compile time, memory layout is determined by the system allocator at runtime. We examine how changing the memory layout using custom allocators influences software performance. | Performance Through Memory Layout |
| Increasing Instruction-Level Parallelism | Some code cannot fully utilize the memory subsystem because of instruction dependencies. We investigate techniques that break dependencies and improve performance (sketch below). | Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP; Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code |
| Software Prefetching for Random Data Accesses | Explicit software prefetches tell the hardware that you will be accessing a certain piece of data soon. When used smartly, they can improve software performance (sketch below). | The pros and cons of explicit software prefetching |
| Decreasing TLB Cache Misses | The TLB is a small cache that speeds up the translation of virtual to physical memory addresses. In some cases, it can be the reason for poor performance. We investigate techniques for decreasing TLB cache misses (sketch below). | Speeding Up Translation of Virtual To Physical Memory Addresses: TLB and Huge Pages |
| Saving Memory Subsystem Bandwidth | In some cases, we don’t care about software performance, but we do care about being a good neighbor. We investigate techniques that make our software consume as few memory subsystem resources as possible. | Frugal Programming: Saving Memory Subsystem Bandwidth |
| Branch Prediction and Data Caches | We investigate the delicate interplay between branch prediction and the memory subsystem. | Unexpected Ways Memory Subsystem Interacts with Branch Prediction |
| Multithreading and the Memory Subsystem | We investigate how the memory subsystem behaves in the presence of multithreading and how that affects software speed (sketch below). | Multithreading and the Memory Subsystem |
| Low-Latency Applications | In some cases, we are more interested in low latency than in high throughput. We investigate techniques aimed at improving latency, either by modifying our programs or by reconfiguring the system. | Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache; Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms |
| Measuring Memory Subsystem Performance | We talk about the tools and metrics you can use to understand what is going on with the memory subsystem. | Measuring Memory Subsystem Performance |
| Other Topics | A few remaining topics related to memory subsystem optimizations that didn’t fit any of the other categories. | Memory Subsystem Optimizations – The Remaining Topics |
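To give a flavor of the techniques, here are a few minimal, self-contained sketches. They are simplified illustrations written for this overview, not excerpts from the posts, and all function names in them are made up. The first corresponds to decreasing the number of memory accesses: accumulating into a local variable that the compiler can keep in a register, instead of repeatedly updating a value through a pointer.

```cpp
#include <cstddef>
#include <vector>

// Summation that updates the result through a pointer on every iteration.
// Because *result may alias other memory, the compiler typically has to
// load and store it on each pass through the loop.
void sum_through_memory(const std::vector<int>& v, int* result) {
    *result = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        *result += v[i];   // read-modify-write through memory every iteration
    }
}

// The same summation using a local accumulator the compiler can keep in a
// register; memory is written only once, at the end.
void sum_in_register(const std::vector<int>& v, int* result) {
    int acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    *result = acc;
}
```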
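Next, a sketch of changing the data access pattern to increase locality, assuming a matrix stored row-major in a flat array: traversing it row by row touches consecutive addresses, while traversing it column by column jumps through memory with a large stride.

```cpp
#include <cstddef>
#include <vector>

// Column-by-column traversal of a row-major matrix: consecutive accesses are
// `cols` elements apart, so almost every access touches a new cache line.
long long sum_column_by_column(const std::vector<int>& m,
                               std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}

// Row-by-row traversal of the same matrix: consecutive accesses are adjacent
// in memory, so most of them hit in the fastest cache level.
long long sum_row_by_row(const std::vector<int>& m,
                         std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}
```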
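For the class layout topic, one of the simplest layout changes: ordering members from largest to smallest to remove alignment padding. The sizes in the comments assume a typical 64-bit ABI.

```cpp
// Members ordered so that the compiler must insert padding to keep `value`
// 8-byte aligned: typically 24 bytes in total.
struct PaddedRecord {
    char   flag;   // 1 byte + 7 bytes of padding
    double value;  // 8 bytes
    int    count;  // 4 bytes + 4 bytes of tail padding
};

// The same members, ordered from largest to smallest: typically 16 bytes, so
// more records fit into each cache line.
struct CompactRecord {
    double value;  // 8 bytes
    int    count;  // 4 bytes
    char   flag;   // 1 byte + 3 bytes of tail padding
};
```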
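The instruction-level parallelism topic can be illustrated with the multiple-accumulator trick: a single accumulator forms one long dependency chain, while several independent accumulators let the CPU overlap loads and additions. Note that this sketch reorders floating-point additions, which can slightly change the result.

```cpp
#include <cstddef>

// One accumulator: every addition depends on the result of the previous one,
// so the loop runs at the latency of a single dependency chain.
double sum_single_chain(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Four accumulators: four independent dependency chains that the out-of-order
// core can execute in parallel, keeping more memory requests in flight.
double sum_four_chains(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)   // remaining elements
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```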
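For explicit software prefetching, the typical pattern: data gathered through an index array defeats the hardware prefetcher, so we issue a prefetch a few iterations ahead. The sketch uses the GCC/Clang built-in __builtin_prefetch; the prefetch distance of 8 is an arbitrary starting point that needs tuning per workload.

```cpp
#include <cstddef>

// Gather-style summation: values[indices[i]] is effectively a random access,
// so the hardware prefetcher cannot help. We prefetch the element we will
// need a few iterations from now while working on the current one.
long long gather_sum(const int* values, const int* indices, std::size_t n) {
    constexpr std::size_t kPrefetchDistance = 8;  // tunable, workload-dependent
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n)
            __builtin_prefetch(&values[indices[i + kPrefetchDistance]]);
        total += values[indices[i]];
    }
    return total;
}
```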
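For the TLB topic, a Linux-only sketch: allocating a large buffer aligned to 2 MB and hinting the kernel to back it with transparent huge pages, so that one TLB entry covers 2 MB instead of 4 KB. madvise(MADV_HUGEPAGE) is only a hint, and the function name here is my own.

```cpp
#include <sys/mman.h>   // madvise, MADV_HUGEPAGE (Linux)
#include <cstdlib>      // posix_memalign, free
#include <cstddef>

// Allocate `bytes` of memory aligned to the 2 MB huge-page size and ask the
// kernel to back it with transparent huge pages. Fewer, larger pages mean
// fewer TLB entries are needed to cover the same dataset.
void* allocate_with_huge_pages(std::size_t bytes) {
    constexpr std::size_t kHugePageSize = 2 * 1024 * 1024;
    void* p = nullptr;
    if (posix_memalign(&p, kHugePageSize, bytes) != 0)
        return nullptr;
    madvise(p, bytes, MADV_HUGEPAGE);  // a hint; the kernel may ignore it
    return p;
}
```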
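Finally, for multithreading and the memory subsystem, a sketch of avoiding false sharing: per-thread counters packed next to each other share a cache line, and every update forces that line to bounce between cores. Padding each counter to a full cache line (64 bytes is assumed here) removes the problem.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Each counter gets its own cache line (64 bytes assumed), so threads that
// update different counters no longer invalidate each other's cache lines.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

void count_in_parallel(std::size_t num_threads, std::size_t iterations) {
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::size_t i = 0; i < iterations; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
}
```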

Any feedback on the material covered in these posts will be highly appreciated.

