Memory Subsystem Optimizations

On this blog, I have written 18 posts about memory subsystem optimizations, by which I mean optimizations that make software faster by using the memory subsystem more efficiently. Most of them apply to software that works with large datasets, but some apply to software working with data of any size.

Do you need to discuss a performance problem in your project? Or maybe you want a vectorization training for yourself or your team? Contact us.
Or follow us on LinkedIn, Twitter, or Mastodon and get notified as soon as new content becomes available.

Here is a list of all the posts on this topic published on Johnny’s Software Lab:

| Topic | Description | Posts |
| --- | --- | --- |
| Decreasing Total Memory Accesses | We speed up software by keeping data in registers instead of reloading it from the memory subsystem several times (sketch below). | Decreasing the Number of Memory Accesses 1/2; Decreasing the Number of Memory Accesses: The Compiler’s Secret Life 2/2 |
| Changing the Data Access Pattern to Increase Locality | By changing our data access pattern, we increase the likelihood that our data is in the fastest level of the data cache (sketch below). | For Software Performance, the Way Data is Accessed Matters! |
| Changing the Data Layout: Classes | Selecting a proper class data layout can improve software performance (sketch below). | Software Performance and Class Layout |
| Changing the Data Layout: Data Structures | By changing the data layout of common data structures, such as linked lists, trees, or hash maps, we can improve their performance. | Faster hash maps, binary trees etc. through data layout modification |
| Decreasing the Dataset Size | Memory efficiency can be improved by decreasing the dataset size, which brings speed improvements as well. | Memory consumption, dataset size and performance: how does it all relate? |
| Changing the Memory Layout | Whereas data layout is determined at compile time, memory layout is determined by the system allocator at runtime. We examine how changing the memory layout using custom allocators influences software performance. | Performance Through Memory Layout |
| Increasing Instruction-Level Parallelism | Some code cannot fully utilize the memory subsystem because of instruction dependencies. We investigate techniques that break dependencies and improve performance (sketch below). | Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP; Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code |
| Software Prefetching for Random Data Accesses | Explicit software prefetches tell the hardware that you will be accessing a certain piece of data soon. When used smartly, they can improve software performance (sketch below). | The pros and cons of explicit software prefetching |
| Decreasing TLB Cache Misses | The TLB is a small cache that speeds up the translation of virtual to physical memory addresses. In some cases, it can be the reason for poor performance. We investigate techniques for decreasing TLB cache misses (sketch below). | Speeding Up Translation of Virtual To Physical Memory Addresses: TLB and Huge Pages |
| Saving Memory Subsystem Bandwidth | In some cases, we don’t care about software performance, but we do care about being a good neighbor. We investigate techniques that make our software consume as few memory subsystem resources as possible. | Frugal Programming: Saving Memory Subsystem Bandwidth |
| Branch Prediction and Data Caches | We investigate the delicate interplay between branch prediction and the memory subsystem. | Unexpected Ways Memory Subsystem Interacts with Branch Prediction |
| Multithreading and the Memory Subsystem | We investigate how the memory subsystem behaves in the presence of multithreading and how that affects software speed (sketch below). | Multithreading and the Memory Subsystem |
| Low-Latency Applications | In some cases, we are more interested in low latency than in high throughput. We investigate techniques aimed at improving latency, either by modifying our programs or by reconfiguring the system. | Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache; Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms |
| Measuring Memory Subsystem Performance | We talk about the tools and metrics you can use to understand what is going on with the memory subsystem. | Measuring Memory Subsystem Performance |
| Other Topics | A few remaining topics related to memory subsystem optimizations that didn’t fit any of the other categories. | Memory Subsystem Optimizations – The Remaining Topics |
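To give a flavor of the techniques, here are a few minimal, self-contained sketches. They are simplified illustrations written for this overview, not excerpts from the posts, and all function names in them are made up. The first corresponds to decreasing the number of memory accesses: accumulating into a local variable that the compiler can keep in a register, instead of repeatedly updating a value through a pointer.

```cpp
#include <cstddef>
#include <vector>

// Summation that updates the result through a pointer on every iteration.
// Because *result may alias other memory, the compiler typically has to
// load and store it on each pass through the loop.
void sum_through_memory(const std::vector<int>& v, int* result) {
    *result = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        *result += v[i];   // read-modify-write through memory every iteration
    }
}

// The same summation using a local accumulator the compiler can keep in a
// register; memory is written only once, at the end.
void sum_in_register(const std::vector<int>& v, int* result) {
    int acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    *result = acc;
}
```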
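Next, a sketch of changing the data access pattern to increase locality, assuming a matrix stored row-major in a flat array: traversing it row by row touches consecutive addresses, while traversing it column by column jumps through memory with a large stride.

```cpp
#include <cstddef>
#include <vector>

// Column-by-column traversal of a row-major matrix: consecutive accesses are
// `cols` elements apart, so almost every access touches a new cache line.
long long sum_column_by_column(const std::vector<int>& m,
                               std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}

// Row-by-row traversal of the same matrix: consecutive accesses are adjacent
// in memory, so most of them hit in the fastest cache level.
long long sum_row_by_row(const std::vector<int>& m,
                         std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}
```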
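For the class layout topic, one of the simplest layout changes: ordering members from largest to smallest to remove alignment padding. The sizes in the comments assume a typical 64-bit ABI.

```cpp
// Members ordered so that the compiler must insert padding to keep `value`
// 8-byte aligned: typically 24 bytes in total.
struct PaddedRecord {
    char   flag;   // 1 byte + 7 bytes of padding
    double value;  // 8 bytes
    int    count;  // 4 bytes + 4 bytes of tail padding
};

// The same members, ordered from largest to smallest: typically 16 bytes, so
// more records fit into each cache line.
struct CompactRecord {
    double value;  // 8 bytes
    int    count;  // 4 bytes
    char   flag;   // 1 byte + 3 bytes of tail padding
};
```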
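The instruction-level parallelism topic can be illustrated with the multiple-accumulator trick: a single accumulator forms one long dependency chain, while several independent accumulators let the CPU overlap loads and additions. Note that this sketch reorders floating-point additions, which can slightly change the result.

```cpp
#include <cstddef>

// One accumulator: every addition depends on the result of the previous one,
// so the loop runs at the latency of a single dependency chain.
double sum_single_chain(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Four accumulators: four independent dependency chains that the out-of-order
// core can execute in parallel, keeping more memory requests in flight.
double sum_four_chains(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)   // remaining elements
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```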
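For explicit software prefetching, the typical pattern: data gathered through an index array defeats the hardware prefetcher, so we issue a prefetch a few iterations ahead. The sketch uses the GCC/Clang built-in __builtin_prefetch; the prefetch distance of 8 is an arbitrary starting point that needs tuning per workload.

```cpp
#include <cstddef>

// Gather-style summation: values[indices[i]] is effectively a random access,
// so the hardware prefetcher cannot help. We prefetch the element we will
// need a few iterations from now while working on the current one.
long long gather_sum(const int* values, const int* indices, std::size_t n) {
    constexpr std::size_t kPrefetchDistance = 8;  // tunable, workload-dependent
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n)
            __builtin_prefetch(&values[indices[i + kPrefetchDistance]]);
        total += values[indices[i]];
    }
    return total;
}
```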
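For the TLB topic, a Linux-only sketch: allocating a large buffer aligned to 2 MB and hinting the kernel to back it with transparent huge pages, so that one TLB entry covers 2 MB instead of 4 KB. madvise(MADV_HUGEPAGE) is only a hint, and the function name here is my own.

```cpp
#include <sys/mman.h>   // madvise, MADV_HUGEPAGE (Linux)
#include <cstdlib>      // posix_memalign, free
#include <cstddef>

// Allocate `bytes` of memory aligned to the 2 MB huge-page size and ask the
// kernel to back it with transparent huge pages. Fewer, larger pages mean
// fewer TLB entries are needed to cover the same dataset.
void* allocate_with_huge_pages(std::size_t bytes) {
    constexpr std::size_t kHugePageSize = 2 * 1024 * 1024;
    void* p = nullptr;
    if (posix_memalign(&p, kHugePageSize, bytes) != 0)
        return nullptr;
    madvise(p, bytes, MADV_HUGEPAGE);  // a hint; the kernel may ignore it
    return p;
}
```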
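Finally, for multithreading and the memory subsystem, a sketch of avoiding false sharing: per-thread counters packed next to each other share a cache line, and every update forces that line to bounce between cores. Padding each counter to a full cache line (64 bytes is assumed here) removes the problem.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Each counter gets its own cache line (64 bytes assumed), so threads that
// update different counters no longer invalidate each other's cache lines.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

void count_in_parallel(std::size_t num_threads, std::size_t iterations) {
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::size_t i = 0; i < iterations; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
}
```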

Any feedback on the material covered in these posts will be highly appreciated.

