In this post we explore how to speed up our memory-intensive programs by decreasing the number of TLB cache misses.
In this post we introduce a few of the most common tools for debugging memory subsystem performance.
We continue the investigation from the previous post, measuring how the memory subsystem affects software performance. We write small programs (kernels) to quantify the effects of cache lines, memory latency, the TLB cache, cache conflicts, vectorization and branch prediction.
We investigate how memory consumption, dataset size and software performance correlate…