We investigate the unusual way the memory subsystem interacts with branch prediction and how this interaction shapes software performance.
We continue the investigation from the previous post, trying to measure how the memory subsystem affects software performance. We write small programs (kernels) to quantify the effects of cache lines, memory latency, the TLB, cache conflicts, vectorization and branch prediction.
We try to answer the question of why quicksort is faster than heapsort, and then we dig deeper into these algorithms’ hardware efficiency. The goal: making them faster.
In this article we investigate how branches influence the performance of code and what we can do to speed up our branch-heavy code.