instruction level parallelism Archives

Exposing More Parallelism Is the Hidden Reason Why Some Vectorized Loops Are Faster – Not Vectorization per se

February 26, 2026 | Ivica Bogosavljević | Low Level Performance, Performance, Vectorization | 2 Replies

I was preparing an article about Highway, a portable vectorization library by Google, so I ported a few examples from my vectorization workshop from AVX to Highway. One of the examples was a vectorized binary search. I assume most readers are familiar with simple binary search. It looks something like this: We take a lookup…


A story of a very large loop with a long instruction dependency chain

February 29, 2024 | Ivica Bogosavljević | Computational Performance, Low Level Performance, Performance, Vectorization


Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code

June 26, 2023 (updated July 31, 2023) | Ivica Bogosavljević | Memory Subsystem Performance, Performance

We investigate techniques for hiding memory latency on in-order CPU cores, the same techniques that compilers employ.

When an instruction depends on the previous instruction depends on the previous instructions… : long instruction dependency chains and performance

September 24, 2022 (updated February 29, 2024) | Ivica Bogosavljević | Computational Performance, Low Level Performance, Performance

This post has a second part, in which the same problem is solved differently. In this post we investigate long dependency chains: when an instruction depends on the previous instruction, which in turn depends on the previous instruction… We want to see how long dependency chains lower CPU performance, and we want to measure the effect of interleaving…


Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP

June 19, 2022 (updated April 3, 2024) | Ivica Bogosavljević | Low Level Performance, Memory Subsystem Performance, Performance

We talk about instruction-level parallelism: what it is, why it is important for your code’s performance, and how you can add instruction-level parallelism to improve the performance of your memory-bound program.
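
To make the idea concrete, here is a minimal sketch of what adding instruction-level parallelism to a memory-bound loop can look like. It is not code from the post: the function names `chase_one` and `chase_two` and the "next index" array layout are assumptions made up for illustration. In the first function every load depends on the result of the previous load, so the CPU can have only one memory access in flight. The second function walks two independent chains in the same loop, which gives the CPU two independent loads it can overlap.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical example: `next[i]` holds the index of the element that follows i.
// One dependency chain: the address of each load comes from the previous load,
// so the loads cannot overlap.
std::uint32_t chase_one(const std::vector<std::uint32_t>& next,
                        std::uint32_t start, std::size_t steps) {
    std::uint32_t i = start;
    for (std::size_t s = 0; s < steps; ++s) {
        i = next[i];
    }
    return i;
}

// Two interleaved chains: `a` and `b` do not depend on each other, so the
// two loads in each iteration are independent and can be in flight together.
std::pair<std::uint32_t, std::uint32_t> chase_two(
        const std::vector<std::uint32_t>& next,
        std::uint32_t start_a, std::uint32_t start_b, std::size_t steps) {
    std::uint32_t a = start_a;
    std::uint32_t b = start_b;
    for (std::size_t s = 0; s < steps; ++s) {
        a = next[a];
        b = next[b];
    }
    return {a, b};
}
```

Whether the interleaved version is actually faster depends on the data set size, the access pattern, and the CPU, so it should be measured rather than assumed.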