Exposing More Parallelism Is the Hidden Reason Why Some Vectorized Loops Are Faster – Not Vectorization per se

I was preparing an article about Highway, a portable vectorization library by Google, so I ported a few examples from my vectorization workshop from AVX to Highway. One of the examples was a vectorized binary search. I assume most readers are familiar with simple binary search. It looks something like this: We take a lookup…

The messy reality of SIMD (vector) functions

We’ve discussed SIMD and vectorization extensively on this blog, and it was only a matter of time before SIMD (or vector) functions came up. In this post, we explore what SIMD functions are, when they are useful, and how to declare and use them effectively. A SIMD function is a function that processes more than…

Memory Subsystem Optimizations – The Remaining Topics

This is the last of the memory optimizations that we are covering on this blog. You can see the full list of all memory subsystem optimizations that we covered earlier here. The list is definitely worth a read for anyone who is trying to improve the performance of memory-intensive software. In this post, we are covering a few remaining optimization techniques…