As I already mentioned in earlier posts, vectorization is the holy grail of software optimizations: if your hot loop is efficiently vectorized, it is pretty much running at the fastest possible speed. So, it is definitely a goal worth pursuing, under two assumptions: (1) that your code has a hardware-friendly memory access pattern and (2) that…
All posts in Performance
Making your program run faster: the key concepts of software performance
In this post we present key concepts of software performance engineering.
Why is quicksort faster than heapsort? And how to make them faster?
We try to answer the question of why quicksort is faster than heapsort and then we dig deeper into these algorithms’ hardware efficiency. The goal: making them faster.
When vectorization hits the memory wall: investigating the AVX2 memory gather instruction
For all the engineers who like to tinker with software performance, vectorization is the holy grail: if it vectorizes, this means that it runs faster. Unfortunately, many times this is not the case, and forcing vectorization by any means can result in lower performance. This happens when vectorization hits the memory wall: although…
What are premature optimizations?
In this post we try to answer the question “what are premature optimizations?”
Why do programs get slower with time?
We investigate why software gets slower as new features are added or the data set grows, and what you can do about it.
Loop Optimizations: taking matters into your hands
We try to answer two questions related to compiler optimizations: how can you help the compiler do a better job, and when does it make sense to do the compiler’s optimizations manually?
Loop Optimizations: how does the compiler do it?
We investigate the techniques your compiler employs to make your loops run faster.
The quest for the fastest linked list
Linked lists are celebrity data structures of software development. They are celebrities because every engineer has had something to do with them at some point in their career. They are used in many places: from low-level memory management in operating systems up to data wrangling and data filtering in machine learning. They promise a lot:…
Performance Tuning Contest: July 2021 edition
Denis Bakhvalov from Easyperf.net and I are organizing a performance tuning contest. We give you source code to optimize, and your task is to investigate it, find the performance bottlenecks, and fix them. In this edition, we are trying to speed up the KALDI open-source speech recognition toolkit. Click here if you wish to…