We continue the investigation from the previous post, trying to measure how the memory subsystem affects software performance. We write small programs (kernels) to quantify the effects of the cache line, memory latency, the TLB, cache conflicts, vectorization and branch prediction.
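To make the idea of a kernel concrete, here is a minimal sketch of the kind of measurement loop we have in mind (not one of the kernels used in this post): it sums one byte out of every `stride` bytes of a buffer much larger than the last-level cache. The buffer size, the set of strides and the timing code are illustrative choices only.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // A buffer far larger than a typical last-level cache, so the touched
    // cache lines have to come from memory rather than from the cache.
    constexpr size_t SIZE = 256u * 1024 * 1024;
    std::vector<unsigned char> buffer(SIZE, 1);

    for (size_t stride = 1; stride <= 512; stride *= 2) {
        unsigned long long sum = 0;

        auto start = std::chrono::steady_clock::now();
        // The kernel itself: touch one byte every `stride` bytes.
        for (size_t i = 0; i < SIZE; i += stride) {
            sum += buffer[i];
        }
        auto end = std::chrono::steady_clock::now();

        double ms = std::chrono::duration<double, std::milli>(end - start).count();
        // Printing the checksum keeps the compiler from discarding the loop.
        std::printf("stride %4zu: %8.2f ms (checksum %llu)\n", stride, ms, sum);
    }
    return 0;
}
```

While the stride stays below the cache line size (typically 64 bytes), every cache line of the buffer is still fetched, so the runtime falls much more slowly than the iteration count; once the stride exceeds the cache line size, each doubling roughly halves the number of lines fetched and the runtime starts to drop accordingly.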
All posts in Low Level Performance
The memory subsystem from the viewpoint of software: how memory subsystem affects software performance 1/3
In this post we investigate the memory subsystem of desktop, server and embedded systems from the software viewpoint. We use small kernels to illustrate various aspects of the memory subsystem and how it affects performance and runtime.
Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP
We talk about instruction-level parallelism: what it is, why it is important for your code’s performance, and how you can add it to improve the performance of your memory-bound program.
Memory consumption, dataset size and performance: how does it all relate?
We investigate how memory consumption, dataset size and software performance correlate…
Vectorization, dependencies and outer loop vectorization: if you can’t beat them, join them
As I already mentioned in earlier posts, vectorization is the holy grail of software optimizations: if your hot loop is efficiently vectorized, it is pretty much running at the fastest possible speed. So, it is definitely a goal worth pursuing, under two assumptions: (1) that your code has a hardware-friendly memory access pattern and (2) that…
Why is quicksort faster than heapsort? And how to make them faster?
We try to answer the question of why quicksort is faster than heapsort, and then we dig deeper into these algorithms’ hardware efficiency. The goal: making them faster.
When vectorization hits the memory wall: investigating the AVX2 memory gather instruction
For all the engineers who like to tinker with software performance, vectorization is the holy grail: if it vectorizes, it runs faster. Unfortunately, this is often not the case, and forcing vectorization by any means can result in lower performance. This happens when vectorization hits the memory wall: although…
Memory Access Pattern and Performance: the Example of Matrix Multiplication
We use the matrix multiplication example to investigate loop interchange and loop tiling as techniques to speed up programs that work with matrices.
Speeding up an Image Processing Algorithm
A post explaining how a few small changes in the right places can have a drastic effect on the performance of an image processing algorithm named Canny.
2-minute read: Class Size, Member Layout and Speed
We explore how a class’s size and the layout of its data members affect your program’s speed.