Welcome to Johnny’s Software Lab, a blog for all interested in fast software written in C and C++.
If your project is struggling to deliver because of performance issues, see the Consulting page for the ways we can help.
Featured posts
Memory Subsystem Performance
Posts related to improving software performance by better using the memory subsystem:
- The memory subsystem from the viewpoint of software: how memory subsystem affects software performance 1/3
- The memory subsystem from the viewpoint of software: how memory subsystem affects software performance 2/3
- Memory consumption, dataset size and performance: how does it all relate?
- Decreasing the Number of Memory Accesses 1/2
- Decreasing the Number of Memory Accesses: The Compiler’s Secret Life 2/2
- Software Performance and Class Layout
- For Software Performance, the Way Data is Accessed Matters!
- Faster hash maps, binary trees etc. through data layout modification
C++ Performance
Posts related to better usage of C++ language features:
Low-level optimizations
Posts related to better using the underlying hardware:
- Make your programs run faster by better using the data cache
- How branches influence the performance of your code and what can you do about it?
- Make your programs run faster: avoid function calls
- Make your programs run faster: avoid expensive instructions
- Making your program run faster in a multithreaded environment
Compilers, Toolchains and Performance
Posts related to improving the performance of your code by better using the compilers and toolchains:
Parallel Programming
Posts related to improving the performance of your code by using additional resources in the computer system:
Performance Analysis Tools
Posts related to analyzing software’s performance:
Debugging
Posts sorted by most recent
- Speeding Up Convergence Loops. Or, on Vectorization and Precision Control
- Latency-Sensitive Application and the Memory Subsystem Part 2: Memory Management Mechanisms
- Latency-Sensitive Applications and the Memory Subsystem: Keeping the Data in the Cache
- The pros and cons of explicit software prefetching
- A story of a very large loop with a long instruction dependency chain
- On Avoiding Register Spills in Vectorized Code with Many Constants
- Unexpected Ways Memory Subsystem Interacts with Branch Prediction
- Multithreading and the Memory Subsystem
- Speeding Up Translation of Virtual To Physical Memory Addresses: TLB and Huge Pages
- Faster hash maps, binary trees etc. through data layout modification
- Performance Through Memory Layout
- Measuring Memory Subsystem Performance
- Hiding Memory Latency With In-Order CPU Cores OR How Compilers Optimize Your Code
- Software Performance and Class Layout
- Horrible Code, Clean Performance
- Decreasing the Number of Memory Accesses: The Compiler’s Secret Life 2/2
- Decreasing the Number of Memory Accesses 1/2
- Frugal Programming: Saving Memory Subsystem Bandwidth
- Loop Optimizations: interpreting the compiler optimization report
- For Software Performance, the Way Data is Accessed Matters!
- What is faster: vec.emplace_back(x) or vec[x] ?
- When an instruction depends on the previous instruction depends on the previous instructions… : long instruction dependency chains and performance
- The memory subsystem from the viewpoint of software: how memory subsystem affects software performance 2/3
- The memory subsystem from the viewpoint of software: how memory subsystem affects software performance 1/3
- Instruction-level parallelism in practice: speeding up memory-bound programs with low ILP
- Memory consumption, dataset size and performance: how does it all relate?
- Crash course introduction to parallelism: Multithreading
- Vectorization, dependencies and outer loop vectorization: if you can’t beat them, join them
- Making your program run faster: the key concepts of software performance
- Why is quicksort faster than heapsort? And how to make them faster?
- When vectorization hits the memory wall: investigating the AVX2 memory gather instruction
- What are premature optimizations?
- Why do programs get slower with time?
- Loop Optimizations: taking matters into your hands
- Loop Optimizations: how does the compiler do it?
- The quest for the fastest linked list
- Performance Tuning Contest: July 2021 edition
- Debugging performance issues in kernel space: minor fault and major faults
- Hardware performance counters the easy way: quickstart likwid-perfctr
- Debugging performance issues in kernel space: system calls
- Memory Access Pattern and Performance: the Example of Matrix Multiplication
- “Premature optimization is the root of all evil”. Or is it?
- Flexibility and Performance
- Speedscope: visualize what your program is doing and where it is spending time
- Speeding up an Image Processing Algorithm
- The true price of virtual functions in C++
- 2-minute read: Class Size, Member Layout and Speed
- Performance Tuning Contest: February 2021 edition
- Crash course introduction to parallelism: SIMD Parallelism
- 2-minute read: How is Big O notation relevant on modern systems?
- Crash course introduction to parallelism: the algorithms
- Making your program run faster in a multithreaded environment
- 2-minute read: the Magic Touch of Parallel Algorithms
- 2-minute read: What is faster, std::endl or ‘\n’?
- Use explicit data prefetching to faster process your data structure
- Multitime: a small utility to measure your program’s runtime
- Excessive copying in C++ and your program’s speed
- Make your programs run faster: avoid expensive instructions
- Tune your program’s speed with profile guided optimizations
- Process polymorphic classes in lightning speed
- RR: The magic of record and replay debugging
- The price of dynamic memory: Memory Access
- The price of dynamic memory: Allocation
- Lessons in debugging: observe how programs interact with the Linux kernel with STRACE
- How branches influence the performance of your code and what can you do about it?
- CPU Dispatching: Make your code both portable and fast
- GDB: A quick guide to make your debugging easier
- Make your programs run faster: avoid function calls
- FlameGraphs: Understand where your program is spending time
- MOSH – a simple SSH replacement that works when network conditions are bad
- Link Time Optimizations: New Way to Do Compiler Optimizations
- Make your programs run faster by better using the data cache
- A story about error recovery a.k.a those boring recurring bugs and what to do about them