Performance Trainings

We at Johnny’s Software Lab organize performance trainings, aimed at software developers who wish to make their programs faster.

Vectorization Workshop

Summary: In this one-day training, we explore manual vectorization (SIMD) as a means of speeding up your code.

Target audience: developers in algorithmic trading; image, audio and video processing; embedded systems; machine learning; scientific computing and game development.

Many modern CPUs, including those in desktop, server and embedded systems (e.g. mobile phones), have special vector processing units. These units can process more than one piece of data in a single instruction. Using these instructions can result in a speedup of several times.
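To make "more than one piece of data in a single instruction" concrete, here is a minimal sketch of an element-wise array addition. The function name add_arrays is hypothetical and not taken from the workshop material. The sketch assumes AVX (8 floats per instruction) when the compiler defines __AVX__, NEON (4 floats per instruction) when it defines __ARM_NEON, and plain scalar code otherwise.

```cpp
#include <cstddef>

#if defined(__AVX__)
  #include <immintrin.h>   // x86 AVX intrinsics
#elif defined(__ARM_NEON)
  #include <arm_neon.h>    // ARM NEON intrinsics
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for every i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // One AVX add processes 8 floats at once.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   // unaligned load of 8 floats
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#elif defined(__ARM_NEON)
    // One NEON add processes 4 floats at once.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar loop: handles the leftover elements, or everything when
    // neither instruction set is enabled at compile time.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With GCC or Clang on x86, the AVX path is typically enabled with a flag such as -mavx. A loop this simple is usually vectorized by the compiler on its own, so it only serves to show what the vector instructions look like.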

Although compilers can automatically emit SIMD instructions in some circumstances, in many cases this is not possible and the vectorization needs to be done manually. This workshop is all about that: using explicit vectorization APIs to profit from vectorization on code that the compiler couldn’t vectorize itself.
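As an illustration, a loop with a data-dependent early exit is a commonly cited case that compilers have traditionally struggled to vectorize automatically. Whether a particular compiler version handles it is an assumption to verify, not something stated here. The sketch below uses a hypothetical function find_first. It checks 8 floats per iteration with AVX when __AVX__ is defined and lets a scalar loop pinpoint the exact position.

```cpp
#include <cstddef>

#if defined(__AVX__)
  #include <immintrin.h>   // x86 AVX intrinsics
#endif

// Hypothetical helper: returns the index of the first element equal to key,
// or n if there is no such element.
std::size_t find_first(const float* data, std::size_t n, float key) {
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 vkey = _mm256_set1_ps(key);            // key in all 8 lanes
    for (; i + 8 <= n; i += 8) {
        __m256 v  = _mm256_loadu_ps(data + i);
        __m256 eq = _mm256_cmp_ps(v, vkey, _CMP_EQ_OQ); // lane-wise ==
        int mask  = _mm256_movemask_ps(eq);             // one bit per lane
        if (mask != 0) {
            break;  // a match lies in data[i..i+7]; the scalar loop finds it
        }
    }
#endif
    // Scalar loop: locates the match inside the flagged block, handles the
    // tail, and is the whole implementation when AVX is not enabled.
    for (; i < n; ++i) {
        if (data[i] == key) {
            return i;
        }
    }
    return n;
}
```

The design choice here is to keep the vector part simple: it only answers "is there a match among these 8 elements?", and the exact index is resolved by at most 8 scalar comparisons.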

In this workshop we use three vectorization frameworks:

  • AVX compiler intrinsics, supported on Intel and AMD chips.
  • NEON compiler intrinsics, supported on ARM chips.
  • Google Highway vectorization library, a light wrapper around compiler intrinsics. This is the preferred way to do SIMD programming, as the library is portable, much more readable, and more maintainable.

This training consists of lectures and exercises.

For the training, we expect basic knowledge of C or C++. You don’t need to know anything about vectorization or SIMD to take part in this training.

We cover the following topics:

  • Introduction to vectorization: what it is and how it works internally.
  • Basic vector instructions: data loading and storing, arithmetic and logical operations, permutes and conditional instructions. We cover this for each of the three frameworks.
  • Vectorizing Loops: how to vectorize different types of loops, including various kinds of for, while and do/while loops.
  • Vectorization Ordering: when vectorizing, we can choose the order in which we access and process data. We introduce the two most common orderings: inner-loop vectorization and outer-loop vectorization. We also talk about custom vectorization ordering.
  • Vectorization Inhibitors: we talk about things that inhibit vectorization and about the techniques you can use to enable vectorization even in the presence of inhibitors. Removing inhibitors is part of the “preparation” before doing manual vectorization.
  • Advanced vectorization operations: this covers operations that are important for vectorizing various types of code, but that are most of the time not supported directly by hardware instructions. Instead, they need to be implemented by combining several vector instructions. For example, they make it possible to vectorize things such as quicksort, minmax, etc.
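As a small taste of the last topic, here is a minimal sketch of a minmax computation. The function name min_max is hypothetical, and the sketch assumes the input is non-empty and contains no NaN values. The lane-wise minimum and maximum map directly to single instructions, while collapsing a whole vector into one value (a horizontal reduction) has to be built from several steps. Here it is done the simplest way, by storing the vector to a small array.

```cpp
#include <cstddef>
#include <utility>

#if defined(__AVX__)
  #include <immintrin.h>   // x86 AVX intrinsics
#endif

// Hypothetical helper: returns {minimum, maximum} of data[0..n).
// Precondition (assumed): n > 0 and no NaN values in the input.
std::pair<float, float> min_max(const float* data, std::size_t n) {
    float lo = data[0];
    float hi = data[0];
    std::size_t i = 0;
#if defined(__AVX__)
    if (n >= 8) {
        __m256 vlo = _mm256_loadu_ps(data);   // running minimum per lane
        __m256 vhi = vlo;                     // running maximum per lane
        for (i = 8; i + 8 <= n; i += 8) {
            __m256 v = _mm256_loadu_ps(data + i);
            vlo = _mm256_min_ps(vlo, v);
            vhi = _mm256_max_ps(vhi, v);
        }
        // Horizontal reduction: spill the 8 lanes and combine them.
        float tlo[8], thi[8];
        _mm256_storeu_ps(tlo, vlo);
        _mm256_storeu_ps(thi, vhi);
        for (int k = 0; k < 8; ++k) {
            if (tlo[k] < lo) lo = tlo[k];
            if (thi[k] > hi) hi = thi[k];
        }
    }
#endif
    // Scalar loop: handles the tail, or the whole array without AVX.
    for (; i < n; ++i) {
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }
    return {lo, hi};
}
```

Calling min_max(values, count) returns a pair whose first member is the minimum and whose second member is the maximum. A faster reduction would use shuffles and further min/max instructions instead of the store; the store-based version is chosen here only for readability.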

As a participant, you will learn to:

  • Recognize the loops that are not automatically vectorized by the compiler.
  • Recognize the loops that are good candidates for vectorization.
  • Prepare the loop for manual vectorization.
  • Use Highway, AVX compiler intrinsics, or NEON compiler intrinsics to code the vectorized loop and make it several times faster.

For inquiries, use the contact page.