AVX/NEON Vectorization Workshops - Johnny's Software Lab

UPCOMING WORKSHOPS
Online:
• AVX Vectorization Workshop: TBD. To express interest and be informed of the dates once they are known, e-mail info [at] johnnysswlab [dot] com.
• NEON Vectorization Workshop: TBD. To express interest and be informed of the dates once they are known, e-mail info [at] johnnysswlab [dot] com.

The AVX/NEON Vectorization Workshop is a two-day (or four half-day) intensive training designed for C/C++ developers who want to learn how to dramatically speed up their code using modern SIMD vector instructions. The course walks you through basic vectorization concepts and builds up to advanced patterns using AVX intrinsics (for x86/x64 platforms) or NEON intrinsics (for ARM platforms). No prior knowledge of vectorization is needed, just a solid grasp of C or C++. The workshop offers a mix of instruction and hands-on labs, and is available in public online sessions (in English or German), or privately on demand. More information is given below.

What Is Vectorization?

Vectorization is a performance optimization technique that uses SIMD (Single Instruction, Multiple Data) instructions to process multiple data elements with one operation. For example, instead of iterating over an array and multiplying items one by one, vectorization allows you to multiply several elements simultaneously by packing them into wide vector registers (256-bit registers with AVX on x86, 128-bit registers with NEON on ARM). This can dramatically increase performance, especially in compute-intensive loops and array processing.
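
To make the array-multiplication example concrete, here is a minimal sketch that scales a float array first with a plain loop and then with intrinsics. The function names scale_scalar and scale_simd are hypothetical and chosen for illustration only; this is not taken from the workshop materials. The intrinsic path is selected at compile time (AVX when the compiler targets it, e.g. with -mavx2 on g++; NEON on ARM), and the code falls back to the scalar loop otherwise.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX__)
#include <immintrin.h>   // AVX intrinsics (x86/x64)
#elif defined(__ARM_NEON)
#include <arm_neon.h>    // NEON intrinsics (ARM)
#endif

// Scalar baseline: one multiplication per loop iteration.
void scale_scalar(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Vectorized version (illustrative sketch): 8 floats per iteration with AVX,
// 4 floats per iteration with NEON, scalar loop for the leftover elements.
void scale_simd(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 f = _mm256_set1_ps(factor);      // broadcast factor to 8 lanes
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(data + i);     // unaligned load of 8 floats
        v = _mm256_mul_ps(v, f);                  // 8 multiplications at once
        _mm256_storeu_ps(data + i, v);            // unaligned store of 8 floats
    }
#elif defined(__ARM_NEON)
    const float32x4_t f = vdupq_n_f32(factor);    // broadcast factor to 4 lanes
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(data + i);      // load 4 floats
        v = vmulq_f32(v, f);                      // 4 multiplications at once
        vst1q_f32(data + i, v);                   // store 4 floats
    }
#endif
    // Tail elements, and the whole array when no SIMD path was compiled in.
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Both functions produce the same result; the array length does not need to be a multiple of the vector width because the final scalar loop handles the remainder.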

Who are these Workshops for?

The AVX Vectorization Workshop and the NEON Vectorization Workshop are designed for software developers and engineering teams who want to boost performance in C or C++ programs through vectorization. They are especially valuable for those working in domains such as high-performance computing, signal processing, multimedia, algorithmic trading, embedded systems, and scientific simulations, where every ounce of speed matters.

The AVX Vectorization Workshop and the NEON Vectorization Workshop are closely related but offered as two separate trainings, each focusing on a specific architecture.

Workshop Details

Prerequisites: C or C++ knowledge at an intermediate level. No previous knowledge of vectorization is required.
Duration: 2 days or 4 half-days
Agenda: see below
Hardware and software prerequisites: see below
Training materials:
- Slides for the workshop participants.
- Practical labs after most units of the workshop.
- Source code for all examples mentioned in the workshop.
Written language: English
Spoken language of the workshop: English or German

Types of Session

Public online sessions: see below for details on available dates and pricing.
Private online sessions: on request
Private onsite sessions: on request

Public Online Sessions

Public online sessions are organized when enough participants register.
Number of participants: at most 12.
We offer public sessions on two schedules:
- Two 8-hour sessions, each running 9 AM – 5 PM CET (Germany, France), 8 AM – 4 PM GMT (United Kingdom), 1:30 PM – 9:30 PM IST (India) and 4 PM – 12 AM CST (China).
- Four 4-hour sessions, each running 5 PM – 9 PM CET (Germany, France), 11 AM – 3 PM (New York) and 8 AM – 12 PM (California).
Registration:
- To register your interest in a workshop, send an e-mail to info [at] johnnysswlab [dot] com with your full name, the course you wish to take (AVX or NEON) and your preferred times.
- We will inform you about the dates of the workshop once enough people register. You will wait at most three months; after that, we will organize the workshop regardless of the number of participants.
Price
- International wire transfer:
  - Full price: € 749, $799
  - Discounted price: € 599, $639, for people who register for the workshop up to 30 days before the workshop starts.
- PayPal (you can pay with Visa or Mastercard)
  - Full price: € 809, $859
  - Discounted price: € 649, $689, for people who register for the workshop up to 30 days before the workshop starts.

Agenda

Most of the content of the AVX and NEON workshops overlaps. However, they are split into two workshops because of the considerable amount of new information participants need to take in.

AVX Workshop	NEON Workshop
A short introduction to vectorization
Introduction to AVX intrinsics	Introduction to NEON intrinsics
Advanced AVX intrinsics	Advanced NEON intrinsics
Basic vectorization patterns – vectorizing for loops, for loops with early exit, while loops and convergence loops
Common vectorization patterns – vectorizing loops with conditions, conditional counting, loops with structs and matrix transposition.
Vectorization inhibitors – learn to detect and remove obstacles that hinder efficient vectorization
Vectorization types according to data access pattern – there are several ways to vectorize; here we investigate inner-loop and outer-loop vectorization. For AVX we also talk about how to vectorize accesses to binary trees and hash maps.
Advanced vectorization patterns – we talk about how to vectorize copy_if, trees and lookup tables.
Memory performance – improve the performance of your vectorized code by better using the memory subsystem.
Peak performance – reach peak software performance by breaking instruction dependencies, avoiding register spills and cleverly using everything the hardware has to offer.
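
As an illustration of what "breaking instruction dependencies" in the last agenda item can mean, here is a minimal sketch in plain C++ without intrinsics. It is an assumption about the kind of technique covered, not an excerpt from the workshop, and the function names sum_single and sum_four are hypothetical. In the first function every addition has to wait for the previous one; the second keeps four independent partial sums, so the additions within one iteration do not depend on each other.

```cpp
#include <cassert>
#include <cstddef>

// Single accumulator: each addition depends on the result of the previous
// one, forming one long dependency chain through `sum`.
float sum_single(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

// Four accumulators: the four additions in one iteration are independent,
// so the CPU can overlap them instead of executing them strictly in order.
float sum_four(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);   // combine the partial sums
    for (; i < n; ++i) {                 // leftover elements
        sum += data[i];
    }
    return sum;
}
```

Because floating-point addition is not associative, the two functions can differ in the last bits for general input; they agree exactly when all intermediate sums are exactly representable, as with small integer values.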

Hardware and Software Prerequisites

AVX Workshop

For the AVX workshop you will need:

Any CPU supporting AVX2 vector extensions. These include:
- Intel
  - Haswell processors (Q2 2013) and newer, except models branded as Celeron and Pentium.
  - Pentium and Celeron branded processors starting with Tiger Lake (Q3 2020) and newer.
- AMD
  - Excavator processors (Q2 2015) and newer.
One of the following environments:
- Windows with MSVC version 2019 or later
- Ubuntu running on WSL (Windows Subsystem for Linux) with g++ compiler
- Ubuntu based Linux with g++ compiler

Macintosh is NOT supported at the moment.
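
If you are not sure whether your CPU is on the list above, a run-time check is one way to find out. The following is a minimal sketch under stated assumptions: the function name cpu_supports_avx2 is hypothetical, the g++/Clang path relies on the compiler builtin __builtin_cpu_supports, and the MSVC path reads CPUID leaf 7 directly and, as a simplification, does not verify that the operating system has enabled AVX state.

```cpp
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>   // __cpuid, __cpuidex (MSVC)
#endif

// Returns true when this sketch detects AVX2 support on the running CPU.
// On compilers or architectures it does not recognize it returns false,
// which only means "not detected", not "definitely unsupported".
bool cpu_supports_avx2() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);                    // leaf 0: EAX = highest supported leaf
    if (regs[0] < 7) {
        return false;
    }
    __cpuidex(regs, 7, 0);               // leaf 7, subleaf 0: extended features
    // Simplification (assumption): OS support for AVX state is not checked.
    return (regs[1] & (1 << 5)) != 0;    // EBX bit 5 = AVX2
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                // initialize the CPU feature data
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;                        // unknown platform: not detected
#endif
}
```

Calling this function from a small test program in your chosen environment (MSVC, or g++ on Ubuntu or WSL) and printing the result shows whether AVX2 was detected there.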

NEON Workshop

Unfortunately, we don’t provide hardware for our workshops at the moment. You can use an emulator if you don’t have the hardware.

To do the NEON workshop on real hardware, you will need one of the following:

An ARM 64-bit CPU with NEON extensions. These can be:
- Any embedded CPU with ARM64 and Linux, e.g. Raspberry Pi 3 or later. On a Linux ARM system, you can check by running: lscpu | grep -e asimd -e aarch64. The output of this command should look similar to:
  Architecture: aarch64
  Flags: fp asimd evtstrm crc32 cpuid
- A macOS device with Apple Silicon (M1 or later).
You should have g++ installed for Linux or Xcode for Mac.

To do the NEON workshop on an emulator, you will need:

- An x86-64 system with either Ubuntu based Linux OR Ubuntu running on Windows Subsystem for Linux (WSL)
- qemu-aarch64 for emulating the system (available in the Ubuntu repositories through the qemu-user package)
- aarch64-linux-gnu-g++ compiler (available in the Ubuntu repositories through the g++-aarch64-linux-gnu package)