AVX/NEON Vectorization Workshop

For software developers and companies who wish to learn how to write fast software, we offer two workshops:

  • The AVX vectorization workshop teaches you how to accelerate your programs using the AVX vector extensions available in most modern Intel and AMD CPUs.
  • The NEON vectorization workshop teaches you how to accelerate your programs using the NEON vector extensions available in most modern ARM CPUs.

We introduce vectorization from the ground up, starting with the most basic concepts and working up to very advanced topics.

Workshop Details

  • Prerequisites: C or C++ knowledge at an intermediate level. No previous knowledge of vectorization is required.
  • Duration: 2 days or 4 half-days
  • Agenda: see below
  • Hardware and software prerequisites: see below
  • Training materials:
    • Slides for the workshop participants.
    • Practical labs after most units of the workshop.
    • Source code for all examples mentioned in the workshop.
  • Written language: English
  • Spoken language of the workshop: English or German

Types of Session

Public Online Sessions

  • Public online sessions are organized when enough participants register.
  • Number of participants: at most 12.
  • We offer public sessions in two time formats:
    • Two 8-hour sessions, running 9 AM – 5 PM CET (Germany, France), 8 AM – 4 PM GMT (United Kingdom), 1:30 PM – 9:30 PM IST (India) and 4 PM – 12 AM CST (China).
    • Four 4-hour sessions, running 5 PM – 9 PM CET (Germany, France), 11 AM – 3 PM (New York) and 8 AM – 12 PM (California).
  • Registration:
    • To register your interest in a workshop, send an e-mail to info@johnnysswlab.com with your full name, the course you wish to take (AVX or NEON) and your preferred times.
    • We will inform you about the dates of the workshop once enough people register. You will wait three months at most; at that point we will organize the workshop regardless of the number of participants.
  • Price:
    • Full price: €749 or $799
    • Discounted price: €599 or $639. To be eligible for the discounted price, you will need to pay a deposit of €119 or $129 when registering. The remaining part (€480 or $510) is paid before the workshop begins.

Agenda

Most of the content of the AVX and NEON workshops overlaps. However, it is split into two workshops because of the considerable amount of new information participants need to take in.

Units that differ between the two workshops are listed as AVX Workshop / NEON Workshop; all other units are common to both.

  • A short introduction to vectorization
  • Introduction to AVX intrinsics / Introduction to NEON intrinsics
  • Advanced AVX intrinsics / Advanced NEON intrinsics
  • Basic vectorization patterns – vectorizing for loops, for loops with early exit, while loops and convergence loops.
  • Common vectorization patterns – vectorizing loops with conditions, conditional counting, loops with structs and matrix transposition.
  • Vectorization inhibitors – learn to detect and remove obstacles that hinder efficient vectorization.
  • Vectorization types according to data access pattern – there are several ways to vectorize; here we investigate inner-loop and outer-loop vectorization. For AVX we also talk about how to vectorize accesses to binary trees and hash maps.
  • Advanced vectorization patterns – we talk about how to vectorize copy_if, trees and lookup tables.
  • Memory performance – improve the performance of your vectorized code by making better use of the memory subsystem.
  • Peak performance – reach peak software performance by breaking instruction dependencies, avoiding register spills and cleverly using everything the hardware has to offer.
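To give a rough idea of what "vectorizing a for loop" with intrinsics means, here is a minimal sketch that is not taken from the workshop materials. The function name add_arrays is hypothetical. The AVX path is only compiled when the compiler defines __AVX__ (for example with -mavx on g++); otherwise the plain scalar loop does all the work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: returns the element-wise sum of a and b.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#if defined(__AVX__)
    // Vector part: process 8 floats per iteration with 256-bit registers.
    for (; i + 8 <= a.size(); i += 8) {
        __m256 va = _mm256_loadu_ps(a.data() + i);
        __m256 vb = _mm256_loadu_ps(b.data() + i);
        _mm256_storeu_ps(out.data() + i, _mm256_add_ps(va, vb));
    }
#endif
    // Scalar part: handles the leftover elements (or everything without AVX).
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

The unaligned load and store intrinsics are used so that the sketch works on any std::vector; the scalar tail loop is what makes it correct for sizes that are not a multiple of 8.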

Hardware and Software Prerequisites

AVX Workshop

For the AVX workshop you will need:

  • Any CPU supporting AVX2 vector extensions. These include:
    • Intel
      • Haswell processors (Q2 2013) and newer, except models branded as Celeron and Pentium.
      • Pentium and Celeron branded processors from Tiger Lake (Q3 2020) onwards.
    • AMD
      • Excavator processors (Q2 2015) and newer.
  • One of the following environments:
    • Windows with MSVC version 2019 or later
    • Ubuntu running on WSL (Windows Subsystem for Linux) with the g++ compiler
    • Ubuntu-based Linux with the g++ compiler

Macintosh is NOT supported at the moment.
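If you are not sure whether your CPU supports AVX2, the following sketch shows one way to check at run time. It is an illustration under stated assumptions, not part of the workshop materials. The function name cpu_has_avx2 is hypothetical. With g++ or clang on x86 it relies on the compiler built-in __builtin_cpu_supports; with MSVC it reads the AVX2 feature bit through __cpuidex and does not additionally verify operating system support; on any other platform it simply returns false.

```cpp
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#endif

// Hypothetical helper: reports whether the CPU advertises AVX2.
bool cpu_has_avx2() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 0);                  // regs[0] = highest supported CPUID leaf
    if (regs[0] < 7) return false;
    __cpuidex(regs, 7, 0);             // leaf 7, subleaf 0: extended features
    return (regs[1] & (1 << 5)) != 0;  // EBX bit 5 = AVX2
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;  // Not an x86 build: AVX2 is not applicable.
#endif
}
```

Call the function from a small main of your own and print the result. On Linux, a quicker check is to look for the avx2 flag in /proc/cpuinfo.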

NEON Workshop

Unfortunately, we don’t provide hardware for our workshops at the moment. You can use an emulator if you don’t have the hardware.

To do the NEON workshop on real hardware you will need:

  • An ARM 64-bit CPU with NEON extensions. Raspberry Pi 3 and later support this. On a Linux ARM system, you can check by running: lscpu | grep -e asimd -e aarch64. The output of this command should look similar to:
    Architecture: aarch64
    Flags: fp asimd evtstrm crc32 cpuid
  • A Linux operating system with g++ installed.

To do the NEON workshop on an emulator you will need:

  • An x86-64 system with either Ubuntu-based Linux OR Ubuntu running on Windows Subsystem for Linux (WSL)
  • qemu-aarch64 for emulating the system (available in the Ubuntu repositories through the qemu-user package)
  • aarch64-linux-gnu-g++ compiler (available in the Ubuntu repositories through g++-aarch64-linux-gnu package)
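A quick way to confirm that the emulator setup works is to cross-compile a tiny NEON program and run it under qemu-aarch64. The sketch below is an assumption-laden smoke test, not workshop material. The function name sum_array and the file name sum.cpp are hypothetical. The NEON path is compiled only for AArch64 targets; elsewhere the scalar loop is used, so the same file also builds with a native x86-64 g++.

```cpp
#include <cstddef>
#include <vector>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: returns the sum of all elements of v.
float sum_array(const std::vector<float>& v) {
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__aarch64__)
    // Accumulate 4 floats at a time in a 128-bit NEON register.
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= v.size(); i += 4) {
        acc = vaddq_f32(acc, vld1q_f32(v.data() + i));
    }
    // Horizontal add of the four lanes (AArch64-only intrinsic).
    total = vaddvq_f32(acc);
#endif
    // Scalar tail (or the whole array on non-ARM builds).
    for (; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}
```

After adding a main that calls sum_array and prints the result, one possible build and run sequence is aarch64-linux-gnu-g++ -O2 -static sum.cpp -o sum followed by qemu-aarch64 ./sum. Linking statically is a choice made here so that the emulator does not need to locate the ARM shared libraries.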

Macintosh is NOT supported at the moment, although Apple M1 and later chips support NEON.