Fall 2021 SPO600 Project
SPO600 is a project-based course. Usually, this project involves porting or optimization of some piece of open source software on a particular architecture (typically AArch64). However, at this point in time, there are few viable opportunities for AArch64 optimization of open source software packages, as ARMv8 optimization is largely complete and ARMv9 systems are not yet available for testing.
As a result, the SPO600 project for this semester includes three pieces:
- A practical experiment in benchmarking,
- An investigation into the state of AArch64 optimizations in an open source software package, and
- An analysis of, and recommendations for, future ARMv9 SVE2 optimizations in that same open source software package.
Part 1: Benchmarking
Task: Perform the Algorithm Selection Lab.
Due: Sunday, December 5, 11:59 pm
Part 2: Analysis of an AArch64 SIMD Optimization
Select one of these open source packages:
Perform the following steps:
- Obtain the source code for the latest version of the software. You can use the latest released version or the development version (from a version control system, such as git or cvs).
- Locate the SIMD code within the software. Note that this might be assembler code in a .s or .S file, inline assembler, or C compiler intrinsics, and that the inline assembler or intrinsics might be set up through preprocessor macros.
- Find and examine the SIMD code for AArch64. Figure out how it works.
- Examine the SIMD code for other architectures. Focus on x86_64 implementations, including SSE/SSE2 (128 bit), AVX (256 bit), and AVX512 (512 bit) SIMD extensions.
- Examine the selection mechanisms for the SIMD code, including the compile-time mechanisms (for example, preprocessor directives such as #ifdef) and runtime mechanisms (such as CPU feature identification).
- Blog a thorough report of your review.
- Start with an introduction: Explain which package you selected, what that package does, why it is important, and how actively used and maintained that package is.
- Describe how SIMD is used within that package on various architectures. Which SIMD implementations are supported? Are they selected at compile-time, runtime, or both? Are they implemented using separate assembler source files, inline assembler, or intrinsics? Are macros used? Are there completely separate implementations for various architectures and SIMD implementations contained in separate files, or are they interleaved together in the source code?
- Explain what the SIMD code is used for (for example, linear algebra, multimedia acceleration, crypto, or compression). Estimate the importance of the SIMD code to the package.
- In the conclusion of the report, evaluate the use of SIMD in the package. Does the package take full advantage of SIMD? Is the code well-structured? How could it be improved?
Due: Sunday, December 12 at midnight (11:59 pm).
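The compile-time and runtime selection mechanisms mentioned in the Part 2 steps can be sketched in C roughly as follows. This is a minimal illustration and is not taken from any of the packages: the function names compile_time_simd and runtime_simd are hypothetical, the runtime check assumes Linux (getauxval) on AArch64 and the GCC/Clang CPU builtins on x86, and any other platform simply falls back to the compile-time answer.

```c
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP, and (on glibc) the HWCAP_* bits */
#endif

/* Compile-time selection: the preprocessor picks one branch based on
 * macros the compiler predefines for the target it is building for. */
const char *compile_time_simd(void) {
#if defined(__ARM_FEATURE_SVE2)
    return "sve2";
#elif defined(__ARM_FEATURE_SVE)
    return "sve";
#elif defined(__ARM_NEON)
    return "asimd";      /* AArch64 Advanced SIMD */
#elif defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "scalar";
#endif
}

/* Runtime selection: ask the running system which features the CPU has.
 * Hypothetical helper; real packages often wrap this in a dispatch table. */
const char *runtime_simd(void) {
#if defined(__aarch64__) && defined(__linux__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    (void)hwcap;  /* avoids an unused warning if no HWCAP_* macro is available */
#if defined(AT_HWCAP2) && defined(HWCAP2_SVE2)
    if (getauxval(AT_HWCAP2) & HWCAP2_SVE2) return "sve2";
#endif
#if defined(HWCAP_SVE)
    if (hwcap & HWCAP_SVE) return "sve";
#endif
#if defined(HWCAP_ASIMD)
    if (hwcap & HWCAP_ASIMD) return "asimd";
#endif
    return "scalar";
#elif (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return "avx512f";
    if (__builtin_cpu_supports("avx"))     return "avx";
    if (__builtin_cpu_supports("sse2"))    return "sse2";
    return "scalar";
#else
    /* No runtime probe assumed here: reuse the compile-time result. */
    return compile_time_simd();
#endif
}
```

When reading a real package, look for the same two patterns: blocks guarded by target macros, and an initialization routine that probes the CPU once and then selects a function pointer or code path.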
Part 3: Recommendations for SVE2 Enhancements
ARMv9 is on the horizon, and it includes an improved SIMD implementation called Scalable Vector Extension version 2 (SVE2).
For the open source package that you selected in Part 2, blog a recommendation on how the software should be extended to support SVE2. In particular, consider that the existing implementations are for fixed-width SIMD, while SVE2 is variable-width, and that AArch64 code will need to detect, at either compile-time or runtime, whether Advanced SIMD or SVE2 instructions should be used.
Due: Wednesday, December 15 at midnight (11:59 pm).
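To illustrate the fixed-width versus variable-width point in Part 3, here is a minimal sketch of a vector-length-agnostic loop using the ACLE SVE intrinsics from arm_sve.h. The function name add_arrays is hypothetical and does not come from any package. The SVE path is compiled only when the compiler defines __ARM_FEATURE_SVE (which assumes a build option that enables SVE or SVE2); otherwise a plain scalar loop is used. SVE2 builds on this same predicated, length-agnostic programming model.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit lanes per vector on this CPU,
     * so the same binary adapts to any hardware vector length. */
    for (uint64_t i = 0; i < n; i += svcntw()) {
        /* Predicate is true for lanes where i + lane < n (handles the tail). */
        svbool_t pg = svwhilelt_b32(i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
    }
#else
    /* Scalar fallback for builds without SVE enabled. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
#endif
}
```

Note that the loop never hard-codes a lane count, unlike a fixed-width implementation that steps by 4 floats for 128-bit vectors or 8 floats for 256-bit vectors, and that no separate tail loop is needed because the predicate masks off the unused lanes.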
Basic overview of SVE2 - broadly applicable
More detailed documentation - optional/may be useful
- ARM - SVE Programming Examples
- ARM - Introduction to C Language Extensions for SVE2 - documents the C Compiler Intrinsics for SVE2