Fall 2022 SPO600 Project

From CDOT Wiki
Revision as of 12:29, 11 November 2022 by Chris Tyler

This page describes the SPO600 project in the Fall 2022 semester.

Overview

The autovectorizer in gcc (and other compilers, such as llvm/clang) has become very good -- to the point that it is automatically enabled at optimization level -O2 (standard optimization level) in recent versions of gcc.

However, there are many different implementations of SIMD instructions on various CPUs -- on 64-bit Arm systems, there's Advanced SIMD, SVE, and SVE2; on x86, there's SSE, SSE2, AVX, AVX512, and more. It is desirable to be able to build a single binary that takes optimal advantage of the available CPU capabilities.

The gcc compiler provides a mechanism called ifunc (GNU indirect functions) that allows run-time selection of one of several different implementations of a function (or procedure, method, or subroutine). However, ifunc requires additional setup by the software developer.
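As a minimal sketch of that setup, the single-file example below hand-writes three variants of a hypothetical foo() that sums an array, plus a resolver that picks one when the program is loaded. The names foo_asimd, foo_sve, foo_sve2, and foo_resolver are assumptions for illustration. In the project, each variant would come from a separate autovectorized build of function.c rather than from identical C bodies. The sketch assumes gcc (or clang) on Linux with glibc. On non-aarch64 hosts the resolver simply falls back to the baseline variant.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP, AT_HWCAP2, HWCAP_* bits */
#endif

/* Hypothetical variants. In the real tool these would be the same source
   compiled with different -march settings; here the bodies are identical
   so that the example runs on any host. */
#define DEFINE_SUM(name)                                \
    int64_t name(const int32_t *a, size_t n) {          \
        int64_t s = 0;                                  \
        for (size_t i = 0; i < n; i++) s += a[i];       \
        return s;                                       \
    }
DEFINE_SUM(foo_asimd)
DEFINE_SUM(foo_sve)
DEFINE_SUM(foo_sve2)

typedef int64_t (*foo_fn)(const int32_t *, size_t);

/* Hypothetical resolver: called once, when the dynamic linker binds foo. */
static foo_fn foo_resolver(void)
{
#if defined(__aarch64__) && defined(HWCAP_SVE) && defined(HWCAP2_SVE2)
    unsigned long hwcap = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    if (hwcap2 & HWCAP2_SVE2)
        return foo_sve2;
    if (hwcap & HWCAP_SVE)
        return foo_sve;
#endif
    return foo_asimd;  /* baseline Advanced SIMD build */
}

/* Callers simply call foo(); the resolver's choice is bound at load time. */
int64_t foo(const int32_t *a, size_t n) __attribute__((ifunc("foo_resolver")));
```

Because the choice is bound once at load time, calls to foo() carry no per-call capability test. The variant naming, the resolver, and the ifunc declaration are the boilerplate that this project's tool is meant to produce automatically.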

The goal for this project is to produce a proof-of-concept tool that will take code that meets specific conditions and automatically build it with ifunc capability to select between multiple, autovectorized versions of a function, to take advantage of the best SIMD implementation available on the CPU on which the code is running.
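One way the tool could automate that glue is to emit the resolver and the ifunc declaration as C source text. The sketch below assumes that the tool already knows foo()'s return type and parameter list (extracting them from function.c is not shown) and that the three builds export the renamed symbols foo_asimd, foo_sve, and foo_sve2. Those symbol names and the helper emit_ifunc_glue() are hypothetical. The generated code is aarch64-only, because it relies on the HWCAP_SVE and HWCAP2_SVE2 constants from glibc's <sys/auxv.h> on that platform.

```c
#include <stdio.h>

/* Hypothetical helper: write ifunc glue for function `name` into buf.
   ret    - return type, e.g. "int"
   params - parameter list, e.g. "const int *a, int n"
   Returns what snprintf returns: the length the full text needs
   (excluding the NUL), or a negative value on error. */
int emit_ifunc_glue(char *buf, size_t len, const char *name,
                    const char *ret, const char *params)
{
    return snprintf(buf, len,
        "#include <sys/auxv.h>\n"
        "\n"
        "%s %s_asimd(%s);\n"
        "%s %s_sve(%s);\n"
        "%s %s_sve2(%s);\n"
        "\n"
        "static %s (*%s_resolver(void))(%s) {\n"
        "    if (getauxval(AT_HWCAP2) & HWCAP2_SVE2) return %s_sve2;\n"
        "    if (getauxval(AT_HWCAP) & HWCAP_SVE) return %s_sve;\n"
        "    return %s_asimd;\n"
        "}\n"
        "\n"
        "%s %s(%s) __attribute__((ifunc(\"%s_resolver\")));\n",
        ret, name, params,          /* foo_asimd prototype */
        ret, name, params,          /* foo_sve prototype   */
        ret, name, params,          /* foo_sve2 prototype  */
        ret, name, params,          /* resolver signature  */
        name, name, name,           /* resolver returns    */
        ret, name, params, name);   /* ifunc declaration   */
}
```

The tool would write this text to a file, compile it alongside main.c, and link it with the three variant objects. Generating source text keeps the tool independent of gcc internals, at the cost of needing the function's signature.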

Imagine that you have two source files:

main.c        # contains main() and possibly other functions
function.c    # contains one function named foo()

The file function.c will be built three times, each time using the autovectorizer, targeting different SIMD implementations for aarch64 (Advanced SIMD, SVE, and SVE2). The appropriate ifunc code will be inserted so that the correct build of the foo() function is executed based on the capabilities of the computer on which it runs.
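The three builds could be driven by gcc commands along the lines produced by the sketch below. It uses -O3 because gcc 11 (the version in the target environment) does not enable the autovectorizer at -O2. It uses -Dfoo=foo_<suffix> to rename the function in each object so that the three builds can be linked into one binary. The helper format_build_cmd(), the suffixes, and the object-file names are assumptions, not a prescribed design.

```c
#include <stdio.h>

/* One entry per build of function.c. The suffixes and -march strings
   are assumptions chosen for this sketch. */
struct target {
    const char *suffix;  /* appended to foo and to the object-file name */
    const char *march;   /* value passed to gcc -march= */
};

static const struct target targets[] = {
    { "asimd", "armv8-a"      },  /* Advanced SIMD is part of baseline Armv8-A */
    { "sve",   "armv8-a+sve"  },
    { "sve2",  "armv8-a+sve2" },
};

/* Hypothetical helper: format the gcc command that builds one variant.
   Returns the snprintf result (characters needed, excluding the NUL). */
int format_build_cmd(char *buf, size_t len, const struct target *t)
{
    return snprintf(buf, len,
                    "gcc -O3 -march=%s -Dfoo=foo_%s -c function.c -o function_%s.o",
                    t->march, t->suffix, t->suffix);
}
```

For the SVE entry this produces `gcc -O3 -march=armv8-a+sve -Dfoo=foo_sve -c function.c -o function_sve.o`. The tool would run each command, then compile main.c and the ifunc glue and link all of the objects together.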

Limitations

Since the goal of this project is to produce a proof-of-concept, these limitations are accepted:

  1. This tool only operates on aarch64 systems
  2. There are three targets of interest: machines with Advanced SIMD, SVE, and SVE2 capabilities.
  3. There are only two input source files: main.c, which contains main() (and optionally other functions), and function.c, which contains the function to be optimized
  4. Only function.c is built multiple times for different SIMD implementations
  5. The file function.c may only contain one function

Requirements

The finished project:

  • Can be written in any language that runs in the target environment, which is a 64-bit Arm system running Fedora 35 (such as israel.cdot.systems) with gcc 11.3.1. This means that the tool itself can be written in C, Python, Perl, bash, JS/Node, Haskell, or any other language available for that platform
  • Once started with the appropriate arguments, the tool must produce an output file that uses Advanced SIMD, SVE, or SVE2 instructions for the function contained in function.c, according to the capabilities of the platform on which it is run. Thus, if the code is executed directly on israel.cdot.systems, it will execute with Advanced SIMD (non-SVE) instructions only. If it is executed on israel.cdot.systems using the qemu-aarch64 emulation tool, it will use SVE2 instructions.
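The run-time choice described above depends only on two capability words that the Linux kernel reports to each process. As a sketch, the decision can be isolated in a pure function so that all three cases can be tested without SVE hardware. The bit positions are copied by hand from the Linux arm64 hwcap definitions (HWCAP_SVE is bit 22 of AT_HWCAP and HWCAP2_SVE2 is bit 1 of AT_HWCAP2). The SKETCH_ macro names and choose_variant() are hypothetical. On an aarch64 system, a resolver would obtain the two words with getauxval().

```c
/* Bit values mirror Linux's arm64 <asm/hwcap.h>; they are defined locally
   under hypothetical SKETCH_ names so that this compiles on any host. */
#define SKETCH_HWCAP_SVE   (1UL << 22)  /* bit in AT_HWCAP  */
#define SKETCH_HWCAP2_SVE2 (1UL << 1)   /* bit in AT_HWCAP2 */

enum simd_variant { VARIANT_ASIMD, VARIANT_SVE, VARIANT_SVE2 };

/* Hypothetical helper: pick the most capable variant the CPU reports.
   SVE2 is checked first because it extends SVE. */
enum simd_variant choose_variant(unsigned long hwcap, unsigned long hwcap2)
{
    if (hwcap2 & SKETCH_HWCAP2_SVE2)
        return VARIANT_SVE2;
    if (hwcap & SKETCH_HWCAP_SVE)
        return VARIANT_SVE;
    return VARIANT_ASIMD;  /* fallback: the baseline Advanced SIMD build */
}
```

With this split, a resolver reduces to mapping `choose_variant(getauxval(AT_HWCAP), getauxval(AT_HWCAP2))` onto the three function pointers, and the selection logic can be checked on any development machine.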

Test Code

To test your solution, use the code available at https://github.com/ctyler/spo600-fall2022-project-test-code as the input.

Project Stages

Stage 1

What is required:

  • Provide a plan for your project
    • Specify the language that you're going to use
    • Specify the overall operation of your project -- how you're going to approach the problem
    • Describe the challenges you expect to face as you implement the code
  • Submit your plan in one or more clear and detailed blog posts

Due: Sunday, November 18, 11:59 pm

Mark: 15%

Stage 2

What is required:

  • Provide the initial implementation of your project
  • The initial implementation must be able to produce a usable output binary that correctly uses the best available SIMD implementation, but it may have additional limitations or bugs. These limitations and bugs must be appropriately documented
  • Provide clear documentation on what the project does and how to test it
  • Submit the implementation as one or more blog posts linked to your code hosted appropriately (recommendation: place it in an accessible git repository)

Due: Sunday, December 4, 11:59 pm

Mark: 20%

Stage 3

What is required:

  • A final implementation of your project
  • This implementation should not have any limitations beyond those listed in the Limitations section above
  • Bonus points will be awarded if your project works well and is not subject to some of the limitations listed in the Limitations section above. For example, your project could work on aarch64 and x86_64 systems, or it could accept multiple function files
  • Bonus points will be awarded if your project has additional useful features, such as notifying the user if autovectorization could not be applied to the code
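For the suggested feature of notifying the user when autovectorization could not be applied, one approach is to add gcc's -fopt-info-vec-optimized option to each build, capture what it prints (on stderr by default), and check whether any loop was reported as vectorized. The sketch below performs only that check. It assumes that gcc reports each success with a message containing the phrase "loop vectorized", which matches the wording of recent gcc releases but may differ between versions. The helper count_vectorized_loops() is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: count how many times the captured output of
   -fopt-info-vec-optimized reports a vectorized loop. Assumes each
   success message contains the substring "loop vectorized". */
int count_vectorized_loops(const char *diag)
{
    static const char needle[] = "loop vectorized";
    int count = 0;
    for (const char *p = strstr(diag, needle); p != NULL;
         p = strstr(p + sizeof needle - 1, needle))
        count++;
    return count;
}
```

If the count is zero for a given target, the tool could warn that the corresponding build of function.c is effectively scalar, so selecting it at run time brings no SIMD benefit.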

Due: Wednesday, December 14, 11:59 pm

Mark: 25%