Winter 2014 SPO600 Weekly Schedule
This is a summary/index table. Please follow the links in each cell for additional detail -- especially for the Deliverables column.
|Category||Weight||Due Dates / Notes|
|Communication||20%||Jan 31, Feb 28, March 31, April 13|
|Quizzes||10%||May be held during any class. A minimum of 5 one-page quizzes will be given. Lowest 3 scores will not be counted.|
|Labs||10%||See the weekly Deliverables sections.|
|Project work||60%||Feb 28, March 31, April 13|
Tuesday (Jan 7)
- Introduction to the Problem
- Most software is written in a high-level language which can be compiled into machine code for a specific architecture. However, there is a lot of existing code that contains some architecture-specific code fragments written in Assembly Language.
- Reasons for writing code in Assembly Language include:
- Atomic Operations
- Direct access to hardware features, e.g., CPUID registers
- Most of the historical reasons for including assembler are no longer valid. Modern compilers can out-perform most hand-optimized assembly code, atomic operations can be handled by libraries or compiler intrinsics, and most hardware access should be performed through the operating system or appropriate libraries.
- A new architecture has appeared: Aarch64, which is part of ARMv8. This is the first new computer architecture to appear in several years.
- There are over 1400 software packages/modules present in GNU/Linux systems which contain architecture-specific assembly language code. Most of these packages cannot be built on Aarch64 systems without modification.
- In this course, you will:
- Select two software packages from a list compiled by Steve McIntyre of Linaro. Each of the packages on this list contains assembly language code which is platform-specific.
- Prepare a fix/patch for the software so that it will run on 64-bit ARM systems (aarch64). This may be done at either of two levels:
- Port - Add additional assembly language code for aarch64 (basic solution).
- Make Portable - Remove architecture-specific code, replacing it with compiler intrinsics or high-level code so that the software will successfully build on multiple platforms.
- Benchmark - Prove that your changes do not cause a performance regression on existing platforms and, ideally, that they improve performance.
- Upstream your Code - Submit your code to the upstream (originating) software project so that it can be incorporated into future versions of the software. This will involve going through a code review to ensure that your code is compatible with and acceptable to the upstream community.
- Optional: You can participate in the Linaro Code Porting/Optimization contest. For details, see the YouTube video of Jon "maddog" Hall and Steve McIntyre at Linaro Connect USA 2013.
- Course details:
- Course resources are linked from the CDOT wiki, starting at http://zenit.senecac.on.ca/wiki/index.php/SPO600 (Quick find: This page will usually be Google's top result for a search on "SPO600").
- Coursework is submitted by blogging.
- Quizzes will be short (1 page) and will be held without announcement at any time. Your lowest three quiz scores will not be counted, so do not worry if you miss one or two.
- Course marks:
- 60% - Project Deliverables
- 20% - Communication (Blog and Wiki writing)
- 20% - Labs and Quizzes
- Friday classes will be held in an "Active Learning Classroom". You are encouraged to bring your own laptop to these classes.
- For more course information, refer to the SPO600 Weekly Schedule (this page), the Course Outline, and SPO600 Course Policies.
Friday (Jan 10)
Week 1 Deliverables
Tuesday (Jan 14)
Friday (Jan 17)
Week 2 Deliverables
Tuesday (Jan 21)
Friday (Jan 24)
Week 3 Deliverables
- Blog your conclusion to the SPO600 Assembler Lab
Tuesday (Jan 28)
Friday (Jan 31)
Week 4 Deliverables
- Reminder: Week 1-3 blog posts are due for marking on Friday, January 31.
- Blog about the Codebase Analysis Lab
Tuesday (Feb 4)
Platform-specific code is often used for Memory Barriers and Atomic Operations.
Memory Barriers ensure that memory accesses are sequenced so that multiple threads, processes, cores, or IO devices see a predictable view of memory.
- Leif Lindholm provides an excellent explanation of memory barriers.
- Blog series - I recommend this series, especially the introduction, as a very clear explanation of memory barrier issues.
- Presentation at Embedded Linux Conference 2010 (Note: Acquire/Release in C++11 and ARMv8 aarch64 appeared after this presentation).
- Memory Barriers - A Hardware View for Software Hackers - This is a highly-rated paper that explains memory barrier issues - as the title suggests, it is designed to describe the hardware origin of the problem to software developers. Despite the fact that it is an introduction to the topic, it is still very technical.
- ARM Technical Support Knowledge Article - In what situations might I need to insert memory barrier instructions? - Note that there are some additional mechanisms present in ARMv8 aarch64, including Acquire/Release.
- Kernel Documentation on Memory Barriers - discusses the memory barrier issue generally, and the solutions used within the Linux kernel. This is part of the kernel documentation.
- Acquire-Release mechanisms
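The release/acquire pairing described in the links above can be sketched with C11 stdatomic.h. This is a minimal, hypothetical example (the names payload, ready, publish, and try_consume are made up for illustration): a producer writes ordinary data and then sets a flag with a release store, and a consumer that sees the flag through an acquire load is guaranteed to see the data as well. Without that ordering, the compiler or a weakly-ordered processor such as ARM could let the consumer observe the flag before the data.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a plain payload plus an atomic "ready" flag. */
static int payload;
static atomic_bool ready = false;

/* Producer: write the payload, then publish it with a release store.
 * Release ordering guarantees that the write to payload is visible to any
 * thread that later observes ready == true through an acquire load. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: returns true and copies the payload into *out only if the
 * producer has published. The acquire load pairs with the release store,
 * so the read of payload cannot see a stale value. */
bool try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        *out = payload;
        return true;
    }
    return false;
}
```

In real code publish() and try_consume() would run on different threads. When compiling for aarch64, GCC typically implements these two calls with store-release and load-acquire instructions rather than a full barrier.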
Atomics are operations which must be completed in a single step (or appear to be completed in a single step) without potential interruption.
- Wikipedia has a good basic overview of the need for atomicity in the article on Linearizability.
- GCC provides intrinsics (built-in functions) for atomic operations, as documented in the GCC manual.
- The Fedora Project has some guidelines/recommendations for the use of these GCC builtins.
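A short sketch of the GCC atomic built-ins mentioned above, assuming GCC (Clang accepts the same __atomic names). The names counter, counter_add, and atomic_store_max are hypothetical. The first function is a plain atomic add; the second shows a compare-and-swap retry loop, the usual way to build an atomic operation that the hardware does not provide directly, such as "store the maximum".

```c
#include <stdbool.h>

/* Hypothetical shared counter that several threads might update. */
static long counter;

/* Atomically add delta to counter and return the new value.
 * The compiler picks a suitable instruction sequence for the target. */
long counter_add(long delta)
{
    return __atomic_add_fetch(&counter, delta, __ATOMIC_SEQ_CST);
}

/* Atomically raise *target to at least value using a compare-and-swap loop.
 * Returns the value that was in *target before any update. */
long atomic_store_max(long *target, long value)
{
    long observed = __atomic_load_n(target, __ATOMIC_RELAXED);
    while (observed < value &&
           !__atomic_compare_exchange_n(target, &observed, value,
                                        false, /* strong CAS */
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
        /* On failure, observed now holds the current value of *target;
         * the loop re-checks it and retries if it is still smaller. */
    }
    return observed;
}
```

Because these built-ins are understood by the compiler on every target it supports, code written this way builds on x86_64 and aarch64 without any architecture-specific assembler.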
Friday (Feb 7)
Hack Session: Potential Project Analysis
Select a project from the Winter 2014 SPO600 Software List and perform these steps:
- Edit that page to put your name in the "Claimed by" column.
- Investigate the package to determine:
- If the current version has been built for ARM (e.g., whether it exists in the Fedora aarch64 port; the fastest way to check is to use 'yum' inside the arm64 emulation environment on Ireland)
- What the platform-specific code in the software does
- Whether portable work-arounds exist
- The need for an aarch64 port or for platform-specific code elimination
- Opportunities for optimization
- The amount of work involved in porting and optimizing, and your skills for performing that work
- Based on the result of your investigation, decide on your interest in the project.
- Repeat until you have two packages.
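When investigating a package, platform-specific code usually shows up as inline assembly guarded by predefined compiler macros such as __x86_64__ or __aarch64__. The hypothetical byte-swap functions below (chosen only as a small illustration, not taken from any package on the list) show the two project levels: the Port approach adds an aarch64 branch next to the existing x86 one, while the Make Portable approach replaces the whole thing with a compiler built-in.

```c
#include <stdint.h>

/* Typical pattern: inline assembly for one architecture, selected by a
 * predefined compiler macro, with a C fallback for everything else. */
static inline uint32_t swap32_asm(uint32_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Original x86-only code path: BSWAP reverses the bytes in place. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#elif defined(__aarch64__)
    /* "Port" option: add the equivalent aarch64 instruction (REV).
     * %w selects the 32-bit view of each register. */
    uint32_t r;
    __asm__("rev %w0, %w1" : "=r"(r) : "r"(x));
    return r;
#else
    /* Any other architecture: plain C shifts and masks. */
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

/* "Make Portable" option: let the compiler choose the instruction.
 * __builtin_bswap32 is a GCC/Clang built-in available on every target. */
static inline uint32_t swap32_portable(uint32_t x)
{
    return __builtin_bswap32(x);
}
```

With optimization enabled, GCC typically emits the same single instruction for swap32_portable as the hand-written assembler on both x86 and aarch64, which is why the portable version is usually preferable.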
Week 5 Deliverables
- Blog about your two selected projects, including your detailed initial analysis of them.
- You may want to break this into a couple of posts - e.g., post about your first package while you're working on your second.
- Feel free to also blog about why you did not choose particular packages.
Tuesday (Feb 11)
- Architecture-specific code for Performance
- Sometimes assembler is used in a C/C++ program for performance. However, modern versions of C/C++ (such as C++11) and recent compilers provide portable ways of accessing high-performance processor capabilities, such as Single Instruction/Multiple Data (SIMD) instructions (known by marketing names such as SSE, Neon, MMX, 3DNow!, or AltiVec on various processors).
- Linaro engineer Matthew Gretton-Dann gave a good presentation on Porting and Optimizing Code for aarch64. The vectorization portion, beginning at 28:10, provides a good introduction to SIMD and autovectorization using GCC on aarch64 (note that the earlier portion of the presentation also includes good information about Atomics).
- Note that in the presentation above, Matthew takes the code beyond portability without straying into assembler (e.g., using compiler-specific, architecture-specific intrinsics). It is possible to achieve almost all of the performance gains without becoming arch-specific, and most of those can be attained without becoming compiler-specific as well.
- For full details on the SIMD instructions in aarch64, refer to the ARMv8 Instruction Set Overview, particularly section 5.7.
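To make the autovectorization idea concrete, here is a hedged sketch of a single-precision a*x + y loop (the function names saxpy and saxpy_v4 are hypothetical). The first version is plain C that GCC can often vectorize on its own at -O3 (and at -O2 in recent GCC releases); the second uses GCC vector extensions, which are compiler-specific but not architecture-specific, in the spirit of the presentation's point about getting SIMD performance without assembler.

```c
#include <stddef.h>
#include <string.h>

/* Plain C loop that GCC can usually auto-vectorize. restrict promises the
 * arrays do not overlap, so the compiler needs no runtime alias checks. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Explicit SIMD via GCC vector extensions: a 16-byte vector of four floats,
 * which maps onto SSE registers on x86-64 and Neon registers on aarch64. */
typedef float v4sf __attribute__((vector_size(16)));

void saxpy_v4(size_t n, float a, const float *restrict x, float *restrict y)
{
    v4sf va = {a, a, a, a};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf vx, vy;
        /* memcpy avoids assuming x and y are 16-byte aligned. */
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;           /* four multiply-adds at once */
        memcpy(y + i, &vy, sizeof vy);
    }
    for (; i < n; i++)               /* scalar tail for leftover elements */
        y[i] = a * x[i] + y[i];
}
```

Adding -fopt-info-vec-optimized to the GCC command line prints a note for each loop the vectorizer transformed, which is a quick way to check whether the plain loop was vectorized on both x86_64 and aarch64.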
Week 6 Deliverables
- Complete your analysis of your two selected software projects (if you haven't already) - see Week 5. Blog in detail about your findings.
- Identify the upstream communities that develop and maintain the software you have selected to work on. Figure out how they are structured, how they communicate, how code is maintained, and how patches are accepted. Introduce yourself to each of the two communities (one for each of the two software projects you have selected). Blog about your findings.
- Project Work
Tuesday (March 11)
- Status updates
- Update from Linaro Connect
- Discussion of useful tools
Friday (March 14)
- Comparison of Emulation
- Fast Model and Foundation Model
- Install and configure the Foundation Model
- Baseline Benchmarking
- Foundation Model
- ARM Fast Models - Note that "fast" here refers to the modelling approach, not execution speed!
Week 9 Deliverables
- Set up the Foundation Model
- Upstream your proposed code changes
- Blog about your work
Tuesday (March 18)
- Profiling with gprof
- Build with profiling enabled (gcc -pg option)
- Run the profile-enabled executable (it writes gmon.out in the current directory when it exits normally)
- Analyze the data in the gmon.out file
gprof nameOfBinary    # Displays text profile including call graph
gprof nameOfBinary | gprof2dot | dot -Tpng | display -    # Displays visualization of call graph
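To practice the gprof steps above before profiling a real package, a tiny hypothetical workload like this one is enough (count_primes and is_prime are made up for illustration). The noinline attribute keeps is_prime from being folded into its caller at -O2, so both functions show up separately in the flat profile and in the call graph.

```c
/* Deliberately naive primality test: trial division up to sqrt(n).
 * d <= n / d is equivalent to d * d <= n but cannot overflow. */
static __attribute__((noinline)) int is_prime(unsigned n)
{
    if (n < 2)
        return 0;
    for (unsigned d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Count the primes below limit; nearly all time is spent in is_prime,
 * so gprof's call graph should show count_primes -> is_prime. */
__attribute__((noinline)) unsigned count_primes(unsigned limit)
{
    unsigned count = 0;
    for (unsigned n = 0; n < limit; n++)
        count += (unsigned)is_prime(n);
    return count;
}
```

Add a main() that calls, for example, count_primes(2000000) and prints the result, build it with gcc -pg -O2 -o nameOfBinary, run ./nameOfBinary to produce gmon.out, and then run the gprof commands above.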
Friday (March 21)
- Gather baseline statistics for your software
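Baseline numbers are only useful if they are repeatable. One common approach, sketched here under the assumption of a POSIX system (clock_gettime with CLOCK_MONOTONIC), is to run the code under test many times and report the median rather than a single run. median_runtime_ns and sample_workload are hypothetical helpers, not part of any package; in practice sample_workload would be replaced by a call into the software being benchmarked.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Current time in nanoseconds from a monotonic clock, which is not
 * affected by adjustments to the system date and time. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Run fn `runs` times and return the median elapsed time in nanoseconds,
 * or -1.0 on bad input or allocation failure. The median is less sensitive
 * than the mean to one-off disturbances such as a cold cache on the first
 * run or another process stealing the CPU. */
double median_runtime_ns(void (*fn)(void), int runs)
{
    if (runs <= 0)
        return -1.0;
    double *samples = malloc((size_t)runs * sizeof *samples);
    if (!samples)
        return -1.0;
    for (int i = 0; i < runs; i++) {
        double start = now_ns();
        fn();
        samples[i] = now_ns() - start;
    }
    qsort(samples, (size_t)runs, sizeof *samples, cmp_double);
    double median = (runs % 2) ? samples[runs / 2]
                               : (samples[runs / 2 - 1] + samples[runs / 2]) / 2.0;
    free(samples);
    return median;
}

/* Hypothetical stand-in for the code under test. volatile prevents the
 * compiler from optimizing the loop away. */
void sample_workload(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 100000; i++)
        sink += i;
}
```

Running the same harness against the unmodified and the patched build on each platform gives directly comparable medians for the performance-regression check.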
Week 10 Deliverables
- Blog your baseline benchmark results