Dhrystone howto

From CDOT Wiki

This page serves as a guide to running the Dhrystone benchmark on ARM machines. Please visit the Main Project Page.

About Dhrystone

From Wikipedia, the free encyclopedia

Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker, intended to be representative of system (integer) programming. Dhrystone grew to become representative of general processor (CPU) performance. The CPU89 benchmark suite from the Standard Performance Evaluation Corporation, today known as the "SPECint" suite, was introduced later, but SPEC programs are quite expensive whereas Dhrystone is free, so Dhrystone remains popular.

The name "Dhrystone" is a pun on a different benchmark algorithm called Whetstone.

With Dhrystone, Weicker gathered meta-data from a broad range of software, including programs written in FORTRAN, PL/1, SAL, ALGOL 68, and Pascal. He then characterized these programs in terms of various common constructs: procedure calls, pointer indirections, assignments, etc. From this he wrote the Dhrystone benchmark to correspond to a representative mix. Dhrystone was published in Ada, with the C version for Unix developed by Rick Richardson ("version 1.1") greatly contributing to its popularity.

The Dhrystone benchmark contains no floating point operations, thus the name is a pun on the then-popular Whetstone benchmark for floating point operations. The output from the benchmark is the number of Dhrystones per second (the number of iterations of the main code loop per second).

Both Whetstone and Dhrystone are synthetic benchmarks, meaning that they are simple programs that are carefully designed to statistically mimic the processor usage of some common set of programs. Whetstone, developed in 1972, originally strove to mimic typical Algol 60 programs based on measurements from 1970, but eventually became most popular in its Fortran version, reflecting the highly numerical orientation of computing in the 1960s.

Dhrystone Fundamentals

From ARM White Paper

Dhrystone has a number of attributes that have led to it being widely used in the past as a measure of CPU performance. Foremost, Dhrystone is compact, widely available in the public domain, and simple to run. Significantly, there are no lengthy certification processes to go through before citing Dhrystone figures. Dhrystone compares the performance of the processor under benchmark to that of a reference machine. This is an advantage over quoting ‘straight’ MIPS numbers since using a reference machine effectively compensates for differences in the richness of competing instruction sets. For example, literal comparison of the ‘millions of instructions per second’ numbers for a RISC architecture and a CISC architecture is not meaningful.

The industry has adopted the VAX 11/780 as the reference 1 MIPS machine. The VAX 11/780 achieves 1757 Dhrystones per second. The Dhrystone MIPS figure is calculated by measuring the number of Dhrystones per second for the system and dividing that by 1757. So "80 MIPS" means "80 Dhrystone VAX MIPS", that is, 80 times faster than a VAX 11/780. A DMIPS/MHz rating takes this normalization one step further, enabling comparison of processor performance at different clock rates. For all of these reasons, Dhrystone has in the past been a widely quoted benchmark figure. In theory, Dhrystone should provide a basis for comparing processor performance.

However, some of the apparent advantages of Dhrystone are also significant weaknesses of the benchmark. Dhrystone numbers actually reflect the performance of the C compiler and libraries, probably more so than the performance of the processor itself. Also, lack of independent certification means that customers are dependent on processor vendors to quote accurate and meaningful Dhrystone data.

What Dhrystone really does

From Clarify.doc (Included in Dhrystone 2.1), Rick Richardson

  • DHRYSTONE is a measure of processor+compiler efficiency in executing a 'typical' program. The 'typical' program was designed by measuring statistics on a great number of 'real' programs. The 'typical' program was then written by Reinhold P. Weicker using these statistics. The program is balanced according to statement type, as well as data type.
  • DHRYSTONE does not use floating point. Typical programs don't.
  • DHRYSTONE does not do I/O. Typical programs do, but then we'd have a whole can of worms opened up.
  • DHRYSTONE does not contain much code that can be optimized by vector processors. That is why a CRAY doesn't look real fast, they weren't built to do this sort of computing.
  • DHRYSTONE does not measure OS performance, as it avoids calling the O.S. The O.S. is indicated in the results only to help in identifying the compiler technology.
  • DHRYSTONE is not perfect, but is a hell of a lot better than the "sieve", or "SI".
  • DHRYSTONE gives results in dhrystones/second. Bigger numbers are better. As a baseline, the original IBM PC gives around 300-400 dhrystones/second with a good compiler. The fastest machines today are approaching 100,000.

Dhrystone Characteristics

Strengths

  • Written in C (portable across platforms)
  • Small (an easy-to-understand program)
  • A single, easy-to-report score (DMIPS, relative to the VAX 11/780 reference)
  • Potentially useful for benchmarking 8- and 16-bit microcontrollers

Weaknesses

  • Cannot hope to mimic the breadth of applications encountered by a processor-based system
  • Dhrystone measures only a few basic mathematical operations
  • Does not measure multiply-accumulate, floating-point, SIMD, or any other specialized operations
  • Dhrystone's execution time is largely spent in standard C library functions such as strcmp(), strcpy(), and memcpy(). Compiler vendors generally provide these libraries, which are typically optimized and hand-written in assembly language. While you may think you are benchmarking a processor, you are really benchmarking the compiler writer's optimizations of the C library functions for a particular platform.

Installation

1. Obtaining the Source Code

One of the most important defects in Dhrystone is that it is often unclear which version is being quoted. Furthermore, since there are no "disclosure rules" or independent certification of scores, companies and individuals are free to state, or not state, anything. Because Dhrystone is non-proprietary, individuals and companies have made their own modified versions, resulting in many alterations of the original source code.

The following package is the most widely quoted and used Dhrystone release. It is one of the cleanest and most customisable versions of Dhrystone available on the internet.

Dhrystone-2.1.tar.gz

2. Extract the file

Extract the tarball using the command:

tar xvf dhrystone-2.1.tar.gz -C destination_directory/

There will be a total of 19 files once extracted. Move to the directory where the extracted files are.

3. Edit the Makefile

Open the Makefile in any text editor, uncomment the following fields (if they are commented out), and set them to the values given:

Line #25: on Fedora, select -DTIME, which times the benchmark with time(2). This line is commented out by default.

TIME_FUNC=     -DTIME                # Use time(2) for measurement

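Line #28: HZ is the frequency of times(2) clock ticks, as the Makefile comment says, not a hardware clock speed, and it is only used with -DTIMES. With -DTIME (selected above), the program reads time(2) in whole seconds and HZ has no effect on the result. The value 166 shown here was taken from the BeagleBoard-xM's 166 MHz DDR memory clock; if you switch to -DTIMES, set HZ to the system's clock-tick rate instead (on Linux, the output of getconf CLK_TCK, usually 100).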

HZ=             166                  # Frequency of times(2) clock ticks

Line #39: optimization flags used when building with the generic C compiler

OPTIMIZE=       -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer # Optimization Level (generic UNIX)

Line #40: optimization flags used when building with GCC

GCCOPTIM=       -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer

Comment out/disable the following lines:

Line #26

TIME_FUNC=     -DTIMES                # Use times(2) for measurement

Line #38

OPTIMIZE=      -Ox -G2                 # Optimization Level (MSC, 80286)

[Image: Makefile snapshot (Dhry21.png)]


Compiler Optimization Options

Please see more about GCC ARM-Options

The flags on lines #39 and #40 optimize the Dhrystone build specifically for the ARMv7-A architecture (-march=armv7-a) and the Cortex-A8 core (-mtune=cortex-a8). Optimizations give the program a performance boost; removing them results in baseline, unoptimized performance.

4. Run "make"

Running make in the current directory should produce only warnings, not errors. Below is the output of the make command; the warnings relate to C library functions (malloc and strcpy) and can be ignored.

[mjeamiguel@cdot-beagleXM-0-3 dhrystone]$ make
gcc -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer  -DTIME -DHZ=166 dhry_1.c dhry_2.c  -o gcc_dry2
dhry_1.c:31: warning: conflicting types for built-in function ‘malloc’
dhry_1.c: In function ‘main’:
dhry_1.c:98: warning: incompatible implicit declaration of built-in function ‘strcpy’
gcc -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer  -DTIME -DHZ=166 -DREG=register dhry_1.c dhry_2.c  -o gcc_dry2reg
dhry_1.c:31: warning: conflicting types for built-in function ‘malloc’
dhry_1.c: In function ‘main’:
dhry_1.c:98: warning: incompatible implicit declaration of built-in function ‘strcpy’

Running the benchmark and gathering results

The make command produces two executables, gcc_dry2 and gcc_dry2reg. The author of this version chose to build two Dhrystone executables: gcc_dry2reg is compiled with -DREG=register so that it uses register variables, and gcc_dry2 is compiled without them. Either one works for the benchmark, so feel free to try both.


1. Run the executable by typing ./gcc_dry2

The program will ask for the number of runs. By convention, on newer machines (1 GHz or faster) about 100,000,000 (100 million) runs are used. There are no rules or standards for how many runs to use; some people calculate the number of runs from "Dhrystone run time errors" (a more advanced approach). If the number is too small, the program itself warns that the measured time is too small to obtain meaningful results and asks for more runs. What matters is the consistency of the result: run Dhrystone more than five times with the same number of runs and compare the scores.


2. Calculate DMIPS

A common way to report a Dhrystone result is DMIPS (Dhrystone MIPS). It is obtained by dividing the Dhrystone score by 1757, the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine.

Given the result:

Microseconds for one run through Dhrystone: 0.8
Dhrystones per Second: 1333333.4

Using the formula:

1333333.4 / 1757 = 758.87 DMIPS


The result shown is from an actual test on a BeagleBoard-xM with a 1 GHz processor, which corresponds to about 0.76 DMIPS/MHz (758.87 / 1000).

And...

From ARM White paper

"When first released, the Dhrystone benchmark fulfilled a useful function – at least it gave an alternative indicator to vendors’ literal MIPS ratings. However, more than twenty years later, there are undoubtedly better benchmarks available for measuring processor performance."