I recently installed ATLAS on my new workstation and I’ve been wondering what kind of performance gain I would get compared to the reference BLAS. After some searching, I finally found a BLAS benchmarking tool called BLASbench, which is part of the LLCbench suite. This tool has an unusual build process, so follow the instructions on the web page. You have to create a file called sys.def that contains the compiler and linker settings for your system.
Here’s the sys.def file I used for my Gentoo system:
# Linux-mpich sys.def

# Blasbench values
BB_CC = gcc
BB_F77 = gfortran
BB_LD = gcc
BB_CFLAGS = -O3 -Wall -DREGISTER -DINLINE
BB_LDFLAGS = $(BB_CFLAGS)
BB_LIBS = -lblas -lrt

# Cachebench values
CB_CC = $(BB_CC)
CB_CFLAGS = -O -Wall
CB_LDFLAGS = $(CB_CFLAGS)
CB_LIBS = -lrt

# MPbench values
MP_MPI_CC = mpicc
MP_CFLAGS = $(BB_CFLAGS)
MP_LIBS = -lrt
MPIRUNCMD = mpirun
MPIRUNPROCS = -np
MPIRUNPOSTOPTS = mpi_bench
Note that I haven’t tried the cache or MPI benchmarks, so the Cachebench and MPbench values may be wrong. Initially, I kept getting this linker error when I tried to build BLASbench:
gcc -O3 -Wall -DREGISTER -DINLINE -o vblasbench bb.o flushall.o timer.o -lblas -lrt
bb.o: In function `MAIN__':
bb.c:(.text+0xe): undefined reference to `s_stop'
collect2: ld returned 1 exit status
make[1]: *** [vblasbench] Error 1
Most of the Google hits for “undefined reference to `s_stop'” talked about linking issues with g77 vs gfortran. Since this is a brand new installation that has never had g77, that couldn’t be my issue. I solved the problem by deleting lines 109-127 from the file blasbench/bb.c. The deleted lines are shown below:
#if defined(ia64)
#else
/* Entry points to fool fortran linkers...*/
#ifdef __linux__
int MAIN__()
#endif
#if defined(__hppa) || defined(_HPUX_SOURCE)
int __main()
#endif
#if defined(__linux__) || defined(__hppa)
{
#if defined(__linux__) && defined(__GNUC__)
  /* Subroutine */ int s_stop();
  s_stop("", 0L);
#endif
  return(0);
}
#endif
#endif
This looks like a hack that was added to work around issues with g77. As far as I can tell, s_stop comes from the g77 runtime library (libg2c), which isn’t linked on a gfortran-only system, so the linker can’t resolve the reference. After deleting that block of code, I was able to build the code successfully.
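If you would rather script the edit than do it by hand, something along these lines should work. This is only a sketch: delete_lines is a helper name made up for illustration (it is not part of LLCbench), and the 109-127 range applies to the copy of bb.c described here, so print the range first and check that it really is the MAIN__/s_stop block in your copy before removing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of LLCbench): delete an inclusive line range
# from a file, keeping the untouched original as FILE.orig.
# Usage: delete_lines FILE FIRST LAST
delete_lines() {
    file=$1
    first=$2
    last=$3
    # Save a backup, then write the file back without the given lines.
    cp "$file" "$file.orig" || return 1
    sed "${first},${last}d" "$file.orig" > "$file"
}

# Inspect the range before deleting it; the line numbers are the ones from
# this post and may differ in other versions of bb.c:
#   sed -n '109,127p' blasbench/bb.c
# Then remove it:
#   delete_lines blasbench/bb.c 109 127
```

The backup makes it easy to undo the change: copy blasbench/bb.c.orig back over blasbench/bb.c.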
In Part 2, I’ll explain how to run BLASbench and compare the results I got for the reference BLAS implementation, BLAS-ATLAS and BLAS-ATLAS with threads.