====== ScaLAPACK example1 ======

Chapter two of the ScaLAPACK user guide has a section describing an example program, which we use here.

Steps:

  - **get** - Download the example source file
  - **set** - Choose the compiler/library combination
  - **compile** - Compile and link from the source file
  - **submit** - Copy an openmpi template file and submit the job to run on a compute node

===== Get =====

The ScaLAPACK user guide example is fully contained in one Fortran file, ''example1.f''.

The following bash source file will set four variables, download the source file (if needed), and print information to confirm that the downloaded file matches the variables.
<file bash get>
base="example1"
file="$base.f"
let NPROC_ROWS=2
let NPROC_COLS=3
if [ ! -f $file ]; then
  wget http://www.netlib.org/scalapack/examples/$file
fi
grep DATA $file
echo "NPROW = $NPROC_ROWS NPCOL = $NPROC_COLS"
</file>
+ | |||
+ | This file must be sourced, so the variable values will be available in the next step. | ||
+ | <code text> | ||
+ | [dnairn@mills example1]$ . get | ||
+ | DATA NPROW / 2 / , NPCOL / 3 / | ||
+ | NPROW = 2 NPCOL = 3 | ||
+ | </ | ||
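The reason sourcing matters can be sketched with a tiny stand-in file (here called ''vars'', a hypothetical name, not the real ''get'' file): an executed script sets its variables in a child shell that exits, while a sourced file sets them in the current shell.

```shell
# A minimal sketch of sourcing vs. executing ('vars' is a hypothetical
# stand-in for the 'get' source file).
cat > vars <<'EOF'
base="example"
let NPROC_ROWS=2
EOF

unset base
bash vars                    # child shell: settings vanish when it exits
echo "after execute: base='$base'"

. ./vars                     # current shell: settings persist
echo "after source:  base='$base'"
```

Only the sourced form leaves ''base'' set for the next step.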
===== Set =====

There are several components in the ScaLAPACK library:
  - **BLACS** - Basic Linear Algebra Communication Subprograms
  - **PBLAS** - Parallel BLAS
  - **LAPACK** - Linear Algebra PACKage
  - **BLAS** - Basic Linear Algebra Subprograms

BLACS and PBLAS use a distributed-memory model of computing and must run in batch using MPI. Here we use ''openmpi''.
LAPACK is single threaded, but can get performance gains through optimized versions of BLAS. On Mills there are two ways to boost BLAS performance:
  - **FMA4** - use the Fused Multiply-Add hardware vector instructions.
  - **openMP** - use shared memory and multiple threads on multiple cores.
There is no one best way for all programs.

There are three VALET versions of ScaLAPACK, each built with a different compiler.

In addition to the VALET packages, each ''set'' file below chooses the libraries and compiler flags to use.

==== Set for gcc ====
<file bash set-gcc>
name=gcc
packages=scalapack
libs="-lscalapack -llapack -lblas"
f90flags=''
</file>

==== Set for gcc using atlas ====

Order of the packages is important, since the atlas library directory has an incomplete version of LAPACK.
<file bash set-gcc_atlas>
name=gcc_atlas
packages='
libs="
f90flags='
</file>

==== Set for gcc using ACML ====
<file bash set-gcc_acml>
name=gcc_acml
packages='
libs="
f90flags='
</file>

==== Set for intel using ACML ====
<file bash set-intel_acml>
name=intel_acml
packages=scalapack/
libs="
f90flags="
</file>

==== Set for intel using MKL ====
<file bash set-intel_mkl>
name=intel_mkl
packages=openmpi/
libs="
f90flags="
</file>

==== Set for PGI using ACML ====
<file bash set-pgi_acml>
name=pgi_acml
packages=scalapack/
libs="
f90flags='
</file>

===== Compile =====

This source file will use the shell variables set in the previous steps and add the VALET package before the compile command.
<file bash compile>
# $packages set to VALET packages
# $libs set to libraries
# $f90flags set to compiler flags
# $file set to source file name
# $base set to source file name base
vpkg_rollback all
vpkg_devrequire $packages

mpif90 $f90flags -show $file $LDFLAGS $libs
mpif90 $f90flags -o $base $file $LDFLAGS $libs
</file>

===== Submit =====
<file bash submit>
# $name set to name of job
# $packages set to VALET packages
# $NPROC_ROWS number of row processors
# $NPROC_COLS number of column processors
# $base set to base file name
let NPROC=$NPROC_ROWS*$NPROC_COLS
if [ ! -f "template.qs" ]; then
  sed "
/
  echo "new copy of template in template.qs"
fi
sed "

qsub $@ -N tst$name -l standby=1 -l h_rt=04:00:00 template.qs
</file>

===== Source file =====

To submit all jobs and wait before submitting the next:
<file bash sourceMe>
date "+start (%s) %c"
. get
. set-gcc; . compile && . submit -sync y
. set-gcc_atlas; . compile && . submit -sync y
. set-pgi_acml; . compile && . submit -sync y
. set-intel_mkl; . compile && . submit -sync y
. set-gcc_acml; . compile && . submit -sync y
. set-intel_acml; . compile && . submit -sync y
date "+end (%s) %c"
</file>

====== ScaLAPACK example2 ======

The appendix of the ScaLAPACK user guide describes a second, larger example program.

Example 2 is distributed as a compressed tar file. The expanded directory will contain a makefile with variables you must set:

| TESTINGdir | Directory that will contain the executable and data files |
| F77LOADER | Fortran loader (linker) command |
| F77LOADERFLAGS | Flags for the loader to find libraries, such as -L |
| LIBS | All the required libraries, such as -l |
| F77 | Fortran compiler |
| F77FLAGS | Fortran compiler flags |
| CC | C compiler |
| CCFLAGS | C compiler flags |
| CDEFS | Other C flags, such as -D or -I |

None of these values are set in the makefile itself, but its very first line is:
<code make>
include ../SLmake.inc
</code>
This is the place to assign all the above variables to values that work on Mills.

===== Include file =====

Here is a sample make include file that assumes you have chosen the library and openmpi version with VALET.

<code make>
TESTINGdir = ../tst-gcc
F77LOADER = mpif90
F77LOADFLAGS = $(LDFLAGS)
LIBS = -lscalapack -llapack -lblas
MAKE = /usr/bin/make
F77 = mpif90
F77FLAGS = -fast
CC = mpicc
CDEFS = $(CPPFLAGS)
</code>

===== Make session =====

To use the make file with the make include file, you use a VALET ''vpkg_devrequire'' command to set ''LDFLAGS'' and ''CPPFLAGS''.

Sample session:

<code text>
[(it_css:
Adding dependency `openmpi/
Adding package `scalapack/
</code>
<code text>
[(it_css:
</code>
<code text>
[(it_css:
mpif90 -L/
/
make[1]: Entering directory `/
cp SCAEX.dat ../tst-gcc
make[1]: Leaving directory `/
/
make[1]: Entering directory `/
cp SCAEXMAT.dat ../tst-gcc
make[1]: Leaving directory `/
/
make[1]: Entering directory `/
cp SCAEXRHS.dat ../tst-gcc
make[1]: Leaving directory `/
[(it_css:
</code>

====== Scalapack linsolve benchmark ======

===== Fortran 90 source code =====

We base this benchmark on the ''linsolve.f90'' program.

We get the program with
<code bash>
if [ ! -f "linsolve.f90" ]; then
wget http://
fi
</code>

This program reads one line to start the benchmark. The input must contain 5 numbers:
  * N: order of linear system
  * NPROC_ROWS: number of rows in process grid
  * NPROC_COLS: number of columns in process grid
  * ROW_BLOCK_SIZE: row block size
  * COL_BLOCK_SIZE: column block size

The block sizes control how the matrix is distributed over the process grid.

For this benchmark we will set the block sizes and derive the process grid from ''N''.
<code bash>
let N=3000
let ROW_BLOCK_SIZE=500
let COL_BLOCK_SIZE=500
let NPROC_ROWS=$N/$ROW_BLOCK_SIZE
let NPROC_COLS=$N/$COL_BLOCK_SIZE
echo "$N $NPROC_ROWS $NPROC_COLS $ROW_BLOCK_SIZE $COL_BLOCK_SIZE"
</code>

<note tip>
To allow larger blocks you could extend the two MAX parameters in the ''linsolve.f90'' source:

  MAX_VECTOR_SIZE from 1000 to 2000
  MAX_MATRIX_SIZE from 250000 to 1000000

To accommodate these larger sizes some of the FORMAT statements should have I4 instead of I2 and I3.
</note>

===== Compiling =====

First set the variables

  * $packages set to VALET packages
  * $libs set to libraries
  * $f90flags set to compiler flags

Since this test is completely contained in one Fortran 90 program, you can compile, link, and load with one command.

<code bash>
vpkg_rollback all
vpkg_devrequire $packages

mpif90 $f90flags -o solve linsolve.f90 $LDFLAGS $libs
</code>

+ | |||
+ | Some version of the '' | ||
+ | |||
+ | |||
+ | | ||
+ | |||
===== Grid engine script file =====

You must run the ''solve'' executable in batch using MPI. To do this you need a script, which we will copy from the openmpi template directory.

  * $MY_EXEC: set to ''solve''
  * NPROC: set to the number of MPI processes
  * the vpkg_require line includes the VALET packages for the benchmark.

For example, with the variables from one of the ''set'' files:
<code bash>
let NPROC=$NPROC_ROWS*$NPROC_COLS
if [ ! -f "template.qs" ]; then
  sed -e '
/
  echo "new copy of template in template.qs"
fi
sed "
</code>

The file ''template.qs'' is only copied if it does not already exist.
Also ''solve'' must be compiled before the job is submitted.

<note tip>
There is only one executable, ''solve'', shared by all the tests.
</note>

===== Submitting =====

There is only one linear system solve, and it should take just a few seconds.
<code bash>
qsub -N $name$N -l standby=1 -l h_rt=04:00:00 template.qs
</code>

===== Tests =====

==== gcc ====

<code bash>
name=gcc
packages=scalapack/
libs="-lscalapack -llapack -lblas"
f90flags=''
</code>

==== gcc and atlas ====

<code bash>
name=gcc_atlas
packages='
libs="
f90flags=''
</code>

The documentation in the atlas installation directory describes the libraries provided.

Also from the same documentation:
<code text>
ATLAS does not provide a full LAPACK library.
</code>

This means the order in which the VALET packages are added is important.

But this may not be optimal:
<code text>
Just linking in ATLAS'
performance,
of ATLAS'
</code>

With these variables set and
<code text>
packages='
</code>
we get errors:

<code text>
...
/
xerbla.f:
...
</code>

Explanation:

<code bash>
find /usr -name libg2c.a
</code>
<code text>
find: `/
/
/
</code>
To remove these errors, change:
<code bash>
libs="
</code>

New errors:
<code text>
...

...
</code>

Explanation:

<code bash>
nm -g /
</code>
<code bash>
nm -g /
</code>
<code text>
                 U slarnv_
0000000000000000 T slarnv_
                 U slarnv_
                 U slarnv_
</code>

No output from the first ''nm'' command means the symbol is not defined in that library.

You can copy the full atlas directory into your working directory and then follow the directions in the documentation under the heading:
<code text>
 **** GETTING A FULL LAPACK LIB ****
</code>

We call this library ''myatlas''.

==== gcc and myatlas ====

<code bash>
name=gcc_myatlas
packages='
libs="
f90flags=''
</code>

This requires a copy of atlas in your own directory.
You need to build your own copy of ''liblapack.a'', merging the full reference LAPACK into ATLAS' incomplete one.

Assuming you do not have a ''myatlas'' directory:
<code bash>
cp -a /
ar x lib/
cp /
ar r lib/
rm *.o
cp /
</code>

Now you have a ''myatlas'' directory with a complete ''liblapack.a''.

==== gcc and myptatlas ====

<code bash>
name=gcc_myptatlas
packages='
libs="
f90flags=''
</code>

Parallel threads will dynamically use all the cores available at compile time (24), but only if the problem size indicates they will help.

==== pgi and acml ====

<code bash>
name=pgi_acml
packages=scalapack/
libs="
f90flags=''
</code>

==== intel and mkl ====

<code bash>
name=intel_mkl
packages=openmpi/
libs="
f90flags="
</code>

===== Results N=4000 =====

==== BLOCK=1000, NPROCS=16 ====

Each test is repeated three times.
^ File name ^ Time ^
| gcc4000.o91943 | Elapsed time = 0.613728D+05 milliseconds |
| gcc4000.o92019 | Elapsed time = 0.862935D+05 milliseconds |
| gcc4000.o92030 | Elapsed time = 0.826695D+05 milliseconds |
| gcc_atlas4000.o91945 | Elapsed time = 0.386161D+04 milliseconds |
| gcc_atlas4000.o92023 | Elapsed time = 0.433195D+04 milliseconds |
| gcc_atlas4000.o92035 | Elapsed time = 0.424980D+04 milliseconds |
| gcc_myatlas4000.o92009 | Elapsed time = 0.448106D+04 milliseconds |
| gcc_myatlas4000.o92026 | Elapsed time = 0.461706D+04 milliseconds |
| gcc_myatlas4000.o92032 | Elapsed time = 0.441593D+04 milliseconds |
| intel_mkl4000.o91611 | Elapsed time = 0.222194D+05 milliseconds |
| intel_mkl4000.o92016 | Elapsed time = 0.215223D+05 milliseconds |
| intel_mkl4000.o92039 | Elapsed time = 0.214088D+05 milliseconds |
| pgi_acml4000.o91466 | |
| pgi_acml4000.o92017 | |
| pgi_acml4000.o92040 | |

==== BLOCK=800, NPROCS=25 ====

Each test is repeated three times.
^ File name ^ Time ^
| gcc4000.o92335 | Elapsed time = 0.638246D+05 milliseconds |
| gcc4000.o92386 | Elapsed time = 0.633060D+05 milliseconds |
| gcc4000.o92412 | Elapsed time = 0.629561D+05 milliseconds |
| gcc_atlas4000.o92336 | Elapsed time = 0.314615D+04 milliseconds |
| gcc_atlas4000.o92389 | Elapsed time = 0.358208D+04 milliseconds |
| gcc_atlas4000.o92413 | Elapsed time = 0.334147D+04 milliseconds |
| gcc_myatlas4000.o92337 | Elapsed time = 0.363176D+04 milliseconds |
| gcc_myatlas4000.o92390 | Elapsed time = 0.306922D+04 milliseconds |
| gcc_myatlas4000.o92414 | Elapsed time = 0.333779D+04 milliseconds |
| intel_mkl4000.o92339 | Elapsed time = 0.433877D+05 milliseconds |
| intel_mkl4000.o92393 | Elapsed time = 0.400862D+05 milliseconds |
| intel_mkl4000.o92417 | Elapsed time = 0.409855D+05 milliseconds |
| pgi_acml4000.o92338 | Elapsed time = 0.234248D+04 milliseconds |
| pgi_acml4000.o92392 | Elapsed time = 0.276856D+04 milliseconds |
| pgi_acml4000.o92415 | Elapsed time = 0.211567D+04 milliseconds |
==== BLOCK=500, NPROCS=64 ====

Each test is repeated three times.
^ File name ^ Time ^
| gcc4000.o92123 | Elapsed time = 0.284893D+05 milliseconds |
| gcc4000.o92144 | Elapsed time = 0.278744D+05 milliseconds |
| gcc4000.o92150 | Elapsed time = 0.289137D+05 milliseconds |
| gcc_atlas4000.o92130 | Elapsed time = 0.296471D+04 milliseconds |
| gcc_atlas4000.o92142 | Elapsed time = 0.264463D+04 milliseconds |
| gcc_atlas4000.o92148 | Elapsed time = 0.269103D+04 milliseconds |
| gcc_myatlas4000.o92133 | Elapsed time = 0.280457D+04 milliseconds |
| gcc_myatlas4000.o92138 | Elapsed time = 0.312135D+04 milliseconds |
| gcc_myatlas4000.o92153 | Elapsed time = 0.286337D+04 milliseconds |
| intel_mkl4000.o92134 | Elapsed time = 0.436288D+05 milliseconds |
| intel_mkl4000.o92140 | Elapsed time = 0.413780D+05 milliseconds |
| intel_mkl4000.o92152 | Elapsed time = 0.401095D+05 milliseconds |
| pgi_acml4000.o92137 | Elapsed time = 0.234475D+04 milliseconds |
| pgi_acml4000.o92145 | Elapsed time = 0.214514D+04 milliseconds |
| pgi_acml4000.o92149 | Elapsed time = 0.293480D+04 milliseconds |

==== BLOCK=250, NPROCS=256 ====

Each test is repeated three times.
^ File name ^ Time ^
| gcc4000.o92164 | Elapsed time = 0.148302D+05 milliseconds |
| gcc4000.o92168 | Elapsed time = 0.144862D+05 milliseconds |
| gcc4000.o92317 | Elapsed time = 0.160144D+05 milliseconds |
| gcc_atlas4000.o92167 | Elapsed time = 0.785104D+04 milliseconds |
| gcc_atlas4000.o92171 | Elapsed time = 0.749285D+04 milliseconds |
| gcc_atlas4000.o92318 | Elapsed time = 0.798376D+04 milliseconds |
| gcc_myatlas4000.o92165 | Elapsed time = 0.797618D+04 milliseconds |
| gcc_myatlas4000.o92222 | Elapsed time = 0.792745D+04 milliseconds |
| gcc_myatlas4000.o92320 | Elapsed time = 0.720193D+04 milliseconds |
| intel_mkl4000.o92162 | Elapsed time = 0.636915D+05 milliseconds |
| intel_mkl4000.o92169 | Elapsed time = 0.733785D+05 milliseconds |
| intel_mkl4000.o92324 | Elapsed time = 0.653791D+05 milliseconds |
| pgi_acml4000.o92161 | Elapsed time = 0.740457D+04 milliseconds |
| pgi_acml4000.o92170 | Elapsed time = 0.733668D+04 milliseconds |
| pgi_acml4000.o92322 | Elapsed time = 0.769606D+04 milliseconds |

===== Summary =====
==== 4000 x 4000 matrix ====
=== Time to solve linear system ===

A randomly generated matrix is solved using ScaLAPACK with different block sizes.
The times are the average elapsed time in seconds, as reported by the benchmark output.
^ Test ^ N=4000 ^^^^
^ name ^ np=16 ^ np=25 ^ np=64 ^ np=256 ^
| [[#gcc|gcc]] | | | | |
| [[#gcc_and_atlas|gcc_atlas]] | | | | |
| [[#gcc_and_myatlas|gcc_myatlas]] | | | | |
| [[#intel_and_mkl|intel_mkl]] | | | | |
| [[#pgi_and_acml|pgi_acml]] | | | | |

There is not much difference between ''gcc_atlas'' and ''gcc_myatlas''.

=== Speedup ===

The speedup for the optimized libraries compared to the reference ''gcc'' build.

==== 16000 x 16000 matrix ====
=== Time to solve linear system ===

A randomly generated matrix is solved using ScaLAPACK with different block sizes.
The times are the average elapsed time in seconds, as reported by the benchmark output.
^ Test ^ N=16000 ^^^
^ name ^ np=16 ^ np=64 ^ np=256 ^
| [[#gcc|gcc]] | | | |
| [[#gcc_and_atlas|gcc_atlas]] | | | |
| [[#gcc_and_myatlas|gcc_myatlas]] | | | |
| [[#gcc_and_myptatlas|gcc_myptatlas]] | | | |
| [[#intel_and_mkl|intel_mkl]] | | | |
| [[#pgi_and_acml|pgi_acml]] | | | |

=== Speedup ===

Speedup for ATLAS, MKL and ACML compared to the reference GCC with no optimized library.

=== Time plot ===

Elapsed time for ATLAS, MKL and ACML.
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||