Details on the HPL Linpack benchmark run on the intel12 nodes (12 cores/node, total 25 nodes or 300 cores)
The HPL Linpack benchmark version was 2.0. It was compiled against OpenMPI version 1.4.2 and linked with the GotoBLAS2 library. The benchmark was run using the OpenMPI 1.4.4rc2 mpirun launcher, since this version has the --loadbalance option, which distributes ranks uniformly across all nodes. The best Linpack performance recorded was 2.37 TFlop/second.
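As a rough cross-check of the 2.37 TFlop/second figure, the rate can be recomputed from the problem size and wall time that HPL reports for the N = 173056 runs in the HPL.out listing on this page. The sketch below assumes the usual HPL operation count of 2/3 N^3 + 3/2 N^2 floating-point operations. The name hpl_gflops is illustrative and is not part of HPL.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflop/s for an N x N solve, assuming 2/3 N^3 + 3/2 N^2 operations."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# (N, NB, time in seconds, reported Gflops), copied from the N = 173056 rows of HPL.out
runs = [
    (173056, 90, 1486.97, 2.324e+03),
    (173056, 112, 1457.29, 2.371e+03),
    (173056, 128, 1482.51, 2.331e+03),
]

for n, nb, seconds, reported in runs:
    computed = hpl_gflops(n, seconds)
    print(f"NB={nb:>3}: computed {computed:8.1f} Gflops, reported {reported:8.1f} Gflops")

# Pick the row with the highest reported rate.
best = max(runs, key=lambda run: run[3])
print(f"best NB = {best[1]}, {best[3] / 1000:.2f} TFlop/second")
```

Because the times in HPL.out are printed to two decimals, the recomputed values should agree with the reported ones only to about the printed precision.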
The benchmark was launched from the job head node as follows:
$> /usr/local/openmpi/bin/mpiexec -np 312 --loadbalance ./xhpl
Contents of the HPL.dat file follow:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out              output file name (if any)
6                    device out (6=stdout,7=stderr,file)
3                    # of problems sizes (N)
10000 50000 173056   Ns
3                    # of NBs
90 112 128           NBs
0                    PMAP process mapping (0=Row-,1=Column-major)
1                    # of process grids (P x Q)
13                   Ps
24                   Qs
16.0                 threshold
1                    # of panel fact
2                    PFACTs (0=left, 1=Crout, 2=Right)
1                    # of recursive stopping criterium
4                    NBMINs (>= 1)
1                    # of panels in recursion
2                    NDIVs
1                    # of recursive panel fact.
1                    RFACTs (0=left, 1=Crout, 2=Right)
1                    # of broadcast
1                    BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1                    # of lookahead depth
1                    DEPTHs (>=0)
2                    SWAP (0=bin-exch,1=long,2=mix)
64                   swapping threshold
0                    L1 in (0=transposed,1=no-transposed) form
0                    U in (0=transposed,1=no-transposed) form
1                    Equilibration (0=no,1=yes)
8                    memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                              Number of additional problem sizes for PTRANS
1200 10000 30000               values of N
0                              number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64       values of NB
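Two quick sanity checks on these inputs can be sketched in a few lines: whether the P x Q process grid fits in the number of ranks given to mpiexec, and how much memory the coefficient matrix needs for each N. This is only an estimate under stated assumptions: 8 bytes per double-precision element, an even split of the matrix across the grid, and no allowance for HPL workspace or communication buffers. The helper names grid_fits and matrix_bytes are made up for this illustration.

```python
GIB = 2**30  # bytes per GiB


def grid_fits(num_ranks: int, p: int, q: int) -> bool:
    """True when num_ranks MPI ranks are enough to populate a P x Q grid."""
    return p * q <= num_ranks


def matrix_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Bytes needed for a dense N x N matrix (assumes 8-byte doubles by default)."""
    return n * n * bytes_per_element


num_ranks = 312   # from "-np 312" on the mpiexec command line
p, q = 13, 24     # Ps and Qs from HPL.dat

print(f"grid {p} x {q} = {p * q} ranks, fits in {num_ranks}: {grid_fits(num_ranks, p, q)}")

for n in (10000, 50000, 173056):   # the Ns line from HPL.dat
    total = matrix_bytes(n)
    # Assumption: the matrix is spread evenly over all P x Q ranks.
    per_rank = total / (p * q)
    print(f"N={n:>6}: matrix {total / GIB:8.2f} GiB total, {per_rank / GIB:6.3f} GiB per rank")
```

The largest problem size dominates the memory requirement, which is why the smaller N values finish in seconds and reach a much lower rate.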
Contents of the HPL.out file follow:
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 10000 50000 173056
NB : 90 112 128
PMAP : Row-major process mapping
P : 13
Q : 24
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       10000    90    13    24               3.26              2.044e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0035845 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       10000   112    13    24               2.60              2.565e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0043536 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       10000   128    13    24               2.65              2.516e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0040908 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50000    90    13    24              63.64              1.310e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0018813 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50000   112    13    24              62.97              1.323e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0023998 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50000   128    13    24              62.78              1.327e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0019518 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4      173056    90    13    24            1486.97              2.324e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0014705 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4      173056   112    13    24            1457.29              2.371e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0013976 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4      173056   128    13    24            1482.51              2.331e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0015291 ...... PASSED
================================================================================
Finished 9 tests with the following results:
9 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
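The scaled residual check described in the HPL.out header can be illustrated on a small dense system. The sketch below is not HPL's distributed solver: it swaps in a plain Gaussian elimination with partial pivoting written in pure Python, applied to a random 50 x 50 matrix, and then evaluates the same expression ||Ax-b||_oo / ( eps * ( ||x||_oo * ||A||_oo + ||b||_oo ) * N ). The matrix size, the random seed, and the function names solve and scaled_residual are all assumptions made for this illustration.

```python
import random
import sys


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    # Work on an augmented copy [A | b] so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: move the largest entry (in magnitude) of column k to row k.
        pivot_row = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[pivot_row] = m[pivot_row], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def scaled_residual(a, x, b, eps):
    """||Ax-b||_oo / (eps * (||x||_oo * ||A||_oo + ||b||_oo) * N)."""
    n = len(b)
    residual = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    norm_r = max(abs(v) for v in residual)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    norm_a = max(sum(abs(v) for v in row) for row in a)  # infinity norm: largest row sum
    return norm_r / (eps * (norm_x * norm_a + norm_b) * n)


# 2**-53 for IEEE double precision, which matches the 1.110223e-16 printed by HPL.
eps = sys.float_info.epsilon / 2

random.seed(0)  # arbitrary seed, only to make the sketch repeatable
n = 50
a = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [random.uniform(-0.5, 0.5) for _ in range(n)]

x = solve(a, b)
value = scaled_residual(a, x, b, eps)
print(f"scaled residual = {value:.7f}, passes threshold 16.0: {value < 16.0}")
```

A backward-stable solver should keep this quantity small regardless of N, which is why HPL can use the same threshold of 16.0 for every problem size in the run.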
Last Updated: 11/30/2011