Differences

This shows you the differences between two versions of the page.

--- software:lapack:caviness [2020-03-05 14:24] – [Compiling with intel and mkl library] anita
+++ software:lapack:caviness [2021-04-27 16:21] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
+====== Compiling and testing LAPACK on Caviness ======
+The NAG site has a [[http://www.nag.com/lapack-ex/|collection of examples]] to test LAPACK drivers. A driver routine calls the necessary lower-level routines to solve one particular problem. For example, a real linear least squares problem is solved by the ''dgels'' driver. Driver routines may not be included in all LAPACK libraries, but you can download drivers from [[http://www.netlib.org/lapack/lug/node25.html|netlib driver routines]]. The source of a driver routine is useful for learning how to use the lower-level routines.
+===== Getting the example files =====
+Each example in the NAG collection has a source file, an input file, and an output result file, which should match your result.
+  * ''dgels-ex.f''  - The Fortran 77 source file
+  * ''dgels-ex.d''  - The input data file to be read from unit 5 (standard input)
+  * ''dgels-ex.r''  - Should match the output on unit 6 (standard output)
+You can use **''wget''** to get these with the script:
+<code bash>
+if [ ! -f "dgels-ex.f" ]; then
+  wget http://www.nag.com/lapack-ex/examples/source/dgels-ex.f
+  wget http://www.nag.com/lapack-ex/examples/data/dgels-ex.d
+  wget http://www.nag.com/lapack-ex/examples/results/dgels-ex.r
+else
+  touch "dgels-ex.f"
+fi
+</code>
+<note tip>You can just type the three **''wget''** commands in your terminal window, but it is a good idea to save them in a file for later reference. In this case, you should enclose them in a conditional **''if''** statement to avoid downloading a file you already have.</note>
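+The same guard can also be applied per file, so that an interrupted download is completed on the next run. The sketch below is one way to do that. The function names ''nag_urls'' and ''fetch_example'' are made up for this illustration, and the URL layout is assumed to follow the three ''wget'' commands above.

```bash
#!/bin/bash
# Sketch: fetch the three NAG files for one LAPACK driver example.
# nag_urls and fetch_example are hypothetical helper names.

base="http://www.nag.com/lapack-ex/examples"

# Print the source, data and results URLs for a driver such as "dgels".
nag_urls() {
  local name="$1"
  echo "$base/source/${name}-ex.f"
  echo "$base/data/${name}-ex.d"
  echo "$base/results/${name}-ex.r"
}

# Download each file only if it is not already in the current directory.
fetch_example() {
  local url file
  for url in $(nag_urls "$1"); do
    file="${url##*/}"   # strip everything up to the last slash
    if [ ! -f "$file" ]; then
      wget "$url"
    fi
  done
}

# Usage: fetch_example dgels
```

+Nothing is downloaded until ''fetch_example'' is called, so the file can be sourced safely and reused for other drivers in the collection.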
+===== Compiling with Intel and MKL library =====
+The [[https://software.intel.com/en-us/parallel-studio-xe/|Intel Parallel Studio XE]] includes a Fortran compiler and the MKL library. Use the VALET command **''vpkg_versions intel''** to find the latest version installed on Caviness.
+<note tip>**Versions:**
+You can get the package name and infer the update number from VALET, but you may also need the version of the compiler and the version of the LAPACK interfaces supported in the MKL component of the package. See the [[https://software.intel.com/en-us/documentation| Intel Documentation]] site and look at the complete specifications in the [[https://software.intel.com/en-us/articles/intel-parallel-studio-xe-release-notes-and-new-features|release notes and features]]. For example:
+   Update 2 - February 2014
+     Intel Fortran Compiler updated to 14.0.2
+     Intel Math Kernel Library updated to 11.1 Update 2
+and the details on the main product page for MKL 11.1,
+   LAPACK 3.4.1 interfaces and enhancements
+</note>
+==== VALET and ifort ====
+Assuming you have the **''dgels-ex.f''** Fortran 77 source file, use VALET and the appropriate compile command to compile the source file to an executable that links with the MKL library. Remember that VALET chooses the default version of the Intel Compiler Suite if you do not specify one.
+<code>
+workgroup -g <<investing-entity>>
+vpkg_devrequire intel
+ifort -mkl dgels-ex.f -o dgels-ex
+</code>
+The ''**ifort**'' compiler has an ''**-mkl**'' option for linking with MKL. From the man page or ''**ifort --help**'':
+<code>
+   -mkl[=<arg>]
+          link to the Intel(R) Math Kernel Library (Intel(R) MKL) and bring
+          in the associated headers
+            parallel   - link using the threaded Intel(R) MKL libraries. This
+                         is the default when -mkl is specified
+            sequential - link using the non-threaded Intel(R) MKL libraries
+            cluster    - link using the Intel(R) MKL Cluster libraries plus
+                         the sequential Intel(R) MKL libraries
+</code>
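+To compare the linking modes listed above, it can help to build one executable per mode. The following is a minimal sketch. The helper name ''mkl_build_cmd'' and the output names ''dgels-ex-parallel'' and ''dgels-ex-sequential'' are assumptions made for this example, and the compile step is skipped when ''ifort'' or the source file is not available.

```bash
#!/bin/bash
# Sketch: build dgels-ex once per MKL linking mode.
# mkl_build_cmd is a hypothetical helper; it only prints a command line.

mkl_build_cmd() {
  local mode="$1" src="$2"
  echo "ifort -mkl=${mode} ${src} -o ${src%.f}-${mode}"
}

for mode in parallel sequential; do
  cmd=$(mkl_build_cmd "$mode" dgels-ex.f)
  echo "$cmd"
  # Run the command only when the compiler and the source file are present.
  if command -v ifort >/dev/null 2>&1 && [ -f dgels-ex.f ]; then
    $cmd
  fi
done
```

+Run it after ''vpkg_devrequire intel'' so that ''ifort'' is in your environment.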
+<note tip>**Using make**: This is a simple compiler command, but you may want to prepare for more complicated projects with multiple source files, libraries, and compiler flags. The following commands run the same compile command using [[https://www.gnu.org/software/make/manual/html_node/Implicit-Rules.html#Implicit-Rules|make's implicit rules]]:
+<code>
+vpkg_devrequire intel
+export FC=ifort
+export FFLAGS=-mkl
+make dgels-ex
+</code>
+</note>
+==== sbatch file to test ====
+The ''ifort'' compiler with the ''-mkl'' flag compiles and links against the threaded MKL libraries. You should therefore test in a threaded parallel environment and export the number of CPUs allocated to the job to the ''MKL_NUM_THREADS'' environment variable. Use our template for threaded jobs, ''/opt/shared/templates/slurm/generic/threads.qs'', as a starting point. Here is a simple ''test.qs'' based on the ''threads.qs'' template.
+<file bash test.qs>
+#!/bin/bash -l
+#
+# Sections of this script that can/should be edited are delimited by a
+# [EDIT] tag.  All Slurm job options are denoted by a line that starts
+# with "#SBATCH " followed by flags that would otherwise be passed on
+# the command line.  Slurm job options can easily be disabled in a
+# script by inserting a space in the prefix, e.g. "# SBATCH " and
+# reenabled by deleting that space.
+#
+# This is a batch job template for a program using multiple processor
+# cores/threads on a single node.  This includes programs with OpenMP
+# parallelism or explicit threading via the pthreads library.
+#
+# Do not alter the --nodes/--ntasks options!
+#SBATCH --nodes=1
+#SBATCH --ntasks=1
+#
+# [EDIT] Indicate the number of processor cores/threads to be used
+#        by the job:
+#
+#SBATCH --cpus-per-task=4
+#
+# [EDIT] All jobs have memory limits imposed.  The default is 1 GB per
+#        CPU allocated to the job.  The default can be overridden either
+#        with a per-node value (--mem) or a per-CPU value (--mem-per-cpu)
+#        with unitless values in MB and the suffixes K|M|G|T denoting
+#        kibi, mebi, gibi, and tebibyte units.  Delete the space between
+#        the "#" and the word SBATCH to enable one of them:
+#
+# SBATCH --mem=8G
+# SBATCH --mem-per-cpu=1024M
+#
+# .... more options not used ....
+#
+# [EDIT] It can be helpful to provide a descriptive (terse) name for
+#        the job (be sure to use quotes if there's whitespace in the
+#        name):
+#
+#SBATCH --job-name=dgels-ex
+#
+# [EDIT] The partition determines which nodes can be used and with what
+#        maximum runtime limits, etc.  Partition limits can be displayed
+#        with the "sinfo --summarize" command.
+#
+# SBATCH --partition=standard
+#
+#        To run with priority-access to resources owned by your workgroup,
+#        use the "_workgroup_" partition:
+#
+#SBATCH --partition=_workgroup_
+#
+# [EDIT] The maximum runtime for the job; a single integer is interpreted
+#        as a number of minutes, otherwise use the format
+#
+#          d-hh:mm:ss
+#
+#        Jobs default to the default runtime limit of the chosen partition
+#        if this option is omitted.
+#
+#SBATCH --time=0-02:00:00
+#
+#        You can also provide a minimum acceptable runtime so the scheduler
+#        may be able to run your job sooner.  If you do not provide a
+#        value, it will be set to match the maximum runtime limit (discussed
+#        above).
+#
+# SBATCH --time-min=0-01:00:00
+#
+# .... more options not used ....
+#
+# Do standard OpenMP environment setup:
+#
+. /opt/shared/slurm/templates/libexec/openmp.sh
+#
+# [EDIT] Execute your OpenMP/threaded program using the srun command:
+#
+echo "--- Set environment ---"
+vpkg_require intel
+echo ""
+echo "--- Run Test with $SLURM_CPUS_PER_TASK threads ---"
+export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
+time "./$SLURM_JOB_NAME" < "$SLURM_JOB_NAME.d"
+echo ""
+echo "--- Compare Results ---"
+cat "$SLURM_JOB_NAME.r"
+</file>
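+Instead of comparing the two listings by eye, a job script could let ''diff'' do the comparison. This is a sketch and not part of the ''threads.qs'' template. The function name ''compare_results'' and the file name ''dgels-ex.out'' are hypothetical, and the ''-b'' option is used on the assumption that differences in the amount of blank space are not significant.

```bash
#!/bin/bash
# Sketch: compare program output with the reference results file.
# compare_results is a hypothetical helper name.

compare_results() {
  local actual="$1" expected="$2"
  # -b ignores changes in the amount of white space.
  if diff -b "$actual" "$expected" >/dev/null; then
    echo "PASS: $actual matches $expected"
  else
    echo "FAIL: $actual differs from $expected"
    return 1
  fi
}

# Example use after the program has run:
#   ./dgels-ex < dgels-ex.d > dgels-ex.out
#   compare_results dgels-ex.out dgels-ex.r
```

+An exact text match is a strict test. A difference in the last printed digit would be reported as a failure even if the result is numerically acceptable.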
+==== Test result output ====
+<code>
+[traine@login01 nagex]$ workgroup -g it_css
+[(it_css:traine)@login01 nagex]$ sbatch test.qs
+Submitted batch job 6718859
+[(it_css:traine)@login01 nagex]$ more slurm-6718859.out
+-- OpenMP job setup complete:
+--  OMP_THREAD_LIMIT     = 4
+--  OMP_PROC_BIND        = true
+--  OMP_PLACES           = cores
+--  MP_BLIST             = 0,1,2,3
+--- Set environment ---
+Adding package `intel/2018u4` to your environment
+--- Run Test with 4 threads ---
+ DGELS Example Program Results
+ Least squares solution
+.5339     1.8707    -1.5241     0.0392
+ Square root of the residual sum of squares
+.22E-02
+real    0m1.043s
+user    0m0.007s
+sys     0m0.049s
+--- Compare Results ---
+ DGELS Example Program Results
+ Least squares solution
+.5339     1.8707    -1.5241     0.0392
+ Square root of the residual sum of squares
+.22E-02
+</code>
+<note important>Sub-second timing results are not reliable. This test is not a benchmark. It is meant only to show that you can compile and link a program that reads the data file, calls the LAPACK routine ''**dgels**'', and reproduces the correct results.</note>
+==== Sequential vs parallel ====
+This example used the default parallel MKL libraries. LAPACK is a collection of routines that parallelize well for large problems, and MKL is an optimized multi-threaded library. For large problems you get the best performance with the default. However, there are three important considerations when using MKL.
+  * Programs with small arrays will not benefit from the multi-threaded library, and may suffer a bit from the system overhead of maintaining multiple threads.
+  * Sequential programs are better suited for running simultaneous instances. You could run ''n'' copies of the program on the same node, where ''n'' is the number of cores on that node, with better throughput when you compile them to be sequential. (Too many threads on the same node will contend for limited resources.)
+  * You may be able to take control of the parallelism in your program with OpenMP compiler directives. This is easiest if you use the single-threaded MKL in your parallel regions. See [[https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications|recommended settings for calling Intel MKL routines from multi-threaded applications]].
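+The second point, running several sequential copies at once, can be sketched as a small script. It assumes an executable named ''dgels-ex-sequential'' built with ''-mkl=sequential'' (a hypothetical name). The helpers ''num_copies'' and ''run_copies'' are likewise made up for this illustration, and one copy is run when ''SLURM_CPUS_PER_TASK'' is not set.

```bash
#!/bin/bash
# Sketch: run one sequential copy of a program per allocated core.
# num_copies, run_copies and dgels-ex-sequential are hypothetical names.

# Number of copies: the Slurm allocation when there is one, otherwise 1.
num_copies() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Start the copies in the background, each with its own output file.
run_copies() {
  local exe="$1" input="$2" n i=1
  n=$(num_copies)
  while [ "$i" -le "$n" ]; do
    "$exe" < "$input" > "run-$i.out" &
    i=$((i + 1))
  done
  wait   # block until every background copy has finished
}

# Only run when the executable and the data file are present.
if [ -x ./dgels-ex-sequential ] && [ -f dgels-ex.d ]; then
  run_copies ./dgels-ex-sequential dgels-ex.d
fi
```

+In a real workload each copy would read a different input file. The same input is reused here only to keep the sketch short.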