Differences

This shows you the differences between two versions of the page.

technical:recipes:tensorflow-in-virtualenv [2021-01-27 11:34] – [TensorFlow Python Virtual Environment] anita
technical:recipes:tensorflow-in-virtualenv [2021-03-08 19:26] (current) – [VALET Package Definition] anita
Line 1: Line 1:
 +====== TensorFlow Python Virtual Environment ======
 +
 +This page documents the creation of a Python virtual environment (virtualenv) containing the TensorFlow software for machine learning on the Caviness HPC system((The steps should also work on the DARWIN HPC system, though with different package versions.)).  It assumes that the user is adding the software to the workgroup storage.
 +
 +===== Prepare Workgroup Directory =====
 +
 +Prepare to add software in the standard sub-directories of the workgroup storage:
 +
 +<code bash>
 +[user@login01 ~]$ workgroup -g my_workgroup
 +[(my_workgroup:user)@login01 ~]$ mkdir --mode=2775 --parent ${WORKDIR}/sw/tensorflow
 +[(my_workgroup:user)@login01 ~]$ mkdir --mode=2775 --parent ${WORKDIR}/sw/valet
 +</code>
 +
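 +These commands create any missing directories.  Mode ''2775'' gives the new ''tensorflow'' and ''valet'' directories group-write permission and sets the setgid bit, so files and sub-directories created inside them inherit the workgroup's group.  Note that ''%%--%%mode'' applies only to the final directory in each path; a missing parent such as ''sw'' is created with default (umask-based) permissions.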
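 +
 +To confirm the result, the modes can be inspected with ''stat''.  The following is a minimal sketch, assuming GNU coreutils; it uses a scratch directory from ''mktemp'' (the ''demo_root'' variable is purely illustrative) rather than the real ''${WORKDIR}'':
 +
 +<code bash>
 +# Illustrative only: a throwaway directory stands in for ${WORKDIR}.
 +demo_root="$(mktemp -d)"
 +
 +# Same options as the commands above, pointed at the scratch location.
 +mkdir --mode=2775 --parents "${demo_root}/sw/tensorflow"
 +mkdir --mode=2775 --parents "${demo_root}/sw/valet"
 +
 +# Print the octal mode and the name of each new directory;
 +# a leading 2 in the mode is the setgid (group-inherit) bit.
 +stat --format='%a %n' "${demo_root}/sw/tensorflow" "${demo_root}/sw/valet"
 +</code>
 +
 +Running the same ''stat'' command against ''${WORKDIR}/sw/tensorflow'' and ''${WORKDIR}/sw/valet'' should report a mode of ''2775'' for both.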
 +
 +===== Create TensorFlow Virtualenv =====
 +
 +The Intel Python distribution will form the basis for the TensorFlow virtualenv, so add it to the environment:
 +
 +<code bash>
 +[(my_workgroup:user)@login01 ~]$ vpkg_require intel-python/2020u2:python3
 +Adding package `intel-python/2020u2:python3` to your environment
 +(base) [(my_workgroup:user)@login01 ~]$
 +</code>
 +
 +Notice the prompt changed:  the text ''(base)'' now prefixes it, indicating the name of the active Python virtualenv (here, the distribution's base environment).
 +
 +The ''conda search tensorflow'' command can be used to locate the specific version you wish to install.  Two examples are shown:  TensorFlow releases 2.0 and later built with GPU support, and the Intel-optimized builds of TensorFlow 2.3.
 +
 +<code bash>
 +(base) [(my_workgroup:user)@login01 ~]$ conda search 'tensorflow>=2.0=gpu*'
 +Loading channels: done
 +# Name                       Version           Build  Channel             
 +tensorflow                     2.0.0 gpu_py27hb041a2f_0  pkgs/main           
 +tensorflow                     2.0.0 gpu_py36h6b29c10_0  pkgs/main           
 +tensorflow                     2.0.0 gpu_py37h768510d_0  pkgs/main          
 +tensorflow                     2.1.0 gpu_py27h9cdf9a9_0  pkgs/main           
 +tensorflow                     2.1.0 gpu_py36h2e5cdaa_0  pkgs/main           
 +tensorflow                     2.1.0 gpu_py37h7a4bb67_0  pkgs/main           
 +tensorflow                     2.2.0 gpu_py36hf933387_0  pkgs/main           
 +tensorflow                     2.2.0 gpu_py37h1a511ff_0  pkgs/main           
 +tensorflow                     2.2.0 gpu_py38hb782248_0  pkgs/main 
 +
 +(base) [(my_workgroup:user)@login01 ~]$ conda search 'tensorflow[version=2.3,channel=intel]'
 +Loading channels: done
 +# Name                       Version           Build  Channel             
 +tensorflow                     2.3.0          py36_0  intel               
 +tensorflow                     2.3.0          py37_0  intel               
 +tensorflow                     2.3.0          py38_0  intel 
 +</code>
 +
 +All versions of the TensorFlow virtualenv will be stored in the common base directory, ''$WORKDIR/sw/tensorflow''; each virtualenv must have a unique name that will become the VALET version of TensorFlow.  In this tutorial, the latest version of TensorFlow with GPU support is 2.2.0, while the newest non-GPU version available with Python 3.8 is 2.3.0 (from the Intel channel).  An appropriate version for the former would be ''2.2.0:gpu'' and for the latter ''2.3.0:intel,python3.8''.  Those versions can be translated to VALET-friendly directory names:
 +
 +<code bash>
 +[(my_workgroup:user)@login01 ~]$ vpkg_id2path --version-id=2.2.0:gpu
 +2.2.0-gpu
 +[(my_workgroup:user)@login01 ~]$ mkdir --mode=3750 ${WORKDIR}/sw/tensorflow/2.2.0-gpu
 +
 +[(my_workgroup:user)@login01 ~]$ vpkg_id2path --version-id=2.3.0:intel,python3.8
 +2.3.0-intel-python3.8
 +[(my_workgroup:user)@login01 ~]$ mkdir --mode=3750 ${WORKDIR}/sw/tensorflow/2.3.0-intel-python3.8
 +</code>
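 +
 +Judging only from the two examples above, the translation replaces the '':'' and '','' separators in the version id with ''-''.  For scripting on a machine without VALET, that pattern can be sketched in plain bash; the ''version_id_to_dirname'' function is a hypothetical helper that mimics these two cases and is not VALET's implementation, so prefer ''vpkg_id2path'' wherever it is available:
 +
 +<code bash>
 +# Hypothetical helper: mimics the two vpkg_id2path results shown above
 +# by replacing every ':' and ',' in the version id with '-'.
 +version_id_to_dirname() {
 +    local version_id="$1"
 +    printf '%s\n' "${version_id//[:,]/-}"
 +}
 +
 +version_id_to_dirname "2.2.0:gpu"               # prints 2.2.0-gpu
 +version_id_to_dirname "2.3.0:intel,python3.8"   # prints 2.3.0-intel-python3.8
 +</code>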
 +
 +The virtualenvs are created using the ''%%--%%prefix'' option to specify the directories created above:
 +
 +<code bash>
 +(base) [(my_workgroup:user)@login01 ~]$ conda create --prefix=${WORKDIR}/sw/tensorflow/2.2.0-gpu 'tensorflow[version=2.2.0,build=gpu_py38hb782248_0]'
 +WARNING: A directory already exists at the target location '/work/my_workgroup/sw/tensorflow/2.2.0-gpu'
 +but it is not a conda environment.
 +Continue creating environment (y/[n])? y
 +
 +   :
 +
 +Preparing transaction: done
 +Verifying transaction: done
 +Executing transaction: done
 +#
 +# To activate this environment, use
 +#
 +#     $ conda activate /work/my_workgroup/sw/tensorflow/2.2.0-gpu
 +#
 +# To deactivate an active environment, use
 +#
 +#     $ conda deactivate
 +</code>
 +
 +We're **not** going to activate that virtualenv -- we will install the other one next:
 +
 +<code bash>
 +(base) [(my_workgroup:user)@login01 ~]$ conda create --prefix=${WORKDIR}/sw/tensorflow/2.3.0-intel-python3.8 'tensorflow[version=2.3.0,build=py38_0,channel=intel]'
 +WARNING: A directory already exists at the target location '/work/my_workgroup/sw/tensorflow/2.3.0-intel-python3.8'
 +but it is not a conda environment.
 +Continue creating environment (y/[n])? y
 +
 +   :
 +
 +Preparing transaction: done
 +Verifying transaction: done
 +Executing transaction: done
 +#
 +# To activate this environment, use
 +#
 +#     $ conda activate /work/my_workgroup/sw/tensorflow/2.3.0-intel-python3.8
 +#
 +# To deactivate an active environment, use
 +#
 +#     $ conda deactivate
 +</code>
 +
 +Ignore that ''conda activate'' command as well.  Roll back the ''intel-python'' environment changes before proceeding:
 +
 +<code bash>
 +(base) [(my_workgroup:user)@login01 ~]$ vpkg_rollback
 +[(my_workgroup:user)@login01 ~]$ 
 +</code>
 +
 +Notice the ''(base)'' has disappeared from the prompt, indicating that the baseline virtualenv has been deactivated.
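 +
 +Before writing the package definition, it can be worth confirming that both directories now hold conda environments (recall the warning that they initially did not).  The following is a minimal sketch, assuming conda marks an environment with a ''conda-meta'' sub-directory; ''is_conda_env'' is a hypothetical helper, not a conda or VALET command:
 +
 +<code bash>
 +# Hypothetical helper: succeed if the directory looks like a conda
 +# environment, i.e. it contains a conda-meta sub-directory.
 +is_conda_env() {
 +    [ -d "$1/conda-meta" ]
 +}
 +
 +# Report on the two virtualenv directories created in this section.
 +for env_dir in "${WORKDIR}/sw/tensorflow/2.2.0-gpu" \
 +               "${WORKDIR}/sw/tensorflow/2.3.0-intel-python3.8"; do
 +    if is_conda_env "${env_dir}"; then
 +        echo "ok:      ${env_dir}"
 +    else
 +        echo "missing: ${env_dir}"
 +    fi
 +done
 +</code>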
 +
 +===== VALET Package Definition =====
 +
 +Assuming the workgroup does //not// already have a TensorFlow VALET package definition, the following text:
 +
 +<file tensorflow.vpkg_yaml>
 +tensorflow:
 +    prefix: /work/my_workgroup/sw/tensorflow
 +    description: TensorFlow Python environments
 +    flags:
 +        - no-standard-paths
 +    actions:
 +        - action: source
 +          script:
 +              sh: anaconda-activate.sh
 +          order: failure-first
 +          success: 0
 +    versions:
 +        "2.2.0:gpu":
 +            description: 2.2.0 with GPU support
 +            dependencies:
 +                - intel-python/2020u2:python3
 +        "2.3.0:intel,python3.8":
 +            description: 2.3.0 with Python 3.8, Intel optimizations
 +            dependencies:
 +                - intel-python/2020u2:python3
 +</file>
 +
 +would be added to ''${WORKDIR}/sw/valet/tensorflow.vpkg_yaml''.  If that file already exists, add your new versions at the same level as the others:
 +
 +<file tensorflow.vpkg_yaml>
 +tensorflow:
 +    prefix: /work/my_workgroup/sw/tensorflow
 +    description: TensorFlow Python environments
 +    flags:
 +        - no-standard-paths
 +    actions:
 +        - action: source
 +          script:
 +              sh: anaconda-activate.sh
 +          order: failure-first
 +          success: 0
 +    versions:
 +        "2.2.0:gpu":
 +            description: 2.2.0 with GPU support
 +            dependencies:
 +                - intel-python/2020u2:python3
 +        "2.3.0:intel,python3.8":
 +            description: 2.3.0 with Python 3.8, Intel optimizations
 +            dependencies:
 +                - intel-python/2020u2:python3
 +        "1.8.0":
 +            description: 1.8.0 from pkgs/main
 +            dependencies:
 +                - intel-python/2018u3:python3
 +</file>
 +
 +<note warning>Make sure you modify ''prefix: /work/my_workgroup/sw/tensorflow'' for your workgroup (e.g. if my workgroup is ''it_nss'', then I would use ''prefix: /work/it_nss/sw/tensorflow'').</note>
 +
 +<note tip>On Caviness, after a user has used the ''workgroup'' command, VALET searches for package definitions in ''${WORKDIR}/sw/valet'' by default.  VALET also searches a ''~/.valet'' directory (in your home directory) if it exists, so that's the best location for personal package definitions -- for software you've installed in your home directory, for example.</note>
 +
 +With a properly-constructed package definition file, you can now check for your versions of TensorFlow:
 +
 +<code bash>
 +[(my_workgroup:user)@login01 ~]$ vpkg_versions tensorflow
 +
 +Available versions in package (* = default version):
 +
 +[/work/my_workgroup/sw/valet/tensorflow.vpkg_yaml]
 +tensorflow               TensorFlow Python environments
 +* 2.2.0:gpu              2.2.0 with GPU support
 +  2.3.0:intel,python3.8  2.3.0 with Python 3.8, Intel optimizations
 +  
 +     :
 +</code>
 +
 +===== Job Scripts =====
 +
 +Any job script you submit that runs Python scripts using this virtualenv should include something like the following toward its end:
 +
 +<code bash>
 +#
 +# Set up TensorFlow virtualenv:
 +#
 +vpkg_require tensorflow/2.3.0:intel,python3.8
 +
 +#
 +# Run a Python script in that virtualenv:
 +#
 +python3 my_tf_work.py
 +rc=$?
 +
 +#
 +# Do cleanup work, etc....
 +#
 +
 +#
 +# Exit with whatever exit code our Python script handed back:
 +#
 +exit $rc
 +</code>
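 +
 +The ''Do cleanup work'' step above is left open.  One possible pattern, sketched here under the assumption that the job keeps its intermediate files in a temporary directory, wraps the command so that its exit code survives the cleanup; ''scratch_dir'', ''cleanup'', and ''run_then_cleanup'' are hypothetical names, not part of VALET or the job scheduler:
 +
 +<code bash>
 +# Hypothetical scratch directory for intermediate files.
 +scratch_dir="$(mktemp -d)"
 +
 +# Remove the scratch directory.
 +cleanup() {
 +    rm -rf "${scratch_dir}"
 +}
 +
 +# Run the given command, remember its exit code, clean up,
 +# then return that same exit code to the caller.
 +run_then_cleanup() {
 +    "$@"
 +    local rc=$?
 +    cleanup
 +    return "${rc}"
 +}
 +</code>
 +
 +In the job script above, ''python3 my_tf_work.py'' would then become ''run_then_cleanup python3 my_tf_work.py'', followed by ''rc=$?'' and ''exit $rc'' as before.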