This repository has been archived by the owner on Jul 6, 2023. It is now read-only.

Julia on HPC systems


Note: This repository is no longer maintained. More and up-to-date information on using Julia for HPC can be found on the new site at https://juliahpc.github.io/JuliaOnHPCClusters/, which includes the contents of this repository.


The purpose of this repository is to document best practices for running Julia on HPC systems (i.e., "supercomputers"). At the moment, information relevant for both supercomputer operators and users is collected here. There is no guarantee of permanence, that the information here is up-to-date, or that issues are usefully ordered or categorized.

For operators

Official Julia binaries vs. building from source

According to this Discourse post, the difference between compiling Julia from source with architecture-specific optimization and using the official Julia binaries is negligible. This has been confirmed by Ludovic Räss for an Nvidia DGX-1 system at CSCS, where no performance differences between a Spack-installed version and the official binaries were found either (April 2022).

Since installing from source (e.g., using Spack) can be cumbersome, the general recommendation is to use the pre-built binaries unless benchmarks on the target system show a relevant difference. This is also the current approach taken at NERSC, CSCS, and PC2.

In June 2022, a Julia PR was opened (JuliaLang/julia#45641) that aims to add PGO (profile-guided optimization) and LTO (link-time optimization) to the Julia Makefile. Depending on the test, compilation time improvements of up to 30% have been reported, so it might be worth checking out once merged. The performance of the compiled Julia code is unaffected, though.

Last update: June 2022

Ensure correct libraries are loaded

When using Julia on a system that uses an environment-variable based module system (such as modules or Lmod), the LD_LIBRARY_PATH variable might be filled with entries pointing to different packages and libraries. To avoid issues from Julia loading another library instead of the ones packaged with Julia, make sure that Julia's lib directory is always the first directory in LD_LIBRARY_PATH.

One way to achieve this is a wrapper shell script that modifies LD_LIBRARY_PATH before calling the Julia executable. The following is inspired by a script from UCL's Owain Kenway:

#!/usr/bin/env bash

# This wrapper makes sure the Julia binary distribution correctly picks up the
# GCC libraries provided with it, so that it does not rely on the gcc-libs
# version.

# Dr Owain Kenway, 20th of July, 2021
# Source: https://github.com/UCL-RITS/rcps-buildscripts/blob/04b2e2ccfe7e195fd0396b572e9f8ff426b37f0e/files/julia/julia.sh

# Resolve symlinks to find this script and the installation prefix above it.
location=$(readlink -f "$0")
directory=$(readlink -f "$(dirname "${location}")/..")

# Put Julia's bundled libraries first; omit the separator if the variable is
# empty, since an empty entry would be treated as the current directory.
export LD_LIBRARY_PATH="${directory}/lib/julia${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
exec "${directory}/bin/julia" "$@"

Note that using readlink might not be optimal from a performance perspective if used in a massively parallel environment. Alternatively, hard-code the Julia path or set an environment variable accordingly.
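The readlink-free alternative could look roughly like the following sketch. It assumes that the installation prefix is provided in an environment variable, here called JULIA_ROOT (a hypothetical name that Julia itself does not define, e.g., set by the module file). The helper names prepend_julia_lib and run_julia are likewise made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper that avoids readlink by reading the installation prefix
# from an environment variable. JULIA_ROOT is a hypothetical name.

# Print an LD_LIBRARY_PATH value with Julia's bundled library directory first.
# $1: Julia installation prefix, $2: current LD_LIBRARY_PATH (may be empty)
prepend_julia_lib() {
    if [ -n "$2" ]; then
        printf '%s\n' "$1/lib/julia:$2"
    else
        # No trailing colon: an empty entry would mean the current directory.
        printf '%s\n' "$1/lib/julia"
    fi
}

# Replace the current process with Julia, using the adjusted library path.
run_julia() {
    if [ -z "${JULIA_ROOT:-}" ]; then
        echo "error: JULIA_ROOT is not set" >&2
        return 1
    fi
    LD_LIBRARY_PATH=$(prepend_julia_lib "$JULIA_ROOT" "${LD_LIBRARY_PATH:-}")
    export LD_LIBRARY_PATH
    exec "${JULIA_ROOT}/bin/julia" "$@"
}
```

In an actual wrapper script, the last line would be `run_julia "$@"`. Hard-coding the prefix instead of reading JULIA_ROOT works the same way.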

Also note that fixing the LD_LIBRARY_PATH variable does not seem to be a hard requirement, since it is not used universally (e.g., it is not necessary on NERSC's systems).

Last update: April 2022

Julia depot path

Since the available file systems can differ significantly between HPC centers, it is hard to make a general statement about where the Julia depot folder (by default on Unix-like systems: ~/.julia) should be placed (via JULIA_DEPOT_PATH). Generally speaking, the file system hosting the Julia depot should have

  • good (parallel) I/O
  • no tight quotas
  • read and write access
  • no mechanism for the automatic deletion of unused files (or the depot should be excluded as an exception)

On some systems, it resides in the user's home directory (e.g. at NERSC). On other systems, it is put on a parallel scratch file system (e.g. CSCS and PC2). At the time of writing (April 2022), there does not seem to be reliable performance data available that could help to make a data-based decision.
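Some of the criteria above can be checked with a few lines of shell. The following is only a sketch: check_depot_dir is a hypothetical helper, and it covers just existence, read/write access, and free space. Quotas, I/O performance, and automatic purge policies are site-specific and have to be looked up in the center's documentation.

```shell
# Sketch: report basic properties of a candidate depot location.
# check_depot_dir is a hypothetical helper, not part of Julia.
check_depot_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "no read/write access: $dir"
        return 1
    fi
    # df -Pk prints one POSIX-format line per file system in 1024-byte blocks;
    # the 4th column of the data line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    echo "ok: $dir (${avail_kb} KiB available)"
}
```

For example, `check_depot_dir "$HOME/.julia"` or `check_depot_dir "$SCRATCH"` (if the site defines such a variable) gives a first impression before setting JULIA_DEPOT_PATH.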

If multiple platforms, e.g., systems with different architectures, access the same Julia depot, for example because the file system is shared, it might make sense to create platform-dependent Julia depots by setting the JULIA_DEPOT_PATH environment variable appropriately, e.g.,

prepend-path JULIA_DEPOT_PATH $env(HOME)/.julia/$platform

where $platform contains the current system name (source).
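The line above uses Tcl modulefile syntax. For sites or users that configure their environment in a plain shell script instead, a rough equivalent is sketched below. Deriving the platform name from `uname -m` is an assumption made for illustration; a site-specific system name works the same way.

```shell
# Sketch: per-platform depot, mirroring the modulefile line above.
# Falling back to `uname -m` for the platform name is an assumption.
platform=${platform:-$(uname -m)}

# Prepend the platform-specific depot and keep existing entries, if any.
JULIA_DEPOT_PATH="${HOME}/.julia/${platform}${JULIA_DEPOT_PATH:+:${JULIA_DEPOT_PATH}}"
export JULIA_DEPOT_PATH
```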

MPI.jl

It is generally recommended to set

JULIA_MPI_BINARY=system

such that MPI.jl will always use a system MPI instead of the Julia artifact (i.e. MPI_jll.jl). For more configuration options see this part of the MPI.jl documentation.
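In a shell-based setup, this could be combined with a small sanity check, as in the following sketch. The launcher names that are probed are assumptions and differ between sites.

```shell
# Select the system MPI for MPI.jl (variable name and value as given above).
export JULIA_MPI_BINARY=system

# Hypothetical sanity check: look for a common MPI launcher on PATH.
# The candidate names are assumptions and differ between sites.
mpi_launcher=""
for candidate in mpiexec mpirun srun; do
    if command -v "$candidate" >/dev/null 2>&1; then
        mpi_launcher=$candidate
        break
    fi
done

if [ -z "$mpi_launcher" ]; then
    echo "warning: no MPI launcher found on PATH; is an MPI module loaded?" >&2
fi
```

With MPI.jl versions that read this variable at build time, the package has to be rebuilt after changing it, e.g., via `julia -e 'using Pkg; Pkg.build("MPI")'`.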

Additionally, on the NERSC systems, there is a pre-built MPI.jl for each programming environment, which is loaded through a settings module. More information on the NERSC module file setup can be found here.

CUDA.jl

It seems to be generally advisable to set the environment variables

JULIA_CUDA_USE_BINARYBUILDER=false
JULIA_CUDA_USE_MEMORY_POOL=none

in the module files when loading Julia on a system with GPUs. Otherwise, Julia will try to download its own BinaryBuilder.jl-provided CUDA stack, which is typically not what you want on a production HPC system. Instead, you should make sure that Julia finds the local CUDA installation by setting relevant environment variables (see also the CUDA.jl docs). Disabling the memory pool is advisable to make CUDA-aware MPI work on multi-GPU nodes (see also the MPI.jl docs).
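As a sketch, the same settings in shell form, together with a check that a local CUDA installation can be found at all. Using CUDA_HOME and nvcc for this check is an assumption made for illustration; which variables CUDA.jl actually reads depends on its version and is described in the CUDA.jl docs.

```shell
# Variables as given in the text above.
export JULIA_CUDA_USE_BINARYBUILDER=false
export JULIA_CUDA_USE_MEMORY_POOL=none

# Hypothetical check that a local CUDA toolkit is discoverable. CUDA_HOME is
# an assumption here; consult the CUDA.jl docs for your version.
if [ -n "${CUDA_HOME:-}" ] && [ -d "${CUDA_HOME}" ]; then
    echo "using CUDA toolkit at ${CUDA_HOME}"
elif command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at $(command -v nvcc)"
else
    echo "warning: no local CUDA toolkit found; is a CUDA module loaded?" >&2
fi
```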

Module file setup

Johannes Blaschke provides scripts and templates to set up module files for Julia on some of NERSC's systems:
https://gitlab.blaschke.science/nersc/julia/-/tree/main/modulefiles

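Several environment variables should be considered for being set through the module mechanism, for example those discussed in the sections above: LD_LIBRARY_PATH, JULIA_DEPOT_PATH, JULIA_MPI_BINARY, JULIA_CUDA_USE_BINARYBUILDER, and JULIA_CUDA_USE_MEMORY_POOL.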

Easybuild resources

Samuel Omlin and colleagues from CSCS provide the Easybuild configuration files they use for Piz Daint online at https://github.com/eth-cscs/production/tree/master/easybuild/easyconfigs/j/Julia. For example, there are configurations available for Julia 1.7.2 and for Julia 1.7.2 with CUDA support. Looking at these files also helps to decide which environment variables are useful to set.

Further resources

For users

HPC systems with Julia support

We maintain an (incomplete) list of HPC systems that provide a Julia installation and/or support for using Julia to their users. For this, we use the following nomenclature:

  • Center: The HPC center's name
  • System: The compute system's "marketing" name
  • Installation: Is there a pre-installed Julia configuration available?
  • Support: Is Julia "officially" supported on the system, i.e., will Julia users be supported by HPC center staff if they have questions/problems?
  • Interactive: Is interactive computing with Julia supported, i.e., can you run parallel jobs on the system interactively via, e.g., Jupyter notebooks?
  • Architecture: The main CPU used in the system
  • Accelerators: The main accelerator (if any) in the system
  • Documentation: Links to documentation for Julia users

Australasia

| Center | System | Installation / Support / Interactive | Architecture | Accelerators | Documentation |
|---|---|---|---|---|---|
| NeSI | Mahuika, Māui | | Intel Xeon Broadwell/Cascade Lake + AMD EPYC Milan | Nvidia Tesla P100, A100 | 1 |

Europe

| Center | System | Installation / Support / Interactive | Architecture | Accelerators | Documentation |
|---|---|---|---|---|---|
| ARC, UCL | Myriad, Kathleen, Michael, Young | ? | various Intel Xeon | various GPUs | 1 |
| CSC (EuroHPC) | LUMI | ? | AMD EPYC Milan | AMD Radeon Instinct MI250X | 1 |
| CSCS | Piz Daint | | Intel Xeon Broadwell + Haswell | Nvidia Tesla P100 | 1 |
| DESY IT | Maxwell | ? | various AMD EPYC/Intel Xeon | various GPUs | 1 |
| HLRS | Hawk | | AMD EPYC Rome | Nvidia Tesla A100 | 1 |
| HPC2N, Umeå U | Kebnekaise | ? | Intel Xeon Broadwell + Skylake | Nvidia Tesla K80, Nvidia Tesla V100 | 1 |
| IT4I (EuroHPC) | Karolina | | AMD EPYC Rome | Nvidia Ampere A100 | 1 |
| IZUM (EuroHPC) | Vega | | AMD EPYC Rome | Nvidia Ampere A100 | 1 |
| LuxProvide (EuroHPC) | MeluXina | ? | AMD EPYC Rome | Nvidia Ampere A100-40 | 1, 2 |
| PC2, U Paderborn | Noctua 1 | | Intel Xeon Skylake | Intel Stratix 10 + consumer GPUs | 1 |
| PC2, U Paderborn | Noctua 2 | | AMD EPYC Milan | Nvidia Ampere A100, Xilinx Alveo U280 | 1 |
| ULHPC, U Luxembourg | Aion, Iris | ? | AMD EPYC Rome + Intel Xeon Broadwell/Skylake | Nvidia Tesla V100 | 1 |
| ZDV, U Mainz | MOGON II | ? ? | Intel Xeon Broadwell + Skylake | no | 1 |
| ZIB | HLRN-IV | ? | Intel Cascade Lake AP | coming soon: Nvidia A100, Intel PVC | 1 |

North America

| Center | System | Installation / Support / Interactive | Architecture | Accelerators | Documentation |
|---|---|---|---|---|---|
| Carnegie Mellon College of Engineering | Arjuna, Hercules | | Intel Xeon + AMD EPYC Milan | Nvidia A100, Nvidia K80 | 1 |
| Dartmouth College | Discovery | ? | Intel Xeon (various) + AMD EPYC 7532 | Nvidia V100 | 1 |
| FASRC, Harvard U | Cannon | ? | Intel Xeon Cascade Lake | Nvidia V100, A100 | 1 |
| HPC @ LLNL | various systems | ? | various processors | various GPUs | 1 |
| NERSC | Cori | ? ? | Intel Xeon Haswell | Intel Xeon Phi | 1 |
| NERSC | Perlmutter | ? | AMD EPYC Milan | Nvidia Ampere A100 | 1, 2 |
| Open Science Grid | N/A | ? | Various | Various | 1 |
| Perimeter Institute for Theoretical Physics | Symmetry | | AMD EPYC, Intel Xeon | Nvidia V100 | - |
| Pittsburgh Supercomputing Center | Bridges-2 | | AMD EPYC, Intel Xeon | Nvidia V100 | 1 |
| Princeton University | Several including Tiger | | Intel Xeon (Skylake + Broadwell) | Nvidia P100 | 1 |

Other HPC systems

There are a number of other HPC systems that have been reported to provide a Julia installation and/or Julia support, but for which there are not enough details to put them on the list above:

  • Various clusters at ANL

License and contributing

The contents of this repository are published under the MIT license (see LICENSE). Our main goal is to publicly curate information on using Julia on HPC systems, as a service from the community and for the community. Therefore, we are very happy to accept contributions from everyone, preferably in the form of a PR.

Authors

This repository is maintained by Michael Schlottke-Lakemper (RWTH Aachen University, Germany).

The following people have provided valuable contributions, either in the form of PRs or via private communication:

Disclaimer

Everything is provided as is and without warranty. Use at your own risk!