mpirun signal 9 (killed) #740
Comments
I would like to note that I too have received this sort of issue. My lab is attempting to run the program to generate global simulations with the NEX_XI and NEX_ETA parameters set to 256. I am compiling the program to run with a set of NVIDIA RTX 2080s. While I am aware there is a parameter called MEMORY_INSTALLED_PER_CORE_IN_GB and the related PERCENT_OF_MEM_TO_USE_PER_CORE, assigning different values to these doesn't actually affect the behavior of the program. Each GPU thread only seems to use about 150 MB of memory on the GPUs; we are running this over 24 threads. The xcreate_header_file binary suggests that we need about 160 GB of memory to run the solver. We only have around 128 GB of RAM available on the system. Aside from lowering the value of the NEX_XI and NEX_ETA parameters or increasing the amount of memory available to the system, is there anything we can do to fix this memory issue, especially given that the GPUs' memory is mostly going unused?
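As a quick sanity check on those figures, here is a back-of-the-envelope sketch using only the numbers quoted above (the ~160 GB estimate from xcreate_header_file, 24 processes, and 128 GB of host RAM):

```bash
# Rough per-process memory check, using only the figures quoted in this thread
# (these numbers are taken from the comment above, not measured here).
TOTAL_ESTIMATE_GB=160   # estimate printed by xcreate_header_file
NPROC=24                # number of MPI processes / GPU threads in this run
HOST_RAM_GB=128         # RAM available on the node

echo "needed per process:    $(echo "scale=1; $TOTAL_ESTIMATE_GB / $NPROC" | bc) GB"
echo "available per process: $(echo "scale=1; $HOST_RAM_GB / $NPROC" | bc) GB"
# ~6.7 GB needed vs. ~5.3 GB available per process, which is consistent with
# the out-of-memory crash at roughly 5 GB per process reported below.
```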
@subedika: maybe you can add more details, e.g., attach the output files here? also, try the most recent devel branch version first to check.

@amkearns-usgs: the numbers above don't seem to match: 150 MB on the GPU for 24 threads would only amount to 3.6 GB of memory needed, not the 160 GB mentioned to run the solver. there should be more details in the output_solver.txt files, for example. what is the setup in your case, 24 GPU cards spread over 24 compute nodes? or only a single compute node with 1 GPU card, using CUDA MPS to run all processes on the same card and node? to use more GPU memory per process, you would lower the number of MPI processes.

anyway, add more outputs if you want to get more specific answers... :)
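For the per-process GPU numbers asked about here, a sketch of the relevant nvidia-smi queries (these are standard query fields, but worth double-checking against the help output of the nvidia-smi version shipped with your driver):

```bash
# Memory used/total on each card
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
# Memory used by each compute process (one row per running solver rank)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```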
To be more precise, the GPU memory usage is only ~150 MB per thread according to nvidia-smi. Main memory usage (according to htop) is multiple GB per thread. Once the program gets past ~5 GB per thread it crashes due to an out-of-memory error. The system we run on has 4 RTX 2080 GPUs, and I believe it has 10 dual-thread CPU cores (exact hardware according to /proc/cpuinfo is an Intel Core i9-9820X, 3.30 GHz).
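To put numbers on the host-side usage per rank, one option besides htop is a ps one-liner (a sketch; xspecfem3D is the default solver binary name in SPECFEM3D_GLOBE):

```bash
# Resident set size of every running solver rank, converted from kB to GB
ps -C xspecfem3D -o pid=,rss= | awk '{printf "PID %s: %.2f GB\n", $1, $2/1048576}'
```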
Here are the contents of output_solver from the last attempted run of the program:

```
 **** Specfem3D MPI Solver ****
 Version: v7.0.2-421-gc4c30a79
 Planet: Earth

 There are 24 MPI processes
 There are 256 elements along xi in each chunk
 There are 2 slices along xi in each chunk

 NDIM = 3
 NGLLX = 5

 using single precision for the calculations
 smallest and largest possible floating-point numbers are: 1.17549435E-38 3.40282347E+38

 model: s362ani
   incorporating 3-D lateral variations in the mantle

 GPU_MODE Active.

 creating global slice addressing
 Spatial distribution of the slices

 mesh databases:
   percentage of edge elements in crust/mantle 5.73075438 %
   percentage of edge elements in outer core 14.9479170 %
   percentage of edge elements in inner core 14.9107141 %
   Elapsed time for reading mesh in seconds = 357.150177

 topography:
   Elapsed time for reading topo/bathy in seconds = 0.508016586

 adjacency:
   using kd-tree search radius = 234.55179839086608 (km)
   maximum search elements = 656
   estimated typical element size at surface = 39.091966398477680 (km)
   maximum neighbors found per element = 37 (should be 37 for globe meshes)
   Elapsed time for detection of neighbors in seconds = 16.861965878168121

 kd-tree:

 sources: 1
 locating sources
 source # 1
   source located in slice 2
   using moment tensor source:
   source time function:
   magnitude of the source:
   original (requested) position of the source:
   position of the source that will be used:
   Error in location of the source: 1.42565528E-12 km
 maximum error in location of the sources: 1.42565528E-12 km
 Elapsed time for detection of sources in seconds = 3.6215900378301740
 End of source detection - done

 receivers:
 Total number of receivers = 378
 locating receivers
 reading receiver information...
 Stations sorted by epicentral distance:
 maximum error in location of all the receivers: 5.45888942E-12 km
 Elapsed time for receiver detection in seconds = 0.24783961405046284
 End of receiver detection - done
 found a total of 378 receivers in all slices

 source arrays:
 seismograms:
   Total number of samples for seismograms = 135500

 Reference radius of the globe used is 6371.0000000000000 km

 incorporating the oceans using equivalent load
 incorporating ellipticity
 incorporating surface topography
 incorporating self-gravitation (Cowling approximation)
 incorporating rotation
 incorporating attenuation using 3 standard linear solids

 preparing mass matrices
 number of SLS bodies: 3
 Reference frequency of anelastic model (Hz): 1.00000000
 using shear attenuation Q_mu
 ATTENUATION_1D_WITH_3D_STORAGE : T
```

This is where the file ends, because that's where the program stops running.
right, the node or workstation does not have enough memory to fit and run this simulation setup. the code stops when assigning values to the wavefields: after allocation, the arrays have not been mapped to physical memory yet; this only happens with the first wavefield initialization here. with this setup of 24 MPI processes on a single node and NEX 256, the estimate is ~160 GB of memory. you will have to run on multiple nodes or workstations (provided they can communicate through an MPI installation), or run it on a fat node with more memory.

regarding the GPUs, a GeForce RTX 2080 card has 8 GB of memory. the setup with NEX 256 and 24 MPI processes (and model s362ani) will require ~5 GB of GPU memory per process based on my past experience. thus, only a single process would fit onto one card. I'm afraid you will need more GPU cards as well to run this setup.
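Putting the GPU numbers from this reply together (a sketch; the ~5 GB per process and 8 GB per card figures are the ones stated above, not fresh measurements):

```bash
# GPU fit check with the figures quoted in this reply
GPU_MEM_GB=8        # memory on one GeForce RTX 2080
PER_PROC_GPU_GB=5   # estimated GPU memory per MPI process for NEX 256 / s362ani
NPROC=24            # MPI processes in this run
NCARDS=4            # cards in the machine described above

PROCS_PER_CARD=$(( GPU_MEM_GB / PER_PROC_GPU_GB ))                   # = 1
CARDS_NEEDED=$(( (NPROC + PROCS_PER_CARD - 1) / PROCS_PER_CARD ))    # = 24
echo "processes that fit per card: $PROCS_PER_CARD"
echo "cards needed for $NPROC processes: $CARDS_NEEDED (available: $NCARDS)"
```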
While trying to run the go_solver_pbs.bash script in the global_s362ani_shakemovie directory, I get an `mpirun noticed that process rank 7 with PID 0 on node [nodename] exited on signal 9 (killed)` error, which I suppose is a memory overflow error. How do I fix this?
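Following the multi-node suggestion above, a minimal sketch of how go_solver_pbs.bash could be adapted to spread the 24 ranks over several machines (the node/core counts and the #PBS resource syntax are assumptions that depend on the cluster and scheduler version):

```bash
#!/bin/bash
#PBS -N specfem_solver
#PBS -l nodes=4:ppn=6          # assumed layout: 4 nodes x 6 ranks = 24 MPI processes
#PBS -l walltime=12:00:00

cd "$PBS_O_WORKDIR"

# -np must match 6 chunks x NPROC_XI x NPROC_ETA from DATA/Par_file (24 here);
# MPI installations built with PBS support usually pick up the allocated node list automatically
mpirun -np 24 ./bin/xspecfem3D
```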