Hello,
TL;DR: Using structural plasticity with MPI-based simulations leads to spontaneous crashes in NEST v3.6 onward. A minimal script reproducing the crash is provided.
This is a follow-up to my previous mailing list post (https://www.nest-simulator.org/mailinglist/hyperkitty/list/users@nest-simula...) where I encountered segmentation faults while executing structural-plasticity-based simulations in NEST v3.8.
Previously, I suspected that the crashes were caused by faulty NEST installations on the HPCs. The helpful comments on that post did not resolve the issue, so I ran further tests and gathered feedback from other users of structural plasticity (SP) in NEST v3.6+.
My conclusion is that MPI-based simulations spontaneously crash when using structural plasticity, and the probability of a crash increases with the number of MPI processes. Below, I provide more details and a link to a minimal script that reproduces the segmentation fault.
Background: I use SP to perform large-scale network simulations on HPC systems. Until now, I had been using NEST v2.20.1 for various reasons. When I found that SP functionality in NEST v3.6 was finally equivalent to that of v2.20.1, I decided to port my code to the latest release (v3.8). I installed v3.8 on my local machine with MPI support and tested a scaled-down experiment as a sanity check---everything worked as expected.
However, when I ran the full-scale network on HPCs (JUSUF at Jülich and NEMO at the University of Freiburg), I got segmentation faults. HPC support could not help resolve the issue, and reinstalling NEST did not help either. At that point, I reached out with the aforementioned mailing list post ("MPI-based error on v3.8").
Current Status: I created a minimal PyNEST script that simulates an E-I network in which the E-->E connections are governed by structural plasticity. I believe this minimal example includes the steps necessary for any SP-based experiment. The script can be run with the following bash call: `mpirun python minimal.py $RANDOM_SEED $TOTAL_NUM_VIRTUAL_PROCS # use srun, when required`
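For orientation, here is a rough sketch of the kind of setup `minimal.py` performs, written against the NEST 3.x PyNEST API; it is not the actual script (see the GitLab links below). The element names (`Axon_E`, `Den_E`), the synapse-model name `sp_synapse`, the network sizes, and all parameter values are illustrative only:

```python
import sys
import nest

# Command-line arguments as in the bash call above
seed = int(sys.argv[1])   # $RANDOM_SEED
n_vp = int(sys.argv[2])   # $TOTAL_NUM_VIRTUAL_PROCS

nest.ResetKernel()
nest.SetKernelStatus({
    'rng_seed': seed,
    'total_num_virtual_procs': n_vp,
    'structural_plasticity_update_interval': 10000.0,  # illustrative value
})

# Gaussian growth curve governing creation/deletion of synaptic elements
gc = {'growth_curve': 'gaussian', 'growth_rate': 1e-4, 'eps': 0.05}

# Excitatory neurons carry axonal and dendritic elements for E->E rewiring;
# inhibitory neurons are connected statically.
exc = nest.Create('iaf_psc_alpha', 800,
                  params={'synaptic_elements': {'Axon_E': gc, 'Den_E': gc}})
inh = nest.Create('iaf_psc_alpha', 200)

# Synapse model managed by structural plasticity for E->E connections
nest.CopyModel('static_synapse', 'sp_synapse')
nest.SetDefaults('sp_synapse', {'weight': 1.0, 'delay': 1.0})
nest.SetKernelStatus({
    'structural_plasticity_synapses': {
        'sp_synapse': {
            'synapse_model': 'sp_synapse',
            'pre_synaptic_element': 'Axon_E',
            'post_synaptic_element': 'Den_E',
        }
    }
})

# Static E->I, I->E, I->I connections and a Poisson background drive
# would be created here with nest.Connect(...).

nest.EnableStructuralPlasticity()
nest.Simulate(10000.0)
```

Under MPI, every rank runs the same script and NEST distributes the network across the virtual processes, so the launch is exactly the call above, e.g. `mpirun -np 32 python minimal.py 1234 32`.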
The following table summarises the segmentation-fault crashes for `minimal.py`:

NEST version | machine (MPI proc count)    | Crash
v2.20.1      | local (2)                   | No
v2.20.2      | local (2)                   | No
v2.20.1      | HPC-2 (up to 1024)          | No
v2.20.2      | HPC-1 (up to 1024)          | No
v3.8         | local (2)                   | Yes (rarely)
v3.8         | HPC-1 (8)                   | No
v3.8         | HPC-1 (16, 24, 32, 64, 128) | Yes
v3.8         | HPC-2 (8)                   | No
v3.8         | HPC-2 (16, 24, 32, 64, 128) | Yes
v3.6         | HPC-1 (32, 64, 128, 256)    | Yes
- Machines:
  - local: Linux 5.4.0-204-generic x86_64; Intel Core i5-5300U; 16 GB RAM
  - HPC-2: NEMO (https://www.nemo.uni-freiburg.de/)
  - HPC-1: JUSUF (https://www.fz-juelich.de/en/ias/jsc/systems/supercomputers/jusuf)
- NEST v2.20.1, v2.20.2, and v3.8 were installed manually in all cases; v3.6 was available as a preinstalled module on HPC-1.
- The crashes do not occur if MPI is not used for SP simulations.
- The crashes do not occur in MPI-based simulations that do not involve SP.
Conclusion:
- MPI-based crashes occur in SP simulations.
- The way MPI processes handle data in NEST v3.6+ may lead to spontaneous segmentation faults.
- The MPI process count appears to correlate with segmentation-fault frequency; faults may still occur, although rarely, with a small number of MPI processes.
Expectations from this post: A large number of MPI processes provides a very significant speed-up for SP-based simulations, so it is important that MPI is fully functional in NEST.
- Kindly let me know if you can reproduce the same crash. If what I observe is indeed true, then this merits creating an issue on GitHub.
- If you cannot reproduce the crashes, I would greatly appreciate any help in fixing the issue.
Code:
- You will find a formatted version of this post with accompanying code and crash dumps on this GitLab page: https://gitlab.rz.uni-freiburg.de/as2013/minimal_share.git
- minimal.py: https://gitlab.rz.uni-freiburg.de/as2013/minimal_share/-/blob/119e52d3552a7e...
Thanks!
Best, Ady
PS. Is it possible to add some sort of text formatting (HTML/markdown) to these posts?
Dear Ady,
Thank you very much for reporting this behavior to us. Could you please create an issue in the NEST GitHub repository so we can keep good track of it and address it as soon as possible? This seems to be related to an error during event delivery, probably a deleted synapse on which an event was meant to be delivered. We will take a look as soon as possible and give you some advice.
All the best, Sandra