Hello all,
I am trying to use as much parallelism as possible on my computing grid, so I combine MPI and OpenMP. The MPI processes are spawned fine, but the number of OpenMP threads always stays at 1, even though I set local_num_threads correctly (https://nest-simulator.readthedocs.io/en/latest/hpc/slurm_script.html#set-local-num-threads-in-your-nest-script) as well as all the relevant SLURM variables. I am sure NEST was compiled with OpenMP, too. I use slightly modified Izhikevich neurons with STDP synapses. Can the model specifics be the cause of the available threads being underutilized, or is there something else in the system that could be the reason?
Best regards,
Stefan Dvoretskii
Dear Stefan,
Have you tried setting the OMP variable in your batch script? Something like:
export OMP_NUM_THREADS=<No. of threads>
You should do this before your srun call to the NEST script, in combination with setting local_num_threads and the right value for -c (or --cpus-per-task).
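For illustration, a minimal batch script along these lines might look as follows (just a sketch with example values; node count, thread count and the script name would need to be adapted to your setup):

    #!/bin/bash
    #SBATCH --nodes=2              # example: 2 compute nodes
    #SBATCH --ntasks-per-node=1    # one MPI process (NEST instance) per node
    #SBATCH --cpus-per-task=16     # cores reserved for each MPI process

    # Give each MPI process as many OpenMP threads as cores reserved for it.
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    # srun starts one NEST process per task; each should then run 16 threads,
    # provided the script also sets local_num_threads=16.
    srun python simulation.py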
All the best, Sandra
Dear Sandra,
I have done all of what you describe, and still only one OpenMP thread is used on each node (I set all of the variables listed above to 16 threads).
For reproducibility, I attach my SLURM batch script and the simulation script. I use NEST 3.4 compiled with OpenMP and MPI, and the Intel(R) MPI Library for Linux* OS, Version 2019 Update 12 Build 20210429 (id: e380127cb).
Best regards,
Stefan Dvoretskii
Dear Stefan,
I could not spot an error in the SLURM or Python script.
As Sandra already pointed out, it is advantageous to export some OMP variables in your jobfile. Besides OMP_NUM_THREADS, it is also good to set:
* export OMP_PROC_BIND=TRUE -> pins the threads to specific cores, which increases NEST performance
* export OMP_DISPLAY_ENV=TRUE -> reports whether OMP was set up correctly
* export OMP_DISPLAY_AFFINITY=TRUE -> reports the thread affinity that was actually applied
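Put together, the corresponding block in the jobfile could simply be (example values):

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # or a fixed number, e.g. 16
    export OMP_PROC_BIND=TRUE
    export OMP_DISPLAY_ENV=TRUE
    export OMP_DISPLAY_AFFINITY=TRUE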
See also the part of the NEST documentation on parallel computing, https://nest-simulator.readthedocs.io/en/v3.5/hpc/parallel_computing.html, which is generally a good source of information.
Regarding your concrete problem, it is somewhat difficult to debug with the information available.
What would help is the standard error output generated by SLURM when the two OMP variables concerning the OMP configuration are exported, i.e. OMP_DISPLAY_ENV=TRUE and OMP_DISPLAY_AFFINITY=TRUE.
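Independent of that, it can also help to have the simulation script itself report what the NEST kernel ends up using. A rough sketch of such a check (assuming NEST 3.x; the value 16 is only an example):

    import os
    import nest

    nest.local_num_threads = 16  # must be set before any nodes are created

    # Report per MPI rank what the kernel actually uses.
    status = nest.GetKernelStatus()
    print("rank", nest.Rank(),
          "| MPI processes:", status["num_processes"],
          "| threads per process:", status["local_num_threads"],
          "| total virtual processes:", status["total_num_virtual_procs"],
          "| OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS"))

If "threads per process" already shows 1 here, the kernel configuration itself is the problem; if it shows 16 but only one core is busy, the cause is more likely on the pinning/affinity side.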
Best
Anno
Hi Stefan,
I'm also having some unexplainable performance issues with OpenMP on our cluster and have yet to find a solution, but perhaps the behavior I've observed will give you some ideas.
I run my NEST scripts on a single node (so no MPI), set the number of threads via `local_num_threads` in my script, and also set `#SBATCH --ntasks=28`. Additionally, I've set the `MKL_THREADING_LAYER` environment variable as described in [GitHub issue #2573](https://github.com/nest/nest-simulator/issues/2573).
However, when monitoring the node via `htop`, I only sometimes see utilization close to 28 cores. The core utilization seems to vary randomly between sbatch runs (sometimes 10 cores, sometimes 6), even on an otherwise idle node of my cluster, so the issue seems to be independent of how subscribed the node is.
Issue #2573 also points to [PR #2401](https://github.com/nest/nest-simulator/pull/2401) and [this documentation on threading](https://nest-simulator.readthedocs.io/en/stable/hpc/threading.html#table-of-...).
Thanks for pointing to these references, Tony! I will try the suggested approach eventually and let you know how well it works.