Hello all,
I am trying to use as much parallelism as possible on my computing grid, so I combine MPI and OpenMP. The MPI processes are spawned fine, but the number of OpenMP threads always stays at 1, even though I set local_num_threads correctly (https://nest-simulator.readthedocs.io/en/latest/hpc/slurm_script.html#set-local-num-threads-in-your-nest-script) as well as all the relevant SLURM variables. I am sure NEST was compiled with OpenMP, too. I use slightly modified Izhikevich neurons with STDP synapses. Can the model specifics be the cause of the available threads being underutilized, or is there something else in the system that could be the reason?
Best regards,
Stefan Dvoretskii
Dear Stefan,
Have you tried setting the OMP variable in your batch script? Something like:
export OMP_NUM_THREADS=<No. of threads>
You should do this before your srun call to the NEST script, in combination with setting local_num_threads and the right value for -c (or --cpus-per-task).
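For illustration, a minimal batch script along these lines might look as follows (just a sketch with example values; node count, thread count and the script name would need to be adapted to your setup):

    #!/bin/bash
    #SBATCH --nodes=2              # example: 2 compute nodes
    #SBATCH --ntasks-per-node=1    # one MPI process (NEST instance) per node
    #SBATCH --cpus-per-task=16     # cores reserved for each MPI process

    # Give each MPI process as many OpenMP threads as cores reserved for it.
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

    # srun starts one NEST process per task; each should then run 16 threads,
    # provided the script also sets local_num_threads=16.
    srun python simulation.py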
All the best, Sandra
Dear Sandra,
I have done all of what you describe, and still only one OpenMP thread is used on each node (I set all of the variables listed above to 16 threads).
For reproducibility, I attach my SLURM batch script and the simulation script. I use NEST 3.4 compiled with OpenMP and MPI, and the Intel(R) MPI Library for Linux* OS, Version 2019 Update 12 Build 20210429 (id: e380127cb).
Best regards,
Stefan Dvoretskii
Dear Stefan,
I could not spot an error in the SLURM or Python script.
As Sandra already pointed out, it is advantageous to export some OMP variables in your jobfile. Besides OMP_NUM_THREADS, it is also good to set:
* export OMP_PROC_BIND=TRUE -> pins the threads to specific cores, which increases NEST performance
* export OMP_DISPLAY_ENV=TRUE -> reports whether OMP was set up correctly
* export OMP_DISPLAY_AFFINITY=TRUE -> reports the thread affinity that was actually applied
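Put together, the corresponding block in the jobfile could simply be (example values):

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # or a fixed number, e.g. 16
    export OMP_PROC_BIND=TRUE
    export OMP_DISPLAY_ENV=TRUE
    export OMP_DISPLAY_AFFINITY=TRUE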
See also the part of the NEST documentation on parallel computing, https://nest-simulator.readthedocs.io/en/v3.5/hpc/parallel_computing.html, which is generally a good source of information.
Regarding your concrete problem, it is somewhat difficult to debug with the information available.
What would help is the standard error output generated by SLURM when the two OMP variables concerning the OMP configuration are exported, i.e. OMP_DISPLAY_ENV=TRUE and OMP_DISPLAY_AFFINITY=TRUE.
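Independent of that, it can also help to have the simulation script itself report what the NEST kernel ends up using. A rough sketch of such a check (assuming NEST 3.x; the value 16 is only an example):

    import os
    import nest

    nest.local_num_threads = 16  # must be set before any nodes are created

    # Report per MPI rank what the kernel actually uses.
    status = nest.GetKernelStatus()
    print("rank", nest.Rank(),
          "| MPI processes:", status["num_processes"],
          "| threads per process:", status["local_num_threads"],
          "| total virtual processes:", status["total_num_virtual_procs"],
          "| OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS"))

If "threads per process" already shows 1 here, the kernel configuration itself is the problem; if it shows 16 but only one core is busy, the cause is more likely on the pinning/affinity side.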
Best
Anno
Hi Stefan,
I'm also having some unexplainable performance issues with OpenMP on our cluster and have yet to find a solution, but perhaps the behavior I've observed will give you some ideas.
I run my NEST scripts on a single node (so no MPI), set the number of threads via `local_num_threads` in my script, and also set `#SBATCH --ntasks=28`. Additionally, I've set the `MKL_THREADING_LAYER` environment variable as described in [GitHub issue #2573](https://github.com/nest/nest-simulator/issues/2573).
However, when monitoring the node via `htop`, I only sometimes see utilization close to 28 cores. The core utilization seems to vary randomly between sbatch runs (sometimes 10 cores, sometimes 6), even on an otherwise idle node of my cluster, so the issue seems to be independent of how subscribed the node is.
Issue #2573 also points to [PR #2401](https://github.com/nest/nest-simulator/pull/2401) and [this documentation on threading](https://nest-simulator.readthedocs.io/en/stable/hpc/threading.html#table-of-...).
Thanks for pointing to these references, Tony! I will try the suggested approach eventually and let you know how well it works.