Dear NEST developers,
In our group, we are working on a model of the primary visual cortex and use step_current_generator devices to simulate the input current of the LGN neurons. We noticed that the simulation time of our model is very sensitive to the number of step current sources. When trying to narrow down the cause, we found that this might be due to an issue with the parallelization of the step_current_generator. A minimal system in which the problem can be observed is attached as simple_example.py. It essentially creates NS step_current_generators and connects them to NL neurons with fixed indegree, using the iaf_cond_exp neuron model. Increasing the number of step current sources does not benefit from multithreading as one would expect, in contrast to increasing the number of neurons; see the technical details below. Our rough estimate is that, between 1 and 32 threads, the sources scale 10 to 20 times worse than the parallelization would suggest.
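For readers without the attachment, a benchmark of this kind might look roughly like the sketch below. It is not the attached simple_example.py: the function names, the default amplitude, the number of steps and the simulation length are all assumptions made for illustration. Only the model names, the fixed-indegree rule and the thread setting come from the description above.

```python
import time


def step_params(n_steps, interval_ms, amplitude_pa=100.0):
    """Hypothetical helper: a current that toggles on/off every interval_ms."""
    times = [interval_ms * (i + 1) for i in range(n_steps)]
    values = [amplitude_pa if i % 2 == 0 else 0.0 for i in range(n_steps)]
    return {"amplitude_times": times, "amplitude_values": values}


def time_simulation(n_neurons, n_sources, indegree, n_threads, sim_ms=1000.0):
    """Build the network and return the wall-clock time of nest.Simulate()."""
    import nest  # imported here so step_params() can be used without NEST

    nest.ResetKernel()
    nest.SetKernelStatus({"local_num_threads": n_threads})

    neurons = nest.Create("iaf_cond_exp", n_neurons)
    sources = nest.Create(
        "step_current_generator",
        n_sources,
        params=step_params(10, sim_ms / 10.0),  # assumed: 10 steps per run
    )
    # Each neuron draws `indegree` generators at random.
    nest.Connect(
        sources,
        neurons,
        conn_spec={"rule": "fixed_indegree", "indegree": indegree},
    )

    start = time.perf_counter()
    nest.Simulate(sim_ms)
    return time.perf_counter() - start
```

Calling time_simulation() over a grid of NL and NS values, once per thread count, would give the timings that enter the regression described in the technical details.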
Technical details:
The relative slowdown of the step_current_sources under parallelization was measured using a linear regression of the form
simulation time = a NL + b NS.
See slowdown_example.png.
The ratio b/a was then calculated and measured as a function of the number of threads. A bigger difference between the ratio for 1 thread and the ratio for 32 threads means a greater parallelization problem in the step_current_generators.
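As an illustration of this fit, the following sketch estimates a and b by ordinary least squares without an intercept and returns the ratio b/a. It is only one plausible way to do the regression, not necessarily the one used for the attached plots; the function names are hypothetical, and any data passed in would be (NL, NS, simulation time) triples measured for one thread count.

```python
def fit_cost_model(samples):
    """Least-squares fit of time = a * NL + b * NS (no intercept).

    samples: iterable of (n_neurons, n_sources, sim_time) tuples.
    Returns (a, b), found by solving the 2x2 normal equations.
    """
    s_ll = s_ls = s_ss = s_lt = s_st = 0.0
    for n_l, n_s, t in samples:
        s_ll += n_l * n_l
        s_ls += n_l * n_s
        s_ss += n_s * n_s
        s_lt += n_l * t
        s_st += n_s * t

    det = s_ll * s_ss - s_ls * s_ls
    if det == 0.0:
        raise ValueError("NL and NS must be varied independently")

    a = (s_lt * s_ss - s_st * s_ls) / det
    b = (s_st * s_ll - s_lt * s_ls) / det
    return a, b


def cost_ratio(samples):
    """Cost of one source relative to one neuron, b / a."""
    a, b = fit_cost_model(samples)
    return b / a
```

Computing cost_ratio() separately for the 1-thread and the 32-thread timings and comparing the two values gives the measure of the parallelization problem described above.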
Some additional results:
- interval_dependence.png - the slowdown does not depend on amplitude_times in the step_current_source function
- indegree_dependence.png - the slowdown depends on the indegree of nest.Connect(source, neurons). Specifically, the slowdown is worse for low indegree values. This shows the slowdown depends on the number of step_current_sources created, not on the injections themselves.
Are you aware of a lack of parallelization in the step_current_source or in the current injection itself? If so, are there any plans to improve it?
Best regards,
Jan Střeleček
Dear Jan,
Thank you very much for your very detailed analysis. We will try to reproduce this as soon as possible.
Three questions:
- You only use threads, no MPI parallelization, correct?
- Your machine has >= 32 cores?
- Do the neurons receive the expected input currents, especially the same currents independent of the number of threads?
Best, Hans Ekkehard
--
Prof. Dr. Hans Ekkehard Plesser
Head, Department of Data Science
Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway
Phone +47 6723 1560
Email hans.ekkehard.plesser@nmbu.no
Home http://arken.nmbu.no/~plesser
Thanks for the quick response,
- We use only multithreading here, no MPI.
- We are able to use more than 32 threads.
- We did not check whether the neurons receive the expected input currents. We may look into it in the future if it turns out to be important.
Sincerely, Jan Střeleček
On Thu, Apr 28, 2022 at 16:43, Hans Ekkehard Plesser hans.ekkehard.plesser@nmbu.no wrote:
On 28/04/2022, 16:22, "Jan Střeleček" strelda@protonmail.com wrote: