Dear Jan,
Thank you very much for your very detailed analysis. We will try to reproduce this as soon as possible.
Three questions:
- You only use threads, no MPI parallelization, correct?
- Your machine has >= 32 cores?
- Do the neurons receive the expected input currents, especially the same currents independent of the number of threads?

Best,
Hans Ekkehard
--
Prof. Dr. Hans Ekkehard Plesser
Head, Department of Data Science

Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway

Phone +47 6723 1560
Email hans.ekkehard.plesser@nmbu.no
Home http://arken.nmbu.no/~plesser
On 28/04/2022, 16:22, "Jan Střeleček" <strelda@protonmail.com> wrote:
Dear NEST developers,
In our group, we are working on a model of the primary visual cortex and use step_current_generator devices to simulate the input current of the LGN neurons. We noticed that the simulation time of our model is very sensitive to the number of step_current_generators. While narrowing down the cause, we found that this might be due to an issue with the parallelization of the step_current_generator.

A simple system in which the problem can be observed is attached as simple_example.py. It creates NS step_current_generators and connects them to NL neurons with fixed indegree, using the iaf_cond_exp neuron model.

Increasing the number of step_current_generators does not benefit from multithreading as one would expect, in contrast to increasing the number of neurons; see the technical details below. Our rough estimate is that, with 32 threads compared to 1 thread, the generator-related part of the simulation runs 10 to 20 times slower than ideal parallel scaling would suggest.
Technical details:
The relative slowdown attributable to the step_current_generators was measured by fitting the linear model
simulation time = a NL + b NS.
See slowdown_example.png.
The ratio b/a was then calculated and measured as a function of the number of threads. A larger difference between the ratio for 1 thread and the ratio for 32 threads indicates a greater parallelization problem in the step_current_generators.
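As a minimal sketch of how such a fit could be computed (this is not the script behind the attached figures; the function name `fit_two_term_model` and the timing values are illustrative assumptions, and the model has no constant term, as in the formula above):

```python
def fit_two_term_model(samples):
    """Least-squares fit of sim_time = a * NL + b * NS (no intercept).

    samples: iterable of (NL, NS, sim_time) tuples, one per timed run.
    Returns the coefficients (a, b).
    """
    # Accumulate the entries of the 2x2 normal equations.
    s_ll = s_ls = s_ss = s_lt = s_st = 0.0
    for nl, ns, t in samples:
        s_ll += nl * nl
        s_ls += nl * ns
        s_ss += ns * ns
        s_lt += nl * t
        s_st += ns * t

    det = s_ll * s_ss - s_ls * s_ls
    if det == 0.0:
        raise ValueError("NL and NS must vary independently across runs")

    # Solve the normal equations with Cramer's rule.
    a = (s_lt * s_ss - s_ls * s_st) / det
    b = (s_ll * s_st - s_ls * s_lt) / det
    return a, b


# Synthetic timings for illustration only, NOT measured data:
# generated from a = 0.002 and b = 0.05 (arbitrary units).
synthetic_runs = [
    (nl, ns, 0.002 * nl + 0.05 * ns)
    for nl in (100, 200, 400)
    for ns in (10, 20, 40)
]
a, b = fit_two_term_model(synthetic_runs)
ratio = b / a  # repeat the fit per thread count and compare the ratios
```

Running this fit once per thread count gives one b/a value per setting, which can then be compared between 1 and 32 threads.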
Some additional results:
· interval_dependence.png - the slowdown does not depend on the amplitude_times of the step_current_generator.
· indegree_dependence.png - the slowdown depends on the indegree used in nest.Connect(source, neurons). Specifically, the slowdown is worse for low indegree values.
This shows that the slowdown depends on the number of step_current_generators created, not on the injections themselves.
Are you aware of any lack of parallelization in the step_current_generator or in the current injection itself? If so, are there any plans to improve it?
best regards,
Jan Střeleček