Dear NEST Community, 
 
While adapting my model to run on multiple MPI processes, I have run into problems when using masks inside the connection dictionary for a spatially distributed 2D population. 
 
You can find a minimal example with further details in the attached .txt file. To execute it, please change the file extension from "txt" to "py". The line numbers in the following explanation refer to that example. 
 

My setup:
    - NEST version 3.4
    - Python version 3.10
    - Executed with mpirun inside a conda environment
        "mpirun -np 4 python3 minEx_4MPIprocesses_problem.py" 
    - System: Ubuntu 22.04
     
    - The same problem occurred when the model was executed on JURECA. 
 
 
Problem description:
    While using multiple MPI processes:
         1. Create a circular mask (see line 119) and a spatial 2D population of neurons, "neurons" (see lines 125-132).
         2. Select some of the neurons as source neurons (see line 154).
         3. Set up the connection dictionary with the mask inside (see lines 157-163).
         4. For each source neuron (see line 173):
                 Connect the source neuron with the "neurons" population using the connection dictionary (see line 183).


Where does the problem occur?

    The error occurs when the nest.Connect(…) call at line 183 is executed, inside the loop that connects each source neuron.  


When does the problem occur?
 
    Its occurrence depends on whether or not a mask is used inside the connection dictionary given to the nest.Connect(...) function.  
  
    If the mask is removed, as in the connection dictionary in lines 164-169, no error is produced in any of the settings used (however, the result is then not the desired one). 
  
    If the mask is used, the way the problem manifests depends on the number of MPI processes and on the settings: number of neurons, extent, mask radius, number of source neurons, and whether edge wrap is used. 
        - For 1 and 2 MPI procs, the model runs correctly, independent of the connection dictionary and settings used. 
        - For 3 MPI procs, either
            the model runs, but the distances between connected neurons do not correspond to the given mask dimensions. In other words: the established connections are longer/shorter than the mask should allow. 
            Or execution leads to an error output ("segmentation fault") from mpirun and the job is aborted. 
                The terminal output of an example run can be seen in the attached file "minEx_4MPIprocesses_problem_error_output_3_MPI". 
        - For 4 MPI procs, either
            the model runs correctly,
            or it leads to an error output ("segmentation fault") and the model does not terminate. (When executed in the terminal, the keyboard command "Ctrl+C" is needed to stop the execution.) 
                The terminal output of an example run can be seen in the attached file "minEx_4MPIprocesses_problem_error_output_4_MPI". 
        

    If the model is executed with the same settings on 3 and 4 MPI procs, there are 3 possible combinations of the problems described above: 
        1. The model runs and terminates on both; however, the connection distances created on 3 MPI procs are wrong. 
        2. The model works with 4 MPI procs but produces the "segmentation fault" error on 3 MPI procs. 
        3. The model produces a "segmentation fault" error on both. 
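
For reference, "wrong distances" here means that a connected source/target pair lies farther apart than the mask radius. The following is a minimal pure-Python sketch of such a check, not the code from the attached example. The function names, the toy positions and the connection list are hypothetical; in the real model the positions and connections would have to be gathered from NEST first. It assumes a circular mask centred on the source neuron and, with edge wrap, periodic boundaries in both dimensions.

```python
import math


def wrapped_distance(p, q, extent, edge_wrap):
    """Euclidean distance between the 2D points p and q.

    With edge_wrap=True the shortest path across the periodic
    boundaries of a layer with the given extent is used.
    """
    deltas = []
    for a, b, size in zip(p, q, extent):
        d = abs(a - b)
        if edge_wrap:
            d = min(d, size - d)
        deltas.append(d)
    return math.hypot(*deltas)


def connections_outside_mask(connections, positions, radius, extent,
                             edge_wrap=False, tol=1e-12):
    """Return all (source, target) pairs that are farther apart than radius."""
    return [
        (s, t)
        for s, t in connections
        if wrapped_distance(positions[s], positions[t], extent, edge_wrap)
        > radius + tol
    ]


# Hypothetical toy data: node id -> (x, y) position, and a connection list.
positions = {1: (0.05, 0.5), 2: (0.95, 0.5), 3: (0.5, 0.5)}
connections = [(1, 2), (1, 3)]

violations = connections_outside_mask(
    connections, positions, radius=0.2, extent=(1.0, 1.0), edge_wrap=True
)
print(violations)
```

With edge wrap, nodes 1 and 2 are close neighbours across the boundary, so only the pair (1, 3) is reported; without edge wrap both pairs exceed the radius.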
 
Please keep in mind that whether an error occurs, and in which form, depends strongly on the settings used. In the minimal example I provide different settings that represent the cases described above. However, they might not cover everything that can occur. 
 
Workaround:
For my own use case I found a workaround, although it is rather expensive in computing time.  
It involves applying nest.SelectNodesByMask() to every source neuron and choosing the targets from the resulting set with the desired probability. 
This requires several additional loops and also communicating the position data of every neuron to every MPI process at the beginning. 
Using this approach, the connection distances seem to be correct and no error occurs while executing the model. However, while writing this I am wondering whether I have actually tested it enough, so there might be problems that I have not discovered yet. 
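
The logic of this workaround can be sketched roughly as follows. This is a pure-Python illustration and not the actual model code: the candidate selection that nest.SelectNodesByMask() performs in the model is replaced here by a plain distance test, edge wrap is ignored, and all names (choose_targets, the toy positions, the exclusion of self-connections) are assumptions made for the example.

```python
import math
import random


def choose_targets(source, positions, radius, p, rng):
    """Pick targets for one source neuron.

    Step 1: collect all other neurons inside a circular mask of the
            given radius centred on the source (stand-in for the
            mask-based node selection; no edge wrap).
    Step 2: keep each candidate independently with probability p.
    """
    src_pos = positions[source]
    candidates = [
        node
        for node, pos in sorted(positions.items())
        if node != source and math.dist(src_pos, pos) <= radius
    ]
    return [node for node in candidates if rng.random() < p]


# Hypothetical toy data: node id -> (x, y) position, known on every process.
positions = {1: (0.0, 0.0), 2: (0.1, 0.0), 3: (0.0, 0.15), 4: (0.5, 0.5)}
rng = random.Random(1234)  # fixed seed so that the draws are reproducible

for source in (1, 4):
    targets = choose_targets(source, positions, radius=0.2, p=0.5, rng=rng)
    print(source, targets)
```

The resulting source/target lists would then be connected explicitly, without a mask in the connection dictionary.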
 

Is there something that I overlooked or approached incorrectly when using masks in the connection dictionary on multiple MPI processes?  
 
Thanks in advance!  
 
     
Best, 
Miriam Kempter


