I want to run NEST neuron simulations in parallel on multiple servers with OpenMPI.
I installed NEST through conda: conda install -c conda-forge nest-simulator.
The script multi_test.py is shown below:
from nest import *

# Distribute the network over 4 virtual processes in total;
# NEST splits these across the MPI ranks.
SetKernelStatus({"total_num_virtual_procs": 4})

pg = Create("poisson_generator", params={"rate": 50000.0})
n = Create("iaf_psc_alpha", 4)
sd = Create("spike_detector", params={"to_file": True})

print("work01,My Rank is :{}".format(Rank()))
#print("Processes Number is :{}".format(NumProcesses()))
#print("Processor Name is :{}".format(ProcessorName()))

# Drive the first neuron with Poisson input, chain the four neurons,
# and record all of them with the spike detector.
Connect(pg, [n[0]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[0]], [n[1]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[1]], [n[2]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[2]], [n[3]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect(n, sd)
Simulate(100.0)
To Reproduce
Steps to reproduce the behavior:
(pynest) work@work01:~/xiejiadu/nest_multi_test$
/home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1
/home/work/anaconda3/envs/pynest/bin/python3
/home/work/xiejiadu/nest_multi_test/multi_test.py
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217
@ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260
@ Network::create_grng_] : Creating new default global RNG
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217
@ Network::create_rngs_] : Creating default RNGs
[INFO] [2020.11.23 3:57:6
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260
@ Network::create_grng_] : Creating new default global RNG
python3:
/home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/sli/scanner.cc:581:
bool Scanner::operator()(Token&): Assertion `in->good()' failed.
[work02:95945] *** Process received signal ***
[work02:95945] Signal: Aborted (6)
[work02:95945] Signal code: (-6)
[work02:95945] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fc94a207730]
[work02:95945] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fc94a0697bb]
[work02:95945] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7fc94a054535]
[work02:95945] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2240f)[0x7fc94a05440f]
[work02:95945] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30102)[0x7fc94a062102]
[work02:95945] [ 5]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN7ScannerclER5Token+0x1489)[0x7fc93cf3ceb9]
[work02:95945] [ 6]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN6ParserclER5Token+0x49)[0x7fc93cf2f229]
[work02:95945] [ 7]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZNK14IparseFunction7executeEP14SLIInterpreter+0x96)[0x7fc93cf66666]
[work02:95945] [ 8]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(+0x74193)[0x7fc93cf25193]
[work02:95945] [ 9]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter8execute_Em+0x222)[0x7fc93cf29a32]
[work02:95945] [10]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libsli.so(_ZN14SLIInterpreter7startupEv+0x27)[0x7fc93cf29e57]
[work02:95945] [11]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/../../../libnest.so(_Z11neststartupPiPPPcR14SLIInterpreterNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1ea0)[0x7fc93d97ba40]
[work02:95945] [12]
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/nest/pynestkernel.so(+0x444dc)[0x7fc93dd774dc]
[work02:95945] [13]
/home/work/anaconda3/envs/pynest/bin/python3(+0x1b4924)[0x55e5ae205924]
[work02:95945] [14]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [15]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [16]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [17]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [18]
/home/work/anaconda3/envs/pynest/bin/python3(+0x1f6bb9)[0x55e5ae247bb9]
[work02:95945] [19]
/home/work/anaconda3/envs/pynest/bin/python3(+0x13a23d)[0x55e5ae18b23d]
[work02:95945] [20]
/home/work/anaconda3/envs/pynest/bin/python3(PyVectorcall_Call+0x6f)[0x55e5ae1aef2f]
[work02:95945] [21]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x5fc1)[0x55e5ae2336d1]
[work02:95945] [22]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalCodeWithName+0x260)[0x55e5ae219490]
[work02:95945] [23]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x594)[0x55e5ae21aa14]
[work02:95945] [24]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4e73)[0x55e5ae232583]
[work02:95945] [25]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [26]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x4bf)[0x55e5ae22dbcf]
[work02:95945] [27]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] [28]
/home/work/anaconda3/envs/pynest/bin/python3(_PyEval_EvalFrameDefault+0x71a)[0x55e5ae22de2a]
[work02:95945] [29]
/home/work/anaconda3/envs/pynest/bin/python3(_PyFunction_Vectorcall+0x1b7)[0x55e5ae21a637]
[work02:95945] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Open MPI failed to TCP connect to a peer MPI process. This
should not happen.
Your Open MPI job may now hang or fail.
Local host: work01
PID: 114620
Message: connect() to 192.168.204.122:1024 failed
Error: Operation now in progress (115)
--------------------------------------------------------------------------
[work01:114615] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 2193
-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: nest-2.18.0
Built: Jan 27 2020 12:49:17
This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.
Problems or suggestions?
Visit
https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.
Nov 23 03:57:06 ModelManager::clear_models_ [Info]:
Models will be cleared and parameters reset.
Nov 23 03:57:06 Network::create_rngs_ [Info]:
Deleting existing random number generators
Nov 23 03:57:06 Network::create_rngs_ [Info]:
Creating default RNGs
Nov 23 03:57:06 Network::create_grng_ [Info]:
Creating new default global RNG
Nov 23 03:57:06 RecordingDevice::set_status [Info]:
Data will be recorded to file and to memory.
work01,My Rank is :0
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 95945 on node work02 exited on signal 6
(Aborted).
--------------------------------------------------------------------------
Command to run:
/home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1
/home/work/anaconda3/envs/pynest/bin/python3
/home/work/xiejiadu/nest_multi_test/multi_test.py
Expected behavior
The script should start on both work01 and work02, each MPI rank should print its own rank, and the simulation should run to completion on both nodes without crashing.
Desktop/Environment:
OS: Debian 10
Conda-Version: 4.8.3
Python-Version: Python 3.8.6
NEST-Version: nest-2.18
Installation: conda package, with MPI
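
Since the log also shows an Open MPI warning that work01 failed to TCP connect to the peer process on work02, a minimal MPI-only check (independent of NEST) may help separate a networking problem from a NEST problem. This is only a sketch (the script name mpi_check.py is arbitrary) and assumes mpi4py is installed in the same pynest environment:

# mpi_check.py -- check that the two hosts can talk over Open MPI without NEST
# (assumes mpi4py is installed in the pynest conda environment)
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("rank {} of {} running on {}".format(
    comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))

Launched the same way (/home/work/anaconda3/envs/pynest/bin/mpirun -np 2 -host work01:1,work02:1 /home/work/anaconda3/envs/pynest/bin/python3 mpi_check.py), each rank should print its rank and hostname; if this also fails, the problem is likely in the MPI/network setup rather than in NEST itself.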
Best,
jiaduxie