Hi all!
What I'm doing: I've used v2.20.2 on HPC (bwForCluster NEMO) for my large-scale simulations involving structural plasticity. Now I'm trying to move to v3.8.
How am I doing this? I built NESTv3.8 on my HPC workspace following the cmake instructions mentioned at: https://nest-simulator.readthedocs.io/en/v3.8/installation/cmake_options.htm...
Here's an overview of my build and installation commands:
```
source /<some_path>/NESTv3.8/bin/activate  # activate a fresh python venv (v3.9.7) [optional]
module load mpi/openmpi/4.0-gnu-9.2

# cmake version 3.30.4
cmake --debug-find \
    -DMPI_C_COMPILER=$(which mpicc) \
    -DMPI_CXX_COMPILER=$(which mpicxx) \
    -DMPI_HOME=$(which mpirun) \
    -Dwith-mpi=ON \
    -Dwith-openmp=ON \
    -DCYTHON_EXECUTABLE=/<some_path>/intel/oneapi/2022.1/intelpython/latest/bin/cython \
    -DCMAKE_INSTALL_PREFIX:PATH=/<some_path>/nest-simulator-3.8-build \
    /<path_to_extracted_tar>/nest-simulator-3.8/
# If I don't specify the Cython path, cmake picks an outdated (incompatible) cython version.
# I did not try the -Dcythonize-pynest=OFF option because I don't know how to build PyNEST
# from a pre-cythonized pynestkernel.pyx.

make
make install
make installcheck
```
Once this finishes, I do the usual `source`-ing of `nest_vars.sh`. I also add the NEST path to my python venv's `site-packages`. This has always worked for all NEST versions I ever installed.
`installcheck` finishes with 1 error. See below:
```
THE NEST TESTSUITE DISCOVERED PROBLEMS
The following tests failed
| regressiontests.issue-1703.py

Please report test failures by creating an issue at https://github.com/nest/nest-simulator/issues
---------------------------------------------------------------------------
make[3]: *** [CMakeFiles/installcheck] Error 1
make[2]: *** [CMakeFiles/installcheck.dir/all] Error 2
make[1]: *** [CMakeFiles/installcheck.dir/rule] Error 2
make: *** [installcheck] Error 2
```
THE WARNING:
Ignoring this, if I proceed with PyNEST in the venv by `import nest`, I get the following warning:
```
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found, but there are no active ports detected (or Open MPI was unable to use them). This is most certainly not what you wanted. Check your cables, subnet manager configuration, etc. The openib BTL will be ignored for this job.

Local host: n4669
--------------------------------------------------------------------------
n4669.nemo.privat.74030PSM2 no hfi units are active (err=23)
--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_endpoint). This is highly unusual; your job may behave unpredictably (and/or abort) after this.

Local host: n4669
Location: mtl_ofi_component.c:627
Error: Invalid argument (22)
--------------------------------------------------------------------------

-- N E S T --
Copyright (C) 2004 The NEST Initiative

Version: 3.8.0
Built: Oct 10 2024 16:21:41

This program is provided AS IS and comes with NO WARRANTY. See the file LICENSE for details.

Problems or suggestions? Visit https://www.nest-simulator.org

Type 'nest.help()' to find out more about NEST.
```
THE CRASH:
If I ignore this warning and submit a job, the execution eventually crashes with exit code 134. If you'd like to see the crash dump, check the MD file in this GitLab repository: https://gitlab.rz.uni-freiburg.de/as2013/mpi-hpc-nestv3.8.git
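For readers unfamiliar with the number: shells conventionally report a process killed by signal N as exit code 128 + N, so 134 usually corresponds to signal 6 (SIGABRT), which is what a failed C/C++ `assert` raises. The sketch below only decodes that convention; the helper name `describe_exit_code` is hypothetical and not part of NEST or the cluster tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Return the signal name if `code` follows the shell's 128 + N convention."""
    if code > 128:
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return signal.Signals(code - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            pass
    return f"exit status {code}"


if __name__ == "__main__":
    print(describe_exit_code(134))
```

If the decoded name is SIGABRT, the job log is the place to look for an assertion message or an explicit abort.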
Any idea what's going wrong here?
If I follow the same build and installation steps on my local machine (except for specifying the Cython path in cmake, which isn't needed there), I get no warnings and no simulation crashes.
I'm guessing this has to do with the build and installation on the cluster.
I'd appreciate any input. Thanks!
Best, Ady
Hello, I would like to kindly bring this issue to your attention again. Could you please assist me in understanding what might be going wrong?
Hi! You can try to use a virtual environment to control which software is detected. I'm writing these steps off the top of my head:

    python -m venv venv
    source ./venv/bin/activate
    pip install cmake cython

This should ensure that you're using up-to-date versions of cmake and cython.

Make sure to delete your build directory, and then try the build again from the start. Hope it helps.
_______________________________________________
NEST Users mailing list -- users@nest-simulator.org
To unsubscribe send an email to users-leave@nest-simulator.org
Hello Robin, Thanks for your reply!
I'm already using a Python virtual environment with the paths to CMake and Cython correctly specified. I retried your suggestion just in case, but, sadly, it produces the same issue.
Could you suggest what else might be going wrong here? Thanks!
Best, Ady
Try again WITHOUT specifying any paths yourself: if your venv is active, its tools should be the first ones to be detected. Could you compare the version of Cython in the build log with the one given by `pip show cython`?
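One way to collect the values to compare against the build log is a small script run with the venv's interpreter. This is a minimal sketch that only uses the Python standard library; the function name `cython_report` is hypothetical, and it reports `None` where Cython is not installed or not on `PATH`.

```python
import shutil
import sys
from importlib import metadata


def cython_report() -> dict:
    """Collect the interpreter path plus the Cython version and executable it sees."""
    try:
        version = metadata.version("Cython")
    except metadata.PackageNotFoundError:
        version = None  # Cython is not installed for this interpreter
    return {
        "python": sys.executable,
        "cython_version": version,
        # First `cython` executable found on PATH, or None if there is none.
        "cython_executable": shutil.which("cython"),
    }


if __name__ == "__main__":
    for key, value in cython_report().items():
        print(f"{key}: {value}")
```

If the executable path points outside the active venv, or the version differs from the one cmake prints, cmake is probably picking up a different installation.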
On top of that, please do not specify any `-DCMAKE_INSTALL_PREFIX`. Using that toggles on an "expert user" build. If you omit it, NEST will be automatically installed inside your current Python env, and the chances of successful detection are considerably higher.
Oh (sorry for the triple post, everyone), omitting it also means that you no longer need to source any nest vars. Activate the Python env, and the NEST that you installed for that env will be active as well.
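To confirm which NEST installation an activated environment would actually import, the module can be located without importing it (and so without initialising MPI). This is a rough sketch using only the standard library; `locate_module` is a hypothetical helper name, and the check against `sys.prefix` assumes the script is run with the venv's own interpreter.

```python
import importlib.util
import sys
from pathlib import Path


def locate_module(name: str = "nest"):
    """Return (origin, inside_env) for an importable module, or (None, False)."""
    # find_spec on a top-level name locates the module without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None, False
    origin = Path(spec.origin).resolve()
    env_root = Path(sys.prefix).resolve()
    # True when the module file lives somewhere below the active environment.
    return origin, env_root in origin.parents


if __name__ == "__main__":
    origin, inside_env = locate_module("nest")
    print(f"nest found at: {origin}")
    print(f"inside active environment ({sys.prefix}): {inside_env}")
```

A path outside the environment would suggest that an older installation is still being picked up, for example through `PYTHONPATH`.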
Thanks for the comments, especially about not specifying the path for `-DCMAKE_INSTALL_PREFIX`. I didn't know that. As you said, I did not need to source nest vars or specify NEST's path in the venv's site-packages, cool!
Robin: "could you compare the versions of cython in the build log with that given by pip show cython?" Yes, I did. I had a problem before, and that's why I had to specify the full path for cython. But that is now resolved, and cmake picks up the current version of Python without me specifying the full path. Currently, CMake is 3.30.4, Cython is 3.0.11, and Python is 3.9.7.
Robin: "Try again WITHOUT specifying any paths yourself: If your venv is active, they should be the first ones to be detected" Just tried this (with no `-DCMAKE_INSTALL_PREFIX`). The HPC-based simulation crashes with the same issue.
```
[<node-id>.nemo.privat:03381] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
python: <path-to-extracted-tarball>/nest-simulator-3.8/nestkernel/connector_base.h:405: size_t nest::Connector<ConnectionT>::send(size_t, size_t, const std::vector<nest::ConnectorModel*>&, nest::Event&) [with ConnectionT = nest::tsodyks2_synapse<nest::TargetIdentifierPtrRport>; size_t = long unsigned int]: Assertion `lcid + lcid_offset < C_.size()' failed.
```
I don't fully get why NEST here refers to code in <path-to-extracted-tarball>. This is not new; it also shows up in the previous crash dump. But this suggests that there's perhaps some build issue with cmake, right?
OK, is this error raised during install check / the NEST tests, or is it raised during one of your own simulations?
I think one of the possible causes of that error is that you have too many neurons on a single node. Could you start by trying a small proof-of-concept simulation, perhaps one of the NEST examples, or something similarly simple?
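A proof-of-concept run along these lines could look like the sketch below. It is only an illustration, not a reproduction of the failing simulation: the network size, rates, weights, and in-degree are arbitrary assumptions, structural plasticity is deliberately left out, and `run_smoke_test` is a hypothetical helper name. It uses `tsodyks2_synapse` because that is the connection type named in the assertion message, and it returns `None` where PyNEST cannot be imported.

```python
try:
    import nest  # PyNEST; only importable where NEST is installed
except ImportError:
    nest = None


def run_smoke_test(n_neurons: int = 100, sim_time_ms: float = 200.0):
    """Simulate a small recurrent network and return per-rank summary values."""
    if nest is None:
        return None  # NEST is not available in this environment

    nest.ResetKernel()
    neurons = nest.Create("iaf_psc_alpha", n_neurons)
    # Poisson drive so that the neurons actually emit spikes (assumed values).
    noise = nest.Create("poisson_generator", params={"rate": 8000.0})
    nest.Connect(noise, neurons, syn_spec={"weight": 20.0})
    # Recurrent connections using the synapse model from the crash message.
    nest.Connect(
        neurons,
        neurons,
        conn_spec={"rule": "fixed_indegree", "indegree": 10},
        syn_spec={"synapse_model": "tsodyks2_synapse", "weight": 20.0},
    )
    nest.Simulate(sim_time_ms)
    return {
        "rank": nest.Rank(),
        "num_processes": nest.NumProcesses(),
        "local_spikes": nest.GetKernelStatus("local_spike_counter"),
    }


if __name__ == "__main__":
    print(run_smoke_test())
```

Running it first as a single process and then under the cluster's MPI launcher with a few ranks would help separate a general MPI or build problem from one specific to the structural plasticity setup.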
Robin: "Ok, is this error raised during install check / nest test, or is this raised during one of your own simulations?" No errors are raised during install check. The NEST test suite reports one failure, in a regression test. The crash always occurs during my simulations.
Robin: "I think one of the possible causes of that error is that you have too many neurons on a single node. Could you start by trying a small proof of concept simulation, perhaps one of the nest examples, or something similarly simple." Done. I tried reducing the number of neurons from ~5000 to ~1000 in the test network. The simulation lasted two more iterations, but ultimately crashed with the same error. I also tried this with a NEST example (https://nest-simulator.readthedocs.io/en/stable/auto_examples/structural_pla...) modified for MPI, and it also crashes with the same error. I'm still working on more tests and will post here once I have more information.