Our tools support multiple formats, including SONATA, but if possible we'd like to load the data in from memory, so that the on-disk format is of no concern or constraint to NEST. Coming from Python, I suppose NumPy arrays would be the best fit, since they map directly onto contiguous C arrays.
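To make this concrete, here is a rough sketch of the hand-over I have in mind (sizes, node IDs, and parameters below are made up; the array-based `Connect` is just the closest existing entry point I know of):

```python
# Sketch: dense connectivity held in memory as contiguous NumPy arrays
# and handed to NEST with no on-disk format in between.
import numpy as np
import nest

nest.ResetKernel()
nest.Create("iaf_psc_alpha", 100)

# One row per synapse: source id, target id, weight, delay.
rng = np.random.default_rng(1234)
n_syn = 10_000
sources = rng.integers(1, 101, size=n_syn)  # NEST node IDs are 1-based
targets = rng.integers(1, 101, size=n_syn)
weights = rng.normal(1.0, 0.1, size=n_syn)
delays = np.ones(n_syn)                     # in ms

# The array form of Connect creates exactly one connection per row.
nest.Connect(sources, targets, "one_to_one",
             syn_spec={"weight": weights, "delay": delays})
```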

On Tue, 20 Dec 2022 at 17:54, Sergio Solinas <smgsolinas@uniss.it> wrote:
Dear Hans and Robin,
I'm also interested in this thread.
Do you have any updates on it?
I need this NEST upgrade to load massive connectivity matrices for a large hippocampus model.

Kind regards,
Sergio MG Solinas
Department of Biomedical Sciences
Università di Sassari
Viale San Pietro 23
07100 - Sassari


On Fri, 14 Oct 2022 at 16:03, Hans Ekkehard Plesser <hans.ekkehard.plesser@nmbu.no> wrote:

Hi Robin,

Just to follow up on the discussion from the NEST Open VC last Monday.

Nicolai and Håkon (with early contributions by Stine) have a branch that is almost ready. We are currently waiting for JUSUF to come back up so that we can test an improved prototype against the full Allen Institute mouse brain model (ca. 480 GB of HDF5). The code needs some tidying up, but we could start discussing it once we know that it works in principle.

To adapt to your plans, it would be very helpful to know what your data looks like: e.g., do you store all connections in NumPy arrays with certain columns, in HDF5, or in a magic format of your own?

Best,

Hans Ekkehard

-- 
Prof. Dr. Hans Ekkehard Plesser
Head, Department of Data Science
Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway
Phone +47 6723 1560
Email hans.ekkehard.plesser@nmbu.no
Home http://arken.nmbu.no/~plesser

From: Robin Gilbert De Schepper <robingilbert.deschepper@unipv.it>
Reply to: NEST User Mailing List <users@nest-simulator.org>
Date: Friday, 7 October 2022 at 16:33
To: NEST User Mailing List <users@nest-simulator.org>
Subject: [NEST Users] Re: Fastest way to transfer dense connectivity data in a parallel simulation?


What's the timeline for this? Is there any open-source discussion or proposal for the implementation that I could read, to understand how the problem is being tackled and maybe to find out whether I could propose a low-effort interface decoupled from SONATA on disk? If there is already something reading data from disk, I might as well stream my data to that thing.


On Fri, 7 Oct 2022 at 16:09, Hans Ekkehard Plesser <hans.ekkehard.plesser@nmbu.no> wrote:

Hi Robin,

We are currently working with the Allen Institute to develop an efficient reader for large SONATA network specifications. We assume here that all connectivity is collected in HDF5 files, and we expect significant performance gains if the data are sorted by target neuron and the SONATA files provide "indices" tables. Would this help you?
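To illustrate what the "indices" tables buy us, here is a minimal sketch (not our actual reader) of a per-target lookup against the HDF5 layout prescribed by the SONATA specification; the file and population names are placeholders:

```python
# Sketch: fetch all source node IDs for one target neuron via the
# target_to_source index of a SONATA edge file. With edges sorted by
# target, this amounts to a few contiguous HDF5 reads.
import h5py
import numpy as np

def sources_of_target(edge_file, population, target_node_id):
    with h5py.File(edge_file, "r") as f:
        pop = f[f"/edges/{population}"]
        index = pop["indices/target_to_source"]
        # node_id_to_range[t] is a [start, stop) range of rows in
        # range_to_edge_id; each such row is a [start, stop) range of
        # edge rows that belong to target t.
        r_start, r_stop = index["node_id_to_range"][target_node_id]
        chunks = [pop["source_node_id"][e0:e1]
                  for e0, e1 in index["range_to_edge_id"][r_start:r_stop]]
    return np.concatenate(chunks) if chunks else np.empty(0, dtype=int)

# e.g. sources_of_target("edges.h5", "default", 42)  # placeholder names
```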


Dividing network data according to a specific compute-node configuration seems rather restrictive to me, and right now there is no way to read such data per process.


Best,

Hans Ekkehard


-- 
Prof. Dr. Hans Ekkehard Plesser
Head, Department of Data Science
Faculty of Science and Technology
Norwegian University of Life Sciences
PO Box 5003, 1432 Aas, Norway
Phone +47 6723 1560
Email hans.ekkehard.plesser@nmbu.no
Home http://arken.nmbu.no/~plesser


From: Robin Gilbert De Schepper <robingilbert.deschepper@unipv.it>
Reply to: NEST User Mailing List <users@nest-simulator.org>
Date: Friday, 7 October 2022 at 15:41
To: NEST User Mailing List <users@nest-simulator.org>
Subject: [NEST Users] Fastest way to transfer dense connectivity data in a parallel simulation?


Hi all!

In the world of biophysical detail, it is commonplace for the connectome to be generated by algorithms that specify connections as dense tabular data, with each row specifying a synaptic location on a pair of cells (SONATA, for example).

A) In NEST I can't really find a connection rule that this data fits into: I want to specify pairwise connections from multiset A to multiset B (see the sketch after this list).

- Is this possible with `pairwise_bernoulli`, or do the inputs have to be strict sets?
- The probability step is superfluous; can it be skipped?
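For concreteness, here is the pairwise semantics I am after, with made-up IDs; whether the `one_to_one` rule over plain arrays accepts repeated node IDs (i.e., multisets) is exactly what I cannot tell from the documentation:

```python
# Sketch: row i of our table should become exactly one connection
# pre[i] -> post[i]; note the repeats on both sides (multisets).
import numpy as np
import nest

nest.ResetKernel()
nest.Create("iaf_psc_alpha", 4)

pre = np.array([1, 1, 2, 3])   # node 1 appears twice
post = np.array([2, 3, 3, 4])  # node 3 appears twice

# No probability step anywhere: every listed pair gets connected.
nest.Connect(pre, post, "one_to_one")
```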

B) Then there's the fact that NEST parallelizes transparently. Because this data was generated in parallel by tiling the biological volume, I already have neatly fragmented data available on each node of the distributed cluster. It would be such a waste to communicate all the data to every node, only for NEST to redistribute it another way.

The data is too big to allgather into the memory of any single node. Not only would this be a lot of overhead to implement, but NEST would also throw away all but `1 / Nnodes` of the data on each node again, leaving me with a reshuffled version of my starting data.

Is there a way to bypass the transparency and imperatively declare the cells and connections on each machine?
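Something along these lines is what I would like to be able to write; the skipping behaviour described in the comment is my wish, not something I know NEST to guarantee:

```python
# Sketch: each MPI rank hands NEST only the connectivity fragment it
# already produced while tiling the volume (synthetic data stands in
# for the real fragment). Run with e.g.: mpirun -n 4 python script.py
from mpi4py import MPI
import numpy as np
import nest

rank = MPI.COMM_WORLD.Get_rank()
nest.ResetKernel()
nest.Create("iaf_psc_alpha", 1000)

rng = np.random.default_rng(rank)  # stand-in for this rank's fragment
pre = rng.integers(1, 1001, size=100_000)
post = rng.integers(1, 1001, size=100_000)

# Wish: NEST keeps the pairs whose targets are local to this rank and
# silently skips the rest, instead of forcing an allgather first.
nest.Connect(pre, post, "one_to_one")
```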


--
Robin De Schepper, MSc (they/them)
Department of Brain and Behavioral Sciences
Unit of Neurophysiology
University of Pavia, Italy
Via Forlanini 6, 27100 Pavia - Italy
Tel: (+39) 038298-7607


Interested in large scale network modelling?
Discover our framework:

_______________________________________________
NEST Users mailing list -- users@nest-simulator.org
To unsubscribe send an email to users-leave@nest-simulator.org




--
Robin De Schepper, MSc (they/them)
Department of Brain and Behavioral Sciences
Unit of Neurophysiology
University of Pavia, Italy
Via Forlanini 6, 27100 Pavia - Italy
Tel: (+39) 038298-7607
http://www-5.unipv.it/dangelo/