⚒️ Set up the Storage Gateway environment

Software stack

The storage gateway server should be installed with:

An Ubuntu Linux instance is recommended. Singularity, the Lotus node, and Boost should be built and installed from source. The web server and the IPFS daemon can be installed from binaries.

[TODO link to Ubuntu build script]

Architectural decisions

Data transfer plan and Gateway placement

Network data transfer is often the limiting factor when onboarding PiB-scale datasets. Determine the optimal data replication approach for each of the following paths:

  • Source dataset replicated from the data owner to the storage gateway,

  • Prepared dataset replicated from the storage gateway to each participating SP.

Compare online network transfer options with offline physical media transport, weighing feasibility, cost, and transfer duration. The data transfer plan may also affect the optimal placement of the storage gateway: compare cloud and on-premises hosting, and consider the physical locations of the source dataset and the destination SPs.
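To make the transfer-duration comparison above concrete, the following minimal sketch estimates how long an online transfer would take. The function name `transfer_days` and the default 80% link efficiency are illustrative assumptions, not values from Singularity or this guide. Measure real throughput on your own links before committing to a plan.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_tib: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to move `dataset_tib` TiB over a `link_gbps` Gbit/s link.

    `efficiency` is an assumed fraction of the nominal link rate that is
    actually achieved (protocol overhead, contention); 0.8 is a placeholder.
    """
    if dataset_tib <= 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and rates must be positive; efficiency must be in (0, 1]")
    dataset_bits = dataset_tib * 2**40 * 8          # TiB -> bytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency    # usable bits per second
    return dataset_bits / effective_bps / SECONDS_PER_DAY


if __name__ == "__main__":
    # 1 PiB = 1024 TiB, evaluated for a few hypothetical link speeds.
    for gbps in (1, 10, 100):
        print(f"1 PiB over {gbps} Gbit/s: {transfer_days(1024, gbps):.1f} days")
```

If the estimate for an online path runs to many weeks, that is a signal to price out physical media transport or to move the gateway closer to the source data.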

Storage Gateway sizing

Gateway storage should be sized to hold both the source dataset and the prepared CAR files. As a general guideline, provision local storage for 2x the size of the source dataset.
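The 2x guideline above can be written as a small helper for capacity planning. This is a sketch only: the function name `gateway_storage_tib` is hypothetical, and the multiplier is exposed as a parameter because the right headroom depends on your own workflow.

```python
def gateway_storage_tib(source_tib: float, multiplier: float = 2.0) -> float:
    """Return the local storage to provision, in TiB.

    The default multiplier of 2.0 follows the guideline of one copy for the
    source dataset plus roughly the same again for the prepared CAR files.
    """
    if source_tib <= 0:
        raise ValueError("source_tib must be positive")
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1 to hold the source dataset")
    return source_tib * multiplier


if __name__ == "__main__":
    # Example: a hypothetical 500 TiB source dataset.
    print(f"Provision about {gateway_storage_tib(500):.0f} TiB of local storage")
```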

Optimization

Data preparation tasks are I/O-bound, so the storage gateway benefits from fast local storage, such as volumes attached over NVMe or iSCSI interfaces.

Singularity can also run several workers for concurrent data preparation. If required, set deal_preparation_worker.num_workers in the Singularity config file.
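As an illustration of the setting named above, the sketch below renders a config fragment for a chosen worker count. It assumes the dotted name deal_preparation_worker.num_workers corresponds to a `num_workers` key inside a `[deal_preparation_worker]` table of a TOML-style config file. Check that layout against your installed Singularity config before using it. The helper name `render_worker_config` is hypothetical.

```python
def render_worker_config(num_workers: int) -> str:
    """Render a TOML-style fragment that sets the data preparation worker count.

    Assumption: the config file uses a [deal_preparation_worker] table with a
    num_workers key, matching the dotted setting name in the text above.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return f"[deal_preparation_worker]\nnum_workers = {num_workers}\n"


if __name__ == "__main__":
    # Print a fragment for four concurrent workers (an arbitrary example value).
    print(render_worker_config(4), end="")
```

Because preparation is I/O-bound, adding workers only helps while the underlying storage can keep up, so raise the value gradually and watch disk throughput.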
