Data Onboarding Docs

Setup Storage Gateway environment


Last updated 2 years ago

Software stack

The storage gateway server should be installed with:

  • Singularity - petabyte-scale data onboarding and retrieval client tool.

    • Install Singularity:

  • Lotus - lite client for interacting with the Filecoin chain. Used for legacy deals.

    • Install Lotus and configure a lotus-lite node:

  • Boost - new Lotus markets client for v1.2 deals with upgraded SPs.

    • Install Boost client:

  • Web server - hosts CAR files for online data transfer to SPs.

    • E.g. Nginx.

  • IPFS daemon - stores the Singularity dataset index to support retrievals.

An Ubuntu Linux instance is recommended. The Singularity, Lotus node, and Boost packages should be built and installed from source. The web server and IPFS daemon can be installed from binaries.

[TODO link to Ubuntu build script]

Architectural decisions

Data transfer plan and Gateway placement

Network data transfer is often a limiting factor when onboarding PiB-scale datasets. Determine the optimal data replication approach for the following paths:

  • Source dataset replicated from the data owner to the storage gateway,

  • Prepared dataset replicated from the storage gateway to each participating SP.

Compare online network data transfer options with offline physical media transport options, considering feasibility, cost, and transfer duration. The data transfer plan may also affect the optimal placement of the storage gateway: compare cloud and on-premise hosting, and consider the physical locations of the source dataset and the destination SPs.
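When comparing online and offline options, a rough transfer-duration estimate is a useful first number. The following is a minimal sketch, not part of any Filecoin tooling: the function name `transfer_days` and the default `utilization` of 0.7 are illustrative assumptions, and real throughput depends on the link, protocol, and endpoints.

```python
def transfer_days(dataset_tib, link_gbps, utilization=0.7):
    """Estimate the days needed to move a dataset over a network link.

    dataset_tib  -- dataset size in TiB (1 TiB = 2**40 bytes)
    link_gbps    -- nominal link speed in gigabits per second (10**9 bits/s)
    utilization  -- assumed fraction of the link actually achieved (hypothetical default)
    """
    if dataset_tib < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("sizes and speeds must be positive; utilization must be in (0, 1]")
    dataset_bits = dataset_tib * 2**40 * 8
    effective_bits_per_second = link_gbps * 1e9 * utilization
    seconds = dataset_bits / effective_bits_per_second
    return seconds / 86_400  # seconds per day


# Example: 1 PiB (1024 TiB) over a 10 Gbps link at 70% utilization
# comes out at roughly two weeks of continuous transfer.
print(f"{transfer_days(1024, 10):.1f} days")
```

If the estimate for a path runs to weeks or months, shipping physical media or moving the gateway closer to the data may be worth evaluating for that path.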

Storage Gateway sizing

Gateway storage should be sized to hold both the source dataset and the prepared CAR files hosted for transfer. As a general guideline, size local storage at 2x the source dataset size.
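The 2x guideline can be written as a one-line calculation. This is only a sketch of the rule of thumb above; the function name and the `multiplier` parameter are illustrative, and the multiplier may need adjusting for a particular workflow.

```python
def gateway_storage_tib(source_tib, multiplier=2.0):
    """Return the local storage to provision, in TiB.

    The default multiplier of 2.0 follows the guideline of keeping the
    source dataset and the prepared CAR files on the gateway at once.
    """
    if source_tib < 0 or multiplier < 1:
        raise ValueError("source size must be >= 0 and multiplier >= 1")
    return source_tib * multiplier


# Example: a 500 TiB source dataset
print(gateway_storage_tib(500), "TiB")
```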

Optimization

Data preparation tasks are IO-bound, so the storage gateway will benefit from fast local storage, such as storage attached over iSCSI or NVMe interfaces.


Singularity can also be configured to run multiple workers for concurrent data preparation. If required, see deal_preparation_worker.num_workers in the Singularity config file.
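To illustrate why a worker count helps with IO-bound work, here is a generic worker-pool sketch. It is not Singularity's implementation: `checksum` and `process_concurrently` are hypothetical helpers, and file hashing merely stands in for a data preparation task. Only the `num_workers` name is borrowed from the setting mentioned above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def checksum(path):
    """Read a file in 1 MiB chunks and return (file name, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return Path(path).name, digest.hexdigest()


def process_concurrently(paths, num_workers):
    """Run checksum() over all paths using a pool of num_workers threads."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return dict(pool.map(checksum, paths))


if __name__ == "__main__":
    # Hash every regular file in the current directory with 4 workers.
    files = [p for p in Path(".").iterdir() if p.is_file()]
    for name, digest in process_concurrently(files, num_workers=4).items():
        print(name, digest)
```

While one worker waits on a disk read, another can make progress, which is why adding workers tends to pay off only as long as the underlying storage can keep up.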

  • Singularity: https://boost.filecoin.io/getting-started
  • Lotus Lite node: https://lotus.filecoin.io/lotus/install/lotus-lite/
  • Boost: https://boost.filecoin.io/getting-started
  • IPFS: https://docs.ipfs.tech/install/command-line/
  • Singularity config file