Filecoin Large Data Onboarding Guide

This operational guide walkthroughs a checklist of steps to onboard large sized datasets ( >100 TiB in size including all replicas), into the Filecoin network. This guide is suitable for data owners that are seeking an introduction to the onboarding process.
The end-to-end data flow is covered at an introductory level, from planning, through to retrieval testing. This guide is intended as a top-level guide that summarizes each step, while providing references to more detailed documentation.
This guide provides a walkthrough usage of the primary client-side tool set, namely Singularity, supported by Boost, and Lotus client.

Data Onboarding Checklist

  1. 1.
    Understand the large data onboarding reference architecture.
  2. 2.
    Select datasets for onboarding, estimate sizing.
  3. 3.
    Allocate storage funding with DataCap, or FIL tokens.
  4. 4.
    Select Storage Providers
  5. 5.
    Setup Storage Gateway environment
  6. 6.
    Prepare Data
  7. 7.
    Replicate Data to SPs, Propose Storage Deals
  8. 8.
    Retrieve Data
  9. 9.
    Plan next steps

Looking for alternative tools suitable for smaller datasets?

If your organization's dataset is smaller (<100TiB), you may also consider alternatives that may provide simpler and accelerated onboarding paths, such as Chainsafe.storage, Estuary.tech, web3.storage, NFT.storage, and others listed at https://dataonboarding.filecoin.io/​