Select Datasets
Define your data storage requirements and objectives
When selecting the dataset to be stored on the Filecoin network, Data Owners should consider: data classification, use-cases, value of the data, and desired improvements over their existing storage solution.
To be suitable for the onboarding solution described in this guide, the total size of a "large dataset", counting all replicas, should be expected to exceed 100 TiB.
Determine the size of the source dataset.
Determine how many replicas are needed. Multiple replicas across multiple SPs will increase data availability, durability, and fault-tolerance. To calculate the total size, multiply the size of the source dataset by the number of replicas.
Where possible, it is recommended to distribute replicas to multiple SPs across geographic regions. Consider any data residency regulations that may limit replicas to within a country.
Determine how long the dataset needs to be retained. The current maximum duration of Filecoin storage deals is 540 days. After this period, deals can be renewed by making new storage deals and re-sealing.
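The sizing and retention steps above can be sketched as a short calculation. This is a minimal illustration, not part of any Filecoin tooling: the function names are hypothetical, the 100 TiB threshold and 540-day maximum are the figures stated in this guide, and the renewal count assumes every deal is made at the maximum duration and renewed back-to-back.

```python
import math

TIB = 2**40  # bytes in one tebibyte
LARGE_DATASET_THRESHOLD_TIB = 100  # threshold stated in this guide
MAX_DEAL_DAYS = 540  # maximum deal duration stated in this guide


def total_size_tib(source_bytes: int, replicas: int) -> float:
    """Total size to onboard: source dataset size multiplied by the replica count."""
    if source_bytes < 0 or replicas < 1:
        raise ValueError("source_bytes must be >= 0 and replicas must be >= 1")
    return source_bytes * replicas / TIB


def is_large_dataset(source_bytes: int, replicas: int) -> bool:
    """True when the total size, counting all replicas, exceeds the threshold."""
    return total_size_tib(source_bytes, replicas) > LARGE_DATASET_THRESHOLD_TIB


def deals_needed(retention_days: int, max_deal_days: int = MAX_DEAL_DAYS) -> int:
    """Consecutive deals per replica needed to cover the retention period.

    Assumption: each deal runs for the maximum duration and is renewed
    immediately when it expires.
    """
    if retention_days < 1 or max_deal_days < 1:
        raise ValueError("retention_days and max_deal_days must be >= 1")
    return math.ceil(retention_days / max_deal_days)


if __name__ == "__main__":
    source = 30 * TIB  # hypothetical 30 TiB source dataset
    replicas = 5       # hypothetical replica count
    print(total_size_tib(source, replicas))    # 150.0
    print(is_large_dataset(source, replicas))  # True
    print(deals_needed(5 * 365))               # 4 deals to cover five years
```

In this hypothetical case, a 30 TiB source dataset with 5 replicas totals 150 TiB, which exceeds the threshold, and a five-year retention period would need four consecutive deals per replica.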
If the selected dataset is classified as public, it can be stored on the Filecoin network in the clear and be accessible to the general public.
Private datasets will require the Data Owner to implement encryption for data privacy. Encryption of the dataset by the Data Owner prior to storage enables confidentiality over the public Filecoin storage network. The Data Owner selects an encryption method, encrypts datasets prior to packaging and storage, and decrypts the datasets after retrieval. Data Owners are responsible for secret management and key management.
Consider the "temperature" of the data. At the current state of the Filecoin network, cold data storage and infrequently accessed archive data are more suitable storage use-cases than warm or hot datasets.
Consider what portion of the dataset will be requested for each retrieval, e.g. individual files, partial retrieval, or full retrieval. Consider whether a special offline data transfer arrangement should be negotiated with SPs.
The current Filecoin client implementations do support retrievals, but the system of SP retrieval incentives is still evolving. Check with your selected SPs about their retrievability service levels, and confirm that they can support your expected data retrieval pattern.
Objectives of your data onboarding project on Filecoin can include, e.g. reducing storage costs, improving on existing storage infrastructure, improving data availability and data durability, increasing storage decentralization, enabling Web3 use-cases, etc.
Fast and low-latency retrieval over the Filecoin network is currently a work-in-progress, and beyond the scope of this guide. Refer to for more info.