Proof of Data Segment Inclusion (PoDSI)
Data uploaded to web3.storage is aggregated together with data from other users of the service. When an aggregate is big enough, it is stored with multiple Filecoin Storage Providers.
Storage Providers have minimum size requirements for the data they store, typically between 16 and 32 GB, so smaller uploads must be aggregated to meet them.
The web3.storage service uses Proof of Data Segment Inclusion (PoDSI), which allows clients to verify the correct aggregation of their data and prove this fact to third parties.
Data uploaded to web3.storage is packed up and sent as CAR files. We call these shards, and each one is referenced by its shard CID.
# example CAR shard CID
The IPLD codec for a CAR shard CID is 0x0202. You can inspect this CID on cid.ipfs.tech. It is not your content root CID. The content root CID is a different CID that refers to the root node of the DAG built from your data, whereas the shard CID is a hash of the CAR file your DAG has been packed into.
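To make the codec value concrete, here is a minimal sketch of how a CIDv1 is laid out (version, codec, multihash code, digest length, digest, each prefix a varint) and how the 0x0202 codec can be read back out of a shard CID string. The function names are hypothetical, the input bytes are a stand-in rather than a real CAR file, and the use of sha2-256 (multihash code 0x12) for the shard hash is an assumption.

```python
import base64
import hashlib

CAR_CODEC = 0x0202  # codec for CAR shard CIDs, as stated above
SHA2_256 = 0x12     # multihash code; assumed hash function for shards


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, next_offset) for the varint starting at offset."""
    result, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def car_shard_cid(car_bytes: bytes) -> str:
    """Build a base32 CIDv1 string for the given CAR bytes."""
    digest = hashlib.sha256(car_bytes).digest()
    raw = (encode_varint(1) + encode_varint(CAR_CODEC)
           + encode_varint(SHA2_256) + encode_varint(len(digest)) + digest)
    # Multibase prefix "b" means lower-case base32 without padding.
    return "b" + base64.b32encode(raw).decode().rstrip("=").lower()


def describe_cid(text: str) -> dict:
    """Split a base32 CIDv1 string into its components."""
    if not text.startswith("b"):
        raise ValueError("only base32 (multibase 'b') CIDs are handled here")
    body = text[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = decode_varint(raw)
    codec, pos = decode_varint(raw, pos)
    hash_code, pos = decode_varint(raw, pos)
    length, pos = decode_varint(raw, pos)
    return {"version": version, "codec": codec,
            "hash_code": hash_code, "digest": raw[pos:pos + length]}


cid = car_shard_cid(b"stand-in CAR bytes")
print(cid, hex(describe_cid(cid)["codec"]))
```

Because the version, codec, and hash prefix are fixed, every CID built this way starts with the same characters (`bagbaiera`), which is a quick visual cue that a CID refers to a CAR shard rather than to a content root.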
Your content may be split across one or more shards.
Piece CIDs are the primary means of referencing data stored in a sector of a Filecoin Storage Provider. Each piece CID is loosely equivalent to a corresponding shard CID.
The piece CIDs used in web3.storage are v2 piece CIDs since these also encode tree height information. On chain and in various chain explorers online you may see v1 piece CIDs displayed. You can convert from v2 to v1 but not v1 to v2, unless you also know the tree height.
# example piece v2 CID
The multihash code for the above CID is 0x1011 (fr32-sha2-256-trunc254-padded-binary-tree). You can inspect this CID on cid.ipfs.tech.
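The one-way nature of the v2 to v1 conversion can be illustrated with a short sketch. It assumes the v2 layout described in FRC-0069 as I understand it: a raw-codec (0x55) CID whose 0x1011 multihash digest is a varint padding value, a one-byte tree height, and a 32-byte root. The v1 form is assumed to use the fil-commitment-unsealed codec (0xf101) with the sha2-256-trunc254-padded multihash (0x1012) over the same root. Function names are hypothetical and the root bytes are made up.

```python
import base64

RAW = 0x55
PIECE_V2_HASH = 0x1011      # fr32-sha2-256-trunc254-padded-binary-tree
FIL_COMMITMENT = 0xF101     # assumed codec for v1 piece CIDs
PIECE_V1_HASH = 0x1012      # assumed multihash code for v1 piece CIDs


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    result, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def make_piece_v2(root: bytes, height: int, padding: int) -> bytes:
    """Assemble v2 piece CID bytes from a 32-byte root (illustration only)."""
    digest = encode_varint(padding) + bytes([height]) + root
    return (encode_varint(1) + encode_varint(RAW)
            + encode_varint(PIECE_V2_HASH) + encode_varint(len(digest)) + digest)


def piece_v2_to_v1(cid_v2: bytes) -> tuple[bytes, int, int]:
    """Return (v1 CID bytes, tree height, padding) for a v2 piece CID."""
    version, pos = decode_varint(cid_v2)
    codec, pos = decode_varint(cid_v2, pos)
    hash_code, pos = decode_varint(cid_v2, pos)
    length, pos = decode_varint(cid_v2, pos)
    if (version, codec, hash_code) != (1, RAW, PIECE_V2_HASH):
        raise ValueError("not a v2 piece CID")
    digest = cid_v2[pos:pos + length]
    padding, p = decode_varint(digest)
    height, root = digest[p], digest[p + 1:]
    if len(root) != 32:
        raise ValueError("unexpected root length")
    # v1 keeps only the root, so height and padding are dropped here.
    v1 = (encode_varint(1) + encode_varint(FIL_COMMITMENT)
          + encode_varint(PIECE_V1_HASH) + encode_varint(32) + root)
    return v1, height, padding


def to_base32(raw: bytes) -> str:
    return "b" + base64.b32encode(raw).decode().rstrip("=").lower()


v2 = make_piece_v2(bytes(range(32)), height=30, padding=0)
v1, height, padding = piece_v2_to_v1(v2)
print(to_base32(v2), "->", to_base32(v1), "height", height)
```

The v1 bytes contain only the root, which is why going back to v2 needs the tree height supplied from elsewhere.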
PoDSI lets clients of data aggregation services like web3.storage verify that their data was aggregated correctly and prove this to third parties.
Put simply, it is a proof that a smaller piece (a segment) has been included in a larger piece (an aggregate).
# example merkle proof showing path from aggregate (piece) CID to segment (piece) CID
The above example does not visualize all the information that a PoDSI contains, just the direct path from aggregate (piece) CID to segment (piece) CID.
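The path part of such a proof can be checked with ordinary Merkle path verification. The sketch below is a simplified illustration, not the full PoDSI format: it builds a small binary tree over stand-in segment nodes, extracts the sibling path for one segment, and recomputes the aggregate root from it. The node hash is assumed to be SHA-256 with the two most significant bits of the last byte cleared (254-bit truncation), and all function names are hypothetical.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child nodes, truncating the result to 254 bits (assumed)."""
    digest = bytearray(hashlib.sha256(left + right).digest())
    digest[31] &= 0x3F
    return bytes(digest)


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all layers, leaves first; leaf count must be a power of two."""
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([node_hash(prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return layers


def prove(layers: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling at each level on the way from leaf to root."""
    path = []
    for layer in layers[:-1]:
        path.append(layer[index ^ 1])
        index >>= 1
    return path


def verify(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
    """Recompute the root from a leaf and its sibling path."""
    node = leaf
    for sibling in path:
        if index % 2 == 0:
            node = node_hash(node, sibling)
        else:
            node = node_hash(sibling, node)
        index >>= 1
    return node == root


# Four stand-in segments; in practice each would be a piece root.
segments = [hashlib.sha256(f"segment-{i}".encode()).digest() for i in range(4)]
tree = build_tree(segments)
aggregate_root = tree[-1][0]
proof = prove(tree, 2)
print(verify(aggregate_root, segments[2], 2, proof))
```

A verifier needs only the aggregate root, the segment, its index, and the sibling path, so the check does not require access to any other user's data.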
A data aggregation proof is a PoDSI plus information that ties an aggregate piece to a Filecoin Storage Provider. At the time of writing, this information is one or more deal IDs.
The web3.storage aggregation pipeline is fully verifiable thanks to UCANs. Your piece can be tracked through the pipeline via signed UCAN receipts.
There are 4 roles in the aggregation pipeline:
- Storefront - facilitates data storage services to applications and users, getting the requested data stored into Filecoin deals asynchronously.
- Aggregator - aggregates smaller data (Filecoin Pieces) into a larger piece that can effectively be stored with a Filecoin Storage Provider.
- Dealer - arranges deals with Filecoin Storage Providers for the aggregates.
- Deal Tracker - follows the Filecoin chain to keep track of successful deals.
Roughly speaking, a piece progresses through the pipeline via the following stages:
The client submits a piece to the storefront (web3.storage) for aggregation and storage with Filecoin Storage Providers. The receipt for this invocation contains two links for async tasks:
- filecoin/submit - allows the client to continue following the receipt chain through the aggregation pipeline. It is executed after the storefront has verified that the piece CID corresponds to the shard CID.
- filecoin/accept - a shortcut to the end of the pipeline, where the data aggregation proof will eventually become available. It is executed when the dealer has successfully stored an aggregate containing the submitted piece with one or more Filecoin Storage Providers.
The storefront issues a receipt for filecoin/submit to indicate it has verified the offered piece and submitted it to the pipeline. The receipt for this invocation contains a link for an async task, piece/offer, which is executed when the storefront offers the piece to an aggregator.
The aggregator issues a receipt for piece/offer when the storefront offers a piece to be aggregated. The receipt contains a link for an async task, piece/accept, which is executed when the piece has been included in an aggregate.
The aggregator issues piece/accept receipts when an aggregate is big enough. Every piece in the aggregate is issued a receipt, which includes a Proof of Data Segment Inclusion (PoDSI). The receipt contains a link for an async task, aggregate/offer, which is executed when the aggregator offers the aggregate to a dealer.
The dealer issues an aggregate/offer receipt when the aggregator offers an aggregate to be stored by Filecoin Storage Providers. The receipt contains a link for an async task, aggregate/accept, which is executed when the aggregate has been stored by at least one Filecoin Storage Provider.
The dealer issues an aggregate/accept receipt when an aggregate has been stored by at least one Filecoin Storage Provider. The receipt includes information that ties the aggregate to a Storage Provider, which is used in the next step to create a Data Aggregation Proof.
The storefront periodically checks for an aggregate/accept receipt for offered aggregates. When an aggregate is accepted, the storefront issues filecoin/accept receipts for each piece in the aggregate. The receipt includes a Data Aggregation Proof. This is the end of the pipeline.
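The receipt chain described in the stages above can be modelled as a simple linked sequence of tasks. The sketch below is only an illustration of that ordering: it takes the set of task names for which receipts have already been issued and reports which stages are complete and which one is still pending. Real receipts are signed UCAN receipts fetched from the service, and the `pipeline_progress` function and the in-memory set are hypothetical stand-ins.

```python
from typing import Optional

# The async task each receipt links to next, following the stages above.
NEXT_TASK: dict[str, Optional[str]] = {
    "filecoin/submit": "piece/offer",
    "piece/offer": "piece/accept",
    "piece/accept": "aggregate/offer",
    "aggregate/offer": "aggregate/accept",
    "aggregate/accept": None,  # last link before filecoin/accept is issued
}


def pipeline_progress(
    issued: set[str], start: str = "filecoin/submit"
) -> tuple[list[str], Optional[str]]:
    """Walk the chain from `start`.

    Returns the tasks that already have receipts, in order, and the first
    task that does not have one yet (None if the chain is complete).
    """
    done: list[str] = []
    task: Optional[str] = start
    while task is not None and task in issued:
        done.append(task)
        task = NEXT_TASK[task]
    return done, task


# A piece that has been included in an aggregate not yet offered to a dealer.
done, pending = pipeline_progress(
    {"filecoin/submit", "piece/offer", "piece/accept"}
)
print(done, pending)
```

A client that only cares about the final outcome can skip this walk and wait on the filecoin/accept receipt instead, since that receipt carries the Data Aggregation Proof.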