CIP-9: Runner-Backed Storage

Status: Draft Type: Standards Track Category: Core Created: 2026-03-04

1. Abstract

This proposal defines CBFS-backed runner storage — a system for provisioning, addressing, accessing, and billing off-chain storage volumes that are associated with a Cowboy account and made available to Runners during job execution. It extends CIP-2 (Verifiable Off-Chain Compute) by giving Runners persistent, addressable storage that survives beyond a single job invocation. CBFS (the Cowboy File System) is the storage data plane: encrypted object writes, manifest handling, erasure coding, Relay Node RPCs, placement records, autonomous repair, and the FUSE/object client interfaces. Cowboy layers the protocol control plane on top: onchain StorageCommitment records, CapToken issuance and revocation, Relay Registry state, billing, and task attachment semantics. Together they form what this document calls Runner Attached Storage (RAS). Key properties:

Account-scoped: All storage is owned by and billed to a Cowboy account (EOA or Actor).
Private by default, public by choice: Private volumes are encrypted client-side before leaving the Runner; only the owning account can decrypt. Public volumes (visibility = PUBLIC) are stored unencrypted and readable by any party without a CapToken, enabling use cases like web asset hosting (CIP-15) and public datasets.
Access-controlled: Runners receive scoped capability tokens granting read-only, write-only, or read-write access to a storage volume. Multiple concurrent CapTokens may be active on the same volume simultaneously. Public volumes require CapTokens only for writes, not reads.
Addressable: Every stored object has a deterministic path within the account’s storage namespace, enabling targeted reads and deletes by the account owner.
Durable: Objects are erasure-coded (Reed-Solomon) and distributed across multiple Relay Nodes, tolerating node failures without data loss.
Metered: Storage is billed in CBY via a per-epoch, per-byte fee model (including erasure coding overhead) that extends the Cells concept from CIP-3 to persistent off-chain blobs.

2. Motivation

Runners in the Cowboy off-chain compute system (CIP-2) are stateless by design — each job executes in an isolated environment and terminates. This creates four concrete gaps:

Secret management. Runners need access to API keys, certificates, and credentials without exposing them on the public chain. These secrets must only be decryptable by authorized Runners, optionally gated by TEE attestation.
Large off-chain output. Computation frequently produces artifacts exceeding the 64 KiB onchain inline cap (whitepaper §7). LLM inference may generate images, audio, datasets, or model weights that are too large for result_data and inefficient to pass through onchain callbacks. These must be stored off-chain with a verifiable content commitment anchored onchain.
Multi-step data flow. Iterative computation (AI training loops accumulating weights across invocations), stateful agents (conversation history, tool outputs, learned preferences persisting across scheduling cycles), pipeline handoff (Runner A produces intermediate data consumed by Runner B), and agent swarms (coordinator + concurrent sub-agents sharing a volume) all require persistent, addressable storage between jobs.
Container persistence. CIP-10 container runtimes require persistent storage layers for stateful applications, databases, and caching across job restarts. Without attachable storage, containers are limited to ephemeral scratch space.

A critical constraint is that Runners are ephemeral. Most Runners operate as short-lived containers without persistent local disk and are not guaranteed to remain available after job completion. This means the storage layer must be separate from the compute layer — Runners are clients of the storage system, not the storage system itself.

2.1 Why not existing systems?

Existing decentralized storage systems offer relevant concepts but none are a direct fit:

System	Relevant Concept	Limitation for Cowboy Runners
Filecoin	Content-addressed storage deals with proof-of-spacetime	Minimum 180-day deals; sector sealing takes hours; no built-in access control
IPFS	Content addressing (CID), pinning	No persistence guarantee; no native privacy; deletion only local
Arweave	Permanent one-time-payment storage	Immutable — no deletion possible; wrong model for ephemeral runner output
Storj	S3-compatible API; Macaroon-based capability tokens; client-side encryption	Centralized coordination (Satellites); not permissionless; not integrated with onchain billing
Sia	Encrypted erasure-coded storage with contract-based billing	Contract formation latency; single-renter access model

RAS draws on the strongest ideas from these systems — Storj’s Macaroon-inspired capability tokens for scoped access control, Sia/Storj’s erasure coding for durability, S3’s path-based addressing for usability, and Sia’s contract-expiry cleanup as a safety net — while integrating them into a canonical CBFS-backed storage layer for Cowboy’s account model, Runner framework, and dual-metered fee system.

3. Definitions

Volume: A named, account-scoped storage namespace. An account may own multiple Volumes. Each Volume is an isolated container for stored objects.
Object: A single blob of data stored within a Volume, identified by a path key.
CBFS: The canonical off-chain storage engine used by this CIP. CBFS nodes implement the Relay Node data plane and CBFS clients implement the object API, manifest handling, and FUSE mount described here.
Storage Commitment: An onchain record that tracks a Volume’s existence, owner, size, shard placement, creation epoch, and billing state.
Capability Token (CapToken): A cryptographic bearer token encoding the permissions (read-only, write-only, or read-write), scope (volume + path prefix), time bounds, and size quota for a client’s access to a Volume. The client may be a Runner during a job, the owner via the CLI/SDK, or any other authorized caller.
Relay Node: A CBFS storage node participating in the Cowboy network that persistently stores erasure-coded shards of Volume data. Relay Nodes are a distinct network role from Runners and Validators, with their own staking and incentive model.
Shard: A fragment of an erasure-coded object. An object is split into K data shards and M parity shards (K+M total); any K shards are sufficient to reconstruct the original object.
ObjectDescriptor: The immutable content-identity record for a stored object. Contains the path, content hash, ciphertext hash, encryption nonce, size, erasure parameters, and per-shard hashes. Stored inside the encrypted manifest via ManifestEntry::File.
PlacementRecord: The mutable shard-to-node assignment record for a stored object. Contains a shard_id, a list of PlacementAssignment (shard_index → node_id), duplicated erasure params and shard hashes (so repair workers can operate without manifest access), a ciphertext_size (the pre-padding ciphertext length needed for correct erasure reconstruction), and a CAS version for atomic updates. Replicated to all participating Relay Nodes independently of the manifest.
Volume DEK: The per-volume AES-256 data-encryption key. It is wrapped one or more ways according to the volume’s access class (§9.1) — to the owner’s wallet-derived key, to the CBSS committee, or both.
CBSS: The Cowboy Secret Service — the threshold committee that holds the IBE capability to unwrap a committee-wrapped Volume DEK on behalf of an authorized Runner (§9.2).
Write-Relayer: The cowboy-ras-write-relayer service, which submits owner-signed RAS control-plane transactions (volume create, manifest commit, allowlist updates) and pays their gas on a client’s behalf (§12.2.1).
Dispatcher: The Job Dispatcher system actor (0x02), defined by CIP-2 — the in-consensus job-dispatch logic that selects Runners (VRF + committee filtering), issues and revokes CapTokens, and manages job lifecycle. It is not a separate off-chain service: its actions execute inside block processing at consensus trust level. In CIP-9 the Dispatcher additionally applies the storage-capability filter (§5.1.1), scopes a CapToken to a mount grant for cross-owner access (§7.7), and constructs and authorizes the CBSS SealRequest for private-volume key delivery (§9.2) — it never handles a plaintext DEK.

3.1 Canonical Manifest DAG

The storage manifest is a content-chunked, path-ordered Merkle-DAG, and the root committed on-chain (StorageCommitment.manifest_root, §11.1) is the trust anchor for every read. The structure is pinned to a single canonical algorithm so any party — a Runner, a Gateway, an indexer, an owner CLI — can independently recompute and verify the root. The canonical implementation is cbfs/manifest/src/dag.rs (build_manifest_dag):

The manifest is the set of ManifestEntry (cbfs/types): File { descriptor, metadata }, Symlink { … }, Directory { … }, ordered lexicographically by path() (UTF-8 byte ordering).
Entries are grouped into leaf nodes by content-defined chunking: once a node holds at least MANIFEST_NODE_MIN_ENTRIES (64) entries, a boundary is cut at the first entry whose domain-separated hash hits the target — BLAKE3("cbfs.manifest-entry-boundary.v1" || encoded_entry) with its low MANIFEST_NODE_TARGET_BITS (10) bits all zero (≈1024-entry target) — and is forced at MANIFEST_NODE_MAX_ENTRIES (2048) or MANIFEST_NODE_MAX_BYTES (256 KiB). This is a per-entry hash test, not a rolling hash. A Leaf { entries } carries its slice of the sorted entries.
Leaf nodes are reduced into interior nodes the same way (boundary domain cbfs.manifest-child-boundary.v1); an Interior { children } carries ManifestChild { min_path, locator, subtree_count } pointers. Levels reduce bottom-up until one node remains.
Each node is content-addressed by its locator — the BLAKE3 hash of its canonical encoding (magic CBMN/v1 for plaintext nodes, CBME/v1 for encrypted). The root is the locator of the top node; manifest_root is that value.

Because chunk boundaries are content-defined, an insert/update/delete rewrites only the leaf it lands in and the interior nodes on the path to the root — every other node keeps its locator and is structurally shared with the prior commit. build_manifest_dag takes the prior ManifestDagSnapshot and returns the new root plus added_nodes / removed_nodes, so a commit publishes and bills only the delta (this is what makes incremental commits O(changed), not O(N) — and is the basis for the PoR shard-hash rollup in §5.6 and the delta-scoped commit authority in §7.2). For private volumes each node is individually AES-256-GCM-encrypted under the volume DEK (key domain cbfs.manifest-node.v1, AAD domain cbfs.manifest-node-aad.v1) before storage, so manifest_root commits to ciphertext nodes and reveals nothing without the DEK. For public volumes nodes are stored as plaintext, which is what lets an unauthenticated Gateway recompute and verify the root (§5.3.2, §7.6.3).

4. Design Overview

4.1 Architecture

Runners are ephemeral compute nodes. They do not store data persistently. Instead, a network of CBFS Relay Nodes provides durable, always-available storage. Runners write to and read from Relay Nodes over the network during job execution.

4.2 Lifecycle

Create: Account owner creates a Volume via the Storage Manager system actor, specifying a name, optional size quota, and replication parameters. An onchain Storage Commitment is written.
Attach: When submitting a CIP-2 task, the owner includes one or more volume attachments in the task definition, each specifying the volume name, access mode (read-only, write-only, or read-write), and an optional path prefix scope.
Authorize: The Dispatcher (or a delegated Storage Manager) issues a CapToken to each selected Runner. The token is scoped to the specified volume, access mode, path prefix, job duration, and byte quota.
Write: The Runner encrypts object data, erasure-codes it into K+M shards, and distributes shards to Relay Nodes. Shards are immediately available for retrieval by other CapToken holders on the same volume.
Read: The Runner fetches any K of K+M shards from Relay Nodes, reconstructs the object, decrypts, and verifies the content hash.
Commit: At job completion (or periodically), the Runner commits a storage manifest — a Merkle root of all objects written — to the onchain Storage Commitment.
Manage: The account owner can list, read, and delete individual objects or entire Volumes at any time via the Storage Manager.

4.2.1 Committee Composition for Storage-Attached Jobs

A job that mounts a volume is dispatched through the same CIP-2 verification committee as any other off-chain job — CIP-9 does not introduce a separate dispatch path. It adds one prerequisite to committee selection: every runner considered for a storage-attached job MUST be storage-capable (§5.1.1) — storage_support = true with a valid x25519 encryption_pubkey. Runners lacking the capability are filtered out before committee sizing, not after. This creates a hard coupling between CIP-2 committee size and the storage-capable runner population. If the verification committee size M exceeds the number of eligible storage-capable runners, dispatch fails with InsufficientStorageCapableRunners { required, available, reasons } (§8.2.1) and never proceeds. Open design point (CIP-2 × CIP-9 seam): whether storage reads may ride a smaller committee (e.g., M = 1) than verified compute, or must always use the full verified-compute committee, is governed by CIP-2 committee sizing and is not yet settled. Until it is, a storage-attached job inherits the compute committee size. Implementations MUST expose the storage-capable count at dispatch so an operator can tell when committee size — not storage availability — is the binding constraint.

4.2.2 Dependency on Runner-Backed Continuations

Other than volumes mounted via the CLI or a client application, volume mounts are almost always reached through runner-backed continuations — an actor’s runner.agent / runner.http job. Such a job is materialized on-chain as a deferred transaction (job_submit) by the CIP-2 / CIP-5 machinery. CIP-9 dispatch is therefore gated on that materialization succeeding: if the deferred job_submit tx is dropped (for example by a speculative-execution cache eviction that fails to preserve a block’s deferred txs), the mount job is never created and the storage layer is never reached — even though every storage-side gate would have passed. The full chain of gates a storage job must clear is enumerated in Appendix D.

4.3 Canonical Implementation Boundary

This CIP is intentionally split into a CBFS data plane and a Cowboy control plane. Both are normative parts of a conforming CIP-9 implementation. CBFS data plane responsibilities:

Object encryption/decryption for PRIVATE volumes and plaintext handling for PUBLIC volumes.
Reed-Solomon erasure coding, shard hashing, and shard placement records.
Relay Node RPCs (PUT_SHARD, GET_SHARD, LIST_SHARDS, placement replication, repair traffic).
Manifest storage, reconstruction, Merkle root computation, and verification against the authoritative root.
FUSE mount behavior, local cache, sync daemon, and direct object API.
Shard repair, garbage collection of orphan shards, and local usage reporting hooks.

Cowboy control plane responsibilities:

StorageCommitment lifecycle, authoritative manifest_root, and volume ownership semantics.
CapToken issuance, revocation, and task-scoped attachment semantics.
Relay Registry membership, staking, health, and repair coordination triggers.
Billing, fee settlement, storage grace periods, and slashing policy.
Integration with CIP-2 task submission and the Runner execution flow.

CBFS is the canonical storage substrate; CIP-9 specifies how Cowboy governs, authorizes, and pays for that substrate.

4.4 Relationship to Existing CIPs

CIP-2 (Off-Chain Compute): RAS extends the OffchainTask definition to include volume attachments. The Runner Submission Contract is extended to accept storage manifests alongside result_data.
CIP-3 (Fee Model): RAS introduces a new fee dimension — persistent storage fees — billed per byte per epoch (including erasure overhead). Unlike CIP-3 Cycles and Cells (which are metered by the VM during transaction execution), storage usage is metered externally by Relay Nodes and settled onchain via attestation-based billing.
CIP-4 (State Storage): onchain Storage Commitments live in the existing STORAGE key space under the Storage Manager actor’s address. Volume data itself is NOT stored in the MPT trie.
CIP-10 (Runner Container Runtime): CIP-10 consumes the storage primitive defined here. Container image handling, cgroups, network policy, and GPU passthrough are separate concerns; volume attachment, mount semantics, and the object API are defined by CIP-9 and then mounted into CIP-10 runtimes.

5. Relay Nodes

5.1 Role and Responsibilities

Relay Nodes are distinct from Runners and Validators. They are implemented as CBFS storage nodes. A Relay Node:

Stores erasure-coded shards of encrypted Volume data.
Stores PlacementRecord entries that map shard IDs to their assigned nodes (see §5.3.1).
Serves shards and placement records to authorized requesters (Runners with valid CapTokens, account owners).
Heartbeats to the onchain Relay Registry to prove liveness.
Runs autonomous two-phase repair (self-heal + redundancy restoration) without Runner involvement (see §5.5).
Replicates PlacementRecords to peer nodes via ReplicatePlacement RPCs.

Relay Nodes hold opaque ciphertext shards and never see plaintext. They verify CapTokens to gate access to both shard and placement operations (§7.1.1) but perform no computation on the data beyond repair.

5.1.1 Runner Storage Capability

Relay Nodes hold and serve shards; Runners are the clients that read and write them during a job. For a Runner to be assignable to a storage-attached job it must advertise a storage capability, and the dispatcher enforces it (§4.2.1, §8.1). Two fields are added to the Runner’s registry capabilities (the CIP-2 RunnerCapabilities record):

storage_support: bool — the Runner is willing and configured to mount volumes.
encryption_pubkey: [u8; 32] — the Runner’s x25519 public key, used as the recipient key for sealed DEK delivery (§9.2).

A Runner MUST set storage_support = true and publish a valid encryption_pubkey to be eligible; the dispatcher’s storage filter rejects any runner missing either. Both default to off — storage_support is #[serde(default)] false — and auto-registration derives them from the Runner’s storage configuration (a CBSS/secrets endpoint and an x25519 key being present). To avoid “advertised but ineligible” runners, auto-registration MUST NOT set storage_support = true unless a valid encryption_pubkey is present in the same registration — i.e., the two are set together or not at all, so a runner never advertises a capability the dispatcher will reject. A Runner with no storage configuration is then correctly invisible to volume jobs rather than being selected and failing the mount after assignment. Capability updates require a re-register path. Capabilities are fixed at registration, and a second register_runner for an already-registered identity is rejected (RunnerAlreadyRegistered). To change storage capability — e.g., a runner that later gains a secrets endpoint — the operator MUST be able to deregister_runner (refunding stake) and register again with the new capabilities.

5.2 Relay Registry

The Relay Registry is a system contract (analogous to the Runner Registry in CIP-2) that manages Relay Node registration, staking, health, and capacity.

RelayNodeProfile {
  address:           bytes32,
  stake_amount:      u256,           // CBY staked
  capacity_bytes:    u64,            // advertised storage capacity
  used_bytes:        u64,            // current usage
  last_heartbeat:    u64,            // block height
  health:            u8,             // decays per block, reset on heartbeat
  shards_held:       u32,            // number of active shards
  shards_lost:       u32,            // historical shard losses (reputation)
  region_hint:       bytes4          // optional: geographic hint for latency optimization
}

Clients discover the active relay set from this registry and need no relay configuration. (CBSS DEK delivery — the seal committee and the sealed ciphertext — is covered in §9.2 and does not use a relay-registry field.) Lifecycle:

Register: Relay Node stakes MIN_RELAY_STAKE CBY and calls register_relay(capacity_bytes).
Heartbeat: Relay Node calls heartbeat() periodically. Health resets to MAX_RELAY_HEALTH (e.g., 100 blocks). Health decays by 1 per block.
Removal: If health reaches 0, the Relay Node is removed from the active list. Shards assigned to it are flagged for repair (see §5.5).
Unstake: A Relay Node may unstake after a cooldown period (RELAY_UNSTAKE_DELAY), provided it has no active shard assignments or has transferred them.

5.3 Shard Assignment

When a client writes an object, it must select K+M Relay Nodes to receive shards. Selection follows these rules:

Eligible set: All Relay Nodes in the active list with health > MIN_HEALTH_FOR_ASSIGNMENT and sufficient free capacity.
Diversity: Selected nodes SHOULD have distinct region_hint values (best-effort, not enforced).
Determinism: The initial assignment is recorded in the object’s PlacementRecord (replicated to Relay Nodes, §5.3.1) so any future reader or repair worker knows which Relay Nodes hold which shards.

Each stored object produces two records: an ObjectDescriptor (immutable, stored in the encrypted manifest) and a PlacementRecord (mutable, replicated to Relay Nodes independently of the manifest).

ObjectDescriptor {
  object_path:      string,
  write_id:         bytes16,          // CSPRNG(16) — fresh random per write, ensures version isolation
  content_hash:     bytes32,          // BLAKE3 hash of plaintext
  ciphertext_hash:  bytes32,          // BLAKE3 hash of ciphertext (pre-erasure-coding)
  encryption_nonce: bytes12,          // random nonce used for AES-256-GCM (unique per write)
  size_bytes:       u64,              // original object size
  ciphertext_size:  u64,              // ciphertext length before erasure padding (needed for correct reconstruction)
  shard_id:         bytes32,          // BLAKE3(volume_id || object_path || write_id) — opaque, version-unique
  erasure_k:        u8,
  erasure_m:        u8,
  shard_hashes:     [bytes32; K+M],   // BLAKE3 of each whole shard (integrity / repair)
  shard_chunk_roots:[bytes32; K+M],   // per-shard chunk Merkle root; PoR commits these via shard_root (§5.6)
}

PlacementRecord {
  shard_id:         bytes32,          // matches ObjectDescriptor.shard_id
  version:          u64,              // CAS version — monotonically incremented on each reassignment
  assignments: [
    { shard_index: u8, node_id: bytes32 },
    ...  // K+M entries
  ],
  // Duplicated from ObjectDescriptor so repair workers can operate
  // without manifest access (critical for private volumes where the
  // manifest is encrypted):
  erasure_k:        u8,
  erasure_m:        u8,
  ciphertext_size:  u64,
  shard_hashes:     [bytes32; K+M],
  shard_chunk_roots:[bytes32; K+M],   // duplicated so relays can build PoR proofs without the manifest (§5.6)
}

Why two records? The manifest is encrypted for private volumes — repair workers (Relay Nodes) cannot read it. By replicating the erasure params, shard hashes, and ciphertext_size into the PlacementRecord, Relay Nodes can autonomously verify, reconstruct, and reassign shards without ever touching the manifest. The CAS version field enables atomic reassignment: a repair worker reads the current version, computes a new assignment, and writes with expected_version = current; if another node raced, the write fails and the worker retries. The write_id ensures that overwrites to the same object path produce distinct shard addresses. Without it, overwriting a path would physically replace the old shards on Relay Nodes, destroying the only retrievable copy of the previous version. This would break rollback after failed commits, concurrent writers to overlapping paths, and reads from old manifests. With write_id, old shards remain on Relay Nodes until the new manifest commits and the old shards become orphans (garbage collected after ORPHAN_SHARD_TTL).

5.3.1 Placement Persistence

PlacementRecords are replicated to Relay Nodes independently of the manifest via three dedicated RPCs:

RPC	Purpose
`PutPlacement(shard_id, record)`	Store or update a PlacementRecord. Used by the SDK during commit and by repair workers during reassignment.
`GetPlacement(shard_id)`	Fetch a PlacementRecord by shard ID. Used by the SDK during open/refresh and by repair workers.
`ReplicatePlacement(shard_id, record)`	Node-to-node replication during placement sync.

All three RPCs are auth-gated by the same AuthProvider as shard operations (§7.1.1). Commit path: When the SDK commits a volume, it publishes each PlacementRecord to all assigned Relay Nodes via PutPlacement. Failures are logged but do not block the commit — placement is best-effort during commit and self-heals via repair. Open path: When the SDK opens a volume, it queries Relay Nodes for PlacementRecords via GetPlacement for each shard ID in the manifest. For any shard where no Relay Node returns a record (e.g., first open of a migrated volume), the SDK falls back to constructing an assumed placement from the node selector. On-disk schema versioning. PlacementRecords are persisted in the Relay Node’s cbfs-placement sled store inside a versioned magic envelope: magic: [u8; 4] || schema_version: u8 || bincode(PlacementRecord). bincode is not self-describing, so a bare field addition deserializes as a truncated or garbage record (“unexpected end of file”) on every node still holding old data — #[serde(default)] does not save you. On a schema_version mismatch the node treats the record as unreadable and re-derives placement from peers rather than aborting. The schema_version MUST be incremented on any change to the PlacementRecord layout. Placement-store lifecycle across re-genesis. Placement records reference node_ids and a volume_id namespace that are only meaningful within a single chain instance. When the chain is re-genesised, every persisted placement is stale: the nodes it names no longer exist and repair can never satisfy it. A Relay Node MUST record the genesis hash alongside its placement store and, on first boot against a different genesis, wipe the placement store rather than attempt to repair placements from a dead chain. Skipping this wipe is a documented relay crash-loop source (see §5.5 dead-placement backoff).

5.3.2 `GET_MANIFEST` RPC

The manifest can always be read indirectly by traversing the DAG (§3.1): starting from the on-chain manifest_root, a client fetches each node by its content-addressed shard id manifest_node_shard_id(locator) via GET_SHARD. GET_MANIFEST is a direct, one-round-trip alternative for public volumes that returns the whole manifest without per-node traversal — the recommended path for public-volume consumers (Gateways, indexers):

GET_MANIFEST { volume_id: bytes32 }
  → { manifest: bytes, manifest_root: bytes32 }

Authorization (PUBLIC only). GET_MANIFEST applies to PUBLIC volumes, whose manifest nodes are plaintext (§3.1). A relay serves it without a CapToken, deciding from the stored shard metadata written at commit time (visibility = PUBLIC) via the AuthProvider — the same mechanism as GET_SHARD (§7.6.3), not a per-request StorageCommitment chain lookup, so the two read paths can never disagree. Private volumes are not served via GET_MANIFEST: a relay without the DEK cannot decrypt a manifest node to learn its child locators (§3.1), so it cannot assemble the DAG — and the privacy model forbids it inspecting the manifest in any case (§9.3). A private manifest is read only by a DEK holder, node-by-node via GET_SHARD from the on-chain root, decrypting each node to discover the next locators. A volume whose status is DELETED or GARBAGE_COLLECTING is not served — the manifest is treated as absent. Verification. The caller MUST verify the returned manifest_root against the on-chain StorageCommitment.manifest_root and MUST NOT cache or serve content from a manifest whose root does not match. On mismatch, refetch from another Relay Node and mark the offending node suspect (§15.3 reputation). Node-by-node GET_SHARD traversal from the root remains valid; a Relay Node that does not implement GET_MANIFEST is treated as outdated (clients fall back to DAG traversal), not malfunctioning. Server-side model. GET_MANIFEST is an optimization, not a primitive: no single Relay Node inherently holds the whole manifest, since each DAG node is itself an erasure-coded object spread across a placement set. A relay answers it only if it can assemble the public DAG — reconstructing each node from its K shards across the placement set, and/or serving a cache it built at commit time. (This is possible only for public volumes; private nodes are opaque to the relay, per the authorization note above.) A relay that cannot assemble it does not implement the RPC, and clients fall back to node-by-node traversal. Because the caller verifies the returned root regardless, a stale or partial cache is safe — it fails verification and triggers a refetch.

5.4 Erasure Coding

RAS uses Reed-Solomon erasure coding to distribute each object across multiple Relay Nodes. Default parameters (governance-tunable, overridable per-volume):

Parameter	Default	Description
K (data shards)	4	Minimum shards to reconstruct
M (parity shards)	2	Additional parity shards
K+M (total shards)	6	Distributed to 6 distinct Relay Nodes
Storage overhead	1.5x	Account is billed for effective_size × 1.5

Write path (performed by the writing client):

Generate write_id = CSPRNG(16) and compute shard_id = BLAKE3(volume_id || object_path || write_id).
Encrypt the object (AES-256-GCM, see §9). Record ciphertext_size (pre-padding length).
Erasure-code the ciphertext into K data shards and M parity shards using Reed-Solomon (reed-solomon-erasure crate).
Compute shard_hash = BLAKE3(shard_bytes) for each shard.
Select K+M Relay Nodes and PUT each shard to its assigned node, authenticated by the CapToken.
Produce an ObjectDescriptor (stored in the manifest) and a PlacementRecord (published to Relay Nodes at commit time, §5.3.1).

Read path (performed by the reading client):

Look up the ObjectDescriptor from the manifest and the PlacementRecord (from local cache or fetched from Relay Nodes via GetPlacement).
Request any K shards from available Relay Nodes listed in the PlacementRecord (prefer lowest-latency nodes; fall back if some are unavailable).
Reconstruct the ciphertext using Reed-Solomon decoding, truncating to ciphertext_size to remove erasure padding.
Verify ciphertext_hash.
Decrypt (AES-256-GCM).
Verify content_hash.

Comparison with other systems:

System	Scheme	Total Shards	Reconstruct From	Overhead
RAS (default)	Reed-Solomon 4/6	6	any 4	1.5x
Storj	Reed-Solomon 29/80	80	any 29	2.7x
Sia	Reed-Solomon 10/30	30	any 10	3.0x
AWS S3	Proprietary (3+ AZ replication)	—	—	~3x

The 4/6 default is conservative. Accounts may opt for higher redundancy (e.g., 6/10 for 1.67x overhead, or 10/16 for 1.6x) via volume creation parameters. Governance may adjust the defaults as the Relay Node network matures.

5.5 Autonomous Shard Repair

Repair runs on the Relay Nodes themselves — no Runner involvement and no onchain timer required. Each Relay Node periodically inspects the PlacementRecords it holds and executes a two-phase repair cycle: Phase 1 — Self-heal (local shard verification): For each PlacementRecord where this node is an assigned holder:

Verify the local shard bytes against the expected shard_hash from the PlacementRecord.
If the shard is missing or corrupt, fetch K healthy shards from peer Relay Nodes listed in the same PlacementRecord.
Reconstruct the ciphertext using Reed-Solomon decoding, truncating to ciphertext_size (not the padded shard length — using the wrong length produces garbage).
Re-encode and extract the needed shard. Verify its hash. Store locally.

Self-heal is safe for any node to run concurrently — it only writes to local storage and does not modify the PlacementRecord. Phase 2 — Redundancy repair (dead node replacement):

For each PlacementRecord, probe all assigned nodes. A node is considered dead if it is unreachable after the configured timeout.
Leader election: Only the assigned node with the lowest shard_index among live nodes drives reassignment for that PlacementRecord. This prevents multiple nodes from racing to repair the same shard.
The leader selects replacement node(s) from the known peer list (excluding already-assigned nodes).
The leader reconstructs the missing shard(s) from K surviving shards (same erasure reconstruction as Phase 1) and uploads via PutShard to the replacement node(s).
The leader writes an updated PlacementRecord with the new assignments and an incremented version, using a CAS (Compare-and-Swap) write: the PutPlacement RPC specifies expected_version = current_version. If another node raced and already updated the record, the CAS fails and the leader retries from step 1 on the next cycle.
The updated PlacementRecord is replicated to all assigned nodes via ReplicatePlacement.

Failure tolerance: With K=4, M=2, the system tolerates any 2 simultaneous Relay Node failures per object without data loss. If 3+ nodes holding shards of the same object fail before repair completes, the object is lost. Repair cycle frequency should be tuned aggressively (default: REPAIR_CHECK_INTERVAL blocks) to keep this window small. Dead-placement backoff. A PlacementRecord whose assigned nodes are all unreachable (0 of K+M live) cannot be repaired — there is no surviving shard to reconstruct from. The repair worker MUST NOT re-attempt such a record every cycle; it applies exponential backoff (2^n cycles, capped at MAX_REPAIR_BACKOFF) and, after LOST_PLACEMENT_CYCLES consecutive all-dead observations, marks the placement LOST and stops attempting repair until a new placement supersedes it. Without this guard, a relay carrying stale placements (e.g., across a re-genesis, §5.3.1) busy-loops the repair worker and emits unbounded “0/N shards available” error logs every cycle. No Runner or onchain involvement: Repair is fully autonomous. Relay Nodes read PlacementRecords (which contain all the information needed — erasure params, shard hashes, ciphertext_size, assignments), reconstruct shards from peers, and update placements via CAS. The onchain StorageCommitment.degraded_shards counter is updated by Relay Node heartbeats as a reporting mechanism, not a repair trigger.

5.6 Proof of Retrievability (PoR) Challenges

Heartbeats prove liveness but not retrievability — a Relay Node could be alive but have lost or withheld data. RAS uses random PoR challenges to verify that Relay Nodes actually hold the shards they claim to hold. Challenge mechanism:

A periodic onchain timer (CIP-5, every POR_CHALLENGE_INTERVAL blocks) selects a random set of (shard_id, shard_index, byte_offset, byte_length, nonce) challenges targeting active Relay Nodes. The nonce is fresh per challenge.
The challenged Relay Node must respond within POR_RESPONSE_WINDOW blocks with:
- The requested byte range of the specified shard. The nonce seeds which range is asked, so a relay cannot precompute or replay a prior response.
- A range-inclusion proof: a Merkle path from the fixed-size chunks (e.g. 4 KiB) covering the requested range to the shard’s chunk-root — the Merkle root over that shard’s chunks. (The relay already builds this chunk tree to serve byte ranges.)
- A commitment proof: a path from that chunk-root to the volume’s onchain shard_root (§11.1) — the public commitment over the volume’s per-shard chunk-roots, maintained incrementally from the added_shards / removed_shards deltas every commit_manifest carries (§12.2).
The onchain verifier checks, against the shard_root snapshot carried in the challenge:
- the returned bytes hash into the chunk leaves of the range-inclusion proof, which resolves to a chunk-root;
- that chunk-root resolves to the onchain shard_root;
- if either fails, the response is invalid.

A relay that discarded the shard cannot produce valid chunks plus a path to the committed chunk-root, and cannot reuse an old response because the nonce moves the challenged range. The check spot-checks a random range against a committed per-shard chunk-root, anchored in the public shard_root, not manifest_root. This is what makes PoR sound for private volumes: manifest_root commits to ciphertext manifest nodes (§3.1), so the chain cannot verify plaintext object membership — but a shard’s chunks are erasure-coded ciphertext, and shard_root commits over their chunk-roots, so retrievability is provable without the DEK. Public and private volumes use the identical path, and only the challenged range — not the whole shard — is transferred. Concurrent manifest updates: If a volume commits between challenge issuance and response, the Relay Node provides proofs against the shard_root active at challenge time; the challenge carries that shard_root snapshot as its reference. If the challenged shard was removed by a commit during the response window, the challenge is voided (no penalty). Failure and slashing:

Outcome	Consequence
Valid response within window	No action; challenge passed
No response within window	`shards_lost` incremented; shard flagged for repair; `POR_MISS_PENALTY` slashed from stake
Invalid response (proof mismatch)	`shards_lost` incremented; shard flagged for repair; `POR_FRAUD_PENALTY` slashed from stake (higher than miss penalty)
3+ consecutive misses	Relay Node removed from active list; all shards flagged for repair; `RELAY_EVICTION_PENALTY` slashed

Design rationale: This is a lightweight PoR scheme — it does not require the full Filecoin-style Proof-of-Spacetime (which seals sectors and requires constant re-proving). Instead, it spot-checks random byte ranges, making it cheap to verify but expensive to fake. A Relay Node that has genuinely stored the shard can respond trivially; one that has discarded it cannot. Challenge economics: Challenges are funded from the storage fee pool — 2% of every per-epoch storage-fee batch accrues to the PoR challenge pool at 0x0B (STORAGE_FEE_CHALLENGE_POOL_BPS = 200, formerly POR_CHALLENGE_FEE_SHARE; canonical split in §10.4 and CIP-31 §4). Challenge bond (required as of CIP-31). Calling por_challenge(shard_id, byte_offset, byte_length) requires the caller to escrow RELAY_CHALLENGE_BOND (default 10 CBY, Tier-0 tunable; see CIP-31 §7) at 0x0B for the duration of the challenge window. Lifecycle:

Valid response within POR_RESPONSE_WINDOW: bond is refunded minus POR_CHALLENGE_FEE (default 1 CBY, retained by 0x0B).
Miss or fraudulent response: bond is refunded in full, and the challenger additionally receives CHALLENGER_BOUNTY (default 5 CBY) from the PoR challenge pool. The Relay’s slash (POR_MISS_PENALTY / POR_FRAUD_PENALTY / RELAY_EVICTION_PENALTY) follows the same 10/2/88 burn/challenge-pool/Relay-pro-rata split as storage fees, with the slashed Relay excluded from the 88% distribution that epoch (CIP-31 §8).

The bond + per-challenge fee make speculative or griefing challenges economically irrational while ensuring that any challenger who provokes a real miss is net-positive (+5 bounty − 1 fee = +4 CBY). Challenge-timer funding. The periodic PoR-challenge timer is a CIP-5 timer and is metered per fire (max_cost = gas_limit_per_fire × cycle_basefee + max_cells × cell_basefee). It MUST be scheduled with fee_payer = STORAGE_MANAGER (0x0A), pre-charged each fire from the same PoR challenge pool that the STORAGE_FEE_CHALLENGE_POOL_BPS split feeds (above). If the pool is somehow insufficient at fire time, CIP-5 cancels the timer (TimerCancelledInsufficientFunds); the Storage Manager SHOULD surface this as PorChallengePaused { reason: INSUFFICIENT_POOL } and re-arm on the next interval once the pool refills. Only the timer’s billing source is affected — PoR generation and shard validation are unchanged.

5.7 Relay Node Incentives

Relay Nodes earn fees from two sources:

Storage fees: A share of the per-epoch storage fee paid by the account owner, proportional to the number and size of shards held.
Transfer fees: A per-byte fee for serving shards to Runners (read bandwidth).

Fee distribution:

Per epoch, for each volume:
  total_storage_fee = volume_effective_size * STORAGE_FEE_PER_BYTE_PER_EPOCH
  per_relay_share = total_storage_fee / num_relay_nodes_holding_shards

Per read operation:
  transfer_fee = bytes_served * TRANSFER_FEE_PER_BYTE
  (paid to the specific Relay Node serving the shard)

Relay Nodes that lose shards (detected via failed PoR challenges or repair triggers) have their shards_lost counter incremented and stake slashed per §5.6. High shards_lost degrades reputation and may reduce future shard assignments.

5.8 Relay Drain Governance

Draining a relay — migrating its shards to other relays and releasing it — has three triggers: manual (the operator initiates the drain, posts a drain bond, and is given a window to migrate), policy-driven auto-drain (the per-epoch storage-settlement scan drains over-capacity or stale-heartbeat relays automatically), and forced governance-initiated (governance drains a silent or misbehaving relay whose operator will not cooperate). The manual path is operator-driven and self-bonded; the other two are governance-controlled and specified here. Both governance affordances ride CIP-12’s existing proposal pipeline (SubmitProposal=45 / CastVote=46 / ExecuteProposal=47) and share the Proposal storage schema at GOVERNANCE_SYSTEM_ACTOR=0x09. They are exposed as two specialized submit-proposal opcodes; the payload discriminator ProposalPayloadKind carries three variants:

pub enum ProposalPayloadKind {
    UpdateBasefeeConfig,    // CIP-3
    DrainRelay,             // §5.8, opcode 85
    UpdateAutoDrainPolicy,  // §5.8, opcode 86
}

The Proposal record carries two payload-side fields populated by variant — payload_relay_node_id: Option<[u8; 32]> (DrainRelay) and payload_auto_drain_policy: Option<AutoDrainPolicyConfig> (UpdateAutoDrainPolicy); both are None for UpdateBasefeeConfig. SubmitDrainRelayProposal (opcode 85). Wire: { description_hash: [u8; 32], voting_blocks: u64, node_id: [u8; 32] }. Preconditions: MIN_VOTING_BLOCKS ≤ voting_blocks ≤ MAX_VOTING_BLOCKS (else InvalidData); standard base gas; any account may submit (as with SubmitProposal). The target node_id is not validated at submit time — only at execution, since the relay may come and go during the voting window; an unknown node_id at execution yields InvalidData. Submit writes an Active Proposal (payload_kind = DrainRelay, payload_relay_node_id = Some(node_id), deadline = block + voting_blocks) and emits governance.proposal.submitted. On ExecuteProposal after a passing tally, it enqueues the governance auto-drain: read the relay’s RelayNodeProfile from RELAY_REGISTRY=0x0B, write the node into its auto_drain_governance_scan slot, and let the next epoch’s storage-settlement loop perform the drain. Governance-initiated drain skips the operator’s drain bond (the operator is not posting it) but reuses the same drain windowing, shard-absorption receipts, and stake-reservation/slashing logic; the relay’s existing stake is the slashable collateral. SubmitAutoDrainPolicyProposal (opcode 86). Wire:

{ description_hash, voting_blocks, enabled, high_water_mark_bps, over_capacity_epochs, stale_heartbeat_epochs, liveness_window_epochs, min_active_stake, drain_bond_base, drain_bond_per_gib, drain_window_epochs }

. It writes the network-wide AutoDrainPolicyConfig:

Field	Type	Meaning
`enabled`	`bool`	Master switch; if false the per-epoch scanner skips all relays.
`high_water_mark_bps`	`u16`	Capacity-utilization threshold in bps of `BPS_DENOMINATOR=10_000`. A relay above it for `over_capacity_epochs` consecutive epochs becomes auto-drain-eligible. Must satisfy `0 < x ≤ 10_000`.
`over_capacity_epochs`	`u64`	Consecutive over-capacity epochs required to trigger (≥ 1).
`stale_heartbeat_epochs`	`u64`	Consecutive stale-heartbeat epochs required to trigger (≥ 1).
`liveness_window_epochs`	`u64`	Epochs without a fresh heartbeat before a relay counts as stale (≥ 1).
`min_active_stake`	`u128`	Minimum stake (attoCBY) for a relay to be a valid shard-absorption destination.
`drain_bond_base`	`u128`	Base drain bond (attoCBY); auto-drain reserves the equivalent from existing stake.
`drain_bond_per_gib`	`u128`	Per-GiB adder to the drain bond, sized to the relay’s stored data.
`drain_window_epochs`	`u64`	Epochs the drain must complete in; uncopied shards past it are accounted lost and slashed (≥ 1).

Submit-time validation requires high_water_mark_bps ∈ (0, 10_000] and over_capacity_epochs, stale_heartbeat_epochs, liveness_window_epochs, drain_window_epochs all ≥ 1; enabled and the stake/bond knobs are unconstrained (zero allowed). On ExecuteProposal after a passing tally, the policy is re-validated (defense-in-depth against a malformed record) and written to 0x0B:AUTO_DRAIN_POLICY_KEY; subsequent epochs read and apply it in the auto-drain scan. Cross-references. Drain-relay and auto-drain-policy proposals are CIP-12 Tier-1 (registry/policy scope), not protocol-wide scalar parameters. Opcodes 85/86 are pinned in the CIP-13 master opcode table. A draining relay’s outstanding CBFS rent obligations follow the migrating shards to the absorbing relay; CIP-31 owns the per-shard rent accrual the drain window is sized against.

6. Storage Addressing

6.1 Path-Based Namespace

Every object in RAS is addressed by a three-part key:

/{account_address}/{volume_name}/{object_path}

account_address: The 32-byte Ed25519 public key of the owning account (hex-encoded).
volume_name: A UTF-8 string (max 64 bytes, restricted to [a-zA-Z0-9_\-.]).
object_path: A UTF-8 string (max 512 bytes) using / as a logical separator. No leading or trailing /.

Examples:

/0xaabb...ccdd/model-weights/v3/layer_0.bin
/0xaabb...ccdd/agent-memory/conversations/2026-03-04/session_1.cbor
/0xaabb...ccdd/pipeline-scratch/job_4821/intermediate.parquet

6.2 Content Integrity

Each stored object is tagged with these hashes:

content_hash     = BLAKE3(plaintext_bytes)       // verifies decrypted content
ciphertext_hash  = BLAKE3(ciphertext_bytes)      // verifies pre-erasure-coding ciphertext
shard_hash[i]    = BLAKE3(shard_bytes[i])        // verifies a whole shard (integrity / repair)
chunk_root[i]    = merkle_root(fixed-size chunks of shard_bytes[i])  // PoR spot-check anchor (§5.6)

shard_hash[i] is the whole-shard integrity hash used by repair; chunk_root[i] is the Merkle root over shard i’s fixed-size chunks (e.g. 4 KiB) and is the value that shard_root commits, so PoR can spot-check a byte range without transferring the whole shard (§5.6). Both are carried per shard in the ObjectDescriptor and PlacementRecord and in each ShardRef of a commit_manifest delta. BLAKE3 is used throughout (rather than keccak256) for content hashing because it is faster, parallelizable, and the integrity checks happen offchain where onchain compatibility is not a concern. The storage manifest committed onchain is the root locator of the manifest DAG over all entries, ordered lexicographically by object_path (§3.1). This enables:

Integrity verification: Any party can verify the full object lifecycle — shard hashes verify individual shards, ciphertext hash verifies reconstruction, content hash verifies decryption.
Efficient proofs: a path from a leaf node to the DAG root is O(log N) in the number of objects. For a public volume this proves object membership directly; for a private volume the nodes are ciphertext, so object membership is not chain-verifiable from manifest_root alone — shard-level retrievability is anchored separately by the public shard-hash root (§5.6).

6.3 Addressing for Deletion

The account owner deletes objects by their full path:

delete_object(volume_name, object_path) -> bool
delete_volume(volume_name) -> bool

delete_object marks all shards of the object for removal on the assigned Relay Nodes and updates the onchain manifest. Relay Nodes garbage-collect the shard data.
delete_volume initiates a soft delete: it sets the volume’s status to DELETED and starts the GC grace timer (§13.2). The StorageCommitment and rent persist through the grace window — the owner may call undelete_volume() to restore the volume — after which it is garbage-collected and fees cease.
Only the account owner (or an Actor acting on behalf of the account) may delete. Runners are never granted delete permission, even under read-write mode — this prevents malicious or compromised Runners from destroying data.

7. Access Control

7.1 Capability Tokens

Access to a Volume is mediated by Capability Tokens (CapTokens), inspired by Storj’s Macaroon system and UCANs. A CapToken is a compact, cryptographically signed structure encoding:

CapToken {
  volume_id:      bytes32,          // keccak256(account_address || volume_name)
  access_mode:    ENUM { READ_ONLY, WRITE_ONLY, READ_WRITE },
  path_prefix:    string,           // scope to objects under this prefix (e.g., "agent-3/")
  max_bytes:      u64,              // maximum total bytes writable
  valid_from:     u64,              // block height
  valid_until:    u64,              // block height (job timeout)
  runner_address: bytes32,          // the specific Runner authorized
  nonce:          u64,              // prevents replay
  signature:      bytes64           // reserved; presently zero — authority is the stored-token match, see below
}

Token authority. A CapToken is not self-authenticating by an in-token signature. It is minted by the system instruction that records volume attachments (the chain-authority issuance path, §8.2), and the authoritative copy is written into Storage Manager actor state. A presented token is honored only if it matches that stored copy byte-for-byte — relays enforce this through the AuthProvider, the validator through validate_token. Forging a token therefore requires writing Storage Manager state (i.e., compromising consensus), not forging a signature. The signature field is reserved for a future client-verifiable token form and is presently zero (§17.3). Note: For PUBLIC volumes (§7.6), CapTokens are required only for write access. Reads and listings are open to any party without a token, making READ_ONLY CapTokens unnecessary for public volumes.

7.1.1 Auth Enforcement on Placement RPCs

All three placement RPCs (GetPlacement, PutPlacement, ReplicatePlacement) are auth-gated by the same AuthProvider that gates shard operations. A request without a valid token (or with an expired/revoked token) is rejected before the Relay Node reads or writes any placement data. This is critical because PlacementRecords contain shard-to-node mappings — leaking them would reveal which nodes hold which shards for a volume, enabling targeted denial-of-service against specific Relay Nodes. For public volumes, placement reads follow the same open-access model as shard reads (§7.6.3).

7.2 Access Modes (CapToken Scopes)

The following modes apply to CapToken-gated access on private volumes:

Mode	Read Objects	Write Objects	List Objects	Delete Objects
READ_ONLY	Yes	No	Yes	No
WRITE_ONLY	No	Yes	No	No
READ_WRITE	Yes	Yes	Yes	No

WRITE_ONLY is the default and preferred mode for most Runner jobs. It allows the Runner to produce output without being able to inspect existing data in the volume. This minimizes the trust surface. READ_ONLY is appropriate when a Runner needs to consume data without modifying it — for example, reading a dataset, loading model weights for inference (not training), or a verifier checking another agent’s output. Because the Runner cannot write, there is no risk of data corruption or quota exhaustion. READ_WRITE is required when a Runner needs to both consume and produce data in the same volume (e.g., reading prior model weights to continue training, reading an agent’s memory and updating it, or a coordinator agent reading reports from sub-agents and writing a synthesis). Commit authority is separate from write authority. Writing shards (PUT_SHARD) and committing the manifest (COMMIT_MANIFEST / stage + finalize) are distinct. Since a commit sets the volume’s single manifest_root, an unconstrained committer could replace the whole index — including other prefixes. CIP-9 bounds this with parent-root CAS on every commit, plus prefix-confined commits for WRITE_ONLY tokens (enforced onchain for public volumes, via staged-commit + DEK-holder finalize for private volumes). See §7.3.1 for the full model.

Note — Public volumes: Volumes with visibility = PUBLIC use a different access model. Reads and listings are open to any party without a CapToken. Writes still require a CapToken (WRITE_ONLY or READ_WRITE). See §7.6 for the full PUBLIC volume specification.

7.3 Concurrent CapTokens

Multiple CapTokens may be active on the same volume simultaneously. This is essential for the agent swarm pattern, where a coordinator Runner holds a READ_WRITE token while multiple sub-agent Runners hold WRITE_ONLY tokens scoped to disjoint path prefixes. Rules for concurrent access:

Non-overlapping write prefixes: If two CapTokens grant WRITE access, their path_prefix values MUST NOT overlap. The Dispatcher enforces this at token issuance.
- Prefix canonicalization: Prefixes are canonicalized by ensuring a trailing / separator. A prefix agent-1 is stored as agent-1/. This prevents ambiguity: agent-1/ and agent-10/ are non-overlapping; agent-1/ and agent-1/sub/ DO overlap (the first is a parent of the second). Overlap is defined as: prefix A overlaps prefix B if A is a prefix of B or B is a prefix of A (after canonicalization).
- Empty prefix ("") means full volume access. No other WRITE CapToken may be active on the volume simultaneously if any token has an empty prefix.
Reads never conflict: READ_ONLY tokens may coexist with any number of other READ_ONLY or WRITE tokens. A READ_WRITE token may read paths being written by other tokens.
No total ordering of writes: Concurrent writes to different paths are independent. There is no global write ordering across CapTokens.
CapToken revocation: A CapToken can be revoked before its valid_until by the Dispatcher recording the token’s nonce in a revocation list. Relay Nodes check the revocation list on each request. Writes in-flight at revocation time may or may not land; the next manifest commit determines the canonical state. Revocation is best-effort and convergent — Relay Nodes may serve a revoked token briefly until the revocation propagates.

7.3.1 Prefix Enforcement Boundaries

Prefix enforcement operates across three layers, each with different trust properties: Layer 1 — Issuance-time checking (onchain, strong). When the Dispatcher issues a new WRITE CapToken, it checks the requested path_prefix against all active WRITE CapTokens on the same volume. If the new prefix overlaps an existing one (per the overlap definition above), issuance is rejected. This is an onchain check and is fully trustworthy. Layer 2 — Commit authority (onchain, strong). Because commit_manifest sets the volume’s single manifest_root, a misbehaving sub-agent’s blast radius is bounded by commit authority, not just write authority. Two rules apply:

Parent-root CAS (all volumes). Every commit names the prev_root it extends; the chain rejects it unless prev_root equals the volume’s current manifest_root (enforced today). A stale or racing committer cannot silently overwrite an intervening commit — it must re-base on the current root and retry. This alone removes the lost-update / replace-from-stale-base class of clobber.
Prefix-confined commits. A WRITE_ONLY committer may change only manifest nodes under its own path_prefix. How this is enforced depends on visibility, because the chain can only inspect paths it can see:
- PUBLIC volumes — manifest nodes are plaintext (§3.1), so the chain verifies prefix-confinement directly: a commit whose changed nodes (relative to prev_root) touch any path outside the committer’s prefix is rejected onchain. Prevention, not detection.
- PRIVATE volumes — manifest nodes are encrypted (§3.1), so paths are not visible onchain; the chain cannot check them without breaking the privacy guarantee (§9.3), and sampling cannot prove the absence of an out-of-prefix path. Prevention therefore routes through a DEK holder: a WRITE_ONLY sub-agent stages its prefix-scoped delta (CommitManifestStage, attributed to staged_by) but cannot finalize the canonical root; the DEK-holding coordinator or owner decrypts the staged delta, verifies prefix-confinement, and issues CommitManifestFinalize. A sub-agent therefore cannot unilaterally set manifest_root, so it cannot clobber another prefix.

The coordinator remains the natural verifier for private volumes precisely because it already holds the DEK and issued the sub-agent’s CapToken — but verification now gates finalization rather than being after-the-fact cleanup. Layer 3 — Write-time (Relay Nodes, weak). Relay Nodes receive PUT_SHARD requests keyed by opaque shard_id values (BLAKE3(volume_id || object_path || write_id)). Because the shard ID is a one-way hash, Relay Nodes cannot verify whether the underlying object path falls within the CapToken’s prefix. A Relay Node can verify that the CapToken is valid (signature, expiry, volume ID, write permission) but NOT that the write targets an authorized path. Prefix enforcement at the Relay Node layer is therefore not possible by design — this is the cost of shard ID opacity (§16.4), which protects object-path privacy. Consequence — junk-shard waste vector. Between PUT_SHARD and commit, a rogue Runner holding a CapToken scoped to agent-1/ could write shards for paths outside its prefix (e.g., agent-2/poison.dat). These out-of-prefix shards land on Relay Nodes but can never be referenced by an accepted commit — Layer 2 rejects (public) or refuses to finalize (private) any out-of-prefix change, so the shards are never committed. The waste is bounded by the CapToken’s max_bytes quota, which caps total shard bytes the Relay Node will accept for that token. Orphan shards (written but never referenced by a committed manifest) are garbage collected by Relay Nodes after ORPHAN_SHARD_TTL (§14). Summary:

Layer	Verifier	Strength	What it checks
Issuance	Onchain Dispatcher	Prevention	No overlapping prefixes among active CapTokens
Commit (public)	Onchain (parent-root CAS + changed-node prefix check)	Prevention	Commit’s changed nodes are confined to the committer’s prefix; out-of-prefix commit rejected
Commit (private)	Staged commit + DEK-holder finalize	Prevention	Sub-agent may only stage; coordinator/owner decrypts, verifies prefix, and finalizes
Write	Relay Nodes	None	CapToken is valid (but not prefix compliance)

7.4 Caveats and Restrictions

CapTokens support additive caveats (restrictions can be appended but never removed):

Path prefix narrowing: A CapToken scoped to job_4821/ can be further restricted to job_4821/checkpoints/ but never broadened to /.
Byte quota reduction: A 1 GiB quota can be reduced to 512 MiB but never increased.
Time window narrowing: The valid window can be shortened but never extended.

This enables delegation chains: the Storage Manager issues a broad CapToken to the Dispatcher, which narrows it per-job before passing it to the Runner.

7.5 Read Consistency

All reads are READ_COMMITTED: objects are visible only after the writing client has committed a manifest onchain that includes the object’s ObjectDescriptor. Manifest verification (mandatory): When a client fetches a manifest from Relay Nodes, it MUST:

Fetch the manifest DAG nodes from Relay Nodes (starting at the root locator) and reconstruct them.
Recompute the root locator of the reconstructed DAG (§3.1).
Compare the computed root to the onchain manifest_root in the volume’s StorageCommitment.
Reject on mismatch. A mismatched root means the fetched manifest is stale, partially published, or corrupted.

This verification rule is the mechanism behind READ_COMMITTED: because manifest nodes are content-addressed and a commit replaces only the root and the nodes on the changed path (§3.1), readers never trust fetched manifest data without recomputing the root and checking it against the onchain manifest_root, which is the single source of truth. This prevents dirty-read attacks: a malicious or buggy sub-agent could write garbage data and publish a manifest with it, but until commit_manifest() succeeds onchain, no reader will accept that manifest because its root won’t match the onchain commitment.

Future work — READ_UNCOMMITTED: A mode where objects are visible as soon as shards land on Relay Nodes (before commit_manifest()) is desirable for the real-time agent swarm pattern, where latency matters more than strict consistency. However, it is not implementable on the current design for two reasons:

Discovery: With versioned shard IDs (§5.3, write_id in the shard address), a reader cannot predict the shard_id for an uncommitted write — they don’t know the writer’s random write_id. Discovering uncommitted objects requires a separate metadata channel (pubsub or uncommitted manifest fragments) that this spec does not yet define.

Prefix safety: prefix-confinement is enforced at commit (§7.3.1); an uncommitted read would bypass that gate and could observe out-of-prefix shards from a rogue writer before the commit that would reject them.

READ_UNCOMMITTED is deferred to a future CIP that defines the discovery mechanism.

7.6 Public Volumes (`PUBLIC`)

7.6.1 Overview

A volume with visibility = PUBLIC is publicly readable by any party without a CapToken. This enables DNS-addressable actors (CIP-14) and other use cases such as serving static web assets (CIP-15), public datasets, or shared artifacts directly from Relay Nodes. Public volumes are created by setting visibility = PUBLIC at volume creation time (§12.3). The visibility of a volume is immutable after creation — a private volume cannot be made public, and a public volume cannot be made private. This prevents accidental data exposure and simplifies Relay Node behavior.

7.6.2 Properties

No encryption: Objects in PUBLIC volumes are stored unencrypted on Relay Nodes. No DEK is generated for the volume. The wrapped_dek field in the StorageCommitment is empty.
No CapToken for reads: Any party can fetch shards from Relay Nodes without presenting a CapToken. Relay Nodes serve GET_SHARD requests for public shards unconditionally.
CapToken still required for writes: Only the account owner (or authorized Runners via CapToken) can write to the volume. Write access uses the same CapToken mechanism as private volumes.
Content integrity preserved: content_hash (BLAKE3) is still computed and stored for every object. Readers MUST verify the content hash after shard reconstruction to detect corruption or tampering.
Erasure coding preserved: Reed-Solomon coding applies identically. The only change is that the input to erasure coding is plaintext (not ciphertext).
Billing unchanged: The account owner pays the same per-epoch, per-byte storage fees as private volumes.
Listing is public: list_objects for public volumes does not require a CapToken. The manifest is stored unencrypted and readable by anyone who traverses the manifest DAG from the on-chain root (§3.1) or calls GET_MANIFEST (§5.3.2).

7.6.3 Relay Node Behavior

Relay Nodes determine whether a shard is publicly readable using shard metadata stored alongside each shard at write time (see CBFS §7.3). When a Runner writes shards to a public volume, the CIP-9 AuthProvider implementation sets metadata: { "visibility": "PUBLIC" } in the AuthDecision. The Relay Node stores this metadata alongside the shard bytes. On a GET_SHARD request without a CapToken, the Relay Node passes the stored shard_metadata to the AuthProvider, which checks for visibility = PUBLIC and grants access. This avoids requiring the Relay Node to look up StorageCommitment.visibility onchain for every read — the authorization decision is self-contained in the stored metadata. For public volumes:

GET_SHARD requests are served without CapToken verification (authorized via shard metadata).
GET_MANIFEST (§5.3.2) is served without a CapToken — the same open-read rule.
LIST_SHARDS is not exposed; listing is performed client-side from the manifest.
PUT_SHARD requests still require a valid CapToken with write access. The AuthProvider attaches { "visibility": "PUBLIC" } metadata to the AuthDecision, which the Relay Node persists alongside the shard.

For private volumes, no visibility metadata is stored (or metadata is absent), so unauthenticated GET_SHARD requests are rejected. All operations require a valid CapToken. Relay Nodes do not expose any listing operation — object listing is performed client-side by reading the manifest. Gateways are not a privileged role. A Gateway (CIP-15) serving public web assets is simply an unauthenticated public-volume reader — no different from an indexer or a browser. The Relay Node does not need to know a request originates from a Gateway; the public-volume open-read rule applies uniformly. There is no Gateway storage role, key, or capability.

7.6.4 Shard ID Opacity

For private volumes, shard IDs are opaque (BLAKE3(volume_id || object_path || write_id)) to prevent Relay Nodes from learning object paths or detecting overwrites (§16.4). For public volumes, shard IDs remain opaque for consistency, but the manifest is unencrypted, so object paths are visible to anyone reading the manifest. This is acceptable because the data itself is public.

7.6.5 Content-Type Metadata

Public volumes support an optional content-type map stored as a well-known object at the path _meta/content_types.json:

{
  "defaults": {
    ".html": "text/html; charset=utf-8",
    ".css": "text/css",
    ".js": "application/javascript",
    ".json": "application/json",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".svg": "image/svg+xml",
    ".woff2": "font/woff2"
  },
  "overrides": {
    "data/feed.xml": "application/atom+xml"
  }
}

Consumers (e.g., CIP-15 Gateways) read this map to set appropriate HTTP Content-Type headers when serving objects. If no map exists, consumers infer content types from file extensions using a standard MIME type database.

7.6.6 Cache Headers

Public volumes support an optional cache configuration stored at _meta/cache_config.json:

{
  "default_max_age": 3600,
  "paths": {
    "assets/*": {"max_age": 86400, "immutable": true},
    "index.html": {"max_age": 0, "must_revalidate": true}
  }
}

Consumers use this to set Cache-Control headers. The ETag for any object is its content_hash (BLAKE3, hex-encoded), enabling conditional requests (If-None-Match). Cache invalidation is driven by manifest root changes — when the onchain StorageCommitment.manifest_root changes, consumers know the volume contents have been updated.

7.7 Cross-Owner Volume Access (Mount Allowlist)

By default only a volume’s owner can attach it. To let another principal mount a volume it does not own, the owner maintains an on-chain mount allowlist on the volume’s StorageCommitment (§11.1), mutated only by an owner-signed UpdateMountAllowlist transaction (§12.3). A principal must appear on the allowlist before the Dispatcher will issue it a CapToken for a volume it does not own. Each grant is a MountGrant:

MountGrant {
  grantee:      bytes32,                                   // account or actor address
  access_mode:  ENUM { READ_ONLY, WRITE_ONLY, READ_WRITE },
  path_prefix:  string,                                    // "" = whole volume
  valid_until:  u64,                                       // block height; 0 = no expiry
}

At dispatch, when the task submitter is not the volume owner, the Dispatcher resolves the submitter to a principal, looks up a matching unexpired grant, and scopes the issued CapToken to the grant’s access_mode and path_prefix (intersected with the attachment request). Delete is never grantable (§6.3). Grants whose valid_until has passed are skipped. Principal type must be unambiguous (Actor vs Account). A grantee address can denote either an Account or a deployed Actor. The mount check derives the submitter’s principal type from chain state — Actor(addr) if the submitter carries an ActorManifest, else Account(addr). If the allowlist stored only one variant (e.g., Actor(addr), because the owner ran allow-actor), a manifest-less deployed actor that resolves to Account(addr) fails an exact-variant match and is silently rejected even though its address is on the list. To remove this trap, UpdateMountAllowlist records grants keyed by the bare address, and the mount check matches on the canonical address regardless of principal variant. (Equivalently, an implementation MAY require deployed actors to always carry a manifest so they are always Actor principals — but the address-keyed match is the normative rule, because it does not depend on ephemeral manifest state.) A grant MUST NOT be silently dropped on a variant mismatch.

8. Volume Attachment

8.1 Attachment at Job Dispatch

When submitting a CIP-2 task, the account owner specifies volume attachments in the task definition:

VolumeAttachment {
  volume_name:   string,
  access_mode:   ENUM { READ_ONLY, WRITE_ONLY, READ_WRITE },
  path_prefix:   string?,          // optional: restrict to sub-path
  max_bytes:     u64,              // byte quota for this job
}

Multiple volumes can be attached to a single job. Each produces an independent CapToken. Storage-capable runners only. A task carrying any VolumeAttachment is dispatched only to runners advertising storage_support = true with a valid encryption_pubkey (§5.1.1). The Dispatcher applies this filter as part of committee selection (§4.2.1); if it cannot assemble the required committee from storage-capable runners, dispatch fails with InsufficientStorageCapableRunners (§8.2.1) rather than assigning a runner that would then fail the mount.

8.2 Attachment Process

Since Runners have no persistent local disk, “attachment” is not about moving data to the Runner. Instead, attachment means:

CapToken issuance: The Dispatcher issues a scoped CapToken for each volume attachment.
Volume key delivery (private volumes only): the Runner obtains the DEK via a CBSS SealRequest it completes itself (§9.2); the Dispatcher constructs and authorizes the seal request but never unwraps or handles a plaintext DEK. A private volume must carry a committee wrap to be runner-mountable — an owner-key-only volume is read by its owner via the CLI (§9.2), not delivered to a Runner. Public volumes skip this step.
Manifest fetch: The Runner fetches the manifest DAG from the on-chain root locator (§3.1) — by node-addressed GET_SHARD or via GET_MANIFEST — recomputes the root, and verifies it matches the onchain manifest_root (§7.5). Only after verification does the Runner trust the manifest for reading or writing.

The Runner then reads and writes objects over the network to Relay Nodes as needed during execution. There is no bulk data “prefetch” phase — reads are on-demand. Expected latency:

Scenario	Attachment Latency	Notes
Any volume (READ_ONLY, WRITE_ONLY, or READ_WRITE)	~100-500ms	Key delivery + manifest fetch
First read of an object (1 MiB)	~200ms-1s	Fetch K shards + reconstruct + decrypt
First read of an object (100 MiB)	2-10s	Proportional to object size and network

Attachment cost is a fixed fee covering key delivery and manifest sync:

attachment_cost = BASE_ATTACHMENT_FEE

This is charged at task submission time. Data transfer fees (reads/writes during execution) are metered separately.

8.3 Detachment

When a Runner job completes (or times out):

The Runner commits the final storage manifest onchain (if it wrote any objects).
The CapToken is invalidated (past valid_until).
Shards written during the job persist on Relay Nodes, independent of the Runner’s lifecycle.

The Runner may terminate immediately after commit. Data durability does not depend on the Runner remaining online.

8.2.1 Dispatch Eligibility Diagnostics

Health and eligibility are distinct gates: a runner can be “healthy” in the registry yet still be filtered out of a storage job by reputation, job-type, storage_support, or DNS constraints. Because a storage mount must clear several independent gates (Appendix D), a silent “0 eligible runners” is the most common and least legible failure mode. The Dispatcher MUST therefore emit a structured, per-gate exclusion summary when selecting for a storage-attached job:

{ total_candidates, excluded_by_job_type, excluded_by_health,
  excluded_by_reputation, excluded_by_storage_support, selected }

When selected is below the required committee size, the dispatch error MUST be the structured InsufficientStorageCapableRunners { required, available, reasons: [...] }, not a generic “no eligible runners” — the operator needs to know which gate was binding. Note that reputation eligibility is itself a gate: a freshly-bootstrapped runner whose reputation has not yet crossed the selection threshold (or has lazily decayed just below it) is excluded here, and the summary MUST make that visible rather than collapsing it into a generic exclusion.

9. Encryption and Privacy

9.1 Encryption at Rest

All private Volume data is encrypted client-side before erasure coding and distribution to Relay Nodes — by whichever client writes it, whether a Runner during a job or the owner via the CLI. Relay Nodes never see plaintext. Exception: Public volumes (visibility = PUBLIC) skip encryption entirely. Objects are erasure-coded and distributed as plaintext. No DEK is generated, no wrapping key is needed, and Runners do not perform encryption or decryption. Content integrity is still verified via BLAKE3 content hashes. See §7.6 for full public volume semantics. The encryption scheme uses envelope encryption with a random Data Encryption Key (DEK) per volume:

Volume DEK (Data Encryption Key): A random 256-bit key generated at volume creation. The DEK is never derived from the account’s signing key — signing keys sign; they do not derive encryption keys.

volume_dek = CSPRNG(32)  // generated once at create_volume()

The DEK is then wrapped one or more ways according to the volume’s access class (wrapping_key_policy, stored on-chain in the StorageCommitment, §11.1). The wraps are independent and additive — a volume may carry either or both:

Owner wrap (owner-key): the DEK is wrapped to a key derived from the owner’s wallet via HKDF, so the owner can read and write the volume directly from the CLI on any machine. Canonical domains: OWNER_KEY_WRAP_SALT = "cbfs/owner-key-volume-dek-wrap/salt/v1" and OWNER_KEY_WRAP_INFO = "cbfs/owner-key-volume-dek-wrap/info/v1".
Committee wrap (committee-only): the DEK is IBE-encrypted to the CBSS committee under domain CIP9_VOLUME_DEK_IBE_DOMAIN = "cbss/ibe/cip9-volume-dek/v1", so an authorized Runner can obtain it at mount via a threshold seal (§9.2) without the owner ever placing wallet key material on the runner.

A canonical hash domain CIP9_HASH_DOMAIN_VOLUME_DEK = "cowboy.cip-9.volume-dek.v1" (with AAD domain …volume-dek-aad.v1) binds each wrap to its volume_id. The wraps are stored in the on-chain StorageCommitment (§11.1). Access classes (wrapping_key_policy):

Policy	Wraps present	Who can write	Who can read
`owner-key`	owner (and, typically, committee)	owner via CLI; runners with an RW CapToken	owner directly; committee-served runners
`committee-only`	committee	runners with an RW CapToken	committee-served runners (the owner cannot read without a runner)
`tee-gated` (future)	committee, release gated on TEE attestation	attested runners with an RW CapToken	attested committee-served runners

This resolves the long-standing “can’t write a private volume from the CLI” confusion: CLI writes are valid for any volume that carries an owner wrap, and --owner-key at create means also store the owner wrap — additive, not exclusive. The threat model is per-class and deliberate: committee-only keeps wallet key material off every client machine at the cost of the owner needing a runner to read; owner-key trades that for direct CLI access.

Object Encryption: Each object is encrypted with AES-256-GCM using a random nonce per write:
```
nonce = CSPRNG(12)       // fresh random nonce for EVERY write, including overwrites
ciphertext = AES-256-GCM(key=volume_dek, nonce=nonce, plaintext=object_bytes, aad=object_path)
```
The nonce is stored alongside the ciphertext in the ObjectDescriptor. This is critical: deterministic nonces derived from the object path would cause nonce reuse on overwrites, which is catastrophic for AES-GCM (leaks XOR of plaintexts, breaks authentication). A fresh random nonce on every write eliminates this class of attack entirely.
Shard opacity: Erasure coding is applied to the ciphertext, not the plaintext. Relay Nodes hold shards of ciphertext — even if they reconstructed all shards, they would only have ciphertext.
Manifest encryption: The volume manifest (list of object paths, sizes, content hashes, ObjectDescriptors) is encrypted with the volume DEK before transmission to Relay Nodes. PlacementRecords are stored separately on Relay Nodes (§5.3.1) and are not part of the encrypted manifest.

9.2 Runner Key Access

To read or write a private volume a Runner needs the volume DEK. The DEK is delivered at job attachment (§8.2); it is never stored on-chain in plaintext. Which delivery path applies depends on the volume’s access class (§9.1). Committee seal (CBSS) — committee-wrapped volumes. This is the path that lets a Runner read a private volume for which the owner has shared no wallet key. The Runner holds an x25519 recipient keypair whose public key is the encryption_pubkey in its registry capabilities (§5.1.1). At mount:

The Runner issues a SealRequest for the volume’s committee-wrapped DEK, presenting its CapToken (and, for tee-gated, its attestation).
Each CBSS committee member verifies authorization and returns a partial decryption.
The Runner combines the threshold partials and completes the IBE decryption (domain cbss/ibe/cip9-volume-dek/v1) to recover the DEK, sealed in transit to its encryption_pubkey.
The DEK is held in memory for the job and zeroized on completion.

Both halves are discovered from chain, never from local configuration (§16.7): the seal committee for a volume is resolved from the CBSS committee registry (CIP-24), keyed by the volume’s recorded committee epoch, and the sealed ciphertext is an ordinary CBFS object fetched from the relays already in the relay registry (§5.2). A Runner needs no CBSS endpoint set. Runner-mountable volumes require a committee wrap. The committee seal is the only path by which a Runner — which is not the owner — obtains the DEK. A private volume is therefore runner-mountable only if it carries a committee (CBSS) wrap; an owner-key-only volume (owner wrap, no committee wrap) is readable by its owner via the CLI (below) but cannot be mounted by a Runner, and is made runner-readable by adding a committee wrap. The Dispatcher never holds the owner wrapping key and never sees a plaintext DEK — it only constructs and authorizes the SealRequest. Owner CLI access — owner-wrapped volumes. The owner unwraps the owner wrap locally with its wallet-derived key; no runner or committee is involved. This is the direct read/write path that makes owner-key volumes CLI-writable. Security properties.

The DEK never appears in plaintext on-chain or in any persistent store.
READ_ONLY CapTokens on private volumes still require DEK delivery — the Runner must decrypt ciphertext shards to serve reads.
For TEE-attested runners (tee-gated), the committee releases partials only against a valid attestation, and the DEK is sealed to the enclave, never exposed to the host OS.

Public volumes skip key delivery entirely — there is no DEK; the encrypted_dek field is absent from the job assignment payload.

9.3 Privacy Guarantees

For private volumes (visibility = PRIVATE):

Relay Nodes see only ciphertext shards indexed by opaque shard IDs (see §16.3). They cannot read object contents or inspect the manifest. Object paths are encrypted within the manifest and never exposed to Relay Nodes.
Other Runners (not assigned to the job) cannot access the volume DEK.
onchain observers see only the Storage Commitment (volume ID, wrapped DEK, encrypted manifest hash, total size, shard assignments). They cannot determine what is stored.
The account owner has full access to all their volume data by unwrapping the DEK with their wrapping key.

For public volumes (visibility = PUBLIC):

No confidentiality: Data is stored unencrypted. Relay Nodes, Runners, and any network participant can read object contents. This is by design — public volumes are intended for publicly readable data (web assets, public datasets, shared artifacts).
Integrity preserved: Content hashes (BLAKE3) ensure that data has not been tampered with, even though it is unencrypted.
Write access is still restricted: Only the account owner (or authorized Runners via CapToken) can write to a public volume. Public readability does not imply public writability.

10. Billing and Fees

10.1 Fee Components

RAS introduces four fee components, all denominated in CBY:

Fee	When Charged	Calculation
Volume Creation	At `create_volume()`	`VOLUME_CREATION_FEE` (fixed)
Attachment	At `submit_task()` with volume attachment	`BASE_ATTACHMENT_FEE` (fixed per volume per job)
Persistent Storage	Per epoch, while volume exists	`effective_size * STORAGE_FEE_PER_BYTE_PER_EPOCH`
Data Transfer	At read/write time	`bytes_transferred * TRANSFER_FEE_PER_BYTE`

10.2 Effective Size and Erasure Overhead

The effective size of a volume is the raw data size multiplied by the erasure coding overhead factor:

effective_size = raw_size * (K + M) / K

For the default 4/6 scheme, effective_size = raw_size * 1.5. The account pays for the full effective size, since that is the actual storage consumed across Relay Nodes.

10.3 Persistent Storage Billing

Unlike onchain Cells (which are a one-time cost metered by the VM at transaction execution), persistent storage incurs ongoing costs. Storage usage is metered externally by Relay Nodes and settled onchain — the chain cannot directly measure how many bytes a Relay Node stores, so it relies on attestations and Proof of Retrievability challenges (§5.6) to verify. The billing model:

Each epoch, the protocol calculates the total effective storage used by each account across all volumes.
The per-epoch storage fee is deducted from the account’s balance.
If the account’s balance falls below MIN_STORAGE_BALANCE (sufficient to cover one epoch of fees), the protocol enters a grace period of STORAGE_GRACE_EPOCHS.
After the grace period, if the balance is still insufficient, all volumes owned by the account are marked for garbage collection.

This mirrors Sia’s contract-expiry cleanup model — storage only persists while it’s paid for.

10.3.1 Escrow and Rent Lifecycle

A volume’s billing state is explicit, not implicit. At create_volume() the owner MAY prepay storage with --initial-escrow; escrow is drawn down per epoch ahead of the account balance. Lifecycle:

Create: with no escrow and insufficient balance to cover MIN_STORAGE_BALANCE, creation MUST fail with a distinct, named error (§12.3.1) — not a generic E1900 "invalid data", which today conflates the duplicate-name and escrow-shortfall cases. A no-escrow volume that is created bills from balance immediately and expires after one epoch if balance is insufficient.
Active → grace: when escrow is exhausted and balance falls below MIN_STORAGE_BALANCE, the volume enters GRACE_PERIOD for STORAGE_GRACE_EPOCHS. A top-up of escrow or balance returns it to ACTIVE.
Expiry: after the grace window with still-insufficient funds, the volume is marked GARBAGE_COLLECTING (§13).

There is no separate “rent renewal” instruction — escrow/balance top-up uses normal account-balance operations. CIP-31 (CBFS Rent Schedule) owns the per-epoch rent accrual, the grace/eviction economics, and the Tier-0 parameter values; this section fixes only the lifecycle states and the requirement that each transition surface as a legible, distinct error or status.

10.4 Fee Distribution

Storage fees flow from account owners to Relay Nodes in a three-way split:

Account Owner ──(per-epoch storage fee)──► Protocol ──► { burn (0x00) :: PoR challenge pool (0x0B) :: Relays pro-rata }
Runner        ──(per-byte transfer fee) ──► Relay Node (serving the shard)

Per-epoch storage-fee split (canonical numeric values in CIP-31 §4):

STORAGE_FEE_BURN_BPS (default 1000 = 10%) → burned to 0x00, consistent with CIP-3’s deflationary design.
STORAGE_FEE_CHALLENGE_POOL_BPS (default 200 = 2%) → accrued to the PoR challenge pool at 0x0B (this is the explicit form of the POR_CHALLENGE_FEE_SHARE row in §14; it funds the challenger bounty defined in CIP-31 §7).
STORAGE_FEE_RELAY_BPS (default 8800 = 88%) → distributed pro-rata across active Relay Nodes with weight (shard_count × shard_age_in_epochs) per CIP-31 §5.

Invariant: STORAGE_FEE_BURN_BPS + STORAGE_FEE_CHALLENGE_POOL_BPS + STORAGE_FEE_RELAY_BPS == 10000. Tier-0 governance MAY rebalance the three under the invariant; CIP-31 owns the genesis defaults and Tier-0 keys. Transfer fees (TRANSFER_FEE_PER_BYTE) go entirely to the serving Relay Node (no burn, no challenge-pool share).

10.5 Relationship to CIP-3

CIP-3 defines two onchain meters: Cycles (compute) and Cells (data). RAS does NOT create a third onchain meter. Instead:

onchain operations (creating Storage Commitments, writing manifests) consume Cycles and Cells as normal CIP-3 transactions.
Off-chain storage fees are a separate ledger entry, debited from the account balance per epoch by the Storage Manager system actor.

This keeps the onchain metering model clean while extending billing to cover persistent off-chain resources.

11. onchain State

11.1 Storage Manager System Actor

RAS is managed by the Storage Manager system actor at STORAGE_MANAGER = 0x0A. This actor maintains: StorageCommitment (per volume):

StorageCommitment {
  volume_id:             bytes32,    // keccak256(account_address || volume_name)
  owner:                 address,
  volume_name:           string,
  visibility:            ENUM { PRIVATE, PUBLIC },  // PRIVATE = encrypted, CapToken-gated; PUBLIC = unencrypted, open reads
  wrapping_key_policy:   ENUM { OWNER_KEY, COMMITTEE_ONLY, TEE_GATED },  // access class (§9.1); ignored for PUBLIC
  created_at:            u64,        // block height
  owner_wrapped_dek:     bytes,      // DEK wrapped to the owner wallet key (§9.1); empty if no owner wrap / PUBLIC
  committee_wrapped_dek: bytes,      // DEK IBE-wrapped to the CBSS committee (§9.1); empty if no committee wrap / PUBLIC
  manifest_root:         bytes32,    // root locator of the manifest DAG (§3.1); over ciphertext nodes for PRIVATE, plaintext for PUBLIC
  shard_root:            bytes32,    // public commitment over the volume's per-shard chunk-roots (§5.6); DEK-independent, anchors PoR for PRIVATE volumes
  raw_size_bytes:        u64,        // sum of object sizes (pre-erasure)
  effective_size_bytes:  u64,        // raw_size * (K+M)/K
  erasure_k:             u8,         // data shards
  erasure_m:             u8,         // parity shards
  last_updated:          u64,        // block height of last manifest update
  degraded_shards:       u16,        // count of shards needing repair
  mount_allowed_actors:  [MountGrant],  // cross-owner mount allowlist (§7.7); matched by bare address
  status:                ENUM { ACTIVE, GRACE_PERIOD, DELETED, GARBAGE_COLLECTING }
}

For a PRIVATE volume at least one of owner_wrapped_dek / committee_wrapped_dek MUST be present, per the volume’s wrapping_key_policy (§9.1); both may be present (the additive dual-wrap). Each wrap is bound to volume_id via the canonical hash domain cowboy.cip-9.volume-dek.v1 — there is no separate wrapping_key_hash field. (Implementation status: the chain StorageCommitment today carries only the committee wrap as a single wrapped_dek; wrapping_key_policy and the owner-wrap field are target additions not yet threaded through CreateVolume/OpenVolume RPC and consensus — see §17.3.) AccountStorageSummary (per account):

AccountStorageSummary {
  total_volumes:          u32,
  total_effective_bytes:  u64,
  last_billed_epoch:      u64,
  balance_reserved:       u256       // CBY reserved for storage fees
}

11.2 Relay Registry

The Relay Registry is the system actor at RELAY_REGISTRY = 0x0B, managing:

RelayNodeProfile entries (see §5.2)
Active relay list (ordered, health-decaying, analogous to CIP-2 Runner Registry)
Shard assignment index: volume_id → list[PlacementAssignment]
The network-wide AUTO_DRAIN_POLICY_KEY slot written by governance (§5.8)

11.3 Key Space

Storage Commitments are stored in the CIP-4 STORAGE key space under the Storage Manager actor’s address:

key = 0x1 || keccak256(storage_manager_address) || 0x00 || keccak256("commitment" || volume_id)
value = rlp(StorageCommitment)

Volume name index — the canonical (owner, name) → volume_id lookup (§12.3). It is written at create_volume(), not lazily at first commit, and is keyed by the owner’s chain address — the same owner_address used to derive volume_id — so a read resolves under exactly the owner-domain the writer used. The canonical derivation is owner_volume_name_key(owner, volume_name):

key = "ras:owner-name:" || owner_address || keccak256(volume_name)   // OWNER_NAME_PREFIX = b"ras:owner-name:"
value = volume_id (bytes32)

A name lookup that misses this index (a real cause of volume info --name / allow-actor --name returning 404 for volumes that demonstrably exist) MUST be treated as “volume not found for this owner,” not silently routed to a different owner-domain. Account summaries:

key = 0x1 || keccak256(storage_manager_address) || 0x00 || keccak256("summary" || account_address)
value = rlp(AccountStorageSummary)

Relay profiles:

key = 0x1 || keccak256(relay_registry_address) || 0x00 || keccak256("relay" || relay_address)
value = rlp(RelayNodeProfile)

12. Client Interfaces

RAS exposes two client interfaces to Runner workloads. The choice of interface depends on the workload type:

Filesystem interface (§12.1): A FUSE-mounted directory that presents the volume as a standard filesystem. This is the primary interface for agentic workloads where an LLM (Claude, Kimi-K2, GPT, etc.) operates via tool calling. The model’s existing filesystem tools — Read, Write, Bash (ls, grep, find), Glob, Grep — work unchanged against the mounted volume. No custom tool definitions required.
Object API (§12.2): A programmatic interface for orchestration code and non-agentic workloads. This is what the FUSE layer calls internally, and is also available directly for lightweight write-only jobs that don’t need filesystem semantics.

12.1 Filesystem Interface (FUSE Mount)

12.1.1 Design Rationale

The primary consumer of Runner storage is an AI model doing tool calling. When Claude runs as a Cowboy Runner, it uses tools like Read (read a file by path), Write (write a file by path), and Bash (run shell commands like ls, grep, find). These tools operate on filesystem paths. Every major model provider — Anthropic, Moonshot (Kimi), OpenAI — exposes similar filesystem-based tool sets. Requiring models to use a custom put_object / get_object API would mean:

Injecting custom tool definitions into every model’s tool set
Models are less fluent with unfamiliar, domain-specific tools
Loss of composability with standard unix tools (grep, find, jq, wc, etc.)
Every model provider’s runner integration needs custom work

By presenting volumes as a mounted filesystem, the model’s existing tools work natively:

Model tool call                        What happens under the hood
─────────────────                      ──────────────────────────────
Read("/mnt/memory/state.json")    →    FUSE read → fetch shards → reconstruct → decrypt
Write("/mnt/memory/state.json")   →    FUSE write → local buffer → background push
Bash("ls /mnt/memory/logs/")      →    FUSE readdir → list objects from manifest
Bash("grep -r 'error' /mnt/mem")  →    FUSE read (multiple) → local cache → grep
Bash("wc -l /mnt/memory/*.json")  →    FUSE read (multiple) → local cache → wc

12.1.2 Mount Point Layout

Each attached volume is mounted at a deterministic path inside the Runner’s execution environment:

/mnt/volumes/{volume_name}/

If a path_prefix is specified in the attachment, only that subtree is visible:

# Full volume mount:
/mnt/volumes/agent-memory/
├── state/
│   ├── memory.json
│   └── portfolio.json
├── logs/
│   └── 2026-03-04/
│       └── analysis.json
└── config.json

# Prefix-scoped mount (path_prefix="state/"):
/mnt/volumes/agent-memory/
├── memory.json
└── portfolio.json

12.1.3 Sync Strategy (Hybrid)

The FUSE mount uses a hybrid sync strategy combining local caching with background synchronization: Local layer: A tmpfs (in-memory filesystem) provides fast local reads and writes. All filesystem operations hit the local layer first. Background sync daemon: A process running alongside the container that bridges local state with Relay Nodes:

Pull cycle (Relay Nodes → local): Re-fetches the volume’s committed manifest from Relay Nodes, verifies it against the onchain manifest_root (§7.5), and materializes any new objects not already in the local tmpfs. Because reads are READ_COMMITTED (§7.5), the pull cycle only discovers objects that have been included in a committed manifest — objects from a sub-agent become visible only after that sub-agent calls commit_manifest(). In the agent swarm pattern, this means the coordinator sees a sub-agent’s files appear as a batch when the sub-agent commits, not individually as they are written.
Push cycle (local → Relay Nodes): Detects locally written or modified files (via inotify/fswatch), encrypts them, erasure-codes, and distributes shards to Relay Nodes.

Sync interval: Configurable per mount, default SYNC_INTERVAL_SECONDS = 5. Implementations SHOULD also support an explicit sync trigger (e.g., Bash("sync /mnt/volumes/agent-memory/")) for applications that need immediate durability. Read behavior:

Scenario	Behavior
File exists in local cache	Return from cache (fast, ~microseconds)
File not in local cache, exists on Relay Nodes	Fetch on demand: shards → reconstruct → decrypt → cache locally → return
File written locally, not yet pushed	Return local version
File written by another writer, not yet pulled	Not visible until next pull cycle

Write behavior:

Scenario	Behavior
Write new file	Write to local tmpfs immediately. Queued for next push cycle.
Overwrite existing file	Update local tmpfs. Queued for next push cycle. Previous version on Relay Nodes is replaced after push.
Write exceeds `max_bytes` quota	`write()` returns `ENOSPC`.

On container shutdown: The sync daemon performs a final push of all dirty files, then commits the manifest onchain. If the container crashes before final push, data written since the last push cycle is lost (durability window = sync interval). CapToken refresh. A long-lived mount can outlive its CapToken’s valid_until. The sync daemon MUST re-mint the CapToken before expiry (via the mount’s set_auth_token hook) and continue without interrupting in-flight reads or writes; a write that races an expired token retries with the refreshed token rather than surfacing a 403 to the workload. The Object API equivalent for batched writes is in §12.2.1.

12.1.4 FUSE Operation Mapping

POSIX operation	CIP-9 equivalent	Notes
`open(path, O_RDONLY)`	`get_object(path)` (lazy, on first `read()`)	READ_ONLY or READ_WRITE mode. Returns `EACCES` for WRITE_ONLY mounts.
`open(path, O_WRONLY)`	Buffered locally	WRITE_ONLY or READ_WRITE mode. Returns `EACCES` for READ_ONLY mounts. Pushed to Relay Nodes by sync daemon.
`read(fd, buf, size)`	Returns from local cache or fetches from Relay Nodes	Transparent to the caller
`write(fd, buf, size)`	Writes to local tmpfs	Async push to Relay Nodes
`readdir(path)`	`list_objects(prefix)`	Returns from local manifest (refreshed by pull cycle)
`stat(path)`	Object metadata from manifest	Size, mtime (from manifest timestamp)
`unlink(path)`	Returns `EPERM`	Runners cannot delete. Account owner uses onchain API.
`mkdir(path)`	No-op (directories are implicit in path structure)	`mkdir -p` works; directories exist when files exist under them
`rename(old, new)`	Not supported	Returns `ENOTSUP`. Write to new path + leave old.

12.2 Object API (Programmatic)

The Object API is the low-level interface used internally by the FUSE layer and available directly for programmatic workloads. This is the appropriate interface when:

The Runner is a script (not an LLM doing tool calling) that just needs to write output files.
The workload is lightweight and a full FUSE mount is unnecessary overhead.
The orchestration layer needs to interact with storage outside of a container context.

# Object operations (subject to CapToken permissions)
put_object(cap_token: bytes, object_path: str, data: bytes) -> ObjectReceipt
put_many(cap_token: bytes, items: list[(str, bytes)]) -> list[ObjectReceipt]  # batched; refreshes token mid-stream (§12.2.1)
get_object(cap_token: bytes, object_path: str) -> bytes          # READ_ONLY or READ_WRITE
list_objects(cap_token: bytes, prefix: str = "") -> list[str]    # READ_ONLY or READ_WRITE

# CBFS-level commit (data plane)
commit(cap_token: bytes) -> CommitReceipt

# Cowboy-level commit (control plane, onchain)
commit_manifest(cap_token: bytes, prev_root: bytes32, new_root: bytes32,
                added_shards: [ShardRef], removed_shards: [ShardRef]) -> bool

Two-level commit. Committing a volume is a two-step process that reflects the data-plane / control-plane split (§4.3):

CBFS commit (commit): Publishes the manifest DAG’s new nodes — only the delta’s added_nodes, since unchanged nodes are structurally shared with the prior root (§3.1) — to Relay Nodes, each addressed by manifest_node_shard_id(locator), and publishes PlacementRecords to all assigned Relay Nodes via PutPlacement (§5.3.1). This is a data-plane operation — it makes the data durable and discoverable by other CBFS clients, but does not touch the chain. PlacementRecord publication is best-effort; failures are logged but do not block the commit.
Cowboy commit (commit_manifest): Submits the new manifest DAG root, the prev_root it extends, and the added_shards / removed_shards deltas to the onchain StorageCommitment via the write-relayer (§12.2.1). The chain enforces a parent-root compare-and-swap — it rejects the commit unless prev_root equals the volume’s current manifest_root — so concurrent or stale commits cannot silently lose updates. From the shard deltas it updates the volume’s public shard_root (§5.6) incrementally in O(changed), with no full pass. This anchors the manifest for READ_COMMITTED consistency (§7.5) and PoR (§5.6). Individual ObjectDescriptors stay off-chain in the manifest DAG; for a public volume the plaintext root yields O(log N) object-inclusion proofs, while for a private volume object contents stay opaque and shard retrievability is proven against shard_root instead.

Commit ordering and atomicity. Manifest nodes are content-addressed and immutable (§3.1): a commit publishes new nodes under new locators and never overwrites the nodes of the previous root. Combined with the parent-root compare-and-swap on commit_manifest (§7.3.1), this makes commit naturally crash-safe. The required order is: publish the new nodes, then advance the on-chain root. If the on-chain commit fails, the chain still points at the prior root, whose nodes are untouched and fully intact — the volume simply remains at its previous consistent state, with no “manifest root mismatch” and no brick. Nodes orphaned by a failed (or superseded) commit are unreferenced and garbage-collected (§13); a conforming implementation MUST therefore reclaim now-unreferenced nodes only after the root advances, never before. On every successful commit_manifest, the Storage Manager emits a chain event:

event ManifestCommitted {
    volume_id:      bytes32,
    manifest_root:  bytes32,
    block_height:   u64,
    raw_size_delta: i64,     // signed delta from the previous manifest
    visibility:     u8       // 0 = PRIVATE, 1 = PUBLIC
}

Subscribers (Gateways, indexers) use it for eager cache invalidation; polling manifest_root remains valid as a floor (MANIFEST_POLL_INTERVAL blocks). raw_size_delta lets indexers update aggregates without re-reading the manifest, and visibility lets consumers filter for public volumes without joining against StorageCommitment. Under the hood, put_object encrypts, erasure-codes, and distributes shards, producing an ObjectDescriptor (stored in the manifest) and a PlacementRecord (published at commit time). get_object reads the PlacementRecord to locate shards, fetches K shards, reconstructs, decrypts, and verifies. The caller does not interact with individual shards or Relay Nodes.

12.2.1 Manifest Commit Transport (Write-Relayer)

Control-plane writes — create_volume, commit_manifest, UpdateMountAllowlist, relay registration, escrow deposits — are not issued as direct RPC from the client. They go through the cowboy-ras-write-relayer: the client signs an owner authorization and POSTs it; the relayer builds the signed transaction, pays the gas, and submits it. This keeps the data path chain-free (a client needs only an RPC URL, §16.7) and keeps chain funds off every client machine. Normative requirements:

Routable bind + self-registration. A write-relayer MUST bind a routable address (not loopback) and, given a public endpoint, self-register on the on-chain /ras/write-relayers registry, so clients discover relayers from chain rather than hardcoding an address. A loopback-only, unregistered relayer breaks the “RPC-URL only” goal — it forces an out-of-band tunnel.
Canonical owner-authorization signing domain. The owner authorization MUST canonicalize addresses as EIP-55 in both the payload hash and the signing bytes; a mismatch between the two is a class of signature-rejection bug and is non-conforming. The owner-action payload is signed over an explicit, language-neutral big-endian byte layout (not a Rust struct encoding), so a non-Rust or alternate-chain backend can produce an identical signing domain. (This is the target encoding — registry-proto/src/canonical.rs — and is not yet wired into the signing path, which still bincode-serializes; see §17.3. Do not sign owner actions from a non-Rust client until the canonical path ships.)
Per-relayer gas key. Each relayer instance MUST use its own gas key; a key shared across instances races on nonce selection.

Long-write CapToken refresh. Batched writes (put_many) and other long operations MUST refresh the CapToken mid-stream. Minting a single token at open and never refreshing causes loads that outlive the token TTL to fail 403 mid-batch; the writer MUST re-mint (the same set_auth_token path the FUSE mount uses, §12.1.3) before expiry and continue without aborting the batch.

12.3 Account Owner API (via Storage Manager Actor)

# Volume lifecycle
create_volume(
    volume_name: str,
    max_size_bytes: int = 0,
    erasure_k: int = 4,                       # data shards (default 4)
    erasure_m: int = 2,                       # parity shards (default 2)
    visibility: str = "PRIVATE",              # "PRIVATE" (encrypted, default) or "PUBLIC" (unencrypted, open reads)
    wrapping_key_policy: str = "owner-key",   # access class (§9.1): "owner-key" | "committee-only" | "tee-gated"
    initial_escrow: int = 0,                  # CBY prepaid toward storage rent (§10.3.1)
) -> bytes32  # returns volume_id; writes StorageCommitment + name index atomically (§11.3)

delete_volume(volume_name: str) -> bool           # soft-delete: sets status=DELETED, starts GC timer
list_volumes() -> list[VolumeInfo]
transfer_volume(volume_name: str, new_owner: address) -> bool  # transfer ownership to another account

# Lookup & cross-owner access
get_volume_id_by_name(owner: address, volume_name: str) -> bytes32   # (owner, name) → volume_id (§11.3)
update_mount_allowlist(volume_name: str, grant: MountGrant, remove: bool = False) -> bool  # owner-signed (§7.7)

# Object management
delete_object(volume_name: str, object_path: str) -> bool
list_objects(volume_name: str, prefix: str = "") -> list[ObjectInfo]
get_volume_info(volume_name: str) -> VolumeInfo

# Billing
get_storage_usage() -> AccountStorageSummary
reserve_storage_balance(amount: uint256) -> bool

12.4 CIP-2 Task Definition Extension

The OffchainTask struct from CIP-2 is extended with an optional volume_attachments field:

struct OffchainTask:
    ... (existing CIP-2 fields) ...
    volume_attachments: list[VolumeAttachment]  # NEW: optional

Where:

struct VolumeAttachment:
    volume_name:      string
    access_mode:      uint8          # 0 = READ_ONLY, 1 = WRITE_ONLY, 2 = READ_WRITE
    path_prefix:      string         # "" for full volume access (canonicalized with trailing /)
    max_bytes:        uint64         # byte quota for this job
    mount:            bool           # true = FUSE mount at /mnt/volumes/{name}, false = Object API only
    sync_interval:    uint32         # seconds between sync cycles (default 5, mount only)
    # no read_consistency field — all reads are READ_COMMITTED (see §7.5)

12.3.1 Volume Creation and Lookup Errors

Volume-create and name-lookup failures MUST be distinct, named errors rather than a single opaque E1900 "invalid data" (which today conflates a duplicate (owner, name) with an escrow-path failure):

E_VOLUME_NAME_DUPLICATE — (owner, volume_name) already present in the name index (§11.3).
E_VOLUME_ESCROW_INSUFFICIENT — initial_escrow (or account balance) below MIN_STORAGE_BALANCE (§10.3.1).
E_VOLUME_PARAM_INVALID — erasure k/m out of bounds, unknown visibility or wrapping_key_policy, or a volume_name violating the [a-zA-Z0-9_\-.] / 64-byte rule (§6.1).
E_VOLUME_NOT_FOUND_FOR_OWNER — a name lookup missed the index for this owner-domain (§11.3); this MUST NOT be reported as a bare 404 that hides whether the caller queried under the wrong owner.

Each maps to a stable code/string so the CLI and SDK can branch on it rather than string-matching a generic message.

13. Garbage Collection

13.1 Triggers

Volume data is garbage-collected under two conditions:

Explicit deletion: Account owner calls delete_volume() (soft-delete + grace window, §13.2) or delete_object().
Escrow/balance exhaustion: after escrow is depleted and the account balance stays below MIN_STORAGE_BALANCE through STORAGE_GRACE_EPOCHS (§10.3). There is no separate expiry_height — volume lifetime is governed by escrow/rent, not a fixed height.

13.2 Process

13.3 Deletion Semantics

On delete_object: All shards of the object are marked for removal on their respective Relay Nodes. The onchain manifest is updated. Relay Nodes garbage-collect shard data asynchronously, but the object is immediately inaccessible to CapToken holders.
On delete_volume: The volume enters the DELETED state (soft-delete). All active CapTokens are revoked. New CapTokens cannot be issued. Storage fees continue accruing during a grace window of VOLUME_DELETE_GRACE_EPOCHS (default: same as STORAGE_GRACE_EPOCHS). During this window, the account owner may call undelete_volume() to restore the volume to ACTIVE status. After the grace window, the volume transitions to GARBAGE_COLLECTING and all shards are purged. The onchain Storage Commitment is removed and storage fees cease.
Garbage collection is irreversible. Once a volume enters GARBAGE_COLLECTING, data cannot be recovered.

13.4 Ownership Transfer

The account owner may transfer a volume to another Cowboy account via transfer_volume(volume_name, new_owner). Transfer semantics:

All active CapTokens are revoked (they were issued under the old owner’s authority).
The StorageCommitment.owner field is updated atomically.
The new owner assumes billing responsibility starting from the next epoch.
For private volumes, the owner wrap is re-encrypted to the new owner’s wallet-derived key as part of the transfer transaction (a committee wrap, if present, is unaffected — the committee can still serve the new owner’s runners). This requires the old owner to unwrap and re-wrap the DEK, so transfer is an interactive operation requiring the old owner’s cooperation.
For public volumes, no key re-wrapping is needed (no DEK exists).
Transfer of a volume in DELETED or GARBAGE_COLLECTING status is rejected.

13.5 Gateway HTTP Serving by Status

A public-volume Gateway (CIP-15) maps the volume’s StorageCommitment.status to HTTP serving behavior. There is no separate DELINQUENT status — the existing lifecycle states are sufficient:

`status`	Gateway behavior	Header
`ACTIVE`	Serve normally	(none)
`GRACE_PERIOD`	Serve	`X-Cowboy-Storage-Status: grace` (advisory)
`DELETED`	`503 Service Unavailable`	`X-Cowboy-Error: VOLUME_DELETED`
`GARBAGE_COLLECTING`	`410 Gone`	`X-Cowboy-Error: VOLUME_GC`

GRACE_PERIOD keeps serving deliberately — the owner may top up at any moment, and an abrupt 503 is a worse experience than serving with an advisory header. DELETED reflects owner-intent removal (recoverable during the deletion grace window but not served); GARBAGE_COLLECTING is irreversible, so 410 Gone is the correct permanent-removal semantic.

14. Parameters

Parameter	Value	Notes
Volume
`VOLUME_CREATION_FEE`	1,000 CBY	Covers onchain commitment
`MAX_VOLUMES_PER_ACCOUNT`	256	Abuse protection
`MAX_OBJECTS_PER_VOLUME`	1,000,000	Manifest size bound
`MAX_OBJECT_SIZE`	1 GiB	Per-object limit
`MAX_VOLUME_SIZE`	100 GiB	Per-volume limit
`MAX_VOLUME_NAME_LENGTH`	64 bytes
`MAX_OBJECT_PATH_LENGTH`	512 bytes
`VOLUME_DELETE_GRACE_EPOCHS`	86,400	~24 hours soft-delete recovery window (at 1s blocks per WP §6.1)
Erasure Coding
`DEFAULT_ERASURE_K`	4	Data shards
`DEFAULT_ERASURE_M`	2	Parity shards
`MAX_ERASURE_K`	16	Upper bound for custom K
`MAX_ERASURE_M`	8	Upper bound for custom M
Billing
`BASE_ATTACHMENT_FEE`	100 CBY	Per volume per job
`STORAGE_FEE_PER_BYTE_PER_EPOCH`	10 nano-CBY (see CIP-31 §1)	Tier-0 tunable; canonical value in CIP-31
`TRANSFER_FEE_PER_BYTE`	1 nano-CBY (see CIP-31 §2)	Tier-0 tunable; canonical value in CIP-31
`STORAGE_FEE_BURN_BPS`	1000 (10%) — see CIP-31 §4	Tier-0; three-way split invariant in CIP-31 §4
`STORAGE_FEE_CHALLENGE_POOL_BPS`	200 (2%) — see CIP-31 §4	Tier-0; replaces the legacy `POR_CHALLENGE_FEE_SHARE` 2% row
`STORAGE_FEE_RELAY_BPS`	8800 (88%) — see CIP-31 §4	Tier-0; pro-rata distributed per CIP-31 §5
`MIN_STORAGE_BALANCE`	1 × per-epoch fees at current rate (see CIP-31 §3)	formula-derived; Tier-0 multiplier
`STORAGE_GRACE_EPOCHS`	86,400	~24 hours at 1s blocks (WP §6.1)
Relay Nodes
`MIN_RELAY_STAKE`	5,000 CBY (see CIP-31 §6)	Tier-0 tunable
`MAX_RELAY_HEALTH`	100	Blocks; reset on heartbeat
`MIN_HEALTH_FOR_ASSIGNMENT`	50	Minimum health to receive new shards
`RELAY_UNSTAKE_DELAY`	86,400	~24 hours cooldown (at 1s blocks)
`REPAIR_CHECK_INTERVAL`	3,600	Blocks between proactive repair checks (~1 hour at 1s blocks)
`ORPHAN_SHARD_TTL`	86,400	Blocks before unreferenced shards are garbage collected (~24h at 1s blocks)
Proof of Retrievability
`POR_CHALLENGE_INTERVAL`	7,200	Blocks between challenge rounds (~2 hours at 1s blocks per WP §6.1)
`POR_RESPONSE_WINDOW`	600	Blocks to respond (~10 minutes at 1s blocks)
`POR_MISS_PENALTY`	50 CBY (see CIP-31 §8)	Tier-0 tunable
`POR_FRAUD_PENALTY`	500 CBY (see CIP-31 §8)	Tier-0 tunable
`RELAY_EVICTION_PENALTY`	2,000 CBY (see CIP-31 §8)	Tier-0 tunable
`RELAY_CHALLENGE_BOND`	10 CBY (see CIP-31 §7)	Tier-0; new field — required to call `por_challenge`
`POR_CHALLENGE_FEE`	1 CBY (see CIP-31 §7)	Tier-0; per-challenge fee retained by `0x0B` regardless of outcome
`CHALLENGER_BOUNTY`	5 CBY (see CIP-31 §7)	Tier-0; paid from challenge pool on valid fraud/miss detection
`MAX_SHARD_AGE_FOR_WEIGHTING`	90 epochs (see CIP-31 §5)	Tier-0; caps pro-rata weight to prevent permanent first-mover advantage
Filesystem Mount
`DEFAULT_SYNC_INTERVAL`	5 seconds	Background push/pull frequency
`MIN_SYNC_INTERVAL`	1 second	Minimum allowed sync interval
`MAX_LOCAL_CACHE_SIZE`	10 GiB	Per-mount tmpfs limit

These parameters are governance-tunable and may be adjusted via governance proposals.

15. Security Considerations

15.1 CapToken Forgery

A CapToken’s authority comes from Storage Manager actor state, not an in-token signature (§7.1): the issuing system instruction writes the authoritative token into chain state, and a presented token is honored only if it matches that stored copy byte-for-byte. Forging a token therefore requires writing Storage Manager state, which is equivalent to compromising consensus. The nonce field is monotonically increasing per (owner, volume), preventing replay of old tokens, and valid_until is fixed at issuance in the stored copy — a Runner cannot extend a token’s lifetime.

15.2 Runner Compromise

WRITE_ONLY token. A compromised Runner can write garbage data, consuming the account’s byte quota. This includes writing shards outside the CapToken’s path prefix — Relay Nodes cannot detect this because shard IDs are opaque (§7.3.1). Mitigations:

Byte quotas limit the damage per job.
Prefix-confined commits: A WRITE_ONLY commit may change only manifest nodes under the committer’s prefix — enforced onchain for public volumes (parent-root CAS + changed-node prefix check) and via staged-commit + DEK-holder finalize for private volumes (§7.3.1). Out-of-prefix commits are rejected (public) or never finalized (private), not merely detected after the fact.
Orphan shard GC: Shards not referenced by any committed manifest are garbage collected after ORPHAN_SHARD_TTL.
No delete access: A compromised Runner cannot destroy existing data.

READ_ONLY token. A compromised Runner can exfiltrate data within its scope. Mitigations are limited to TEE attestation and path prefix scoping. READ_ONLY is still preferable to READ_WRITE when the Runner only needs to consume data, because it eliminates the write-garbage attack vector entirely. READ_WRITE token. Combines both risks: data exfiltration and garbage writes. Mitigations:

TEE attestation: For sensitive workloads, require TEE-attested Runners (CIP-2 tee_required=true).
Minimal scope: Use path_prefix to restrict access to only the necessary sub-path.
Account owner discretion: READ_WRITE is an explicit opt-in; the account owner accepts the elevated trust.

15.3 Relay Node Compromise

Relay Nodes hold opaque ciphertext shards. Without the volume DEK, a compromised Relay Node cannot read data. The specific attack vectors and mitigations:

Attack	Mitigation
Shard deletion (data loss)	Erasure coding: any K of K+M shards reconstruct the object. Attacker must compromise M+1 nodes holding shards of the same object.
Shard corruption (integrity)	Content hashing: clients verify `shard_hash` on every read. Corrupt shards are detected and the client fetches from alternative nodes.
Data withholding	Health monitoring + PoR challenges (§5.6). Relay Nodes that fail to serve shards are detected and replaced.
Manifest root spoofing	Only holders of valid CapTokens with write access can submit `commit_manifest`. Readers verify manifest root against onchain `StorageCommitment`.
Placement record leakage	PlacementRecords reveal which nodes hold which shards. All placement RPCs are auth-gated (§7.1.1). For public volumes, placement reads follow the open-access model (§7.6.3).

15.4 Denial of Service

Vector	Mitigation
Volume spam	`VOLUME_CREATION_FEE` (1,000 CBY) + `MAX_VOLUMES_PER_ACCOUNT` (256)
Object spam	`MAX_OBJECTS_PER_VOLUME` (1M) + per-CapToken byte quotas
Relay Sybil	`MIN_RELAY_STAKE` prevents cheap node registration
Billing evasion	Grace period → garbage collection ensures unpaid storage is reclaimed
CapToken exhaustion	Tokens are time-bounded and nonce-gated; expired tokens are discarded

15.5 Key Management

A Volume DEK is wrapped per the volume’s access class (§9.1): to the owner’s wallet-derived key, to the CBSS committee, or both. Losing the owner’s wallet key forfeits the owner-wrap path, but a committee-wrapped volume remains readable through an authorized runner. The wraps live on-chain in the StorageCommitment, so the DEK itself is never at risk of loss — only the ability to unwrap a given wrap.
Dispatcher trust boundary. The Dispatcher is never on the key path: for private volumes the Runner completes the CBSS SealRequest itself (§9.2) and the Dispatcher only constructs and authorizes it, so the Dispatcher never sees a plaintext DEK. A compromised Dispatcher could mis-authorize a seal request — bounded by the same consensus-level trust already required to issue CapTokens — but cannot exfiltrate volume DEKs, since it never holds them.
Key rotation — re-wrapping the DEK under a new key without re-encrypting data — is future work; today a compromised owner key requires creating new volumes and re-encrypting. During ownership transfer (§13.4) the DEK is re-wrapped to the new owner: the one case where a wrap changes without re-encrypting all data.

16. Implementation Notes

16.1 Canonical Implementation Stack

The canonical implementation of this CIP is the cbfs workspace in this repository. The major protocol surfaces in this document map directly onto the following CBFS components:

CBFS component	Responsibility in this CIP
`cbfs-node`	Relay Node daemon, shard serving, placement sync, repair, GC, health reporting
`cbfs-sdk`	`create/open/put/get/commit`, manifest fetch/verify, placement fetch/publish
`cbfs-fuse`	FUSE mount, inode/cache layer, sync daemon, POSIX interface
`cbfs-hooks::AuthProvider`	CapToken validation and `PUBLIC` visibility auth decisions
`cbfs-hooks::AuthoritativeStore`	Canonical `manifest_root` commit/read boundary
`cbfs-hooks::ManifestRegistry`	Live-shard registration for GC and repair bookkeeping
`cbfs-hooks::MeteringSink`	Storage usage reporting into Cowboy billing

Underlying Rust crates such as reed-solomon-erasure, aes-gcm, blake3, fuser, and QUIC transport libraries remain implementation details of the canonical CBFS stack rather than separate pluggable protocol choices.

16.2 FUSE Mount Implementation

The FUSE mount layer translates POSIX filesystem operations into CIP-9 object operations. The implementation consists of three components:

FUSE daemon (fuser crate): Implements the Filesystem trait, handling read, write, readdir, getattr, etc. Delegates to the local cache layer.
Local cache (tmpfs-backed): An in-memory filesystem that serves as the working copy. All reads/writes hit the cache first. The cache is populated lazily on first access (fetch from Relay Nodes) and eagerly for files modified locally.
Sync daemon (background task): Runs a push/pull loop at the configured sync_interval. Uses notify for detecting local changes and polls Relay Nodes for remote changes. Handles encryption, erasure coding, and shard distribution.

┌─────────────────────────────────────────────────────────┐
│ Container / Runner process                              │
│                                                         │
│  Model (Claude, Kimi-K2, etc.)                          │
│    │                                                    │
│    ├─ Read("/mnt/volumes/mem/state.json")               │
│    ├─ Write("/mnt/volumes/mem/state.json", data)        │
│    └─ Bash("ls /mnt/volumes/mem/")                      │
│         │                                               │
│         ▼                                               │
│  ┌─────────────┐     ┌──────────────┐                   │
│  │ FUSE daemon │◄───►│ Local cache   │                  │
│  │ (fuser)     │     │ (tmpfs)       │                  │
│  └─────────────┘     └──────┬───────┘                   │
│                             │                           │
│                      ┌──────▼───────┐                   │
│                      │ Sync daemon  │                   │
│                      │  push/pull   │                   │
│                      └──────┬───────┘                   │
│                             │                           │
└─────────────────────────────┼───────────────────────────┘
                              │ encrypt / erasure code / QUIC
                              ▼
                     ┌─────────────────┐
                     │   Relay Nodes   │
                     └─────────────────┘

16.3 Object API Client Library

For direct Object API usage (no FUSE mount), the Runner SDK exposes a high-level interface that abstracts the storage internals:

// Programmatic usage (non-agentic runners):
let data = volume.get("state/memory.cbor").await?;
volume.put("state/memory.cbor", &updated_data).await?;

// Under the hood:
// get: fetch PlacementRecord → identify assigned nodes → fetch K shards
//      → Reed-Solomon reconstruct (truncate to ciphertext_size)
//      → verify ciphertext hash → AES-GCM decrypt → verify content hash
// put: AES-GCM encrypt → Reed-Solomon encode → distribute K+M shards
//      → produce ObjectDescriptor (manifest) + PlacementRecord (published at commit)

16.4 Relay Node Implementation

A Relay Node (implemented as a cbfs-node daemon) runs the following subsystems:

Blob store (cbfs-store): A sled-backed key-value store mapping (shard_id, shard_index) → shard_bytes. The shard_id is an opaque BLAKE3 hash (see §5.3) — Relay Nodes never see object paths, only opaque identifiers. This ensures the privacy guarantee in §9.3.
Placement store (cbfs-placement): A sled-backed store mapping shard_id → PlacementRecord. Stores the mutable shard-to-node assignments separately from shard data, enabling repair workers to read placement information without accessing the encrypted manifest.
RPC server: Accepts shard operations (PUT_SHARD, GET_SHARD, PROVE_SHARD) and placement operations (PutPlacement, GetPlacement, ReplicatePlacement). All operations are authenticated by the AuthProvider (§7.1.1). For public volumes (visibility = PUBLIC), GET_SHARD and GetPlacement are served without CapToken verification (§7.6.3); write operations still require a CapToken. Relay Nodes do NOT expose any listing operation — object listing is performed client-side by reading the manifest. Requests are keyed by shard_id, not object path.
Repair loop (repair.rs): Periodically runs the two-phase autonomous repair cycle (§5.5). Uses a mutable peer list (Arc<RwLock<Vec<NodeInfo>>>) bootstrapped from seed peers in the node config and updated as new nodes are discovered.
Placement sync (placement_sync.rs): Replicates PlacementRecords to peer nodes to ensure all assigned nodes have a consistent view of shard assignments.
Heartbeat loop: Periodically calls heartbeat() on the onchain Relay Registry.
Spot-check responder: When challenged, returns a random chunk of a specified shard for integrity verification.

The operational requirements for running a Relay Node are modest: stable uptime, network connectivity, and disk space. No GPU, no TEE, no high-compute requirements.

16.5 Performance Expectations

Object API (direct):

Operation	Expected Latency	Notes
`create_volume`	1 block (~1s)	onchain transaction
`put_object` (1 MiB)	~200ms	Encrypt + erasure code + distribute 6 shards in parallel
`put_object` (100 MiB)	2-8s	Dominated by network upload of ~150 MiB total shards
`get_object` (1 MiB)	~200ms	Fetch 4 shards in parallel + reconstruct + decrypt
`get_object` (100 MiB)	2-8s	Dominated by network download of ~100 MiB from 4 shards
`commit_manifest`	1 block (~1s)	onchain transaction
`delete_object`	1 block (~1s)	onchain manifest update
Volume attachment	~100-500ms	Key delivery + manifest fetch from Relay Nodes

Filesystem mount (FUSE):

Operation	Expected Latency	Notes
Read (cached file)	~microseconds	From local tmpfs, no network
Read (uncached, 1 MiB)	~200ms-1s	Fetch from Relay Nodes on demand, then cached
Write (any size)	~microseconds	To local tmpfs; async push to Relay Nodes
`ls` (cached directory)	~microseconds	From local manifest
`ls` (uncached directory)	~100ms	Manifest fetch from Relay Nodes
`grep` across cached files	Native speed	All local after first access
Durability window	≤ sync_interval	Data not yet pushed is lost on crash

16.6 Runner ↔ Validator Wire Compatibility

Runners submit transactions the validator must decode: heartbeats, job results, manifest commitments. A runner binary built against an older node can emit an encoding the newer validator cannot decode (failed to decode transaction submission … CBOR decode); the runner then polls forever and completes nothing, and the failure is invisible on the runner side. CIP-9 requires this seam be legible:

A runner’s cowboy-ras protocol types MUST be pinned compatibly with the deployed validator. A runner outside the validator’s supported wire range is rejected with a clear, typed reason (“unsupported transaction wire version — upgrade runner”) surfaced back to the submitter, not a silent drop or a generic decode error.
Wire-format changes to runner↔chain transactions MUST be versioned (a discriminant the validator checks), and the validator SHOULD accept a defined compatibility window rather than only the exact current version, so a rolling upgrade does not strand every runner at once.
Decode-reject reasons MUST be observable to the runner operator — a runner that cannot complete jobs because of wire skew has to be able to find out why.

16.7 Client Bootstrap and Discovery

A conforming client needs only the node RPC URL. Everything else is discovered from chain; flags and environment variables are overrides, not defaults:

chain_id / network from chain-info, plus consensus parameters and the basefee schedule (GET /basefee, suggested_max_fee_*) — not hand-set flags. A wrong or missing chain_id/fee fails certificate auth or is silently dropped by the mempool (fee_below_basefee), so guessing is worse than discovering.
Storage relays from the relay registry (§5.2), write-relayers from /ras/write-relayers (§12.2.1), and CBSS DEK delivery from chain (§9.2) — the seal committee from the CBSS committee registry (CIP-24) by epoch, and sealed ciphertext as a CBFS object from the relays already listed in the registry. None of it is client-configured.
The RPC URL MUST be resolved once at startup and threaded everywhere. A subcommand MUST NOT silently fall back to a hardcoded default (e.g., 127.0.0.1:4000) or ignore an explicit --indexer-url; a missing or unreachable RPC is a clear error, not a silent localhost attempt.

17. Scope and Future Work

17.1 In Scope (This CIP)

Private and public account-scoped volumes.
READ_ONLY, WRITE_ONLY, and READ_WRITE CapToken access modes; PUBLIC volume visibility mode (§7.6).
Concurrent CapTokens on the same volume (agent swarm pattern).
Two client interfaces: FUSE filesystem mount (for agentic/LLM workloads) and Object API (for programmatic workloads).
Hybrid sync strategy for FUSE mounts (local cache + background push/pull).
Relay Nodes as a dedicated storage layer with staking and incentives.
Reed-Solomon erasure coding (default 4/6) for durability.
AES-256-GCM encryption with HKDF-derived keys.
BLAKE3 content hashing at object, ciphertext, and shard levels.
Path-based addressing with Merkle manifest for integrity proofs.
Per-epoch billing with grace period and garbage collection.
Lazy and proactive shard repair.
Private dual-wrapped volumes with first-class access classes (owner-key, committee-only; tee-gated reserved), and owner-curated cross-owner mount allowlists (§7.7).
Control-plane writes via the cowboy-ras-write-relayer, with client discovery of chain parameters, relays, write-relayers, and CBSS endpoints from chain (§16.7).

17.2 Explicitly Out of Scope

Content-addressed retrieval: RAS uses path-based addressing only. CID-based retrieval (IPFS-style) may be layered on top in the future.
Storage marketplace: A competitive market where Relay Nodes bid on storage deals (Filecoin-style) is a future extension; pricing is protocol-set.
Alternative storage backends: This CIP standardizes on CBFS as the canonical storage layer. Supporting Filecoin, Arweave, or other backends under the same CIP-9 surface is future work and would require a follow-on standard.
READ_UNCOMMITTED consistency mode: Pre-commit read visibility for real-time agent swarms. Requires a shard discovery mechanism (pubsub or uncommitted manifest fragments) not defined in this CIP. See §7.5.
Key rotation: Rotating volume encryption keys without re-encrypting all data.
Runner-initiated task dispatch: Allowing a coordinator Runner to dynamically spawn sub-agent tasks. Currently, all tasks must be dispatched by onchain Actors. This is a CIP-2 extension.
Full container runtime spec: Container image management, container registries, resource limits, GPU passthrough, and network policies. CIP-9 defines the CBFS-backed storage primitive (including the FUSE mount and object API); a separate CIP will define the full container runtime that consumes it.

Appendix A: Worked Examples

These examples show the two interaction patterns: filesystem mounts for agentic workloads (LLMs doing tool calling) and direct object writes for lightweight programmatic workloads. In all examples, the Runner runtime handles encryption, erasure coding, and Relay Node communication transparently.

A.1 AI Agent with Persistent Memory (Filesystem Mount)

An autonomous trading agent runs as Claude with tool calling. The model reads its prior state, performs analysis, and writes updated state — all using its standard filesystem tools. Actor dispatches the job:

submit_task(
    task_definition=encode_task({
        "model": "claude-sonnet",
        "system": "You are a trading analyst. Your memory and portfolio are in /mnt/volumes/agent-memory/state/.",
        "prompt": "Analyze today's market conditions. Update your memory and portfolio.",
    }),
    volume_attachments=[
        VolumeAttachment(
            volume_name="agent-memory",
            access_mode=READ_WRITE,
            mount=True,                    # FUSE mount at /mnt/volumes/agent-memory/
            max_bytes=500 * 1024 * 1024,
        ),
    ],
    num_runners=1,
    timeout_blocks=500,
    proof_type_requested=TEE,
    ...
)

What Claude does (tool calling inside the Runner): The model uses its existing tools. It does not know about CapTokens, shards, or Relay Nodes.

── Claude's tool calls during execution ──────────────────────────

1. Read("/mnt/volumes/agent-memory/state/memory.json")
   → Returns the agent's memory from yesterday's run
   → (Under the hood: FUSE → fetch shards from Relay Nodes → reconstruct → decrypt)

2. Read("/mnt/volumes/agent-memory/state/portfolio.json")
   → Returns the current portfolio state

3. Bash("ls /mnt/volumes/agent-memory/logs/")
   → 2026-03-01/  2026-03-02/  2026-03-03/

4. Bash("grep -r 'NVDA' /mnt/volumes/agent-memory/logs/2026-03-03/")
   → Shows yesterday's NVDA-related log entries

5. [Claude performs analysis, makes decisions]

6. Write("/mnt/volumes/agent-memory/state/memory.json", updated_memory)
   → Writes to local tmpfs instantly
   → (Under the hood: sync daemon pushes to Relay Nodes within 5 seconds)

7. Write("/mnt/volumes/agent-memory/state/portfolio.json", updated_portfolio)

8. Write("/mnt/volumes/agent-memory/logs/2026-03-04/analysis.json", todays_analysis)

── Container shuts down ──────────────────────────────────────────
   → Final sync pushes any remaining dirty files
   → Manifest committed onchain
   → tmpfs deleted. Data persists on Relay Nodes.

Next time this job runs (tomorrow), Claude gets the same volume mounted and continues from where it left off. From the model’s perspective, it’s just reading and writing files.

A.2 Distributed Scraping with Map-Reduce (Direct Object API)

Five scraper runners write results to a shared volume using direct object writes (no mount needed). A collator runner later mounts the volume as a filesystem to process everything. Actor dispatches scraper jobs (map phase):

for i, site in enumerate(sites):
    submit_task(
        task_definition=encode_task({"action": "scrape", "url": site}),
        volume_attachments=[
            VolumeAttachment(
                volume_name="scrape-results",
                access_mode=WRITE_ONLY,
                path_prefix=f"scraper-{i}/",
                max_bytes=500_000_000,
                mount=False,                   # no FUSE mount, direct object writes
            ),
        ],
        ...
    )

Scraper runner execution (direct API, no LLM):

# Simple Python script, not an LLM. Uses the Object API directly.
for page in crawl(site_url):
    put_object(cap_token, f"scraper-{my_id}/{page.slug}.json", json.dumps({
        "url": page.url,
        "title": page.title,
        "content": page.text,
        "links": page.links,
    }))
commit_manifest(cap_token, manifest_root)

Actor dispatches collator job after all scrapers complete (reduce phase):

submit_task(
    task_definition=encode_task({
        "model": "kimi-k2",
        "system": "You have scraped web data in /mnt/volumes/scrape-results/. Analyze and collate.",
        "prompt": "Find all mentions of product launches across the scraped sites. Write a summary.",
    }),
    volume_attachments=[
        VolumeAttachment(
            volume_name="scrape-results",
            access_mode=READ_WRITE,
            mount=True,                        # FUSE mount for filesystem access
            max_bytes=100_000_000,
        ),
    ],
    ...
)

What the LLM does in the collator (tool calling):

1. Bash("find /mnt/volumes/scrape-results -name '*.json' | wc -l")
   → 47

2. Bash("ls /mnt/volumes/scrape-results/")
   → scraper-0/  scraper-1/  scraper-2/  scraper-3/  scraper-4/

3. Bash("cat /mnt/volumes/scrape-results/scraper-0/about-page.json | jq '.title'")
   → "About Us - Acme Corp"

4. Bash("grep -rl 'product launch' /mnt/volumes/scrape-results/")
   → scraper-0/news.json
   → scraper-2/blog-post-3.json
   → scraper-4/press-release.json

5. Read("/mnt/volumes/scrape-results/scraper-0/news.json")
   → [full content]

6. [... reads relevant files, analyzes ...]

7. Write("/mnt/volumes/scrape-results/collated/summary.md", summary)
8. Write("/mnt/volumes/scrape-results/collated/product-launches.json", structured_data)

The scrapers used the lightweight Object API (no mount, no filesystem overhead). The collator used the FUSE mount so the LLM could explore the data with standard unix tools. Same volume, two interaction patterns.

A.3 Agent Swarm with Batch Coordination (Filesystem Mount + Concurrent Writers)

A coordinator agent and five sub-agents share a volume. Sub-agents write reports (WRITE_ONLY, prefix-scoped). The coordinator reads reports after each sub-agent commits (READ_WRITE, READ_COMMITTED) using its filesystem tools. Because reads are READ_COMMITTED (§7.5), the coordinator does not see individual files as they are written — it sees a sub-agent’s entire output appear as a batch when that sub-agent commits its manifest. Actor dispatches all jobs:

# Coordinator: READ_WRITE mount, sees the full volume
submit_task(
    task_definition=encode_task({
        "model": "claude-sonnet",
        "system": "You are coordinating 5 research agents. Their reports will appear in /mnt/volumes/swarm/agent-*/. Poll for new reports and synthesize findings. Reports appear in batches as each agent completes and commits.",
    }),
    volume_attachments=[
        VolumeAttachment(volume_name="swarm", access_mode=READ_WRITE, mount=True,
                         sync_interval=5, max_bytes=100_000_000),
    ],
    timeout_blocks=2000,
    ...
)

# Sub-agents: WRITE_ONLY mount, scoped to their prefix
for i in range(5):
    submit_task(
        task_definition=encode_task({
            "model": "claude-haiku",
            "system": f"You are research agent {i}. Write your findings to /mnt/volumes/swarm/.",
            "prompt": f"Research: {topics[i]}",
        }),
        volume_attachments=[
            VolumeAttachment(volume_name="swarm", access_mode=WRITE_ONLY, mount=True,
                             path_prefix=f"agent-{i}/", max_bytes=50_000_000),
        ],
        timeout_blocks=1000,
        ...
    )

Sub-agent (Claude Haiku) tool calls:

1. [Performs research using web tools]

2. Write("/mnt/volumes/swarm/report.md", research_findings)
   → (Visible path: agent-2/report.md due to prefix scoping)

3. Write("/mnt/volumes/swarm/sources.json", source_list)
   → (Visible path: agent-2/sources.json)

── Container shuts down ──
── Sync daemon pushes all shards to Relay Nodes ──
── Runtime calls commit_manifest(), updating onchain manifest_root ──
── Agent-2's files are now committed and visible to other readers ──

Coordinator (Claude Sonnet) tool calls:

1. Bash("ls /mnt/volumes/swarm/")
   → (empty — no sub-agents have committed yet)

   [Waits... sync daemon re-fetches manifest every 5 seconds,
    verifying against onchain manifest_root (§7.5)]

   [Agent-0 finishes and commits its manifest]

2. Bash("ls /mnt/volumes/swarm/")
   → agent-0/
   → (Agent-0 committed — all its files appear at once)

3. Read("/mnt/volumes/swarm/agent-0/report.md")
   → [agent-0's findings]

   [Agent-1 and agent-3 finish and commit around the same time]

4. Bash("ls /mnt/volumes/swarm/")
   → agent-0/  agent-1/  agent-3/
   → (Two more agents committed since last check)

5. Read("/mnt/volumes/swarm/agent-1/report.md")
   → [agent-1's findings]

6. Read("/mnt/volumes/swarm/agent-3/report.md")
   → [agent-3's findings]

   [... continues polling until all 5 agents have committed ...]

7. Write("/mnt/volumes/swarm/synthesis/final-report.md", synthesized_findings)

The coordinator sees sub-agent files appear in batches as each sub-agent commits its manifest. The sync daemon’s pull cycle re-fetches the committed manifest and verifies it against the onchain root (§7.5) before materializing new files locally. No custom polling API — just ls and Read.

A.4 Multi-Stage Pipeline with Handoff

A data processing pipeline: Runner A preprocesses data (direct writes), Runner B runs ML inference (filesystem mount). Stage 1 (preprocess, direct API):

submit_task(
    task_definition=encode_task({"stage": "preprocess", "source": "https://data.example.com/feed"}),
    volume_attachments=[
        VolumeAttachment(volume_name="pipeline", access_mode=WRITE_ONLY,
                         path_prefix="stage1/", mount=False, max_bytes=2_000_000_000),
    ],
    ...
)

Stage 2 (inference, filesystem mount — after Stage 1 callback):

submit_task(
    task_definition=encode_task({
        "model": "kimi-k2",
        "system": "Input data is in /mnt/volumes/pipeline/stage1/. Run inference and write results to /mnt/volumes/pipeline/stage2/.",
    }),
    volume_attachments=[
        VolumeAttachment(volume_name="pipeline", access_mode=READ_WRITE,
                         mount=True, max_bytes=1_000_000_000),
    ],
    ...
)

Appendix B: Comparison with Existing Systems

Dimension	RAS (This CIP)	Filecoin	IPFS	Storj	Sia
Addressing	Path + content hash	Content (CID)	Content (CID)	Path (S3)	Path (S3)
Access Control	CapToken (UCAN-like)	None built-in	None built-in	Macaroon caveats	Encryption-only
Deletion	Immediate by owner	Impossible during deal	Local only	S3 DELETE	Contract expiry
Privacy	AES-256-GCM (client-side, mandatory)	Optional (client-side)	Optional (client-side)	AES-256-GCM (client-side, default)	ChaCha20 (default)
Durability	Reed-Solomon 4/6 (1.5x)	PoSt + sector sealing	None (without pinning)	Reed-Solomon 29/80 (2.7x)	Reed-Solomon 10/30 (3x)
Billing	Per-byte per-epoch (onchain, CBY)	Per-epoch per-sector (onchain, FIL)	Free + pinning services	Tiered monthly (USD/STORJ)	Per-epoch (onchain, SC)
Provisioning	Instant	Hours (sealing)	Instant	Instant	Minutes (contracts)
Trust Model	Permissionless (staked Relay Nodes)	Permissionless (staked miners)	Permissionless (no incentive)	Centralized (Satellites)	Permissionless (staked hosts)
Integration	Native (Cowboy accounts, Runners, CBY)	Separate network	Separate protocol	Separate service	Separate network

Appendix C: CapToken Wire Format

CapToken (variable length)
┌──────────────────────────────────────────────────┐
│ version:         u8          (1 byte)            │
│ volume_id:       bytes32     (32 bytes)          │
│ access_mode:     u8          (1 byte) [0=RO,1=WO,2=RW] │
│ path_prefix_len: u16         (2 bytes)           │
│ path_prefix:     bytes       (variable)          │
│ max_bytes:       u64         (8 bytes)           │
│ valid_from:      u64         (8 bytes)           │
│ valid_until:     u64         (8 bytes)           │
│ runner_address:  bytes32     (32 bytes)          │
│ nonce:           u64         (8 bytes)           │
│ caveats_hash:    bytes32     (32 bytes)          │
│ signature:       bytes64     (64 bytes)          │
└──────────────────────────────────────────────────┘

The caveats_hash is BLAKE3(rlp(caveats_list)), enabling chained delegation without growing the token linearly with caveat depth. The signature field is reserved and presently zero: token authority is the byte-for-byte match against the copy in Storage Manager state (§7.1), not an in-token signature.

Appendix D: “Why Won’t My Volume Mount?”

A storage-attached job must clear several independent gates, in order, before a Runner executes it. Historically each failed silently or with a generic error, turning a single misconfiguration into a multi-hour investigation. A conforming implementation MUST make each gate emit a distinct, structured reason; this appendix is the canonical ordering and the signal to expect at each step.

#	Gate	Layer	Failure signal (structured)	Operator remedy
1	Deferred-tx / `job_submit` materialization (§4.2.2)	Chain (CIP-2/CIP-5)	`JobNotMaterialized { job_id }` — the continuation’s deferred tx was never applied (e.g., dropped on a speculative-cache eviction)	Confirm the validator is live and block height is advancing; verify deferred txs survive cache eviction
2	Committee size vs. storage-capable runners (§4.2.1)	Dispatcher	`InsufficientStorageCapableRunners { required, available, reasons }`	Add storage-capable runners, or reconcile committee size with the storage-runner population
3	Runner health / reputation (§8.2.1)	Dispatcher	runner excluded; shown in the per-gate exclusion summary (health; reputation, including a bootstrap reputation still below the selection threshold)	Ensure the runner heartbeats; let a fresh runner cross the reputation selection threshold
4	`storage_support` capability (§5.1.1)	Dispatcher	runner excluded: `storage_support = false` or missing `encryption_pubkey`	Re-register the runner with storage config (deregister → register, §5.1.1)
5	Mount-allowlist principal (§7.7)	Storage Manager	`MountNotAuthorized { reason }` where reason is `not_on_allowlist` or `principal_type_mismatch`	Owner adds the submitter to the allowlist by bare address (avoiding the Actor/Account variant trap)
6	Runner ↔ validator wire compat (§16.6)	Validator	`UnsupportedTxWireVersion { received, supported }` — not a silent decode drop	Upgrade the runner to a wire-compatible build

The gates are ordered: a job that fails gate 1 never reaches gate 2. Implementations MUST surface which gate was binding rather than collapsing all six into a generic “job did not dispatch” — this is the single most valuable diagnostic in the storage path.

CIP-8: MPP Session (Retroactive)CIP-10: Runner Container Runtime (v2.r2)

​CIP-9: Runner-Backed Storage

​1. Abstract

​2. Motivation

​2.1 Why not existing systems?

​3. Definitions

​3.1 Canonical Manifest DAG

​4. Design Overview

​4.1 Architecture

​4.2 Lifecycle

​4.2.1 Committee Composition for Storage-Attached Jobs

​4.2.2 Dependency on Runner-Backed Continuations

​4.3 Canonical Implementation Boundary

​4.4 Relationship to Existing CIPs

​5. Relay Nodes

​5.1 Role and Responsibilities

​5.1.1 Runner Storage Capability

​5.2 Relay Registry

​5.3 Shard Assignment

​5.3.1 Placement Persistence

​5.3.2 GET_MANIFEST RPC

​5.4 Erasure Coding

​5.5 Autonomous Shard Repair

​5.6 Proof of Retrievability (PoR) Challenges

​5.7 Relay Node Incentives

​5.8 Relay Drain Governance

​6. Storage Addressing

​6.1 Path-Based Namespace

​6.2 Content Integrity

​6.3 Addressing for Deletion

​7. Access Control

​7.1 Capability Tokens

​7.1.1 Auth Enforcement on Placement RPCs

​7.2 Access Modes (CapToken Scopes)

​7.3 Concurrent CapTokens

​7.3.1 Prefix Enforcement Boundaries

​7.4 Caveats and Restrictions

​7.5 Read Consistency

​7.6 Public Volumes (PUBLIC)

​7.6.1 Overview

​7.6.2 Properties

​7.6.3 Relay Node Behavior

​7.6.4 Shard ID Opacity

​7.6.5 Content-Type Metadata

​7.6.6 Cache Headers

​7.7 Cross-Owner Volume Access (Mount Allowlist)

​8. Volume Attachment

​8.1 Attachment at Job Dispatch

​8.2 Attachment Process

​8.3 Detachment

​8.2.1 Dispatch Eligibility Diagnostics

​9. Encryption and Privacy

​9.1 Encryption at Rest

​9.2 Runner Key Access

​9.3 Privacy Guarantees

​10. Billing and Fees

​10.1 Fee Components

​10.2 Effective Size and Erasure Overhead

​10.3 Persistent Storage Billing

​10.3.1 Escrow and Rent Lifecycle

​10.4 Fee Distribution

​10.5 Relationship to CIP-3

​11. onchain State

​11.1 Storage Manager System Actor

​11.2 Relay Registry

​11.3 Key Space

​12. Client Interfaces

​12.1 Filesystem Interface (FUSE Mount)

​12.1.1 Design Rationale

​12.1.2 Mount Point Layout

​12.1.3 Sync Strategy (Hybrid)

​12.1.4 FUSE Operation Mapping

​12.2 Object API (Programmatic)

​12.2.1 Manifest Commit Transport (Write-Relayer)

​12.3 Account Owner API (via Storage Manager Actor)

​12.4 CIP-2 Task Definition Extension

​12.3.1 Volume Creation and Lookup Errors

​13. Garbage Collection

​13.1 Triggers

​13.2 Process

​13.3 Deletion Semantics

CIP-9: Runner-Backed Storage

1. Abstract

2. Motivation

2.1 Why not existing systems?

3. Definitions

3.1 Canonical Manifest DAG

4. Design Overview

4.1 Architecture

4.2 Lifecycle

4.2.1 Committee Composition for Storage-Attached Jobs

4.2.2 Dependency on Runner-Backed Continuations

4.3 Canonical Implementation Boundary

4.4 Relationship to Existing CIPs

5. Relay Nodes

5.1 Role and Responsibilities

5.1.1 Runner Storage Capability

5.2 Relay Registry

5.3 Shard Assignment

5.3.1 Placement Persistence

5.3.2 `GET_MANIFEST` RPC

5.4 Erasure Coding

5.5 Autonomous Shard Repair

5.6 Proof of Retrievability (PoR) Challenges

5.7 Relay Node Incentives

5.8 Relay Drain Governance

6. Storage Addressing

6.1 Path-Based Namespace

6.2 Content Integrity

6.3 Addressing for Deletion

7. Access Control

7.1 Capability Tokens

7.1.1 Auth Enforcement on Placement RPCs

7.2 Access Modes (CapToken Scopes)

7.3 Concurrent CapTokens

7.3.1 Prefix Enforcement Boundaries

7.4 Caveats and Restrictions

7.5 Read Consistency

7.6 Public Volumes (`PUBLIC`)

7.6.1 Overview

7.6.2 Properties

7.6.3 Relay Node Behavior

7.6.4 Shard ID Opacity

7.6.5 Content-Type Metadata

7.6.6 Cache Headers

7.7 Cross-Owner Volume Access (Mount Allowlist)

8. Volume Attachment

8.1 Attachment at Job Dispatch

8.2 Attachment Process

8.3 Detachment

8.2.1 Dispatch Eligibility Diagnostics

9. Encryption and Privacy

9.1 Encryption at Rest

9.2 Runner Key Access

9.3 Privacy Guarantees

10. Billing and Fees

10.1 Fee Components

10.2 Effective Size and Erasure Overhead

10.3 Persistent Storage Billing

10.3.1 Escrow and Rent Lifecycle

10.4 Fee Distribution

10.5 Relationship to CIP-3

11. onchain State

11.1 Storage Manager System Actor

11.2 Relay Registry

11.3 Key Space

12. Client Interfaces

12.1 Filesystem Interface (FUSE Mount)

12.1.1 Design Rationale

12.1.2 Mount Point Layout

12.1.3 Sync Strategy (Hybrid)

12.1.4 FUSE Operation Mapping

12.2 Object API (Programmatic)

12.2.1 Manifest Commit Transport (Write-Relayer)

12.3 Account Owner API (via Storage Manager Actor)

12.4 CIP-2 Task Definition Extension

12.3.1 Volume Creation and Lookup Errors

13. Garbage Collection

13.1 Triggers

13.2 Process

13.3 Deletion Semantics