Goal
Create a new design for sync that can scale to billions of messages and tens of millions of accounts.
Problem
The current diff sync architecture is not scaling at 60M+ messages. There are a few problems:
- It attempts to sync all messages for all users for all time (recently changed to prioritize the last 2 weeks' worth of messages 70% of the time)
- Onchain events, fname proofs, and user-generated messages are all treated with equal priority
- All message types and all fids are treated the same
- The same approach is used for syncing all messages since the hub last started (millions of messages) and for messages missed while the hub is running (hundreds to thousands)
- Sync performance is highly dependent on a single remote peer
- The current sync rate is ~100 msgs/sec. Even a 10x speedup is not ideal for syncing millions of messages.
- Due to current message volume (and the resulting pruning), and because non-Warpcast hubs submit an increasing proportion of messages, the sync trie root hash is very unlikely to be "stable" for long enough to matter, i.e. there is no single source of truth for the state of the network at any particular timestamp.
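To make the sync rate concrete, here is a back-of-envelope calculation using the ~100 msgs/sec rate and the 60M+ message backlog mentioned above (the function name is illustrative, not part of the hub codebase):

```typescript
// Rough catch-up time for a full backlog at a given sync rate.
// The 60M message count and ~100 msgs/sec rate come from the text above.
function catchupHours(messages: number, msgsPerSec: number): number {
  return messages / msgsPerSec / 3600;
}

const backlog = 60_000_000;
console.log(catchupHours(backlog, 100).toFixed(1));   // ~166.7 hours, i.e. ~7 days
console.log(catchupHours(backlog, 1000).toFixed(1));  // ~16.7 hours even at 10x
```

In other words, even a 10x speedup leaves a freshly started hub most of a day behind, which motivates a design that does not replay the full backlog message by message.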
This currently manifests in the following ways:
- On startup, if a hub is too many messages behind, it gets stuck in "catchup sync"
- Hubs that missed too many messages never catch up and will always be missing some messages
- The network as a whole is permanently out of sync, since hubs sync with each other one pair at a time and mostly time out
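To illustrate why the sync trie root hash is unstable under churn, here is a minimal sketch (not Hubble's actual trie implementation): a sync trie summarizes a hub's message set under a single root hash, and peers compare roots before recursing into differing branches. Any message that one peer has merged and the other has not changes the root, so with constant message volume two hubs almost never observe identical roots at the same instant.

```typescript
import { createHash } from "crypto";

// Illustrative only: collapse a set of message IDs into one root hash.
// Sorting makes the root order-independent, like a set digest.
function root(messageIds: string[]): string {
  const h = createHash("sha256");
  for (const id of [...messageIds].sort()) h.update(id);
  return h.digest("hex");
}

const hubA = ["msg1", "msg2", "msg3"];
const hubB = ["msg1", "msg2", "msg3", "msg4"]; // one in-flight message not yet merged by hubA

console.log(root(hubA) === root(hubB)); // false: a single message diverges the roots
```

This is why pairwise root comparison degrades as volume grows: the window during which two hubs agree on a root shrinks toward zero.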
Proposed solution
There are two parts to the new sync architecture.