Ethereum Node Sync: Snap Sync vs Full Archive Explained
You spin up a fresh Ethereum node, point it at the network, and then you wait. The question that actually matters isn't how long it takes. It's what the node is downloading, and what it will and won't be able to answer once it's done.
That distinction, between snap sync and full archive, is the one most guides bury in paragraph nine. Let's start there instead.
What a Node Is Actually Storing
Ethereum's state is a giant key-value structure: every account address maps to a balance, a nonce, a code hash, and a storage root. At any given block, the entire set of those mappings is called the world state. It's stored as a Merkle Patricia Trie, a tree structure whose root hash commits to every account in it. A node that holds the trie for a given block can cryptographically prove any account's value at that block.
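To make the "prove any account's value" idea concrete, here is a minimal sketch of a hash tree over a handful of accounts. It is a toy, not Ethereum's trie: it uses a plain binary Merkle tree instead of a Merkle Patricia Trie, SHA-256 as a stand-in for Keccak-256, and a made-up byte encoding instead of RLP. The addresses and the names `Account`, `build_levels`, `prove`, and `verify` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


def h(data: bytes) -> bytes:
    # Stand-in hash: Ethereum uses Keccak-256, which hashlib does not ship.
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class Account:
    nonce: int
    balance: int
    storage_root: bytes
    code_hash: bytes

    def encode(self) -> bytes:
        # Toy encoding for illustration; real clients use RLP.
        return b"|".join([
            str(self.nonce).encode(),
            str(self.balance).encode(),
            self.storage_root.hex().encode(),
            self.code_hash.hex().encode(),
        ])


def leaf_hash(address: str, account: Account) -> bytes:
    return h(b"leaf:" + address.encode() + b":" + account.encode())


def build_levels(leaves: list) -> list:
    """Return every level of the tree, leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur.append(cur[-1])  # pad in place so proofs can find the sibling
        levels.append([h(b"node:" + cur[i] + cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def prove(levels: list, index: int) -> list:
    """Collect the sibling hashes on the path from one leaf to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, proof: list) -> bool:
    node = leaf
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"node:" + pair)
    return node == root


EMPTY = h(b"")
state = {
    "0xaaa": Account(0, 100, EMPTY, EMPTY),
    "0xbbb": Account(3, 250, EMPTY, EMPTY),
    "0xccc": Account(1, 7, EMPTY, EMPTY),
}
addresses = sorted(state)
levels = build_levels([leaf_hash(a, state[a]) for a in addresses])
root = levels[-1][0]

proof = prove(levels, addresses.index("0xbbb"))
ok = verify(root, leaf_hash("0xbbb", state["0xbbb"]), proof)
forged = verify(root, leaf_hash("0xbbb", Account(3, 999, EMPTY, EMPTY)), proof)
```

The verifier needs only the root and a short list of sibling hashes, never the whole state. A forged balance produces a different leaf hash, so the recomputed root no longer matches.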
Here's the wrinkle: that trie gets modified with every single block. Transfer ETH between two addresses and at least two leaves in the trie change, along with every node on the path from those leaves up to the root. Deploy a contract and a whole new subtree appears. Over millions of blocks, the historical record of every intermediate trie state becomes enormous. You can keep all of it, or you can keep only the latest snapshot and the chain of blocks that got you here. That's the core tension, and everything else follows from it.
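A rough way to see why history piles up is to count nodes. The sketch below reuses the toy-tree idea (binary tree, SHA-256, eight made-up accounts, one transfer per block; none of this is Ethereum's real layout) and compares a store that keeps every node ever created with one that keeps only the nodes reachable from the latest root.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256.
    return hashlib.sha256(data).digest()


def tree_nodes(balances: list) -> tuple:
    """Return (root, set of all node hashes); len(balances) must be a power of two."""
    level = [h(b"leaf:%d:%d" % (i, bal)) for i, bal in enumerate(balances)]
    nodes = set(level)
    while len(level) > 1:
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        nodes.update(level)
    return level[0], nodes


balances = [100] * 8
root, latest = tree_nodes(balances)
archive = set(latest)        # "archive" store: never deletes a node
roots = [root]
new_per_block = []

for block in range(1, 6):
    # One transfer per block: account 0 pays 1 unit to account `block`.
    balances[0] -= 1
    balances[block] += 1
    root, latest = tree_nodes(balances)
    new_per_block.append(len(latest - archive))
    archive |= latest
    roots.append(root)

print("nodes in latest state:", len(latest))
print("nodes kept by archive:", len(archive))
print("new nodes per block:  ", new_per_block)
```

The latest state stays the same size, while the archive store grows with every block even though each block touches only two accounts. Scale that to hundreds of millions of accounts and millions of blocks and the gap between the two node types follows.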
Snap Sync: The Fast Lane and What It Costs You
Snap sync, introduced in Geth and now the default for most Ethereum clients, does something elegant. Instead of replaying every transaction from block zero to reconstruct the current state, it downloads a flat snapshot of the current world state directly from peers. Think of it as grabbing a photograph of the filing cabinet rather than reading every memo ever filed in it.
Faster. Much faster. The flat state snapshot format transfers and verifies in roughly 4 to 8 hours on decent hardware, compared to days for older replay-based methods. Disk requirements land somewhere around 700 GB to 1.2 TB for a snap-synced full node, depending on the client and pruning settings.
Alongside the state, the node downloads the block headers and block bodies it needs to tie that state to the canonical chain. Once both are in place it can participate fully: validate new blocks, broadcast transactions, answer queries about current balances.
The catch: ask it for an account balance at block 4,000,000 and it will shrug. The intermediate states are gone. A snap-synced full node keeps state only for recent blocks and prunes older trie nodes aggressively, and that pruning is precisely why it stays manageable. These nodes are excellent for running a validator, serving a wallet backend, or participating in consensus. For historical queries, they're simply the wrong tool.
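In practice, the question above is a single JSON-RPC call. `eth_getBalance` takes an address and a block parameter, either a tag such as `"latest"` or a hex-encoded block number. The sketch below only builds the request body; the address is a placeholder, the helper name `get_balance_request` is hypothetical, and nothing is sent over the network.

```python
import json
from typing import Union


def get_balance_request(address: str,
                        block: Union[int, str] = "latest",
                        request_id: int = 1) -> str:
    """Build the JSON body for an eth_getBalance call."""
    # Block numbers travel as hex strings; tags like "latest" pass through.
    block_param = hex(block) if isinstance(block, int) else block
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, block_param],
        "id": request_id,
    })


PLACEHOLDER_ADDRESS = "0x" + "00" * 20  # not a real account of interest

current = get_balance_request(PLACEHOLDER_ADDRESS)               # any synced node
historical = get_balance_request(PLACEHOLDER_ADDRESS, 4_000_000)  # archive only

print(current)
print(historical)
```

A snap-synced node should answer the first request. For the second it will typically return an error saying the state for that block is unavailable; the exact wording varies by client.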
Full Archive: The Complete Record, at a Price
An archive node keeps every historical state. Every trie node from every block, going back to genesis. It replays the entire transaction history sequentially, storing the world state after each block.
The storage cost is not subtle. A full Ethereum archive node requires somewhere in the range of 12 to 18 TB of disk space, growing by roughly 1 to 2 TB per year as new blocks accumulate. Fast NVMe SSDs are essentially mandatory. A spinning hard drive cannot handle the random read/write patterns the trie generates at that scale, full stop.
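Those ranges turn into a provisioning number with a line of arithmetic. The sketch below assumes growth is linear and adds 20% headroom so the disk never runs close to full; both the linear model and the headroom figure are assumptions made here, not measurements.

```python
def project_disk_tb(current_tb: float,
                    growth_tb_per_year: float,
                    years: float,
                    headroom: float = 0.2) -> float:
    """Linear projection of required disk, padded by a headroom fraction."""
    return (current_tb + growth_tb_per_year * years) * (1 + headroom)


# Two-year horizon using the low and high ends of the ranges quoted above.
low = project_disk_tb(current_tb=12, growth_tb_per_year=1, years=2)
high = project_disk_tb(current_tb=18, growth_tb_per_year=2, years=2)

print(f"provision between {low:.1f} and {high:.1f} TB")
```

Under these assumptions a two-year plan lands somewhere between roughly 17 and 26 TB, which is why operators tend to buy more than today's number suggests.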
Sync time reflects the workload. Expect three to five days minimum on a machine with a modern CPU and fast storage, and some operators report considerably longer, depending on the client. The node is literally re-executing every transaction Ethereum has ever processed.
Who actually needs this? Block explorers like Etherscan run archive infrastructure. DeFi protocols verifying historical collateral positions need it. Researchers auditing MEV or tracing a specific exploit need it. Most individual users and even most dApp developers do not, and they should stop feeling guilty about that.
A Concrete Scenario: Two Operators, Same Goal
Say two developers, Priya and Marcus, both want to run their own Ethereum node to back their DeFi analytics tool. Priya's tool shows users their current portfolio: balances, open positions, live prices. Marcus's tool lets users audit their historical tax positions, querying what their wallet held at any block in the past three years.
Priya snap-syncs a node on a 2 TB NVMe machine in about six hours. It costs her roughly $150 a month in cloud compute. Her tool works perfectly.
Marcus tries the same setup. His first query for a historical balance fails, because the node no longer holds state that old. He re-provisions a 16 TB machine, waits four days for archive sync, and pays roughly $600 a month. Both choices are correct. They're just answering different questions.
The sync method isn't a quality difference. It's a capability selection.
What People Get Wrong About Archive Nodes
The most persistent misconception is that archive nodes are somehow more "real" or more trustworthy than snap-synced nodes. They aren't. A snap-synced node verifies the chain of block headers, follows the finalized head, and can fully validate every new block. Its view of the current state is just as authoritative as an archive node's. The trie roots match. The cryptographic guarantees are identical for anything happening now.
The second misconception is that snap sync skips verification. It doesn't. The flat state snapshot is verified against the state root committed in the header of a recent block the node is syncing toward. If a peer sends corrupted data, the hashes won't match and the node rejects it. Snap sync is fast because it avoids recomputation, not because it avoids verification. Conflating those two things is a surprisingly common mistake among people who should know better.
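The check itself is simple to sketch. The toy below recomputes a root from a downloaded flat state and compares it with a root the node already trusts from a block header. It simplifies heavily: real snap sync verifies the state in ranges with Merkle proofs as the data arrives, rather than in one pass at the end, and it uses Keccak-256 over a Merkle Patricia Trie. The names `state_root` and `accept_snapshot` and the addresses are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256.
    return hashlib.sha256(data).digest()


def state_root(flat_state: dict) -> bytes:
    """Hash a flat {address: balance} mapping down to a single root."""
    level = [h(f"{addr}:{bal}".encode())
             for addr, bal in sorted(flat_state.items())]
    if not level:
        return h(b"empty")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def accept_snapshot(flat_state: dict, trusted_root: bytes) -> bool:
    # No transactions are replayed here, yet nothing is taken on faith.
    return state_root(flat_state) == trusted_root


# The root the node already trusts, taken from a verified block header.
honest_state = {"0xaaa": 100, "0xbbb": 250, "0xccc": 7}
trusted_root = state_root(honest_state)

# What two different peers send back.
good_download = dict(honest_state)
bad_download = {**honest_state, "0xbbb": 251}  # one balance off by one

accepted = accept_snapshot(good_download, trusted_root)
rejected = not accept_snapshot(bad_download, trusted_root)
```

A single wrong balance changes the recomputed root, so the bad download is rejected without the node ever re-executing a transaction.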
Still, there's a real limitation worth naming honestly. Snap sync nodes depend on finding peers willing to serve the flat state snapshot. In a scenario where very few nodes on the network serve that data, bootstrapping a new snap sync node becomes harder. This is a genuine concern about long-term network health, since nothing in the protocol pays a node for serving state to its peers.
Choosing Without Overthinking It
For most node operators, the decision tree is short. Need historical state queries? Archive. Everything else? Snap sync.
So ask yourself: is the thing you're building time-sensitive about the present, or forensically curious about the past?
The practical checklist:
- Running a validator or staking solo: snap sync, 1 to 2 TB SSD, done.
- Building a wallet or dApp that reads current state: snap sync.
- Running a block explorer, historical analytics, or forensic tooling: archive, budget 16-plus TB and several days.
- Contributing to network decentralization with minimal resources: snap sync is still a full contribution.
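The checklist collapses into one question, which a few lines of code can capture. In this sketch the disk figures are the article's own rough numbers, and the Geth flags shown (`--syncmode snap`, and `--syncmode full` with `--gcmode archive`) apply to Geth only; other clients use different options. `NodePlan` and `choose_node_plan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePlan:
    mode: str
    min_disk_tb: float   # rough figure from the checklist, not a guarantee
    geth_flags: tuple    # Geth only; other clients differ


def choose_node_plan(needs_historical_state: bool) -> NodePlan:
    """Pick a sync mode from the one question that matters."""
    if needs_historical_state:
        return NodePlan("archive", 16.0,
                        ("--syncmode", "full", "--gcmode", "archive"))
    return NodePlan("snap", 2.0, ("--syncmode", "snap"))


wallet_backend = choose_node_plan(needs_historical_state=False)
tax_audit_tool = choose_node_plan(needs_historical_state=True)

print(wallet_backend.mode, " ".join(wallet_backend.geth_flags))
print(tax_audit_tool.mode, " ".join(tax_audit_tool.geth_flags))
```

Keeping the decision to a single boolean is deliberate: validators, wallets, and dApps all fall on the same side of it.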
The major Ethereum execution clients (Geth, Nethermind, Besu, and Erigon) each implement these modes with slight variations. Erigon's archive implementation is notably more storage-efficient than Geth's, achieving archive-level history in closer to 2 to 3 TB through a different internal database layout. Worth knowing if archive is your path.
The network needs both types. Archive nodes are the long memory of Ethereum, expensive to run for a reason, irreplaceable when you need them. Snap sync nodes are the working majority keeping the network alive, decentralized, and fast. The mistake isn't choosing one over the other. It's provisioning the wrong one and discovering that halfway through a four-day sync.