bcacheFS is all you need
I'm migrating away from MooseFS, the distributed filesystem that powers my entire Proxmox cluster of more than 6 nodes, to bcachefs (a local filesystem) + NFS on a single NAS machine.
I know, it was not an easy decision. The motivation is power consumption: I'm going from 3 storage-compute nodes down to a single NAS node, which cuts the average idle storage-related power draw from 600 W (!) to only 130 W. Hardware details of the new NAS will come in a future post, stay tuned!
What I'm looking for in a new filesystem
Let's first look at the hardware I already have, without going too deep into details.
| Quantity | Interface Type | Drive Type | Capacity (Each) | Total Capacity |
| 4 | SATA | HDD | 4 TB | 16 TB |
| 8 | SAS | HDD | 3 TB | 24 TB |
| 9 | SAS3 | SSD | 4 TB | 36 TB |
| 3 | SAS3 | SSD | 3 TB | 9 TB |
| 1 | SAS3 | SSD | 7 TB | 7 TB |
| 3 | SATA | SSD | 1 TB | 3 TB |
| Totals | - | 12 HDDs / 16 SSDs | - | 40 TB (HDD) / 55 TB (SSD) |
As you can see in the table, it's a random assortment of drives of unequal sizes ☹️. Coincidentally, SSD and HDD capacity are not that far from a 1:1 ratio (55 TB vs. 40 TB), which is really good.
Summing it all up, we have around 95 TB of mixed storage that we want to use as efficiently as possible, since 55 TB (~58%) of it is very expensive flash.
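If you want to double-check the totals, here is a minimal Python sketch that just re-adds the table. The `DRIVES` list and the `totals` helper are names I made up for illustration; the numbers are copied from the table above.

```python
# Drive inventory transcribed from the table above:
# (quantity, interface, drive type, capacity per drive in TB)
DRIVES = [
    (4, "SATA", "HDD", 4),
    (8, "SAS", "HDD", 3),
    (9, "SAS3", "SSD", 4),
    (3, "SAS3", "SSD", 3),
    (1, "SAS3", "SSD", 7),
    (3, "SATA", "SSD", 1),
]


def totals(drives):
    """Return {drive_type: (drive_count, capacity_tb)}."""
    result = {}
    for qty, _iface, kind, tb_each in drives:
        count, capacity = result.get(kind, (0, 0))
        result[kind] = (count + qty, capacity + qty * tb_each)
    return result


summary = totals(DRIVES)
total_tb = sum(cap for _count, cap in summary.values())
ssd_share = summary["SSD"][1] / total_tb

for kind, (count, cap) in summary.items():
    print(f"{kind}: {count} drives, {cap} TB")
print(f"Total: {total_tb} TB, SSD share: {ssd_share:.0%}")
```

This gives 12 HDDs with 40 TB, 16 SSDs with 55 TB, and 95 TB in total, with flash making up roughly 58% of the raw capacity.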
Expected use-case: Dropping everything in the same pool

Idea: ALL Proxmox nodes will connect to the NFS share exported by our future one-and-only NAS.
To simplify management, I want a single NFS-backed storage pool named "storageDC" for the cluster. Proxmox nodes do not need to know that the NAS has a salad of HDDs and SSDs. They just need to drop qcow2 or raw images in a pool and do "compute-work". Let the NAS do NAS things, like backups and deduplication.
Choosing an appropriate NAS filesystem for uneven drives
I don't want technical debt, so I'm not going to use ZFS, since it does not make good use of uneven drives: each vdev is limited by its smallest member (unless you do some NASty tricks (ha-ha) with partitioning, which I'm NOT going to do). mdadm is discarded for the same reason as ZFS: it does not natively support uneven drives.
Why did I discard BTRFS?
Exploring BTRFS, I found that it does support uneven drives! And compression, and deduplication with bees 🐝
In their upstream documentation, as of 27 June 2026, they mention that RAID 5 and RAID 6 are still discouraged due to the write-hole issue. This issue means that an unclean power loss in the middle of a write can leave a stripe's parity out of sync with its data, and because of how RAID56 is implemented, that can turn into data loss if a drive fails later. The risk is very low, but it can't be ignored. And flash storage is too expensive to give up half of it to RAID1 or RAID10.
Looking further, I wanted to use both SSDs and HDDs in a single pool, but BTRFS and most other local filesystems don't distinguish between drive types: they don't implement any sort of "tiering", so I would need two separate pools.
Enter bcachefs, a filesystem unicorn that solves all my problems.
Bcachefs as a salad filesystem 🥗
As of last week, bcachefs is finally considered "no longer experimental" by its developer for production use. This is great news!
Feature-wise, bcachefs is very similar to Btrfs, including support for up to 255 drives. But bcachefs implements more features, like native encryption and O(1) snapshots. Their user documentation even dedicates a whole section to "we implement RAID56 without write-holes" as a feature :)
The features that we are interested in for our newest NAS setup are:
Multi-tier caching:
Remember that we have around 55 TB of SSDs and 40 TB of HDDs? bcachefs can use our SSDs as a "write cache" (writes land on the SSD pool first and are moved to the HDD pool in the background) and can also keep frequently used data on the SSDs. This only works if each block device is labelled correctly, as bcachefs will not detect the device type automatically.
Read-cached data is evicted from the Hot Data pool on a Least-Recently-Used basis whenever the SSD pool gets close to ~95% full, unlike systems like MooseFS, which evict cached data based on the time passed since it was last accessed.
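To make the LRU idea concrete, here is a toy Python model. It is only a sketch of the eviction policy as described above, not how bcachefs is implemented internally: the `LruCache` class, its block granularity and the tiny capacity are all assumptions for illustration, and only the 95% threshold comes from the text.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache: evicts least-recently-used blocks above a fill threshold."""

    def __init__(self, capacity_blocks, high_watermark_pct=95):
        # Integer math avoids floating-point surprises in the threshold.
        self.limit = capacity_blocks * high_watermark_pct // 100
        self.blocks = OrderedDict()  # least recently used entry comes first

    def access(self, block_id):
        """Record a read and return the list of block ids evicted by it."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # now the most recently used
        else:
            self.blocks[block_id] = True
        evicted = []
        while len(self.blocks) > self.limit:
            victim, _ = self.blocks.popitem(last=False)  # drop the coldest block
            evicted.append(victim)
        return evicted


cache = LruCache(capacity_blocks=20)  # threshold: 19 blocks
for block in range(19):
    cache.access(block)               # fills the cache up to the threshold
cache.access(0)                       # block 0 is hot again
evicted = cache.access(19)            # one block too many, so something must go
print("evicted:", evicted)
```

Block 1 gets evicted here, not block 0: block 0 is the oldest entry by insertion time, but it was read again recently. A purely time-based policy would instead drop whatever has not been touched for a fixed period, regardless of how full the cache is.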
Erasure coding
Since March 2026, bcachefs's erasure coding implementation is complete (for data only; it is not available for metadata). It supports up to 3 parity chunks, and the fun part is that bcachefs dynamically "chooses" (over-simplifying) the number of data chunks based on how many storage devices you have, giving you the same storage efficiency as RAID5 (with one parity chunk) or RAID6 (with two parity chunks). Unlike traditional RAID, you don't have to rebuild the array when you add or remove a device.
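The storage-efficiency claim is plain arithmetic: a stripe with k data chunks and p parity chunks keeps k / (k + p) of the raw space usable, while n-way replication keeps 1 / n. A small Python sketch, where the function names and the example stripe widths are hypothetical and not what bcachefs will necessarily pick on my hardware:

```python
def ec_usable_fraction(data_chunks, parity_chunks):
    """Usable share of raw capacity for one erasure-coded stripe."""
    return data_chunks / (data_chunks + parity_chunks)


def replica_usable_fraction(replicas):
    """Usable share of raw capacity when every block is stored `replicas` times."""
    return 1 / replicas


# Hypothetical stripe widths (total chunks per stripe), for illustration only.
for width in (4, 8, 12):
    for parity in (1, 2):
        share = ec_usable_fraction(width - parity, parity)
        print(f"{width - parity}+{parity}: {share:.1%} usable")

print(f"2 replicas (RAID1-like): {replica_usable_fraction(2):.1%} usable")
```

The wider the stripe, the smaller the parity overhead, which is why letting the filesystem size stripes to the number of available devices matters so much with expensive flash.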
Erasure coding in bcachefs is fundamentally a background process. If you are using multi-tier caching, this process strictly targets your Cold data tier (the --background_target parameter), but it does not require a multi-tier setup to function and works equally well on a flat storage pool.
To prevent the write-hole issue that BTRFS suffers from, bcachefs writes data initially in the foreground as standard full replicas. Later, a background thread groups these settled blocks into a new stripe, calculates and writes the full parity to disk, and finally deletes the extra initial replicas to free up space.
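The "replicate first, stripe later" flow can be modelled in a few lines of Python. This is a heavily simplified toy under my own assumptions: `ToyPool` is a made-up class, and a single XOR parity stands in for the real erasure code. The only point is the ordering: parity is computed over a complete set of settled blocks, and the replicas are dropped only after the stripe exists.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyPool:
    def __init__(self):
        self.replicated = {}  # block_id -> list of identical copies
        self.stripes = []     # list of ({block_id: data}, parity)

    def write(self, block_id, data, replicas=2):
        # Foreground path: plain full replicas, no parity touched.
        self.replicated[block_id] = [data] * replicas

    def background_stripe(self, width):
        # Background path: build whole new stripes from settled blocks.
        while len(self.replicated) >= width:
            ids = list(self.replicated)[:width]
            members = {i: self.replicated[i][0] for i in ids}
            parity = xor_blocks(list(members.values()))
            self.stripes.append((members, parity))  # stripe + parity first...
            for i in ids:
                del self.replicated[i]              # ...then drop the replicas

    def recover(self, stripe_index, lost_id):
        """Rebuild one lost block from the surviving blocks plus parity."""
        members, parity = self.stripes[stripe_index]
        survivors = [d for i, d in members.items() if i != lost_id]
        return xor_blocks(survivors + [parity])


pool = ToyPool()
for name, data in (("a", b"AAAA"), ("b", b"BBBB"), ("c", b"CCCC")):
    pool.write(name, data)
pool.background_stripe(width=3)
print(pool.recover(0, "b"))
```

Because an existing stripe is never partially rewritten in this model, there is no moment where data and parity disagree, which is the window that causes the write hole in classic RAID5/6.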
Future plans
I will look further into how to de-duplicate all my data with duperemove. Considering that I have over 100 LXCs, there's a lot of space that can be regained, at a to-be-measured performance cost.
I've been missing live-snapshots too, since the MooseFS Proxmox plugin doesn't support snapshots yet.
The recovery process in case of a failing drive looks very straightforward, just like with any other filesystem. So far I'm very happy with bcachefs 😄 If I ever ended up with a mixed bag of HDDs and SSDs again, I would 100% choose bcachefs again over a mixture of btrfs+mergerfs+snapraid like I did in the past.
That was everything for today, have a good day (or night)!