Trying to understand really big numbers

Posted by enicholas on December 19, 2006 at 8:12 AM PST

Ok, this isn't strictly Java-related, but it's geeky enough that I hope you find it interesting regardless.

Various sites have recently broken the news that the next version of Mac OS X, code-named Leopard, will feature support for Sun's ZFS filesystem. As a Mac user, I find this news particularly exciting, but those of you still using Windows may want to take note as well.

If I sat down and wrote a list of all the things a super-powerful futuristic filesystem should do, completely without regard for practicality or implementation difficulty, not only would ZFS already do everything I came up with, but I doubt I would have imagined even half of its actual features. Suppose you want to clone your hard drive, install a test application, and then roll back to the previous state of your hard drive. How long would that take you? For most of you, long enough that you'd rather just cross your fingers and hope for the best.

Under ZFS, creation of a writable clone of a filesystem is essentially instant. It only has to maintain the difference between the two states, rather than two complete copies of the data, so the clone initially takes no space and virtually no time to create. Once you're done with your tests, destroying the clone is also essentially instant. The ability to instantly create, restore, and destroy snapshots and clones is incredibly powerful and something I'm very excited about, but it's not the only trick up ZFS' proverbial sleeve.

Among other things, ZFS is a 128-bit filesystem, meaning that the total storage it can manage in a single storage pool is 2^128 blocks, which is a very big number. In fact, 2^128 is such a big number that I'm going to unequivocally state that we will never, ever need more storage than that.

That's a bold claim. Many computational limits have been thought sufficient in the past -- who ever thought we would need more than 4GB of memory in a desktop system? I've seen people making similar claims in response to ZFS, thinking that we've passed every other limit, so why not this one? That's a reasonable question to ask, so let's take a look at how much data a 128-bit filesystem can actually hold.
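First, it helps to see the number itself written out. Here's a throwaway bit of Java -- BigInteger, since 2^128 is far beyond anything a long can hold:

```java
import java.math.BigInteger;

public class PoolLimit {
    public static void main(String[] args) {
        // 2^64: the kind of limit a 64-bit filesystem bumps into
        BigInteger limit64  = BigInteger.valueOf(2).pow(64);
        // 2^128: the ZFS limit, roughly 3.4 x 10^38
        BigInteger limit128 = BigInteger.valueOf(2).pow(128);
        System.out.println("2^64  = " + limit64);
        System.out.println("2^128 = " + limit128);
    }
}
```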

We need to store a lot of data for this thought experiment, and nothing fills hard drives like video. Let's say it's high-definition video, complete with surround sound -- maybe 10GB / hour after compression. And you record this video 24 hours a day, 365.25 days a year. That's 85.6 terabytes a year, which is certainly a lot of data, but it's well within the reach of modern storage systems. So let's record this video for a very long time, say since the formation of the Earth 4.5 billion years ago. That's an inconceivable amount of data, roughly 385 billion terabytes, and is already more than a 64-bit filesystem can handle.
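If you want to check my arithmetic, here's the calculation as a quick sketch. The 10GB / hour rate and the 4.5 billion years are the assumptions above; treating a terabyte as 1024 gigabytes is how I arrived at the 85.6 figure:

```java
public class OneCameraForever {
    public static void main(String[] args) {
        double gbPerHour  = 10.0;                       // high-definition video, compressed
        double gbPerYear  = gbPerHour * 24 * 365.25;    // 87,660 GB per year
        double tbPerYear  = gbPerYear / 1024;           // ~85.6 TB per year
        double years      = 4.5e9;                      // roughly the age of the Earth
        double totalBytes = tbPerYear * Math.pow(1024, 4) * years;

        // A 64-bit byte count tops out at 2^64, about 1.8 x 10^19 bytes
        double limit64 = Math.pow(2, 64);

        System.out.printf("One camera, one year:  %.1f TB%n", tbPerYear);
        System.out.printf("One camera, forever:   %.2e bytes%n", totalBytes);
        System.out.printf("Over the 64-bit limit: %.0f times%n", totalBytes / limit64);
    }
}
```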

But what good is only one camera? It might end up at the bottom of the ocean and spend a billion years filming a family of sponges. We clearly need many, many cameras. Let's put one camera for each square meter of the Earth's surface, all of them recording high-definition video for 4.5 billion years. We're up to 2 x 10^38 bytes now, an inconceivably large number. You could also express it as 200 trillion trillion terabytes, but that doesn't make it any easier to handle -- it's just too big for human understanding. We must have filled up the filesystem by now, right?
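Again, a rough sketch of where that figure comes from. The only ingredient I've added is the Earth's surface area, roughly 5.1 x 10^14 square meters:

```java
public class CameraPerSquareMeter {
    public static void main(String[] args) {
        // One camera's output over 4.5 billion years, from the previous sketch
        double perCameraBytes = (10.0 * 24 * 365.25 / 1024) * Math.pow(1024, 4) * 4.5e9;
        // Earth's surface area: about 510 million square kilometers
        double earthSurfaceM2 = 5.1e14;
        double totalBytes = perCameraBytes * earthSurfaceM2;    // ~2 x 10^38 bytes
        System.out.printf("Every square meter, all of Earth's history: %.1e bytes%n", totalBytes);
    }
}
```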

Well, this incomprehensibly gargantuan amount of data has indeed put a dent in our 128-bit filesystem, which is now about 0.1% full. All the data ever produced by the human race -- all speech, books, plays, movies, music, emails, everything -- is a tiny, tiny drop in the bucket in comparison.
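Where does the 0.1% come from? The pool limit is 2^128 blocks, so to turn it into bytes I'm assuming the smallest ZFS block size of 512 bytes -- which makes this a conservative estimate:

```java
import java.math.BigInteger;

public class HowFull {
    public static void main(String[] args) {
        // 2^128 blocks, times an assumed 512 bytes per block
        BigInteger capacityBytes = BigInteger.valueOf(2).pow(128)
                                             .multiply(BigInteger.valueOf(512));
        double recorded = 2e38;    // bytes from the planet-wide camera experiment
        double percentFull = recorded / capacityBytes.doubleValue() * 100;
        System.out.printf("The pool is about %.2f%% full%n", percentFull);
    }
}
```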

A 128-bit filesystem effectively cannot be filled. The laws of physics set an upper bound on the amount of information we can cram into a certain amount of mass and volume, which means that it would take at least 136 billion kilograms' worth of matter to hold that much data. And that's just a lower bound on the amount of matter necessary; it might not be a very tight bound (meaning the actual requirement is probably many orders of magnitude greater). Even ignoring the obvious impossibility of creating a storage device that large, you could never create enough data to fill it. Even with a high-definition camera on each square meter of the Earth's surface, it would take almost 4 trillion years' worth of video to fill it.
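The "almost 4 trillion years" figure falls out of the same sketch -- divide the pool's capacity by the rate at which the planet-wide camera array generates data, using the same assumptions as before (512-byte blocks, one camera per square meter, 10GB / hour each):

```java
public class TimeToFill {
    public static void main(String[] args) {
        double capacityBytes = Math.pow(2, 128) * 512;            // 2^128 blocks at 512 bytes each
        double bytesPerCameraYear = (10.0 * 24 * 365.25 / 1024) * Math.pow(1024, 4);
        double globalBytesPerYear = bytesPerCameraYear * 5.1e14;  // one camera per square meter
        double yearsToFill = capacityBytes / globalBytesPerYear;  // ~3.6 x 10^12 years
        System.out.printf("Years to fill the pool: %.1e%n", yearsToFill);
    }
}
```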

I think there's a lesson here. Our computers are now powerful enough that it's reasonable to choose limits so large that we can be essentially 100% confident that they will never, ever be reached, not in this universe at least. The question "How large can I imagine this value getting?" is very dangerous, because we humans are creatures of small imaginations. I'm not suggesting that every single limit must be as ridiculously large as 2^128, but it's important to remember that arbitrarily chosen small limits have historically been a much, much greater problem than asking the computer to process an extra couple of bytes here and there.

And, because I want to at least mention Java here, take a look at JSR-202, "Java Class File Specification Update". One of the major changes is increasing various limits, because those initially chosen for the sizes of methods and so forth turned out to be too small. The limits of human imagination strike again.