
There are many different types of filesystems with different capabilities.

Once you understand these capabilities, you’ll not only be able to select the proper filesystem for your needs but will understand, for example, how your computer knows when you download applications from the internet so it can warn you when you try to run them.

At this point, each of these capabilities will seem unrelated to each other. In a way, they are. Think of them like individual features in a car. This episode will talk about file size limits and volume size limits. You can read the full transcript below.

Transcript

Knowing what these capabilities are will help you to understand which filesystem to use and how it can be used. This series will explain some common filesystem abilities. You’ll need this understanding to make any sense of one filesystem vs. another. Without it, you’ll be limited to whatever filesystem comes by default with your computer and probably won’t understand why you have problems transferring files from one computer to another.

Once you understand these capabilities, you’ll not only be able to select the proper filesystem for your needs but will understand, for example, how your computer knows when you download applications from the internet so it can warn you when you try to run them.

At this point, each of these capabilities will seem unrelated to each other. In a way, they are. Think of them like individual features in a car.

We can talk about which cars have power steering, what it is, and how it helps you to park. We can also discuss anti-lock brakes, what they are, and how they help you to avoid accidents. Power steering and anti-lock brakes are unrelated except that they both describe capabilities that some cars have and others don’t.

These are referred to as features. A car salesperson new to the business will first learn all the features and which ones exist on which cars in the lot. This is definitely a lot better than being clueless about the cars and only thinking in terms of colors and style. But that salesperson will make a lot more sales when the features can be explained in terms of benefits. Talking about how power steering helps you to park is more of a benefit.

It’s the benefits that you’re interested in when deciding which car to buy. You want to know how a feature can make your life easier or make your travel safer.

I could go quickly through common filesystem features and even give you a quick overview of each one. And all that would fit into a single episode. But you’ll get more value from a more detailed explanation of each feature that explains the benefits to you. What will each of these features do for you?

Before we get too far though, I want to ask your feedback about something. I started this podcast a couple years ago as a way to help you to learn how to program. Listening to audio is just one activity and you really need to actively program your own projects to really learn. Reading books, watching videos, taking classes, talking with other programmers, subscribing to and reading programming magazines, all of these things are needed. There wasn’t a lot of actual teaching going on with audio podcasts at the time. Even today, most podcasts are based on interviews instead of specific topics designed to teach you how to program. Don’t get me wrong. Interviews are great. I listen to them too.

But you need more than that. And I sponsored the podcast myself to let you know that I could help you with live classes. I got several offers from potential sponsors but declined them because their products didn’t seem like something you would get value from.

Eventually, I found out about Patreon which is a service that provides a way for creators like myself to get paid for providing you with something of value. This can be almost anything.

Anyway, I’d like to ask you for feedback about the value you get from this podcast. You see, I’m considering moving more of the podcast to Patreon. Right now, you can get an extra bonus podcast episode each month by sponsoring Take Up Code as a patron.

You can visit takeupcode.com and click the link at the top to become a patron. Just one dollar a month is all it takes to get the bonus episode. And while you’re there, take just a minute to fill out the contact form to provide feedback. What do you think of the idea of moving more of the podcast to Patreon?

I eventually want to get to the point where I can produce regular video episodes as well. But video takes a lot more time to produce than audio. Moving more of the podcast to a paid subscription model will make it feasible to produce even more valuable content for you.

Alright, back to filesystem features and benefits. This episode will talk about file size limits and volume size limits.

The previous episode describes cylinders, heads, and sectors. You know that a disk is full of sectors laid out like pie pieces and layered like onions on each surface of a spinning disk. That’s the conceptual model anyway. Modern disks might actually lay out sectors differently to take advantage of the fact that there’s more room at the outside cylinders than in the inner cylinders. The point here though is that each sector holds a certain number of bytes. A common size is 512 bytes, although this is changing.

I remember working at Seagate in the 1990s when computers had trouble with hard drives larger than 512 MB. I forget now exactly what the problem was, whether it was with the partitioning or with the filesystem. But hard drives were quickly getting bigger. And Seagate came out with one of the first 1 GB hard drives that had an interesting feature where you could flip a switch and the hard drive would appear to be two hard drives. Each hard drive was then 512 MB and avoided the problem.

This is what I mean about how a hard drive can rearrange sectors. In this case, it was doing a lot more than rearranging a few sectors. It was pretending to be two hard drives in one.

Another recent change you’ll find in hard drives is a larger sector size such as 4096 bytes. Hard drives are getting smarter all the time and have their own controller board that is capable of changing things such as sector locations. It turns out that 4K sectors are a good size for error correction code. And that needs a lot more code than just rearranging sectors. This processing is done by the drive itself and the computer is not even aware of it. The hard drive might not expose these larger sectors to the operating system though. Or if it does, it will be in terms of logical vs. physical sector sizes. So a hard drive might actually store 4096 bytes in a sector. That’s the physical sector size. And report to the operating system that it only stores 512 bytes in each sector. That’s the logical sector size.
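To make the logical vs. physical idea concrete, here’s a minimal sketch in Python. It assumes the 512-byte logical and 4096-byte physical sizes from the example above, and the function name physical_sector_for is made up for illustration. A real drive does this mapping inside its own controller, so treat this only as a way to see the arithmetic.

```python
LOGICAL_SECTOR_SIZE = 512    # what the drive reports to the operating system
PHYSICAL_SECTOR_SIZE = 4096  # what the drive actually stores internally


def physical_sector_for(logical_sector):
    """Return (physical sector number, byte offset inside it) for a logical sector."""
    byte_position = logical_sector * LOGICAL_SECTOR_SIZE
    return divmod(byte_position, PHYSICAL_SECTOR_SIZE)


if __name__ == "__main__":
    for logical in (0, 7, 8, 9):
        physical, offset = physical_sector_for(logical)
        print(f"logical sector {logical} -> physical sector {physical}, offset {offset}")
```

With these sizes, eight logical sectors share each physical sector, which is why logical sectors 0 through 7 all land in physical sector 0.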

Why am I talking about sector size again? I mean, the last episode already explained this, right?

Because I want to draw a clear distinction between sector sizes which are used by the hard drive and cluster sizes which are used by the filesystem. You might also hear clusters referred to as allocation units. Or some filesystems might use blocks.

Sectors are important because that’s the size of data that the hard drive itself works with. If you want to change even one byte of data on a hard drive, you need to figure out which sector holds that byte on the hard drive and then read the entire sector. Once you have all the bytes from the sector in memory, then you can change the byte and write the whole sector back to the hard drive.
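The read, change, and write back steps just described can be sketched like this. ToyDisk and change_byte are hypothetical names, and the disk is just a block of memory that only allows whole-sector reads and writes. It’s an illustration of the idea, not how any real driver is written.

```python
SECTOR_SIZE = 512  # bytes per sector, the common size mentioned earlier


class ToyDisk:
    """A pretend disk that only reads and writes whole sectors."""

    def __init__(self, sector_count):
        self._data = bytearray(sector_count * SECTOR_SIZE)

    def read_sector(self, sector):
        start = sector * SECTOR_SIZE
        return bytearray(self._data[start:start + SECTOR_SIZE])

    def write_sector(self, sector, contents):
        if len(contents) != SECTOR_SIZE:
            raise ValueError("a sector must be written as a whole")
        start = sector * SECTOR_SIZE
        self._data[start:start + SECTOR_SIZE] = contents


def change_byte(disk, byte_position, new_value):
    """Change one byte by reading, modifying, and rewriting its whole sector."""
    sector, offset = divmod(byte_position, SECTOR_SIZE)
    contents = disk.read_sector(sector)   # read the entire sector
    contents[offset] = new_value          # change the one byte in memory
    disk.write_sector(sector, contents)   # write the whole sector back
    return sector


if __name__ == "__main__":
    disk = ToyDisk(sector_count=4)
    touched = change_byte(disk, byte_position=700, new_value=0x41)
    print(f"byte 700 lives in sector {touched}")
```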

Filesystems work with clusters or allocation units or blocks as a way to map between the hard drives and the operating system. This will determine the maximum capacity of the filesystem as well as how big any single file can be. It also has an effect on the efficiency of the filesystem and can lead to wasted space.

All filesystems are different. So the idea of a cluster might not exist in all of them. It is a useful idea because it allows a filesystem to keep track of files in its own manner no matter how the actual disk is organized.

You might find that the cluster size changes by default as the size of the filesystem changes and you might be able to control this.

Smaller cluster sizes mean more work for the filesystem to keep track of but allow the total used space on the disk to closely match the size of a file. Larger cluster sizes might be faster but waste more space.

Let’s say you have a cluster size of 512 bytes and need to store a file somewhere around 750 bytes long. You’ll need two clusters and the second cluster will only be about half full. You’ll waste about 270 bytes, because while the file itself is 750 bytes long, it needs 2 clusters, or 1024 bytes of disk space.

Now, let’s take that same example with a cluster size of 4096 bytes. The same 750 byte file will fit in a single cluster now but will waste over 3000 bytes.
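If you want to check the numbers in these two examples yourself, here’s a small Python sketch. The function names are made up for illustration, and it assumes a file always gets a whole number of clusters.

```python
def clusters_needed(file_size, cluster_size):
    """Whole clusters needed to hold file_size bytes, rounding up."""
    return -(-file_size // cluster_size)  # ceiling division


def wasted_bytes(file_size, cluster_size):
    """Bytes allocated to the file on disk but not used by its contents."""
    allocated = clusters_needed(file_size, cluster_size) * cluster_size
    return allocated - file_size


if __name__ == "__main__":
    for cluster_size in (512, 4096):
        count = clusters_needed(750, cluster_size)
        waste = wasted_bytes(750, cluster_size)
        print(f"cluster size {cluster_size}: {count} cluster(s), {waste} bytes wasted")
```

For a file of exactly 750 bytes, this gives 2 clusters and 274 wasted bytes at 512 bytes per cluster, and 1 cluster and 3346 wasted bytes at 4096 bytes per cluster.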

Any unused bytes in a cluster are wasted because they belong to some file and can’t be shared with other files.

At least not in most filesystems. It could be possible to design a filesystem that would reuse wasted space. This would be a more advanced cluster system where some clusters could be broken into smaller units. I’m not aware that the filesystems I’m familiar with will do this.

No matter how big the cluster size is, some of your files will need more than one. Filesystems keep track of clusters and know which clusters belong to which files. This record keeping takes some space as well and filesystems will have a maximum number of clusters that can be tracked for any given file. Once you reach that limit, you just can’t grow the file anymore and you’ll start getting errors from the filesystem that you’ve reached the limit.
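Here’s a toy model of that record keeping in Python. ToyFile, FileTooLargeError, and the tiny limit of 4 clusters are all invented for illustration, and real filesystems track clusters in very different ways. The only point is to show a file growing cluster by cluster until the record keeping can’t hold any more.

```python
import itertools

_free_clusters = itertools.count()  # pretend supply of unused cluster numbers


class FileTooLargeError(Exception):
    """Raised when a file cannot be given any more clusters."""


class ToyFile:
    def __init__(self, cluster_size, max_clusters):
        self.cluster_size = cluster_size
        self.max_clusters = max_clusters
        self.clusters = []  # record keeping: which clusters belong to this file
        self.size = 0

    def grow(self, byte_count):
        """Add byte_count bytes, claiming more clusters only when needed."""
        new_size = self.size + byte_count
        needed = -(-new_size // self.cluster_size)  # ceiling division
        if needed > self.max_clusters:
            raise FileTooLargeError("no room to track more clusters for this file")
        while len(self.clusters) < needed:
            self.clusters.append(next(_free_clusters))
        self.size = new_size


if __name__ == "__main__":
    f = ToyFile(cluster_size=512, max_clusters=4)
    f.grow(750)
    print(f"{f.size} bytes use clusters {f.clusters}")
    try:
        f.grow(2000)
    except FileTooLargeError as error:
        print(f"could not grow the file: {error}")
```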

Think of it like an expandable box that you can fill with papers. The box can only be expanded to a maximum size. Each paper can hold a certain amount of text. The papers are like clusters. And the box is like the record keeping system that can only hold so many papers before it’s stuffed so full that you can’t fit any more.

Depending on which filesystem you have, the boxes will be able to hold more or fewer papers. Usually, older filesystems are the ones with limits you might run into. Back when those old filesystems were young, the boxes seemed big enough that nobody would ever notice the limit. But as time passed and we started using computers to hold more information, we realized that we needed more modern filesystems. Who knows? Even today’s filesystems might seem small and restrictive in a few years.

You’re most likely to run into this limit with a filesystem called FAT32 which has a maximum file size of 4 GB.

Why would you want an older filesystem with limits if newer ones have higher limits that you’re unlikely to notice? Because the older filesystems such as FAT32 have been around long enough that they’re well supported on many different platforms. So if you want the most portability, then an older filesystem is going to be better.

Other than just a limit on how big files can be, there’s also a limit on how big the filesystem can be on any given partition. Again, older filesystems will have limits you might run into while newer ones are big enough that you probably don’t care. The limit for FAT32 is usually given as 8 TB, and it can be as low as 2 TB on drives that report 512-byte sectors. You might start running into this if you get a new hard drive.
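The FAT32 numbers can be worked out with a little arithmetic. This Python sketch rests on a few assumptions that aren’t covered in this episode: that FAT32 uses 28 bits for cluster numbers, that it records each file’s size in a 32-bit field, and that the volume uses 32 KiB clusters. It also ignores the handful of reserved cluster numbers, so take the result as a rough upper bound.

```python
CLUSTER_NUMBER_BITS = 28   # assumption: FAT32 uses 28 bits to number clusters
FILE_SIZE_FIELD_BITS = 32  # assumption: FAT32 stores each file size in 32 bits
CLUSTER_SIZE = 32 * 1024   # assumption: 32 KiB clusters


def max_volume_bytes(cluster_number_bits, cluster_size):
    """Rough upper bound: every possible cluster number times the cluster size."""
    return (2 ** cluster_number_bits) * cluster_size


def max_file_bytes(size_field_bits):
    """Largest file size that fits in a size field of the given width."""
    return 2 ** size_field_bits - 1


if __name__ == "__main__":
    tib = 1024 ** 4
    gib = 1024 ** 3
    volume = max_volume_bytes(CLUSTER_NUMBER_BITS, CLUSTER_SIZE)
    largest_file = max_file_bytes(FILE_SIZE_FIELD_BITS)
    print(f"volume limit: about {volume / tib:.0f} TiB")
    print(f"file limit: just under {(largest_file + 1) / gib:.0f} GiB")
```

Notice that the volume limit depends on the cluster size you pick. Smaller clusters give a smaller maximum volume.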

The last thing to consider is how clusters line up with sectors.

If they’re both the same size, then when a filesystem needs one cluster, it gets a single sector on the disk.

If the clusters are smaller than the disk sectors, then reading or writing to a cluster will still require the disk to read and write a full sector. You can end up going to the disk again for the same sector when it comes time to work with the next cluster. This can really slow things down because hard drives are so much slower than main memory. Reading and writing the same sector once to work with some data at the beginning and then again to work with some data at the end is wasteful.

If the clusters are bigger than the disk sectors, then reading or writing will involve multiple sector reads and writes. But at least each sector will only need to be read or written once.
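These three cases can be seen with a short Python sketch. The function name sectors_for_cluster is made up for illustration, and it assumes the filesystem starts right at the first byte of the disk so clusters and sectors line up at zero.

```python
def sectors_for_cluster(cluster, cluster_size, sector_size):
    """Return the sector numbers that hold any byte of the given cluster."""
    first_byte = cluster * cluster_size
    last_byte = first_byte + cluster_size - 1
    first_sector = first_byte // sector_size
    last_sector = last_byte // sector_size
    return list(range(first_sector, last_sector + 1))


if __name__ == "__main__":
    # Same size: one cluster is one sector.
    print(sectors_for_cluster(3, cluster_size=512, sector_size=512))
    # Clusters smaller than sectors: clusters 0 and 1 share sector 0.
    print(sectors_for_cluster(0, cluster_size=256, sector_size=512))
    print(sectors_for_cluster(1, cluster_size=256, sector_size=512))
    # Clusters bigger than sectors: one cluster spans eight sectors.
    print(sectors_for_cluster(1, cluster_size=4096, sector_size=512))
```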