186: Git: Keep Track Of Your Files As They Change. Part 1.

Programming involves change and managing that change is the only way to make sense of it.

It’s more than history. As your projects get bigger, and especially when you have a team of software developers all working on the same project files, a version control system is absolutely critical. This is called either version control or source code control. Both mean the same thing.

How does it work? And how can you use it? That’s what I’ll begin explaining in this episode. Specifically, I’ll be discussing my favorite system called Git. We won’t actually discuss Git in this episode. There’s a bit of background material to explain first about the differences between text files and binary files.

Listen to the full episode to learn more about the differences between text files and binary files. You can also read the full transcript below.

Transcript

How does it work? And how can you use it? That’s what I’ll begin explaining in this episode. Specifically, I’ll be discussing my favorite system called Git. That’s spelled G i t. We won’t actually discuss Git in this episode. There’s a bit of background material to explain first about the differences between text files and binary files.

I’ve mentioned version control many times in previous episodes and you might want to listen to episode 142 about comments and episodes 166 through 169 which are a guide to computer programming.

Before I explain how to use a version control system, let’s look at how you might keep track of files yourself. I’ve done this before and you’ve probably done similar things.

If you only have one file, it’s a little easier but still a chore. Maybe you have a budget that you save to a file on your computer. If you keep updating the file each month, then either the previous month’s information will get deleted or changed, or, if you want to keep all that information, the file will grow until it gets too large to work with. Then you’ll face the same decision to start deleting old information from past months.

Instead of either of these two options, maybe you decide to start saving the budget with a new name each month. This keeps your file size small and lets you keep as many previous months as you want. But soon you find yourself scrolling through all these files. Your directory starts filling up. So you move some of the old files into an archive folder.

This is really what a version control system does for you. Only it does it better by managing all the work for you, keeping even smaller files archived so you use less space, and letting you compare changes.

One thing you should be aware of is that any version control system works great with text files. So you can use it to manage more than just source code. Binary files are a bit different.

But wait a minute, aren’t computers based on binary? Yes, and a text file is composed of binary bits just like any file. What makes a file a text file is that it contains sequences of binary codes that have meaning separated every now and then by new lines. All that means is that a text file contains characters that you typed split into one or more lines.

Each character that you type could be stored in the text file differently depending on how it’s encoded. And it gets more complicated if the text contains something like Chinese characters. Listen to episode 107 about chars and bytes and episodes 114 through 116 about strings for more information. The encoding doesn’t really matter here. The bytes still represent the characters you typed no matter how they’re stored.
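
If you want to see this for yourself, here’s a small Python sketch. The sample text and the two encodings are just my choices for illustration. It encodes the same characters two ways and shows that the bytes differ while the characters you get back are the same.

```python
# The same characters stored with two different encodings.
# The sample text and the choice of encodings are illustrative assumptions.
text = "Git 好"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Print the raw bytes in hexadecimal so the difference is visible.
print(utf8_bytes.hex(" "))
print(utf16_bytes.hex(" "))

# Decoding with the matching encoding gives back the characters you typed.
decoded_from_utf8 = utf8_bytes.decode("utf-8")
decoded_from_utf16 = utf16_bytes.decode("utf-16-le")
print(decoded_from_utf8 == decoded_from_utf16 == text)
```

The byte counts are different because the Chinese character takes three bytes in one encoding and two in the other, while the plain letters go from one byte each to two.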

The new line character can also be different depending on what operating system you’re using. Windows computers actually insert two characters when you press the enter key. The first is a carriage return and the second is a new line. This goes back to old typewriters that had a carriage that would move left as you typed. The keys would cause strikers to flip up and hit a ribbon covered in ink which then pressed the image of the letter into the paper. Both the carriage and the ink ribbon would advance because each striker was designed to hit the same spot each time.

The strikers could sometimes get tangled up if you pressed the keys too fast. If you’ve ever wondered why the keys on your keyboard are arranged the way they are, the story usually told is that the original typewriters purposely mixed up the keys to slow down the typists and keep the strikers from jamming. The keyboard arrangement most often used is called QWERTY because that’s what the first few keys spell in the top row of letters. Another popular story is that the salespeople who were trying to sell typewriters would sell more of them if they could show how fast they were, so all the keys needed to spell the word “typewriter” were placed in the top row. It made for a good demonstration and the QWERTY format eventually became the standard.

Okay, that was a fun detour. On the old typewriters, once the carriage moved all the way to the left, which meant that the strikers were hitting the page as far on the right as possible, the typist would need to slide the carriage all the way to the right again. This was a carriage return. But if that’s all that was done, then any new typing would just start appearing on top of what was just typed. So the roller on the carriage had to be turned as well to move the paper up. This would cause new typed characters to appear on the next line.

Because the old typewriters needed two separate actions to start a new line of text, Windows followed this and adds both a carriage return and a new line character to your text files. You don’t actually see these individual carriage return or new line characters when using an application to create and edit some text. The application hides them from you and just displays text on separate lines. It’s just like how you don’t see any characters left on a hand-typed page from when the typist had to move the carriage. All you see are the typed characters appearing where they should.

The Unix operating system took a different approach and decided that electronic files really only need a single hidden character to represent a new line of text and uses just a single new line character. Linux and Mac computers also follow this pattern.
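
Here’s a minimal Python sketch of the difference. The helper name count_line_endings and the sample text are made up for this example. It counts how many lines end with the Windows pair and how many end with just a new line.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows-style (\\r\\n) and Unix-style (\\n alone) line endings.

    This is a hypothetical helper for illustration, not part of any tool.
    """
    crlf = data.count(b"\r\n")
    # Every \r\n also contains a \n, so subtract those to get the lone ones.
    lf_only = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf_only}


# The same two lines of text as each system would typically store them.
windows_bytes = b"first line\r\nsecond line\r\n"
unix_bytes = b"first line\nsecond line\n"

print(count_line_endings(windows_bytes))
print(count_line_endings(unix_bytes))

# The Windows version is longer by one extra byte per line.
print(len(windows_bytes) - len(unix_bytes))
```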

You shouldn’t confuse new lines with word wrapping. On some editors, when you type beyond the width of the screen, it just scrolls over to make more room. You can type an entire book on a single line if you wanted by just never hitting the enter key to start a new line. On other editors or with the word wrapping feature enabled, the editor will rearrange your text so that it always fits on the screen. At least it fits across the screen. You’ll still need to scroll up and down. The line breaks that the editor places for you are called soft line breaks. That’s different from a hard line break, which you get when you press the enter key and insert a new line. A hard line break will always form a new line no matter what word wrapping does.

You can tell if word wrapping is enabled by changing the width of the editor. If the text rearranges itself to fit the width, then you have word wrapping enabled. Soft line breaks are computed by the editor based on the width of the editing window, and normally they’re not stored as characters in the file. When you save this document to a file, though, it’s up to the editor if it’ll replace the soft line breaks with new line characters.
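
To make the idea concrete, here’s a rough Python sketch of what an editor might do. It uses the standard textwrap module, and the soft_wrap helper and the two widths are my own assumptions. The stored text is a single line, and the display lines are computed fresh for whatever width you ask for.

```python
import textwrap

# One hard line: there is no new line character anywhere in this text.
paragraph = "You can type an entire book on a single line if you never press the enter key."


def soft_wrap(text: str, width: int) -> list:
    """Return the lines to display at the given width.

    A hypothetical stand-in for an editor's word wrapping. The text
    itself is never modified.
    """
    return textwrap.wrap(text, width=width)


narrow = soft_wrap(paragraph, 20)
wide = soft_wrap(paragraph, 40)

print(len(narrow), "display lines at width 20")
print(len(wide), "display lines at width 40")

# The stored text still has no new line characters in it.
print("\n" in paragraph)
```

Changing the width changes how many lines you see, but the text that would be saved stays exactly the same.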

Some editors might seem like text editors but they’re really word processors. You can tell a word processor because it’ll allow you to format your text with different sizes, colors, and fonts, add images, charts, and tables, and usually a lot more. All this extra formatting has to be stored somewhere and that means it goes in the same file. But you don’t actually type this extra formatting. At least not usually. There are simple markup techniques that can provide similar results and these are controlled by typing special codes into your text. The end result is that word processors may or may not save their files as text. Even if it is text, it’ll probably look nothing like what you see when you’re editing your document. This is why you shouldn’t use word processors when programming. And it’s also bad for version control systems.

Well, it’s not so much that it’s bad. It just means that files with all this extra stuff inside them will be harder to merge. Or maybe they can’t be merged at all. I’ll explain merging in part 2. And other file types such as that budget I was describing also fall into this category where extra stuff gets added to your file.

For now though, just know that merging and version control in general work line by line. If there’s a bunch of lines that you didn’t type yourself, then it can be hard to understand their meaning. And that means you won’t know how to merge them. It also means that a small change that you make directly could result in many additional changes made for you throughout the file.
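
Here’s a small Python sketch of a line by line comparison. It uses the standard difflib module, which is not what Git uses internally but works on the same idea. The budget contents and file names are made up for the example.

```python
import difflib

# Two versions of a small text file, one list entry per line.
old_lines = ["rent 1000\n", "food 300\n", "fuel 80\n"]
new_lines = ["rent 1000\n", "food 350\n", "fuel 80\n"]

# Compare the two versions line by line.
diff = list(
    difflib.unified_diff(
        old_lines, new_lines, fromfile="budget_old.txt", tofile="budget_new.txt"
    )
)

# Keep only the lines marked as removed (-) or added (+),
# skipping the two header lines that start with --- and +++.
changed = [
    line
    for line in diff
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]

print("".join(diff))
```

Only the food line shows up as changed. The other two lines are left alone, which is what makes line by line comparison so useful for text.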

If you have a text file that’s just a single long line, then it also defeats the purpose of the file being text. Because any change in the file will mean that the whole line has changed. It’s all or nothing. Any change anywhere in the line means that the version control system has to replace the old line with the new one when managing your versions.
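
You can see the all or nothing effect with a short Python sketch. Again this uses the standard difflib module rather than anything from Git, and the removed_lines helper and the sample data are assumptions for illustration. The same small change is made to a file with three lines and to a file with one long line.

```python
import difflib


def removed_lines(old_text: str, new_text: str) -> list:
    """Return the old lines that a line-based comparison reports as replaced.

    A hypothetical helper for this example. It assumes no line of the
    sample text starts with dashes, so it can skip the --- header line.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm=""
    )
    return [
        line[1:]
        for line in diff
        if line.startswith("-") and not line.startswith("---")
    ]


# The same budget, once as three lines and once as a single long line.
many_old = "rent 1000\nfood 300\nfuel 80\n"
many_new = "rent 1000\nfood 350\nfuel 80\n"
one_old = "rent 1000, food 300, fuel 80"
one_new = "rent 1000, food 350, fuel 80"

print(removed_lines(many_old, many_new))
print(removed_lines(one_old, one_new))
```

With three lines, only the food line is replaced. With one long line, the entire contents are replaced even though only two characters changed.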

This is why binary files are so troublesome. A binary file is one that can contain any binary value for any byte in the file. It’s also not something that you can just open and change byte values in without risking corruption. Files like executable files or images are binary. If you try to modify the contents by hand, then the application will likely crash the next time you run it or the image will fail to load properly. Because binary files can contain any value, that includes the byte values used for carriage returns and new lines. But in binary files, these values don’t have the same meaning. There’s no such thing as a new line in a binary file. They behave more like a text file with just a single long line. Except that you can still make some sense out of the contents of any text file. To a person reading it, a binary file really is just a bunch of ones and zeros.

You can store binary files in a version control system just like text files. But the version control system will treat each binary file as a whole unit that can’t be processed line by line.
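
How might a tool guess whether a file is binary? One common heuristic is to look for a zero byte near the start of the file, since ordinary text almost never contains one. Here’s a minimal Python sketch of that idea. The looks_binary name and the sample size are my assumptions, and real tools may decide differently.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Guess whether data is binary by checking for a zero byte.

    A hypothetical heuristic for illustration. The sample_size default
    is an assumption, and text stored as UTF-16 would be misjudged
    because it normally contains zero bytes.
    """
    return b"\x00" in data[:sample_size]


# Plain text: the bytes of a small budget file.
text_data = "rent 1000\nfood 300\n".encode("utf-8")

# The first bytes of a PNG image. Notice it contains \r and \n,
# but they do not mark the end of a line here.
png_start = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

print(looks_binary(text_data))
print(looks_binary(png_start))
```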
