188: Git: Keep Track Of Your Files As They Change. Part 3.

Programming involves change and managing that change is the only way to make sense of it. You’ll learn about branching and what it means to commit your changes in this episode.

Git is more than a system that allows you to check out a file, make some changes, and then check it in again. That’s how libraries work. And when the book you want to read is already checked out by somebody else, you have to wait. Libraries also treat each book separately.

Programming is different. First of all, the bits that make up a file can be copied as many times as you have hard drive space with no real limit to how many people can be working on a file at any time. And secondly, you’re most likely going to need to work with several files each time you make a change. For a large project and a large change, you might need to work with hundreds of files at a time.

Git can handle all of this and it encourages you to make smaller changes and record those changes often. This makes tracking changes easier. The last thing you want is for your version control system to have an initial commit and then six months later a giant commit with a comment that says everything is now ready to release version 1.0 of your software.

That’s exaggerating things quite a bit. And I’m also getting ahead of myself. Listen to the full episode to learn about branching, committing changes, and how you can use branches to manage your work. You can also read the full transcript below.

Transcript

You’ll learn about branching and what it means to commit your changes in this episode.

Git can handle all of this but it encourages you to make smaller changes and record those changes often. This makes tracking changes easier. The last thing you want is for your version control system to have an initial commit and then six months later a giant commit with a comment that says everything is now ready to release version 1.0 of your software.

That’s exaggerating things quite a bit. And I’m also getting ahead of myself. Let’s go back to the beginning.

I’ve worked with other version control systems before that would allow you to check out a file to make changes. The system would actually change every file in your project to be read-only and make you check out a file before it would allow you to save your changes.

Many editors were perfectly happy to allow you to open these read-only files and even make changes to the file. I would sometimes forget to check out the file first and would be reminded when I tried to save my changes. That’s when the editor would complain that it couldn’t write the changes to a read-only file on disk.

That’s also when I would switch over to the version control system to check out the file. As long as nobody else had changed the file, this was okay. But if somebody else had changed the file, then when I would check it out, my local copy of the file on disk would be updated with the latest changes. But those changes would not be included in what my editor was ready to write back to disk. Because I had no way to know what the changes were or how they might conflict with what I was about to write, I would usually choose to abandon my work and start over again. That’s not fun.

Sometimes, if I had a lot of changes, I would choose to save the changes I had made to a different file name so that I could compare the file that I just saved with the updated file. This is where a lot of people have bad experiences with merging changes. Because we’re trying to compare two similar but unrelated files. There’s no common history between the files to know what changed in each of them. Listen to the previous episode about merging for more information.

I was very glad when my editor finally added an option to prevent changes being made to read-only files. With this option set, if I forgot to check out a file, the moment I typed a single new character, the editor would display a popup window letting me know that the file was read-only. That would make me go to the version control system and check out the file. If there had been any other changes to the file, then my local copy would be updated and my editor would ask if I wanted to reload the file. Yes, please, reload the file. This made sure that I would always be working with the latest version before even beginning to make any of my own changes.

But what if somebody else had already checked out the file and was still not done? Or worse, that person was on vacation and wouldn’t be back to work for another week. The only option then was to go find an administrator and ask them to forcibly remove the lock held on the file because the other person had the file checked out. I’m sure you can imagine how that other person would feel coming back from vacation with a mess to clean up.

Now you might think, this won’t happen to me. I work on a small team or even by myself. I don’t need to worry about locks held on checked-out files. All I can say is good luck. The next time you’re in the middle of a large change and have an idea for an unrelated smaller change, you won’t be able to switch back, make the small change, and then continue again with your larger change. You’ll be blocking yourself.

Also notice how, throughout this entire discussion about read-only files and locks, the focus has been on individual files. There was no concept of locking a group of files. Sure, several files could be checked out and locked at the same time. But that created separate locks on each file.

The real problem is that the whole concept of checking out files, locking them, making your changes, and then checking them in again to release the locks, is a broken design. It doesn’t work.

Git takes a completely different approach. There’s still a concept of a checkout in Git but it has a completely different meaning. First of all, you don’t check out files in Git. You check out a branch. That means you need to have something already in Git. So before we go any further with branching, let me first explain how you get files into Git.

Let’s start with a brand new repository. A repository is what Git calls the collection of all the files and their history managed by Git. Normally, this is just called a repo. Git knows what’s already inside a repo. So you don’t need to tell it ahead of time that you want to make changes. Maybe you want to change a file already in the repo or add or delete a file. Just make your changes and Git will figure out what’s been done.

With a brand new repo, all you need to do is create some files with whatever source code you want in them and save them to disk. The files can be in the root folder of the repo or in subfolders. It doesn’t matter. Git will find them. The only requirement is that the files do need to exist inside the directory structure where the repo was first created. That just means that you have some folder on your hard drive where you want your project to live. This is the root folder of the repo. It’s not the root folder of your entire file system. From this top level folder where the repo was created, you can create other folders inside to organize your project files. Any file stored anywhere under the repo folder will be found and can be included.

I say that the files can be included because you still need to tell Git which files you actually want it to manage. This way, if you have some temporary file, it won’t be included in your project history unless you want it to.

Okay, once you have some files, you tell Git that you want to add them to the repo. You can add individual files or an entire folder which will add all the files in that folder. There needs to be at least one file in a folder though.

Until you tell Git that you want to add a file, it will show up as an untracked file. Git knows these extra files exist but it won’t do anything with them until you tell Git to add them to the repo.

Just adding the files is not enough. You need to commit your changes. Notice that you don’t check in your files. There’s no concept of checking in anything in Git. You commit changes instead. Your files remain as they are after committing the changes. Git doesn’t change them in any way. All Git does is make a copy of the new files inside a special hidden folder named .git. This folder exists in the root folder of your repo. And it’s the presence of this hidden .git folder that makes a folder a Git repository.
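Here is a minimal sketch of that add-then-commit flow as a toy model in Python. It is not how Git is implemented. The names ToyRepo, added, untracked, add, and commit are made up for illustration, and the working folder is represented as a plain dictionary of file names to contents.

```python
from typing import Dict, List


class ToyRepo:
    """Toy model: files are untracked until added, and saved only on commit."""

    def __init__(self) -> None:
        self.added: Dict[str, str] = {}           # files added but not yet committed
        self.commits: List[Dict[str, str]] = []   # each commit is a full snapshot

    def tracked(self) -> Dict[str, str]:
        """Files recorded in the most recent commit (empty for a new repo)."""
        return self.commits[-1] if self.commits else {}

    def untracked(self, folder: Dict[str, str]) -> List[str]:
        """Files in the folder that the repo has not been told about."""
        known = set(self.tracked()) | set(self.added)
        return sorted(name for name in folder if name not in known)

    def add(self, folder: Dict[str, str], name: str) -> None:
        """Tell the repo to include this file in the next commit."""
        self.added[name] = folder[name]

    def commit(self) -> None:
        """Record a copy of everything added; the folder itself is not touched."""
        self.commits.append({**self.tracked(), **self.added})
        self.added = {}


folder = {"main.c": "int main(){}", "scratch.tmp": "junk"}
repo = ToyRepo()
print(repo.untracked(folder))   # ['main.c', 'scratch.tmp']
repo.add(folder, "main.c")
repo.commit()
print(repo.untracked(folder))   # ['scratch.tmp']
```

The temporary file stays out of the history because it was never added, and the folder dictionary is the same after the commit as before it.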

Your commit is also treated as a whole unit. Each file that was added or changed and committed at the same time will be included in the commit. Git knows which files belong together because you included them together in a single commit.

If you then change a file, or add a new file, or even delete a file that was previously committed, Git will allow you to commit these changes as another whole unit of change associated with its own commit. Each commit you make builds on previous commits so you can trace your changes back to the very beginning.
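To picture how each commit builds on the one before it, here is a small sketch in Python. It is a toy model under my own assumptions, not Git's actual storage format. The Commit class, its fields, and the history function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """A toy commit: a message, a snapshot of files, and a link to its parent."""
    message: str
    files: Dict[str, str]               # file name -> contents at this commit
    parent: Optional["Commit"] = None   # None marks the very first commit


def history(commit: Optional[Commit]) -> List[str]:
    """Walk parent links from a commit back to the very beginning."""
    messages = []
    while commit is not None:
        messages.append(commit.message)
        commit = commit.parent
    return messages


first = Commit("Add README", {"README.txt": "hello"})
second = Commit("Add main program",
                {"README.txt": "hello", "main.c": "int main(){}"},
                parent=first)
third = Commit("Remove README", {"main.c": "int main(){}"}, parent=second)

print(history(third))
# ['Remove README', 'Add main program', 'Add README']
```

Each commit here holds the whole set of files that belong together, and the parent link is what lets you trace the changes back to the first commit.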

When you create a new repo, you also get something called a branch created for you. This branch is called master. You can create and delete other branches whenever you need to.

I’ve worked with other version control systems that also have branching but their branches are heavy items that can duplicate all your files in each branch. It’s almost like having separate repositories for each branch. In these systems, branching is expensive and not used very much. At least it’s used very carefully.

Git encourages you to create as many branches as you want. Because in Git a branch is nothing more than a label attached to a commit id. Each time you commit your changes, Git creates a new identification number to be able to uniquely tell one commit from another. These numbers are very long and consist of some letters as well as numbers. You can work directly with commit ids if you want. And you don’t even need to specify the full commit id. Usually, just the first seven characters in the commit id are enough to tell one commit from another.
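The idea of using just the beginning of a long id can be sketched like this in Python. This is only an illustration. The make_id and resolve functions are hypothetical, and the ids are made by hashing arbitrary text, which stands in for the way real Git derives an id by hashing the contents of a commit.

```python
import hashlib
from typing import List


def make_id(text: str) -> str:
    """Hash some text into a 40-character hexadecimal id (toy stand-in)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def resolve(prefix: str, known_ids: List[str]) -> str:
    """Return the one full id that starts with the given prefix."""
    matches = [full for full in known_ids if full.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} ids, need exactly 1")
    return matches[0]


ids = [make_id(f"toy commit {n}") for n in range(3)]
short = ids[1][:7]                       # first seven characters only
print(short, "->", resolve(short, ids))
```

A short prefix works as long as only one known id starts with it. If two ids shared the same prefix, you would need to give more characters.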

While Git might be happy to work with these commit ids, for us humans, they look like nothing more than a long string of random letters. We want something more friendly to work with. And that’s where branches come in.

When you started out with that empty repo and made your first few commits, each time you created a new commit, Git looked to see what branch you were currently using. That would be master in this case. Each time you created a new commit, Git updated the master branch to point to the new commit. All it really did was move the label to point to the new commit id.

If you want to go back to an earlier commit, you check out that commit. This is where Git uses the concept of a checkout. Checking out a commit doesn’t lock anything. All it does is put your files back to whatever they were at the time of that commit you’re checking out. This can add or remove files too if needed to get back to what your files looked like at the time of that commit. Now, if you have any untracked files, Git will leave those alone. It’s not going to delete a file that you’ve been working on and haven’t yet committed.
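Here is a rough sketch in Python of what checking out an earlier commit does to your folder. It is a simplified toy model, not Git's real behavior in every case. The checkout function and the dictionaries standing in for snapshots and the working folder are assumptions made for this example, and it ignores the case where you have uncommitted edits to tracked files.

```python
from typing import Dict


def checkout(workdir: Dict[str, str],
             old: Dict[str, str],
             new: Dict[str, str]) -> Dict[str, str]:
    """Return the working folder after switching from snapshot old to snapshot new."""
    result = dict(workdir)
    for name in old:
        if name not in new:
            result.pop(name, None)   # this file did not exist at the target commit
    result.update(new)               # put files back the way they were at that commit
    return result


earlier = {"main.c": "v1"}
latest = {"main.c": "v2", "util.c": "v1"}
# notes.txt was never added, so it is untracked
folder = {"main.c": "v2", "util.c": "v1", "notes.txt": "scratch"}

print(checkout(folder, latest, earlier))
# {'main.c': 'v1', 'notes.txt': 'scratch'}
```

The tracked file that did not exist yet is removed, the changed file goes back to its earlier contents, and the untracked notes file is left alone.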

Once you’re back at some earlier commit id, you can create a new branch with the checkout command. Or you can go back to that earlier commit id and create a new branch at the same time. The point is, you can create as many branches as you want all pointing to the same commit id. Git doesn’t care. Whenever you make a new commit, Git will create a new commit id to identify the new commit and will advance the current branch label to point to the new commit. Any other labels will remain pointing at the same commit id and won’t be advanced.
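The way branch labels move can be sketched with a toy model in Python. This is an illustration under my own assumptions, not Git's implementation. ToyRepo and its methods are hypothetical, and the commit ids c1, c2, and c3 are made-up short names standing in for real commit ids.

```python
from typing import Dict, Optional


class ToyRepo:
    """Toy model: a branch is just a label that points at a commit id."""

    def __init__(self) -> None:
        self.parents: Dict[str, Optional[str]] = {}                 # commit id -> parent id
        self.branches: Dict[str, Optional[str]] = {"master": None}  # label -> commit id
        self.current = "master"                                     # checked-out branch

    def commit(self) -> str:
        new_id = f"c{len(self.parents) + 1}"   # hypothetical ids: c1, c2, ...
        self.parents[new_id] = self.branches[self.current]
        self.branches[self.current] = new_id   # only the current label moves
        return new_id

    def branch(self, name: str) -> None:
        """Create a new label pointing at the same commit as the current branch."""
        self.branches[name] = self.branches[self.current]

    def checkout(self, name: str) -> None:
        self.current = name


repo = ToyRepo()
repo.commit()              # c1, master moves to c1
repo.commit()              # c2, master moves to c2
repo.branch("feature")     # feature and master both point to c2
repo.checkout("feature")
repo.commit()              # c3, only feature moves
print(repo.branches)       # {'master': 'c2', 'feature': 'c3'}
```

Creating the branch cost nothing more than one extra entry in a dictionary, which is the reason a label-based branch can be so cheap.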

So how should you normally use branches? Well, you start off with a branch called master that will advance with each commit you make to point to the new commit. It’s a good idea to let master point to the last or most current release of your software. Anytime you want to make changes, first check out master to get the most recently released changes. And then before making your changes, create a new branch and check out that branch. This won’t change any of your files because the new branch and master will both point to the same commit id. The difference is that now when you make changes and commit them, only your new branch label will advance to the new commits. The master branch will remain pointing to the same commit id until you decide to move it.

You can work out of your new branch making changes and commits as often as you want. Once you’re happy with your changes and want to include them in the master branch, just check out the master branch again. All your changes will go away this time. Don’t worry, they’re not lost. But checking out the master branch puts all the files back the way they were when that commit id was created. This might even remove some of your files. Or it could put some files back which you deleted in your own branch.

Now, once you’re back in the master branch, you tell Git to merge in all your changes from your own branch. This is where Git really shows its power. Because it will merge all the changes you made in your own branch with any changes that were made to master since you last branched off of master to begin your own changes. This works even if other people have made changes to master since you began working on your own changes. All of these changes across all the files involved will be put together line by line and a new commit id will be created. Listen to the previous episode about merging for more information. This is merging on a larger scale than just a single file. And just like always whenever you make a new commit, Git will advance the current branch label to point to the new commit. That means that the master branch will move up to include all the changes you just merged.
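The reason a common starting point matters for merging can be sketched in Python. This is a deliberately simplified toy. It compares whole files rather than going line by line, and the merge function and the dictionaries standing in for snapshots are assumptions made for this example, not Git's algorithm.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]   # file name -> contents


def merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Combine two sets of changes that both started from the same base snapshot."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            choice = o              # both sides agree
        elif o == b:
            choice = t              # only their side changed this file
        elif t == b:
            choice = o              # only our side changed this file
        else:
            conflicts.append(name)  # both sides changed it differently
            continue
        if choice is not None:      # None means the file was deleted
            merged[name] = choice
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1"}                      # where the branch started
master = {"a.txt": "2", "b.txt": "1"}                    # someone changed a.txt
mine = {"a.txt": "1", "b.txt": "1", "c.txt": "new"}      # my branch added c.txt

print(merge(base, master, mine))
# ({'a.txt': '2', 'b.txt': '1', 'c.txt': 'new'}, [])
```

Because both sides are compared against the same base, the toy can tell which side actually changed each file. Without that shared history, it would only see two different files and have no way to choose.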

You can then delete your own branch that you created for this work. All those commits you made still exist inside Git. Deleting a branch doesn’t remove any commits. It just removes a label that was pointing to some commit. Or you can leave your branch as is and check it out again. Or create an entirely new branch. Just remember that any new commit you make will advance the current branch to point to the new commit. By checking out a branch other than master, you allow master to remain pointing at a known location while you work on changes that might not be ready to be released yet.

One final thing I want to explain is this. It’s possible and sometimes necessary to check out a commit id directly. You need to do this anytime there’s no branch currently pointing to that commit id. You can also tag commits which creates a friendly label for a commit id. Tags are different than branches because tags always remain pointing to the same commit id. They don’t change. The current branch will advance to point to a new commit id. But if you either check out a commit id directly or check out a commit id through its friendly tag name, then you’ll find yourself in a state where there is no current branch. Git calls this a detached head. It’s not as bad as the name sounds. You can continue making commits as normal in a detached head state. Each commit is perfectly happy to keep track of all the changes made in that commit as well as the previous commit that those changes are based on. At any time when in a detached head state, you can create a new branch. All this does is create a branch label pointing to the current commit id. Once you have a branch, everything goes back to the way I described earlier.
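A detached head can be pictured with one more toy model in Python. As before, this is a sketch under my own assumptions and not how Git is written. ToyRepo, head_branch, head_commit, and the ids c1 through c4 are hypothetical names, and a head_branch of None is how this toy represents having no current branch.

```python
from typing import Dict, Optional


class ToyRepo:
    """Toy model of branches, tags, and a head that may have no current branch."""

    def __init__(self) -> None:
        self.parents: Dict[str, Optional[str]] = {"c1": None}   # commit id -> parent id
        self.branches: Dict[str, str] = {"master": "c1"}        # labels that move
        self.tags: Dict[str, str] = {}                          # labels that never move
        self.head_branch: Optional[str] = "master"              # None means detached head
        self.head_commit = "c1"

    def checkout(self, name: str) -> None:
        if name in self.branches:
            self.head_branch, self.head_commit = name, self.branches[name]
        else:   # a tag name or a commit id: there is no current branch
            self.head_branch, self.head_commit = None, self.tags.get(name, name)

    def commit(self) -> str:
        new_id = f"c{len(self.parents) + 1}"
        self.parents[new_id] = self.head_commit
        self.head_commit = new_id
        if self.head_branch is not None:
            self.branches[self.head_branch] = new_id   # advance the current branch
        return new_id

    def branch(self, name: str) -> None:
        """Create a label pointing at the current commit."""
        self.branches[name] = self.head_commit


repo = ToyRepo()
repo.tags["v1.0"] = "c1"
repo.commit()               # c2, master moves to c2
repo.checkout("v1.0")       # detached head at c1
repo.commit()               # c3 is based on c1, but no label moves
repo.branch("hotfix")       # now a label points at c3
repo.checkout("hotfix")
repo.commit()               # c4, hotfix moves to c4
print(repo.branches, repo.tags)
# {'master': 'c2', 'hotfix': 'c4'} {'v1.0': 'c1'}
```

The commit made in the detached state still remembers its parent. It just had no label following it until the new branch was created.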
