
Programming involves change, and managing that change is the only way to make sense of it. You’ll learn about submodules in this episode and how they enable you to reference code from other repositories.

There’s a concept in programming called refactoring. I could spend several episodes on this one topic and will in future episodes. For now though, I’ll just say that you normally don’t want to repeat code. In other words, you don’t want similar or even the same source code spread out in multiple places in your project. That’s because if you ever need to change this code or fix a bug, then you need to make sure to change each location. It’s too easy to miss places.

What you need is a way to use code from another repo inside your own repo. And Git has two options available for you. Listen to the full episode or read the full transcript below to learn about Git subtrees and Git submodules and why I prefer submodules.

Transcript

You’ll learn about submodules in this episode and how they enable you to reference code from other repositories.

There’s a concept in programming called refactoring. I could spend several episodes on this one topic and will in future episodes. For now though, I’ll just say that you normally don’t want to repeat code. In other words, you don’t want similar or even the same source code spread out in multiple places in your project. That’s because if you ever need to change this code or fix a bug, then you need to make sure to change each location. It’s too easy to miss places.

If instead that code was in a single place, and exposed as a class or even just a single method, then it can still be used in multiple places. But the code itself is now in a single place. Any changes or bug fixes can be made directly to the one spot.

That’s a great way to design code to be reused throughout your project. It can be so useful that you might find yourself wanting to use the code in other projects too. Then what do you do?

One option is to just copy the code into your new project and then make sure that it’s used from just that single location. You’ll soon realize that all you’ve done is take the same old problem and just made it bigger. You still have to remember all the other projects that might be using your code when it comes time to fix a bug.

Another option is to just put all your projects in one big Git repository. That’s not a very good idea either. Maybe the code you want to reuse isn’t even your code. Maybe it already exists in another Git repository and you want to be able to benefit from any enhancements or bug fixes done to the original code. You don’t want to be copying it to your repository even if the license says that you can.

What you need is a way to use code from another repo inside your own repo. And Git has two options available for you. I’m only going to briefly mention the first option and really should mention this option last because I’m not a big fan of using subtrees. The only reason to mention subtrees first is so we can move past it quickly.

With subtrees, Git allows you to copy the code from another repo and include the files and folders as if they were part of your own repo. Any commits you make can include changes to your files as well as the files in the code that you’re reusing. It’s a simple solution that you don’t have to think about too much. It just works. But there’s a catch. When it comes time to update your borrowed code to pick up any bug fixes, or when you want to contribute a few of your own changes back to the original repo, it becomes hard to separate your commits. Subtrees make updates harder by pushing extra work onto the person doing the merging. I’ve seen professional software development teams use subtrees like this before. They ended up giving up on merging and decided to only make changes through the copied subtree. This is not the way Git should work.

The better solution is called submodules. They require a little more knowledge of how Git works. But the time it takes you to learn this skill will be very rewarding. This is what I hope to explain in this episode. So forget I ever mentioned subtrees. You’re going to learn how to properly use submodules instead.

First of all, you can’t just ask Git to reuse a single file from another repo. Or even a few files. You have to reuse the entire repo. All you do is tell Git that you’d like to create a submodule, tell it where in your project to put the submodule folder, and which other repo you want to use for the submodule. Git will copy the latest files from the other repo’s default branch, which is often called master or main. At first it might look exactly like the subtree that I just explained and asked you to forget about.

The difference is not immediately obvious. By creating a submodule, though, you actually have an entire cloned repo of the source repo that you’re reusing. This cloned repo is sitting inside your own project repo. The main point to understand is that your project repo and the cloned repo are separate. It’s just that one happens to be located inside the other. The cloned repo becomes a submodule inside your project repo.
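As a rough sketch of those steps on the command line, the commands below build a throwaway sandbox and add one repo to another as a submodule. The names shared-lib, app, and libs/shared are made up for illustration. The protocol.file.allow setting is only there because recent Git versions refuse to clone a submodule from a local path by default. With a normal remote URL you would likely not need it.

```shell
set -e
# Sandbox setup: shared-lib, app, and libs/shared are hypothetical names.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the repo you want to reuse.
git init -q shared-lib
echo 'hello' > shared-lib/util.txt
git -C shared-lib add util.txt
git -C shared-lib commit -q -m 'Add util'

# Your own project repo.
git init -q app && cd app
git commit -q --allow-empty -m 'Initial commit'

# Create the submodule: which repo to use, and where to put its folder.
# (protocol.file.allow=always is only needed because the source is a local path.)
git -c protocol.file.allow=always submodule --quiet add "$work/shared-lib" libs/shared
git commit -q -m 'Add shared-lib as a submodule'

# The folder is a full clone with its own origin and its own top level.
git -C libs/shared remote get-url origin
git -C libs/shared rev-parse --show-toplevel
```

The last two commands are answered by the inner repo, not by app, which is one way to see that the two repos are separate.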

If you make any changes in your own files and ask Git for its status, Git will tell you exactly which files have changed. But if you make any changes to the files in the submodule and ask your own Git repo what’s changed, you’ll see a difference. All it will tell you is that there are uncommitted changes inside the submodule. To find out what those changes are, you need to ask the submodule repo itself. You do that in the command line by changing into the folder where the submodule lives. Any Git commands you enter always apply to whichever Git repo you happen to be in at the time.

When you navigate inside the submodule and ask Git for its status, then it will tell you which files have changed. You can then commit those changes and even push the changes back to the origin where you cloned the repo from in the first place. While you’re inside this submodule, Git only looks at that repo. Actually, this is the way Git always behaves. There’s nothing special about being inside a submodule, because when you’re inside one, Git doesn’t know anything about the outer repo.
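Here is a small sandbox sketch of that difference between asking the outer repo and asking the submodule. All the names (shared-lib, app, libs/shared, util.txt) are hypothetical, and protocol.file.allow is set only because the demo clones from a local path.

```shell
set -e
# Sandbox setup: shared-lib, app, and libs/shared are hypothetical names.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q shared-lib
echo 'hello' > shared-lib/util.txt
git -C shared-lib add util.txt
git -C shared-lib commit -q -m 'Add util'
git init -q app && cd app
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/shared-lib" libs/shared
git commit -q -m 'Add shared-lib as a submodule'

# Change a file that belongs to the submodule, without committing it.
echo 'local tweak' >> libs/shared/util.txt

# Asked from the outer repo, Git only points at the submodule folder.
git status

# Asked from inside the submodule, Git names the file that changed.
cd libs/shared
git status
cd ../..
```

The first status mentions libs/shared as a whole. Only the second one, run from inside the folder, lists util.txt.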

In fact, you can even have submodules within submodules. Git doesn’t care. Because when you’re inside a particular repo, Git considers that repo to be all there is. If that repo contains submodules, then it knows about those submodules but treats them as separate repos. And when you navigate into those submodules, Git again forgets about anything except the repo you happen to be in. This can go on as far as needed.
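To make the nesting concrete, here is a sandbox sketch with three levels. The repo names (inner-lib, shared-lib, app) and folder names are invented for the demo. The --recurse-submodules option on git clone is not something I covered above, but it is one way to fetch every level in a single step. As before, protocol.file.allow is only needed because everything here is cloned from local paths.

```shell
set -e
# Sandbox setup: every repo and folder name here is hypothetical.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Innermost repo.
git init -q inner-lib
echo 'inner' > inner-lib/inner.txt
git -C inner-lib add inner.txt
git -C inner-lib commit -q -m 'Add inner'

# Middle repo, which uses inner-lib as a submodule.
git init -q shared-lib
echo 'hello' > shared-lib/util.txt
git -C shared-lib add util.txt
git -C shared-lib commit -q -m 'Add util'
git -C shared-lib -c protocol.file.allow=always submodule --quiet add "$work/inner-lib" deps/inner
git -C shared-lib commit -q -m 'Add inner-lib as a submodule'

# Outermost repo, which uses shared-lib as a submodule.
git init -q app
git -C app commit -q --allow-empty -m 'Initial commit'
git -C app -c protocol.file.allow=always submodule --quiet add "$work/shared-lib" libs/shared
git -C app commit -q -m 'Add shared-lib as a submodule'

# A fresh clone can pull in every level at once.
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/app" app-clone
git -C app-clone submodule status --recursive
```

Each level only records its own direct submodule. The app repo knows about libs/shared, and the repo in libs/shared knows about deps/inner.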

Let’s back out a bit and go back to your outermost project repo. This is where people get confused about submodules and don’t realize how to use them properly. That’s because your file system doesn’t know anything about these repos within repos within repos. All your file system sees is a single hierarchy of files organized into folders. When you’re working inside your favorite programming environment, it’s easy to make changes in this repo and more changes in that repo, and still more changes in another repo.

Then when you ask Git to commit your changes, the only changes that are committed are those in the outermost project repo. People make changes across repos because of the submodules and don’t realize they’re working with separate Git repos.

The most straightforward way to fix this is to first detect the situation. You should be aware of what submodules your project is using and where they are. But if you forget, that’s okay. Just watch out for Git telling you that there are uncommitted changes in the submodules when you ask Git for its status. Git will tell you that there are uncommitted changes inside the submodule, even though it stops there and won’t tell you exactly what those changes are until you go into that submodule.

And that’s what you need to do. Once you know there are uncommitted changes in a submodule, just go into that submodule and commit the changes to that submodule repo. You can then push the changes to that submodule’s origin so other people and projects can benefit. I’ll explain how to bring changes from the submodule origin into your local submodule in just a moment.
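Sketched in a sandbox, committing inside the submodule and pushing to its origin might look like this. The names are hypothetical, and a bare repo called shared-lib.git stands in for the real origin so that it can accept a push. This also assumes the submodule is on a branch. If it is on a detached HEAD, which is common after git submodule update, you would need to check out a branch first.

```shell
set -e
# Sandbox setup: seed, shared-lib.git, app, and libs/shared are hypothetical names.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repo plays the role of the submodule's origin so it can accept pushes.
git init -q seed
echo 'hello' > seed/util.txt
git -C seed add util.txt
git -C seed commit -q -m 'Add util'
git clone -q --bare seed shared-lib.git

git init -q app && cd app
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/shared-lib.git" libs/shared
git commit -q -m 'Add shared-lib as a submodule'

# An edit made inside the submodule folder.
echo 'bug fix' >> libs/shared/util.txt

# Go into the submodule, commit there, then push to that submodule's origin.
cd libs/shared
git commit -q -am 'Fix util'
git push -q origin HEAD
cd ../..
```

Nothing in this sequence touches the outer repo's history. Both the commit and the push belong to the submodule repo.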

Once you’ve got everything committed and just the way you want it inside the submodule, that’s the time to go back to the outer repo. Now when you ask Git for its status, it will tell you that the submodule has changed. But wait a minute, didn’t you just fix that problem? What you did was commit the changes inside the submodule. But Git detects that the submodule is now sitting at that new commit. You’ll get the same status anytime you check out a different commit or a branch that results in the current commit changing.

This happens because, while the submodule itself has no knowledge of the outer repo, the outer repo does know about the submodule repo. And that knowledge includes the specific commit id that should be used.

This applies when you’ve made changes to the submodule and committed them, or when you’ve updated the submodule by pulling down new changes and checking them out. Really, it comes down to this: anytime you change the commit id that is checked out in the submodule, the outer repo will detect the new commit id and tell you that the submodule has changed. The message won’t tell you that there are uncommitted changes, unless there really are changes inside the submodule that you haven’t yet committed. Assuming everything is committed and the submodule is just on a different commit id than the outer repo is expecting, that’s when the outer repo will let you know.
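One way to see this mismatch for yourself, sketched in a sandbox with hypothetical names, is to compare the commit id the outer repo has recorded with the one that is actually checked out in the submodule.

```shell
set -e
# Sandbox setup: shared-lib, app, and libs/shared are hypothetical names.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q shared-lib
echo 'hello' > shared-lib/util.txt
git -C shared-lib add util.txt
git -C shared-lib commit -q -m 'Add util'
git init -q app && cd app
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/shared-lib" libs/shared
git commit -q -m 'Add shared-lib as a submodule'

# Commit a change inside the submodule, which moves its checked-out commit.
echo 'bug fix' >> libs/shared/util.txt
git -C libs/shared commit -q -am 'Fix util'

# The outer repo still records the old id, so the two no longer match.
git rev-parse HEAD:libs/shared       # id the outer repo expects
git -C libs/shared rev-parse HEAD    # id actually checked out
git submodule status                 # a leading + marks the mismatch
git status
```

Everything inside the submodule is committed at this point, so the status here is about the commit id only.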

How do you fix this then? All you have to do is add the submodule to the files that you want to commit in the outer repo. This might seem like you’re adding all the changed files. But all you’re really doing is adding the new commit id that the outer repo should be using. If you look at a diff of your staged changes, you’ll see the entire submodule represented as a single commit id, and you’ll see the old commit id being replaced by the new commit id.
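As a sandbox sketch with hypothetical names, staging the submodule and looking at the staged diff could go like this. The --submodule=short option is there only to pin the diff to Git's plain one-line-per-id format in case your own configuration uses a different one.

```shell
set -e
# Sandbox setup: shared-lib, app, and libs/shared are hypothetical names.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q shared-lib
echo 'hello' > shared-lib/util.txt
git -C shared-lib add util.txt
git -C shared-lib commit -q -m 'Add util'
git init -q app && cd app
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/shared-lib" libs/shared
git commit -q -m 'Add shared-lib as a submodule'

# Move the submodule to a new commit.
echo 'bug fix' >> libs/shared/util.txt
git -C libs/shared commit -q -am 'Fix util'

# Stage the submodule. Only its commit id is staged, not the files inside it.
git add libs/shared

# The staged diff is one "Subproject commit" line removed and one added.
staged=$(git --no-pager diff --cached --submodule=short)
echo "$staged"

# Record the new id in the outer repo.
git commit -q -m 'Point libs/shared at the new commit'
```

After the final commit, the outer repo's status should be clean again because the recorded id and the checked-out id match.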

So anytime you want to update your outer repo to include new committed changes from a submodule, just add that submodule and make another commit in the outer repo. It’s a little extra work. But what you get is completely separate commits that don’t cross any repo boundaries. The submodule repo is completely unaware of the outer repo. And the outer repo knows only about the location of the submodule, the location of the origin of the submodule, and which commit id it should be using.
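If you’re curious where the outer repo keeps those three pieces of knowledge, this sandbox sketch, again with made-up names, prints them. The folder location and the origin live in a file called .gitmodules, and the commit id is stored as a special entry in the outer repo’s own tree.

```shell
set -e
# Sandbox setup: shared-lib, app, and libs/shared are hypothetical names.
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q shared-lib
echo 'hello' > shared-lib/util.txt
git -C shared-lib add util.txt
git -C shared-lib commit -q -m 'Add util'
git init -q app && cd app
git commit -q --allow-empty -m 'Initial commit'
git -c protocol.file.allow=always submodule --quiet add "$work/shared-lib" libs/shared
git commit -q -m 'Add shared-lib as a submodule'

# Where the submodule lives and where it came from are kept in .gitmodules.
git config -f .gitmodules --get submodule.libs/shared.path
git config -f .gitmodules --get submodule.libs/shared.url

# The commit id is a tree entry with mode 160000 and type "commit".
git ls-tree HEAD libs/shared
```

None of the submodule’s files appear in the outer repo’s tree. There is just that one entry holding the commit id.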