Moving One Git Repository Into Another

13 Jan 2012

I recently needed to combine several Git repositories into a single one, with each old repo living in a subdirectory of the new repo. I could simply copy the files over manually, importing the contents of each project in a single commit, but I'd lose the commit history of each subproject. After some brief searching, I found this helpful article, which describes how to import the old repos, complete with their commit history, into the new repo.

So suppose you have two repos: subproj and mainproj. If you want to move subproj into mainproj in a directory called sub/, the method consists of two steps:

  1. Move all the files in the root of subproj into a directory called sub/.

  2. Merge the modified subproj into mainproj, thereby putting the sub/ directory at mainproj's root.

It turns out the first step is harder than the second, since it involves changing all the commits in the subproj repo's history to act on files located in the new subdirectory, rather than in the project's root. The tool we'll use to do this is git filter-branch. It lets you rewrite the project's revision history, similar to how git rebase modifies your commits as it "replays" them on a new branch. But where git rebase only re-applies or re-orders the existing commits, git filter-branch lets you run a shell script on each commit before it is re-applied. In our case that script will move all the files into the new subdirectory.

Note that all the precautions that apply to git rebase also apply to git filter-branch. Rewriting a repository's commits gives them new hashes, so you can't expect to push them back upstream. These manipulations are therefore best done on projects you haven't shared yet, or, as is the case here, that you plan to delete once you've merged them in elsewhere.
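To see why rewritten commits can't simply be pushed back, here is a minimal sketch on a throwaway repository. The repository, file name and identity are made up for the demo, and FILTER_BRANCH_SQUELCH_WARNING only silences the notice that newer Git versions print.

```shell
#!/bin/sh
set -e
# Throwaway identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo
echo hello > a.txt
git add a.txt
git commit -q -m "Add a.txt"

# Record the commit hash, rewrite the history, then record it again.
before=$(git rev-parse HEAD)
git filter-branch -f --tree-filter 'mkdir -p sub; mv a.txt sub/' -- --all >/dev/null 2>&1
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
```

The two hashes differ even though the commit message, author and date are unchanged, because the tree each commit points to is part of what gets hashed.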

The procedure to move the contents of subproj's root into a subdirectory is as follows.

$ git clone subproj subproj_tmp
$ cd subproj_tmp/
$ git filter-branch -f --prune-empty --tree-filter '
> mkdir -p .sub;
> mv * .sub;
> mv .sub sub
> ' -- --all

Let's break this down. We're using the -f switch to force git filter-branch to start in situations where it would otherwise refuse, such as when a temporary directory or backup refs (under refs/original/) from a previous run already exist. The --prune-empty switch tells it to drop commits that become empty once the filter has been applied. This is unlikely in our case, but we may as well leave it in.

The --tree-filter switch is the meat of the command. Its argument is a shell script executed in the root of the checked-out tree before the re-application of each commit. The "-- --all" arguments specify that our filter is to be applied to all branches and tags.

It's worth noting that the --tree-filter option does not honor any .gitignore rules when creating the new commits, so "ignored" files may find their way back into the commits if they are present in the working repo. We avoided this by working in a fresh clone.

So before each commit is re-applied, we're creating the .sub/ directory, moving all files in the project's root into that directory, then renaming it to sub/. We need to create the intermediate .sub/ directory because otherwise mv * would try to move sub/ into itself and cause an error. But the shell's * pattern doesn't match hidden files, so .sub/ is never passed to mv and the above method works.
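The dotfile behaviour comes from the shell expanding * before mv ever runs. Here is a small sketch of the same three commands outside of Git, in a scratch directory with made-up file names. LC_ALL=C is set only so the glob's sort order is predictable.

```shell
#!/bin/sh
set -e
export LC_ALL=C

demo=$(mktemp -d)
cd "$demo"
touch README main.c .gitignore

# The shell expands * itself; names starting with a dot are not matched.
visible=$(echo *)
echo "matched by *: $visible"

# Same three steps as the tree filter.
mkdir -p .sub
mv * .sub
mv .sub sub

# Whatever is still in the root besides sub/ was skipped by the glob.
left_behind=$(ls -A | grep -v '^sub$')
echo "left in root: $left_behind"
```

Only the visible files end up in sub/, and the dotfile stays where it was.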

A downside to this strategy is that any hidden files in the root of your project, such as .gitignore, will be skipped. We address this issue by simply moving these files into the subdirectory manually, and committing the change.
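If we'd rather have hidden files moved in every commit, not just in a final fix-up commit, one possible variant is to let find pick up the root entries instead of the * glob. This is a sketch rather than the method used above: it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do), and the repository and file names are invented for the demo.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q subproj_demo
cd subproj_demo
echo '*.o' > .gitignore
echo 'int main(void) { return 0; }' > main.c
git add .gitignore main.c
git commit -q -m "Initial commit"

# find lists every entry in the root, hidden or not, except the
# staging directory itself (and .git, as a precaution).
git filter-branch -f --prune-empty --tree-filter '
mkdir -p .sub
find . -mindepth 1 -maxdepth 1 ! -name .sub ! -name .git -exec mv {} .sub/ \;
mv .sub sub
' -- --all >/dev/null 2>&1

files=$(git ls-tree -r --name-only HEAD)
echo "$files"
```

With this filter, .gitignore travels into sub/ along with everything else, so the manual git mv step isn't needed.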

$ git mv .gitignore sub/
$ git commit -am "Move .gitignore into subdirectory."

The repo now has a new sub/ directory, but it also still has the original files in the project root. These are untracked, however, so they can be ignored for our purposes.

Before leaving subproj_tmp, we use the git gc command to pack loose objects and clean up unnecessary files.

$ git gc --aggressive
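To get a feel for what git gc does here, we can compare the number of loose objects before and after, as reported by git count-objects. A rough sketch on a throwaway repository (names are made up):

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q gc_demo
cd gc_demo
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "Revision $i"
done

# The "count:" line of count-objects -v is the number of loose objects.
loose_before=$(git count-objects -v | sed -n 's/^count: //p')
git gc --aggressive --quiet
loose_after=$(git count-objects -v | sed -n 's/^count: //p')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
```

After the gc, the objects still exist, but they live in a pack file rather than as individual loose files.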

We can now merge subproj into mainproj.

$ cd ../mainproj
$ git remote add subproj ../subproj_tmp
$ git fetch subproj
$ git merge subproj/master

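Here we added the subproj_tmp repo as a new remote for mainproj, fetched it and merged it in. Since all of subproj_tmp's commits act on files in the sub/ directory, the result of this merge is simply to add the sub/ directory to mainproj. Note that Git 2.9 and later refuse to merge two histories that share no common ancestor unless you pass --allow-unrelated-histories to git merge.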
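The merge step can be tried out on two toy repositories before touching the real ones. In this sketch mainproj_demo and subproj_demo are stand-ins invented for the demo, and subproj_demo is created with its files already under sub/ to play the role of the rewritten subproj_tmp. The fallback on the merge line is there only so the script also runs on Git versions that predate --allow-unrelated-histories.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Stand-in for mainproj.
git init -q mainproj_demo
(cd mainproj_demo &&
 git symbolic-ref HEAD refs/heads/master &&
 echo main > main.txt &&
 git add main.txt &&
 git commit -q -m "mainproj: add main.txt")

# Stand-in for the rewritten subproj_tmp: everything lives under sub/.
git init -q subproj_demo
(cd subproj_demo &&
 git symbolic-ref HEAD refs/heads/master &&
 mkdir sub &&
 echo lib > sub/lib.txt &&
 git add sub &&
 git commit -q -m "subproj: add lib.txt")

cd mainproj_demo
git remote add subproj ../subproj_demo
git fetch -q subproj
# Newer Git needs the flag; older Git rejects it, so retry without.
git merge -q --allow-unrelated-histories -m "Merge subproj into sub/" subproj/master ||
    git merge -q -m "Merge subproj into sub/" subproj/master

# Commits that touched sub/ come from the subproject's history.
history=$(git log --format=%s -- sub/)
echo "$history"
```

The log for sub/ lists the subproject's own commit, which is the whole point of the exercise: the history came along with the files.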

We can now delete the remote we created, clean up, and push mainproj to its origin.

$ git remote rm subproj
$ git gc --aggressive
$ git push origin master

It's probably a good idea to delete our working subproj repo and archive the original, just in case.

$ cd ..
$ rm -rf subproj_tmp/
$ mkdir archive
$ mv subproj/ archive/

And that's how you make one repo a subproject of another one, while maintaining the commit history of both.

Version control software seems to follow the Pareto principle — you get 80% of the benefits by learning 20% of the features. The downside is you tend to never get around to learning that other 80%, which can be useful in a pinch. Problems like this are a good excuse to further explore that 80%.