Being the only one who uses git in your company whilst everyone else is stuck in SVN

A year after becoming the only one using git in my SVN-based company, I guess it's time to share some of my experience with git-svn. Due to the odd setup of our SVN server, I had to go through many different solutions, some so specific that they might only suit this one case. Where I stand now, there are still some inconvenient workarounds just to keep my workflow running. Do take this with a pinch of salt.

Background

Before I joined the workforce at my company, I had been rather well versed in basic git commands. Working in a team had gotten me comfortable with the basic git workflow. I am used to having multiple branches and local commits, and an upstream git server, where we squash the 30+ commits made over 2 hours into one consolidated commit with a nice message about what we had done.

Things changed when I joined my current company, where we use SVN as our version control tool. There is no concept of staging a commit, basically everyone works on one copy, and the term "commit" does not differentiate between a local commit and a pushed commit. That means if I did svn commit the way I do git commit, I would have committed and pushed, straight onto the SVN server, a commit that only fixes a misplaced parenthesis and carries tons of swear words in its commit message.

Also, if I treated the main development branch on the SVN server the way I treat a local git branch, the server would run out of disk space rapidly. This is because the SVN server is set up in an interesting (but questionable) way. It was set up in late 2006 (based on the commits I'm seeing), and it seems to have a post-commit hook which checks the build by compiling the binary. Now comes the interesting part: if the build succeeds, it sends us the build log and commits the binaries it built into the SVN repository itself!!!

Now you see why I wouldn't want to do that: I would annoy both my team with awful commit messages and the operations team, who would have to migrate the SVN server.

Let's clone the repo!

I was aware of the existence of git-svn at that time, but I wasn't sure how well it would work. One of my first entry points was Use Git even if your team doesn’t: git-svn tips and tricks. It gave me plenty of information to get started, such as how to check out the code and how to handle svn:ignore.

One thing I did not follow, however, was the advice to do a shallow clone of the repo and start working from there. I first tried to check out the entire repo, thinking I would want to preserve all of my seniors' comments and work. It turns out I wasn't that bright. The checkout did not finish even after I left my work laptop running overnight in the office (the SVN server is on the office local network). In the end, I checked out only the branch that we are working on to keep my sanity in check, but that still left me with a repo sitting at around 20GB after several rounds of git gc!
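For reference, limiting the clone to a single branch might look like the sketch below. The helper name and the URL layout are assumptions for illustration; only git svn clone and its -r option come from git-svn itself. Passing a starting revision gives the shallow clone I skipped at first.

```shell
# Hypothetical helper: clone one SVN branch instead of the whole repository.
#   $1 = SVN URL of the branch (the exact path depends on your server layout)
#   $2 = target directory
#   $3 = optional first revision to fetch; when given, older history is skipped
clone_single_branch() {
    if [ -n "${3:-}" ]; then
        git svn clone -r "$3:HEAD" "$1" "$2"
    else
        git svn clone "$1" "$2"
    fi
}

# Usage (not run here, it needs access to the SVN server):
#   clone_single_branch "<svn.url>" the_company_project
#   clone_single_branch "<svn.url>" the_company_project N
```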

I had also set up an "upstream" git server locally using Gitea, so I can keep track of "issues" (basically an easy TODO list). Now I have 2 copies of the same git repo sitting on my work laptop, 20GB each.

Trimming down the repo (or so I thought)

There are plenty of tools on the internet made for removing binaries or large files from the git history tree. I tried git filter-branch and bfg, and bfg did its job exceptionally well. I was able to trim a 20GB repo down to about 250MB!

It all worked well until I tried git svn rebase to update the local repo from the original SVN server. I ran into trouble: my local git repo was no longer recognised as a clone of the SVN server. Removing the binary history had apparently broken the integrity of my repo, and I couldn't for the life of me re-establish the link to the SVN server.
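To illustrate what such a rewrite does, here is a small self-contained sketch on a throwaway repository (no SVN involved, and the file names are made up). It uses git filter-branch rather than bfg, and it shows that stripping a directory from history gives the surviving commits new hashes, which is presumably part of why the existing git-svn bookkeeping no longer lined up.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's startup warning

# Demo identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Two commits that each carry source code plus a "build output" in bin/.
mkdir bin
echo 'int main(void) { return 0; }' > main.c
echo 'build 1' > bin/app.bin
git add main.c bin
git commit -q -m "r1: code plus committed binary"
echo '/* tweak */' >> main.c
echo 'build 2' > bin/app.bin
git add main.c bin
git commit -q -m "r2: code plus committed binary"

before=$(git rev-parse HEAD)

# Rewrite every commit, dropping bin/ from the index of each one.
git filter-branch -f \
    --index-filter 'git rm -r -q --cached --ignore-unmatch bin' \
    -- --all >/dev/null 2>&1

after=$(git rev-parse HEAD)

echo "tip before rewrite: $before"
echo "tip after rewrite:  $after"
# Only the source file is left in the rewritten history.
git ls-tree -r --name-only HEAD
```

The two printed hashes differ even though the source file is untouched, because every commit now points at a different tree.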

Furthermore, whenever our team makes a new commit to SVN, another commit containing the binaries is automatically added. If I have to remove the binaries from history for every single commit, I might as well just get the diff file, apply the patch, copy the commit message and push it myself.

I gave up after several weeks of tinkering. I kept the 20GB repo and work from its branches.

Handling svn:externals

Like any good, modern, modular software, our repo also comes with external libraries... that are developed internally. One of the advantages of file-based SVN is that you can point to any subfolder, check out just that code, and SVN will treat it as if it were the root; git does not have this privilege.

Our project structure roughly looks as follows:

the_company_project
|-- src/
|   |-- ext_lib/  <-- svn:externals
|   |-- other_lib/
|   |-- ...
|-- inc/
|   |-- ext_lib/  <-- svn:externals
|   |-- other_lib/
|   |-- ...
|-- bin/
|-- bin64/
|-- ...

And our external library structure looks as follows:

ext_lib
|-- src/  <-- points here
|   |-- http.c
|   |-- charset.c
|   |-- ...
|-- inc/  <-- and here
|   |-- http.h
|   |-- charset.h
|   |-- charset.inc
|   |-- ...
|-- lib/

I have searched the internet, and the general consensus for cases like this seems to be that there is no elegant way to turn it into git submodules. So what you are going to see might be the dumbest and ugliest way possible to barge these together.

What I did:

  1. git svn clone the external library
  2. Create new branches src and inc, and check them out
  3. git reset --hard all the way back to the first commit
  4. Check out src/ or inc/ from master HEAD, depending on your branch
  5. Copy everything in src/* or inc/* to your root repository folder
  6. Remove the folder
  7. git reset --soft to undo the staging done by the checkout
  8. Stage, commit, and push to your upstream
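The steps above can be sketched on a throwaway repository, shown here for the src branch only. This is a minimal sketch under assumptions: a plain git init stands in for the git svn clone, the first commit is empty, the file names are invented, and step 7 is read as a plain git reset (which is what actually unstages the paths).

```shell
set -eu
# Demo identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Step 1 stand-in: the real thing would be a git svn clone of ext_lib.
ext_lib=$(mktemp -d)
cd "$ext_lib"
git init -q .
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first commit"
mkdir src inc
echo 'int http_get(void) { return 0; }' > src/http.c
echo 'int http_get(void);' > inc/http.h
git add src inc
git commit -q -m "library code"

# Steps 2-3: a branch named src that starts at the very first commit.
first=$(git rev-list --max-parents=0 master)
git checkout -q -b src "$first"

# Step 4: take src/ as it is on master (this also stages it).
git checkout master -- src

# Steps 5-6: copy its contents to the repository root, then drop the folder.
cp -R src/. .
rm -rf src

# Step 7, read here as a plain "git reset": unstage the src/ paths from step 4.
git reset -q

# Step 8: stage and commit the flattened layout (the push is omitted).
git add -A
git commit -q -m "Flatten src/ from master"

# The branch now has the library sources at its root.
git ls-tree -r --name-only HEAD
```

The inc branch would be built the same way, with inc/ in place of src/.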

So now we can point our git submodules at the same repo location using different branches, and all the files are in place! It is ugly, but it works, and I only have to set it up once. Do take note that whenever there is an update in the external library, I need to perform steps 4-7 again, then update the submodule in the main project. Unlike SVN, git will not do that automatically for you.
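Pointing a submodule at one of those branches might look like the sketch below. It is self-contained and uses throwaway local repositories, so the paths and file contents are assumptions; with a real upstream the URL would be that of the library repo on the git server.

```shell
set -eu
# Demo identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Stand-in for the flattened library repo: branch "src" holds http.c at its root.
git init -q "$work/ext_lib"
cd "$work/ext_lib"
git symbolic-ref HEAD refs/heads/src
echo 'int http_get(void) { return 0; }' > http.c
git add http.c
git commit -q -m "flattened src"

# Main project, which mounts that branch where svn:externals used to put it.
git init -q "$work/project"
cd "$work/project"
git commit -q --allow-empty -m "project skeleton"

# -b records which branch of the library this submodule follows.
# protocol.file.allow is only needed because this demo clones from a local path.
git -c protocol.file.allow=always submodule --quiet add -b src "$work/ext_lib" src/ext_lib
git commit -q -m "Add ext_lib sources as a submodule"

ls src/ext_lib
```

A second submodule on the inc branch, mounted at inc/ext_lib, would complete the layout.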

SVN server migration

Due to the constant stream of new binaries and having to keep track of them, our SVN server eventually ran out of disk space. I faced the risk of having to check out everything from the new SVN server and losing the working branches I had on the old one, because a different URL in the link and a different time make for a different SHA1 for each git commit. Git would treat them as two different entities, ignoring the UUID that SVN has.

Luckily, rewriteRoot worked so well for me that a simple git svn info immediately showed the new SVN server already connected.

$ git config --local --replace-all svn-remote.svn.rewriteRoot `git config --local --get svn-remote.svn.url`
$ git config --local --replace-all svn-remote.svn.url <new_url>

What it does is basically treat the old SVN URL in your commit messages as an alias for your new SVN URL. I did not use git filter-branch, as it would have to rewrite the commit message of every commit in the history. With such an easy solution available, I don't see the point of wasting time on the more complicated one. Not to mention that, at the time, I was working on several local branches. The last thing I wanted was to have the main branch rewritten since the dawn of time.
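The order of the two commands matters: the old URL has to be copied into rewriteRoot before url is overwritten. A dry run on an empty throwaway repository shows the resulting settings; both URLs below are placeholders, not our real servers.

```shell
set -eu

demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q .

# Pretend this repo was cloned from the old server (placeholder URL).
git config --local svn-remote.svn.url "https://old-svn.example.invalid/repo"

# Same two steps as above: remember the old URL first, then switch to the new one.
git config --local --replace-all svn-remote.svn.rewriteRoot \
    "$(git config --local --get svn-remote.svn.url)"
git config --local --replace-all svn-remote.svn.url \
    "https://new-svn.example.invalid/repo"

# Show what git-svn will read from now on.
git config --local --get-regexp '^svn-remote\.svn\.'
```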

Actual git-svn in operation

It works fine to some extent. I have to make a git-related commit whenever I branch out from main (the one I keep in sync with the SVN branch), and then I am free to do all sorts of crazy git stuff. Whenever I need to commit back to SVN, I just do a git merge --squash, remove git-related things (such as submodules and .gitignore), write a nice commit message summarising the changes, and git svn dcommit.
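The squash part of that routine can be tried on a throwaway repository. This is only a sketch: plain git stands in for the git-svn clone, the branch and file names are made up, and the final git svn dcommit is left as a comment because it needs a real SVN server.

```shell
set -eu
# Demo identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# "main" mirrors the SVN branch.
echo 'int main(void) { return 0 }' > app.c
git add app.c
git commit -q -m "r100 as fetched from SVN"

# Branch out; the first commit holds the git-only files.
git checkout -q -b feature
echo '*.o' > .gitignore
git add .gitignore
git commit -q -m "git-only: add .gitignore"
echo 'int main(void) { return 0; }' > app.c
git commit -q -a -m "FOKKIN SEMICOLON"
echo '/* done */' >> app.c
git commit -q -a -m "wip"

# Back on main: squash everything into one staged change, without committing.
git checkout -q main
git merge -q --squash feature

# Drop the git-only files so they never reach SVN, then write the nice message.
git rm -q -f .gitignore
git commit -q -m "Fix missing semicolon in main()"

# With a real git-svn clone, this is where the commit would be sent to SVN:
#   git svn dcommit

git log --oneline main
```

main ends up with exactly one new commit on top of the SVN revision, and the messy branch history stays local.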

Of course, the other way round works as well... To some extent... Whenever I want to merge from the main branch to get updates from SVN, I normally rebase the branch if I haven't gone too far, or just do a direct merge otherwise. Due to the nature of our SVN setup, I try to refrain from merging with squash, as git will always try to break my computer while working out how to squash binaries.

The good news is, I might have nagged my supervisor enough that she told our CTO about the old SVN server committing binaries that no one is using. Or maybe he realised that, yes, no one is using the committed binaries and leaving them there will just eat up more storage space. He wisely decided to remove the binary commit. Now the new SVN server does not commit the binaries anymore; it just sends an email informing everyone about the build status.

Sensible SVN setup deserves sensible git mirror (?)

Here's my plan:
Since we now have an SVN repository that consists only of sensible code changes, I am going to perform a shallow clone/checkout, git svn clone -rN:HEAD <svn.url>, starting from the first commit on the new SVN server. I don't care about the old history anymore, and the git-svn-id in the commits will now point to the new SVN URL. I will not need to worry about merging binaries across branches.

Sounds great, right? Except that in the eyes of git, they are now two totally different repos and cannot possibly be merged unless you are some sort of lunatic. I couldn't find a way to sensibly migrate my branch changes from the huge 20GB repo onto the fresh copy. I don't even think it is possible, considering I branched out prior to the first commit in this fresh new checkout.
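One avenue that might be worth trying, untested against our actual setup, is to carry the branch over as patches instead of merging. The sketch below uses two unrelated throwaway repositories with made-up names and contents, and it only works as long as the files touched by the branch look the same in the fresh repo at the point where the patches are applied.

```shell
set -eu
# Demo identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Stand-in for the old 20GB repo, with a feature branch on top of main.
git init -q "$work/old"
cd "$work/old"
git symbolic-ref HEAD refs/heads/main
echo 'int main(void) { return 0; }' > app.c
git add app.c
git commit -q -m "tip of the old history"
git checkout -q -b feature
echo '/* feature work */' >> app.c
git commit -q -a -m "Add feature work"

# Export the branch as one patch file per commit.
git format-patch -q -o "$work/patches" main..feature

# Stand-in for the fresh shallow clone: unrelated history, same file content.
git init -q "$work/new"
cd "$work/new"
git symbolic-ref HEAD refs/heads/main
echo 'int main(void) { return 0; }' > app.c
git add app.c
git commit -q -m "first commit of the fresh clone"

# Replay the patches on a new branch, keeping author and message.
git checkout -q -b feature
git am -q "$work"/patches/*.patch

git log --oneline feature
```

Patches that touch the committed binaries would most likely not apply cleanly, so this only looks realistic for branches that change source files.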

For now, I am thinking of just setting the 20GB git repo aside after committing all of my code changes to SVN, and starting to work on new things in this new checkout. I will keep the 20GB repo only at "upstream" and remove the local files to save some storage space. I guess this is the most sensible way of doing it, and I get to keep my precious **FOKKIN SEMICOLON** commit message from the old repo.