Bravely squash or cowardly refuse: a git conundrum

Why would somebody ever want to take perfectly fine code history and erase it? Phrased like that, git squashing seems ridiculous. Alas, some people find that sanitizing history is the preferred option. I’m a bit skeptical, but they have their reasons.

Why bother

In git, squashing is the process of taking several commits and combining them into one.

Why would I bother doing this? Some people view code as a series of revisions, with each one doing something useful on its own. If anything, this can make it easier to review changes since they come in digestible small chunks.

It’s also important because git sees the code as a series of revisions. Operations like rolling back and cherry-picking work on individual revisions. The saner the revision history, the easier it is to use these features.

I begrudgingly squash because of these few reasons.

Branches are the problem

The problem I have is that I don’t work in a series of revisions, nor do many programmers I know. Looking at various commit histories I’m willing to say the vast majority of programmers do not work in the revision-centric model git offers.

I think primarily in terms of branches. I do bug fixes, and new features, entirely on a branch. My goal is entirely defined by the branch, not the individual commits.

There are a few primary reasons I commit:

  • At every safe point, when the code is behaving and the tests are running, I’ll commit. This shortens debug time when I make a mistake later, since I have less code to review, and in the worst case I can revert without losing my previous good work.
  • I need to interrupt my work and switch to another branch. As git encourages having a single working directory, I need to put my current work somewhere. Committing it seems like the safest option. I use stash sometimes, but it then consumes mental space remembering I have things stashed.
  • I’m done for the day and wish to save my code. Regardless of the state, I tend to save when I’m done with work and push it to the remote server. I do this since I have no desire to ever redo my work should my computer crash, or should I find myself sitting somewhere else.

As you can see, my selection process is quite lenient. It results in a lot of small commits, many of which don’t represent any obvious progress. Yet all these commits are useful to me while I’m working.

Attempts to sanitize this history are hard. It’s not always clear to me how to group together revisions into something meaningful. I’m sometimes tempted to just squash the entire branch into a single revision.
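Squashing a whole branch into one revision does not strictly need the interactive tooling. Here is a minimal sketch, assuming a branch named feature that forked from main (both names, and file.txt, are made up for illustration). It runs in a throwaway repository, moves the branch pointer back to the fork point while keeping every change staged, and commits once.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "Base commit"

# A hypothetical feature branch with three small safe-point commits.
git checkout -q -b feature
for i in 1 2 3; do
    echo "step $i" >> file.txt
    git commit -q -am "wip $i"
done

# Move the branch back to where it left main; the changes stay staged.
git reset -q --soft "$(git merge-base main HEAD)"
git commit -q -m "Add feature as one revision"

commits_on_branch=$(git rev-list --count main..HEAD)
echo "commits on feature: $commits_on_branch"
```

The file contents are untouched by this; only the list of revisions on the branch changes.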

How to make squash soup

Part of my apprehension is perhaps the complexity of the git commands involved in squashing.

Squashing uses the less than sensible git rebase -i HEAD~15 syntax, where the number, 15 here, is found by using gitk and counting backwards until you reach a place that seems okay. rebase -i is the interactive mode of the ordinary rebase command, but in use it feels unrelated: rewrite would have at least been a more suitable name.
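The counting can be left to git. This is a small sketch, again assuming a branch called feature that forked from main (hypothetical names, in a throwaway repository). It prints the two equivalent commands one could run by hand, rather than starting the interactive editor itself.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > notes.txt
git add notes.txt
git commit -q -m "Base commit"

git checkout -q -b feature
for i in 1 2 3 4; do
    echo "change $i" >> notes.txt
    git commit -q -am "wip $i"
done

# Number of commits on this branch that main does not have.
count=$(git rev-list --count main..HEAD)

# The commit where the branch left main; usable directly as the rebase target.
fork_point=$(git merge-base main HEAD)

# Either of these would open the same todo list for this branch.
echo "git rebase -i HEAD~$count"
echo "git rebase -i $fork_point"
```

Passing the fork point instead of a count also avoids the off-by-one mistakes that come from counting rows in gitk.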

It looks like I could do other stuff with this tool as well, but all my attempts are interpreted by git as a request to corrupt my code. Git is so sure I will fumble that, whenever something odd happens, it reminds me how to abort and get my code back. I wish all git commands were that polite.

The corruption of code is something that really bothers me about rewriting. One of the primary purposes of source control is to ensure I can get back to a previous working state. If something goes wrong, you can actually lose work with git rebase. If you notice immediately, there are ways to get back, such as the reflog, but they are not common git commands.
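One of those ways back is the reflog, which records where HEAD has pointed recently. The sketch below is only an illustration: it squashes with reset rather than rebase so it can run unattended, and the branch and file names are invented. The entry HEAD@{2} is correct for this exact sequence of commands; in real use you would read the git reflog output and pick the entry from before the rewrite.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > work.txt
git add work.txt
git commit -q -m "Base commit"

git checkout -q -b feature
for i in 1 2 3; do
    echo "step $i" >> work.txt
    git commit -q -am "wip $i"
done

# Remembered here only so the result can be checked at the end.
old_tip=$(git rev-parse HEAD)

# Rewrite the branch: three commits become one.
git reset -q --soft main
git commit -q -m "Squashed"

# The reflog still lists the pre-rewrite state.
git --no-pager reflog -n 3

# HEAD@{0} is the squash, HEAD@{1} the reset, HEAD@{2} the old branch tip.
git reset -q --hard 'HEAD@{2}'

recovered_tip=$(git rev-parse HEAD)
echo "recovered: $recovered_tip"
```

Reflog entries expire eventually, which is why noticing early matters.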

I’ll stick to the squash option.

It’s a shared problem

As silly as it sounds, a significant problem with rewriting is that it actually rewrites the revisions. I’m not actually combining revisions, or moving them to a new base; git is simply erasing the history and making a new one. This is troublesome if somebody or something else already has my code. If multiple people are working on a branch, a squash needs to be properly coordinated between all participants.
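To see what "making a new one" means in practice, here is a small sketch in a throwaway repository with hypothetical branch and file names. It compares a branch tip before and after a squash: the files are identical, yet the new commit has a different ID and does not descend from the old one, so anybody holding the old commits has diverged from the rewritten branch.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > shared.txt
git add shared.txt
git commit -q -m "Base commit"

git checkout -q -b feature
for i in 1 2; do
    echo "step $i" >> shared.txt
    git commit -q -am "wip $i"
done

# Imagine a colleague has already fetched this commit.
old_tip=$(git rev-parse HEAD)

# Squash the branch into one revision.
git reset -q --soft main
git commit -q -m "Squashed"
new_tip=$(git rev-parse HEAD)

# Is the colleague's commit part of the new history?
if git merge-base --is-ancestor "$old_tip" "$new_tip"; then
    relation="fast-forward"
else
    relation="diverged"
fi

old_tree=$(git rev-parse "$old_tip^{tree}")
new_tree=$(git rev-parse "$new_tip^{tree}")

echo "old tip:  $old_tip"
echo "new tip:  $new_tip"
echo "relation: $relation"
```

This is why a plain push of a squashed branch is rejected once the old commits are on the remote, and why the other participants need to know before the history is replaced.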

I would much prefer if git saw branches as first-class entities rather than just a series of revisions. This would solve a lot of problems.

