Git is the worst version control system in the world, apart from all the other ones I’ve used.
Almost all teams that collaborate on code use Git, especially if that collaboration involves the internet. Across IT, developers recognise the benefits of distributed source control and the branch-review-approve-merge workflow that Git suits so well.
Except that there isn’t just one way to use Git. As a consultant, I get to see how different organisations make use of tooling and I’ve yet to come across two that work exactly alike. Even within a single company, it’s really common (and absolutely fine) for different teams to use source control in different ways.
Whatever works fine for a team is a great choice for that team: it’s satisfying when people can tweak a tool to suit their circumstances. This article isn’t about when source control works fine, though; it’s really about when it doesn’t. I’m going to run through a few tips on how to avoid patterns that make for problems.
Be nice to future you
I don’t often look at commit messages - but when I do, I might be in a hurry. When I started in IT, people used to carry pagers: the kind that doctors still use because they’re straightforward and reliable and the batteries last ages.
That experience means that when I’m writing a commit message, my target audience is future-me reading the message at 09:01 on a Monday, or maybe just after midnight on Sunday. It’s also for the colleague whose git bisect has just found the bad commit. Hindsight, along with careful testing, makes it easy to spot the wrong line of code but it doesn’t give you as much of a clue about what the code should have looked like instead.
Specifically, I try to write:
- Succinct commit messages. Ideally, commit messages should be short and to the point. “Fix frobnicator reload bug” serves as a good summary; at least it does if the change corrects a bug in the frobnicator.
- Relevant commit messages. “asdf” isn’t a summary of a change; it’s some characters that keep a linter happy. If you end up having to look through old commits, run a bisect, or rebase, then a good summary saves you time.
- Commit messages that explain intent. You don’t need to describe how the code behaves, because the code (and comments) do that already. A good commit message lets future-you ask the question “what was I trying to do here?”, and get a reasonable answer.
There’s a very strong Git convention to have a short summary line, then a blank line, then the rest of the information. If every commit message is one line, that’s a sign that people aren’t taking the time to go into detail for the occasional, more involved change; for example:
git commit -m "Fix frobnicator reload bug" \
-m "BUG-42 Reloads no longer fail between 00:00 and 01:00 UTC"
keeps the summary short for skimming. There’s relevant extra detail (the second -m adds the commit message body) plus a defect reference that future-you, or another colleague, can follow up on if need be.
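To see how that convention pays off when you read history back, here’s a minimal sketch in a throwaway repository. The directory, file name and identity are made up for illustration; the point is only that one-line views show the summary while the body is there when you ask for it.

```shell
#!/bin/sh
set -eu

# Throwaway repository; the path, file and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

echo "reload at any hour" > frobnicator.txt
git add frobnicator.txt
git commit -q -m "Fix frobnicator reload bug" \
  -m "BUG-42 Reloads no longer fail between 00:00 and 01:00 UTC"

# Skimming: one line per commit, summary only.
git log --oneline

# Digging in: %s is the summary (subject) and %b is the body.
summary=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
printf 'summary: %s\nbody: %s\n' "$summary" "$body"
```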
Just right squashing
A squash converts a series of changes into a single change that has the same end result. I’ll always recommend you avoid too much squashing. This is where you might run up against company policy. The motivation for squashing is to avoid merging a series of commits that show the rough work on the way to getting a change ready to merge. I’ve seen both extremes: all rough work left in, or at the other end every PR squashed down to one commit. I want to recommend a middle ground.
Let’s say you have commits that look a bit like:
f5171df Implement metrics scraping
fb2b925 Fix missing header
d662124 Update index.js
87ac99d Apply suggestions from code review
c210169 Enable metrics scraping
As it happens, the first and the last commit there are the important ones: you’re adding a new feature, and you’re enabling it. Keeping those changes separate is great, because if you ever want to turn off scraping for a test you can revert c210169 and leave the implementation in place. It’s a similar story if you want to switch to looking up feature flags at runtime.
Git lets you take your branch and selectively squash those commits to something like:
fb9680b Implement metrics scraping
2e029c9 Enable metrics scraping
Any further and you’re losing important detail. Watch out for that. If it’s a company policy to always squash down to a single commit, I actually recommend you see about changing that policy.
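Here’s one way that selective squash might look, as a sketch rather than the only approach. It rebuilds the five example commits in a throwaway repository (the branch name metrics, the file names and the commit helper are all hypothetical), then uses an interactive rebase with the three middle commits marked as fixup. At your own keyboard you’d run git rebase -i master and edit the list by hand; here a sed command stands in for the editor so the script can run unattended.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Hypothetical helper: append the message to a file and commit it.
commit() {
  echo "$1" >> "$2"
  git add "$2"
  git commit -q -m "$1"
}

commit "Initial commit" README
git checkout -q -b metrics
commit "Implement metrics scraping" index.js
commit "Fix missing header" index.js
commit "Update index.js" index.js
commit "Apply suggestions from code review" index.js
commit "Enable metrics scraping" config.json

# The rebase todo list has one "pick" line per commit, oldest first.
# Turning lines 2-4 into "fixup" folds them into line 1 and keeps its message.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,4s/^pick/fixup/'" git rebase -q -i master

# Two commits remain on the branch: implement, then enable.
git log --format=%s master..metrics
```

The commit IDs change, because the rewritten commits are new objects, but the final tree is the same as before the rebase.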
CI / CD integration
Just like there isn’t just one way to use source control, there’s no right way to integrate with automated build and test processes. No worries. I’ve still got some tips that’ll help you avoid doing too many redesigns.
Decide what to test.
This one sounds simple, but you’ve often got a choice: do I test the code from the pull request, or do I test what the target branch might look like after a merge?
Lots of CI tools pick the first approach by default. You put in a pull request for your branch which, let’s say, ends at commit 2e029c9. The CI system clones the repo, checks out 2e029c9, and reports all is well. Only trouble here is that although Git is happy to merge that into master, the merged code doesn’t work. You didn’t test the merged code and it turns out there’s a subtle bug.
If you want that feedback before you merge a change that breaks the build, you can have that. Set up your CI system to fetch master and (locally, inside the CI system) merge in the tip of the branch from the PR. Now run the tests (and remember to invalidate or re-run them if something else merges to master).
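As a rough sketch of that merge-then-test step, assuming a CI job that already has a clone of the repo and a Git identity configured: the function name test_merge_result and its arguments are my own invention, not something any particular CI service provides.

```shell
#!/bin/sh
# Hypothetical CI step: test what the target branch would look like after the merge.
# Usage: test_merge_result <target-ref> <pr-ref> <test command...>
test_merge_result() {
  target=$1
  pr=$2
  shift 2

  # Detach first, so the trial merge never moves a real branch.
  git checkout -q --detach "$target" || return 2

  # Merge the PR tip locally; nothing here is pushed anywhere.
  if ! git merge -q --no-ff --no-edit "$pr"; then
    git merge --abort 2>/dev/null
    echo "could not merge $pr into $target" >&2
    return 2
  fi

  # Run the caller's test command against the merged tree.
  "$@"
}
```

In a real job you’d fetch first, then call something like test_merge_result origin/master "$PR_SHA" make test, where PR_SHA and make test are placeholders for whatever your CI system and project actually use. The function returns the test command’s exit status, or 2 if the merge itself couldn’t be done.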
Is it worth switching CI service to make this happen? That’s up to you. I definitely want you to know that it’s possible.
Connect deployments to the PR that triggered them.
Unless you’re a lot luckier than I’ve seen, problems still slip past branch-level tests and end up in master. They get deployed and they need dealing with. To give your team the best start in that, track the commit that you’re building from and preserve that information through and past the point where the code goes live.
If you’re using a monitoring tool that lets you add metadata to metrics or traces, capture the component’s commit ID as part of that. There’s a whole topic on why this is a great idea but, trust me, it’s a huge time-saver when you need it.
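Capturing the commit ID at build time can be as small as the sketch below. The function name build_metadata and the key names git.commit and git.dirty are assumptions for illustration; use whatever labels your monitoring tool expects.

```shell
#!/bin/sh
# Print key=value build metadata for the repository in the current directory.
# The key names are illustrative, not a standard.
build_metadata() {
  commit=$(git rev-parse HEAD) || return 1

  # Flag builds made from a tree with uncommitted changes to tracked files.
  if [ -z "$(git status --porcelain --untracked-files=no)" ]; then
    dirty=false
  else
    dirty=true
  fi

  printf 'git.commit=%s\ngit.dirty=%s\n' "$commit" "$dirty"
}
```

A build script could write that output to a file that ships with the artefact, and the component can then attach those values to its metrics or traces on startup.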
Commit IDs are great; you can go further and track pull request details too. The idea here is to make it easy for developers to watch the deploy they just did or at least to notice when there might be problems. Tools like deliverybot are there to let you get on with the next bit of work but also drag you back to your last PR if things aren’t looking right.
Write tests you can run locally.
That’s right. The tests that your CI / CD system carries out should also work on your PC, so that:
- you can separate troubleshooting the tests from troubleshooting the thing that runs the tests. If you need to submit a pull request just to see if your changes fix it, that slows you down quite a bit.
- you can switch to a more manual workflow if your provider is down. Outages are rare, but perhaps not as rare as you’d like: even big names like Microsoft (who own GitHub) sometimes see services go offline and take time to come back. Because Git is distributed, you can even sign up with a different service during a too-long outage, push your branches there, and ask a colleague to check your merge request manually.
- you can work offline. Ubiquitous connectivity means this isn’t the vital detail it used to be, but if you’re on the wrong end of a flaky or just plain broken internet connection, it’s really useful to be confident that when it comes back up you’ll have a set of changes ready for review.
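The second point above, pushing your branches to a different service during an outage, might look something like this sketch. A local bare repository stands in for the second provider; the remote name fallback, the branch name my-feature and the file are all made up, and in real life the remote would be an https or ssh URL.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Stand-in for an empty repository created at the second provider.
git -c init.defaultBranch=master init -q --bare fallback.git

# Your existing local clone, with work on a branch.
git -c init.defaultBranch=master init -q project
cd project
echo "work in progress" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git checkout -q -b my-feature
echo "more detail" >> notes.txt
git commit -q -am "Extend notes"

# Add the second service as an extra remote and push both branches to it.
git remote add fallback "$work/fallback.git"
git push -q fallback master my-feature

# A colleague can now fetch my-feature from there and review it.
git ls-remote --heads fallback
```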
(Almost) always be merging
I hope this isn’t news to you: it’s a good idea to do work in small batches, to avoid long-lived branches, and to merge PRs as soon as they’re approved. For super prompt merging, tools like Mergify, Prow/Tide from Kubernetes, or Bulldozer from Palantir, can go beyond convention and make it automatic.
Yes, then: do all the above. When isn’t it a good time to merge?
GitHub has a setting “require branches to be up to date before merging”, useful for some scenarios like infrastructure-as-code. I’m picking that example because it’s one where you have desired and actual state. Your source code is the desired state and your running infrastructure is the actual state. With app software you can quite easily have 6 different versions you support and that are concurrently published.
With infrastructure it’s a different story; you only have one live environment, maybe with a blue/green style replica or something like that. There’s only one place that’s supposed to line up with master and that’s what gets changes applied whenever you merge a change.
For an easy life, you want master to match what’s actually deployed. Unlike app code changes, it can make sense to roll out infrastructure updates in a defined sequence. Imagine that I submit a pull request and you approve it, and then another colleague’s change lands first; think about the impact this might have. Usually, of course, it’s fine. With good processes it can always be fine. If it goes wrong you can have serious consequences - from undoing a feature flag you just flipped on, to triggering a destroy-and-replace on a resource and blowing your error budget.
What do you do if your pull request is out of date? GitHub suggests that you merge master (or whatever your target branch is called) into your PR, with a tempting button. I recommend though that you fetch the updated master branch and git rebase. That way, your PR has a clean set of commits.
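Here’s a sketch of that fetch-and-rebase in a throwaway setup. A local bare repository plays the part of the hosted one, and the branch name my-change, the directory names and the files are all hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# My clone, plus a bare repository standing in for the hosted one.
git -c init.defaultBranch=master init -q "$work/mine"
cd "$work/mine"
echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"
git clone -q --bare . "$work/origin.git"
git remote add origin "$work/origin.git"
git fetch -q origin

# My PR branch.
git checkout -q -b my-change
echo mine > mine.txt
git add mine.txt
git commit -q -m "Add my change"

# A colleague's change lands on master first.
git clone -q "$work/origin.git" "$work/theirs"
(
  cd "$work/theirs"
  echo theirs > theirs.txt
  git add theirs.txt
  git commit -q -m "Add colleague's change"
  git push -q origin master
)

# Bring my branch up to date without a merge commit.
git fetch -q origin
git rebase -q origin/master

# Linear history: initial commit, colleague's change, then mine.
git log --oneline --graph
```

If the branch had already been pushed, the rebased version needs a forced push; git push --force-with-lease is the more cautious form, because it refuses to overwrite commits on the remote branch that you haven’t fetched.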
Similarly, if you and a colleague are working on the same area of code, think twice before you merge their code into your branch. Talk to them, decide whose changes will land first, and if you’re the one going second, rebase. If you have to have both changes land at the same time, Git lets you do an octopus merge (though I’ve yet to see a case where this was needed).
This blog is written exclusively by The Scale Factory team. We do not accept external contributions.