Hardscrabble 🍫

By Max Jacobson

On conflicts

03 Apr 2021

Usually when git tells me that my feature branch has a conflict, it makes sense to me. Occasionally though, I’ll have a conflict, and I’ll look at it and think… “Huh? Why is that even a conflict?”

I think it makes sense to me now though. Let’s talk about it.

An example conflict

Let’s look at a classic example of a git conflict.

Let’s say on the main branch, there’s a file that looks like this:

class Dog
  def speak
    puts "woof"
  end
end

Then let’s say I check out a feature branch my-feature, and I change that file to look like this:

class Dog
  def speak
    puts "ruff"
  end
end

So far, so good.

However, unbeknownst to me, my colleague also checked out a feature branch and changed the file like so:

class Dog
  def speak
    puts "bark"
  end
end

And then merged that in.

The commit history looks a bit like this:

* dog.rb is introduced     -- commit A, on branch main
| \
|  * I update dog.rb       -- commit B, on branch my-feature
* colleague updates dog.rb -- commit C, on branch main

I have a merge conflict on my branch now! But why exactly and what is a conflict?

Why we have conflicts, and what they are

Derrick Stolee recently published a very good blog post on the GitHub blog called “Commits are snapshots, not diffs”, which I would encourage you to read. It seeks to clear up that common misconception by going deep into the weeds of how git organizes its information.

In our example, we have three commits, which means we have three snapshots of the repository at three points in time.

The problem, then, is: how do you combine two snapshots?

Commits are snapshots, not diffs. That just means that git is not storing diffs on your file system. However, each commit does know which commit came before it (its parent), and git does know how to generate a diff by comparing two snapshots.
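To make that concrete, here is a rough sketch in Ruby. It is a toy model, not git’s real object format: the Commit struct and the naive_diff helper are names I made up for illustration, and the diffing is far cruder than what git does. The point is only that each commit holds a full snapshot plus a pointer to its parent, and a diff is computed on demand by comparing two snapshots.

```ruby
# Toy model, not git's real object format: a commit stores a full snapshot
# of every file plus a pointer to its parent. No diff is stored anywhere.
Commit = Struct.new(:id, :parent, :files)

DOG_TEMPLATE = "class Dog\n  def speak\n    puts \"%s\"\n  end\nend\n"

COMMIT_A = Commit.new('A', nil,      { 'dog.rb' => format(DOG_TEMPLATE, 'woof') })
COMMIT_B = Commit.new('B', COMMIT_A, { 'dog.rb' => format(DOG_TEMPLATE, 'ruff') })
COMMIT_C = Commit.new('C', COMMIT_A, { 'dog.rb' => format(DOG_TEMPLATE, 'bark') })

# Derive a (very naive) line-by-line diff by comparing two snapshots.
# Assumes both versions have the same number of lines, which holds here.
def naive_diff(old_commit, new_commit, path)
  old_lines = old_commit.files.fetch(path).lines(chomp: true)
  new_lines = new_commit.files.fetch(path).lines(chomp: true)
  old_lines.zip(new_lines).flat_map do |old_line, new_line|
    old_line == new_line ? [" #{old_line}"] : ["-#{old_line}", "+#{new_line}"]
  end
end

# Compare commit B with its parent (commit A).
puts naive_diff(COMMIT_B.parent, COMMIT_B, 'dog.rb')
```

Nothing in COMMIT_B says “woof became ruff”. That fact only appears when the two snapshots are compared.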

In that example, commit B’s parent is commit A. Commit C’s parent is also commit A.

So what happens if, while on the my-feature branch, we run git rebase main, to keep our feature branch up-to-date with what our colleagues are doing?

  1. git figures out which are the new commits that aren’t yet on main (just commit B)
  2. git figures out what is the latest commit on main (commit C)
  3. git resets to how things are as of the commit C snapshot
  4. git goes through the new commits, one by one. For each one, git generates a diff between it and its parent, and then attempts to apply that diff to the current workspace. As soon as there’s a conflict, it halts.
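The first two steps can be sketched in a few lines of Ruby. This is only my mental model written down, with made-up names (Commit, history, rebase_plan), and it assumes every commit has exactly one parent.

```ruby
# Toy history from the example above; each commit only knows its parent.
Commit = Struct.new(:id, :parent)

COMMIT_A = Commit.new('A', nil)
COMMIT_B = Commit.new('B', COMMIT_A) # tip of my-feature
COMMIT_C = Commit.new('C', COMMIT_A) # tip of main

# Walk parent pointers from a tip back to the root, newest first.
def history(tip)
  commits = []
  while tip
    commits << tip
    tip = tip.parent
  end
  commits
end

# Steps 1 and 2: the commits to replay are those reachable from the feature
# branch but not from main, oldest first; the new base is main's tip.
def rebase_plan(feature_tip, main_tip)
  on_main = history(main_tip).map(&:id)
  to_replay = history(feature_tip).reject { |commit| on_main.include?(commit.id) }
  { base: main_tip.id, replay: to_replay.reverse.map(&:id) }
end

plan = rebase_plan(COMMIT_B, COMMIT_C)
puts "base: #{plan[:base]}, replay: #{plan[:replay].join(', ')}"
```

For our history, the plan is to start from commit C and replay just commit B on top of it.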

In this simple example, there’s only one new commit, and its diff with its parent looks like this:

$ git diff A..B
diff --git a/dog.rb b/dog.rb
index 825e50b..079f1c1 100644
--- a/dog.rb
+++ b/dog.rb
@@ -1,5 +1,5 @@
 class Dog
   def speak
-    puts "woof"
+    puts "ruff"
   end
 end

When git attempts to apply this diff, it needs to do a handful of things:

  1. locate the file
  2. locate the old lines that need to change
  3. replace them with the new lines

This is a deceptively difficult task. Let’s review the information that is available in the diff:

  1. the old filename (--- a/dog.rb means it used to be named dog.rb)
  2. the new filename (+++ b/dog.rb means it’s still named dog.rb)
  3. the line numbers represented in the diff. We can interpret @@ -1,5 +1,5 @@ as: “in the old version of the file, this diff starts at line 1 and covers 5 lines and in the new version of the file, … same!”
  4. the lines that were removed (the ones starting with -)
  5. the lines that were added (the ones starting with +)
  6. some contextual lines before and after
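The line numbers in item 3 are easy to pull out with a regular expression. Here is a small Ruby sketch; HUNK_HEADER and parse_hunk_header are names I invented for it. One detail worth knowing: in the unified diff format the “,count” part can be left out, in which case it means 1.

```ruby
# Matches hunk headers such as "@@ -1,5 +1,5 @@". In unified diffs the
# ",count" part may be omitted, in which case it means 1.
HUNK_HEADER = /\A@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/

def parse_hunk_header(line)
  match = HUNK_HEADER.match(line)
  raise ArgumentError, "not a hunk header: #{line.inspect}" unless match

  {
    old_start: match[1].to_i,
    old_count: (match[2] || 1).to_i,
    new_start: match[3].to_i,
    new_count: (match[4] || 1).to_i
  }
end

header = parse_hunk_header('@@ -1,5 +1,5 @@')
puts "old file: starts at line #{header[:old_start]}, covers #{header[:old_count]} lines"
puts "new file: starts at line #{header[:new_start]}, covers #{header[:new_count]} lines"
```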

That certainly sounds like a lot of information, but consider a scenario. Let’s say someone added a one-line comment to the top of the file on the main branch. Our diff thinks it applies to lines 1-5, but now it applies to lines 2-6.

The line numbers, then, are not enough to locate the lines that need updating.

Interesting, right?

Ok then, so we can imagine that it scans through the file looking for the line to be removed (puts "woof"), and replaces it with the line to be added (puts "ruff"), whatever line it is on now.

This gives us our first hint of what a merge conflict actually is. If we want to change puts "woof" to puts "ruff", but it no longer even says puts "woof", how exactly are we supposed to do that? The diff cannot be applied.

Fair enough, right?

It isn’t that simple, though. We can imagine a scenario where that same line shows up several times in the same file, but only one of them should be updated. So how does git figure out which ones to update?

Look: perhaps it’s too late in the blog post to disclose this, but I’m just guessing and speculating about how any of this works. But here’s what I think. I think it looks at those unchanged contextual lines before and after the changed lines.

By default, git-diff produces diffs with three lines of context on either side of a change. With that, git can locate which is the relevant change. Don’t just look for the line to change, but also make sure it appears in the expected context.
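Here is a Ruby sketch of that guess. To be clear, it follows my speculation and is not how git is actually implemented; locate and the constants are made-up names. It searches a file for the hunk’s “before” lines (context plus the line to remove) and reports where they appear.

```ruby
# Toy sketch of locating a change by its surrounding context. This follows
# the post's guess; it is not a description of git's real implementation.

# The "before" side of the hunk: context lines plus the line to be removed.
BEFORE = [
  'class Dog',
  '  def speak',
  '    puts "woof"',
  '  end',
  'end'
].freeze

# Return every 0-based index at which `block` appears contiguously in `lines`.
def locate(lines, block)
  lines.each_cons(block.size).with_index.select { |window, _| window == block }.map(&:last)
end

# Someone added a comment at the top of the file on main, so every line
# number in the hunk header is now off by one.
SHIFTED = ['# A very good dog', *BEFORE].freeze

# A file where the changed line on its own would be ambiguous.
AMBIGUOUS = [
  'class Cat',
  '  def speak',
  '    puts "woof"',
  '  end',
  'end',
  *BEFORE
].freeze

puts locate(SHIFTED, BEFORE).inspect                   # => [1]
puts locate(AMBIGUOUS, ['    puts "woof"']).inspect    # => [2, 7]
puts locate(AMBIGUOUS, BEFORE).inspect                 # => [5]
```

The bare line matches in two places, but the line together with its context matches in exactly one.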

This points to another kind of conflict, which I alluded to at the beginning of this post. This is a conflict which can feel unnecessary, but when I think about how git applies diffs in this way, it makes more sense.

A more interesting example

Let’s say on the main branch, there’s a file that looks like this:

def calculate(a, b, c)
  a + b + c
end

Then let’s say I check out a feature branch my-feature, and I change that file to look like this:

def calculate(a, b, c)
  a + b * c
end

So far, so good.

However, unbeknownst to me, my colleague also checked out a feature branch and changed the file like so:

def calculate(a, b, c)
  # Tells us how many lollipops there are
  a + b + c
end

And then merged that in.

The commit history looks a bit like this:

* calculate.rb is introduced     -- commit A, on branch main
| \
|  * I update calculate.rb       -- commit B, on branch my-feature
* colleague updates calculate.rb -- commit C, on branch main

We updated different lines, so there shouldn’t be a conflict, right?

Wrong! There is a conflict.

And it kind of makes sense now, doesn’t it? When git is applying the diff, it relies on those context lines to figure out where in the file to make the change. If the context lines in the diff no longer exactly line up, git cannot be 100% sure it’s doing the right thing, and so it raises a flag.
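We can play this out with a toy patch applier in Ruby. Same caveat as before: this is a sketch of my mental model, not git’s actual merge machinery, and apply_hunk and ToyConflict are names I made up. It looks for the hunk’s “before” lines as one contiguous block and refuses to guess if it can’t find exactly one match.

```ruby
# Toy patch application in the spirit of the post's guess, not git's actual
# merge machinery: find the hunk's "before" lines and swap in the "after" lines.
class ToyConflict < StandardError; end

def apply_hunk(lines, before:, after:)
  starts = (0..lines.size - before.size).select { |i| lines[i, before.size] == before }
  raise ToyConflict, "expected exactly one match, found #{starts.size}" unless starts.size == 1

  start = starts.first
  lines.first(start) + after + lines.drop(start + before.size)
end

# My change on my-feature (commit A -> commit B), context lines included.
BEFORE = ['def calculate(a, b, c)', '  a + b + c', 'end'].freeze
AFTER  = ['def calculate(a, b, c)', '  a + b * c', 'end'].freeze

# calculate.rb as of commit C, with my colleague's comment.
MAIN = [
  'def calculate(a, b, c)',
  '  # Tells us how many lollipops there are',
  '  a + b + c',
  'end'
].freeze

# Applies cleanly to the file as of commit A.
puts apply_hunk(BEFORE, before: BEFORE, after: AFTER)

# Does not apply to the file as of commit C: the context no longer lines up.
begin
  apply_hunk(MAIN, before: BEFORE, after: AFTER)
rescue ToyConflict => e
  puts "CONFLICT: #{e.message}"
end
```

The line I changed is still there on main, untouched, but the line right before it is no longer the one my diff expects.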

And I suppose all I can say is… fair enough!

How to watch tennis

27 Mar 2021

I like to watch tennis, but I had to learn what to pay attention to. Here are some of those things. This is all pretty basic stuff, and I’m not at all an expert, etc, etc. I’ll assume you know the basic rules.

Actually watch

While the ball is in motion, tennis audiences are silent. They will erupt in uncontained screams and cheers, but only at the appropriate times: when a point has been won. This means you actually need to watch. If you’re on your phone and you’re waiting for some audience cues to tell you when something exciting is happening and you should pay attention, you’re going to miss everything.

What do you see?

It’s two people hitting a ball back and forth, adversarially?

Yes.

Gladiators

It’s also very high drama. Two players enter the tennis court alone, carrying their own gear, and enter into a battle of wills, endurance, resolve, and skill, and push each other to their limits, while thousands of people cheer and boo. At the end, one slinks off, carrying their own bag into the locker room, defeated.

The other survived.

Grand slams

The tennis season is super long, it’s basically the whole year, and there are events pretty much all the time. As a tennis fan, what should you actually watch?

Watch the Grand Slams. They’re the four biggest tournaments each year. There’s the most prize money, so everybody shows up. And because they’re in this special class of their own, people pay attention to how many Grand Slams a player has won, and that’s the measure of who are the very best players.

The grand slams are:

  1. The Australian Open. It happens in January. Players are well-rested after their holiday break. It’s summer there, which is fun. They call it “the happy slam”. It’s played on hard courts.
  2. The French Open (aka Roland-Garros). It happens in May. It’s played on clay. On the men’s side, Rafael Nadal always wins, because he is “the king of clay”.
  3. Wimbledon. It happens in July, in London. It’s played on grass. Everyone dresses in all white. It’s a bit retro.
  4. The US Open. It happens in August in Queens, New York. It’s played on hard courts. I’ve gone to this one a few times.

If you aren’t sure where to start, just wait until the next one of these and dive in.

Where do I watch?

I mostly watch on Tennis Channel Everywhere, a streaming site that I pay for. During Grand Slams, some of the matches are only streaming on ESPN, so I might temporarily subscribe to cable for a month to get those matches, too.

How do I enjoy a tournament?

At the start of a tennis tournament, there may be something like 128 players in the draw. You’re not gonna know who most of them are, but if you stick around you’ll start learning some names. In the early rounds, just pick some matches at random and watch them. Because of the way brackets get seeded, all of the top players will be paired up against bottom players and are fairly likely to advance, and it’s not that fun to watch somebody get clobbered. It’s probably going to be more fun to watch some of the middle players who got paired up against each other, who are more evenly matched. Pick someone and root for them. If they’re both boring, just switch to another match. The men’s and women’s events run at the same time, which means that there’s really quite a lot of matches on at the beginning of an event. Try to find someone you like rooting for and stick with them for the whole match.

Each round, half of the players are knocked out. It’s vicious, thrilling, efficient.

Storylines start to emerge. Someone seems unstoppable. Someone no one has heard of has made it to round three. Someone sustained an injury and we wonder how it’ll affect them in their next match. Last year’s finalists are on a collision course for a rematch in round four!

Etc.

Pick the story that resonates with you. Root for someone to go all the way. Believe that they will. See what happens for you emotionally, if anything.

Get comfortable

Sometimes tennis matches are really long. Like it’s not that weird for a match to be five or six hours long. You can sometimes take a break from a match, watch a movie, and come back to catch the end.

Women’s matches are basically always three sets, while men’s matches are sometimes three and other times five, depending on the particular tournament.

A typical set takes anywhere from 30-90 minutes depending on factors like how many games there are (a 7-6 set takes a lot longer than a 6-0 one, of course), how long the players are taking between serves, how long the rallies are, how many deuces there are (you have to win by two, so individual games can technically go on forever), and whether there are any medical timeouts.

Some tournaments use a tie breaker in the final set, while others let the final set go on indefinitely, until someone wins by two games.

There are perennial debates about whether they should downsize all five set match events to three sets, which are less grueling for the player and audience. No one can agree. Part of the sport is stamina, and some are loath to decrease that element.

You’ll know you’re a lost cause when you start wishing matches were longer.

Enjoy the crescendos

The most exciting moments in a tennis match are at the end of sets, especially sets which could decide the match. I’m reminded of that Seinfeld bit about muffin tops. Why don’t they make the whole match out of the exciting ending?

Well, in tennis, they kind of do. In a normal sport, there’s only one exciting ending. In tennis there’s as many as five, if it’s a five set match.

True, after each set they reset and start over, and that can be a little bit boring. But it can be fun to pay attention at the start of the new set. The person who just lost the last set, do they seem defeated or do they seem pissed off and full of resolve? The person who just won, can they keep up the momentum, or are they acting like they’re just happy they were able to win one and they can go home now?

Are we on serve?

One thing I didn’t realize when I first started watching tennis is that for each game, one of the players is “supposed” to win, and that’s the player who’s serving. In each game, only one player is serving, and they alternate. If the serving player wins each game, you will hear the commentator say “they are on serve”. When the set is on serve, the score will move like this:

  • 0-0
  • 1-0
  • 1-1
  • 2-1
  • 2-2
  • 3-2
  • 3-3
  • 4-3
  • 4-4
  • 5-4
  • 5-5
  • 6-5
  • 6-6
  • And then they’ll play a special tiebreaker game to determine who wins the set

If that’s what’s happening, it means that the players are well-matched, and you’re building to that dizzying crescendo of the set tiebreaker.

If the serving player loses a game, that’s called “getting broken”, and it puts them on a course toward losing the set. We can imagine a sequence like this:

  • 0-0
  • 1-0
  • 2-0 (a break of service!)
  • 3-0
  • 3-1
  • 4-1
  • 4-2
  • 5-2
  • 5-3
  • 6-3

In that sequence, there is only one break, but the first player will take the set 6-3. In a set like this, it’s pretty clear after the second game which player is on course to win the set. That can still be very exciting, though, because it puts a lot of pressure on the losing player to right the ship. Every single game they serve, they must win and they need to “break back” to “get back on serve”. And even if they do that, which is not easy, then they still need to break again to get ahead, or try their luck in a tiebreaker.

If a player goes up by two breaks, there’s not a ton of suspense. It’s so hard to come back from being down two breaks. So… when someone does come back from two breaks, you would not believe how exciting it is. When you, as the spectator, have given up on the player, but the player has not given up, it teaches you something.

Who’s your favorite chair umpire?

The chair umpire sits in a big chair and directs the flow of the match. You’ll hear their voice throughout the match, doing things like:

  • impassively declaring the score of the current game
  • stating that a serve hit the net, and the serving player will need to serve again
  • acknowledging when a player has challenged a call (e.g. the ball was called out but they really think it was in)
  • issuing warnings for taking too long to serve or cursing
  • acknowledging that a player is taking a medical time out

The tours have a stable of chair umpires who they re-use all the time. The die-hard fans know their names and have favorites. Some of them have legendary voices. There’s one guy who says “deuce” with such abundant gravitas that you almost wonder if he’s joking.

I don’t know any of their names, but I think I’ll get there.

Watch out for the fist pumps

The universal tennis celebration is the fist pump. If someone wins a long rally they might let out a furtive little fist pump. If someone wins a set, you know that fist is pumping above their head. Absolutely nobody pumps their fist better than Rafael Nadal. There are YouTube compilations. It’s a thrill. I encourage you to adopt this habit in your day-to-day life.

Another common celebration is to scream something like:

  1. “Come on!”
  2. “Vamos!”
  3. “Let’s go!”

These are all also great, the more guttural the better.

“New balls, please”

During a match, there’s a finite set of balls that they’re playing with. You’ll notice that there are a number of “ball kids” on court. After each point, a ball kid will scurry after the ball and grab it, then get back into position. At the next opportunity, when the next point ends, they’ll roll that ball down the court in the direction of the player who’s serving. At that end, there are two dedicated ball kids who are accumulating all of the balls, so that they’re ready to feed them to the serving player.

The players really wallop these balls, and they pretty quickly lose their bounciness. After every X games, the chair umpire will say “new balls, please”, at which point they replace the dead ones with some fresh ones. The commentators will often act like this has a big impact on the vibe of the gameplay but I honestly have never noticed this.

Geography

It really matters where on the court the player is standing. Some players will stand way behind the baseline. This can be great: if their opponent hits the ball all the way to the left or all the way to the right, they have more time to get to it. This can be bad: their opponent can hit a “drop shot”; if they pop the ball just over the net, there’s no way to make it in time.

Some players like to work their way toward the net. This can be great: from the net, you can smash the ball into the ground, sending it flying out of your opponent’s reach. This can be bad: your opponent can make a “passing shot” where they send the ball flying past you, and because you’re so close to them, you need to have lightning reflexes to actually return it; or, they can make a “lob” where they hit the ball in a high arc over your head, forcing you to try and chase it down, staggering backwards, eyes peering into the sun.

Nothing makes me happier than watching a beautiful lob float over someone’s head and drop right on the edge of the baseline.

Running ragged

It can be instructive to pay attention to which player is running more. You’ll often see one player standing stock still in the middle of the court around the baseline, hitting the ball left, then right, then left, then right, forcing their opponent to run back and forth. Which one would you rather be?

If you’re just looking to get a quick sense of which player is “in control”, pay attention to who is moving more.

Second serves

There are very specific rules for what makes a valid serve. If your serve isn’t valid, you get to try again. If your second serve isn’t valid, you just lose the point, which is called “double faulting”.

That’s all clear enough, but what I didn’t realize until I started watching tennis is that this system incentivizes players to serve in a particular way. Basically: on first serves, players take more risk, and on second serves they take less. A riskier serve is one where you’re hitting it harder, or aiming for the edges of the service box, making it more likely that you wham the ball right into the net, or out of bounds. Those riskier serves are going to be harder to return, so you’re more likely to win that point if you get that sweet serve in. And if you don’t, who cares, you just get to try again. The second serve has higher stakes, because you can actually lose the point if you mess up twice. So a typical “second serve” is much slower, and it’s aiming for right square in the middle of the service box. These are going to be much easier to return, but at least you didn’t double fault.

Some players make the calculated choice to always go all out and do “two first serves”. With this strategy, they will double fault sometimes, but they’ll never give their opponent something easy to work with. If they’re consistent enough, that can be a smart calculation. Players with very consistent, aggressive serves are sometimes called “servebots”.

If one player is making more of their first serves in than the other player, that’s a big advantage, and something worth paying attention to.

Tennis twitter

One of the best parts of following tennis is tennis twitter. I have a list with ~75 people on it that I will check in on during tournaments to know if there’s an exciting match I should be watching or a storyline brewing I should be aware of.

Some of the active/informative/fun ones I follow, if you want to bootstrap your own tennis twitter:

Other media

If you want more tennis media to stay up-to-date, I recommend these:

Wrap up

In conclusion, tennis is great. It’s also fun to play it. But that’s maybe another post.

Making FactoryBot.lint verbose

10 Feb 2021

Here’s a quick blog post about a specific thing (making FactoryBot.lint more verbose) but actually, secretly, about a more general thing (taking advantage of Ruby’s flexibility to bend the universe to your will). Let’s start with the specific thing and then come back around to the general thing.

If you use Rails, there’s a good chance you use FactoryBot to help you write your tests. The library enables you to define “factories” for the models in your system with sensible default values.

FactoryBot has a built-in linter, which you can run as part of your CI build. It will try to identify any factory definitions which are faulty.

At work, we run this as a Circle CI job. It almost always passes, but every now and then it catches something, so we keep it around.

Recently, it started failing occasionally, but not because of anything wrong with our factories. Instead, it was failing because it was just… slow. Circle CI is happy to run things like this for you, but it gets antsy when something is running for a while and printing no output. Is it stuck? Is it going to run forever? If something is running for 10 minutes with no output, Circle CI just kills it.

Our factory linter apparently takes ten minutes now, we learned.

So, what to do about that?

Per that support article, one option is to just bump up the timeout. That’s easy enough. We could tell Circle CI to wait 15 minutes. In a year or two, maybe we’ll need to bump it up again, assuming such pedestrian concerns continue to dog us then, while we drive around in our flying cars.

Another option would be to just stop running it. It’s useful-ish but not essential. That’s easy enough.

Another option would be to configure the linter to print verbose output while it’s running. If we could do that, then we’d get multiple benefits: first of all, Circle CI would be satisfied that it is not stuck, and that it is making progress, and that it might eventually finish, even if it takes more than ten minutes; but also, having some output might be interesting and useful, no? Hmm. I pulled up the FactoryBot docs and saw an option verbose: true, but it didn’t seem to be what I wanted:

Verbose linting will include full backtraces for each error, which can be helpful for debugging

I want it to print output even when there are no errors. I didn’t see anything like that.

Imagine a ruby file with this stuff in it:

require 'factory_bot'

class Dog
  attr_accessor :name

  def save!
    sleep 4
    true
  end
end

class Band
  attr_accessor :name

  def save!
    sleep 4
    true
  end
end

FactoryBot.define do
  factory :dog, class: Dog do
    name { 'Oiva' }
  end

  factory :band, class: Band do
    name { 'Bear Vs. Shark' }
    albums { 2 }
  end
end

factories = FactoryBot.factories
FactoryBot.lint(factories)

There’s actually a bug in our Band factory: it references an attribute called albums which does not exist in our model code. The linter will catch this.

Looking at that last line, it looks like we just pass in the list of factories, and then presumably it will loop over that list and check them one-by-one.

Looping over things is a really common thing in Ruby. Anything that you can loop over is considered “enumerable”. Arrays are enumerable. Hashes are enumerable. When you query a database and get back some number of rows, those are enumerable.

A list of factories is enumerable. Hmm.

Let’s try writing our own enumerable class, to wrap our list of factories. We’ll call it ChattyList. It’ll be a list, but when you loop over it, it’ll chatter away about each item as they go by.

In general, if you’re calling a method and passing in one enumerable thing, it would also be fine to pass in some other enumerable thing. It’s just going to call each on it, or reduce, or something like that from the Enumerable module.

require 'logger'

class ChattyList
  include Enumerable

  def initialize(items, before_message:, after_message:, logger:)
    @items = items
    @before_message = before_message
    @after_message = after_message
    @logger = logger
  end

  def each
    @items.each do |item|
      @logger.info @before_message.call(item)
      yield item
      @logger.info @after_message.call(item)
    end
  end
end

factories = ChattyList.new(
  FactoryBot.factories,
  logger: Logger.new($stdout),
  before_message: -> (factory) { "Linting #{factory.name}" },
  after_message: -> (factory) { "Linted #{factory.name}" },
)

FactoryBot.lint(factories)

When I run the script, the output looks like this:

I, [2021-02-10T22:43:03.676359 #57462]  INFO -- : Linting dog
I, [2021-02-10T22:43:07.678616 #57462]  INFO -- : Linted dog
I, [2021-02-10T22:43:07.678712 #57462]  INFO -- : Linting band
I, [2021-02-10T22:43:07.679373 #57462]  INFO -- : Linted band
Traceback (most recent call last):
        2: from app.rb:59:in `<main>'
        1: from /Users/max.jacobson/.gem/ruby/2.6.6/gems/factory_bot-6.1.0/lib/factory_bot.rb:70:in `lint'
/Users/max.jacobson/.gem/ruby/2.6.6/gems/factory_bot-6.1.0/lib/factory_bot/linter.rb:13:in `lint!': The following factories are invalid: (FactoryBot::InvalidFactoryError)

* band - undefined method `albums=' for #<Band:0x00007ff7b9022890 @name="Bear Vs. Shark"> (NoMethodError)

Nice! As the linter chugs thru the factories, it prints out its progress. With this, Circle CI will see that progress is happening and won’t decide to kill the job. This option wasn’t offered by the library, but that doesn’t have to stop us. Isn’t that fun?

By the way: that might be a good option to add to FactoryBot! Feel free, if you’re reading this, to take that idea.
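The trick isn’t tied to FactoryBot, either. Here is a trimmed-down sketch you can run without any gems. It is a variant of ChattyList that I simplified for illustration: it writes to any IO object (the io: keyword is my invention) instead of a Logger so there are no timestamps, and total_length is a made-up stand-in for a library method that just expects something enumerable.

```ruby
require 'stringio'

# Trimmed-down variant of ChattyList for illustration: it writes to any IO
# object instead of a Logger so the output has no timestamps.
class ChattyList
  include Enumerable

  def initialize(items, io: $stdout)
    @items = items
    @io = io
  end

  # Enumerable builds map, select, sum, etc. on top of this one method.
  def each
    return enum_for(:each) unless block_given?

    @items.each do |item|
      @io.puts "Starting #{item}"
      yield item
      @io.puts "Finished #{item}"
    end
  end
end

# Stand-in for a library method that only expects "something enumerable".
def total_length(words)
  words.sum(&:length)
end

OUTPUT = StringIO.new
puts total_length(ChattyList.new(%w[dog band], io: OUTPUT)) # => 7
puts OUTPUT.string
```

total_length has no idea it was handed a chatty wrapper rather than an array, which is the same reason FactoryBot.lint didn’t mind.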