Misunderstanding Binky Points

A user at the BBO message board comments about my site:

I like his "Interactive evaluator." I tried this hand on it: ♠AKQJ ♥AKQ ♦AKQ ♣AKQ. "Binky HCP NT" count rated this hand at 34.8 but "Binky HCP suit" count rated it 37.4 - I fear some head-shaking if spades don't break.
More impressively, Binky trick count confidently predicted 15.06 tricks in no trumps - it knows a good hand when it sees it.

I don't have the ability to post to BBO, so I'm going to post this here. If you have a BBO account, please post a link to here from the discussion. Thanks for posting the link, Rainer.

I'm posting this partly as a specific response, but also partly because I think this is a common mistake.

Summary response:

Binky Points can be more than 13 because Binky Points can be less than 0. They are designed to be additive with your partner's hand's value.
Binky HCP NT and Binky HCP Suit are meant to very roughly correlate with HCP, but they are not pure correlations.
Binky Points are designed statistically. In particular, if you design a system to get very rare hands right, you are going to get lots of common hands wrong.
A Binky Points Walrus would get to the right contract.

Binky Points Are Additive

Binky Points are meant to be additive - it is entirely possible for Binky Points to return 15.06 tricks in notrump, because partner's hand can have negative value. Indeed, the 4333 hand with no honors is worth about -2.36 points in notrump. With one jack, the value is -1.9.
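
As a small sketch of that additivity, the snippet below adds the single-hand notrump values quoted above. The function and constant names are made up for illustration, and the only numbers used are the ones already given in this post.

```python
# Single-hand notrump values quoted in the text, in Binky Points.
POSTER_HAND_NT = 15.06       # AKQJ-AKQ-AKQ-AKQ
FLAT_NO_HONORS_NT = -2.36    # 4333 with no honors
FLAT_ONE_JACK_NT = -1.9      # 4333 with a single jack


def combined_value(hand_a: float, hand_b: float) -> float:
    """Partnership estimate: simply add the two single-hand values."""
    return hand_a + hand_b


for label, partner in [("no honors", FLAT_NO_HONORS_NT),
                       ("one jack", FLAT_ONE_JACK_NT)]:
    total = combined_value(POSTER_HAND_NT, partner)
    print(f"partner 4333 with {label}: {total:.2f} combined")
```

It is the combined figure, not either hand's number on its own, that is meant to be read as a trick estimate.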

I sometimes refer lazily to Binky Points as "tricks," but the Binky Points value for a single hand does not represent how many tricks that hand is expected to take.

Another example where Binky Points gives an odd-seeming result is hands with 13 cards in one suit. Binky Points evaluates that hand as only worth about 8 Binky Points in suit contracts. Why? Because partner is necessarily void, and he will over-value his hand. As it turns out, the average value of partner's hand, when he is known to be void, is about 5 Binky Points. So we get to 13 Binky Points when we add our value to partner's value. (But note below, we don't really care about such extreme cases.)

The "expected" number of tricks is closer for our BBO poster's hand, being 13.57 for notrump and 13.52 in a suit. This is how many tricks you'd expect to make if partner has an "average" hand opposite yours (meaning about 2/3 of a jack). It tends to overvalue lots of big hands in the same way you might say you have "more than enough tricks for a grand." Only the suit contract number is odd, because it would seem to indicate we are highly likely to make 7 in some suit, and we can easily imagine cases where we don't.

It is possible for the Binky Points of two hands to add up to more than 13, even more than 14. In those cases, there is almost always good play for 13 tricks.

In notrump, the actual number of tricks you can take on a pair of hands usually has a 95% interval of plus or minus 2, based on the total points in any evaluation. So 13 total BPs is not a sure 13 tricks, it is a statistical suggestion.

Because Binky Points is based on double dummy data, it is often the case that 13 total Binky Points makes fewer tricks in reality. At 13-13.25 total BPs, you make grand slam only about 62% of the time, even with double dummy play. That does not filter out the occasional case where you are missing an ace, which is more likely to happen than if you restrict yourself to 37 HCP. However, a BP total of 14 is a pretty great bet.

As an aside, the suit values for Binky Points don't know what trump suit you are going to choose. The suit value above doesn't assume that spades are trump, but rather that you and partner will figure out what your best trump fit is. (One "bug," however, is that it assumes you find it double dummy. Opposite the above hand, if partner is x-xxxx-xxxx-xxxx, you make a suit grand if spades are 4-4, or any of the other suits are 3-3. We "hope" this averages out - most of the time, the obvious best fit is the best fit. But it is an area where the technique could lead to bias and error. I've been meaning to check what happens when we only pick an "obvious" fit. I have not done such an analysis yet.)

Binky HCP Are Not Simple Translations

The Binky HCP (bHCP) values are very rough translations from Binky Points to standard HCP, using the idea that an extra HCP is worth about 1/3 of a trick in a suit contract and about 1/2 of a trick in notrump. Because of this, you need more bHCP to make a grand slam in a suit than you do in notrump. In particular, if Binky HCP NT is less than Binky HCP Suit, that does not mean the hand plays better in a suit contract. There is no contradiction in the results for the above hand.

Specifically, on average you and partner are dealt 20 points together (in notrump HCP values) and take on average 6 tricks in notrump. If additional points are worth 1/2 a trick, then 34 points is worth 13 tricks. So if you have N Binky Notrump Points, the formula for computing a roughly equivalent HCP value would be to take (N-3)*2+10=2N+4 points.
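
The rough notrump conversion just described can be written out as a few lines of Python. This is only a sketch of the arithmetic in the paragraph above, with hypothetical names; it is a very rough translation, so it should not be expected to reproduce the figures the interactive evaluator displays.

```python
AVG_HAND_HCP = 10.0          # half of the 20 HCP a partnership holds on average
AVG_HAND_TRICKS_NT = 3.0     # half of the 6 notrump tricks a partnership averages
POINTS_PER_TRICK_NT = 2.0    # an extra point is worth about 1/2 a trick


def binky_nt_to_hcp(n: float) -> float:
    """Rough HCP equivalent of N Binky notrump points: (N-3)*2+10 = 2N+4."""
    return (n - AVG_HAND_TRICKS_NT) * POINTS_PER_TRICK_NT + AVG_HAND_HCP


print(binky_nt_to_hcp(3.0))   # an average hand maps back to 10 points
print(binky_nt_to_hcp(6.5))   # half of 13 tricks maps to 17, i.e. 34 per pair
```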

On the other hand, in suit contracts, you tend to have more than 20 HCP between you and partner, since on average you have some shortness. (TODO: Look up this number) The average number of tricks you and partner make in a suit contract is about 8 1/3. Each additional point is worth about 1/3 of a trick. So to get to 13 tricks, you again need 14 extra points. But this time, you need to add that 14 to a higher baseline.
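
To check that both strains really do need the same 14 extra points, here is a short sketch using exact fractions. The function name is hypothetical, and the averages (6 tricks in notrump, 8 1/3 in a suit) and per-point rates are the rough figures from the text. The suit baseline itself is left out because that number is not given here.

```python
from fractions import Fraction


def extra_points_needed(target_tricks, average_tricks, tricks_per_point):
    """Points above the partnership average needed to reach target_tricks."""
    return (Fraction(target_tricks) - average_tricks) / tricks_per_point


# Notrump: average 6 tricks, 1/2 a trick per point.
nt_extra = extra_points_needed(13, Fraction(6), Fraction(1, 2))
# Suit: average 8 1/3 tricks (25/3), 1/3 of a trick per point.
suit_extra = extra_points_needed(13, Fraction(25, 3), Fraction(1, 3))

print(nt_extra, suit_extra)
```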

Binky Points Are Statistical and Simple By Design

Binky Points are designed to follow a common simple pattern for hand-evaluation - take a value for the hand pattern, then add the values for each suit holding. This is a very common hand evaluation technique - virtually all evaluators follow it - some of them don't even have a pattern valuation, simply evaluating the pattern per holding. I call evaluators which fit this pattern a "shape-adjusted holding evaluator," or SAHE for short here. These evaluators have nice features that make the statistical problems with them essentially linear, while more complicated evaluators become non-linear.
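
For concreteness, here is a minimal sketch of the SAHE shape: one value for the pattern plus one value per suit holding. The function names are hypothetical, and the tables plugged in are not Binky Points values. As a stand-in it uses the familiar 4-3-2-1 count, which is an SAHE with no pattern term at all.

```python
from typing import Callable, Sequence, Tuple


def sahe_value(
    suits: Sequence[str],
    pattern_value: Callable[[Tuple[int, ...]], float],
    holding_value: Callable[[str], float],
) -> float:
    """Shape-adjusted holding evaluator: pattern term plus one term per suit."""
    pattern = tuple(len(s) for s in suits)  # e.g. (4, 3, 3, 3)
    return pattern_value(pattern) + sum(holding_value(s) for s in suits)


# Stand-in tables: the 4-3-2-1 count, with a pattern value of zero.
WORK_COUNT = {"A": 4, "K": 3, "Q": 2, "J": 1}


def hcp_holding(holding: str) -> float:
    """Value of one suit holding under the 4-3-2-1 count."""
    return sum(WORK_COUNT.get(card, 0) for card in holding)


def no_pattern_value(pattern: Tuple[int, ...]) -> float:
    """Evaluator without a shape adjustment."""
    return 0.0


hand = ["AKQJ", "AKQ", "AKQ", "AKQ"]  # spades, hearts, diamonds, clubs
print(sahe_value(hand, no_pattern_value, hcp_holding))  # 37.0
```

Because each suit contributes its own independent term, the total is a plain sum, which is what keeps the fitting problem linear.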

SAHEs have an obvious problem: In a 2-3-1-7 hand, is the value of ♠Ax really the same as the value of ♠Ax in a 2-4-4-3 hand? Remember, we have already counted the playing value of the shape, but it is quite possibly the case that in 2-3-1-7 the ♠Ax adds a different value than it does in 2-4-4-3.

Similarly, there is nothing in an SAHE that lets you consider holdings across suits - what is the value of holding a control in every suit in notrump? SAHEs cannot capture the value of that.

A typical example of the problem with SAHEs is the "aces and spaces" hand type. Aces are valuable both because they take a trick and they give you a tempo to establish your tricks. In the "aces and spaces" hand, however, you have no suit to establish, unless partner can provide a suit. I've found that Binky Points over-estimates the value of the "aces and spaces" hand type by about 1/2 of a bHCP in both suits and notrump (that is 1/6 a trick in suits, 1/4 a trick in notrump.)

These are tricky problems to solve, which is why almost no hand evaluator tries to solve them - virtually every evaluator commonly discussed is an SAHE.

In particular, statistics have a hard time solving for non-linear factors like this. I've done a little bit of exploring of the difference of values of holdings in different shapes, and it is there, but it is relatively small - often the difference is about the value of upgrading a nine to a ten.

Finally, because Binky Points are statistical, we don't care how effective they are on rare hands. We might know that the above hand is always going to take 13 tricks in notrump, but if you design an SAHE to work on extreme hands that are very rare, then you are going to get a useless evaluator. (As noted, we actually do get this question essentially correct, but the point is, even if we did not, it would not be a big thing.)

Indeed, it is precisely when it is "obvious" how many tricks we are going to take by looking at our own hand alone that a numerical hand evaluation technique is least useful. You want a hand evaluator to help you with the borderline and common cases, not the easy and rare ones.

Binky Points were not defined with any definition of "obvious," of course, but by their statistical nature, they discount rare examples, and the "obvious" hands are also very rare.

We can always find examples where evaluators fail. The question is, do they fail on everyday deals? Is the Law of Total Tricks bollocks because we can find examples where it fails spectacularly? The LOTT might be useful or not, but individual cases prove nothing either way. It is fundamentally a failure of analysis to try to argue that way.

Binky Points Walrus

If a Walrus bidder used Binky Points instead of High Card Points, he'd still get to the right contract, 7NT, on the poster's hand.

On the other hand, will a HCP Walrus get to the right contract dealt AKQJxxx-AK-AK-A? A HCP Walrus says, "28 points, certainly I have game." A Binky Points Walrus sees 11.27 tricks in suit contracts and 12.27 tricks in notrump. It's true that this still isn't "accurate," but it is much more accurate than a pure HCP notrump evaluation. (Suit evaluation, of course, depends on how you count shortness.)

Now, this is not really important. Although this hand type is 35 times more likely than the poster's hand, it is still exceedingly rare. We really don't care.

And we also don't really care about what a Walrus does. Real world bidding is, by its nature, a re-evaluation process, and Binky Points (so far) only deal with the initial evaluation. They are a beginning, not an end.

Caveats and Conclusions

I do not know about the efficacy of Binky Points in real play, and I am always careful to say so. Other people ascribe more weight to them than I do. I even chose the name "Binky" because it was a silly nickname. However, if you are going to argue against them, at least make a modest attempt to understand what you are reading, rather than make a knee-jerk mockery.

I take no official view on the main topic of that BBO discussion. My gut level says that a 5-3-1 count is probably not very good, but that is my gut. I have no data either way, particularly not for real play.

Side note: Evaluating Axxxx-Axxx-Axx-A

The person who said that the value of the aces in Axxxx-Axxx-Axx-A are all different actually captures Binky Points with that comment. In Binky Points, I'd also add that the values are different in suit than in notrump.

It's a little hard to define "what is the value of this ace?" But we can say, "What is the value of Axxxx versus the value of xxxxx."

In notrump, Axxxx is worth 2.23 tricks more than xxxxx, Axxx is worth 2.27 tricks more than xxxx, Axx is worth 2.32 tricks more than xxx, and A is worth 2.13 tricks more than a small singleton. That is quite "regular." At 1/2 trick per point, this makes the aces worth (very roughly) 4.5 points in the longer suits, and 4.25 points in the singleton.

In suits, Axxxx is worth 1.59 tricks versus nothing, Axxx is worth 1.59, and Axx is worth 1.61, while A is worth only 1.33 versus the singleton. This means roughly that the aces are worth closer to 5 points in suit contracts, except for the stiff, which is worth only 4 points.

So we see that, for the most part, an ace is an ace is an ace in the longer suits, when not supported by other high cards. The value of the stiff ace is the interesting one. If you accept that a HCP is about 1/3 a trick in suit contracts, then the value of the singleton ace is almost a full point lower than the other aces in suit contracts. In notrump, a point is about 1/2 a trick, so the difference is only about a half of a HCP.
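
The trick-to-point conversions in this section are easy to reproduce. The sketch below takes the trick gains quoted above and divides by the rough per-point rates (1/2 a trick per point in notrump, 1/3 in suits). The dictionary and function names are hypothetical, and the value for Axxx in suits is read as a gain over xxxx.

```python
NT_TRICKS_PER_POINT = 1 / 2
SUIT_TRICKS_PER_POINT = 1 / 3

# Trick gain of the ace holding over the same length with no honor,
# as quoted in the text.
ACE_GAIN_TRICKS = {
    "Axxxx": {"nt": 2.23, "suit": 1.59},
    "Axxx": {"nt": 2.27, "suit": 1.59},
    "Axx": {"nt": 2.32, "suit": 1.61},
    "A": {"nt": 2.13, "suit": 1.33},
}


def tricks_to_points(tricks: float, tricks_per_point: float) -> float:
    """Convert a trick difference into a rough HCP-scale difference."""
    return tricks / tricks_per_point


for holding, gain in ACE_GAIN_TRICKS.items():
    nt_points = tricks_to_points(gain["nt"], NT_TRICKS_PER_POINT)
    suit_points = tricks_to_points(gain["suit"], SUIT_TRICKS_PER_POINT)
    print(f"{holding:<6} NT {nt_points:.2f}  suit {suit_points:.2f}")
```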

Note, this does not tell us the value of an ace. It only tells us the value of an Axxxx versus xxxxx. The value of an ace also comes in the value of AQxxx versus Qxxxx, etc. Indeed, Binky Points do not care about the values of individual honors, really - Binky Points are computed by evaluating the entire holding. So we can compare Axxxx versus KQxxx (almost the same in suit contracts, the latter is worth about 1/10th of a point more in notrump.) We have a harder time comparing the value of Ax and Axxxx, because the sets of hand patterns are so different. (We can compute such a difference, it is just not clear what the value of such a difference would be.)

Consider the differences:

Holdings              Tricks Diff.
                      NT      Suit
JTx vs QTx            0.58    0.31
JTxx vs QTxx          0.45    0.30
JTxxx vs QTxxx        0.43    0.28
JTxxxx vs QTxxxx      0.42    0.25

Consider what QTx gives you in notrump that JTx does not. QTx has some chance of being a stop all by itself, so we get a rather large difference between QTx and JTx in notrump compared to the value of turning the jack to the queen in the longer suits.