# Bet Sizing Is Not Intuitive

Humans are not good bettors.

It takes effort both in study and practice to become more proficient. But like anything hard, most people won’t persevere. Devoting some cycles to improve will arm you with a rare arrow in your quiver as you go through life.

Skilled betting demands 2 pivotal actions:

1. Identifying attractive propositions

This can be coded as “positive expected value” or “good risk/reward”. There is no strategy that turns a bad proposition into an attractive one on its own merit (as opposed to something like buying insurance which is a bad deal in isolation but can make sense holistically). For example, there is no roulette betting strategy that magically turns its negative EV trials into a positive EV session.

2. Effective bet sizing

Once you are faced with an attractive proposition, how much do you bet? While this is also a big topic we can make a simple assertion — bad bet sizing is enough to ruin a great proposition. This is a deeper point than it appears. By sizing a bet poorly, you can fumble away a certain win. You cannot afford to get bet sizing dramatically wrong.

Of these 2 points, the second one is less appreciated. Bet sizing is not very intuitive.

To show that, we will examine a surprising study.

### The Haghani-Dewey Biased Coin Study

In October 2016, Richard Dewey and Victor Haghani (of LTCM infamy) published a study titled:

Observed Betting Patterns on a Biased Coin (Editorial from the Journal of Portfolio Management)

The study is a dazzling illustration of how poor our intuition is for proper bet sizing. The link goes into depth about the study. I will provide a condensed version by weaving my own thoughts with excerpts from the editorial.

The setup

• 61 individuals start with $25 each. They can play a computer game where they can bet any proportion of their bankroll on a coin. They can choose heads or tails. They are told the coin has a 60% chance of landing heads. The bet pays even money (i.e. if you bet $1, you either win or lose $1). They get 30 minutes to play.
• The sample was largely composed of college-age students in economics and finance and young professionals at financial firms. We had 14 analyst and associate-level employees of two leading asset management firms.

Your opportunity to play

Before continuing with a description of what an optimal strategy might look like, we ask you to take a few moments to consider what you would do if given the opportunity to play this game. Once you read on, you’ll be afflicted with the curse of knowledge, making it difficult for you to appreciate the perspective of our subjects encountering this game for the first time.

If you want to be more hands-on, play the game here.

Devising A Strategy

1. The first thing to notice is betting on heads is positive expected value (EV). If x is your wager:

EV = 60% (x) - 40% (x) = 20% (x)

You expect to earn 20% of your wager per coin flip.

2. The next observation is the betting strategy that maximizes your total expected value is to bet 100% of your bankroll on every flip.

3. But then you should notice that this also maximizes your chance of going broke. On any single flip, you have a 40% chance of losing your stake and being unable to continue this favorable game.

4. What if you bet 50% of your bankroll on every flip?

In the typical outcome you will lose 97% of your wealth (as opposed to a nearly 100% chance of going broke if you had bet your full bankroll). 97% sounds like a lot! How does that work?

If you bet 50% of your bankroll on 100 flips you expect 60 heads and 40 tails. If you make 50% on 60 flips and lose 50% on 40 flips, your bankroll is multiplied by:

1.5^60 x 0.5^40 = 0.033

You will be left with about 3% of your starting cash!
This is because heads followed by tails, or vice versa, results in a 25% loss of your bankroll (1.5 x 0.5 = 0.75).

This is a significant insight on its own. Cutting your bet size dramatically from 100% per toss to 50% per toss left you in a similar position: losing all or nearly all your money.

Optimal Strategy

There’s no need for build-up. There’s a decent chance any reader of this blog has heard of the Kelly Criterion, which uses the probabilities and payoffs of various outcomes to compute an “optimal” bet size. In this case, the computation is straightforward: the optimal bet size as a fraction of the bankroll is 20%, matching the edge you get on the bet. Since the payoff is even money, the Kelly formula reduces to 2p - 1 where p = probability of winning.

2 x 60% - 1 = 20%

The clever formula developed by Bell Labs researcher John Kelly:

provides an optimal betting strategy for maximizing the rate of growth of wealth in games with favorable odds, a tool that would appear a good fit for this problem. Dr. Kelly’s paper built upon work first done by Daniel Bernoulli, who resolved the St. Petersburg Paradox (a lottery with an infinite expected payout) by introducing a utility function that the lottery player seeks to maximize. Bernoulli’s work catalyzed the development of utility theory and laid the groundwork for many aspects of modern finance and behavioral economics.

The reference to a utility function points to the assumption that a gambler has a log utility of wealth function. In English, this means the more money you have the less a marginal dollar is worth to you. Mathematically it also means that the magnitude of pain from losing $1 is greater than the magnitude of joy from gaining $1. This matches empirical findings for most people. They are “loss-averse”.

How did the subjects fare in this game?

The paper is blunt:

Our subjects did not do very well.
Suboptimal betting came in all shapes and sizes: overbetting, underbetting, erratic betting, and betting on tails were just some of the ways a majority of players squandered their chance to take home $250 for 30 minutes play.

Let’s take a look, shall we?

• Only 21% of participants reached the maximum payout of $250, well below the 95% that should have reached it given a simple constant percentage betting strategy of anywhere from 10% to 20%.
• 1/3 of the participants finished with less money than the $25 they started with. (28% went bust entirely!)
• 67% of the participants bet on tails at some point. The authors forgive this somewhat conceding that players might be curious if the tails really are worse, but 48% bet on tails more than 5 times! Many of these bets on tails occurred after streaks of heads suggesting a vulnerability to gambler’s fallacy.
• Betting patterns and debriefings also found prominent use of martingale strategies (doubling down after a loss).
• 30% of participants bet their entire bankroll on one flip, raising their risk of ruin from nearly 0% to 40% in a lucrative game!

Just how lucrative is this game?

Having a trading background, I have an intuitive understanding that this is a very profitable game. If you sling option contracts that can have a $2 range over the course of their life and collect a measly penny of edge, you have razor-thin margins. The business requires trading hundreds of thousands of contracts a week to let the law of averages assure you of profits.

A game with a 20% edge is an astounding proposition.

Not only did most of our subjects play poorly, they also failed to appreciate the value of the opportunity to play the game. If we had offered the game with no cap [and] assume that a player with agile fingers can put down a bet every 6 seconds, 300 bets would be allowed in the 30 minutes of play. The expected gain of each flip, betting the Kelly fraction, is 4% [Kris clarification: 20% of bankroll times 20% edge]. The expected value of 300 flips is $25 * (1 + 0.04)^300 = $3,220,637!

In fact, they ran simulations for constant bet fractions of 10%, 15%, and 20% (half Kelly, 3/4 Kelly, full Kelly) and found a 95% probability that the subjects would reach the $250 cap!

Instead, just over 20% of the subjects reached the max payout.
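The arithmetic in this section is easy to check yourself. Below is a minimal sketch, assuming even-money payoffs and a bettor who stakes the same fraction of bankroll on heads every flip. The function names are my own, and the simulation ignores practical details of the real game (minimum bet sizes, time pressure), so treat its output as a rough sanity check rather than a replication of the authors' simulations.

```python
import math
import random


def kelly_fraction(p: float) -> float:
    """Kelly bet (fraction of bankroll) for an even-money bet won with probability p."""
    return 2 * p - 1


def typical_multiple(f: float, p: float, flips: int) -> float:
    """Bankroll multiple if exactly p*flips bets win and the rest lose, betting fraction f."""
    wins = p * flips
    losses = (1 - p) * flips
    return (1 + f) ** wins * (1 - f) ** losses


def log_growth(f: float, p: float) -> float:
    """Expected log growth of the bankroll per flip when betting fraction f (needs f < 1)."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)


def prob_hit_cap(f, p=0.6, start=25.0, cap=250.0, flips=300, trials=1000, seed=0):
    """Monte Carlo estimate of the chance a constant-fraction bettor reaches the cap."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        bankroll = start
        for _ in range(flips):
            bet = f * bankroll
            bankroll += bet if rng.random() < p else -bet
            if bankroll >= cap:
                hits += 1
                break
    return hits / trials


print(f"Kelly fraction at p=0.6: {kelly_fraction(0.6):.2f}")
print(f"Multiple after 60 wins / 40 losses at f=0.5: {typical_multiple(0.5, 0.6, 100):.3f}")
print(f"Uncapped expected value at Kelly: ${25 * 1.04 ** 300:,.0f}")
for f in (0.10, 0.15, 0.20, 0.50):
    print(f"f={f:.2f}  log growth per flip={log_growth(f, 0.6):+.4f}  "
          f"P(reach $250)={prob_hit_cap(f):.2f}")
```

The log growth column is the thing to watch: it peaks at the Kelly fraction and turns negative well before you get to betting half your bankroll.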

### Editorialized Observations

• Considering how lucrative this game was, the performance of the participants is damning. That nearly one-third risked the entire bankroll is anathema to traders who understand that the #1 rule of trading (assuming you have a positive expectancy business) is survival.

• Only 5 out of the 61 finance-educated participants were familiar with Kelly betting. And 2 out of the 5 didn’t consider using it. A game like this is the context it’s tailor-made for!
• The authors note that the syllabi of MIT, Columbia, Chicago, Stanford, and Wharton MBA programs do not make any reference to betting or Kelly topics in their intro finance, trading, or asset-pricing courses.

• Post-experiment interviews revealed that betting “a constant proportion of wealth” seemed to be a surprisingly unintuitive strategy to participants.

Given that many of our subjects received formal training in finance, we were surprised that the Kelly criterion was virtually unknown among our subjects, nor were they able to bring other tools (e.g., utility theory) to the problem that would also have led them to a heuristic of constant-proportion betting.

These results raise important questions. If a high fraction of quantitatively sophisticated, financially trained individuals have so much difficulty in playing a simple game with a biased coin, what should we expect when it comes to the more complex and long-term task of investing one’s savings? Given the propensity of our subjects to bet on tails (with 48% betting on tails on more than five flips), is it any surprise that people will pay for patently useless advice? What do the results suggest about the prospects for reducing wealth inequality or ensuring the stability of our financial system? Our research suggests that there is a significant gap in the education of young finance and economics students when it comes to the practical application of the concepts of utility and risk-taking.

Our research will be worth many multiples of the $5,574 winnings we paid out to our 61 subjects if it helps encourage educators to fill this void, either through direct instruction or through trial-and-error exercises like our game. As Ed Thorp remarked to us upon reviewing this experiment, “It ought to become part of the basic education of anyone interested in finance or gambling.”

I will add my own concern. It’s not just individual investors we should worry about. Their agents in the form of financial advisors or fund managers, even if they can identify attractive propositions, may undo their efforts by poorly sizing opportunities by either:

1. falling far short of maximizing

Since great opportunities are rare, failing to optimize can be more harmful than our intuition suggests. Making $50k in a game where you should make $3mm is one of the worst financial errors one could make.

2. overbetting an edge

There isn’t a price I’d play $100mm Russian Roulette for.

Getting these things correct requires proper training. In Can Your Manager Solve Betting Games With Known Solutions?, I wonder if the average professional manager can solve problems with straightforward solutions. Never mind the complexity of assessing risk/reward and proper sizing in investing, a domain that epitomizes chaotic, adversarial dynamics.

Nassim Taleb was at least partly referring to the importance of investment sizing when he remarked, “If you gave an investor the next day’s news 24 hours in advance, he would go bust in less than a year.”

Furthermore, effective sizing is not just about analytics but discipline. It takes a team culture of truth-seeking and emotional checks to override the biases that we know about. Just knowing about them isn’t enough. The discouraged authors found:

…that without a Kelly-like framework to rely upon, our subjects exhibited a menu of widely documented behavioral biases such as illusion of control, anchoring, overbetting, sunk-cost bias, and gambler’s fallacy.

### Conclusion

Take bet sizing seriously. A bad sizing strategy squanders opportunity. With a little effort, you can get better at maximizing the opportunities you find, rather than needing to keep finding new ones that you risk fumbling.

You need to identify good props and size them well. Both abilities are imperative. It seems most people don’t realize just how critical sizing is.

Now you do.

# Another Kind Of Mean

Let’s use this section to learn a math concept.

We begin with a question:

You drive to the store and back. The store is 50 miles away. You drive 50 mph to the store and 100 mph coming back. What’s your average speed in MPH for the trip?

[Space to think about the problem]

*

*

*

[If you think the answer is 75 there are 2 problems worth pointing out. One of them is you have the wrong answer.]

*

*

*

[The other is that 75 is the obvious gut response, but since I’m asking this question, you should know that’s not the answer. If it’s not the answer that should clue you in to think harder about the question.]

*

*

*

[You’re trying harder, right?]

*

*

*

[Ok, let’s get on with this]

If you drive 50 MPH to a store 50 miles away, then it took 60 minutes to go one way.

If you drive 100 MPH on the way back you will return home in half the time or 30 minutes.

You drove 100 miles in 1.5 hours or 66.67 MPH

Congratulations, you are on the way to learning about another type of average or mean.

You likely already know about 2 of the other so-called Pythagorean means.

• Arithmetic mean

Simple average. Used when trying to find a measure of central tendency in a set of values that are added together.

• Geometric mean

The geometric mean or geometric average is a measure of central tendency for a set of values that are multiplied together. One of the most common examples is compounding. Returns and growth rates are just fractions multiplied together. So if you have 10% growth then 25% growth you compute:

1 x 1.10 x 1.25 = 1.375

If you computed the arithmetic mean of the growth rates you’d get 17.5% (the average of 10% and 25%).

The geometric mean however answers the question “what is the average growth rate I would need to multiply each period by to arrive at the final return of 1.375?”

In this case, there are 2 periods.

To solve we do the inverse of the multiplication by taking the root corresponding to the number of periods: 1.375^(1/2) - 1 = 17.26%

We can check that 17.26% is in fact the CAGR or compound average growth rate:

1 x 1.1726 x 1.1726 = 1.375

Have a cigar.
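If you want to check the compounding example without a calculator, here is a small sketch using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The variable names are mine.

```python
from statistics import geometric_mean, mean

growth_factors = [1.10, 1.25]  # 10% growth, then 25% growth

total = 1.0
for g in growth_factors:
    total *= g  # compound one period at a time

arithmetic_rate = mean(growth_factors) - 1   # simple average of the growth rates
cagr = geometric_mean(growth_factors) - 1    # the rate that compounds to the same total

print(f"total growth factor:        {total:.3f}")
print(f"arithmetic mean rate:       {arithmetic_rate:.2%}")
print(f"geometric mean rate (CAGR): {cagr:.2%}")
print(f"check, CAGR compounded:     {(1 + cagr) ** len(growth_factors):.3f}")
```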

The question about speed at the beginning of the post actually calls for using a 3rd type of mean:

The harmonic mean

The harmonic mean is computed by taking the average of the reciprocals of the values, then taking the reciprocal of that number to return to the original units.

That’s wordy. Better to demonstrate the 2 steps:

1. “Take the average of the reciprocals”

Instead of averaging MPH, let’s average hours per mile then convert back to MPH at the end:

50 MPH = “it takes 1/50 of an hour to go a mile” = 1/50 HPM
100 MPH = “it takes 1/100 of an hour to go a mile” = 1/100 HPM

The average of 1/50 HPM and 1/100 HPM = 1.5/100 HPM

2. “Take the reciprocal of that number to return to the original units”

Flip 1.5/100 HPM to 100/1.5 MPH. Voila, 66.67 MPH
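The two steps above can be sketched in a few lines. This assumes, as in the store example, that each leg covers the same distance; `harmonic_mean` comes from Python's standard `statistics` module and the other names are mine.

```python
from statistics import harmonic_mean, mean

speeds_mph = [50, 100]   # outbound and return legs
leg_miles = 50           # each leg covers the same distance

# Direct computation: total distance divided by total time.
total_hours = sum(leg_miles / s for s in speeds_mph)
direct_mph = leg_miles * len(speeds_mph) / total_hours

# Step 1: average the reciprocals (hours per mile).
avg_hours_per_mile = mean([1 / s for s in speeds_mph])
# Step 2: take the reciprocal to get back to miles per hour.
two_step_mph = 1 / avg_hours_per_mile

print(f"arithmetic mean of speeds: {mean(speeds_mph):.2f} MPH")
print(f"distance / time:           {direct_mph:.2f} MPH")
print(f"two-step reciprocal:       {two_step_mph:.2f} MPH")
print(f"statistics.harmonic_mean:  {harmonic_mean(speeds_mph):.2f} MPH")
```

All three non-arithmetic lines agree, which is the point: the harmonic mean is just distance over time in disguise.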

Ok, right now you are thinking “Wtf, why is there a mean that deals with reciprocals in the first place?”

If you think about it, all means are computed with numbers that are fractions. You just assume the denominator of the numbers you are averaging is 1. That is fine when each number’s contribution to the final weight is equal, but that’s not the case with an MPH problem. You are spending 2x as much time at the lower speed as at the higher speed! This pulls the average speed over the whole trip towards the lower speed. So you get a true average speed of 66.67, not the 75 that your gut gave you.

I want to pause here because you are probably a bit annoyed about this discovery. Don’t be. You have already won half the battle by realizing there is this other type of mean with the weird name “harmonic”.

The other half of the battle is knowing when to apply it. This is trickier. It relies on whether you care about the numerator or denominator of any number. And since every number has a numerator or denominator it feels like you might always want to ask if you should be using the harmonic mean.

I’ll give you a hint that will cover most practical cases. If you are presented with a whole number that is a multiple, but the thing you actually care about is a yield or rate then you should use the harmonic mean. That means you convert to the yield or rate first, find the arithmetic average which is muscle memory for you already, and then convert back to the original units.

Examples:

• When you compute the average speed for an entire trip you actually want to average hours per mile (a rate) rather than the rate expressed as a multiple (mph) before converting back to mph. Again, this is because your periods of time at each speed are not equal.
• You can’t average P/E ratios when trying to get the average P/E for an entire portfolio. Why? Because the contribution of high P/E stocks to the average of the entire portfolio P/E is lower than for lower P/E stocks. If you average P/Es, you will systematically overestimate the portfolio’s total P/E! You need to do the math in earnings yield space (ie E/P). @econompic wrote a great post about this and it’s why I went down the harmonic mean rabbit hole in the first place:

The Case for the Harmonic Mean P/E Calculation (3 min read)

• Consider this example of when MPG is misleading and you actually want to think of GPM. From Percents Are Tricky:

Which saves more fuel?

1. Swapping a 25 mpg car for one that gets 60 mpg
2. Swapping a 10 mpg car for one that gets 20 mpg

[Jeopardy music…]

You know it’s a trap, so the answer must be #2. Here’s why:

If you travel 1,000 miles:

1. A 25mpg car uses 40 gallons. The 60 mpg vehicle uses 16.7 gallons.
2. A 10 mpg car uses 100 gallons. The 20 mpg vehicle uses 50 gallons

Even though you improved the MPG efficiency of car #1 by more than 100%, we save much more fuel by replacing less efficient cars. Go for the low-hanging fruit. The illusion suggests we should switch ratings from MPG to GPM or to avoid decimals Gallons Per 1,000 Miles.

• The Tom Brady “deflategate” controversy also created statistical illusions based on what rate they used. You want to spot anomalies by looking at fumbles per play not plays per fumble.

Why Those Statistics About The Patriots’ Fumbles Are Mostly Junk (14 min read)
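Two of the examples above are easy to put numbers on. In the sketch below the portfolio is hypothetical: I assume equal dollars in two stocks and made-up P/Es of 10 and 50, which is the special case where the portfolio P/E equals the plain harmonic mean (unequal weights would call for a weighted version). The fuel numbers are the ones from the quoted example, and `gallons_saved` is my own helper name.

```python
from statistics import harmonic_mean, mean

# Hypothetical portfolio: equal dollars in two stocks (P/E values are made up).
dollars_per_stock = 100.0
pe_ratios = [10.0, 50.0]

# Work in earnings-yield space: earnings bought = price paid / (P/E).
earnings = [dollars_per_stock / pe for pe in pe_ratios]
portfolio_pe = dollars_per_stock * len(pe_ratios) / sum(earnings)

print(f"arithmetic mean P/E:  {mean(pe_ratios):.2f}")
print(f"harmonic mean P/E:    {harmonic_mean(pe_ratios):.2f}")
print(f"total price/earnings: {portfolio_pe:.2f}")


def gallons_saved(old_mpg: float, new_mpg: float, miles: float = 1000.0) -> float:
    """Fuel saved over `miles` by swapping cars, computed in gallons-per-mile space."""
    return miles / old_mpg - miles / new_mpg


print(f"25 -> 60 mpg saves {gallons_saved(25, 60):.1f} gallons per 1,000 miles")
print(f"10 -> 20 mpg saves {gallons_saved(10, 20):.1f} gallons per 1,000 miles")
```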

The most important takeaway is that whenever you are trying to average a rate, yield, or multiple consider

a) taking the average of the numbers you are presented with

AND

b) doing the same computation with their reciprocals then flipping it back to the original units. That’s all it takes to compute both the arithmetic mean and the harmonic mean.

If you draw the same conclusions about the variable you care about, you’re in the clear.

Just knowing about harmonic means will put you on guard against making poor inferences from data.

For a more comprehensive but still accessible discussion of harmonic means see:

On Average, You’re Using the Wrong Average: Geometric & Harmonic Means in Data Analysis: When the Mean Doesn’t Mean What You Think it Means (20 min read)
by @dnlmc

This post is so good, that I’m not sure if I should have just linked to it and not bothered writing my own. You tell me if I was additive.

# Greeks Are Everywhere

The option greeks everyone starts with are delta and gamma. Delta is the sensitivity of the option price with respect to changes in the underlying. Gamma is the change in that delta with respect to changes in the underlying.

If you have a call option that is 25% out-of-the-money (OTM) and the stock doubles in value, you would observe the option graduating from a low delta (when the option is 25% OTM a 1% change in the stock isn’t going to affect the option much) to having a delta near 100%. Then it moves dollar for dollar with the stock.

If the option’s delta changed from approximately 0 to 100% then gamma is self-evident. The option delta (not just the option price) changed as the stock rallied. Sometimes we can even compute a delta without the help of an option model by reasoning about it from the definition of “delta”. Consider this example from Lessons From The .50 Delta Option where we establish that delta is best thought of as a hedge ratio:

Stock is trading for $1. It’s a biotech and tomorrow there is a ruling:

• 90% of the time the stock goes to zero
• 10% of the time the stock goes to $10

First take note, the stock is correctly priced at $1 based on expected value (.90 x $0 + .10 x $10). So here are my questions.

What is the $5 call worth?

• Back to expected value:
90% of the time the call expires worthless.
10% of the time the call is worth $5

.90 x $0 + .10 x $5 = $.50

The call is worth $.50

Now, what is the delta of the $5 call?

$5 strike call = $.50

Delta = (change in option price) / (change in stock price)

• In the down case, the call goes from $.50 to zero as the stock goes from $1 to zero.
Delta = $.50 / $1.00 = .50
• In the up case, the call goes from $.50 to $5 while the stock goes from $1 to $10.
Delta = $4.50 / $9.00 = .50

The call has a .50 delta

Using The Delta As a Hedge Ratio

Let’s suppose you sell the $5 call to a punter for $.50 and to hedge you buy 50 shares of stock. Each option contract corresponds to a 100 share deliverable.

• Down scenario P/L:
Short Call P/L = $.50 x 100 = $50
Long Stock P/L = -$1.00 x 50 = -$50

Total P/L = $0

• Up scenario P/L:
Short Call P/L = -$4.50 x 100 = -$450
Long Stock P/L = $9.00 x 50 = $450

Total P/L = $0

Eureka, it works! If you hedge your option position on a .50 delta your p/l in both cases is zero.
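Here is the same binary example as a short sketch, so you can change the probabilities or the strike and watch the delta and the hedged p/l. The variable and function names are mine. It only handles this two-outcome setup and is not a general option model.

```python
p_up = 0.10            # probability the stock goes to $10
stock_now = 1.00
stock_up, stock_down = 10.00, 0.00
strike = 5.00
multiplier = 100       # shares per option contract


def call_payoff(stock: float) -> float:
    """Value of the call at the ruling for a given stock price."""
    return max(stock - strike, 0.0)


# Price the call by expected value, as in the text.
call_now = p_up * call_payoff(stock_up) + (1 - p_up) * call_payoff(stock_down)

# Delta = change in option price / change in stock price, in each scenario.
delta_down = (call_now - call_payoff(stock_down)) / (stock_now - stock_down)
delta_up = (call_payoff(stock_up) - call_now) / (stock_up - stock_now)

hedge_shares = delta_up * multiplier  # shares bought against one short call


def hedged_pnl(stock_final: float) -> float:
    """P/L of short one call plus long `hedge_shares` shares of stock."""
    short_call = (call_now - call_payoff(stock_final)) * multiplier
    long_stock = (stock_final - stock_now) * hedge_shares
    return short_call + long_stock


print(f"call value: ${call_now:.2f}")
print(f"delta (down case): {delta_down:.2f}, delta (up case): {delta_up:.2f}")
print(f"hedged P/L if stock -> $0:  ${hedged_pnl(stock_down):.2f}")
print(f"hedged P/L if stock -> $10: ${hedged_pnl(stock_up):.2f}")
```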

But if you recall, the probability of the $5 call finishing in the money was just 10%. It’s worth restating. In this binary example, the 400% OTM call has a 50% delta despite only having a 10% chance of finishing in the money.

### The Concept of Delta Is Not Limited To Options

Futures

Futures have deltas too. If the SPX cash index increases by 1%, the SP500 futures go up 1%. They have a delta of 100%.

But let’s look closer.

The fair value of a future is given by:

Future = Seʳᵗ

where:

S = stock price
r = interest rate
t = time to expiry in years

This formula comes straight from arbitrage pricing theory. If the cash index is trading for $100 and 1-year interest rates are 5% then the future must trade for $105.13

100e^(5% * 1) = $105.13

What if it traded for $103?

• Then you buy the future, short the cash index at $100
• Earn $5.13 interest on the $100 you collect when you short the stocks in the index.
• For simplicity imagine the index doesn’t move all year. It doesn’t matter if it did move since your market risk is hedged: you are short the index in the cash market and long the index via futures.
• At expiration, your short stock position washes with the expiring future which will have decayed to par with the index or $100.
• [Warning: don’t trade this at home. I’m handwaving details. Operationally, the pricing is more intricate but conceptually it works just like this.]
• P/L computation:
You lost $3 on your futures position (bought for $103 and sold at $100).
You broke even on the cash index (shorted and bought for $100)
You earned $5.13 in interest

Net P/L: $2.13 of riskless profit!

You can walk through the example of selling an overpriced future and buying the cash index. The point is to recognize that the future must be priced as Seʳᵗ to ensure no arbitrage. That’s the definition of fair value.

You may have noticed that a future must have several greeks. Let’s list them:

• Theta: the future decays as time passes. If it was a 1-day future it would only incorporate a single day’s interest in its fair value. In our example, the future was $103 and decayed to $100 over the course of the year as the index was unchanged. The daily theta is exactly worth 1 day’s interest.
• Rho: The future’s fair value changes with interest rates. If the rate was 6% the future would be worth $106.18. So the future has roughly $1.05 of sensitivity per 100 bps change in rates.
• Delta: Yes the future even has a delta with respect to the underlying! Imagine the index doubled from $100 to $200. The new future fair value assuming 5% interest rates would be $210.25.
Invoking “rise over run” from middle school:
delta = change in future / change in index
delta = (210.25 - 105.13) / (200 - 100)
delta = 105%

That holds for small moves too. If the index increases by 1%, the future increases by 1.05%

• Gamma: 0. There is no gamma. The delta doesn’t change as the stock moves.
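The greeks in this list can be recovered by bumping one input at a time and repricing. This is a minimal sketch under the same simplification as the text (continuous compounding, no dividends); `future_fair_value` is my own name. Note that the rho computed from unrounded fair values comes out slightly above the figure you get by subtracting the two rounded prices.

```python
import math


def future_fair_value(spot: float, rate: float, years: float) -> float:
    """Fair value S * e^(r*t), ignoring dividends and funding frictions."""
    return spot * math.exp(rate * years)


base = future_fair_value(100, 0.05, 1.0)

# Rho: bump rates by 100 bps, hold everything else constant.
rho_per_100bps = future_fair_value(100, 0.06, 1.0) - base

# Delta: rise over run for a big move and for a small move.
delta_big = (future_fair_value(200, 0.05, 1.0) - base) / (200 - 100)
delta_small = (future_fair_value(101, 0.05, 1.0) - base) / (101 - 100)

# Theta: value lost when one day passes with the index unchanged.
theta_one_day = base - future_fair_value(100, 0.05, 1.0 - 1 / 365)

# The $103 mispricing: interest earned on the short sale minus the futures loss.
arb_profit = (base - 100) - (103 - 100)

print(f"fair value:        {base:.2f}")
print(f"rho per 100 bps:   {rho_per_100bps:.2f}")
print(f"delta (100->200):  {delta_big:.4f}")
print(f"delta (100->101):  {delta_small:.4f}")
print(f"one-day theta:     {theta_one_day:.4f}")
print(f"arbitrage profit:  {arb_profit:.2f}")
```

The two deltas are identical, which is another way of saying the future has no gamma.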

Levered ETFs

Levered and inverse ETFs have both delta and gamma! My latest post dives into how we compute them.

✍️The Gamma Of Levered ETFs (8 min read)

This is an evergreen reference that includes:

• the mechanics of levered ETFs
• a simple and elegant expression for their gamma
• an explanation of the asymmetry between long and short ETFs
• insight into why shorting is especially difficult
• the application of gamma to real-world trading strategies
• a warning about levered ETFs
• an appendix that shows how to use deltas to combine related instruments

And here’s some extra fun since I mentioned the challenge of short positions:

Bonds

Bonds have delta and gamma. They are called “duration” and “convexity”. The duration is the sensitivity to the bond price with respect to interest rates. Borrowing from my older post Where Does Convexity Come From?:

Consider the present value of a note with the following terms:

Face value: $1000
Coupon: 5%
Schedule: Semi-Annual
Maturity: 10 years

Suppose you buy the bond when prevailing interest rates are 5%. If interest rates go to 0, you will make a 50% return. If interest rates blow out to 10% you will only lose about 31%.

It turns out that as interest rates fall, you actually make money at an increasing rate. As rates rise, you lose money at a decreasing rate. So again, your delta with respect to interest rates changes. In bond world, the equivalent of delta is duration. It’s the answer to the question “how much does my bond change in value for a 1% change in rates?”

So where does the curvature in bond payoff come from? The fact that the bond duration changes as interest rates change. This is reminiscent of how the option call delta changed as the stock price rallied.

The red line shows the bond duration when yields are 10%. But as interest rates fall we can see the bond duration increases, making the bonds even more sensitive to a decline in rates. The payoff curvature is a product of your position becoming increasingly sensitive to rates. Again, contrast with stocks where your position sensitivity to the price stays constant.

Corporations

Companies have all kinds of greeks. A company at the seed stage is pure optionality. Its value is pure extrinsic premium to its assets (or book value). In fact, you can think of any corporation as the premium of the zero strike call.

[See a fuller discussion of the Merton model on Lily’s Substack which is a must-follow. We talk about similar stuff but she’s a genius and I’m just old.]

Oil drillers are an easy example. If a driller can pull oil out of the ground at a cost of $50 a barrel but oil is trading for $25 it has the option to not drill. The company has theta in the form of cash burn but it still has value because oil could shoot higher than $50 one day. The oil company’s profits will be highly levered to the oil price.
With oil bouncing around $20-$30 the stock has a small delta; if oil is $75, the stock will have a high delta. This implies the presence of gamma since the delta is changing.

Games

One of the reasons I like boardgames is they are filled with greeks. There are underlying economic or mathematical sensitivities that are obscured by a theme. Chess has a thin veneer of a war theme stretched over its abstraction. Other games like Settlers of Catan or Bohnanza (a trading game hiding under a bean farming theme) have more pronounced stories but as with any game, when you sit down you are trying to reduce the game to its hidden abstractions and mechanics.

The objective is to use the least resources (whether those are turns/actions, physical resources, money, etc) to maximize the value of your decisions. Mapping those values to a strategy to satisfy the win conditions is similar to investing or building a successful business as an entrepreneur. You allocate constrained resources to generate the highest return, best risk-adjusted return, smallest loss…whatever your objective is.

Games mine a variety of mechanics (awesome list here) just as there are many types of business models. Both game mechanics and business models ebb and flow in popularity. With games, it’s often just chasing the fashion of a recent hit that has captivated the nerds. With businesses, the popularity of models will oscillate (or be born) in the context of new technology or legal environments.

In both business and games, you are constructing mental accounting frameworks to understand how a dollar or point flows through the system. On the surface, Monopoly is about real estate, but un-skinned it’s a dice game with expected values that derive from probabilities of landing on certain spaces times the payoffs associated with the spaces. The highest value properties in this accounting system are the orange properties (ie Tennessee Ave) and red properties (ie Kentucky). Why?
Because the jail space is a sink in an “attractor landscape” while the rents are high enough to kneecap opponents. Throw in cards like “advance to nearest utility”, “advance to St. Charles Place”, and “Illinois Ave” and the chance to land on those spaces over the course of a game more than offsets the Boardwalk haymaker even with the Boardwalk card in the deck.

In deck-building games like Dominion, you are reducing the problem to “create a high-velocity deck of synergistic combos”. Until you recognize this, the opponent who burns their single coin cards looks like a kamikaze pilot. But as the game progresses, the compounding effects of the short, efficient deck create runaway value. You will give up before the game is over, eager to start again with X-ray vision to see through the theme and into the underlying greeks.

[If the link between games and business raises an antenna, you have to listen to Reid Hoffman explain it to Tyler Cowen!]

Wrapping Up

Option greeks are just an instance of a wider concept: sensitivity to one variable as we hold the rest constant. Being tuned to estimating greeks in business and life is a useful lens for comprehending “how does this work?”. Armed with that knowledge, you can create dashboards that measure the KPIs in whatever you care about, reason about multi-order effects, and serve the ultimate purpose: make better decisions.

# The Gamma Of Levered ETFs

Levered ETFs use derivatives to amplify the return of an underlying index. Here’s a list of 2x levered ETFs. For example, QLD gives you 2x the return of QQQ (Nasdaq 100). In this post, we will compute the delta and gamma of levered ETFs and what that means for investors and traders.

## Levered ETF Delta

In options, delta is the sensitivity of the option premium to a change in the underlying stock. If you own a 50% delta call and the stock price goes up by $1, you make $.50. If the stock goes down $1, you lose $.50.
Delta, generally speaking, is a rate of change of p/l with respect to how some asset moves. I like to say it’s the slope of your p/l based on how the reference asset changes.

For levered ETFs, the delta is simply the leverage factor. If you buy QLD, the 2x version of QQQ, you get 2x the return of QQQ. So if QQQ is up 1%, you earn 2%. If QQQ is down 1%, you lose 2%. If you invest $1,000 in QLD your p/l acts as if you had invested $2,000.

$100 worth of QLD is the equivalent exposure of $200 of QQQ. Your dollar delta is $200 with respect to QQQ. If QQQ goes up 1%, you make 1% * $200 QQQ deltas = $2

The extra exposure cuts both ways. On down days, you will lose 2x what the underlying QQQ index returns.

The takeaway is that your position or delta is 2x the underlying exposure.

Dollar delta of levered ETF = Exposure x Leverage Factor

In this case, QLD dollar delta is $200 ($100 x 2).

Note that QLD is a derivative with a QQQ underlyer.

## Levered ETF Gamma

QLD is a derivative because it “derives” its value from QQQ. $100 exposure to QLD represents a $200 exposure to QQQ. In practice, the ETF’s manager offers this levered exposure by engaging in a swap with a bank that guarantees the ETF’s assets will return the underlying index times the leverage factor. For the bank to offer such a swap, it must be able to manufacture that return in its own portfolio. So in the case of QLD, the bank simply buys QQQ exposure with a notional of 2x the NAV of QLD so that its delta, or slope of p/l, matches the ETF’s promise.

So if the ETF has a NAV of $1B, the bank must maintain exposure of $2B QQQ deltas. That way, if QQQ goes up 10%, the bank makes $200mm which it contributes to the ETF’s assets so the new NAV would be $1.2B.

Notice what happened:

• QQQ rallied 10% (the reference index)
• QLD rallied 20% (the levered ETF’s NAV goes from $1B –> $1.2B)
• The bank’s initial QQQ delta of $2B has increased to $2.2B.

Uh oh.

To continue delivering 2x returns, the bank’s delta needs to be 2x the ETF’s assets or $2.4B, but it’s only $2.2B! The bank must buy $200M worth of QQQ deltas (either via QQQs, Nasdaq futures, or the basket of stocks).

If we recall from options, gamma is the change in delta due to a change in stock price. The bank’s delta went from 2 (ie $2B/$1B) to 1.833 ($2.2B/$1.2B). So it got shorter deltas, in a rising market –> negative gamma!

The bank must dynamically rebalance its delta each day to maintain a delta of 2x the ETF’s assets. And the adjustment means it must buy deltas at the close of an up day in the market or sell deltas at the close of a down day. Levered ETFs, therefore, amplify price moves. The larger the daily move, the larger the rebalancing trades need to be!

I’ve covered this before in Levered ETF/ETN tool, where I give you a spreadsheet to compute the rebalancing trades.

## From Brute Force To Symbols

There was confusion on Twitter about how levered ETFs worked recently and professor @quantian stepped up:

Junior PM interview question: An X-times leveraged fund tracks an underlying asset S. After time T, S have moved to S_T = (1 + dS) S_0. The initial delta is of course X. What is the portfolio gamma, defined as (dDelta)/(dS), as a function of X?

Despite correctly understanding how levered and inverse ETFs work I struggled to answer this question with a general solution (ie convert the computations we brute-forced above into math symbols). It turns out the solution is a short expression and worth deriving to find an elegant insight.

@quantian responded to my difficulty with the derivation. I’ll walk you through that slowly.

Mapping variables to @quantian’s question:

• NAV = 1

You are investing in a levered ETF that starts with a NAV of 1

• X = The leverage factor

The bank needs to have a delta of X to deliver the levered exposure.
For a 2x ETF, the bank’s initial delta will be 2 * NAV = 2

• S = the underlying reference index

The dynamic:

• When S moves, the bank’s delta will no longer be exactly X times the NAV. Its delta changed as S changed. That’s the definition of gamma.
• When S moves, the bank needs to rebalance (buy or sell) units of S to maintain the desired delta of X. The rebalancing amount is therefore the change in delta or gamma.

Let’s find the general formula for the gamma (ie change in delta) in terms of X. Remember X is the leverage factor and therefore the bank’s desired delta.

The general formula for the gamma as a function of the change in the underlying index is, therefore:

X (X - 1)

where X = leverage factor

Intuition

There are 2 key insights when we look at this elegant expression:

1. The gamma, or imbalance in delta due to the move, grows with the square of the leverage factor (X² - X). The more levered the ETF, the larger the delta adjustment required. If there was no leverage (like SPY to the SPX index), the leverage factor is 1 and the gamma is 0 because 1 (1 - 1) = 0

2. The asymmetry of inverse ETFs: they require larger rebalances for the same size move! Imagine a simple inverse ETF with no leverage.

-1 (-1 - 1) = 2

A simple inverse ETF has the same gamma as a double long ETF. Consider how a double short ETF has a gamma of 6!

-2 (-2 - 1) = 6

When I admitted that I had only figured out the rebalancing quantities by working out the mechanics by brute force in Excel, @quantian had a neat observation:

I originally found this by doing the brute force Excel approach! Then I plotted it and was like “hm, that’s just a parabola, I bet I could simplify this”

X² - X shows us that the gamma of an inverse ETF is equivalent to the gamma of its counterpart long of one degree higher. For example, a triple-short ETF has the same gamma as a 4x long. Or a simple inverse ETF has the gamma of a double long.
The fact that a 1x inverse ETF has gamma at all is a clue to the difficulty of running a short book…when you win, your position size shrinks and the effect is compounded by the fact that your position is shrinking even faster relative to your growing AUM as your shorts profit!

I’ve explained this asymmetry before in The difficulty with shorting and inverse positions as well as the asymmetry of redemptions:

• As the reference asset rallies, position size gets bigger and AUM drops due to losses. As the reference asset falls, position size shrinks while AUM increases due to profits.
• Redemptions can stabilize rebalance requirements in declines and exacerbate rebalance quantities in rallies. Redemptions reduce shares outstanding and in turn AUM, and in both cases trigger the fund’s need to buy the reference asset, which is stabilizing after declines but not after rallies. In other words, profit-taking is stabilizing while puking is de-stabilizing.

Rebalancing In Real Life

The amount of the rebalance from our derivation is the desired delta after the move, X(1 + X ΔS), minus the delta the bank actually holds after the move, X(1 + ΔS):

X (1 + X ΔS) - X (1 + ΔS)

where:

X = leverage factor
ΔS = percent change in underlying index

Another way to write that is:

X (X-1) (ΔS)

In our example, 2 * (2-1) * 10% = $.20 or an imbalance of 20% of the original NAV!
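To double-check that the brute-force bookkeeping and the short expression agree, here is a minimal Python sketch. The function names are my own for illustration, and it assumes a single move with the NAV normalized to 1 and no fees or financing costs.

```python
def rebalance_brute_force(leverage: float, index_return: float, nav: float = 1.0) -> float:
    """Deltas the swap counterparty must trade after one index move."""
    delta_before = leverage * nav                          # exposure held going into the move
    delta_after_move = delta_before * (1 + index_return)   # exposure drifts with the index
    new_nav = nav * (1 + leverage * index_return)          # fund earns leverage x index return
    required_delta = leverage * new_nav                    # exposure needed to stay at X times NAV
    return required_delta - delta_after_move


def rebalance_closed_form(leverage: float, index_return: float, nav: float = 1.0) -> float:
    """Same quantity written as X (X - 1) dS, scaled by NAV."""
    return leverage * (leverage - 1) * index_return * nav


if __name__ == "__main__":
    for x in (-3, -2, -1, 1, 2, 3, 4):
        print(x, rebalance_brute_force(x, 0.10), rebalance_closed_form(x, 0.10))
```

With a $1B NAV, 2x leverage, and a 10% rally, both functions return the $200M buy from the QLD walkthrough. A positive result means buying, so inverse funds also buy after the index rises.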

The size of the rebalance trade is of practical use. If an index is up or down a lot as you approach the end of a trading day then you can expect flows that exacerbate the move as levered ETFs must buy on up days and sell on down days to rebalance. It doesn’t matter if the ETF is long or inverse, the imbalance is always destabilizing in that it trades in the same direction as the move. The size of flows depends on how much AUM levered ETFs are holding but they can possibly be mitigated by profit-taking redemptions.

During the GFC, levered financial ETFs had large rebalance trades amidst all the volatility in bank stocks. Estimating, frontrunning, and trading against the rebalance to close was a popular game for traders who understood this dynamic. Years later levered mining ETFs saw similar behavior as precious metals came in focus in the aftermath of GFC stimulus. Levered energy ETFs, both in oil and natural gas, have ebbed and flowed in popularity. When they are in vogue, you can try to estimate the closing buy/sell imbalances that accompany highly volatile days.

Warning Label

Levered ETFs are trading tools that are not suitable for investing. They do a good job of matching the levered return of an underlying index intraday. The sum of all the negative gamma trading is expensive as the mechanical re-balancing gets front-run and “arbed” by traders. This creates significant drag on the levered ETF’s assets. In fact, if the borrowing costs to short levered ETFs were not punitive, a popular strategy would be to short both the long and short versions of the same ETF, allowing the neutral arbitrageur to harvest both the expense ratios and negative gamma costs from tracking the index!

ETFs such as USO or VXX which hold futures are famous for bleeding over time. That blood comes from periods when the underlying futures term structure is in contango and the corresponding negative “roll” returns (Campbell has a timeless paper decomposing spot and roll returns titled Deconstructing Futures Returns: The Role of Roll Yield). This is a separate issue from the negative gamma effect of levered or inverse ETFs.

Some ETFs combine all the misery into one simple ticker. SCO is a 2x-levered, inverse ETF referencing oil futures. These do not belong in buy-and-hold portfolios. Meth heads only please.

[The amount of variance drag that comes from levered ETFs depends on the path which makes the options especially tricky. I don’t explain how to price options on levered ETFs but this post is a clue to complication — Path: How Compounding Alters Return Distributions]
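A minimal sketch of the path dependence, assuming a made-up index path that alternates +10% and -10% days and a fund that resets its leverage daily with no fees. It isolates the compounding effect only; it does not model the front-running costs described above.

```python
def compound(daily_returns, leverage=1.0):
    """Total return from compounding leverage x each daily return."""
    value = 1.0
    for r in daily_returns:
        value *= 1 + leverage * r
    return value - 1.0


# Hypothetical choppy path: ten days alternating +10% and -10%
index_path = [0.10, -0.10] * 5

index_total = compound(index_path)             # what the index did
levered_total = compound(index_path, 2.0)      # what a daily-reset 2x fund did
inverse_total = compound(index_path, -1.0)     # what a daily-reset -1x fund did

print(f"index: {index_total:.2%}")
print(f"2x fund: {levered_total:.2%} vs 2x the index total: {2 * index_total:.2%}")
print(f"-1x fund: {inverse_total:.2%} vs -1x the index total: {-index_total:.2%}")
```

On this path both the 2x and the -1x fund end up well below the simple multiple of the index’s total return, which is why the horizon matters so much.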

### Key Takeaways

• Levered ETFs are derivatives. Their delta changes as the underlying index moves. This change in delta is the definition of gamma.

• Levered and inverse ETFs have “negative gamma” in that they must always rebalance in a destabilizing manner — in the direction of the underlying move.
• The required rebalance in terms of the fund’s NAV is:

X (X-1) (ΔS)

• The size of the rebalance is proportional to the square of the leverage factor. The higher the leverage factor the larger the rebalance. For a given leverage factor, inverse ETFs have larger gammas.

• The drag that comes from levered ETFs means they will fail to track the desired exposure on long horizons. They are better suited to trading or short-term risk management.

### Appendix: Using Delta To Summarize Exposures

We can see that delta is not limited to options, but is a useful way to denote exposures in derivatives generally. It allows you to sum deltas that reference the same underlying to compute a net exposure to that underlying.

Consider a portfolio:

• Short 2000 shares of QQQ
• Long 1000 shares of QLD
• Long 50 1 month 53% delta calls

By transforming exposures into deltas then collapsing them into a single number we can answer the question, “what’s my p/l if QQQ goes up 1%?”

We want to know the slope of our portfolio vis a vis QQQ.
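Here is a rough sketch of that collapse in Python. The share prices are hypothetical placeholders I picked for illustration (they are not from this post), and I am assuming the calls are QQQ options with the standard 100 multiplier.

```python
# Hypothetical prices, for illustration only
QQQ_PRICE = 300.0
QLD_PRICE = 60.0
QLD_LEVERAGE = 2.0

# Each position converted to dollar deltas with respect to QQQ
dollar_deltas = {
    "short 2000 QQQ": -2000 * QQQ_PRICE,
    "long 1000 QLD": 1000 * QLD_PRICE * QLD_LEVERAGE,    # exposure x leverage factor
    "long 50 calls (.53 delta)": 50 * 100 * 0.53 * QQQ_PRICE,  # contracts x multiplier x delta x price
}

net_dollar_delta = sum(dollar_deltas.values())
pnl_per_1pct = net_dollar_delta * 0.01   # slope of p/l for a 1% move in QQQ

for name, dd in dollar_deltas.items():
    print(f"{name:>28}: {dd:>12,.0f}")
print(f"{'net dollar delta':>28}: {net_dollar_delta:>12,.0f}")
print(f"{'p/l if QQQ +1%':>28}: {pnl_per_1pct:>12,.0f}")
```

The point is the common unit. Once every line is in QQQ dollar deltas, the sum answers the 1% question directly (for small moves, since the option delta will change).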

A few observations:

• I computed net returns for the portfolio based on the gross (absolute value of exposures)
• The option exposure is just the premium, but what we really care about is the delta coming from the options. Even though the total premium is < $37k, the largest delta is coming from the options position.

# Moontower on Gamma

The first option greek people learn after delta is gamma. Recall that delta represents how much an option’s price changes with respect to share price. That makes it a convenient hedge ratio. It tells you the share equivalent position of your option position. So if an option has a .50 delta, its price changes by $.50 for a $1.00 change in the stock price. Calls have positive deltas and puts have negative deltas (ie puts go down in value as the stock price increases). If you are long a .50 delta call option and want to be hedged, you must be short 50 shares of the stock (options refer to 100 shares of underlying stock). For small moves in the stock, your call and share position p/l’s will offset because you are “delta neutral”.

This is true for small moves only. “Small” is a bit wishy-washy because small depends on volatility and this post is staying away from that much complexity. Instead, we want to focus on how your delta changes as the stock moves. This is vital because if your option delta changes then your equivalent share position changes. If your position size changes, then your p/l is not constant for every $1 change in the stock. If I’m long 50 shares of a stock, I make the same amount of money for each $1 change. But if I’m long 50 shares equivalent by owning a .50 delta option, then as the stock increases my delta increases as the option becomes more in-the-money. That means the next $1 change in the stock produces $60 of p/l instead of just $50. We know that deep in-the-money options have a 1.00 delta meaning they act just like the stock (imagine a 10 strike call expiring tomorrow when the stock is trading for $40. The option price and stock price will move perfectly in lockstep. The option has 100% sensitivity to the change).

A call option can go from .50 delta to 1.00 delta. Gamma is the change in delta for the change in stock. Suppose you own a .50 delta call and the stock goes up by $1. The call is solidly in-the-money and perhaps its new delta is .60. That change in delta from .50 to .60 for a $1 move is known as gamma. In this case, we say the option has .10 gamma per $1. So if the stock goes up $1, the delta goes up by .10.

While this is mechanically straightforward, some of the lingo around gamma is confusing. People spout phrases like “a squared term”, “curvature”, “convexity”. I’ve written about what convexity is and isn’t because I’ve seen it trip up people who should know better. See Where Does Convexity Come From?. In this post, we will demystify the relationship of these words to “gamma”. In the process, you will deeply improve your understanding of options’ non-linear nature.

How the post is laid out:

Explanations

• Acceleration
• The squared aspect of gamma
• Dollar gamma

Applications

• Constant gamma
• Strikeless products
• How gamma scales with price and volatility
• Gamma weighting relative value trades

# Explanations

## Acceleration

You already understand “curvature”. I’ll prove it to you.

You wake up tomorrow morning and see a bizarre invention in your driveway. An automobile with an unrivaled top speed. You take it on an abandoned road to test it out. Weirdly, it accelerates slowly for a racecar. Conveniently for me, it makes the charts I’m about to show you easy to read.

You are traveling at 60 mph.

Imagine 2 scenarios:

1. You maintain that constant speed.
2. You accelerate such that after 1 minute you are now traveling at 80 mph. Assume your acceleration is smooth. That means over the 60 seconds it takes to reach 80 mph, your speed increases equally every second. So after 3 seconds, you are traveling 61 mph, at 6 seconds you are moving 62 mph. Eventually at 60 seconds, you are traveling 80 mph.

Graphically:

In the acceleration case, what was your average speed or velocity during that minute?

Since the acceleration was smooth, the answer is 70 mph.

How far did you travel in each case?

Constant velocity:

Accelerate at 20mph per minute:

If the acceleration is smooth, we can take the average velocity over the duration and multiply it by the duration to compute the distance traveled.

Let’s now continue accelerating this supercar by a marginal 20mph rate for the next 15 minutes and see how far we travel. Compare this to a vehicle that maintains 60 mph for the whole trip. The table uses the same logic — the average speed for the last minute assumes a constant acceleration rate.

Let’s zoom in on the cumulative distance traveled at each minute:

We found it! Curvature.
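If you want to rebuild the minute-by-minute table yourself, here is a minimal Python sketch. It assumes the same setup as the story (start at 60 mph, add 20 mph every minute, smooth acceleration within each minute); the function name is mine.

```python
def cumulative_miles(minutes, start_mph=60.0, accel_mph_per_min=0.0):
    """Cumulative distance after each minute, using the average speed within the minute."""
    miles = 0.0
    path = []
    for m in range(1, minutes + 1):
        speed_start = start_mph + accel_mph_per_min * (m - 1)
        speed_end = start_mph + accel_mph_per_min * m
        avg_speed = (speed_start + speed_end) / 2   # smooth acceleration, so the average is the midpoint
        miles += avg_speed / 60.0                   # one minute is 1/60 of an hour
        path.append(miles)
    return path


constant = cumulative_miles(15)
accelerating = cumulative_miles(15, accel_mph_per_min=20.0)

for minute, (c, a) in enumerate(zip(constant, accelerating), start=1):
    print(f"minute {minute:>2}: constant {c:5.2f} mi, accelerating {a:5.2f} mi, curvature {a - c:5.2f} mi")
```

The last column is the curvature: the gap between the accelerating car and the straight-line estimate, which widens every minute.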

Curvature is the adjustment to the linear estimate of distance traveled that we would have presumed if we assumed our initial speed was constant. Let’s map this analogy to options.

• Time –> stock price

How much time has elapsed from T₀ maps to “how far has the stock moved from our entry?”

• Velocity –> delta

Delta is the instantaneous slope of the p/l with respect to stock price, just as velocity is the instantaneous speed of the car.

• Acceleration –> gamma

Acceleration is the change in our velocity just as gamma is the change in delta.

• Cumulative distance traveled –> cumulative p/l

Distance = velocity x time. Since the velocity changes, multiply the average velocity by time. In this case, we can double-check our answer by looking at the table. We traveled 52.5 miles in 15 minutes or 210 mph on average. That corresponds to our speed at the midpoint of the journey — minute 8 out of 15.
P/l = average position size x change in stock price. Just as our speed was changing, our position size was changing!

Delta is the slope of your p/l. That’s how I think about position sizes. Convexity is non-linear p/l that results from your position size varying. Gamma mechanically alters your position size as the stock moves around.

The calculus that people associate with options is simply the continuous expression of these same ideas. We just worked through them step-wise, minute by minute taking discrete averages for discrete periods.

## Intuition For the Squared Aspect Of Gamma

Delta is familiar to everyone because it exists in all linear instruments. A stock is a linear instrument. If you own 100 shares and it goes up $1, you make $100. If it goes up $10, you make $1,000. The position size is weighted by 1.00 delta (in fact bank desks that trade ETFs and stocks without options are known as “Delta 1 desks”). Since you just multiply by 1, the position size is the delta. If you’re long 1,000 shares of BP, I say “you’re long 1,000 BP deltas”. This allows you to combine share positions and option positions with a common language. If any of the deltas come from options that’s critical information since we know gamma will change the delta as the stock moves.

If your 1,000 BP deltas come from:

500 shares of stock

+

10 .50 delta calls

that’s important to know. Still, for a quick summary of your position you often just want to know your net delta just to have an idea of what your p/l will be for small moves.

If you have options, that delta will not predict your p/l accurately for larger moves. We saw that acceleration curved the total distance traveled. The longer you travel the larger the “curvature adjustment” from a linear extrapolation of the initial speed. Likewise, the gamma from options will curve your p/l from your initial net delta, and that curvature grows the further the stock moves.

If you have 1,000 BP deltas all coming from shares, estimating p/l for a $2 rally is easy: you expect to make $2,000.

What if your 1,000 BP deltas all come from options? We need to estimate a non-linear p/l because we have gamma.

Let’s take an example from the OIC calculator.

The stock is $28.35. This is the 28.5 strike call with 23 days to expiry. It’s basically at-the-money. It has a .50 delta and .12 of gamma. Let’s accept the call value of $1.28 as fair value.

Here’s the setup:

Initial position = 20 call options.

• Delta  =  1,000

.50 x 20 contracts x 100 share multiplier

• Gamma =  240

.12 x 20 contracts x 100 share multiplier

(the other greeks are not in focus for this post)

The greeks describe your exposures. If you simply owned 1,000 shares of BP you know the slope of your p/l per $1 move…it’s $1,000. That slope won’t change.

But what about this option exposure? If the stock increases by $1, what is your new delta and what is your p/l?

After a $1 rally:

• New delta = 1,240 deltas

.62 x 20 contracts x 100 share multiplier

Remember that gamma is the change in delta per $1 move. That tells us if the stock goes up $1, this call will increase .12 deltas, taking it from a .50 delta call to a .62 delta call.

That’s fun. As the stock went up, your share equivalent position went from 1,000 to 1,240.

Can you see how to compute your p/l by analogizing from the accelerating car example?

[It’s worth trying on your own before continuing]

### Computing P/L When You Have Gamma

(It’s ok to assume gamma is smooth over this move just as we said the acceleration was smooth for the car.)

Your average delta over the move = 1,120

1,120 x $1 = $1,120

You earned an extra $120 vs a basic share position for the same $1 move. That $120 of extra profit is curvature from a simple extrapolation of delta p/l. Since that curvature is due to gamma it’s best to decompose the p/l into a delta portion and a gamma portion.

• The delta portion is the linear estimate of p/l = initial delta of 1,000 x $1 = $1,000
• The gamma portion of the p/l is the same computation as the acceleration example:

Your gamma represents the change in delta over the whole move. That’s 240 deltas of change per $1. So on average, your delta was higher by 120 over the move. So we scale the gamma by the move size and divide by 2. That represents our average change in delta which we multiply by the move size to compute a “gamma p/l”.

Gamma p/l = (Γ x ΔS / 2) x ΔS

where:

Γ = position weighted gamma = gamma per contract x qty of contracts x 100 multiplier

ΔS = change in stock price

We can re-write this to make the non-linearity obvious: gamma p/l is proportional to the square of the stock move!

Gamma p/l = ½ Γ (ΔS)²
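The decomposition is easy to sanity-check in a few lines of Python. This is a sketch using the BP position from above, assuming gamma stays constant over the move; the function name is mine.

```python
def option_pnl(delta_shares: float, gamma_shares: float, move: float):
    """Split p/l for a stock move into a linear delta piece and a squared gamma piece."""
    delta_pnl = delta_shares * move               # initial slope times the move
    gamma_pnl = 0.5 * gamma_shares * move ** 2    # average extra delta times the move
    return delta_pnl, gamma_pnl


# BP example: 20 calls, .50 delta, .12 gamma, 100 share multiplier
position_delta = 0.50 * 20 * 100   # 1,000 deltas
position_gamma = 0.12 * 20 * 100   # 240 deltas per $1

for move in (1.0, 2.0):
    d, g = option_pnl(position_delta, position_gamma, move)
    print(f"${move:.0f} rally: delta p/l {d:,.0f} + gamma p/l {g:,.0f} = {d + g:,.0f}")
```

Doubling the move doubles the delta p/l but quadruples the gamma p/l ($120 becomes $480), which is the squared term in action.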

## Generalizing Gamma: Dollar Gamma

In investing, we normally don’t speak about our delta or equivalent share position. If I own 1,000 shares of a $500 stock that is very different than 1,000 shares of a $20 stock. Instead, we speak about dollar notional. Those would be $500,000 vs $20,000 respectively. Dollar notional or gross exposures are common ways to denote position size. Option and derivative traders do the same thing. Instead of just referring to their delta or share equivalent position, they refer to their “dollar delta”. It’s identical to dollar notional, but preserves the “delta” vocabulary.

It is natural to compute a “delta 1%” which describes our p/l per 1% move in the underlying.

For the BP example:

• Initial dollar delta = delta x stock price = 1,000 x $28.35 = $28,350 dollar deltas
• Δ1% = $28,350/100 = $283.50

You earn $283.50 for every 1% BP goes up.

Gamma has analogous concepts. Thus far we have defined gamma in the way option models define it: change in delta per $1 move. We want to generalize gamma calculations to also deal in percentages. Let’s derive dollar gamma continuing with the BP example.

1. Gamma 1%

Gamma per $1 = 240

Of course, a $1 move in BP is over 3.5% ($1/$28.35). To scale this to “gamma per 1%” we multiply the gamma by 28.35/100, which is intuitive.

Gamma 1% = 240 * .2835 = 68.04

So for a 1% increase in BP, your delta gets longer by 68.04 shares.

2. Dollar gamma

Converting gamma 1% to dollar gamma is simple. Just multiply by the share price.

$Gamma = Gamma 1% x S

By substituting for gamma 1% from the above step, we arrive at the classic dollar gamma formula:

$Gamma = Γ x S² / 100

where S is the stock price and Γ is the position gamma per $1.

Let’s use BP numbers.

$Gamma = 240 * 28.35² / 100 = $1,929

The interpretation:

A 1% rally in BP, leads to an increase of 1,929 notional dollars of BP due to gamma.

Instead of speaking of how much our delta (equivalent share position) changes, you can multiply dollar gamma by percent changes to compute changes in our dollar delta.
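The two-step conversion is short enough to sketch in Python. The helper names are mine; the inputs are the BP numbers from above.

```python
def gamma_1pct(gamma_per_dollar: float, stock_price: float) -> float:
    """Change in share-equivalent delta for a 1% move in the stock."""
    return gamma_per_dollar * stock_price / 100


def dollar_gamma(gamma_per_dollar: float, stock_price: float) -> float:
    """Change in dollar delta for a 1% move: gamma x S^2 / 100."""
    return gamma_1pct(gamma_per_dollar, stock_price) * stock_price


position_gamma = 240.0   # deltas per $1 for the 20-call BP position
bp_price = 28.35

print(f"gamma 1%: {gamma_1pct(position_gamma, bp_price):.2f} shares")
print(f"dollar gamma: ${dollar_gamma(position_gamma, bp_price):,.0f}")
# A 2% rally adds roughly 2 x dollar gamma of BP length
print(f"added length on +2%: ${2 * dollar_gamma(position_gamma, bp_price):,.0f}")
```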

### Generalizing Gamma P/L For Percent Changes

In this section, we will estimate gamma p/l for percent changes instead of $1 changes. Let’s look at 2 ways.

The Accelerating Car Method

The logic flows as follows (again, using the BP example):

• If a 1% rally leads to an increase of $1,929 of BP exposure then, assuming gamma is smooth, a 3.5% rally (or $1) will lead to an increase of $6,751 of BP length because 3.5%/1% * $1,929
• Therefore the average length over the move is $3,375 (ie .5 * $6,751) due to gamma
• $3,375 * 3.5% = $118

(This is very close to the $120 estimate we computed with the original gamma p/l formula. The small gap comes from rounding the $1 move to 3.5%. This makes sense since we followed the same logic…multiply the average position size due to gamma times the move size.)

The Algebraic Method

We can adapt the original gamma p/l formula for percent changes.

We start with a simple identity. To turn a price change into a percent we simply divide by the stock price. If a $50 stock increased $1 it increased 2%.

%ΔS = ΔS / S

If we substitute the percent change in the stock for the change in the stock we must balance the identity by multiplying by S²:

Gamma p/l = ½ Γ S² (%ΔS)²
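Both routes can be checked against each other in Python. This is a sketch under the same smooth-gamma assumption, with function names of my own choosing and the BP inputs from above.

```python
def gamma_pnl_car_method(dollar_gamma_1pct: float, pct_move: float) -> float:
    """Average extra dollar exposure over the move times the percent move."""
    added_exposure = dollar_gamma_1pct * (pct_move / 0.01)   # extra dollar delta by the end of the move
    average_added_exposure = 0.5 * added_exposure            # gamma assumed smooth over the move
    return average_added_exposure * pct_move


def gamma_pnl_algebraic(gamma_per_dollar: float, stock_price: float, pct_move: float) -> float:
    """Gamma p/l written in percent terms: 1/2 x gamma x S^2 x (pct move)^2."""
    return 0.5 * gamma_per_dollar * stock_price ** 2 * pct_move ** 2


gamma, price = 240.0, 28.35
dollar_gamma_1pct = gamma * price ** 2 / 100

rounded_move = 0.035        # the post rounds a $1 move to 3.5%
exact_move = 1.0 / price    # a $1 move is really about 3.53%

print(f"car method, 3.5% move:  {gamma_pnl_car_method(dollar_gamma_1pct, rounded_move):.2f}")
print(f"algebraic, 3.5% move:   {gamma_pnl_algebraic(gamma, price, rounded_move):.2f}")
print(f"algebraic, exact $1:    {gamma_pnl_algebraic(gamma, price, exact_move):.2f}")
```

The two methods are the same calculation, so they agree for any move. The $118 vs $120 difference is purely the rounding of the move size.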

If you have a differentiated opinion about a catalyst, the most efficient way to express it will be through options. They have the most urgent function to a reaction. If you think a $100 stock can move $10, but the straddle implies $5, you can make 100% on your money in a short window of time. Annualize that!

Go a step further. Suppose you have an even finer view: you can handicap the direction. Now you can score a 5 or 10 bagger allocating the same capital to call options only.

Conversely, if you do not have a specific view, then options can be an expensive, low-resolution solution. You pay for specificity just like parlay bets. The timing and distance of a stock’s move must collaborate to pay you off.

So options, whether used explicitly for hedging or for speculating, actually conform to a more over-arching definition of hedging: hedges are trades that isolate the investor’s risk.

## The Hedging Paradox

If your trades have specific views or reasons, hedging is a good idea. Just like home insurance is a good idea. Whether you are conscious of it or not, owning a home is a bundle of bets. Your home’s value depends on interest rates, the local job market, and state policy. It also depends on some pretty specific events. For example, “not having a flood”. Insurance is a specific hedge for a specific risk.

In The Laws Of Trading, author and trader Agustin Lebron states rule #3:

Take the risks you are paid to take. Hedge the others.

He’s reminding you to isolate your bets so they map as closely as possible to your original reason for wanting the exposure.

You should be feeling tense right about now. “Dude, I’m not a robot with a Terminator HUD displaying every risk in my life and how hedged it is?”.

Relax. Even if you were, you couldn’t do anything about it. Even if you had the computational wherewithal to identify every unintended risk, it would be too expensive to mitigate. Who’s going to underwrite the sun not coming up tomorrow?
[Actually, come to think of it, I will. If you want to buy galactic continuity insurance ping me and I’ll send you a BTC address].

We find ourselves torn:

1. We want to hedge the risks we are not paid to take.
2. Hedging is a cost

What do we do?

Before getting into this I will mention something a certain, beloved group of wonky readers are thinking: “Kris, just because insurance/hedging on its own is worth less than its actuarial value, the diversification can still be accretive at the portfolio level especially if we focus on geometric not arithmetic returns…rebalancing…convexi-…”[trails off as the sound of the podcast in the background drowns out the thought]. Guys (it’s definitely guys), I know. I’m talking net of all that.

As the droplets of caveat settle the room like nerd Febreze, let’s see if we can give this conundrum a shape.

## Reconciling The Paradox

This is a cornerstone of trading:

Edge scales linearly, risk scales slower

[As a pedagogical matter, I’m being a bit brusque. Bear with me. The principle and its demonstration are powerful, even if the details fork in practice.]

Let’s start with coin flips:

[A] You flip a coin 10 times, you expect 5 heads with a standard deviation of 1.58.

[B] You flip a coin 100 times, you expect 50 heads with a standard deviation of 5.

Your expectancy scaled with N. 10x more flips, 10x more expected heads.

But your standard deviation (ie volatility) only grew by √10 or 3.16x.

The volatility or risk only scaled by a factor of √N while expectancy grew by N.

This is the basis of one of my most fundamental posts, Understanding Edge. Casinos and market-makers alike “took a simple idea and took it seriously”. Taking this seriously means recognizing that edges are incredibly valuable. If you find an edge, you want to make sure to get as many chances to harvest it as possible. This has 2 requirements:

1. You need to be able to access it.
2. You need to survive so you can show up to collect it.
The first requirement requires spotting an opportunity or class of opportunities, investing in its access, and warehousing the resultant risk. The second requirement is about managing the risk. That includes hedging and all its associated costs.

The paradox is less mystifying as the problem takes shape. We need to take risk to make money, but we need to reduce risk to survive long enough to get to a large enough number of bets on a sliver of edge to accumulate meaningful profits. Hedging is a drawbridge from today until your capital can absorb more variance.

## The Interaction of Trading Costs, Hedging, and Risk/Reward

Hedging reduces variance, in turn improving the risk/reward of a strategy. This comes at a substantial cost. Every options trader has lamented how large a line item this cost has been over the years. Still, as the cost of survival, it is non-negotiable. We are going to hedge.

So let’s pull apart the various interactions to gain intuition for the various trade-offs. Armed with the intuition, you can then fit the specifics of your own strategies into a risk management framework that aligns your objectives with the nature of your markets.

Let’s introduce a simple numerical demonstration to anchor the discussion. Hedging is a big topic subject to many details. Fortunately, we can gesture at a complex array of considerations with a toy model.

The Initial Proposition

Imagine a contract that has an expected value of $1.00 with a volatility (i.e. standard deviation) of $.80. You can buy this contract for $.96 yielding $.04 of theoretical edge. Your bankroll is $100.

[A quick observation so more advanced readers don’t have this lingering as we proceed:

The demonstration is going to bet a fixed amount, even as the profits accumulate. At first glance, this might feel foreign. In investing we typically think of bet size as a fraction of bankroll. In fact, a setup like this lends itself to Kelly sizing. However, in trading businesses, the risk budget is often set at the beginning of the year based on the capital available at that time. As profits pile up, contributing to available capital, risk limits and bet sizes may expand. But such changes are more discrete than continuous so if we imagine our demonstration is occurring within a single discrete interval, perhaps 6 months or 1 year, this is a reasonable approach. It also keeps this particular discussion a bit simpler without sacrificing intuition.]

The following table summarizes the metrics for various trial sizes.

What you should notice:

• Expected value grows linearly with trial size
• The standard deviation of p/l grows slower (√N)
• Sharpe ratio (expectancy/standard deviation) is a measure of risk-reward. Its progression summarizes the first 2 bullets…as trials increase the risk/reward improves
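A sketch of how those metrics can be regenerated in Python. The trial sizes in the loop are my own picks for illustration (the original table may use different ones), and it assumes each trial is independent with the same $.04 edge and $.80 volatility.

```python
import math


def strategy_metrics(edge: float, vol: float, trials: int):
    """Expected p/l, standard deviation of p/l, and their ratio for N independent trials."""
    expected_pnl = edge * trials          # expectancy scales with N
    pnl_std = vol * math.sqrt(trials)     # volatility scales with sqrt(N)
    sharpe = expected_pnl / pnl_std
    return expected_pnl, pnl_std, sharpe


# Toy contract: $.04 of edge, $.80 of volatility per trial
for n in (1, 10, 100, 1000):
    ev, sd, sharpe = strategy_metrics(0.04, 0.80, n)
    print(f"N={n:>5}: expected p/l {ev:7.2f}, std dev {sd:6.2f}, sharpe {sharpe:.2f}")
```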

Introducing Hedges

Let’s show the impact of adding a hedge to reduce risk. Let’s presume:

• The hedge costs $.01. This represents 25% of your $.04 of edge per contract. Options traders and market makers like to transform all metrics into a per/contract basis. That $.01 could be made up of direct transaction costs and slippage.

[In reality, there is a mix of drudgery, assumptions, and data analysis to get a firm handle on these normalizations. A word to the uninitiated, most of trading is not sexy stuff, but tons of little micro-decisions and iterations to create an accounting system that describes the economic reality of what is happening in the weeds. Druckenmiller and Buffett’s splashy bets get the headlines, but the magic is in the mundane.]

• The hedge cuts the volatility in half.

Right off the bat, you should expect the sharpe ratio to improve: you sacrificed 25% of your edge to cut 50% of the risk.

The revised table:

Notice:

• Sharpe ratio is 50% higher across the board
• You make less money.

Let’s do one more demonstration. The “more expensive hedge scenario”. Presume:

• The hedge costs $.02

This now eats up 50% of your edge.

• The hedge reduces the volatility 50%, just as the cheaper hedge did.

Summary:

Notice:

• The sharpe ratio is exactly the same as the initial strategy. Both your net edge and volatility dropped by 50%, affecting the numerator and denominator equally.

• Again the hedge cost scales linearly with edge, so you have the same risk-reward as the unhedged strategy; you just make less money.
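The three scenarios can be lined up side by side with a small Python sketch. It assumes the hedge cost comes straight out of the per-contract edge and the volatility cut is exactly as stated; the scenario labels are mine.

```python
BASE_EDGE = 0.04   # per-contract edge before hedging
BASE_VOL = 0.80    # per-contract volatility before hedging

# (hedge cost per contract, fraction of volatility removed)
scenarios = {
    "unhedged": (0.00, 0.0),
    "cheap hedge": (0.01, 0.5),
    "expensive hedge": (0.02, 0.5),
}

per_trade_sharpe = {}
for name, (cost, vol_cut) in scenarios.items():
    net_edge = BASE_EDGE - cost
    net_vol = BASE_VOL * (1 - vol_cut)
    per_trade_sharpe[name] = net_edge / net_vol
    print(f"{name:>16}: edge {net_edge:.2f}, vol {net_vol:.2f}, per-trade sharpe {per_trade_sharpe[name]:.3f}")
```

Because every scenario scales with √N the same way, comparing the per-trade ratios is enough: .075 vs .05 is the 50% improvement, and the expensive hedge lands right back at .05.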

If hedging doesn’t improve the sharpe ratio because it’s too expensive, you have found a limit. Another way it could have been expensive is if the cost of the hedge stayed fixed at $.01 but the hedge only chopped 25% of the volatility. Again, your sharpe would be unchanged from the unhedged scenario but you just make less money.

We can summarize all the results in this chart.

The Bridge

As you book profits, your capital increases. This leaves you with at least these choices:

1. Hedge less since your growing capital is absorbing the same risk
2. Increase bet size
3. Increase concurrent trials

I will address #1 here, and the remaining choices in the ensuing discussion.

Say you want to hedge less. This is always a temptation. As we’ve seen, you will make money faster if you avoid hedging costs. How do we think about the trade-off between the cost of hedging and risk/reward?

We can actually target a desired risk/reward and let the target dictate if we should hedge based on the expected trial size.

Sharpe ratio is a function of trial size:

Sharpe = (E x N) / (σ x √N) = (E / σ) x √N

where:

E = edge
σ = volatility
N = trials

If we target a sharpe ratio of 1.0 we can re-arrange the equation to solve for how large our trial size needs to be to achieve the target. For the unhedged contract, E / σ = .04 / .80 = .05, so:

N = (Sharpe / (E / σ))² = (1 / .05)² = 400

If our capital and preferences allow us to tolerate a sharpe of 1 and we believe we can get at least 400 trials, then we should not hedge.

Suppose we don’t expect 400 chances to do our core trade, but the hedge that costs $.01 is available. What is the minimum number of trades we can do if we can only tolerate a sharpe as low as 1?

Using the same math as above (1/.075)² = 178

The summary table:

If our minimum risk tolerance is a 1.5 sharpe, we need more trials:

If your minimum risk tolerance is 1.5 sharpe, and you only expect to do 2 trades per business day or about 500 trades per year, then you should hedge. If you can do twice as many trades per day, it’s acceptable to not hedge.
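The trial counts above can be reproduced with a short sketch. It assumes the sharpe ratio scales as (E / σ) × √N, which is the form consistent with the 178 and 400 trial figures. The $.80 unhedged volatility is an inferred assumption (it is the value implied by $.04 of edge needing 400 trials), and the function names are hypothetical.

```python
import math

def sharpe(edge, vol, n_trials):
    """Sharpe ratio over n independent trials: (edge / vol) * sqrt(n)."""
    return edge / vol * math.sqrt(n_trials)

def required_trials(target_sharpe, edge, vol):
    """Trials needed to reach target_sharpe, from inverting the formula above."""
    return (target_sharpe * vol / edge) ** 2

# Assumed inputs: $.04 edge and $.80 vol unhedged;
# the $.01 hedge leaves $.03 of edge and $.40 of vol.
UNHEDGED = {"edge": 0.04, "vol": 0.80}
HEDGED = {"edge": 0.03, "vol": 0.40}

for target in (1.0, 1.5):
    n_unhedged = required_trials(target, **UNHEDGED)
    n_hedged = required_trials(target, **HEDGED)
    print(f"target sharpe {target}: unhedged needs {n_unhedged:.0f} trials, "
          f"hedged needs {n_hedged:.0f}")
```

With roughly 500 trades a year you clear the hedged requirement for a 1.5 sharpe but not the unhedged one, which is the decision rule described above.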

These toy demonstrations show:

• If you have positive expectancy, you should be trading
• The cost of a hedge scales linearly with edge, but volatility does not
• If the cost of a hedge is less than its proportional risk-reduction you have a choice whether to hedge or not
• The higher your risk tolerance the less you should hedge
• The decision to dial back the hedging depends on your risk tolerance (as proxied by a measure of risk/reward) vs your expected sample size

### Variables We Haven’t Considered

The demonstrations were simple but provide a mental template to contextualize cost/benefit analysis of risk mitigation in your own strategies. We kept it basic by only focusing on 3 variables:

• edge
• volatility
• risk tolerance as proxied by sharpe ratio

Let’s touch on additional variables that influence hedging decisions.

Bankroll

If your bankroll or capital is substantial compared to your bet size (perhaps you are betting far below Kelly or half-Kelly prescribed sizes) then it does not make sense to hedge. Hedges are negative expectancy trades that reduce risk.

We can drive this home with a sports betting example from the current March Madness tournament:

If you placed a $10 bet on St. Peter’s, by getting to the Sweet 16 you have already made 100x. You could lock it in by hedging all or part of it by betting against them, but the bookie vig would eat a slice of the profit. More relevant, the $1000 of equity might be meaningless compared to your assets. There’s no reason to hedge, you can sweat the risk.

But what if you had bet $100 on St. Pete’s? $10,000 might quicken the ole’ pulse. Or what if you somehow happened upon a sports edge (just humor me) and thought you could put that $10k to work somewhere else instead of banking on an epic Cinderella story?

If St. Pete’s odds for the remainder of the tourney are fair, then you will sacrifice expectancy by hedging or closing the trade. If you are rich, you probably just let it ride and avoid any further transaction costs.

If you are trading relatively small, your problem is that you are not taking enough risk. The reason professionals don’t take more risk when they should is not because they are shy. It’s because of the next 2 variables.

Capacity Per Trade

Many lucrative edges are niche opportunities that are difficult to access for at least 2 reasons.

• Adverse selection

There might only be a small amount of liquidity at dislocated prices (this is a common oversight of backtests) because of competition for edge. Let’s return to the contract from the toy example. Its fair value is $1.00. Now imagine that there are related securities that are getting bid up and the market for our toy contract is:

.95 – 1.05

10 “up” (ie there are 10 contracts on the offer and 10 contracts bid for)

Based on what’s trading “away”, you think this contract is now worth $1.10.

Let’s game this out.

You quickly determine that the .95-1.05 market is simply a market-maker’s bid-ask spread. Market-makers tend to be large firms with tentacles in every related market to the ones they quote. It’s highly unlikely that the $1.05 offer is “real”. In other words, if you tried to lift it, you would only get a small amount of size.

What’s going on?

The market-maker might be leaving a stale quote to maximize expectancy. If a real sell order were to come in and offer at $1.00, the market maker might lift the size and book $.10 of edge to the updated theoretical value.

Of course, there’s a chance they might get lifted on their $1.05 stale offer but they might honor only a couple contracts. This is a simple expectancy problem. If 500 lots come in offered at $1.00, and they lift it, they make $5,000 profit ($.10 x 500 x option multiplier of 100). If you lift the $1.05 offer and they sell you 10 contracts, they suffer a measly $50 loss.

So if they believe there’s a 1% chance or greater of a 500 lot naively coming in and offering at mid-market then they are correct in posting the stale quote.
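The 1% threshold falls out of a two-outcome expectancy calculation. The sketch below simplifies by assuming only two outcomes (the big naive seller shows up, or the stale offer gets lifted for 10 lots); `breakeven_probability` is a hypothetical helper name.

```python
def breakeven_probability(win_amount, loss_amount):
    """Probability of the favorable event at which expected value is zero."""
    return loss_amount / (win_amount + loss_amount)

MULTIPLIER = 100  # option multiplier from the example

win = 0.10 * 500 * MULTIPLIER   # lift 500 lots at $1.00 versus $1.10 value
loss = 0.05 * 10 * MULTIPLIER   # sell 10 lots at $1.05 versus $1.10 value

p = breakeven_probability(win, loss)
print(f"win ${win:,.0f}, loss ${loss:,.0f}, breakeven probability {p:.2%}")
```

Any probability above the breakeven (just under 1%) makes posting the stale quote positive expectancy for the market maker.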

What do you do?

You were smart enough to recognize the game being played. You used second-order thinking to realize the quote was purposefully stale. In a sense, you are now in cahoots with the market maker. You are both waiting for the berry to drop. The problem is your electronic “eye” will be slower than the market-maker to snipe the berry when it comes in. Still, even if you have a 10% chance of winning the race, it still makes sense to leave the quote stale, rather than turn the offer. If you do manage to get at least a partial fill on the snipe, there’s no reason to hedge. You made plenty of edge, traded relatively small size, and most importantly know your counterparty was not informed!

As a rule, liquidity is poor when trades are juiciest. The adverse selection of your fills is most common in fast-moving markets if you do not have a broad, fast view of the flows. This is why a trader’s first questions are “Do I think I’m the first to have seen this order? Did someone with a better perch to see all the flow already pass on this trade?”

In many markets, if you are not the first you might as well be last. You are being arbed because there’s a better relative trade somewhere out there that you are not seeing.

[Side note: many people think a bookie or market-maker’s job is to balance flow. That can be true for deeply liquid instruments. But for many securities out there, one side of the market is dumb and one side is real. Markets are often leaned. Tables are set when certain flows are anticipated. If a giant periodic buy order gets filled at mid-market or even near the bid, look at the history of the quote for the preceding days. Market-making is not an exercise in posting “correct” markets. It’s a for-profit enterprise.]

• Liquidity

The bigger you attempt to trade at edgy prices, the more information you leak into the market. You are outsizing the available liquidity by allowing competitors to reverse engineer your thinking. If a large trade happens and immediately looks profitable to bystanders, they will study the signature of how you executed it. The market learns and copies. The edge decays until you’re flipping million dollar coins for even money as a loss leader to get a look at juicier flow from brokers.

As edge in particular trades dwindles, the need to hedge increases. The hedges themselves can get crowded or at least turn into a race.

Leverage

If a hedge, net of costs, improves the risk/reward of your position, you may entertain the use of leverage. This is especially tempting for high-sharpe trades that have low absolute rates of return or edge. Market-making firms embody this approach. As registered broker-dealers they are afforded gracious leverage. Their businesses are ultimately capacity constrained and the edges are small but numerous. The leverage combined with sophisticated diversification (hedging!) creates a suitable if not impressive return on capital.

The danger with leverage is that it increases sensitivity to path and “risk of ruin”. In our toy model, we assumed a Gaussian distribution. Risk of ruin can be hard to estimate when distributions have unknowable amounts of skew or fatness in their tails. Leverage erodes your margin of error.

## General Hedging Discussion

As long as hedging, again net of costs, improves your risk/reward there is substantial room for creative implementation. We can touch on a few practical examples.

Point of sale hedging vs hedging bands

In the course of market-making, the primary risk is adverse selection. Am I being picked off? If you suspect the counterparty is “delta smart” (whenever they buy calls the stock immediately rips higher), you want to hedge immediately. This is a race condition with any other market makers who might have sold the calls and the bots that react to the calls being printed on the exchange. This is known as a point-of-sale hedge: an immediate response to a suspected “wired” order.

If you instead sold calls to a random, uninformed buyer you will likely not hedge. Instead, the delta risk gets thrown on the pile of deltas (ie directional stock exposures) the firm has accumulated. Perhaps it offsets existing delta risk or adds to it. Either way, there is no urgency to hedge that particular deal.

In practice, firms use hedging bands to manage directional risk. In a similar process to our toy demonstration, market-makers decide how much directional risk they are willing to carry as a function of capital and volatility. This allows them to hedge less, incurring less costs along the way, and allowing their capital to absorb randomness. Just like the rich bettor, who lets the St. Peter’s bet ride.

In The Risk-Reversal Premium, Euan Sinclair alludes to band-based hedging:

While this example shows the clear existence of a premium in the delta-hedged risk-reversal, this implementation is far from what traders would do in practice (Sinclair, 2013). Common industry practice is to let the delta of a position fluctuate within a certain band and only re-hedge when those bands are crossed. In our case, whenever the net delta of the options either drops below 20 or above 40, the portfolio is rebalanced by closing the position and re-establishing with the options that are now closest to 15-delta in the same expiration.
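The band rule in that excerpt is simple enough to express directly. This is a minimal sketch using the 20 and 40 delta thresholds from the quote; the function name and the intraday delta path are made up for illustration.

```python
def needs_rehedge(net_delta, lower=20, upper=40):
    """True when net delta has left the [lower, upper] band."""
    return net_delta < lower or net_delta > upper

# Hypothetical path of net delta through a trading day.
delta_path = [30, 34, 38, 41, 29, 22, 19]
for delta in delta_path:
    action = "re-hedge" if needs_rehedge(delta) else "hold"
    print(f"net delta {delta:>3}: {action}")
```

Widening the band means fewer re-hedges and lower costs, at the price of carrying more directional noise between rebalances.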

Part art, part science

Hedging is a minefield of regret. It’s costly, but the wisdom of offloading risks you are not paid for and conforming to a pre-determined risk profile is a time-tested idea. Here’s a dump of concerns that come to mind:

• If you hedge long gamma, but let short gamma ride you are letting losers grow and cutting winners short. Be consistent. If your delta tolerance is X and you hedge twice a day, you can cut all deltas in excess of X at the same 2 times every day. This will remove discretion from the decision. (I had one friend who used to hedge to flat every time he went to the bathroom. As long as he was regular this seemed reasonable to me.)

• Low net/high gross exposures are a sign of a hedged book. There are significant correlation risks under that hood. It’s not necessarily a red flag, but when paired with leverage, this should make you nervous.

• Are you hedging your daily, weekly, or monthly p/l? Measures of local risk like Greeks and spot/vol correlation are less trustworthy for longer timeframes. Spot/vol correlation (ie vol beta) is not invariant to price level, move size, and move speed. Longer time frames provide larger windows for these variables to change. If oil vol beta is -1 (ie if oil rallies 1%, ATM vol falls 1%) do I really believe that the price going from 50 to 100 cuts the vol in half?

• There are massive benefits to scale for large traders who hedge. The more flow they interact with the more opportunity to favor anti-correlated or offsetting deltas because it saves them slippage on both sides. They turn everything they trade into a pooled delta or several pools of delta (so any tech name will be re-computed as an NDX exposure, while small-caps will be grouped as Russell exposures). This is efficient because they can accept the noise within the baskets and simply hedge each of the net SPX, NDX, IWM to flat once they reach specified thresholds.

The second-order effect of this is subtle and recursively makes markets more efficient. The best trading firms have the scale to bid closest to the clearing price for diversifiable risk⁶. This in turn, allows them to grab even more market share widening their advantage over the competition. If this sounds like big tech⁷, you are connecting the dots.

## Wrapping Up

The other market-makers in the product options pit were not wrong to hedge or close their trades as quickly as they did. They just had different constraints. Since they were trading their own capital, they tightly managed the p/l variance.

At the same time, if you were well-capitalized and recognized the amount of edge raining down in the market at the time, the ideal play was to take down as much risk as you could and find a hedge with perhaps more basis risk (and therefore less cost because the more highly correlated hedges were bid for) or simply allow the firm’s balance sheet to absorb it.

Since I was being paid as a function of my own p/l there was not perfect alignment of incentives between me and my employer (who would have been perfectly fine with me not hedging). If I made a great bet and lost, it would have been the right play but I personally didn’t want to tolerate not getting paid.

Hedging is a cost. You need to weigh that with the benefit and that artful equation is a function of:

• risk tolerance at every level of stakeholder — trader, manager, investor
• capital
• edge
• volatility
• liquidity

Maximizing is uncomfortable. Almost unnatural. It calls for you to tolerate larger swings, but it allows the theoretical edge to pile up faster. This post offers guardrails for dissecting a highly creative problem.

But if you consistently make money, ask yourself how much you might be leaving on the table. If you are making great trades somewhere, are you locking it in with bad trades? If you can’t tell what the good side is that’s ok.

But if you know the story of your edge, there’s a good chance you can do better.

# From CAPM To Hedging

This is a provocative question. Patrick was clever to disallow Berkshire. In this post, we are going to use this question to launch into the basics of regression, correlation, beta hedging and risk.

Let’s begin.

## My Reaction To The Question

I don’t know anything about picking stocks. I do know about the nature of stocks which makes this question scary. Why?

1. Stocks don’t last forever

Many stocks go to zero. The distribution of many stocks is positively skewed which means there’s a small chance of them going to the moon and a reasonable chance that they go belly-up. The price of a stock reflects its mathematical expectation. Since the downside is bounded by zero and the upside is infinite, for the expectation to balance the probability of the stock going down can be much higher than our flawed memories would guess. Stock indices automatically rebalance, shedding companies that lose relevance and value. So the idea that stocks go up over time is really that stock indices go up over time, even though individual stocks have a nasty habit of going to zero. For more see Is There Actually An Equity Premium Puzzle?.

2. Diversification is the only free lunch

The first point hinted at my concern with the question. I want to be diversified. Markets do not pay you for non-systematic risk. In other words, you do not get paid for risks that you can hedge. All but the most fundamental risks can be hedged with diversification. See Why You Don’t Get Paid For Diversifiable Risks. To understand how diversifiable risks get arbed out of the market ask yourself who the most efficient holder of a particular idiosyncratic risk is? If it’s not you, then you are being outbid by someone else, or you’re holding the risk at a price that doesn’t make sense given your portfolio choices. Read You Don’t See The Whole Picture to see why.

My concerns reveal why Berkshire would be an obvious choice. Patrick ruled it out to make the question much harder. Berkshire is a giant conglomerate. Many would have chosen it because it’s run by masterful investors Warren Buffett and Charlie Munger. But I would have chosen it because it’s diversified. It is one of the closest companies I could find to an equity index. Many people look at the question and think about where their return is going to be highest. I have no edge in that game. Instead, I want to minimize my risk by diversifying and accepting the market’s compensation for accepting broad equity exposure.

In a sense, this question reminds me of an interview question I’ve heard.

You are gifted $1,000,000 dollars. You must put it all in play on a roulette wheel. What do you do?

The roulette wheel has negative edge no matter what you do. Your betting strategy can only alter the distribution. You can be crazy and bet it all on one number. Your expectancy is negative but the payoff is positively skewed…you probably lose your money but have a tiny chance at becoming super-rich. You can try to play it safe by risking your money on most of the numbers, but that is still negative expectancy. The skew flips to negative. You probably win, but there’s a small chance of losing most of your gifted cash.

I would choose what’s known as a minimax strategy which seeks to minimize the maximum loss. I would spread my money evenly on all the numbers, accept a sure loss of 5.26%.¹

The minimax response to Patrick’s question is to find the stock that is the most internally diversified.

## Berkshire Vs The Market

I don’t have an answer to Patrick’s question. Feel free to explore the speculative responses in the thread. Instead, I want to dive further into my gut reaction that Berkshire would be a reasonable proxy to the market.

If we look at the mean of its annual returns from 1965 to 2001, the numbers are gaudy. Its CAGR was 26.6% vs the SP500 at 11%. Different era. Finding opportunities at the scale Buffett needs to move the needle has been much harder in the past 2 decades.

Buffett has been human for the past 20 years. This is a safer assumption than the hero stats he was putting up in the last half of the 20th century.

The mean arithmetic returns and standard deviations validate my hunch that Berkshire’s size and diversification² make it behave like the whole market in a single stock.

Let’s add a scatterplot with a regression.

If you tried to anticipate Berkshire’s return, your best guess might be its past 20 year return, distributed similarly to its prior volatility.
Another approach would be to see this relationship to the SP500 and notice that a portion of its return can simply be explained by the market. It clearly has a positive correlation to the SP500. But just how much of the relationship is explained by SP500?

This is a large question with practical applications. Specifically, it underpins how market neutral traders think about hedges. If I hedge an exposure to Y with X how much risk do I have remaining?

To answer this question we will go on a little learning journey:

1. Deriving sensitivities from regressions in general
2. Interpreting the regression
3. CAPM: Applying regression to compute the “risk remaining of a hedge”

On this journey you can expect to learn the difference between beta and correlation, build intuition for how regressions work, and see how market exposures are hedged.

## Unpacking The Berkshire Vs SP500 Regression

A regression is simply a model of how an independent variable influences a dependent variable. Use a regression when you believe there is a causal relationship between 2 variables. Spurious correlations are correlations that will appear to be causal because they can be tight. The regression math may even suggest that’s the case. I’m sorry. Math is just a tool. It requires judgement. The sheer number of measurable quantities in the world guarantees an infinite list of correlations that serve as humor not insight³.

The SP500 is steered by the corporate earnings of the largest public companies (and in the long-run the Main Street economy⁴) discounted by some risk-aware consensus. Berkshire is big and broad enough to inherit the same drivers. We accept that Berkshire’s returns are partly driven by the market and partly due to its own idiosyncrasies.

Satisfied that some of Berkshire’s returns are attributable to the broader market, we can use regression to understand the relationship.
In the figure above, I had Excel simply draw a line that best fit the scatterplot with SP500 being the independent variable, or X, and Berkshire returns being the dependent or Y. The best fit line (there are many kinds of regression but we are using a simple linear regression) is defined the same way any line is: by a slope and an intercept.

The regression equation should remind you of the generic form of a line y = mx + b where m is the slope and b is the intercept.

In a regression:

y = α + βx

where:

• y = dependent variable (Berkshire returns)
• x = independent variable (SP500 returns)
• α = the intercept (a constant)
• β = the slope or sensitivity of the Y variable based on the X variable

If you right-click on a scatterplot in Excel you can choose “Add Trendline”. It will open the below menu where you can set the fitted line to be linear and also check a box to “Display Equation on chart”.

This is how I found the slope and intercept for the Berkshire chart:

y = .6814x + .0307

Suppose the market returns 2%:

Predicted Berkshire return = .6814 * 2% + 3.07%

Predicted Berkshire return = 4.43%

So based on actual data, we built a simple model of Berkshire’s returns as a function of the market.

It’s worth slowing down to understand how this line is being created. Conceptually it is the line that minimizes the squared errors between itself and the actual data. Since each point has 2 coordinates, we are dealing with the variance of a joint distribution. We use covariance instead of variance but the concepts are analogous. With variance we square the deviations from a mean. For covariance, we multiply the distance of each X and Y in a coordinate from their respective means: (xᵢ – x̄)(yᵢ – ȳ)

Armed with that idea, we can compute the regression line by hand with the following formulas:

β or slope = covar(x,y) / var(x)

α or intercept = ȳ – βx̄

We will look at the full table of this computation later to verify Excel’s regression line.
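The by-hand formulas translate directly into code. This is a minimal sketch in plain Python: `fit_line` is a hypothetical helper, and the return series are made-up numbers for illustration only, not the Berkshire and SP500 data behind the chart.

```python
def fit_line(xs, ys):
    """Return (alpha, beta) for y = alpha + beta * x via covariance / variance."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covar = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    beta = covar / var_x
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Made-up annual returns for illustration only (NOT the Berkshire/SP500 data).
market = [0.10, -0.05, 0.20, 0.02, -0.15, 0.12]
stock = [0.12, 0.01, 0.15, 0.03, -0.08, 0.09]

alpha, beta = fit_line(market, stock)
print(f"fitted line: y = {alpha:.4f} + {beta:.4f}x")

# Prediction using the coefficients quoted in the post.
predicted = 0.0307 + 0.6814 * 0.02
print(f"predicted return for a 2% market year: {predicted:.2%}")
```

Whether you divide the sums by n or n-1 does not matter for the slope, since the same divisor appears in the covariance and the variance and cancels.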
Before we do that, let’s make sure that this model is even helpful. One standard we could use to determine if the model is useful is if it performs better than the cheapest naive model that says:

Our predicted Berkshire return simply is the mean return from the sample.

The green arrows in this picture represent the error between this simple model and the actual returns.

This naive model of summing the squared differences from the mean of Berkshire’s returns is exactly the same as variance. You are computing squared differences from a mean. If you take the square root of the average of the squared differences you get a standard deviation. In this simple model where our prediction is simply the mean, our volatility is 16.5% or the volatility of Berkshire’s returns for 20 years.

In the regression context, the total variance of the dependent variable from its mean is known as the Total Sum of Squares or TSS.

The point of using regression though is we can make a better prediction of Berkshire’s returns if we know the SP500’s returns. So we can compare the mean to the fitted line instead of the actual returns. The sum of those squared differences is known as the Regression Sum Of Squares or RSS. This is the sum of squared deviations between the mean and fitted predictions instead of the actual returns. If there is tremendous overlap between the RSS and TSS, then we think much of the variance in X explains the variance of Y.

The last quantity we can look at is the Error Sum of Squares or ESS. These are the deviations from the actual data to the predicted values represented by our fitted line. This represents the unexplained portion of Y’s variance.

Let’s use 2008’s giant negative return to show how TSS, RSS, and ESS relate.
The visual shows:

TSS = RSS + ESS

We can compute the sum of these squared deviations simply from their definitions:

• TSS (aka variance): Σ(actual – mean)²
• ESS (sum of errors squared): Σ(actual – predicted)²
• RSS (aka TSS – ESS): Σ(predicted – mean)²

The only other quantities we need are variances and covariances to compute β or slope of the regression line.

In the table below:

• ŷ = the predicted value of Berkshire’s return aka “y-hat”
• x̄ = mean SP500 return aka “x-bar”
• ȳ = mean Berkshire return aka “y-bar”

β = .40 / .59 = .6814

α = ȳ – βx̄ = 10.6% – .6814 * 11.1% = 3.07%

This yields the same regression equation Excel spit out:

y = α + βx

ŷ = 3.07% + .6814x

## R-Squared

We walked through this slowly as a learning exercise, but the payoff is appreciating the R². Excel computed it as 52%. But we did everything we need to compute it by hand. Go back to our different sum of squares.

TSS or variance of Y = .52

ESS or sum of squared difference between actual data and the model = .25

Re-arranging TSS = RSS + ESS we can see that RSS = .27

Which brings us to:

R² = RSS/TSS = .27/.52 = 52%

Same as Excel!

R² is the regression sum of squares divided by the total variance of Y. It is called the coefficient of determination and can be interpreted as:

The variability in Y explained by X

So based on this small sample, 52% of Berkshire’s variance is explained by the market, as proxied by the SP500.

## Correlation

Correlation, r (or if you prefer Greek, ρ) can be computed in at least 2 ways. It’s the square root of R².

r = √R² = √.52 = .72

We can confirm this by computing correlation by hand according to its own formula:

r = covar(x,y) / (σx × σy)

Substituting:

r = covar(x,y) / sqrt(var(x) × var(y))

Looking at the table above we have all the inputs:

r = .40 / sqrt(.59 x .52)

r = .72

Variance is an unintuitive number. By taking the square root of variance, we arrive at a standard deviation which we can actually use. Similarly, covariance is an intermediate computation lacking intuition.
By normalizing it (ie dividing it) by the standard deviations of X and Y we arrive at correlation, a measure that holds meaning to us. It is bounded by -1 and +1. If the correlation is .72 then we can make the following statement:

If x is 1 standard deviation above its mean, I expect y to be .72 standard deviations above its own mean.

It is a normalized measure of how one variable co-varies versus the other.

## How Beta And Correlation Relate

Beta, β, is the slope of the regression equation. Correlation is the square root of R² or coefficient of determination. Beta actually embeds correlation within it.

Look closely at the formulas:

β = covar(x,y) / var(x) = covar(x,y) / σx²

r = covar(x,y) / (σx × σy)

Watch what happens when we divide β by r:

β / r = (σx × σy) / σx² = σy / σx

β = r × (σy / σx)

Whoa.

Beta equals correlation times the ratio of the standard deviations.

The significance of that insight is about to become clear as we move from our general use of regression to the familiar CAPM regression. From the CAPM formula we can derive the basis of hedge ratios and more!

We have done all the heavy lifting at this point. The reward will be a set of simple, handy formulas that have served me throughout my trading career.

Let’s continue.

## From Regression To CAPM

The famous CAPM pricing equation is a simple linear regression stipulating that the return of an asset is a function of the risk free rate, a beta to the broader market, plus an error term that represents the security’s own idiosyncratic risk.

Rᵢ = Rբ + β(Rₘ – Rբ) + Eᵢ

where:

• Rᵢ = security total return
• Rբ = risk-free rate
• β = sensitivity of security’s return to the overall market’s excess return (ie the return above the risk-free rate)
• Eᵢ = the security’s unique return (aka the error or noise term)

Since the risk-free rate is a constant, let’s scrap it to clean the equation up:

Rᵢ = βRₘ + Eᵢ

This is the variance equation for this security (the cross term drops out because the error term is uncorrelated with the market):

σᵢ² = β²σₘ² + σₑ²

where σᵢ is the security’s volatility, σₘ is the market’s volatility, and σₑ is the volatility of the idiosyncratic term.

Recall that beta is the vol ratio * correlation:

β = r × (σᵢ / σₘ)

We can use this to factor the “market variance” term:

β²σₘ² = r² × (σᵢ² / σₘ²) × σₘ² = r²σᵢ²

Plugging this form of “variance due to the market” back into the variance equation:

σᵢ² = r²σᵢ² + σₑ²

This reduces to the prized equation:

σₑ / σᵢ = sqrt(1 – r²)

This is the “risk remaining” formula: the proportion of a stock’s volatility due to its own idiosyncratic risk.

This makes sense. R² is the amount of variance in a dependent variable attributable to the independent variable. If we subtract that proportion from 1 we arrive at the “unexplained” or idiosyncratic variance. By taking the square root of that quantity, we are left with unexplained volatility or “risk remaining”.

Let’s use what we’ve learned in a concrete example.

## From CAPM To Hedge Ratios

Let’s return to Berkshire vs the SP500. Suppose we are long $10mm worth of BRK.B and want to hedge our exposure by going short SP500 futures.

We want to compute:

1. How many dollars worth of SP500 to get short
2. The “risk remaining” on the hedged portfolio

How many dollars of SP500 do we need to short?

Before we answer this lets consider a few ways we can hedge with SP500.

• Dollar weighting

We could simply sell $10mm worth of SP500 futures which corresponds to our $10mm long in BRK.B. Since Berkshire and the SP500 have similar volatility this is a reasonable approach. But suppose we were long TSLA instead of BRK.B. Assuming TSLA was sufficiently correlated to the market (say .70 like BRK.B), the SP500 hedge would be “too light”.

Why?

Because TSLA is about 3x more volatile than the SP500. If the SP500 fell 1 standard deviation, we expect TSLA to fall .70 standard deviations. Since TSLA’s standard deviations are much larger than the SP500 we would be tragically underhedged. Our TSLA long would lose much more money than our short SP500 position because we are not short enough dollars of SP500.

• Vol weighting

Dollar weighting is clearly naive if there are large differences in volatility between our long and short. Let’s stick with the TSLA example. If TSLA is 3x as volatile as the SP500 then if we are long $10mm TSLA, we need to short $30mm worth of SP500.

Uh oh.

That’s going to be too much. Remember the correlation. It’s only .70. The pure vol weighted hedge only makes sense if the correlations are 1. If the SP500 drops one standard deviation, we expect TSLA to drop only .70 standard deviations, not a full standard deviation. In this case, we will have made too much money on our hedge, but if the market would have rallied 1 standard deviation our oversized short would have been “heavy”. We would lose more money than we gained on our TSLA long. Again, only partially hedged.

• Beta weighting

At last, we arrive at the goldilocks solution. We use the beta or slope of the linear regression to weight our hedge. Since beta equals correlation * vol ratio we are incorporating both vol and correlation weighting into our hedge!

I made up the vols and correlations to complete the summary tables below. The key is seeing how much the prescribed hedge ratios can vary depending on how you weight the trades.

Beta weighting accounts for both relative volatilities and the correlation between names. Beta has a one-to-many relationship to its construction. A beta of .5 can come from:

• A .50 correlation but equal vols
• A .90 correlation but vol ratio of .56
• A .25 correlation but vol ratio of 2
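The three weighting schemes and the one-to-many nature of beta can be laid side by side in a short sketch. The inputs are the TSLA figures used in the text ($10mm long, .70 correlation, 3x the SP500’s vol); the function names are hypothetical.

```python
def beta(correlation, vol_ratio):
    """Beta = correlation * (vol of the position / vol of the hedge)."""
    return correlation * vol_ratio

def hedge_notionals(position, correlation, vol_ratio):
    """Short notional implied by each weighting scheme."""
    return {
        "dollar": position,
        "vol": position * vol_ratio,
        "beta": position * beta(correlation, vol_ratio),
    }

# TSLA example from the text: $10mm long, .70 correlation, 3x the SP500's vol.
for scheme, notional in hedge_notionals(10_000_000, 0.70, 3.0).items():
    print(f"{scheme:>6}-weighted short: ${notional:,.0f}")

# Three different constructions that land on roughly the same beta of .5.
for corr, ratio in [(0.50, 1.0), (0.90, 0.56), (0.25, 2.0)]:
    print(f"corr {corr:.2f} x vol ratio {ratio:.2f} = beta {beta(corr, ratio):.2f}")
```

The hedge size is the same across the three constructions, but the correlation behind each one is very different, which is what drives the risk left over after hedging.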

It’s important to decompose betas because the correlation portion is what determines the “risk remaining” on a hedge. Let’s take a look.

How much risk remains on our hedges?

We are long $10,000,000 of TSLA.

We sell $21,000,000 of SP500 futures as a beta-weighted hedge.

Risk remaining is the volatility of TSLA that is unexplained by the market.

• R² is the amount of variance in the TSLA position explained by the market.
• 1-R² is the amount of variance that remains unexplained
• The vol remaining is sqrt(1-R²)

Risk (or vol) remaining = sqrt (1-.72) = 51%

TSLA annual volatility is 45% so the risk remaining is 51% * 45% = 22.95%

22.95% of $10,0000 of TSLA =$2,295,000

So if you ran a hedged position, within 1 standard deviation, you still expect $2,295,000 worth of noise! Remember correlation is symmetrical. The correlation of A to B is the same as the correlation of B to A (you can confirm this by looking at the formula). Beta is not symmetrical because it’s correlation * σdependant / σindependent Yet risk remaining only depends on correlation. So what happens if we flipped the problem and tried to hedge$10,000,000 worth of SP500 with a short TSLA position.

1. First, this is conceptually a more dangerous idea. Even though the correlation is .70, we are less likely to believe that TSLA’s variance explains the SP500’s variance. Math without judgement will impale you on a spear of overconfidence.

2. I’ll work through the example just to be complete.

To compute beta we flip the vol ratio from 3 to 1/3 then multiply by the correlation of .7

Beta of SP500 to TSLA is .333 * .7 = .233

If we are long $10,000,000 of SP500, we sell $2,333,000 of TSLA. The risk remaining is still sqrt(1 - .7²) ≈ 71% but it is applied to the SP500 volatility of 15%.

71.4% x 15% ≈ 10.7% so we expect 10.7% of $10,000,000 or about $1,071,000 of the SP500 position to be unexplained by TSLA.

3. I’m re-emphasizing: math without judgement is a recipe for disaster. The formulas are tools, not substitutes for reasoning.
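
As a check on the arithmetic, here is a minimal sketch of both hedge directions using the inputs from this example (correlation .7, TSLA vol 45%, SP500 vol 15%, $10,000,000 notional). The helper names `risk_remaining` and `hedge_summary` are hypothetical. Note that 1 - .7² = .51 is the unexplained variance; its square root, about .71, is the unexplained vol.

```python
import math


def risk_remaining(correlation: float) -> float:
    """Fraction of the position's volatility left unexplained by the hedge."""
    return math.sqrt(1 - correlation ** 2)


def hedge_summary(notional: float, position_vol: float,
                  hedge_vol: float, correlation: float) -> dict:
    # Beta of the position with respect to the hedge instrument
    beta = correlation * position_vol / hedge_vol
    # Unexplained vol is a fraction of the position's own vol
    unexplained_vol = risk_remaining(correlation) * position_vol
    return {
        "hedge_notional": beta * notional,
        "risk_remaining": risk_remaining(correlation),
        "one_sd_noise": unexplained_vol * notional,
    }


# Long TSLA hedged with SP500, then the flipped problem
tsla_hedged = hedge_summary(10_000_000, 0.45, 0.15, 0.70)
spx_hedged = hedge_summary(10_000_000, 0.15, 0.45, 0.70)

for name, result in (("TSLA vs SP500", tsla_hedged), ("SP500 vs TSLA", spx_hedged)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

The hedge notional changes with the direction of the hedge because beta is asymmetric, while the risk-remaining fraction is identical because it depends only on correlation.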

Changes in Correlation Have Non-Linear Effects On Your Risk

Hedging is tricky. You can see that risk remaining explodes rapidly as correlation falls.

If correlation is as high as .86, you already have 50% risk remaining!
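
The non-linearity is easy to tabulate. This is a small sketch that applies sqrt(1 - correlation²) to a handful of correlations I picked for illustration:

```python
import math


def risk_remaining(correlation: float) -> float:
    """Unexplained volatility as a fraction of the position's volatility."""
    return math.sqrt(1 - correlation ** 2)


# Illustrative correlations, from a perfect hedge down to no relationship at all
for corr in (1.00, 0.99, 0.95, 0.90, 0.86, 0.70, 0.50, 0.00):
    print(f"correlation {corr:.2f} -> risk remaining {risk_remaining(corr):.0%}")
```

Even a correlation of .99 leaves a double-digit percentage of the vol unhedged, which is why small changes in a strong correlation matter so much.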

In practice, a market maker may:

1. group exposures to the most related index (they might have NDX, SPX, and IWM buckets for example)
2. offset deltas between exposures as they accumulate
3. and hedge the remaining deltas with futures.

You might create risk tolerances that stop you from, say, being long $50mm worth of SPX and short $50mm of NDX, leaving you exposed to the underlying factors which differentiate these indices. Even though they might be tightly correlated intraday, correlations change over time and your risk remaining can begin to swamp your edge.

The point of hedging is to neutralize the risks you are not paid to take. But hedging is costly. Traders must always balance these trade-offs in the context of their capital, risk tolerances, and changing correlations.

## Review

I walked slowly through topics that are familiar to many investors and traders. I did this because the grout in these ideas often triggers an insight or newfound clarity of something we thought we understood.

This is a recap of important ideas in this post:

• Variance is a measure of dispersion for a single distribution. Covariance is a measure of dispersion for a joint distribution.
• Just as we take the square root of variance to normalize it to something useful (standard deviation, or in a finance context — volatility), we normalize covariance into correlation.
• Intuition for a positive(negative) correlation: if X is N standard deviations above its mean, Y is r * N standard deviations above(below) its mean.
• Beta is r * the vol ratio of Y to X. In a finance context, it allows us to convert a correlation from a standard deviation comparison to a simple elasticity. If beta = 1.5, then if X is up 2%, I expect Y to be up 3%
• Correlation is symmetrical. Beta is not.
• R² is the variance explained by the independent variable. Risk remaining is the volatility that remains unexplained. It is equal to sqrt(1 - R²).
• There is a surprising amount of risk remaining even if correlations are strong. At a correlation of .86, about 50% of the volatility remains unexplained!
• Don’t compute robotically. Reason > formulas.

Beware.

Least squares linear regression is only one method for fitting a line. It only works for linear relationships. Its application is fraught with pitfalls. It’s important to understand the assumptions in any models you use before they become load-bearing beams in your process.

References:

The table in this post was entirely inspired by Rahul Pathak’s post Anova For Regression.

For the primer on regression and sum of squares I read these 365 DataScience posts in the following order:

# There’s Gold In Them Thar Tails: Part 1

If you were accepted to a selective college or job in the 90s, have you ever wondered if you’d get accepted in today’s environment? I wonder myself. It leaves me feeling grateful because I think the younger version of me would not have gotten into Cornell or SIG today. Not that I dwell on this too much. I take Heraclitus at his word that we do not cross the same river twice. Transporting a fixed mental impression of yourself into another era is naive (cc the self-righteous who think they’d be on the right side of history on every topic). Still, my self-deprecation has teeth. When I speak to friends with teens I hear too many stories of sterling resumes bulging with 3.9 GPAs, extracurriculars, and Varsity sport letters, being warned: “don’t bother applying to Cal”.

A close trader friend explained his approach. His daughter is a high achiever. She’s also a prolific writer. Her passion is the type all parents hope their children will be lucky enough to discover. My friend recognizes that the bar is so high to get into a top school that acceptance above that bar is a roulette wheel. With so much randomness lying above a strict filter, he de-escalates the importance of getting into an elite school. “Do what you can, but your life doesn’t depend on the whim of an admissions officer”. She will lean into getting better at what she loves wherever she lands. This approach is not just compassionate but correct. She’s thought ahead, got her umbrella, but she can’t control the weather.

My friend’s insight that acceptance above a high threshold is random is profound. And timely. I had just finished reading Rohit Krishnan’s outstanding post Spot The Outlier, and immediately sent it to my friend.

I chased down several citations in Rohit’s post to improve my understanding of this topic.

In this post, we will tie together:

1. Why the funnels are getting narrower
2. The trade-offs in our selection criteria
3. The nature of the extremes: tail divergence
4. Strategies for the extremes

We will extend the discussion in a later post with:

1. What this means for intuition in general
2. Applications to investing

## Why Are The Funnels Getting Narrower?

The answer to this question is simple: abundance.

In college admissions, the number of candidates in aggregate grows with the population. But this isn’t the main driver behind the increased selectivity.  The chart below shows UC acceptance rates plummeting as total applications outstrip admits.

The spread between applicants and admissions has exploded. UCLA received almost 170k applications for the 2021 academic year! Cal receives over 100k applicants for about 10k spots. Your chances of getting in have cratered in the past 20 years. Applications have lapped population growth due to a familiar culprit: connectivity. It is much easier to apply to schools today. The UC system now uses a single boilerplate application for all of its campuses.

This dynamic exists everywhere. You can apply to hundreds of jobs without a postage stamp. Artists, writers, analysts, coders, designers can all contribute their work to the world in a permissionless way with as little as a smartphone. Sifting through it all necessitated the rise of algorithms — the admissions officers of our attention.

There’s a trade-off between signal and variance. What if Spotify employed an extremely narrow recommendation engine indexed solely on artist? If listening to Enter Sandman only led you to Metallica’s deepest cuts, the engine is failing to aid discovery. If it indexed by “year”, you’d get a lot more variance since it would choose across genres, but headbangers don’t want to listen to Color Me Badd. This prediction fails to delight the user.

Algorithms are smarter than my cardboard examples but the tension remains. Our solutions to one problem exacerbate another. Rohit describes the dilemma:

The solution to the problem of discovery is better selection, which is the second problem. Discovery problems demand you do something different, change your strategy, to fight to be amongst those who get seen.

There’s plenty of low-hanging fruit to find recommendations that reside between Color Me Badd and St. Anger. But once it’s picked, we are still left with a vast universe of possible songs for the recommendation engine to choose from.

Selection problems reinforce the fact that what we can measure and what we want to measure are two different things, and they diverge once you get past the easy quadrant.

In other words, it’s easy enough to rule out B students, but we still need to make tens of thousands of coinflip-like decisions between the remaining A students. Are even stricter exams an effective way to narrow an unwieldy number of similar candidates? Since in many cases predictors poorly map to the target, the answer is probably no. Imagine taking it to the extreme and setting the cutoff to the lowest SAT score that would satisfy Cal’s expected enrollment. Say that’s 1400. This feels wrong for good reasons (and this is not even touching the hot stove topic of “fairness”). Our metrics are simply imperfect proxies for who we want to admit. In mathy language we can say, the best person at Y (our target variable) is not likely to come from the best candidates we screened if the screening criteria, X, is an imperfect correlate of success (Y).

The cost of this imperfect correlation is a loss of diversity or variance. Rohit articulates the true goal of selection criteria (emphasis mine):

Since no exam perfectly captures the necessary qualities of the work, you end up over-indexing on some qualities to the detriment of others. For most selection processes the idea isn’t to get those that perfectly fit the criteria as much as a good selection of people from amongst whom a great candidate can emerge.

This is even true in sports. Imagine you have a high NBA draft pick. A great professional must endure 82 games (plus a long playoff season), fame, money, and most importantly, a sustained level of unprecedented competition. Until the pros, they were kids. Big fish in small ponds. If you are selecting for an NBA player with narrow metrics, even beyond the well-understood requisite screens for talent, then those metrics are likely to be a poor guide to how the player will handle such an outlier life. The criteria will become more squishy as you try to parse the right tail of the distribution.

In the heart of the population distribution, the contribution to signal of increasing selectivity is worth the loss of variance. We can safely rule out B students for Cal and D3 basketball players for the NBA.  But as we get closer to elite performers, at what point should our metrics give way to discretion? Rohit provides a hint:

When the correlation between the variable measured and outcome desired isn’t a hundred percent, the point at which the variance starts outweighing the mean error is where dragons lie!

## Nature Of The Extremes: Tail Divergence

To appreciate why the signal of our predictive metrics becomes random at the extreme right tail we start with these intuitive observations via LessWrong:

Extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6’7″ is very tall, it lies within a couple of standard deviations of the median US adult male height – there are many thousands of US men taller than the average NBA player, yet are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded, they aren’t the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart, but their intelligence is not in step with their income (their cognitive ability is around +3 to +4 SD above the mean, yet their wealth is much higher than this).

The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa).

The post uses simple scatterplots to demonstrate. Here are 2 self-explanatory charts.

LessWrong continues: Given a correlation, the envelope of the distribution should form some sort of ellipse, narrower as the correlation goes stronger, and more circular as it gets weaker.

If we zoom into the far corners of the ellipse, we see ‘divergence of the tails’: as the ellipse doesn’t sharpen to a point, there are bulges where the maximum x and y values lie with sub-maximal y and x values respectively:

Say X is SAT score and Y is college GPA. We shouldn’t expect that the person with the highest SATs will earn the highest GPA. SAT is an imperfect correlate of GPA. LessWrong’s interpretation is not surprising:

The fact that a correlation is less than 1 implies that other things matter to an outcome of interest. Although being tall matters for being good at basketball, strength, agility, hand-eye-coordination matter as well (to name but a few). The same applies to other outcomes where multiple factors play a role: being smart helps in getting rich, but so does being hard working, being lucky, and so on.
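
Tail divergence is easy to see in a simulation. The sketch below is my own illustration, not taken from the LessWrong post: it assumes X and Y are standard normal with a chosen correlation, draws a population, and counts how often the person with the highest X is also the person with the highest Y. The function name, population size, trial count, and seed are all arbitrary choices.

```python
import math
import random


def share_top_x_is_top_y(correlation: float, n_people: int = 1000,
                         n_trials: int = 200, seed: int = 0) -> float:
    """Fraction of simulated populations where the top scorer on X is also top on Y."""
    rng = random.Random(seed)
    noise_weight = math.sqrt(1 - correlation ** 2)
    hits = 0
    for _ in range(n_trials):
        xs = [rng.gauss(0, 1) for _ in range(n_people)]
        # Y shares `correlation` with X; the remainder is independent noise
        ys = [correlation * x + noise_weight * rng.gauss(0, 1) for x in xs]
        top_x = max(range(n_people), key=xs.__getitem__)
        top_y = max(range(n_people), key=ys.__getitem__)
        hits += top_x == top_y
    return hits / n_trials


for corr in (1.0, 0.95, 0.8, 0.5):
    print(f"correlation {corr:.2f}: top X is also top Y in "
          f"{share_top_x_is_top_y(corr):.0%} of populations")
```

Only a perfect correlation guarantees the same person tops both lists. With a correlation of .8, which most people would call strong, the top X is usually not the top Y.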

Pushing this even further, if we zoom in on the extreme of a distribution we may find correlations invert! This scatterplot via Brilliant.org shows a positive correlation over the full sample (pink) but a negative correlation for a slice (blue).

This is known as Berkson’s Paradox and can appear when you measure a correlation over a “restricted range” of a distribution (for example, if we restrict our sample to the best 20 basketball players in the world we might find that height is negatively correlated to skill if the best players were mostly point guards).

[I’ve written about Berkson’s Paradox here. Always be wary of someone trying to show a correlation from a cherry-picked range of a distribution. Once you internalize this you will see it everywhere! I’d be charitable to the perpetrator. I suspect it’s usually careless thinking rather than a nefarious attempt to persuade.]

## Strategies For The Extremes

In 1849, assayer Dr. M. F. Stephenson shouted ‘There’s gold in them thar hills’ from the steps of the Lumpkin County Courthouse in a desperate bid to keep the miners in Georgia from heading west to chase riches in California. We know there’s gold in the tails of distributions but our standard filters are unfit to sift for them.

Let’s pause to take inventory of what we know.

1. As the number of candidates or choices increases we demand stricter criteria to keep the field to a manageable size.
2. At some cutoff, in the extreme of a distribution, selection metrics can lead to random or even misleading predictions.
3. Evolution in nature works by applying competitive pressures to a diverse population to stimulate adaptation (a form of learning). Diversity is more than a social buzzword. It’s an essential input to progress. Rohit implicitly acknowledges the dangers of inbreeding when he warns against putting folks through a selection process that reflexively molds them into rule-following perfectionists rather than those who are willing to take risks to create something new.

With these premises in place we can theorize strategies for both the selector and the selectee to improve the match between a system’s desired output (the definition of success depends on the context) and its inputs (the criteria the selector uses to filter).

Selector Strategies

We can continue to rely on conventional metrics to filter the meat of the distribution for a pool of candidates. As we get into the tails, our adherence and reverence for measures should be put aside in favor of increasing diversity and variance. Remember the output of an overly strict filter in the tail is arbitrary anyway. Instead we can be deliberate about the randomness we let seep into selections to maximize the upside of our optionality.

Rohit summarizes the philosophy:

Change our thinking from a selection mindset (hire the best 5%) to a curation mindset (give more people a chance, to get to the best 5%).

Practically speaking this means selectors must widen the top of the funnel then…enforce the higher variance strategy of hire-and-train.

Rohit furnishes examples:

• Tyler Cowen’s strategy of identifying unconventional talent and placing small but influential bets on the candidates. This is easier to say than do but Tony Kulesa finds some hints in Cowen’s template.
• The Marine Corps famously funnels wide electing not to focus so much on the incoming qualifications, but rather look at recruiting a large class and banking on attrition to select the right few.
• Investment banks and consulting firms hire a large group of generically smart associates, and let attrition decide who is best suited to stick around.

David Epstein, author of Range and The Sports Gene, has spent the past decade studying the development of talent in sports and beyond. He echoes these strategies:

One practice we’ve often come back to: not forcing selection earlier than necessary. People develop at different speeds, so keep the participation funnel wide, with as many access points as possible, for as long as possible. I think that’s a pretty good principle in general, not just for sports.

I’ll add 2 meta observations to these strategies:

1. The silent implication is the upside of matching the right talent to the right role is potentially massive. If you were hiring someone to bag groceries the payoff to finding the fastest bagger on the planet is capped. An efficient checkout process is not the bottleneck to a supermarket’s profits. There’s a predictable ceiling to optimizing it to the microsecond. That’s not the case with roles in the above examples.

2. Increasing adoption of these strategies requires thoughtful “accounting” design. High stakes busts, whether they are first round draft picks or 10x engineers, are expensive in time and money for the employer and candidate. If we introduce more of a curation mindset, cast wider nets and hire more employees, we need to understand that the direct costs of doing that should be weighed against the opaque and deferred costs of taking a full-size position in expensive employees from the outset.

Accrual accounting is an attempt to match a business’ economic mechanics to meaningful reports of stocks and flows so we extract insights that lead to better bets. Fully internalized, we must recognize that some amount of churn is expected as “breakage”. Lost option premiums need to be charged against the options that have paid off 100x. If an organization fails to design its incentive and accounting structures in accordance with curation/optionality thinking it will be unable to maintain its discipline to the strategy.

Selectee Strategies

For the selectee trying to maximise their own potential there are strategies which exploit the divergence in the tails.

To understand, we first recognize that in any complicated domain, the effort to become the best is not linear. You could devote a few years to becoming an 80th or 90th percentile golfer or chess player. But in your lifetime you wouldn’t become Tiger or Magnus. The rewards to effort decay exponentially after a certain point. Anyone who has lifted weights knows you can spend a year progressing rapidly, only to hit a plateau that lasts just as long.

The folk wisdom of the 80/20 rule captures this succinctly: 80% of the reward comes from 20% of the effort, and the remaining 20% of the reward requires 80% effort. The exact numbers don’t matter. Divorced from contexts, it’s more of a guideline.

This is the invisible foundation of Marc Andreessen and Scott Adams’s career advice to level up your skills in multiple domains. Say coding and public speaking or writing plus math. If it’s exponentially easier to get to the 90th percentile than the 99th then consider the arithmetic.

a) If you are in the 99th percentile you are 1 in 100.

b) If you are top 10% in 2 different (technically uncorrelated) domains then you are also 1 in 100 because 10% x 10% = 1%

It’s exponentially easier to achieve the second scenario because of the effort scaling function.

If this feels too stifling you can simply follow your curiosity. In Why History’s Greatest Innovators Optimized for Interesting, Taylor Pearson summarizes the work of Juergen Schmidhuber which contends that curiosity is the desire to make sense of, or compress, information in such a way that we make it more beautiful or useful in its newly ordered form. If learning (or as I prefer to say – adapting) is downstream from curiosity we should optimize for interesting.

Lawrence Yeo unknowingly takes the baton in True Learning Is Done With Agency, with his practical advice. He tells us to truly learn we must:

decouple an interest from its practical value. Instead of embarking on something with an end goal in mind, you do it for its own sake. You don’t learn because of the career path it’ll open up, but because you often wonder about the topic at hand.

…understand that a pursuit truly driven by curiosity will inevitably lend itself to practical value anyway. The internet has massively widened the scope of possible careers, and it rewards those who exercise agency in what they pursue.

## Conclusion

Rohit’s essay anchored Part 1 of this series. I can’t do better than let his words linger before moving on to Part 2.

If measurement is too strict, we lose out on variance.

If we lose out on variance, we miss out on what actually impacts outcomes.

If we miss what actually impacts outcomes, we think we’re in a rut.

But we might not be.

Once you’ve weeded out the clear “no”s, then it’s better to bet on variance rather than trying to ascertain the true mean through imprecise means.

We should at least recognize that our problems might be stemming from selection efforts. We should probably lower our bars at the margin and rely on actual performance [as opposed to proxies for performance] to select for the best. And face up to the fact that maybe we need lower retention and higher experimentation.

In Part 2, we will explore what divergence in the tails can tell us about life and investing.

# Solving A Compounding Riddle With Black-Scholes

A few weeks ago I was getting on an airplane armed with a paper and pen, ready to solve the problem in the tweet below. And while I think you will enjoy the approach, the real payoff is going to follow shortly after — I’ll show you how to not only solve it with option theory but expand your understanding of the volatility surface. This is going to be fun. Thinking caps on. Let’s go.

## The Question That Launched This Post

From that tweet, you can see the distribution of answers has no real consensus. So don’t let others’ choices affect you. Try to solve the problem yourself. I’ll re-state some focusing details:

• Stock A compounds at 10% per year with no volatility
• Stock B has the same annual expectancy as A but has volatility. Its annual return is binomial — either up 30% or down 10%.
• After 10 years, what’s the chance volatile stock B is higher than A?

You’ll get the most out of this post if you try to solve the problem. Give it a shot. Take note of your gut reactions before you start working through it. In the next section, I will share my gut reaction and solution.

## My Approach To The Problem

### Gut Reaction

So the first thing I noticed is that this is a “compounding” problem. It’s multiplicative. We are going to be letting our wealth ride and incurring a percent return. We are applying a rate of return to some corpus of wealth that is growing or shrinking. I’m being heavy-handed in identifying that because it stands in contrast to a situation where you earn a return, take profits off the table, and bet again. Or situations, where you bet a fixed amount in a game as opposed to a fraction of your bankroll. This particular poll question is a compounding question, akin to re-investing dividends not spending them. This is the typical context investors reason about when doing “return” math. Your mind should switch into “compounding” mode when you identify these multiplicative situations.

So if this is a compounding problem, and the arithmetic returns for both investments are 10% I immediately know that volatile stock “B” is likely to be lower than stock “A” after 10 years. This is because of the “volatility tax” or what I’ve called the volatility drain. Still, that only conclusively rules out choice #4. Since we could rule that without doing any work and over 2,000 respondents selected it, I know there’s a good reason to write this post!

### Showing My Work

Here’s how I reasoned through the problem step-by-step.

Stock A’s Path (10% compounded annually)

Stock B’s Path (up 30% or down 10%)

The fancy term for this is “binomial tree” but it’s an easy concept visually. Let’s start simple and just draw the path for the first 2 years. Up nodes are created by multiplying the stock price by 1.3, down nodes are created by multiplying by .90.

Inferences

Year 1: 2 cumulative outcomes. Volatile stock B is 50/50 to outperform
Year 2: There are 3 cumulative outcomes. Stock B only outperforms in one of them.

Let’s pause here because while we are mapping the outcome space, we need to recognize that not every one of these outcomes has equal probability.

2 points to keep in mind:

• In a binomial tree, the number of possible paths is 2ᴺ where N is the number of years. This makes sense since each node in the tree has 2 possible outcomes, so the number of paths doubles every year.
• However, the number of distinct outcomes is N + 1. So in year 1, there are 2 possible outcomes. In year 2, 3 possible outcomes.

Probability is the number of ways an outcome can occur divided by the total number of possibilities.

Visually:

So by year 2 (N=2), there are 3 outcomes (N+1) and 4 cumulative paths (2ᴺ)
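
The counting can be sketched in a few lines. The function name `tree_summary` is my own. It returns the total number of paths, the number of ways to reach each outcome (0 up years through N up years), and the matching probabilities, assuming up and down years are equally likely as in the poll.

```python
from math import comb


def tree_summary(n_years: int):
    """Total paths, ways to reach each outcome, and outcome probabilities."""
    paths = 2 ** n_years
    # ways[k] = number of paths with exactly k up years
    ways = [comb(n_years, ups) for ups in range(n_years + 1)]
    probs = [w / paths for w in ways]
    return paths, ways, probs


paths, ways, probs = tree_summary(2)
print(paths, ways, probs)  # 4 [1, 2, 1] [0.25, 0.5, 0.25]
```

For year 2 the middle outcome (one up year, one down year) can be reached two ways, so it carries half the probability.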

We are moving slowly, but we are getting somewhere.

In year 1, the volatile investment has a 50% chance of winning. The frequency of win paths and lose paths are equal. But what happens in an even year?

There is an odd number of outcomes, with the middle outcome representing the number of winning years and the number of losing years being exactly the same. If the frequency of the wins and losses is the same the volatility tax dominates. If you start with $100 and make 10% then lose 10% the following year, your cumulative result is a loss.

$100 x 1.1 x .9 = $99

Order doesn’t matter.

$100 x .9 x 1.1 = $99

In odd years, like year 3, there is a clear winner because the number of wins and losses cannot be the same. Just like a 3-game series.

Solving for year 10

If we extend this logic, it’s clear that year 10 is going to have a big volatility tax embedded in it because of the term that includes stock B having 5 up years and 5 down years.

• N = 10
• Outcomes (N+1) = 11 (ie 10 up years, 9 up years, 8 up years…0 up years)
• Number of paths (2ᴺ) = 1024

We know that 10, 9, 8, 7, 6 “ups” result in B > A.

We know that 4, 3, 2, 1, 0 “ups” result in B < A.

The odds of those outcomes are symmetrical. So the question is how often does 5 wins, 5 losses happen? That’s the outcome in which stock A wins because the volatility tax effect is so dominant.

The number of ways to have 5 wins in 10 years is a combination formula for “10 choose 5”:

₁₀C₅ or in Excel =combin(10,5) = 252

So there are 252 out of 1024 total paths in which there are 5 wins and 5 losses. 24.6%

24.6% of the time the volatility tax causes A > B. The remaining paths represent 75.4% of the paths and those have a clear winner that is evenly split between A>B and B>A.

75.4% / 2 = 37.7%

So volatile stock B only outperforms stock A 37.7% of the time despite having the same arithmetic expectancy!

This will surprise nobody who recognized that the geometric mean corresponds to the median of a compounding process. The geometric mean of this investment is not 10% per year but 8.17%. Think of how you compute a CAGR by taking the terminal wealth and raising it to the 1/N power. So if you returned $2 after 10 years on a $1 investment your CAGR is 2^(1/10) – 1 = 7.18%. To compute a geometric mean for stock B we invert the math: .9^(1/2) * 1.3^(1/2) – 1 = 8.17%. (we’ll come back to this after a few pictures)

#### The Full Visual

A fun thing to recognize with binomial trees is that the coefficients (ie the number of ways a path can be made that we denoted with the “combination” formula) can be created easily with Pascal’s Triangle. Simply sum the 2 coefficients directly from the line above it.

Coefficients of the binomial expansion (# of ways to form the path)

Probabilities (# of ways to form each path divided by total paths)

Corresponding Price Paths

Above we computed the geometric mean to be 8.17%. If we compounded $100 at 8.17% for 10 years we end up with $219 which is the median result that corresponds to 5 up years and 5 down years!

## The Problem With This Solution

I solved the 10-year problem by recognizing that, in even years, the volatility tax would cause volatile stock B to lose when the up years and down years occurred equally. (Note that while an equal number of heads and tails is the most likely outcome, it’s still not likely. There’s a 24.6% chance that it happens in 10 trials).

But there’s an issue.

My intuition doesn’t scale for large N. Consider 100 years. Even in the case where B is up 51 times and down 49 times the volatility tax will still cause the cumulative return of B < A. We can use guess-and-test to see how many winning years B needs to have to overcome the tax for N = 100.

N = 100

If we put $1 into A, it grows to 1.1^100 = $13,781

If we put $1 into B and it has 54 winning years and 46 losing years, it will return 1.3^54 * .9^46 = $11,171. It underperforms A.

If we put $1 into B and it has 55 winning years and 45 losing years, it will return 1.3^55 * .9^45 = $16,136. It outperforms A.

So B needs to have 55 “ups”/45 “downs” or about 20% more winning years to overcome the volatility tax.

It’s not as simple as B needing more up years than down years, like we found for shorter horizons. We need a better way.

## The General Solution Comes From Continuous Compounding: The Gateway To Option Theory

In the question above, we compounded the arithmetic return of 10% annually to get our expectancy for the stocks. Both stocks’ expected value after 10 years is $100 * 1.1^10 = $259.37.
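
Before leaving the annual-compounding world, the guess-and-test step can be replaced by brute force for any N. This is a minimal sketch with names of my own choosing (`prob_b_beats_a`, `UP`, `DOWN`, `STEADY`). It walks every possible count of up years, checks whether B's terminal value beats A's, and sums the probability of the winning outcomes, assuming up and down years are equally likely.

```python
from math import comb

UP, DOWN, STEADY = 1.30, 0.90, 1.10  # B's up year, B's down year, A's every year


def prob_b_beats_a(n_years: int):
    """Return (probability B finishes above A, fewest up years B needs to win)."""
    paths = 2 ** n_years
    a_final = STEADY ** n_years
    winning_paths = 0
    min_ups = None
    for ups in range(n_years + 1):
        b_final = UP ** ups * DOWN ** (n_years - ups)
        if b_final > a_final:
            winning_paths += comb(n_years, ups)
            if min_ups is None:
                min_ups = ups  # first (smallest) count of up years that wins
    return winning_paths / paths, min_ups


for n in (1, 2, 10, 100):
    prob, min_ups = prob_b_beats_a(n)
    print(f"N={n}: B needs at least {min_ups} up years, P(B > A) = {prob:.1%}")
```

For N = 10 this reproduces the 37.7% from the counting argument, and for N = 100 it confirms that B needs 55 up years. It works, but it is still enumeration, which is the motivation for the continuous approach.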

Be careful. You don’t want the whole idea of the geometric mean to trip you up. The compounding of volatility does NOT change the expectancy. It changes the distribution of outcomes. This is crucial.

The expectancy is the same, the distribution differs.

If we keep cutting the compounding periods from 1 year to 1 week to 1 minute…we approach continuous compounding. That’s what logreturns are. Continuously compounded returns.

Here’s the key:

Returns conform to a lognormal distribution. You cannot lose more than 100% but you have unlimited upside because of the continuous compounding. Compared to a bell-curve the lognormal distribution is positively skewed. The counterbalance of the positive skew is that the geometric mean or center of mass of the distribution is necessarily lower than the arithmetic expectancy. How much lower? It depends on the volatility because the volatility tax pulls the geometric mean down from the arithmetic mean or expectancy. The higher the volatility, the more positively skewed the lognormal or compounded distribution is. The more volatile the asset is in a positively skewed distribution the larger the right tail grows since the left tail is bounded by zero. The counterbalance to the positive skew is that the most likely outcome is the geometric mean.

I’ll pause here for a moment to just hammer home the idea of positive skew:

If stock B doubled 20% of the time and lost 12.5% the remaining 80% of the time its average return would be exactly the same as stock A after 1 year (20% * $200 + 80% * $87.5 = $110). The arithmetic mean is the same. But the most common lived result is that you lose. The more we crank the volatility higher, the more it looks like a lotto ticket with a low probability outcome driving the average return.

Look at the terminal prices for stock B:

The arithmetic mean is the same as A, $259.

The geometric mean or most likely outcome is only $219 (again corresponding to the 8.17% geometric return)

The magnitude of that long right tail ($1,379 is > 1200% total return, while the left tail is a cumulative loss of 65%) is driving that 10% arithmetic return.

Compounding is pulling the typical outcome down as a function of volatility but it’s not changing the overall expectancy.
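Here is a rough sketch that enumerates stock B's terminal prices to show the mean staying put while the typical outcome sags. It assumes stock B moves either +30% or -10% each year with equal odds. That pairing is my assumption, chosen because it is consistent with the 10% average return, 20% volatility, 8.17% geometric return, and $1,379 right tail quoted above.

```python
from math import comb

START, YEARS = 100.0, 10
UP, DOWN = 1.30, 0.90  # assumed yearly moves, each with 50% probability

# Terminal price and probability for each possible number of up years.
outcomes = [
    (START * UP**k * DOWN ** (YEARS - k), comb(YEARS, k) / 2**YEARS)
    for k in range(YEARS + 1)
]

mean_price = sum(price * prob for price, prob in outcomes)
stock_a = START * 1.10**YEARS  # stock A compounds at 10% with no volatility

# Median: first terminal price at which cumulative probability reaches 50%.
cumulative = 0.0
for price, prob in outcomes:  # already ordered from worst to best outcome
    cumulative += prob
    if cumulative >= 0.5:
        median_price = price
        break

prob_b_beats_a = sum(prob for price, prob in outcomes if price > stock_a)

print(f"mean terminal price:   ${mean_price:,.2f}")    # $259.37
print(f"median terminal price: ${median_price:,.2f}")  # $219.24
print(f"P(B finishes above A): {prob_b_beats_a:.1%}")  # 37.7%
```

The mean matches stock A to the penny, yet B only finishes ahead when at least 6 of the 10 years are up years.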

#### A Pause To Gather Ourselves

• We now understand that compounded returns are positively skewed.
• We now understand that logreturns are just compounded returns taken continuously as opposed to annually.
• This continuous, logreturn world is the basis of option math.

### Black-Scholes

The lognormal distribution underpins the Black-Scholes model used for pricing options.

The median of a lognormal distribution is the geometric mean. By now we understand that the geometric mean is always lower than the arithmetic mean whenever there is volatility. So in a compounded world, the median outcome is lower than the arithmetic mean.

Geometric mean  = arithmetic mean – .5 * volatility²

The question we worked on is not continuous compounding but if it were, the geometric mean = 10% – .5 * (.20)² = 8%. Just knowing this was enough to know that most likely B would not outperform A even though they have the same average expectancy.
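As a quick sanity check, this sketch compares the rule of thumb with the exact annual figure. The exact number assumes stock B's year is a coin flip between +30% and -10%, which is my reading of the 10% mean and 20% volatility rather than something stated in this section.

```python
import math

arithmetic_mean = 0.10
volatility = 0.20

# Continuous-compounding rule of thumb from the formula above.
approx_geometric = arithmetic_mean - 0.5 * volatility**2

# Exact annual figure under the assumed +30% / -10% two-outcome year.
exact_geometric = math.sqrt(1.30 * 0.90) - 1

print(f"approximation:  {approx_geometric:.2%}")  # 8.00%
print(f"exact (annual): {exact_geometric:.2%}")   # 8.17%
```

The two disagree slightly because the formula is a continuous-compounding result, but either one is enough to see that the typical path for B trails A.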

Let’s revisit the original question, but now we will assume continuous compounding instead of annual compounding. The beauty of this is we can now use Black Scholes to solve it!

#### Re-framing The Poll As An Options Question

We now switch compounding frequency from annual to continuous so we are officially in Black-Scholes lognormal world.

Expected return (arithmetic mean)

• Annual compounding: $100 * (1.1)¹⁰ = $259.37
• Continuous compounding (B-S world): $100 * e^(.10 * 10) = $271.83

Median return (geometric mean)

• Annual compounding: $100 * 1.0817¹⁰ = $219.24
• Continuous compounding (B-S world): $100 * e^((.10 – .5 * .2²) * 10) = $222.55
  • remember: geometric mean = arithmetic mean – .5 * volatility²
  • geometric mean < arithmetic mean, of course

The original question:

What’s the probability that stock B with its 10% annual return and 20% volatility outperforms stock A with its 10% annual return and no volatility in 10 years?

Asking the question in options language:

What is the probability that a 10-year call option on stock B with a strike price of $271.83 expires in-the-money?
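The four price levels, including the $271.83 strike, can be reproduced with a few lines. This is only a sketch: the annual geometric return is derived from an assumed +30% / -10% yearly move for stock B, which lines up with the 8.17% figure used in this post.

```python
import math

start, years = 100.0, 10
mean_return, volatility = 0.10, 0.20

# Expected (arithmetic mean) terminal price
mean_annual = start * (1 + mean_return) ** years
mean_continuous = start * math.exp(mean_return * years)

# Median (geometric mean) terminal price
geo_annual = math.sqrt(1.30 * 0.90) - 1  # assumed +30% / -10% yearly moves
median_annual = start * (1 + geo_annual) ** years
geo_continuous = mean_return - 0.5 * volatility**2
median_continuous = start * math.exp(geo_continuous * years)

print(f"mean,   annual:     ${mean_annual:.2f}")        # $259.37
print(f"mean,   continuous: ${mean_continuous:.2f}")    # $271.83
print(f"median, annual:     ${median_annual:.2f}")      # $219.24
print(f"median, continuous: ${median_continuous:.2f}")  # $222.55
```

Note that the geometric rate is multiplied by the 10-year horizon inside the exponent, just like the arithmetic rate is.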

If you have heard that “delta” is the probability of “expiring in-the-money”, you might think we are done. We have all the variables we need to use a Black-Scholes calculator which will spit out a delta. The problem is delta is only approximately the probability of expiring in-the-money. In cases with lots of time to expiry, like this one where the horizon is 10 years, they diverge dramatically.²

We will need to extract the probability from the Black Scholes equation. Rest assured, we already have all the variables.

#### Computing The Probability That Stock “B” Expires Above Stock “A”

If we simplify Black-Scholes to a bumper sticker, it is the probability-discounted stock price beyond a fixed strike price. Under the hood of the equation, there must be some notion of a random variable’s probability distribution. In fact, it’s comfortingly simple. The crux of the computation is just calculating z-scores.

I think of a z-score as the “X” coordinate on a graph where the “Y” coordinate is a probability on a distribution. Refresher pic³:

Conceptually, a z-score is a distance from a distribution’s mean normalized by its standard deviation. In Black-Scholes world, z-scores are a specified logreturn’s distance from the geometric mean normalized by the stock’s volatility. Same idea as the Gaussian z-scores you have seen before.

Conveniently, logreturns are themselves normally distributed, allowing us to use the good ol’ NORM.DIST Excel function to turn those z-scores into probabilities and deltas.

In Black Scholes,

• delta is N(d1)
• probability of expiring in-the-money is N(d2)
• d1 and d2 are z-scores
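For anyone who prefers code to a spreadsheet, here is a minimal Python sketch of those two z-scores. It assumes no dividends and lets the 10% expected return play the role of the rate in the standard formula. The function name `d1_d2` is my own, and `NormalDist().cdf` stands in for Excel's NORM.DIST.

```python
import math
from statistics import NormalDist


def d1_d2(spot, strike, rate, vol, years):
    """Return the Black-Scholes z-scores (d1, d2) for a non-dividend stock."""
    total_vol = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * years) / total_vol
    d2 = d1 - total_vol
    return d1, d2


spot, rate, vol, years = 100.0, 0.10, 0.20, 10
strike = spot * math.exp(rate * years)  # stock A's terminal price, about $271.83

d1, d2 = d1_d2(spot, strike, rate, vol, years)
N = NormalDist().cdf  # standard normal cumulative distribution

delta = N(d1)     # hedge ratio
prob_itm = N(d2)  # probability of finishing above the strike

print(f"d1 = {d1:.4f}, d2 = {d2:.4f}")
print(f"delta = {delta:.1%}, P(ITM) = {prob_itm:.1%}")
```

Because the strike sits exactly at the forward, d1 and d2 come out as mirror images of each other around zero, and the gap between them is the total volatility over the 10 years.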

Here are my calcs⁴:

Boom.

The probability of stock B finishing above stock A (ie the strike or forward price of a $100 stock continuously compounded at 10% for 10 years) is… 37.6%!

This is respectably close to the 37.7% we computed using Pascal’s Triangle. The difference is we used the continuous compounding (lognormal) distribution of returns instead of calculating the return outcomes discretely.

#### The Lognormal Distribution Is A Lesson In How Compounding Influences Returns

I ran all the same inputs through Black Scholes for strikes up to $750.

• This lets us compute all the straddles and butterflies in the Black-Scholes universe (ie what market-makers back in the day called “flat sheets”. That means no additional skew parameters were fit to the model or the model was not fit to the market).
• The flys let us draw the distribution of prices.

A snippet of the table:

I highlighted a few cells of note:

• The 220 strike has a 50% chance of expiring ITM. That makes sense, it’s the geometric mean or arithmetic median.
• The 270 strike is known as At-The-Forward because it corresponds to the forward price of $271.83 derived from continuously compounding $100 at 10% per year for 10 years (ie Seʳᵗ). If 10% were a risk-free rate this would be treated like the 10 year ATM price in practice. Notice it has a 63% delta. This surprises people new to options but for veterans this is expected (assuming you are running a model without spot-vol correlation).
• You have to go to the $330 strike to find the 50% delta option! If you need to review why, see Lessons From The .50 Delta Option.

The summary picture below adds one more lesson:

The cheapest straddle (and therefore most expensive butterfly) occurs at the modal return, about $150. If the stock increased from $100 to $150, your CAGR would be 4.1%. This is the single most likely event despite the fact that it’s below the median AND has a point probability of only 1.7%.
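The highlighted cells can be roughly reproduced in Python. Treat this as a sketch under my own assumptions: no dividends, the 10% return used as the rate, and a $5-wide bucket around $150 standing in for the butterfly, since the strike spacing of the table is not stated here.

```python
import math
from statistics import NormalDist

SPOT, RATE, VOL, YEARS = 100.0, 0.10, 0.20, 10
TOTAL_VOL = VOL * math.sqrt(YEARS)
N = NormalDist().cdf  # standard normal cumulative distribution


def prob_itm(strike):
    """N(d2): probability the terminal price finishes above the strike."""
    d2 = (math.log(SPOT / strike) + (RATE - 0.5 * VOL**2) * YEARS) / TOTAL_VOL
    return N(d2)


def delta(strike):
    """N(d1): the call delta, where d1 is d2 plus the total volatility."""
    d1 = (math.log(SPOT / strike) + (RATE + 0.5 * VOL**2) * YEARS) / TOTAL_VOL
    return N(d1)


for strike in (220, 270, 330):
    print(f"{strike}: P(ITM) = {prob_itm(strike):.1%}, delta = {delta(strike):.1%}")

# Mode of a lognormal is exp(mu - sigma^2), which here works out to
# SPOT * exp((RATE - 1.5 * VOL^2) * YEARS).
mode_price = SPOT * math.exp((RATE - 1.5 * VOL**2) * YEARS)
mode_cagr = (mode_price / SPOT) ** (1 / YEARS) - 1

# Chance of landing in a $5-wide bucket around $150 (bucket width is an assumption).
bucket_prob = prob_itm(147.5) - prob_itm(152.5)

print(f"mode = ${mode_price:.2f}, CAGR = {mode_cagr:.1%}, bucket = {bucket_prob:.1%}")
```

The gap between each strike's probability and its delta is the same effect described earlier: with 10 years to expiry, N(d1) and N(d2) sit far apart.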

#### Speaking of Skew

Vanilla Black-Scholes option theory is a handy framework for understanding the otherwise unintuitive hand of compounding. The lognormal distribution is the distribution that corresponds to continuously compounded returns. However, it is important to recognize that nobody actually believes this distribution describes any individual investment. A biotech stock might be bimodally distributed, contingent on an FDA approval. If you price SPX index options with a positively skewed model like this you will not last long.

A positively skewed distribution says “on average I’ll make X because sometimes I’ll make multiples of X but most of the time, my lived experience is I’ll make less than X”.

In reality, the market imputes negative skew on the SPX options market. This shifts the peak to the right, shortens the right tail, and fattens the left tail. That implied skew says “on average I make X, I often make more than X, because occasionally I get annihilated”.

It often puzzles beginning traders that adding “put skew” to a market, which feels like a “negative” sentiment, raises the value of call spreads. But that actually makes sense. A call spread is a simple over/under bet that reduces to the odds of some outcome happening. If the spot price is unchanged, and the puts become more expensive because the left tail is getting fatter, then it means the asset must be more likely to appreciate to counterbalance those 2 conditions. So of course the call spreads must be worth more.

## Final Wrap

Compounding is a topic that gives beginners and even experienced professionals difficulty. By presenting the solution to the question from a discrete binomial angle and a continuous Black-Scholes angle, I hope it solidified or even furthered your appreciation for how compounding works.

My stretch goal was to advance your understanding of option theory. While it overlaps with many of my other option theory posts, if it led to even any small additional insight, I figure it’s worth it. I enjoyed sensing that the question could be solved using options and then proving it out.

I want to thank @10kdiver for the work he puts out consistently and the conversation we had over Twitter DM regarding his question. If you are trying to learn basic and intermediate level financial numeracy, his collection of threads is unparalleled. Work I aspire to. Check them out here: https://10kdiver.com/twitter-threads/

Remember, my first solution (Pascal’s Triangle) only worked for relatively small N. It was not a general solution. The Black-Scholes solution is a general one but required changing “compounded annually” to “compounded continuously”. 10kdiver provided the general solution, using logs (so also moving into continuous compounding) but did not require discussion of option theory.

I’ll leave you with that: