There’s Gold In Them Thar Tails: Part 2

This is Part 2 of a discussion of how sourcing talent or outcomes in the tails or extremes of a distribution calls for selection criteria that embrace more variance than searches in the heart of a distribution. To catch up, please read There’s Gold In Them Thar Tails: Part 1.

If you can’t be bothered, here’s the gist:

  1. We saw that an explosion of choice, whether it’s job or college applicants, songs to listen to, or athletes to recruit, has made selection increasingly difficult.
  2. A natural response is to narrow the field by filtering more aggressively. We can do this by making selection criteria stricter or by deploying smarter algorithms and recommendation engines.
  3. This leads to increased reliance on legible measurements for filtering.
  4. Goodhart’s law predicts that the measures themselves will become the target, increasing the pressure on candidates to optimize for narrow targets that are imperfect proxies for what the measure was filtering for.
  5. Anytime we filter, we face a trade-off between signal (“My criteria are finding great candidates”) and diversity. This is also known as the bias-variance trade-off.
  6. Diversity is an essential input to progress. Nature’s underlying algorithm of evolution penalizes in-breeding.
  7. In addition to a loss of diversity, signal decays as you get closer to the extremes. This is known as tail divergence. The signal can even flip (ie Berkson’s Paradox).
  8. The point where the noise in the signal overwhelms the variance between candidates is an efficient cutoff. Beyond that threshold, selectors should think more creatively than “just raise the bar”.

At the end of Part 1, I offered strategies for both selectors and selectees to increase diversity and improve outcomes in the extremes.

If narrower filters are less effective in the tails (ie more noise, weaker correlations between criteria and match quality), we should be intentional about the randomness we introduce to the process. A 1500 SAT is a noisy predictor of “largest alumni donor 20 years from now”. Instead, accept the 1350 SAT from the homeschooled kid in Argentina. Experiment with criteria and let chance retroactively hint at divergent indicators that you would never have thought to test. One of the benefits of such an experiment is that if you are methodical about how you introduce chance you can study the results for a hidden edge. If nobody else has internalized this thinking because they think it’s too risky (it’s not…the signal of the tighter filter had already degraded), then you have an opportunity to leap ahead of your competitors who underestimate the optionality in trying many recipes and keeping the ones that taste good. You tolerate some mayonnaise liver sandwiches before you discover pb&j.
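A minimal sketch of what methodical randomness could look like, with entirely made-up numbers: reserve a slice of the slots for wildcards chosen at random, then track both cohorts so chance can hint at the divergent indicators.

```python
import random

random.seed(7)

# Hypothetical applicant pool: each candidate has a hidden quality we
# actually care about and a visible score that only noisily proxies it.
pool = []
for _ in range(10_000):
    hidden = random.gauss(0, 1)
    score = hidden + random.gauss(0, 1.5)   # noisy legible measurement
    pool.append((score, hidden))
pool.sort(reverse=True)

SLOTS, EXPLORE = 100, 0.2   # reserve 20% of slots for wildcards

n_wild = int(SLOTS * EXPLORE)
filtered = pool[: SLOTS - n_wild]                         # strict filter
wildcards = random.sample(pool[SLOTS - n_wild:], n_wild)  # deliberate chance
admits = filtered + wildcards

print(f"admitted {len(admits)}: {len(filtered)} by filter, "
      f"{len(wildcards)} wildcards")
# Track both cohorts over time; if the wildcards keep surfacing winners
# the filter missed, chance has hinted at a divergent indicator.
```

The exploration budget and pool sizes here are arbitrary; the point is that the randomness is pre-committed and measurable, not ad hoc.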

In part 2, we reflect on what tail divergence says about life and investing.


Where Instincts Fail

Tail divergence is the simple observation that attributes that correlate with certain outcomes lose their predictive ability as we get into the extremes. If you are 6’7”, you’re better at basketball than most of the population. But you couldn’t set foot on the hardwood with the lowly Rockets’ 12th man. Taken further, Berkson’s Paradox shows that it’s possible for the correlation to flip. LessWrong thinks the flippening may be causal because of too much of a good thing:

Maybe being taller at basketball is good up to a point, but being really tall leads to greater costs in terms of things like agility… Maybe a high IQ is good for earning money, but a stratospherically high IQ has an increased risk of productivity-reducing mental illness. Or something along those lines.

The safest generalization to absorb:

When speculating about the tails of a distribution your intuition is less reliable. 

If you can pinpoint causality, that’s a bonus. But simply realizing your guesses about extremes are random is itself an advantage. It splits your brain wide open and gives your imagination oxygen.

Behavioral psychology recognizes the usefulness of heuristics to make judgements while highlighting how “biases” such as framing can short-circuit our “System 1” machinery. Intuition is a useful guide when we have deep experience in a domain, but we should seek external data (base rates) or guidance when we stray from the mundane.

If our intellectual adventures take us from “Mediocristan” to “Extremistan”, then data is not necessarily a helpful tour guide. It can even be harmful if it encourages a false sense of security or a load-bearing assumption that turns out to be hollow1.

A recent example of intuition failing in an extreme scenario still stings. When Covid first started spreading in the US, asset prices and city rents dove lower. Financial markets stabilized and began recovering when the government committed to replacing lost demand with an unprecedented fiscal package for an unprecedented event. My suburban house shot up 15% in value as locked-down city dwellers wanted more space. Seeing the divergence between home prices and rentals, I quickly diagnosed the home price bump as a premium needed to absorb a sudden but transitory urban exodus until we could get a vaccine. While it wasn’t the main consideration for selling, the “trade setup” was not lost on me. My intuition in this extreme scenario couldn’t have fathomed that the price would shoot 20% more (and still going, ughh) past where I sold as the lockdowns lifted. My trading intuition degrades less gracefully than I’d like to admit as the orbits get further from financial options.

Moral Intuition

As technology and science fiction converge, it would be dangerous to lazily extrapolate how we handle routine computer-enabled behavior to edge cases. If you have ever played dark forms of “would you rather?” then you are already familiar with the so-called trolley problem:


credit: abpradio.com

The Conversation frames it in the context of self-driving cars:

The car approaches a traffic light, but suddenly the brakes fail and the computer has to make a split-second decision. It can swerve into a nearby pole and kill the passenger, or keep going and kill the pedestrian ahead.

This is spiky terrain. What is the value of a life? This is not a novel dilemma. In Tails Explained, I show how courts use probabilities of accidental (ie rare) deaths to estimate tort damages. What is novel is the scale of these considerations once robots take the wheel. The giant fields of AI safety and ethics are proof that scaling up tort law is not going to cut it. We are forced to explicitly study realms that ancient moralities only needed to consider rhetorically. 

In Spot The Outlier,  Rohit writes:

the systems we’d developed to intuit our way through our lives have difficulty with contrived examples of various trolley problems, but that’s mainly because our intuitions work in the 80% of cases where the world is similar to what we’ve seen before, and if the thought experiment is wildly different (e.g., Nozick’s pleasure machine) our intuitions are no longer a reliable guide.

In The Tails Coming Apart As A Metaphor For Life, Slatestarcodex says:

This is why I feel like figuring out a morality that can survive transhuman scenarios is harder than just finding the Real Moral System That We Actually Use. There’s a potentially impossible conceptual problem here, of figuring out what to do with the fact that any moral rule followed to infinity will diverge from large parts of what we mean by morality.

A wave of exponential automation threatens to capsize our moral rafts. Slatestar invokes one of my favorite paragraphs2 of all-time to make his point. 

When Lovecraft wrote that “we live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far”, I interpret him as talking about the region from Balboa Park to West Oakland on the map above [This is a metaphor for moral territory he builds in the full post].

Go outside of it and your concepts break down and you don’t know what to do.

The full opening paragraph of The Call of Cthulhu deserves your eyes:

The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents. We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far. The sciences, each straining in its own direction, have hitherto harmed us little; but some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality, and of our frightful position therein, that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.

Slatestar edits Lovecraft:

The most merciful thing in the world is how so far we have managed to stay in the area where the human mind can correlate its contents.

This is not an optimistic outlook for our ability to reconcile our local morality with a species-level perspective. Reasoning about extremes is more futile than we’d like to think. As we search for outliers, we need humility.

Even The Math Prescribes Humility

Let’s translate tail divergence into math terms. We discussed how the SAT has predictive power for GPA. The issue is that this power loses efficacy at the top tier of GPAs, just as height starts to tell us less about the best basketball players once we are dealing with the sample that has made it to the NBA.

This loss of signal manifests as a correlation breakdown over some range of X, the explanatory variable. It results from the error variance in a regression increasing or decreasing over some range. The fancy word for this is “heteroscedasticity”.

See this made-up example from 365DataScience:

The variance of the errors visibly changes as we move from small values of X to large values. 

It starts close to the regression line and goes further away. This would imply that, for smaller values of the independent and dependent variables, we would have a better prediction than for bigger values. And as you might have guessed, we really don’t like this uncertainty.

Ordinary least squares (OLS) regression is a common technique for computing a correlation. However, equal variance (homoscedasticity) is one of the five assumptions embedded in OLS. Tail divergence is evidence that the data set violates this assumption, so we shouldn’t be surprised when the filters we used in the meat of the distribution lose efficacy in the extremes.

If you broke the regression into two separate lines, one for the low-to-middle range of SAT scores and one for the top decile, you could compute different correlations to GPA. If the tails diverge, the higher range would show a lower correlation. Even correlations as high as 80% have discouraging amounts of explanatory power: r = 0.80 implies r² = 0.64, leaving more than a third of the variance unexplained.

For the derivation, see From CAPM To Hedging.

We shouldn’t be surprised when the most successful person from your 8th grade class wasn’t a candidate for the “most likely to succeed” ribbon. The qualities that informed that vote leave a lot of “risk remaining” when trying to predict the top performers in the wide-open game of life.

Since the nature of extremes is untamed, we need humility. This is true, but abstract. What does “humility” mean practically? It means making decisions that are robust to the lack of determinism in the tails. In fact, we can construct approaches that actively seek to harness the variance in the tails.

The world of trading and investing is a perfect sandbox to explore such approaches.

Take Advantage of Poor Tail Intuition In Investing

I know the heading is ironic. 

Let’s see if we can use “option-like” approaches to use the divergence or uncertainty in the tails to our advantage. 

Respect Path

Rohit summarized the argument succinctly:

If measurement is too strict, we lose out on variance. 
If we lose out on variance, we miss out on what actually impacts outcomes.

Tails are unpredictable by the same models that might be well-suited for routine scenarios. In fact, rare outcomes can be stubbornly resistant to description by any model in a complex system. The robust response is not to lean into our models but to relax the filters in favor of diversity, which increases our chance of capturing an outcome nobody has foreseen, because, by definition, nobody’s model could have predicted it (and therefore bid it up) in the first place.

How do you do that?

2 words: Respect. Path. 

Recall David Epstein’s research-based suggestion from Part 1:

One practice we’ve often come back to: not forcing selection earlier than necessary. People develop at different speeds, so keep the participation funnel wide, with as many access points as possible, for as long as possible. I think that’s a pretty good principle in general, not just for sports.

What does this mean in a trading context?

This is easy to explain by its opposite. Let’s rewind a decade. Jon Corzine managed to blow up MF Global by focusing on the belief that European bonds (remember the Greek bond crisis?) would pay out in the end and placing that bet with extreme leverage. While the bonds eventually paid out, the margin calls buried MF Global. This is a common story. I chose it because it exemplifies how a lack of humility is the murder weapon. 

The moment you employ leverage, you are worshiping at the altar of path. Corzine refused to make the appropriate sacrifices to the gods. He focused on the terminal value of the bonds. A focus so myopic, Corzine still stubbornly clings to the idea that he was right. [I once went to dinner with an option trader who worked closely with Corzine. He described him as both smart and unfazed in his path-blindness. I’d like to take issue with “smart” but he’s the one giving a fortune away, so I’ll just shut up.] 

He might be rich, but if you were a stakeholder or client in MF Global, he’s a villain. Let’s not be like Jon Corzine. 

Ways To Respect Path

Treat leverage with respect

The most common form of financial leverage we employ is the mortgage. The primary path risk here is needing to relocate suddenly and potentially having to sell at a bad time. If there are many potential forks on your horizon, the liquidity of renting can be worth it3.

 

“Rebalance timing luck”

This is a term coined by Corey Hoffstein in his paper The Dumb (Timing) Luck of Smart Beta. First of all, this topic is central to any analysis of performance. You can have 10 different trend-following strategies with approximately the same rules, but if they vary in their execution by a single day, the impact of luck can be tyrannical. Imagine one strategy was long oil the day it went negative while another strategy got out of the position one day earlier. Is the difference in performance predictive? It’s a bedeviling issue for allocators trying to parse historical returns.

If timing is not part of your alpha, then leaving it to chance can swamp the edge you worked so hard to find, capture, and market to investors. This is a recipe for disappointment for either the manager (who gets unlucky) or the investor who chose the fund from a crop of competitors based on noise. 

Respecting path means smoothing the effect of rebalance timing luck. This is commonly done by dividing a single strategy into multiple strategies differing only by their rebalance schedule. The ensemble averages the luck across executions, hopefully keeping the results closer to the strategy’s intended expression.
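A toy simulation of the ensemble fix (all numbers hypothetical): the same 60/40 rebalancing rule run at 20 different calendar offsets produces a spread of outcomes from identical rules; averaging across offsets smooths the luck.

```python
import random

random.seed(1)

# Hypothetical daily returns for one asset; the strategy holds 60% asset,
# 40% cash and rebalances back to 60% every 20 trading days.
days = 252 * 5
asset = [random.gauss(0.0004, 0.02) for _ in range(days)]

def run(offset, period=20, target=0.6):
    wealth, weight = 1.0, target
    for t, r in enumerate(asset):
        wealth *= 1 + weight * r
        # the asset weight drifts with the asset between rebalances
        weight = weight * (1 + r) / (1 + weight * r)
        if (t + offset) % period == 0:
            weight = target   # rebalance day for this schedule
    return wealth

# Identical rules, 20 different rebalance schedules.
results = [run(off) for off in range(20)]
ensemble = sum(results) / len(results)   # average the luck away

print(f"best offset:  {max(results):.3f}")
print(f"worst offset: {min(results):.3f}")
print(f"ensemble:     {ensemble:.3f}")
```

The gap between the best and worst offsets is pure timing luck; none of those 20 managers had more skill than the others.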

Path vs terminal value thinking

Corzine had a terminal value opinion (“if I hold these bonds to maturity I’ll get paid”). Still, any trade that is marked-to-market must still weather path. Leverage makes the trade acutely fragile with respect to path. Even if his bet was a good one at the time, the expression was negligent because it did not properly reflect his constraints. 

It’s critical that the expression of a bet clings closely to its thesis. If you want to bet on the final outcome of a trade, you need to insulate the expression from path. Similarly, you can bet on path while being indifferent to the final outcome. For example, a momentum investor may devise a rule-based strategy to levitate with an inflating bubble but exit before holding the bag. These participants bet on path not terminal value. The past few years have glorified such a game of hot potato. 

Whether this game of hot potato is really a game of Russian roulette depends on the expression. Many momentum strategies use stops or trailing stops to escape a trade where the trend has petered out or reversed. This expression mimics a long option position: unbounded upside, limited downside. But it banks on a dangerous assumption: liquidity. These traders are constructing a “soft” option, presumably because they think it’s cheaper than purchasing a financial, or what I call a “hard” or contractual, option.

Let’s ignore realized volatility, which is a first-order determinant of whether the option is cheaper. The biggest problem is gap risk. Soft-option constructions assume continuity. But we know technology breaks, markets close, stocks get halted, countries invade each other, exchanges cancel trades. Pricing gap risk is impossible. That’s why derivative traders say the only hedge for an option is a similar option. Trading strategies are said to be robust to model risk if they contain offsetting exposures to the same model. If you’re short a call option on TSLA, the only real hedge is to be long a different TSLA call. Reliance on the mathematical model cancels out.
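To see why gaps break the soft option, compare a stop-loss exit with an actual put under made-up numbers: long stock at 100, stop and strike both at 95, put premium of 2.

```python
entry, stop, strike, premium = 100.0, 95.0, 95.0, 2.0

def soft_option(open_px):
    # Stop-loss exit ("soft option"): we *intend* to lose at most
    # entry - stop = 5, but a gap fills us at the opening print.
    fill = min(open_px, stop)
    return fill - entry

def hard_option(open_px):
    # Long stock hedged with a 95-strike put ("hard option"): the loss
    # is contractually capped at (entry - strike) + premium = 7,
    # no matter how far the market gaps.
    return (open_px - entry) + max(strike - open_px, 0.0) - premium

for gap in (94.0, 85.0, 60.0):
    print(f"open at {gap:>4}: stop-loss {soft_option(gap):+6.1f}, "
          f"put-hedged {hard_option(gap):+6.1f}")
```

The stop-loss P/L keeps falling with the size of the gap while the put holder’s loss is pinned at −7. The difference between the two is exactly the gap risk the soft option never paid for.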

Zooming in on options (feel free to skip and jump down to Investing for Path)

Some market participants focus on terminal value or the “long run” while others focus on path. Option prices are consensus mechanisms that balance both views. I discussed this in What The Widowmaker Can Teach Us About Trade Prospecting And Fool’s Gold:

The nat gas market is very smart. The options are priced in such a way that the path is highly respected. The OTM calls are jacked, because if we see H gas trade $10, the straddle will go nuclear.

Why? Because it has to balance 2 opposing forces.

  1. It’s not clear how high the price can go in a true squeeze or shortage.
  2. The MOST likely scenario is the price collapses back to $3 or $4.

Let me repeat how gnarly this is.

The price has an unbounded upside, but it will most likely end up in the $3-$4 range.

Try to think of a strategy to trade that.

Good luck.

  • Wanna trade verticals? You will find they all point right back to the $3 to $4 range.
  • Upside butterflies, which are the spread of call spreads (that’s not a typo…that’s what a fly is…a spread of spreads. Prove it to yourself with a pencil and paper), are zeros.

The market places very little probability density at high prices but this is very jarring to people who see the jacked call premiums.

That’s not an opportunity. It’s a sucker bet.

Investors with different time horizons often trade with each other. It’s even possible they have the same long-term views but Investor A thinks X is overbought in the near-term and sells to Investor B who just wants to buy-and-hold. Investor A is hoping to buy X back cheaper. They are trying to time the market and generate trading P/L, expecting to find a more attractive entry to X later. Perhaps A is a trader more than an investor. A is obsessively conscious of near-term opportunity costs or hurdle rates. As an options trader, I am generally more focused on path than terminal value. 

Let’s see how trade expression varies with your lens of terminal value vs path. 

Static Expressions

A static trade expression means you put your trade on and leave it alone until some pre-defined catalyst. For options this is typically expiration. The reason you might do this is you are aware that you cannot predict the path but do not want to be shaken out of the position because you like the odds the market is offering on the terminal value of a proposition. To use natural gas, suppose the gas futures surge to $6 amidst a polar vortex but you think there is a 25% chance the price falls to $4.50 by expiration.

Suppose you can buy a vertical spread that pays 4-1 on that proposition. The bet is positive expectancy so you decide to take it. This is a discrete bet. The worst-case scenario is losing your premium. You can size the trade by feel (I’m willing to risk 1% to make 4%) or some version of Kelly sizing. Instead of trading towards a target amount of risk (whether that’s delta, vega, etc) you budget a fixed dollar amount towards it and let it ride. I refer to this type of bet as “risk-budgeting”.

When “risk-budgeting” a trade you specify a fixed bet size and you do not use leverage or pseudo-leverage (for example taking a short option position which demands margin). The point is to set-it-and-forget-it. 
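The arithmetic of the 4-to-1 example above, sketched out (the 25% win probability is the made-up number from the scenario, not a market estimate):

```python
# Risk-budgeting the vertical: a 25% chance of collecting 4x the premium.
p, b = 0.25, 4.0          # win probability, payout odds
q = 1 - p

edge = p * b - q          # expectancy per $1 of premium risked
kelly = (p * b - q) / b   # classic Kelly fraction for a binary bet

print(f"expectancy per $1 risked: {edge:+.2f}")        # +0.25
print(f"full Kelly fraction of bankroll: {kelly:.2%}")
print(f"half Kelly, for humility about p: {kelly / 2:.2%}")
```

Note how modest the full Kelly stake is even with a 25% edge per dollar; sizing “by feel” often lands well above it, which is another way of disrespecting path.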

These types of trades were a small minority of my allocations, but they are the easiest to manage. By design, you are not getting cute with the expression, because you expect the path to your possible outcome to be hairy. This is a self-aware strategy for respecting path.

Dynamic Expressions

Most of my trades were actively managed. Running a large options portfolio means lots of churn as you whack-a-mole opportunities. You find more attractive positions to warehouse than what’s currently on the books, or perhaps you are adding to get to a full-size position.

The key is most of the focus is on path not terminal value. Sometimes I’m buying vol because I have a view on volatility, but often I’m buying vol if I think there are going to be more vol buyers. The first kind of buying is a hybrid of path and terminal value thinking, but the second type of vol buying has a momentum mindset. My view on realized vol takes a backseat to my view on flows if I think the option demand will exceed supply at current levels of implied volatility. 

Other dynamic trade expressions:

    1. Implied sentiment

      Another path-aware expression is to bet on the expectations embedded in prices. I might load up on oil calls not because I think oil is going to $200, but because I think the awareness that such a price is possible can emerge due to some catalyst (“saber-rattling”). I’m thinking in terms of path not terminal value when my thesis is “sentiment can go from apathy to fear”. I’m betting on a change in the Overton Window. The change in sentiment can increase call option implied vols and even the futures. But the option trade expression is a purer play than the futures.

      [The number of ways an oil future can rise is greater than the number of drivers to push oil call skew higher, so the call options isolate the thesis better by being directly levered to it. Agustin Lebron’s 3rd Law Of Trading: Only take the risks you are paid to take.]

    2. Owning the wing

      Tail options are on average “expensive” in actuarial terms. But there are several reasons why I do not short them. 

      1. “Average” is hiding a lot of detail. The excess premium in those options can be proportionally small to what those options can be worth conditional on stressed states of the world. Buying them when they are relatively cheap to their own elevated premiums can be worthwhile, especially if those options put you in the driver’s seat when the world starts melting down. If you are the only one with bullets in a warzone, there’s a good chance you have them because the terminal-value-Jon-Corzine crowd underestimated path. Then you can sell the options “closing” at truly outlandish prices. I want the tails because I don’t want to be running a trading business with a prime broker’s trapdoor beneath me. 

      2. I’m not smart enough to know when to sell tail options opening. I buy them when they are relatively cheap (which usually still means expensive to Corzine brains) and I sell them closing when they go nuclear. Like when you throw some insane offer out there and it gets taken. As a rule you don’t want to sell wings to someone who spent more than a few moments thinking about it or used a spreadsheet or model or calculator or star chart. You sell them to people who are forced to buy them. When Goldman blows their customer out they don’t haggle. 

        In practice, ratio put spreads look attractive to terminal value people who like to “buy the one and sell the two” because their breakeven is so “far” out-of-the-money and they get to win on medium drawdowns. I often like to sell the 1 and buy the 2 because conditional on the 1×2 “getting there”, the 2 are going to be untouchable. 

        [The buyer of the one in a 1×2 is happiest in the grinding trend scenario where strike vols underperform the skew.]

      3. In The “No Easy Trade” Principle I explain how implied market parameters do not vary as widely as realized parameters because markets are discounting machines4.

        Markets bet on mean reversion. Vols often underreact when they are rising (or falling) as the regime changes. These turns can be great path trades. They are momentum opportunities to lift or hit slower participants who are anchored to the prior regime. These opportunities are very profitable since you are not only putting on the bet the right way, but you are able to get liquidity from stale actors. (The trouble with many opportunities is getting liquidity — if you know something is going up but everyone else does too, your signal is valid but insufficiently differentiated. Turning every measly 5 lot offer into a new bid makes the market more efficient without extracting a reward for it. In fact, if you do that, you don’t understand expectancy or the principle of maximization. Your job isn’t to correct incorrect markets. It’s to make money. The overlap is imperfect.) The challenge is you somehow need to not be anchored yourself5.

      4. Humility is recognizing that the craziest event has yet to happen. Market shocks are a feature. They look different every time because we prepare for the last war. The instruments that measure our vitals become the targets themselves. Tail options provide volatility convexity, or exposure to “vol of vol”. You don’t need to know the nature of the next shock to know that you will have wanted vol convexity. See Finding Vol Convexity. 

Combining Expressions

I’ll mention this for completeness, but it’s a topic I should probably do a video for. It’s not complicated, but it’s a bit technical for a post like this. When running an options book, it’s possible to treat some of the positions dynamically and some of them statically. In practice, I “remove” line items that have well-defined risks from my position at the most recent mark-to-market value so that I do not incorporate their Greeks into my book. I don’t hedge them with the rest of the pile.

For example, if I notice an out-of-the-money put spread on my books, instead of dynamically managing a position that was short a tail, I’d put the spread in another account and sell the corresponding delta hedge associated with it. Going forward it would not generate any Greeks in my main risk view so there’s no need to hedge (remember hedging is a cost). The risk is sequestered to the premium. Let’s say it’s $75,000 worth of put spreads. The expectancy of the spread is presumably zero, so it’s like having a simple over/under bet on the books. If expiration goes my way I get to make a multiple of that, but I know the worst (and most likely) case is losing $75k which given the size of the book is noise. If my capital swamps the risk, there’s no point in hedging it especially since it’s short a tail that’s sensitive to vol of vol.

 

Investing for path

VCs

Venture capital is a strategy that is robust to path. The fact that the portfolio marks are fairy dust helps, but that is not important in this context. Why is venture a strategy that exploits divergence in the tails?

Because, by construction, it admits it doesn’t know much. If you believe you are sampling from start-ups that have a power-law distribution (admittedly a big “if”), then the correct strategy is indeed to “spray and pray”6.

Byrne Hobart piggybacks Jerry Neumann in his explanation:

One of my favorite blog posts on venture returns is Jerry Neumann’s power laws in venture. His key point is that if venture returns follow a power-law distribution, average returns rise indefinitely as you get a bigger sample set. There is no well-defined mean! If you measure adult height, you quickly converge on 5’9” for American men and 5’4” for American women. You will find outliers, but they’re equally common at both ends of the distribution. But if you measure startup investing returns, you’ll keep getting tripped up: flop, failure, failure, flop, Google, fad, fraud, freaky scandal, Facebook…

Does this imply that the ideal strategy for venture is to invest in as many companies as possible? If you’re sampling from a power-law distribution, that’s what you should do. 
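Neumann’s point is easy to simulate. Under a hypothetical power law with tail exponent below 1 the theoretical mean is infinite, so sample averages keep climbing with portfolio size instead of settling down the way height does (the alpha of 0.9 here is an illustrative choice, not an estimate of real venture returns):

```python
import random

random.seed(3)

# Thin-tailed: adult height converges quickly on its mean.
def avg_height(n):
    return sum(random.gauss(69, 3) for _ in range(n)) / n   # inches

# Fat-tailed: Pareto with alpha < 1 has no finite mean, so sample
# averages are dominated by the single largest draw so far.
def avg_startup_multiple(n, alpha=0.9):
    return sum(random.paretovariate(alpha) for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9}: height {avg_height(n):6.2f}, "
          f"startup multiple {avg_startup_multiple(n):10.1f}")
```

The height column barely moves; the startup column is hostage to whichever Google-sized outlier the sample happens to contain, which is exactly why a bigger net keeps paying.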

Lux Capital partner Josh Wolfe’s approach epitomizes the spirit of searching for gold in the tails. On Invest Like The Best, he explained his investing beliefs:

  • Confident that curiosity, following leads, and relentlessness will lead you to the next idea.
  • Confident you won’t know when or how you happen upon the idea.
  • Confident that the idea lies in the edges of companies that are doing innovative things, often from first principles or science, and very few people are looking there.

These principles propagate from a commitment to benefitting from optionality and positive convexity of non-linear relationships. 

The key line follows:

When analyzing how they found deals it only made linear, narrative sense after the fact.

This is reinforced in On Contrarianism, where I quote Wolfe as well as Marc Andreessen and trader Agustin Lebron on why the best investments start out controversial. The gist is that an idea must be so radical and far-fetched that it doesn’t get bid up while also being possible. The intersection of ideas that are great after-the-fact but sound dumb before-the-fact is nearly invisible. Most ideas people think are dumb are, indeed, dumb. Venture understands this and systematically wraps a sound process around a low hit rate.

“Gorilla” Investing

Gorilla investing is another strategy designed to look like a long option. The gist of it is to invest an equal amount in a list of candidates that are competing for a giant market. As the winners start pulling away, you shed the losers and reallocate the proceeds back into the winners. 

Since it rebalances away from losers into winners, it explicitly bets against mean reversion. It’s a divergent strategy that growth investors employ in winner-take-all sectors7.

The strategy requires extensive judgment, but I highlight it as another example of an investing algorithm with roots in epistemic humility. If you want to learn more about this strategy see the notes for Gorilla Game or pick up the book. 
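A toy sketch of the mechanic (all numbers invented): equal stakes in eight candidates, and each period the laggard is shed with its proceeds reallocated to the survivors.

```python
import random

random.seed(5)

# Toy "gorilla game": equal stakes in 8 hypothetical candidates for a
# winner-take-most market; each company's quality is unknown up front.
N, PERIODS = 8, 6
drift = [random.gauss(0.0, 0.15) for _ in range(N)]   # hidden quality

stakes = {i: 1.0 / N for i in range(N)}
for _ in range(PERIODS):
    # each stake compounds with its company's noisy growth
    for i in stakes:
        stakes[i] *= 1 + drift[i] + random.gauss(0, 0.10)
    # shed the laggard and split its proceeds across the survivors
    laggard = min(stakes, key=stakes.get)
    proceeds = stakes.pop(laggard)
    for i in stakes:
        stakes[i] += proceeds / len(stakes)

print(f"survivors: {sorted(stakes)}")
print(f"final wealth multiple: {sum(stakes.values()):.2f}")
```

Note what the rule never does: it never forecasts the winner. It only lets realized performance, however noisy, drag capital toward whoever is pulling away.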

Conclusion

Like venture or Rohit’s advice on recruiting, gorilla investing casts a wide net from a sufficiently narrowed field and lets attrition decide where to allocate more. In Where Does Convexity Come From? I explain that the essence of convexity is a non-linear p/l resulting from a change in your position size in the same direction as the return of your position. Your exposure to a winning trade grows the more it wins.

Byrne writes:

Since venture success is defined by dealflow, i.e. by whether or not you have a chance to invest in the hottest companies, the main function of the Series A investment is to get a chance to invest in Series B and Series C and so on. Arguably, the better the fund, the more of its real value today consists of pro-rata rights rather than the investments themselves.

That’s a general case of positive convexity: the better the situation, the higher your exposure.

This is the essence of capturing the upside when our signals struggle to parse winners from an exclusive field. If we cannot predict what will happen in the tails, the next best thing is the ability to increase our exposure to momentum when it’s going our way. This begins with humility and funneling wider than our instincts suggest. From that point, we let actual performance provide us with incremental information on what works and what doesn’t.

Contrast this with a model that takes itself more seriously than tail correlations warrant. The model is filtering prematurely. We don’t look for tomorrow’s star athletes amongst the best 8-year-olds because we know puberty is a reshuffling machine.  

Keep in mind:

  • Correlations break down or invert in the extreme
  • Make your selections robust to path dependence, or better, position yourself to take advantage of it. 
  • Systematize finding gold in diversity. There’s a decent chance others won’t be looking there. 

Happy prospecting!


 

From CAPM To Hedging

Let’s start with a question from Twitter:

This is a provocative question. Patrick was clever to disallow Berkshire. In this post, we are going to use this question to launch into the basics of regression, correlation, beta hedging and risk.

Let’s begin.

My Reaction To The Question

I don’t know anything about picking stocks. I do know about the nature of stocks which makes this question scary. Why?

  1. Stocks don’t last forever

    Many stocks go to zero. The distribution of many stocks is positively skewed, which means there’s a small chance of them going to the moon and a reasonable chance that they go belly-up. The price of a stock reflects its mathematical expectation. Since the downside is bounded by zero and the upside is infinite, for the expectation to balance, the probability of the stock going down can be much higher than our flawed memories would guess. Stock indices automatically rebalance, shedding companies that lose relevance and value. So the idea that “stocks go up over time” is really “stock indices go up over time”, even though individual stocks have a nasty habit of going to zero. For more see Is There Actually An Equity Premium Puzzle?.

  2. Diversification is the only free lunch

    The first point hinted at my concern with the question. I want to be diversified. Markets do not pay you for non-systematic risk. In other words, you do not get paid for risks that you can hedge. All but the most fundamental risks can be hedged with diversification. See Why You Don’t Get Paid For Diversifiable Risks. To understand how diversifiable risks get arbed out of the market ask yourself who the most efficient holder of a particular idiosyncratic risk is? If it’s not you, then you are being outbid by someone else, or you’re holding the risk at a price that doesn’t make sense given your portfolio choices. Read You Don’t See The Whole Picture to see why.
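To make the first point concrete, here’s a toy example in Python. The probabilities and prices are made up purely for illustration: a positively skewed stock priced exactly at its expectation still falls most of the time.

```python
# Toy example (made-up numbers): bounded downside, unbounded upside.
# A $100 stock priced at its expectation can still fall most of the time.
p_moon = 0.10        # small chance the stock goes to the moon
moon_price = 550.0   # the moonshot outcome
bust_price = 50.0    # the downside, ultimately bounded by zero

# The price balances the expectation: 0.1 * 550 + 0.9 * 50 = 100
expected_price = p_moon * moon_price + (1 - p_moon) * bust_price

print(expected_price)   # "fairly" priced at 100
print(1 - p_moon)       # ...yet it drops 90% of the time
```

The skew hides in plain sight: the average is flat while the median outcome is a loss.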

My concerns reveal why Berkshire would be an obvious choice. Patrick ruled it out to make the question much harder. Berkshire is a giant conglomerate. Many would have chosen it because it’s run by masterful investors Warren Buffett and Charlie Munger. But I would have chosen it because it’s diversified. It is one of the closest companies I could find to an equity index. Many people look at the question and think about where their return is going to be highest. I have no edge in that game. Instead, I want to minimize my risk by diversifying and accepting the market’s compensation for accepting broad equity exposure.

In a sense, this question reminds me of an interview question I’ve heard.

You are gifted $1,000,000. You must put it all in play on a roulette wheel. What do you do?

The roulette wheel has negative edge no matter what you do. Your betting strategy can only alter the distribution. You can be crazy and bet it all on one number. Your expectancy is negative but the payoff is positively skewed…you probably lose your money but have a tiny chance at becoming super-rich. You can try to play it safe by risking your money on most of the numbers, but that is still negative expectancy. The skew flips to negative. You probably win, but there’s a small chance of losing most of your gifted cash.

I would choose what’s known as a minimax strategy, which seeks to minimize the maximum loss. I would spread my money evenly on all the numbers, accepting a sure loss of 5.26%.1 The minimax response to Patrick’s question is to find the stock that is the most internally diversified.
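Here’s a minimal sketch of the roulette thought experiment (American wheel: 38 slots, straight-up bets pay 35:1). Every sizing has the same −5.26% edge; covering more numbers only reshapes the distribution until, at all 38, the loss is a certainty — the minimax choice.

```python
# Roulette sketch: same negative edge regardless of how many numbers you cover.
def straight_up_ev(numbers_covered, bankroll=1_000_000):
    """Expected P&L from spreading `bankroll` evenly over `numbers_covered` slots."""
    stake = bankroll / numbers_covered
    # winning slot pays 35:1; the other covered slots lose their stakes
    win_pnl = 35 * stake - (numbers_covered - 1) * stake
    lose_pnl = -bankroll
    p_win = numbers_covered / 38
    return p_win * win_pnl + (1 - p_win) * lose_pnl

for n in (1, 19, 38):
    print(n, round(straight_up_ev(n) / 1_000_000, 4))   # each ≈ -0.0526 (= -2/38)
```

With one number you hold a lottery ticket; with all 38, the −2/38 edge is realized with certainty.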

Berkshire Vs The Market

I don’t have an answer to Patrick’s question. Feel free to explore the speculative responses in the thread. Instead, I want to dive further into my gut reaction that Berkshire would be a reasonable proxy for the market. If we look at the mean of its annual returns from 1965 to 2001, the numbers are gaudy. Its CAGR was 26.6% vs the SP500 at 11%. Different era. Finding opportunities at the scale Buffett needs to move the needle has been much harder in the past 2 decades. 

Buffett has been human for the past 20 years. That’s a safer assumption going forward than the hero stats he was putting up in the last half of the 20th century. 

The mean arithmetic returns and standard deviations validate my hunch that Berkshire’s size and diversification 2 make it behave like the whole market in a single stock. 

Let’s add a scatterplot with a regression. 

If you tried to anticipate Berkshire’s return, your best guess might be its past 20-year return, distributed similarly to its prior volatility. Another approach would be to look at its relationship to the SP500 and notice that a portion of its return can simply be explained by the market. It clearly has a positive correlation to the SP500. But just how much of the relationship is explained by the SP500? This is a large question with practical applications. Specifically, it underpins how market neutral traders think about hedges. If I hedge an exposure to Y with X, how much risk do I have remaining? To answer this question we will go on a little learning journey:

  1. Deriving sensitivities from regressions in general
  2. Interpreting the regression
  3. CAPM: Applying regression to compute the “risk remaining of a hedge”

On this journey you can expect to learn the difference between beta and correlation, build intuition for how regressions work, and see how market exposures are hedged. 

Unpacking The Berkshire Vs SP500 Regression

A regression is simply a model of how an independent variable influences a dependent variable. Use a regression when you believe there is a causal relationship between 2 variables. Beware spurious correlations: they can be tight enough to look causal, and the regression math may even suggest that’s the case. I’m sorry. Math is just a tool. It requires judgement. The sheer number of measurable quantities in the world guarantees an endless list of correlations that serve as humor, not insight3.

The SP500 is steered by the corporate earnings of the largest public companies (and in the long-run the Main Street economy4) discounted by some risk-aware consensus. Berkshire is big and broad enough to inherit the same drivers. We accept that Berkshire’s returns are partly driven by the market and partly due to its own idiosyncrasies.

Satisfied that some of Berkshire’s returns are attributable to the broader market, we can use regression to understand the relationship. In the figure above, I had Excel simply draw a line that best fit the scatterplot, with SP500 being the independent variable, or X, and Berkshire returns being the dependent variable, or Y. The best-fit line (there are many kinds of regression but we are using a simple linear regression) is defined the same way any line is: by a slope and an intercept. 

The regression equation should remind you of the generic form of a line y = mx + b where m is the slope and b is the intercept. 

In a regression:

y=α+βx

where:

y = dependent variable (Berkshire returns)

x = independent variable (SP500 returns)

α = the intercept (a constant)

β = the slope or sensitivity of the Y variable based on the X variable

If you right-click on a scatterplot in Excel you can choose “Add Trendline”. It will open the below menu where you can set the fitted line to be linear and also check a box to “Display Equation on chart”.

This is how I found the slope and intercept for the Berkshire chart:

y = .6814x + .0307

Suppose the market returns 2%:

Predicted Berkshire return = .6814 * 2% + 3.07%

Predicted Berkshire return = 4.43%

So based on actual data, we built a simple model of Berkshire’s returns as a function of the market. 

It’s worth slowing down to understand how this line is being created. Conceptually, it is the line that minimizes the squared errors between itself and the actual data. Since each point has 2 coordinates, we are dealing with the variance of a joint distribution. We use covariance instead of variance but the concepts are analogous. With variance, we square the deviations from a mean. For covariance, we multiply the distance of each X and Y in a coordinate pair from their respective means: (xᵢ – x̄)(yᵢ – ȳ)

Armed with that idea, we can compute the regression line by hand with the following formulas:

β or slope = covar(x,y)/ var(x)

α or intercept = ȳ – βx̄
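Those two formulas are easy to sketch in Python. The returns below are made-up numbers, not the actual Berkshire/SP500 data:

```python
# Hand-rolled regression slope and intercept via beta = cov(x, y) / var(x).
xs = [0.12, -0.05, 0.21, 0.07, -0.18, 0.15]   # "market" returns (made up)
ys = [0.10, -0.02, 0.16, 0.09, -0.11, 0.12]   # "stock" returns (made up)

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# covariance and variance (population form; the 1/n cancels in beta anyway)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n

beta = cov_xy / var_x          # slope
alpha = y_bar - beta * x_bar   # intercept

print(round(beta, 4), round(alpha, 4))
```

Note that the 1/n in the covariance and variance cancels in β, so population vs sample conventions don’t change the slope. The intercept formula also guarantees the fitted line passes through (x̄, ȳ).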

We will look at the full table of this computation later to verify Excel’s regression line. Before we do that, let’s make sure the model is even helpful. One standard we could use to determine if the model is useful is whether it performs better than the cheapest naive model, which says:

Our predicted Berkshire return is simply the mean return from the sample.

The green arrows in this picture represent the error between this simple model and the actual returns. 

This naive model sums the squared differences from the mean of Berkshire’s returns, which is exactly how variance is computed. If you take the square root of the average of the squared differences you get a standard deviation. In this simple model, where our prediction is simply the mean, our volatility is 16.5%, the volatility of Berkshire’s returns over the 20 years. 

In the regression context, the total variance of the dependent variable from its mean is known as the Total Sum of Squares or TSS.

The point of using regression, though, is that we can make a better prediction of Berkshire’s returns if we know the SP500’s returns. So we can compare the fitted line, instead of the actual returns, to the mean. The sum of those squared differences is known as the Regression Sum Of Squares or RSS: the sum of squared deviations between the fitted predictions and the mean. If the RSS accounts for most of the TSS, then much of the variance of Y is explained by the variance of X.

The last quantity we can look at is the Error Sum of Squares or ESS. These are the deviations from the actual data to the predicted values represented by our fitted line. This represents the unexplained portion of Y’s variance. 

 

Let’s use 2008’s giant negative return to show how TSS, RSS, and ESS relate.

 

The visual shows:

TSS = RSS + ESS

We can compute the sum of these squared deviations simply from their definitions:

TSS (aka variance) Σ(actual-mean)²
ESS (sum of errors squared) Σ(actual-predicted)²
RSS (aka TSS – ESS) Σ(predicted-mean)²
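We can verify the TSS = RSS + ESS identity (and preview R²) with a few lines of Python on synthetic data — the identity holds exactly when the line is the least-squares fit:

```python
# Verifying TSS = RSS + ESS and R² = RSS/TSS on an OLS fit.
# Synthetic data, not the Berkshire table.
xs = [0.04, -0.10, 0.22, 0.01, -0.03, 0.17, -0.08, 0.09]
ys = [0.06, -0.12, 0.15, 0.05, -0.01, 0.20, -0.09, 0.04]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs))
alpha = y_bar - beta * x_bar
preds = [alpha + beta * x for x in xs]

tss = sum((y - y_bar) ** 2 for y in ys)             # Σ(actual - mean)²
ess = sum((y - p) ** 2 for y, p in zip(ys, preds))  # Σ(actual - predicted)²
rss = sum((p - y_bar) ** 2 for p in preds)          # Σ(predicted - mean)²

r_squared = rss / tss
print(round(r_squared, 3))
```

Try swapping in any other slope and intercept: the decomposition breaks, which is part of why the least-squares line is special.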

The only other quantities we need are variances and covariances to compute β or slope of the regression line. 

In the table below:

ŷ = the predicted value of Berkshire’s return aka “y-hat”

x̄ = mean SP500 return aka “x-bar”

ȳ = mean Berkshire return aka “y-bar”

 

 

  β = .40 / .59 = .6814

  α = ȳ – βx̄ = 10.6% – .6814 * 11.1% = 3.07%

This yields the same regression equation Excel spit out:

y=α+βx

ŷ = 3.07% + .6814x

R-Squared

We walked through this slowly as a learning exercise, but the payoff is appreciating the R². Excel computed it as 52%. But we did everything we need to compute it by hand. Go back to our different sums of squares.

TSS or variance of Y = .52

ESS or sum of squared difference between actual data and the model = .25

Re-arranging TSS = RSS + ESS we can see that RSS = .27

Which brings us to:

R² = RSS/TSS = .27/.52 = 52% 

Same as Excel!

R² is the regression sum of squares divided by the total variance of Y. It is called the coefficient of determination and can be interpreted as:

The variability in Y explained by X

So based on this small sample, 52% of Berkshire’s variance is explained by the market, as proxied by the SP500. 

Correlation

Correlation, r (or if you prefer Greek, ρ), can be computed in at least 2 ways. In a simple linear regression, it’s the square root of R² (carrying the sign of the slope).

r = √R² = √.52 = .72

We can confirm this by computing correlation by hand according to its own formula:

r = cov(x,y) / (σₓ · σᵧ)

Looking at the table above we have all the inputs:

r = .40 / sqrt(.59 x .52)

r = .72

Variance is an unintuitive number. By taking the square root of variance, we arrive at a standard deviation which we can actually use.

Similarly, covariance is an intermediate computation lacking intuition. By normalizing it (ie dividing it) by the standard deviations of X and Y we arrive at correlation, a measure that holds meaning to us. It is bounded by -1 and +1. If the correlation is .72 then we can make the following statement:

If x is 1 standard deviation above its mean, I expect y to be .72 standard deviations above its own mean.

It is a normalized measure of how one variable co-varies versus the other. 
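One way to convince yourself of that statement: z-score both series and rerun the regression. The slope of the standardized regression is the correlation itself. A sketch on made-up returns:

```python
# The regression slope on z-scored data IS the correlation: x one standard
# deviation above its mean predicts y at r standard deviations above its own.
xs = [0.12, -0.05, 0.21, 0.07, -0.18, 0.15, 0.02, -0.09]
ys = [0.10, -0.02, 0.16, 0.09, -0.11, 0.12, -0.03, -0.04]

def mean(v):
    return sum(v) / len(v)

def stdev(v):
    m = mean(v)
    return (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5

def zscore(v):
    m, s = mean(v), stdev(v)
    return [(x - m) / s for x in v]

# correlation from its definition: normalized covariance
cov = mean([(x - mean(xs)) * (y - mean(ys)) for x, y in zip(xs, ys)])
r = cov / (stdev(xs) * stdev(ys))

# slope of the regression run on z-scored data (var of z-scores is 1)
zx, zy = zscore(xs), zscore(ys)
slope_z = mean([a * b for a, b in zip(zx, zy)])

print(round(r, 4), round(slope_z, 4))   # the same number
```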

How Beta And Correlation Relate

Beta, β, is the slope of the regression equation.

Correlation is the square root of R², the coefficient of determination.

Beta actually embeds correlation within it.

Look closely at the formulas:

β = cov(x,y) / var(x) = cov(x,y) / σₓ²

r = cov(x,y) / (σₓ · σᵧ)

Watch what happens when we divide β by r:

β / r = [cov(x,y) / σₓ²] ÷ [cov(x,y) / (σₓ · σᵧ)] = σᵧ / σₓ

Whoa.

β = r × (σᵧ / σₓ)

Beta equals correlation times the ratio of the standard deviations. 
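Here’s a quick numerical check of the identity β = r × (σᵧ/σₓ) on made-up data:

```python
# Check: beta (cov/var) equals correlation times the vol ratio.
xs = [0.03, -0.12, 0.18, 0.06, -0.02, 0.11]
ys = [0.05, -0.20, 0.31, 0.04, -0.07, 0.16]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n
sd_x, sd_y = var_x ** 0.5, var_y ** 0.5

beta = cov / var_x
r = cov / (sd_x * sd_y)

print(round(beta, 4), round(r * (sd_y / sd_x), 4))   # identical
```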

The significance of that insight is about to become clear as we move from our general use of regression to the familiar CAPM regression. From the CAPM formula we can derive the basis of hedge ratios and more!

We have done all the heavy lifting at this point. The reward will be a set of simple, handy formulas that have served me throughout my trading career.

Let’s continue.

From Regression To CAPM 

The famous CAPM pricing equation is a simple linear regression stipulating that the return of an asset is a function of the risk free rate, a beta to the broader market, plus an error term that represents the security’s own idiosyncratic risk. 

Rᵢ = Rf + β(Rₘ – Rf) + Eᵢ

where:

Rᵢ = security total return

Rf = risk-free rate

β = sensitivity of security’s return to the overall market’s excess return (ie the return above the risk-free rate)

Eᵢ = the security’s unique return (aka the error or noise term)

Since the risk-free rate is a constant, let’s scrap it to clean the equation up.

This is the variance equation for this security:

σᵢ² = β²σₘ² + σₑ²

Recall that beta is the vol ratio times correlation:

β = r × (σᵢ / σₘ)

We can use this to factor the “market variance” term:

β²σₘ² = r² × σᵢ²

Plugging this form of “variance due to the market” back into the variance equation:

σᵢ² = r²σᵢ² + σₑ²  →  σₑ² = σᵢ²(1 – r²)

This reduces to the prized equation, the “risk remaining” formula, which is the proportion of a stock’s volatility due to its own idiosyncratic risk:

σₑ / σᵢ = √(1 – R²)

This makes sense. R² is the amount of variance in a dependent variable attributable to the independent variable. If we subtract that proportion from 1, we arrive at the “unexplained” or idiosyncratic variance. By taking the square root of that quantity, we are left with unexplained volatility, or “risk remaining”. 
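The formula is a one-liner, and tabulating it shows how stubbornly risk remaining hangs around even at high correlations:

```python
import math

# Risk remaining as a function of correlation: sqrt(1 - r²), the share of a
# position's volatility that the hedge leaves unexplained.
def risk_remaining(r):
    return math.sqrt(1 - r ** 2)

for r in (0.50, 0.70, 0.86, 0.95, 0.99):
    print(r, round(risk_remaining(r), 2))   # .87, .71, .51, .31, .14
```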

Let’s use what we’ve learned in a concrete example.

From CAPM To Hedge Ratios

Let’s return to Berkshire vs the SP500. Suppose we are long $10mm worth of BRK.B and want to hedge our exposure by going short SP500 futures. 

We want to compute:

  1. How many dollars worth of SP500 to get short
  2. The “risk remaining” on the hedged portfolio

How many dollars of SP500 do we need to short?

Before we answer this, let’s consider a few ways we can hedge with SP500. 

  • Dollar weighting

    We could simply sell $10mm worth of SP500 futures, which corresponds to our $10mm long in BRK.B. Since Berkshire and the SP500 have similar volatility, this is a reasonable approach. But suppose we were long TSLA instead of BRK.B. Assuming TSLA was sufficiently correlated to the market (say .70 like BRK.B), the SP500 hedge would be “too light”. 

    Why? 

    Because TSLA is about 3x more volatile than the SP500. If the SP500 fell 1 standard deviation, we’d expect TSLA to fall .70 of its own standard deviations. Since TSLA’s standard deviations are much larger than the SP500’s, we would be tragically underhedged. Our TSLA long would lose much more money than our short SP500 position makes because we are not short enough dollars of SP500. 

  • Vol weighting

    Dollar weighting is clearly naive if there are large differences in volatility between our long and short. Let’s stick with the TSLA example. If TSLA is 3x as volatile as the SP500 then if we are long $10mm TSLA, we need to short $30mm worth of SP500.

    Uh oh. 

    That’s going to be too much. Remember the correlation. It’s only .70. The pure vol-weighted hedge only makes sense if the correlation is 1. If the SP500 drops one standard deviation, we expect TSLA to drop only .70 standard deviations, not a full standard deviation. In that case, we will have made too much money on our hedge; but if the market had rallied 1 standard deviation, our oversized short would have been “heavy”. We would lose more on the short than we gained on our TSLA long. Again, only partially hedged. 

  • Beta weighting

    Alas, we arrive at the goldilocks solution. We use the beta or slope of the linear regression to weight our hedge. Since beta equals correlation * vol ratio we are incorporating both vol and correlation weighting into our hedge! 

    I made up the vols and correlations to complete the summary tables below. The key is seeing how much the prescribed hedge ratios can vary depending on how you weight the trades. 


    Beta weighting accounts for both relative volatilities and the correlation between names. Beta has a one-to-many relationship to its construction. A beta of .5 can come from:

    • A .50 correlation but equal vols
    • A .90 correlation but vol ratio of .56
    • A .25 correlation but vol ratio of 2
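Here’s a side-by-side of the three sizings in Python, using the hypothetical TSLA figures from the example (the vols and correlation are the made-up numbers above):

```python
# Comparing hedge sizings: $10mm long, TSLA vol 45%, SP500 vol 15%, corr .70.
position = 10_000_000
vol_long, vol_hedge, corr = 0.45, 0.15, 0.70

dollar_weighted = position                        # naive 1:1 notional
vol_weighted = position * (vol_long / vol_hedge)  # right vols, ignores correlation
beta = corr * (vol_long / vol_hedge)              # correlation * vol ratio = 2.1
beta_weighted = position * beta                   # the goldilocks hedge

print(dollar_weighted, round(vol_weighted), round(beta_weighted))  # ~10mm, ~30mm, ~21mm
```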

It’s important to decompose betas because the correlation portion is what determines the “risk remaining” on a hedge. Let’s take a look. 

How much risk remains on our hedges?

We are long $10,000,000 of TSLA

We sell $21,000,000 of SP500 futures as a beta-weighted hedge. 

Risk remaining is the volatility of TSLA that is unexplained by the market.

  • R2 is the amount of variance in the TSLA position explained by the market. 
  • 1-R2 is the amount of variance that remains unexplained
  • The vol remaining is sqrt(1-R2)

Risk (or vol) remaining = sqrt(1 – .7²) = sqrt(.51) ≈ 71%

TSLA annual volatility is 45%, so the risk remaining is 71% * 45% ≈ 32%

32% of $10,000,000 of TSLA = $3,200,000

So even running the hedged position, within 1 standard deviation you still expect about $3,200,000 worth of noise!

Remember correlation is symmetrical. The correlation of A to B is the same as the correlation of B to A (you can confirm this by looking at the formula). 

Beta is not symmetrical because it’s correlation * σdependent / σindependent 

Yet risk remaining only depends on correlation. 
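A tidy consequence of the formulas worth verifying for yourself: the two betas multiply back to R², since the vol ratios cancel and leave r². A sketch on made-up data:

```python
# Correlation is symmetric but beta is not: beta(y on x) * beta(x on y) = r².
xs = [0.02, -0.07, 0.13, 0.05, -0.11, 0.08]
ys = [0.04, -0.03, 0.19, 0.01, -0.16, 0.10]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n

beta_y_on_x = cov / var_x   # hedging y with x uses this direction...
beta_x_on_y = cov / var_y   # ...hedging x with y flips it
r_squared = cov ** 2 / (var_x * var_y)

print(round(beta_y_on_x * beta_x_on_y, 6), round(r_squared, 6))   # equal
```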

So what happens if we flip the problem and try to hedge $10,000,000 worth of SP500 with a short TSLA position?

  1. First, this is conceptually a more dangerous idea. Even though the correlation is .70, we are less likely to believe that TSLA’s variance explains the SP500’s variance. Math without judgement will impale you on a spear of overconfidence. 

  2. I’ll work through the example just to be complete. 

    To compute beta we flip the vol ratio from 3 to 1/3 then multiply by the correlation of .7

    Beta of SP500 to TSLA is .333 * .7 = .233

    If we are long $10,000,000 of SP500, we sell $2,333,000 of TSLA. The risk remaining is still ≈71%, but it is applied to the SP500 volatility of 15%. 

    71% x 15% ≈ 10.7%, so we expect about $1,070,000 of the SP500 position to be unexplained by TSLA. 

  3. I’m re-emphasizing: math without judgement is a recipe for disaster. The formulas are tools, not substitutes for reasoning. 


Changes in Correlation Have Non-Linear Effects On Your Risk

Hedging is tricky. You can see that risk remaining explodes rapidly as correlation falls.

If correlation is as high as .86, you already have roughly 50% risk remaining!

In practice, a market maker may:

  1. group exposures to the most related index (they might have NDX, SPX, and IWM buckets for example)
  2. offset deltas between exposures as they accumulate
  3. and hedge the remaining deltas with futures. 


You might create risk tolerances that stop you from, say, being long $50mm worth of SPX and short $50mm of NDX, leaving you exposed to the underlying factors which differentiate these indices. Even though they might be tightly correlated intraday, the correlation changes over time and your risk remaining can begin to swamp your edge. 

The point of hedging is to neutralize the risks you are not paid to take. But hedging is costly. Traders must always balance these trade-offs in the context of their capital, risk tolerances, and changing correlations. 

Review

I walked slowly through topics that are familiar to many investors and traders. I did this because the grout between these ideas often triggers an insight or newfound clarity about something we thought we understood. 

This is a recap of important ideas in this post:

  • Variance is a measure of dispersion for a single distribution. Covariance is a measure of dispersion for a joint distribution.
  • Just as we take the square root of variance to normalize it to something useful (standard deviation, or in a finance context — volatility), we normalize covariance into correlation.
  • Intuition for a positive(negative) correlation: if X is N standard deviations above its mean, Y is r * N standard deviations above(below) its mean. 
  • Beta is r * the vol ratio of Y to X. In a finance context, it allows us to convert a correlation from a standard-deviation comparison to a simple elasticity. If beta = 1.5, then if X is up 2%, I expect Y to be up 3%
  • Correlation is symmetrical. Beta is not. 
  • R² is the variance explained by the independent variable. Risk remaining is the volatility that remains unexplained. It is equal to sqrt(1 – R²). 
  • There is a surprising amount of risk remaining even when correlations are strong. At a correlation of .86, roughly 50% of the volatility is still unexplained!
  • Don’t compute robotically. Reason > formulas. 

 

Beware.

Least squares linear regression is only one method for fitting a line. It only works for linear relationships. Its application is fraught with pitfalls. It’s important to understand the assumptions in any models you use before they become load-bearing beams in your process. 


References:

The table in this post was entirely inspired by Rahul Pathak’s post Anova For Regression.

For the primer on regression and sum of squares I read these 365 DataScience posts in the following order:

  1. Getting Familiar with the Central Limit Theorem and the Standard Error

  2. How To Perform A Linear Regression In Python (With Examples!)

  3. The Difference between Correlation and Regression

  4. Sum of Squares Total, Sum of Squares Regression and Sum of Squares Error

  5. Measuring Explanatory Power with the R-squared

  6. Exploring the 5 OLS Assumptions for Linear Regression Analysis
    (I strongly recommend reading this post before diving in on your own.)


 

 

 

There’s Gold In Them Thar Tails: Part 1

If you were accepted to a selective college or job in the 90s, have you ever wondered if you’d get accepted in today’s environment? I wonder myself. It leaves me feeling grateful because I think the younger version of me would not have gotten into Cornell or SIG today. Not that I dwell on this too much. I take Heraclitus at his word that we do not cross the same river twice. Transporting a fixed mental impression of yourself into another era is naive (cc the self-righteous who think they’d be on the right side of history on every topic). Still, my self-deprecation has teeth. When I speak to friends with teens I hear too many stories of sterling resumes bulging with 3.9 GPAs, extracurriculars, and Varsity sport letters, being warned: “don’t bother applying to Cal”.

A close trader friend explained his approach. His daughter is a high achiever. She’s also a prolific writer. Her passion is the type all parents hope their children will be lucky enough to discover. My friend recognizes that the bar is so high to get into a top school that acceptance above that bar is a roulette wheel. With so much randomness lying above a strict filter, he de-escalates the importance of getting into an elite school. “Do what you can, but your life doesn’t depend on the whim of an admissions officer”. She will lean into getting better at what she loves wherever she lands. This approach is not just compassionate but correct. She’s thought ahead, got her umbrella, but she can’t control the weather.

My friend’s insight that acceptance above a high threshold is random is profound. And timely. I had just finished reading Rohit Krishnan’s outstanding post Spot The Outlier, and immediately sent it to my friend.

I chased down several citations in Rohit’s post to improve my understanding of this topic.

In this post, we will tie together:

  1. Why the funnels are getting narrower
  2. The trade-offs in our selection criteria
  3. The nature of the extremes: tail divergence
  4. Strategies for the extremes

We will extend the discussion in a later post with:

  1. What this means for intuition in general
  2. Applications to investing

Why Are The Funnels Getting Narrower?

The answer to this question is simple: abundance.

In college admissions, the number of candidates in aggregate grows with the population. But this isn’t the main driver behind the increased selectivity.  The chart below shows UC acceptance rates plummeting as total applications outstrip admits.

The spread between applicants and admissions has exploded. UCLA received almost 170k applications for the 2021 academic year! Cal receives over 100k applicants for about 10k spots. Your chances of getting in have cratered in the past 20 years. Applications have lapped population growth due to a familiar culprit: connectivity. It is much easier to apply to schools today. The UC system now uses a single boilerplate application for all of its campuses.

This dynamic exists everywhere. You can apply to hundreds of jobs without a postage stamp. Artists, writers, analysts, coders, designers can all contribute their work to the world in a permissionless way with as little as a smartphone. Sifting through it all necessitated the rise of algorithms — the admissions officers of our attention.

Trade-offs in Selection Criteria

There’s a trade-off between signal and variance. What if Spotify employed an extremely narrow recommendation engine indexed solely on artist? If listening to Enter Sandman only led you to Metallica’s deepest cuts, the engine would fail to aid discovery. If it indexed by “year”, you’d get a lot more variance since it would choose across genres, but headbangers don’t want to listen to Color Me Badd. This prediction fails to delight the user.

Algorithms are smarter than my cardboard examples but the tension remains. Our solution to one problem exacerbates another. Rohit describes the dilemma:

The solution to the problem of discovery is better selection, which is the second problem. Discovery problems demand you do something different, change your strategy, to fight to be amongst those who get seen.

There’s plenty of low-hanging fruit to find recommendations that reside between Color Me Badd and St. Anger. But once it’s picked, we are still left with a vast universe of possible songs for the recommendation engine to choose from.

Selection problems reinforce the fact that what we can measure and what we want to measure are two different things, and they diverge once you get past the easy quadrant.

In other words, it’s easy enough to rule out B students, but we still need to make tens of thousands of coinflip-like decisions between the remaining A students. Are even stricter exams an effective way to narrow an unwieldy number of similar candidates? Since in many cases the predictors map poorly to the target, the answer is probably no. Imagine taking it to the extreme and setting the cutoff to the lowest SAT score that would satisfy Cal’s expected enrollment. Say that’s 1400. This feels wrong for good reasons (and this is not even touching the hot-stove topic of “fairness”). Our metrics are simply imperfect proxies for who we want to admit. In mathy language: the best person at Y (our target variable) is not likely to come from the best candidates we screened if the screening criterion, X, is an imperfect correlate of success (Y).

The cost of this imperfect correlation is a loss of diversity or variance. Rohit articulates the true goal of selection criteria (emphasis mine):

Since no exam perfectly captures the necessary qualities of the work, you end up over-indexing on some qualities to the detriment of others. For most selection processes the idea isn’t to get those that perfectly fit the criteria as much as a good selection of people from amongst whom a great candidate can emerge.

This is even true in sports. Imagine you have a high NBA draft pick. A great professional must endure 82 games (plus a long playoff season), fame, money, and most importantly, a sustained level of unprecedented competition. Until the pros, they were kids. Big fish in small ponds. If you are selecting for an NBA player with narrow metrics, even beyond the well-understood requisite screens for talent, then those metrics are likely to be a poor guide to how the player will handle such an outlier life. The criteria will become more squishy as you try to parse the right tail of the distribution.

In the heart of the population distribution, the contribution to signal of increasing selectivity is worth the loss of variance. We can safely rule out B students for Cal and D3 basketball players for the NBA.  But as we get closer to elite performers, at what point should our metrics give way to discretion? Rohit provides a hint:

When the correlation between the variable measured and outcome desired isn’t a hundred percent, the point at which the variance starts outweighing the mean error is where dragons lie!

Nature Of The Extremes: Tail Divergence

To appreciate why the signal of our predictive metrics becomes random at the extreme right tail, we start with these intuitive observations via LessWrong:

Extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6’7″ is very tall, it lies within a couple of standard deviations of the median US adult male height – there are many thousands of US men taller than the average NBA player, yet are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded, they aren’t the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart, but their intelligence is not in step with their income (their cognitive ability is around +3 to +4 SD above the mean, yet their wealth is much higher than this).

The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa). 

The post uses simple scatterplots to demonstrate. Here are 2 self-explanatory charts. 

LessWrong continues: Given a correlation, the envelope of the distribution should form some sort of ellipse, narrower as the correlation goes stronger, and more circular as it gets weaker.

If we zoom into the far corners of the ellipse, we see ‘divergence of the tails’: as the ellipse doesn’t sharpen to a point, there are bulges where the maximum x and y values lie with sub-maximal y and x values respectively:

Say X is SAT score and Y is college GPA. We shouldn’t expect that the person with the highest SATs will earn the highest GPA. SAT is an imperfect correlate of GPA. LessWrong’s interpretation is not surprising:

The fact that a correlation is less than 1 implies that other things matter to an outcome of interest. Although being tall matters for being good at basketball, strength, agility, hand-eye-coordination matter as well (to name but a few). The same applies to other outcomes where multiple factors play a role: being smart helps in getting rich, but so does being hard working, being lucky, and so on.

Pushing this even further, if we zoom in on the extreme of a distribution we may find correlations invert! This scatterplot via Brilliant.org shows a positive correlation over the full sample (pink) but a negative correlation for a slice (blue). 

This is known as Berkson’s Paradox and can appear when you measure a correlation over a “restricted range” of a distribution (for example, if we restrict our sample to the best 20 basketball players in the world we might find that height is negatively correlated to skill if the best players were mostly point guards).

[I’ve written about Berkson’s Paradox here. Always be wary of someone trying to show a correlation from a cherry-picked range of a distribution. Once you internalize this you will see it everywhere! I’d be charitable to the perpetrator. I suspect it’s usually careless thinking rather than a nefarious attempt to persuade.]
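If you want to see the flip for yourself, here is a minimal simulation sketch (numpy assumed; the SAT/GPA framing and the admissions cutoff are illustrative stand-ins, not real data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical population: standardized SAT and GPA with a true correlation of 0.5.
sat = rng.normal(0, 1, n)
gpa = 0.5 * sat + np.sqrt(1 - 0.5**2) * rng.normal(0, 1, n)

full_corr = np.corrcoef(sat, gpa)[0, 1]

# Restrict the range: "admit" only applicants whose combined score clears a high bar.
admitted = (sat + gpa) > 2.5
sel_corr = np.corrcoef(sat[admitted], gpa[admitted])[0, 1]

print(f"full sample:    {full_corr:+.2f}")  # positive, near 0.5
print(f"admitted slice: {sel_corr:+.2f}")   # flips negative
```

Same data-generating process, opposite sign: conditioning on the selection filter is all it takes.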

Strategies For The Extremes

In 1849, assayer Dr. M. F. Stephenson shouted ‘There’s gold in them thar hills’ from the steps of the Lumpkin County Courthouse in a desperate bid to keep the miners in Georgia from heading west to chase riches in California. We know there’s gold in the tails of distributions but our standard filters are unfit to sift for it.

Let’s pause to take inventory of what we know. 

  1. As the number of candidates or choices increases we demand stricter criteria to keep the field to a manageable size.
  2. At some cutoff, in the extreme of a distribution, selection metrics can lead to random or even misleading predictions. 1

I’ll add a third point to what we have already established:

  3. Evolution in nature works by applying competitive pressures to a diverse population to stimulate adaptation (a form of learning). Diversity is more than a social buzzword. It’s an essential input to progress. Rohit implicitly acknowledges the dangers of inbreeding when he warns against putting folks through a selection process that reflexively molds them into rule-following perfectionists rather than those who are willing to take risks to create something new.

With these premises in place we can theorize strategies for both the selector and the selectee to improve the match between a system’s desired output (the definition of success depends on the context) and its inputs (the criteria the selector uses to filter). 

Selector Strategies

We can continue to rely on conventional metrics to filter the meat of the distribution for a pool of candidates. As we get into the tails, our adherence and reverence for measures should be put aside in favor of increasing diversity and variance. Remember, the output of an overly strict filter in the tail is arbitrary anyway. Instead we can be deliberate about the randomness we let seep into selections to maximize the upside of our optionality.

Rohit summarizes the philosophy:

Change our thinking from a selection mindset (hire the best 5%) to a curation mindset (give more people a chance, to get to the best 5%).

Practically speaking this means selectors must widen the top of the funnel then…enforce the higher variance strategy of hire-and-train.

Rohit furnishes examples:

  • Tyler Cowen’s strategy of identifying unconventional talent and placing small but influential bets on the candidates. This is easier to say than do but Tony Kulesa finds some hints in Cowen’s template. 
  • The Marine Corps famously funnels wide, electing not to focus so much on incoming qualifications but rather to recruit a large class and bank on attrition to select the right few.
  • Investment banks and consulting firms hire a large group of generically smart associates, and let attrition decide who is best suited to stick around.

David Epstein, author of Range and The Sports Gene, has spent the past decade studying the development of talent in sports and beyond. He echoes these strategies:

One practice we’ve often come back to: not forcing selection earlier than necessary. People develop at different speeds, so keep the participation funnel wide, with as many access points as possible, for as long as possible. I think that’s a pretty good principle in general, not just for sports.

I’ll add 2 meta observations to these strategies:

  1. The silent implication is the upside of matching the right talent to the right role is potentially massive. If you were hiring someone to bag groceries the payoff to finding the fastest bagger on the planet is capped. An efficient checkout process is not the bottleneck to a supermarket’s profits. There’s a predictable ceiling to optimizing it to the microsecond. That’s not the case with roles in the above examples. 

  2. Increasing adoption of these strategies requires thoughtful “accounting” design. High stakes busts, whether they are first round draft picks or 10x engineers, are expensive in time and money for the employer and candidate. If we introduce more of a curation mindset, cast wider nets and hire more employees, we need to understand that the direct costs of doing that should be weighed against the opaque and deferred costs of taking a full-size position in expensive employees from the outset.

    Accrual accounting is an attempt to match a business’ economic mechanics to meaningful reports of stocks and flows so we extract insights that lead to better bets. Fully internalized, this means recognizing that some amount of churn is expected as “breakage”. Lost option premiums need to be charged against the options that have paid off 100x. If an organization fails to design its incentive and accounting structures in accordance with curation/optionality thinking it will be unable to maintain its discipline to the strategy.

Selectee Strategies

For the selectee trying to maximize their own potential there are strategies which exploit the divergence in the tails.

To understand, we first recognize that in any complicated domain, the effort to become the best is not linear. You could devote a few years to becoming an 80th or 90th percentile golfer or chess player. But in your lifetime you wouldn’t become Tiger or Magnus. The rewards to effort decay exponentially after a certain point. Anyone who has lifted weights knows you can spend a year progressing rapidly, only to hit a plateau that lasts just as long.

The folk wisdom of the 80/20 rule captures this succinctly: 80% of the reward comes from 20% of the effort, and the remaining 20% of the reward requires 80% of the effort. The exact numbers don’t matter. Divorced from context, it’s more of a guideline.

This is the invisible foundation of Marc Andreessen and Scott Adams’s career advice to level up your skills in multiple domains. Say coding and public speaking, or writing plus math. If it’s exponentially easier to get to the 90th percentile than the 99th then consider the arithmetic2.

a) If you are in the 99th percentile you are 1 in 100. 

b) If you are top 10% in 2 different (independent) domains then you are also 1 in 100 because 10% x 10% = 1%

It’s exponentially easier to achieve the second scenario because of the effort scaling function. 
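The arithmetic is easy to check, and a quick simulation (assuming the two domains really are independent, which is doing the work here) agrees:

```python
import random

# Scenario (a): 99th percentile in a single domain.
p_single = 0.01

# Scenario (b): top decile in two independent domains.
p_stacked = 0.10 * 0.10

# Simulate a million people, each with two independent skill draws.
random.seed(1)
n = 1_000_000
hits = sum(
    1 for _ in range(n)
    if random.random() > 0.9 and random.random() > 0.9
)
rate = hits / n

print(p_single, p_stacked, rate)  # all roughly 1 in 100
```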

If this feels too stifling you can simply follow your curiosity. In Why History’s Greatest Innovators Optimized for Interesting, Taylor Pearson summarizes the work of Juergen Schmidhuber, which contends that curiosity is the desire to make sense of, or compress, information in such a way that we make it more beautiful or useful in its newly ordered form. If learning (or as I prefer to say – adapting) is downstream from curiosity, we should optimize for interesting

Lawrence Yeo unknowingly takes the baton in True Learning Is Done With Agency with practical advice. He tells us that to truly learn we must:

decouple an interest from its practical value. Instead of embarking on something with an end goal in mind, you do it for its own sake. You don’t learn because of the career path it’ll open up, but because you often wonder about the topic at hand.

…understand that a pursuit truly driven by curiosity will inevitably lend itself to practical value anyway. The internet has massively widened the scope of possible careers, and it rewards those who exercise agency in what they pursue.

Conclusion

Rohit’s essay anchored Part 1 of this series. I can’t do better than let his words linger before moving on to Part 2.
 
If measurement is too strict, we lose out on variance.

If we lose out on variance, we miss out on what actually impacts outcomes.

If we miss what actually impacts outcomes, we think we’re in a rut.

But we might not be.

Once you’ve weeded out the clear “no”s, then it’s better to bet on variance rather than trying to ascertain the true mean through imprecise means.

We should at least recognize that our problems might be stemming from selection efforts. We should probably lower our bars at the margin and rely on actual performance [as opposed to proxies for performance] to select for the best. And face up to the fact that maybe we need lower retention and higher experimentation.

Looking Ahead

In Part 2, we will explore what divergence in the tails can tell us about life and investing. 


 

Notes From Mauboussin’s Who Is On The Other Side?

Excerpts from Michael Mauboussin’s research: Who Is On The Other Side?

In this report, Michael describes a taxonomy of inefficiencies, supported by a rich vein of academic research. The goal is to have a clear idea of why efficiency is constrained and why we believe we have an opportunity to generate an attractive return after an adjustment for risk.


There’s a reward for trying to outperform but most are not equipped to compete for it.

  • The Market for Information and the Market for Assets: In 1980, a pair of finance professors, Sanford Grossman and Joseph Stiglitz, wrote a paper called “On the Impossibility of Informationally Efficient Markets.” They argue that markets cannot be perfectly efficient because there is a cost to gathering information and reflecting it in asset prices, and therefore there must be a proportionate benefit in the form of excess returns. Because collecting information is costly, active investors need exploitable mispricings to provide a sufficient incentive to participate. Lasse Pedersen, a professor of finance, says that markets must be “efficiently inefficient.” In this market, investors seek to “buy” information and “sell” profit.

    The market for assets concerns the price at which investors buy and sell fractional stakes in various assets. Some investors trade based on information, others trade on data or drivers not relevant to value, and still others free ride. For instance, investors in portfolios that mirror indexes or follow specific rules rely on active managers for proper price discovery and liquidity.

    The market’s ability to translate information into price is limited by costs. These are commonly called “arbitrage costs” and include costs associated with identifying and verifying mispricing, implementing and executing trades, and financing and funding securities. These costs create frictions that are commonly understated in academic research. That said, many of these costs have come down over time, which has contributed to greater efficiency in many markets. For example, Regulation Fair Disclosure, implemented in 2000, seeks to quash selective corporate disclosure. In addition, trading costs have dropped precipitously in recent decades as a result of deregulation and advances in technology.
  • This suggests a useful distinction between “prices are right” and “no free lunch.” Prices are right means that price is an unbiased estimate of value. No free lunch says that there is no investment strategy that reliably generates excess returns. A common argument for market efficiency is that very few investment managers consistently deliver excess returns. If prices are right, it stands to reason that there is no free lunch. But the opposite is not true. There can be no free lunch even when prices are wrong if the cost and risk of correcting mispricing are sufficiently high. Identifying and exploiting these pockets of inefficiency should be the main focus of active managers.
  • To be an active investor, you must believe in inefficiency and efficiency. You need inefficiency to get opportunities and efficiency for those opportunities to turn into returns.

Sources of Edge

Behavioral edge

  • Only a fraction of asset price moves can be directly linked to changes in fundamentals, such as revisions in cash flow or interest rate expectations. This has been established by studies of the biggest moves in the stock market since the 1940s that looked to the media for a fundamental explanation after the fact. In many cases, there is no clear fundamental driver of value.
  • We observe certain patterns in nearly all markets. For example, we have seen bubbles and crashes in a multitude of geographies (e.g., Americas, Europe, and Asia) and asset classes.
  • Beware of Behavioral Finance: The interaction of investors with little information or rationality can yield prices with surprising efficiency. The lesson is that you cannot extrapolate from individuals, who fail to operate according to the rules of rationality, to markets. The reason is that individual errors can cancel out, leading to accurate prices. You can be an overconfident buyer and I can be an overconfident seller and the net result is a correct price. The key is understanding when the wisdom of crowds flips to the madness of crowds. And the essential insight is that it has to do with a violation of one or more of the core conditions for a wise crowd.

    The essential conditions include the presence of investors with sufficiently heterogeneous views and decision rules and an effective way to aggregate the information. Understanding when and how the wisdom of crowds, where markets are efficient, transitions to the madness of crowds, where markets are inefficient, may be the most important recurring behavioral opportunity.

The importance of heterogeneous views

    • Blake LeBaron, a professor of economics at Brandeis University and an expert in agent-based modeling, built such a model. He included 1,000 agents with well-defined objectives for portfolio allocations, a risk-free asset, an asset that pays a dividend at a rate calibrated to the empirical record in the last half-century, and 250 active decision rules. The agents made or lost money as they traded and he eliminated those with the lowest levels of wealth. He also evolved the decision rules by removing those the agents did not use and replacing them with new ones.

      The beauty of LeBaron’s model is we can observe the interaction between diversity and asset prices. The model replicates many of the empirical features of markets, including clustered volatility, variable trading volumes, and fat tails. For the purpose of this discussion, the crucial observation is that sharp rises in the asset price are preceded by a reduction in the number of rules the traders used.

      LeBaron describes it this way: “During the run-up to a crash, population diversity falls. Agents begin to use very similar trading strategies as their common good performance begins to self-reinforce. This makes the population very brittle, in that a small reduction in the demand for shares could have a strong destabilizing impact on the market. The economic mechanism here is clear. Traders have a hard time finding anyone to sell to in a falling market since everyone else is following very similar strategies. In the Walrasian setup used here, this forces the price to drop by a large magnitude to clear the market. The population homogeneity translates into a reduction in market liquidity.”

      Because the traders were using the same rules, diversity dropped and they pushed the asset price into bubble territory. At the same time, the market’s fragility rose.

    • The model underscores some important lessons about behavioral inefficiency:
      • First, as the agents lose diversity by imitating one another, the initial impact is that they get richer. This is why betting against a bubble is so hard.
      • Second, the market’s reaction to a reduction in diversity is non-linear. As diversity falls, the market’s fragility rises. But the higher asset price obscures the underlying vulnerability. At a critical point, however, an incremental reduction in diversity leads to a large drop in the asset price. Crowded trades work until they don’t.

How beliefs spread

    • You need to understand a model of how ideas or information propagate across a network. Epidemiologists use a model to describe the spread of disease that is analogous to the spread of beliefs, including fads and fashions. The model considers:
      • contagiousness
      • degree of interaction
      • degree of recovery

When seasoned investors stop betting against the investment or investment theme, they contribute to the lack of diversity. With no countervailing opinion voting in the market, decision rules converge and diversity suffers.

Analytical Edge

  • Better analysis or info weightings
    • info weighting
      • requires giving a signal the appropriate strength. Be Bayesian. If a coin comes up heads 4x in a row you might extrapolate based on the signal, but you would be overconfident: the sample size is small and should not move your prior much (overweighting a signal’s strength and recency bias can both lead to overreaction)
    • time arbitrage
      • Benartzi and Thaler attempt to explain the historical equity risk premium by combining two ideas. The first is loss aversion, which says humans suffer losses roughly twice as much as they enjoy equivalent gains. That you should be twice as upset at losing $100 as you are happy at winning $100 is inconsistent with classical utility theory. The second idea is myopia, which means “nearsightedness.” This reflects how frequently you look at your investment portfolio. The stock market tends to go up over time, but it rises by fits and starts. Based on nearly a century of data, the probability you will see a gain in your diversified U.S. stock portfolio is roughly 51 percent for a day, 53 percent for a week, and 75 percent for a year. Look out a decade or more and the probability of a profit is very close to 100 percent. Both ideas are well established on their own, but together they address the issue of investor time horizon in a new way. The more frequently an investor looks at his or her portfolio, the more likely he or she is to observe losses and suffer from loss aversion. As a result, an investor examining his or her portfolio all the time requires a higher return to compensate for suffering from losses than one who looks at his or her portfolio infrequently and hence suffers less. A long-term investor is willing to pay a higher price for the same asset than is a short-term investor. Evidence from the field suggests that professional investors are not immune from myopic loss aversion.
      • Studies show that participants playing a positive-EV game bet less after a loss, despite the game remaining positive EV.
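The “be Bayesian” point above can be made concrete with a Beta-Binomial sketch. The Beta(10,10) “coins are usually fair” prior is my own illustrative choice:

```python
# Observe 4 heads in 4 flips.
heads, tails = 4, 0

# Weak (uniform) prior: the tiny sample drags the estimate far from 50/50.
a, b = 1, 1
weak_mean = (a + heads) / (a + b + heads + tails)    # 5/6, about 0.83

# Informative prior that coins are usually fair: the same data barely moves it.
a, b = 10, 10
strong_mean = (a + heads) / (a + b + heads + tails)  # 14/24, about 0.58

print(round(weak_mean, 2), round(strong_mean, 2))
```

Four flips shouldn’t move a sensible prior much; overreacting to the streak is exactly the overconfidence described above.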

Exploiting inefficiencies

  • play games you are relatively better at
  • weight and update info effectively
  • “make time your friend”
  • understand the story embedded in the price
  • be faster, pay attention esp at things that are neglected (info in less covered areas is relatively less likely to be reflected in prices), find edges in secondary or higher-order effects which take longer to reason about
  • exploit technical inefficiencies: forced buyers and sellers (ie hedgers, restrictions on what a fund can own, margin calls/leverage)

Why Volatility Still Matters To Buy-And-Hold Investors

This post is a quick response to value-investing folks who might not appreciate why volatility is not just a quant or options concern. It is a response to what I think is a narrow interpretation of Buffett’s dismissal of volatility.

In Buffett: Volatility Is Not A Risk, we find this quote (emphasis mine):

“Volatility is not a measure of risk. And the problem is that the people who have written and taught about volatility do not know how to measure — or, I mean, taught about risk — do not know how to measure risk. And the nice thing about beta, which is a measure of volatility, is that it’s nice and mathematical and wrong in terms of measuring risk. It’s a measure of volatility, but past volatility does not determine the risk of investing…in stocks, because the prices jiggle around every minute, and because it lets the people who teach finance use the mathematics they’ve learned, they have — in effect, they would explain this a way a little more technically — but they have, in effect, translated volatility into all kinds of — past volatility — in terms of all kinds of measures of risk.”

Volatility is a measure of risk. Is it incomplete? Of course. Literally, nobody thinks it’s the definition of risk with a capital R. No single measure can encapsulate risk or for that matter the merit of any investment.

I’ve discussed these points in various ways before:

I’ve noticed that 10kdiver threads sometimes get pushback like “why are you talking about this volatility math, don’t you know Buffett said it doesn’t matter”.

That attitude reflects misunderstanding so in addition to my prior posts, here are a few additional ways to show why volatility matters:

  • In one of my favorite all-time papers, My Top 10 Peeves, Cliff Asness places this one as #1: “Volatility” Is for Misguided Geeks; Risk Is Really the Chance of a “Permanent Loss of Capital”. His conclusion is less cranky than his characteristic style:

    “I still think this argument is mostly a case of smart people talking in different languages and not disagreeing as much as it sometimes seems.”

    The root of the argument is that quants decompose risk from return and are more deferential to mark-to-market, while Buffett refuses to separate risk from return.

  • Another one comes from a real vs nominal illusion. It is also conveniently addressed in Asness’s Peeves. Specifically #10: “Bonds Have Prices Too”.

    You may hear some people say they want to buy an individual bond rather than a bond fund. They worry that bond fund prices move around and have no real expiration, so when interest rates rise your losses are somehow more real. But if you buy a bond and hold it to maturity you can put your head in the sand, and never lose.

    This is nonsense.

    You have lost in a real sense since the money you are being returned is worth less in a world in which rates have risen to compensate for inflation. The bond fund is effectively taking your loss today rather than later.

    If you sell your bond for a loss, you can reinvest at a higher yield going forward. That’s a similar experience to just being in the bond fund.

    Holding to maturity does not mean you have less risk. It’s an illusion.
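A stylized example makes the equivalence concrete (a hypothetical 10-year zero-coupon bond; the 2% and 4% yields are made up):

```python
# Buy a 10-year zero-coupon bond, face value $100, at a 2% yield.
face, years = 100.0, 10
y0, y1 = 0.02, 0.04

price_paid = face / (1 + y0) ** years  # what you paid
# Immediately after purchase, yields jump to 4%.
price_now = face / (1 + y1) ** years   # the mark-to-market loss is real

# Path 1: head in the sand, hold to maturity.
hold_terminal = face

# Path 2: sell at the loss and reinvest the proceeds at the new 4% yield.
reinvest_terminal = price_now * (1 + y1) ** years

print(round(hold_terminal, 2), round(reinvest_terminal, 2))  # identical terminal wealth
```

Either way you end up in the same place; the fund just recognized the loss today instead of letting you defer it.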

  • This brings me to mark-to-market. Having a preference for private assets that are less volatile simply because their marks are stale is bizarre (although it’s understandable if there is a principal-agent problem at play…hmm, couldn’t possibly be that could it?). They are still volatile. The fundamentals of the private business are correlated with the public market volatility.

    Even if you don’t believe your investment should be marked down, then you should be sad you can’t redeem your private investment at par to rebalance into public stocks after the market drops 20%. Giving up liquidity without a premium because it will behaviorally “save you from yourself” sure feels like you sold the option to rebalance at zero.

    Further reading: How Much Extra Return Should You Demand For Illiquidity? (7 min read)

  • Speaking of options…I’ve seen the argument that holding cash means you don’t have to care about volatility. After all, you can dollar-cost-average into drawdowns.

    Um, what?

    To say that holding cash means you don’t have to worry about volatility is to misunderstand the arrow of causation. The reason you have cash is because you are concerned about volatility! Cash is liquidity. It’s the ultimate option (its cost is inflation). What maximizes the value of any option including cash?

    You guessed it. Volatility.

What Part Of Selling Calls Is “Income”?

Selling calls “for income” is not a thing. You can sell a call as compensation for risk but no professional options trader thinks of an option sale as “income”. They might mark-to-model and book the premium over “fair value” as theoretical edge, or simply “theo”. And even then we are talking about pennies. Just a tiny fraction of the stock price that they are long.

Nobody serious can claim the entire premium is income. I’ve discussed this before, but if you’re stubborn here are a few more angles to this.

A simple math example

You’re long a $100 stock.

  • It’s fairly priced because it’s 90% to go to $0 and 10% to go to $1000.
  • You overwrite by selling the 500 strike call at $45.

Did you earn income?

What if you sold the call for $55?

My problem with the “selling calls for income” crowd…they don’t know the difference.

Some people’s personal utility curves can make even a negative edge seem like an ok hurdle.

A courageous response to my question on Twitter:

There is no problem here. You take your $45 and move on with your life. If you get called away you make 5x, and if your stock goes to $0 you came out with only a 55% loss.

Umm, incinerating money when you think you are investing is actually what I would call a “problem”.

You make $445 10% of the time and lose $55 90% of the time. You are literally better off betting on roulette.1

If you overwrite a call that’s actually worth $1 at a price of $.95 because call markets are faded low for sellers, you are stuck with roulette odds. Factor in your brokerage costs (implicitly or explicitly) and effort.

I’d rather get a free hotel room.
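To probability-weight the toy example above, a quick sketch:

```python
# Long the $100 stock, short the 500-strike call: 90% the stock goes to $0,
# 10% it goes to $1000 (so the stock itself is fairly priced at $100).
p_bust, p_moon = 0.9, 0.1

# Fair value of the 500-strike call under these probabilities:
call_fair = p_moon * (1000 - 500)  # $50

# Overwrite payoffs after selling the call at $45:
win = (500 - 100) + 45             # called away at 500: +$445
loss = -(100 - 45)                 # stock to zero: -$55

ev = p_moon * win + p_bust * loss
print(call_fair, round(ev, 2))     # selling a $50 call at $45 gives up $5 of EV
```

Sell the same call at $55 instead and the $10 swing flips the EV to +$5. That difference, not the premium landing in your account, is the “income”.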

Other framings

  • Instead of selling calls, you can buy less of the stock to have the equivalent delta and use the cash elsewhere.
  • You could buy puts and buy MORE of the stock than you originally intended.

You cannot think about selling calls without thinking of vol in some fashion.

Selling options profitably requires being able to tell the difference between these scenarios, properly accounting for what portion of the sale is “income” vs fairly probability-weighted premium.

If you can do that, go ahead and claim you sell calls for income. That’s the bar. Not “premium arrived in my brokerage account”. I’m trying to show that the decision to trade an option has nothing to do with income and everything to do with the proposition you are being offered.

The reality is that betting against mispriced options is a game of pennies or half-pennies. It’s low signal-to-noise. Realizing and validating the edge requires large sample sizes. If you are overwriting without a deep process you likely have no idea if you have edge, and your sample is too small to know.

If that’s not clear, check out my version of trading 101:

Understanding Edge (10 min read)


Solving A Compounding Riddle With Black-Scholes

A few weeks ago I was getting on an airplane armed with a paper and pen, ready to solve the problem in the tweet below. And while I think you will enjoy the approach, the real payoff is going to follow shortly after — I’ll show you how to not only solve it with option theory but expand your understanding of the volatility surface. This is going to be fun. Thinking caps on. Let’s go.

The Question That Launched This Post

From that tweet, you can see the distribution of answers has no real consensus. So don’t let others’ choices affect you. Try to solve the problem yourself. I’ll re-state some focusing details:

  • Stock A compounds at 10% per year with no volatility
  • Stock B has the same annual expectancy as A but has volatility. Its annual return is binomial — either up 30% or down 10%.
  • After 10 years, what’s the chance volatile stock B is higher than A?

You’ll get the most out of this post if you try to solve the problem. Give it a shot. Take note of your gut reactions before you start working through it. In the next section, I will share my gut reaction and solution.

My Approach To The Problem

Gut Reaction

So the first thing I noticed is that this is a “compounding” problem. It’s multiplicative. We are going to be letting our wealth ride and incurring a percent return. We are applying a rate of return to some corpus of wealth that is growing or shrinking. I’m being heavy-handed in identifying that because it stands in contrast to a situation where you earn a return, take profits off the table, and bet again. Or situations where you bet a fixed amount in a game as opposed to a fraction of your bankroll. This particular poll question is a compounding question, akin to re-investing dividends not spending them. This is the typical context investors reason about when doing “return” math. Your mind should switch into “compounding” mode when you identify these multiplicative situations.

So if this is a compounding problem, and the arithmetic returns for both investments are 10%, I immediately know that volatile stock “B” is likely to be lower than stock “A” after 10 years. This is because of the “volatility tax” or what I’ve called the volatility drain. Still, that only conclusively rules out choice #4. Since we could rule that out without doing any work and over 2,000 respondents selected it, I know there’s a good reason to write this post!

Showing My Work

Here’s how I reasoned through the problem step-by-step.

Stock A’s Path (10% compounded annually)

Stock B’s Path (up 30% or down 10%)

The fancy term for this is “binomial tree” but it’s an easy concept visually. Let’s start simple and just draw the path for the first 2 years. Up nodes are created by multiplying the stock price by 1.3, down nodes are created by multiplying by .90.

Inferences

Year 1: 2 cumulative outcomes. Volatile stock B is 50/50 to outperform
Year 2: There are 3 cumulative outcomes. Stock B only outperforms in one of them.

Let’s pause here because while we are mapping the outcome space, we need to recognize that not every one of these outcomes has equal probability.

2 points to keep in mind:

  • In a binomial tree, the number of possibilities is 2ᴺ where N is the number of years. This makes sense since each node in the tree has 2 possible outcomes, so the number of paths doubles each year.
  • However, the number of outcomes is N + 1. So in Year 1, there are 2 possible outcomes. In year 2, 3 possible outcomes.

Probability is the number of ways an outcome can occur divided by the total number of possibilities.

Visually:


So by year 2 (N=2), there are 3 outcomes (N+1) and 4 cumulative paths (2ᴺ)
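The year-2 counts are easy to verify in a few lines:

```python
from itertools import product

# Stock B's two-year paths: each year multiplies wealth by 1.3 or 0.9.
paths = [m1 * m2 for m1, m2 in product([1.3, 0.9], repeat=2)]
a_2yr = 1.1 ** 2  # stock A grows to 1.21

outcomes = sorted(set(round(p, 4) for p in paths))  # 3 distinct outcomes
b_win_prob = sum(p > a_2yr for p in paths) / len(paths)

print(outcomes)    # [0.81, 1.17, 1.69]
print(b_win_prob)  # 0.25: B outperforms only on the up-up path
```

Note the middle outcome (1.17) already shows the volatility tax: one up year and one down year leaves B below A’s 1.21.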

We are moving slowly, but we are getting somewhere.

In year 1, the volatile investment has a 50% chance of winning. The frequency of win paths and lose paths are equal. But what happens in an even year?

There is an odd number of outcomes, and in the middle outcome the number of winning years and the number of losing years are exactly the same. When the frequency of wins and losses is equal, the volatility tax dominates. If you start with $100 and make 10% then lose 10% the following year, your cumulative result is a loss.

$100 x 1.1 x .9 = $99

Order doesn’t matter.

$100 x .9 x 1.1 = $99

In odd years, like year 3, there is a clear winner because the number of wins and losses cannot be the same. Just like a 3-game series.

Solving for year 10

If we extend this logic, it’s clear that year 10 is going to have a big volatility tax embedded in it because of the term that includes stock B having 5 up years and 5 down years.

N = 10
Outcomes (N+1) = 11 (ie 10 up years, 9 up years, 8 up years…0 up years)
# of paths (2ᴺ) = 1024

We know that 10, 9, 8,7,6 “ups” result in B > A.
We know that 4, 3, 2,1, 0 “ups” result in B < A

The odds of those outcomes are symmetrical. So the question is how often does 5 wins, 5 losses happen? That’s the outcome in which stock A wins because the volatility tax effect is so dominant.

The number of ways to have 5 wins in 10 years is a combination formula for “10 choose 5”:

₁₀C₅ or in Excel =combin(10,5) = 252

So there are 252 out of 1024 total paths in which there are 5 wins and 5 losses. 24.6%

24.6% of the time the volatility tax causes A > B. The remaining 75.4% of paths have a clear winner, split evenly between A>B and B>A.

75.4% / 2 = 37.7%

So volatile stock B only outperforms stock A 37.7% of the time despite having the same arithmetic expectancy!
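If you want to check the count yourself, a few lines of Python (a sketch of the same “10 choose k” logic used above) reproduce the 37.7%:

```python
from math import comb

N = 10
a_final = 1.1 ** N                      # stock A compounds smoothly

# Sum the probability of every up/down count where B beats A
p_b_wins = sum(
    comb(N, k) / 2 ** N                 # ways to get k up years / 1024 paths
    for k in range(N + 1)
    if 1.3 ** k * 0.9 ** (N - k) > a_final
)
print(round(p_b_wins, 3))               # → 0.377
```

Note that B only beats A with 6 or more up years; the 5-up/5-down paths, where the volatility tax bites, fall on A’s side of the ledger.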

This will surprise nobody who recognized that the geometric mean corresponds to the median of a compounding process. The geometric mean of this investment is not 10% per year but 8.17%. Think of how you compute a CAGR: take the terminal wealth multiple and raise it to the 1/N power. So if you turned $1 into $2 after 10 years, your CAGR is 2^(1/10) – 1 = 7.18%. To compute a geometric mean for stock B we apply the same idea to its annual outcomes, weighting each by its frequency: .9^(1/2) * 1.3^(1/2) – 1  = 8.17%. (we’ll come back to this after a few pictures)
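The same arithmetic in a quick Python sketch:

```python
# 50/50 weighted geometric mean of stock B's two annual outcomes
geo_mean = (1.3 ** 0.5) * (0.9 ** 0.5) - 1

# Compounding $100 at the geometric rate gives the median terminal wealth
median_wealth = 100 * (1 + geo_mean) ** 10

print(round(geo_mean * 100, 2))    # → 8.17 (%)
print(round(median_wealth))        # → 219
```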

The Full Visual

A fun thing to recognize with binomial trees is that the coefficients (ie the number of ways a path can be made that we denoted with the “combination” formula) can be created easily with Pascal’s Triangle. Simply sum the 2 coefficients directly from the line above it.

Coefficients of the binomial expansion (# of ways to form the path)

 

Probabilities (# of ways to form each path divided by total paths)

Corresponding Price Paths

Above we computed the geometric mean to be 8.17%. If we compounded $100 at 8.17% for 10 years we end up with $219 which is the median result that corresponds to 5 up years and 5 down years! 

The Problem With This Solution

I solved the 10-year problem by recognizing that, in even years, the volatility tax would cause volatile stock B to lose when the up years and down years occurred equally. (Note that while an equal number of heads and tails is the most likely outcome, it’s still not likely. There’s a 24.6% chance that it happens in 10 trials).

But there’s an issue. 

My intuition doesn’t scale for large N. Consider 100 years. Even in the case where B is up 51 times and down 49 times the volatility tax will still cause the cumulative return of B < A. We can use guess-and-test to see how many winning years B needs to have to overcome the tax for N = 100.

N = 100

If we put $1 into A, it grows at 1.1^100 = $13,871

If we put $1 into B and it has 54 winning years and 46 losing years, it will return 1.3^54 * .9^46 = $11,171. It underperforms A.

If we put $1 into B and it has 55 winning years and 45 losing years, it will return 1.3^55 * .9^45 = $16,136. It outperforms A.

So B needs 55 “ups”/45 “downs”, or about 22% more winning years than losing years, to overcome the volatility tax. It’s no longer as simple as needing more up years than down years, the way it was over shorter horizons.
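The guess-and-test can be automated; a minimal search for the breakeven number of up years:

```python
N = 100
a_final = 1.1 ** N                       # stock A's terminal multiple

# Smallest number of up years where B's terminal wealth beats A's.
# B's wealth is monotonically increasing in up years, so the first
# k that clears the bar is the answer.
ups_needed = next(
    k for k in range(N + 1)
    if 1.3 ** k * 0.9 ** (N - k) > a_final
)
print(ups_needed)                        # → 55
```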

We need a better way. 

The General Solution Comes From Continuous Compounding: The Gateway To Option Theory

In the question above, we compounded the arithmetic return of 10% annually to get our expectancy for the stocks.

Both stocks’ expected value after 10 years is 100 * 1.1^10 = $259.37.

Be careful. You don’t want the whole idea of the geometric mean to trip you up. The compounding of volatility does NOT change the expectancy. It changes the distribution of outcomes. This is crucial.

The expectancy is the same, the distribution differs.

If we keep cutting the compounding periods from 1 year to 1 week to 1 minute…we approach continuous compounding. That’s what logreturns are. Continuously compounded returns.

Here’s the key:

Returns conform to a lognormal distribution. You cannot lose more than 100%, but you have unlimited upside because of the continuous compounding. Compared to a bell curve, the lognormal distribution is positively skewed. The counterbalance of that positive skew is that the geometric mean, which is the median of the distribution, is necessarily lower than the arithmetic expectancy. How much lower? It depends on the volatility, because the volatility tax1 pulls the geometric mean down from the arithmetic mean. The more volatile the asset, the more positively skewed the lognormal or compounded distribution: the right tail grows while the left tail stays bounded by zero, and the typical outcome sinks further below the average one.

I’ll pause here for a moment to just hammer home the idea of positive skew:

If stock B doubled 20% of the time and lost 12.5% the remaining 80% of the time its average return would be exactly the same as stock A after 1 year (20% * $200 + 80% * $87.5 = $110). The arithmetic mean is the same. But the most common lived result is that you lose. The more we crank the volatility higher, the more it looks like a lotto ticket with a low probability outcome driving the average return.

Look at the terminal prices for stock B:

The arithmetic mean is the same as A, $259.

The geometric mean, ie the median outcome, is only $219 (again corresponding to the 8.17% geometric return)

The magnitude of that long right tail ($1,379 is > 1200% total return, while the left tail is a cumulative loss of 65%) is driving that 10% arithmetic return.

Compounding is pulling the typical outcome down as a function of volatility but it’s not changing the overall expectancy.

A Pause To Gather Ourselves

  • We now understand that compounded returns are positively skewed.
  • We now understand that logreturns are just compounded returns taken continuously as opposed to annually.
  • This continuous, logreturn world is the basis of option math. 

Black-Scholes

The lognormal distribution underpins the Black-Scholes model used for pricing options.

The median of a lognormal distribution is the geometric mean. By now we understand that the geometric mean is always lower than the arithmetic mean. So in compounded world we understand that the median outcome is lower than the arithmetic mean. 

Geometric mean  = arithmetic mean – .5 * volatility²

The question we worked on is not continuous compounding but if it were, the geometric mean = 10% – .5 * (.20)² = 8%. Just knowing this was enough to know that most likely B would not outperform A even though they have the same average expectancy.

Let’s revisit the original question, but now we will assume continuous compounding instead of annual compounding. The beauty of this is we can now use Black Scholes to solve it!

Re-framing The Poll As An Options Question

We now switch compounding frequency from annual to continuous so we are officially in Black-Scholes lognormal world. 

Expected return (arithmetic mean)

  • Annual compounding: $100 * (1.1)¹⁰ = $259.37
  • Continuous compounding (B-S world): 100*e^(.10 * 10) = $271.83

Median return (geometric mean)

  • Annual compounding: $100 x 1.0817¹⁰ = $219.24
  • Continuous compounding (B-S world): $100 * e^((.10 – .5 * .2²) * 10) = $222.55
    • remember Geometric mean  = arithmetic mean – .5 * volatility²
    • geometric mean < arithmetic mean of course
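Both continuous-compounding numbers fall out of two one-liners:

```python
from math import exp

S, r, sigma, T = 100, 0.10, 0.20, 10

mean_wealth   = S * exp(r * T)                      # arithmetic expectancy
median_wealth = S * exp((r - 0.5 * sigma**2) * T)   # geometric mean / median

print(round(mean_wealth, 2))    # → 271.83
print(round(median_wealth, 2))  # → 222.55
```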

The original question:

What’s the probability that stock B with its 10% annual return and 20% volatility outperforms stock A with its 10% annual return and no volatility in 10 years?

Asking the question in options language:

What is the probability that a 10-year call option on stock B with a strike price of $271.83 expires in-the-money?

If you have heard that “delta” is the probability of “expiring in-the-money” then you think we are done. We have all the variables we need to use a Black-Scholes calculator which will spit out a delta. The problem is delta is only approximately the probability of expiring in-the-money. In cases with lots of time to expiry, like this one where the horizon is 10 years, they diverge dramatically. 2

We will need to extract the probability from the Black Scholes equation. Rest assured, we already have all the variables. 

Computing The Probability That Stock “B” Expires Above Stock “A”

If we simplify Black-Scholes to a bumper sticker, it is the probability-discounted stock price beyond a fixed strike price. Under the hood of the equation, there must be some notion of a random variable’s probability distribution. In fact, it’s comfortingly simple. The crux of the computation is just calculating z-scores.

I think of a z-score as the “X” coordinate on a graph where the “Y” coordinate is a probability on a distribution. Refresher pic3:

Conceptually, a z-score is a distance from a distribution’s mean normalized by its standard deviation. In Black-Scholes world, z-scores are a specified logreturn’s distance from the geometric mean normalized by the stock’s volatility. Same idea as the Gaussian z-scores you have seen before.

Conveniently, logreturns are themselves normally distributed allowing us to use the good ol’ NORM.DIST Excel function to turn those z-scores into probabilities and deltas. 

In Black Scholes,

  • delta is N(d1)
  • probability of expiring in-the-money is N(d2)
  • d1 and d2 are z-scores

Here are my calcs4:

Boom.

The probability of stock B finishing above stock A (ie the strike or forward price of a $100 stock continuously compounded at 10% for 10 years) is…

37.6%!

This is respectably close to the 37.7% we computed using Pascal’s Triangle. The difference is we used the continuous compounding (lognormal) distribution of returns instead of calculating the return outcomes discretely. 
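Here’s a minimal sketch of extracting N(d2), using Python’s standard-library NormalDist for the normal CDF; the strike is the forward price computed above, and the 10% drift stands in for the rate term:

```python
from math import log, sqrt
from statistics import NormalDist

S, K, r, sigma, T = 100, 271.83, 0.10, 0.20, 10   # K = the forward price

# The two Black-Scholes z-scores
d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)

prob_itm = NormalDist().cdf(d2)   # probability of expiring in-the-money
delta    = NormalDist().cdf(d1)   # delta: a different, higher number

print(round(prob_itm, 3))          # → 0.376
```

Note how far N(d1) sits above N(d2) at a 10-year horizon, which is exactly why delta is a poor stand-in for the probability of expiring in-the-money here.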

The Lognormal Distribution Is A Lesson In How Compounding Influences Returns

I ran all the same inputs through Black Scholes for strikes up to $750.

  • This lets us compute all the straddles and butterflies in the Black-Scholes universe (ie what market-makers back in the day called “flat sheets”, meaning no additional skew parameters were fit and the model was not fit to the market).
  • The flys let us draw the distribution of prices.

A snippet of the table:

I highlighted a few cells of note:

  • The 220 strike has a 50% chance of expiring ITM. That makes sense, it’s the geometric mean, ie the median outcome.
  • The 270 strike is known as At-The-Forward because it corresponds to the forward price of $271.83 derived from continuously compounding $100 at 10% per year for 10 years (ie Seʳᵗ). If 10% were a risk-free rate this would be treated like the 10 year ATM price in practice. Notice it has a 63% delta. This surprises people new to options but for veterans this is expected (assuming you are running a model without spot-vol correlation).
  • You have to go to the $330 strike to find the 50% delta option! If you need to review why see Lessons From The .50 Delta Option.

This below summary picture adds one more lesson:

The cheapest straddle (and therefore most expensive butterfly) occurs at the modal return, about $150. If the stock increased from $100 to $150, your CAGR would be 4.1%. This is the single most likely outcome despite the fact that it’s below the median AND has a point probability of only 1.7%.
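That modal price can be backed out of the lognormal directly. A sketch, using the standard result that a lognormal’s mode is exp(mean − variance) of the underlying normal:

```python
from math import exp

S, r, sigma, T = 100, 0.10, 0.20, 10

# ln(S_T) is normal with this mean and variance under continuous compounding
m = (r - 0.5 * sigma**2) * T        # 0.8
v = sigma**2 * T                    # 0.4

mode_price = S * exp(m - v)         # lognormal mode = exp(mean - variance)
cagr = (mode_price / S) ** (1 / T) - 1

print(round(mode_price))            # → 149
print(round(cagr * 100, 1))         # → 4.1 (%)
```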

Speaking of Skew

Vanilla Black-Scholes option theory is a handy framework for understanding the otherwise unintuitive hand of compounding. The lognormal distribution is the distribution that corresponds to continuously compounded returns. However, it is important to recognize that nobody actually believes this distribution describes any individual investment. A biotech stock might be bimodally distributed, contingent on an FDA approval. If you price SPX index options with a positively skewed model like this you will not last long. 

A positively skewed distribution says “on average I’ll make X because sometimes I’ll make multiples of X but most of the time, my lived experience is I’ll make less than X”.

In reality, the market imputes negative skew on the SPX options market. This shifts the peak to the right, shortens the right tail, and fattens the left tail. That implied skew says “on average I make X, I often make more than X, because occasionally I get annihilated”. 

It often puzzles beginning traders that adding “put skew” to a market, which feels like a “negative” sentiment, raises the value of call spreads. But that actually makes sense. A call spread is a simple over/under bet that reduces to the odds of some outcome happening. If the spot price is unchanged, and the puts become more expensive because the left tail is getting fatter, then it means the asset must be more likely to appreciate to counterbalance those 2 conditions. So of course the call spreads must be worth more. 

 

Final Wrap

Compounding is a topic that gives beginners and even experienced professionals difficulty. By presenting the solution to the question from a discrete binomial angle and a continuous Black-Scholes angle, I hope it solidified or even furthered your appreciation for how compounding works. 

My stretch goal was to advance your understanding of option theory. While it overlaps with many of my other option theory posts, if it led to even any small additional insight, I figure it’s worth it. I enjoyed sensing that the question could be solved using options and then proving it out. 

I want to thank @10kdiver for the work he puts out consistently and the conversation we had over Twitter DM regarding his question. If you are trying to learn basic and intermediate level financial numeracy his collection of threads is unparalleled. Work I aspire to. Check them out here: https://10kdiver.com/twitter-threads/

Remember, my first solution (Pascal’s Triangle) only worked for relatively small N. It was not a general solution. The Black-Scholes solution is a general one but required changing “compounded annually” to “compounded continuously”. 10kdiver provided the general solution, using logs (so also moving into continuous compounding) but did not require discussion of option theory. 

I’ll leave you with that:

Additional Reading 

  • Path: How Compounding Alters Return Distributions (Link)

This post shows how return distributions built from compounding depend on the ratio of trend vs chop.

  • The difficulty with shorting and inverse positions (Link)

    The reason shorting and inverse positions are problematic is intimately tied to compounding math.

 




I Felt Bad For Picking My 3rd Grader Off

In trading, “picking someone off” means trading against a counterparty who would flake on the price they offered you if they knew what you know.

If I lift an offer on a TSLA March call option because I’m bullish, the market-maker on the other side of my trade doesn’t care. They would still sell me the call even if I texted them my rationale. But if I had the divine knowledge that Elon was going to tweet in the next 10 seconds that earnings would be reported in March not February, then I would be knowingly “picking off” the market-maker. As an options trader, you need to defend against pick-offs. You also want alerts if someone else is making a price that is not incorporating material, public info. For example, if an OPEC meeting date was moved there might be a tiny window when you could disguise a calendar spread as a routine roll when really you are trying to pick off the other side before they get the memo.

[In reality, when such news happens, market-makers will “sweep” all the resting customer orders with price limits below or above the option’s new fair value. It’s very difficult to pick off another professional who is consuming a real-time news feed.]

A few categories of pick-offs:

  • Pickoff trades related to changes in dates.

    If earnings are moved from early Feb to early March, then the “earnings volatility” needs to come out of the Feb expiry since you are no longer exposed to it, and the March options which now contain that volatility must appreciate relatively.

  • Pickoffs related to change in carry

    If a stock announces a change in its dividend that will affect the carry embedded in the options. If a dividend is slashed, the calls go up relative to the puts. Market makers need to be on top of how corporate actions affect the inputs into their pricing models.

  • Pickoffs related to changes in baskets

    If an ETF’s constituents change that affects vol of the underlying basket. If an ETF restricts creations, this can lead to the options and ETF becoming mispriced (one day I’ll tell the story of how this personally cost me 6 figures).

Pickoffs In Real Life

The tradition of trading floors is full of insane prop betting stories (NYMEX vets have so many to choose from but my personal favorite was watching one guy successfully pound 20 Coors Lites in an hour in one of the  upstairs offices).

But picking people off can be as easy as making bets with folks who are arithmetically challenged. Most people are. You only need to look up polls by @10kdiver to see that even professional investors, a subset of the population who should be able to compute a return, struggle numerically.

I am overstating the case a bit.

  1. I don’t know how many of those respondents are professional investors. I’d also admit there are times I’m impatient and just take a guess just to see the results. This is probably common behavior.
  2. When confronted with a bet, people’s defenses go up. They are wary of strangers bearing gifts and will assume there’s a catch.

Now that was a long-winded, but hopefully fun, introduction to a story that will improve your numerical intuition and illuminate a lesson that is central to both investing and engineering problems.

The Proposition

My 3rd-grader came home with a packet of math worksheets from school. Two of the sheets, a total of 10 questions, were incomplete. He said the teacher didn’t require those sheets to be done.

What a bummer.

I thought they were the best questions from the whole packet. I asked him to solve the total of 10 questions but decided to make my request a little spicier. I offered him the following bet:

If you get them all correct, I will give you $5. Otherwise, you owe me $5.

He thought for a moment, then accepted. Before he went off to work on them, I started to wonder if my impulsive proposition was fair.

Is It A Fair Bet?

After eyeballing the questions, I estimated he had a 90% chance of getting any question correct. Another way to say that: I expect he gets 1 wrong on average. Right off the bat, I think I’m going to win. It’s not a fair bet.

This led me to compute a couple numbers.

  1. If my estimate of 90% hit rate per question is correct, what’s the chance he gets them all correct?

    .90¹⁰ = 35%

    That means he’s almost a 2-1 underdog

  2. What would his hit rate need to be per question to make the bet fair?

    First we need to convert what a fair bet means in math language. This is straightforward. Since it’s an even $5 bet then the fair proposition would not be, on average, he gets 1 wrong, but that he gets all the questions correct 50% of the time. 

    x¹⁰ = 50%

    x = .50^(1/10)

    x = 93.3%

    So if he had a 93.3% chance of getting any single question correct, then he has a 50% chance of winning the bet.
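Both numbers above, as a two-line Python check:

```python
p_win_all = 0.90 ** 10          # chance of 10-for-10 at a 90% hit rate
fair_rate = 0.50 ** (1 / 10)    # per-question rate that makes the bet 50/50

print(round(p_win_all * 100, 1))   # → 34.9 (%)
print(round(fair_rate * 100, 1))   # → 93.3 (%)
```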

Flipping The Odds In His Favor

I’m not trying to take candy from a 3rd grader, I just wanted to make him more eager to do the questions. So, I tweaked the bet after he returned with the answers. Each of the 2 worksheets had 5 questions each. I decided to batch the bet as follows:

If he gets all the questions correct, he wins.

If he gets all the questions in one batch correct but not the other, it’s a push.

If he gets less than 100% on each batch then he loses the bet. 

How did this tweak alter the proposition?

In the first bet, if he got anything wrong, he lost. With these rules, he only loses if he gets, at least, one wrong in each batch.

We need to analyze all the possible outcomes that can occur with 2 batches.

First, we must ask:

What is the probability he gets all the questions right within a batch of 5 questions?

Assume he has a 90% hit rate again. 

.9⁵ = 59%

So for either batch he has a 59% chance of getting a perfect score and a 41% chance of getting at least one wrong (ie a non-perfect score). 

Now we must consider all the possible outcomes of the proposition and their probabilities.

  1. Perfect score in both batches

    59% x 59% = 34.8%

  2. Perfect score in one batch but not the other

    59% x 41% = 24.2%

  3. At least one wrong in both batches

    41% x 41% = 16.8%

If we sum them all up we get 75.8%.

Wait, these don’t add to 100%, what gives??

We need to weigh these outcomes by the number of ways they can happen.

The possibilities with their probabilities are as follows:

  • Win, Win = 34.8%
  • Win, Lose = 24.2%
  • Lose, Win = 24.2%
  • Lose, Lose = 16.8%

Collapsing the individual possibilities into the proposition’s probabilities we get:

  • Win: 34.8%
  • Push: 48.4% (24.2% + 24.2%)
  • Lose: 16.8%

These probabilities sum to 100% and tell us:

  1. The most likely scenario of the bet is a push, no money exchanged
  2. Otherwise, he wins the bet 2x more than he loses the bet

This is a powerful result. Remember, his hit rate on any individual question was still 90%. By batching, we changed the proposition from a bad bet for him into a good bet, because he gets to diversify or quarantine the risk of a wrong answer.
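Enumerating the four batch combinations in Python reproduces the win/push/lose split (a sketch using the exact .9⁵ rather than the rounded 59%):

```python
from itertools import product

p = 0.9 ** 5                     # chance a 5-question batch is perfect (~59%)

outcomes = {"win": 0.0, "push": 0.0, "lose": 0.0}
for batch1, batch2 in product([True, False], repeat=2):   # perfect batch or not
    prob = (p if batch1 else 1 - p) * (p if batch2 else 1 - p)
    n_perfect = batch1 + batch2
    key = "win" if n_perfect == 2 else "lose" if n_perfect == 0 else "push"
    outcomes[key] += prob

print({k: round(v, 3) for k, v in outcomes.items()})
# → {'win': 0.349, 'push': 0.484, 'lose': 0.168}
```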

Even More Diversification

With 2 batches we saw the range of possibilities conforms to the binomial distribution for n = 2:

p² + 2p(1-p) + (1-p)²

where p = probability of perfect score on a batch

In English, the coefficients: 1 way to win both + 2 ways to push + 1 way to lose both

What if we split the proposition into 5 batches of 2 questions each?

These are the bet scenarios:

  1. Win =  5 pairs of questions correct
  2. Lose = at least one wrong in each of the 5 pairs
  3. Push = any other combination (i.e. 3 perfect batches + 2 imperfect batches)

How do the batch outcomes roll up to the 3 bet scenarios?

  1. We must count how many ways there are to generate each of the scenarios.
  2. We must probability weight each way.

The summary table shows the work.

[Note the coefficients correspond to the coefficients for the binomial expansion (x+y)⁵ which is also the row of Pascal’s Triangle for N = 5]

| Outcome | Scenario | # of ways | Probability weight | Ways × probability |
|---|---|---|---|---|
| 5 wins | win (wins the bet) | combin(5,5) = 1 | .81⁵ = 34.9% | 1 × 34.9% = 34.9% |
| 4 wins, 1 loss | push | combin(5,4) = 5 | .81⁴ × .19 = 8.2% | 5 × 8.2% = 41% |
| 3 wins, 2 losses | push | combin(5,3) = 10 | .81³ × .19² = 1.9% | 10 × 1.9% = 19% |
| 2 wins, 3 losses | push | combin(5,2) = 10 | .81² × .19³ = .45% | 10 × .45% = 4.5% |
| 1 win, 4 losses | push | combin(5,1) = 5 | .81 × .19⁴ = .11% | 5 × .11% = .55% |
| 5 losses | lose (loses the bet) | combin(5,0) = 1 | .19⁵ = .025% | 1 × .025% = .025% |
| Total |  | 2⁵ = 32 |  | 100% |

The net result of quarantining the questions into 5 groups of 2:

35% chance he wins the bet

65% chance he pushes on the bet

Nearly 0% chance he loses on the bet!

If you took this quarantining logic further and treated each question as its own batch then the new phrasing of the bet would be:

If he gets every question correct, he wins

If he gets every question incorrect, he loses

Any other scenario is a push

The corresponding probabilities:

Win = .9¹⁰ = 35%

Lose = .1¹⁰ = 1E-10 or ~0%

So the benefits of separating the bets were mostly achieved above when we batched into 5 pairs of questions.
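To compare batching schemes in general, a small helper works; `bet_odds` is a hypothetical name, and the sketch assumes a constant per-question hit rate and equal batch sizes:

```python
def bet_odds(hit_rate, n_batches, batch_size):
    """Win = every batch perfect, lose = no batch perfect, anything else pushes."""
    p = hit_rate ** batch_size           # chance a single batch is perfect
    win = p ** n_batches
    lose = (1 - p) ** n_batches
    return win, 1.0 - win - lose, lose   # (win, push, lose)

# Compare batching schemes for 10 questions at a 90% per-question hit rate
for n_batches, batch_size in [(1, 10), (2, 5), (5, 2), (10, 1)]:
    win, push, lose = bet_odds(0.9, n_batches, batch_size)
    print(n_batches, batch_size, round(win, 3), round(push, 3), round(lose, 5))
```

Notice the win probability is 0.9¹⁰ under every scheme; finer batching only shrinks the lose probability by converting losses into pushes.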

Conclusion

The first bet required my son to get a string of questions correct. Even though I estimated he had a 90% chance of getting any question correct, by making the net outcome dependent on every chain-link, the odds of his success became highly contingent on the length of the chain.

With a chain length of 10, the bet was unfair to him. By changing the bet to be the result of success on smaller chains (the batches) I changed the distribution of outcomes. It did not increase his chance of winning, but it reduced his chance of losing by creating offsetting scenarios or “pushes”. In other words, the smaller chains reduced his overall risk without sacrificing his odds of winning. It was a free lunch for him.

When you create long chains of dependency (ahem, positive correlations), an impurity in any link threatens your entire proposition. In these examples, we are dealing with binary outcomes. This is a trivial analysis. With investing, the distributions of the bets are not easily known.  It is well-known, however, that diversifying or not putting all your eggs in one basket is a free lunch. Still, if the investments are highly correlated, you may be fooling yourself and depending on a long chain.

Imagine if the 10 questions my son had were the type of word problem where the answer to each question is the input to the next. That’s a portfolio of highly correlated assets. If you are “yield-farming” crypto stablecoins you have probably thought about this problem. Spreading the risk across many coins can offset many idiosyncratic risks to the protocols. But is there a translucent, hard-to-see chain of correlation tying them all together that only reveals itself when the whole background goes black? That’s systemic risk. Ultimately, the only hedge to such a risk is position sizing at the aggregate level where you sum the gross positions. This is why stress-testing a portfolio to that standard is a quant’s “last level of defense”.

Jarvis, what happens to my portfolio when all correlations go to one?


[If you are in the investing world you will see parallels to this lesson in the ergodicity1 problem.]


Oh yea, how’d the bet with my 3rd grader go?

He got a perfect score on one batch of 5, and he got 1 wrong in the second batch. So the overall bet was a push, and his old man didn’t do so bad in estimating he’d have a 90% hit rate.


Selling Calls: It Might Be Passive, But It Ain’t Income

First something nice. An amuse-bouche:

That was pleasant enough.

Now violence.

You have heard of selling calls for “passive income”. The pitches which promote this idea are using the word “income” in the same sense that I would earn “income” if I sold you my house for $100. The income is a receipt or a cashflow, but this is just mechanical accounting. I have not earned income in any economic sense of the word. A receipt is not income without considering value given vs value received.

Suppose you own a $50 stock. Imagine you sold the $45 strike call for $5. Imagine the scenarios:

  • Stock goes up: Let’s say it goes to $60
    • $10 profit on stock holdings
    • Call option you are short goes up by $10
    • You are assigned on your call option, your stock is called away, leaving you with no position. P/L =0
  • Stock falls but remains above the strike: Let’s say it goes to $47
    • $3 loss on stock holdings
    • Call option you shorted falls to $2. You earn $3 on that leg.
    • Again, you are assigned on your call option, your stock is called away, leaving you with no position. P/L =0
  • Stock falls below the strike: Let’s say it goes to $40
    • $10 loss on stock holdings
    • Call option you shorted expires worthless. You earn $5 on that leg.
    • Since the call is worthless, you still own the stock and you have a net loss of $5

A few things to observe:

  1. You can only lose. This makes sense. You sold an option at its intrinsic value. Visually:
  2. These scenarios are exactly the same as if you held no stock position and you sold the 45 strike put at $0. This is called “put/call parity”.

    Parity means equal. It means a call is a put and a put is a call. Your stock position combined with the option you are long or short determines your effective position.

    • Long stock, short call = short put (this is all covered calls!)
    • Short stock, long call = long put
    • Long stock, long put = long call
    • Short stock, short put = short call

      You can prove this to yourself by making up more scenarios as I did above. Draw those hockey stick diagrams to summarize.

      So when you sell a call against your stock position, you are now saying “I prefer total downside and limited upside”.

  3. Is the call worth selling?

    Nobody says “I prefer total downside and limited upside”. But bond investors choose this all the time. Because the relevant question is about PRICE. Any proposition can be ruined or alluring depending on the price. An option’s price is simply a future state of the world discounted by its probability.

    When you sell an option, you don’t earn income. You just bet against some future state of the world. Whether this was a good idea or not depends on the price. Price is the market-implied odds. The actual odds are an imaginary idea. Price is a flesh-and-blood painting of the idea that you can interact with. Unless your day job is to figure out if the depiction of that idea, the price, is accurate, it’s best to assume it is.

    Suppose, instead of selling that 45 strike call for $5 you could sell it for $6. This parallel shifts the hockey stick $1.00 higher.

    This is a more attractive pay-off, but as a covered call writer, you need to ask yourself…is it attractive enough? Let me answer for you.

    You have no idea.

    What would you need to know to even evaluate the question “is it attractive enough”?

You’d need to know something about the odds of the stock making an X% move by the expiration date. This is mostly what we mean when we say “volatility”. How will you know those odds? You can’t. You can only guess. And that’s what the price was in the first place. The wisdom-of-crowds guess. Do you have a reason to believe you can beat the line? What do you see that option price-setters don’t?

    Professional volatility traders have an opinion as to what the fair value of the option is. If they sell an option for more than its alleged “fair value”, some internal accounting systems may allow them to book the excess premium as “income”. But they would call that “theoretical edge” or “theo”, not income. And even that edge is taken with truckloads of salt. 1
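The scenario math and the put/call parity claim above can be checked numerically. A sketch under the article’s numbers (stock owned at $50, 45-strike call sold at its $5 intrinsic value); the payoff functions are illustrative names, not a trading library:

```python
def covered_call_pnl(terminal, entry=50.0, strike=45.0, premium=5.0):
    """Long stock at entry, short the strike call for premium."""
    stock_pnl = terminal - entry
    call_pnl = premium - max(terminal - strike, 0.0)   # short call leg
    return stock_pnl + call_pnl

def short_put_pnl(terminal, strike=45.0, premium=0.0):
    """Short the strike put, premium collected up front."""
    return premium - max(strike - terminal, 0.0)

# The covered call matches a short 45-strike put sold at $0 at every price
for terminal in [40.0, 47.0, 60.0]:
    assert covered_call_pnl(terminal) == short_put_pnl(terminal)
    print(terminal, covered_call_pnl(terminal))
```

The loop reproduces the three scenarios: P/L of 0 at $60 and $47, and -$5 at $40, identical to a short 45-strike put sold for nothing.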

How To Respond To Your Advisor

If [insert “options as income” advisor] thinks you should sell calls ask them:

Is the option overpriced?

They won’t say no. They probably also won’t say yes, since how the hell do they know. They’ll say:

“You’ll be happy if the stock gets there.”

Sure, you might be happy if the stock goes to your strike. But that’s cherry-picking the point of maximum happiness for any short option position. It’s literally the short option position’s homerun scenario. Your broker is selling you on the best-case scenario. The remaining win vs lose scenarios are painfully asymmetric:

  • In a scenario where the stock goes up a lot, you are getting unboundedly sadder.
  • In the case where the stock goes down a lot (assuming you weren’t going to sell no matter what), you are better off by the amount of the premium you sold, which is capped.

Do not benchmark your opinion of the trade to stock-grind-up-to-my-short-strike scenario.

The Main Takeaways

  1. In the single stock game, you cannot afford to NOT get piggish results on the upside since most single stocks have awful long-term returns. You will be sad if you invest in securities with unlimited upside but systematically truncate that upside.
  2. Here’s a link to the document I wish I wrote. It’s highly intuitive. It does better than explain. It shows how you are incinerating money if you are selling calls below what they are worth even when you are “just overwriting”.


Final Word

It’s possible your advisor doesn’t totally grok the concept as laid out in QVR’s document or even what I wrote about. They have been bombarded with so much callsplainin’ that the discourse has been vocally one-sided. This post is probably in vain, but perhaps one RIA at a time, we can move past “selling options for income” as they internalize that:

  • The price of the option is central to the proposition.
  • Since what drives price is complex, any discussion about the attractiveness of overwriting becomes more nuanced.

As far as option promoters and authors who treat an entire premium as passive income? Clowns.

If I’m aggressive in saying that it’s because the overwriting fetish is so widespread, there’s nothing to do but make people feel bad about a naive, unsound practice that hinges on “you’ll be happy anyway, even if you lose”. That’s utter garbage. The difference between a winning poker player and a losing poker player might be a single big blind per hour. You cannot afford to just piss away expectancy.

So when you see these promoters you can safely dismiss them as charlatans. We need less of those these days.

You’re welcome for the very simple, reductionist negative screen. I just saved you many hours of brain damage, a trip to Orlando for that “Make $10k Per Week” options seminar, and the $899 “course materials” emblazoned with a pic of someone who probably looks like me2 with slicked-back hair in a rented Lambo. You can smell the Drakkar Noir from the glossy page.

Actual option traders don’t wear suits. And they don’t tell you to sell calls for income.

Moontower #128

Happy Thanksgiving weekend!

This week’s Money Angle will be interesting to anyone who fits any of these categories:

  1. Has a financial advisor
  2. Reads books with the words “passive income” or “financial freedom”
  3. Knows what an option is

Money Angle


Now violence.

You have heard of selling calls for “passive income”. The pitches promoting this idea use the word “income” in the same sense that I would earn “income” if I sold you my house for $100. The income is a receipt or a cashflow, but this is just mechanical accounting. I have not earned income in any economic sense of the word. A receipt is not income without considering value given versus value received.

Suppose you own a $50 stock. Imagine you sell the 45-strike call for $5 (its intrinsic value, since the stock sits $5 above the strike). Walk through the scenarios at expiration:

  • Stock goes up: Let’s say it goes to $60
    • $10 profit on stock holdings
    • Call option you are short goes up by $10
    • You are assigned on your call option, your stock is called away, leaving you with no position. P/L =0
  • Stock falls but remains above the strike: Let’s say it goes to $47
    • $3 loss on stock holdings
    • Call option you shorted falls to $2. You earn $3 on that leg.
    • Again, you are assigned on your call option, your stock is called away, leaving you with no position. P/L =0
  • Stock falls below the strike: Let’s say it goes to $40
    • $10 loss on stock holdings
    • Call option you shorted expires worthless. You earn $5 on that leg.
    • Since the call is worthless, you still own the stock and you have a net loss of $5
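The three scenarios above can be checked with a few lines of arithmetic. A minimal sketch (the function name is mine, not a library’s), assuming expiration-date settlement of the $50 stock / 45-strike / $5 premium example:

```python
def covered_call_pnl(stock_at_expiry, entry_price=50.0, strike=45.0, premium=5.0):
    """P/L at expiration for long stock + short call (a covered call)."""
    stock_pnl = stock_at_expiry - entry_price          # gain/loss on the shares
    call_payout = max(stock_at_expiry - strike, 0.0)   # what the short call pays out against you
    return stock_pnl + premium - call_payout

print(covered_call_pnl(60))   # stock up big:            0.0
print(covered_call_pnl(47))   # down, above the strike:  0.0
print(covered_call_pnl(40))   # down, below the strike: -5.0
```

Sweep any expiration price you like: everything at or above $45 nets exactly zero, everything below nets a loss.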

A few things to observe:

  1. You can only lose. This makes sense: you sold the option at exactly its intrinsic value, so you collected zero time premium.
  2. These scenarios are exactly the same as if you held no stock position and you sold the 45 strike put at $0. This is called “put/call parity”.

    Parity means equal. It means a call is a put and a put is a call. Your stock position combined with the option you are long or short determines your effective position.

    • Long stock, short call = short put (this is all covered calls!)
    • Short stock, long call = long put
    • Long stock, long put = long call
    • Short stock, short put = short call

      You can prove this to yourself by making up more scenarios as I did above. Draw those hockey stick diagrams to summarize.
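A quick way to make up many scenarios at once and check the first parity line (long stock + short call = short put). A sketch with hypothetical helper names, using the same 45-strike example and ignoring carry and interest:

```python
def short_call_pnl(s, strike, premium):
    """Expiration P/L of a short call at stock price s."""
    return premium - max(s - strike, 0.0)

def short_put_pnl(s, strike, premium):
    """Expiration P/L of a short put at stock price s."""
    return premium - max(strike - s, 0.0)

entry, strike = 50.0, 45.0
call_premium = 5.0                             # sold at intrinsic, as in the example
put_premium = call_premium - (entry - strike)  # = 0: the "45 strike put at $0"

# Identical payoffs at every expiration price:
for s in [0.0, 20.0, 40.0, 44.5, 45.0, 47.0, 60.0, 100.0]:
    covered_call = (s - entry) + short_call_pnl(s, strike, call_premium)
    assert covered_call == short_put_pnl(s, strike, put_premium)
```

Flip the signs on the stock and option legs to verify the other three parity lines the same way.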

      So when you sell a call against your stock position, you are now saying “I prefer total downside and limited upside”.

  3. Is the call worth selling?

    Nobody says “I prefer total downside and limited upside”. But bond investors choose this all the time. Because the relevant question is about PRICE. Any proposition can be ruined or alluring depending on the price. An option’s price is simply the payoff in a future state of the world discounted by the probability of reaching that state.

    When you sell an option, you don’t earn income. You just bet against some future state of the world. Whether this was a good idea or not depends on the price. Price is the market-implied odds. The actual odds are an imaginary idea. Price is a flesh-and-blood painting of the idea that you can interact with. Unless your day job is to figure out if the depiction of that idea, the price, is accurate, it’s best to assume it is.

    Suppose, instead of selling that 45 strike call for $5 you could sell it for $6. This parallel shifts the hockey stick $1.00 higher.

    This is a more attractive pay-off, but as a covered call writer, you need to ask yourself…is it attractive enough? Let me answer for you.

    You have no idea.

    What would you need to know to even evaluate the question “is it attractive enough”?

    You’d need to know something about the odds of the stock making an X% move by the expiration date. This is mostly what we mean when we say “volatility”. How will you know those odds? You can’t. You can only guess. And that’s what the price was in the first place: the wisdom-of-crowds guess. Do you have a reason to believe you can beat the line? What do you see that option price-setters don’t?

    Professional volatility traders have an opinion as to what the fair value of the option is. If they sell an option for more than its alleged “fair value”, some internal accounting systems may allow them to book the excess premium as “income”. But they would call that “theoretical edge” or “theo”, not income. And even that edge is taken with truckloads of salt. 1
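To make the “is $6 attractive enough?” question concrete: under a textbook Black-Scholes model (zero rates, no dividends), the very same premium is rich or cheap purely depending on your volatility guess. The 3-month tenor and the vol levels below are illustrative assumptions, not anything from the text:

```python
from math import log, sqrt, erf

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, vol, t):
    """Black-Scholes European call value, zero rate, no dividends."""
    d1 = (log(spot / strike) + 0.5 * vol * vol * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * norm_cdf(d2)

# Fair value of the 45-strike call on a $50 stock, 3 months out,
# at three different volatility guesses:
for vol in (0.20, 0.40, 0.80):
    print(f"vol {vol:.0%}: fair value = {bs_call(50.0, 45.0, vol, 0.25):.2f}")
```

At a 20% vol guess the model's fair value is below $6, so selling at $6 books “theo”; at a 40% guess it is above $6, so the same sale gives edge away. Same premium, opposite verdicts.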

How To Respond To Your Advisor

If [insert “options as income” advisor] thinks you should sell calls ask them:

Is the option overpriced?

They won’t say no. They probably also won’t say yes, since how the hell do they know. They’ll say:

“You’ll be happy if the stock gets there.”

Sure, you might be happy if the stock goes to your strike. But that’s cherry-picking the point of maximum happiness for any short option position. It’s literally the short option position’s home-run scenario. Your broker is selling you on the best-case scenario. The remaining win-vs-lose scenarios are painfully asymmetric:

  • In a scenario where the stock goes up a lot, you are getting unboundedly sadder.
  • In the case where the stock goes down a lot (assuming you weren’t going to sell no matter what), you are better off by the amount of the premium you sold, which is capped.
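The asymmetry is easiest to see as an opportunity cost versus simply holding the stock. A sketch reusing the 45-strike/$5 example from above (the function name is mine):

```python
def overwrite_vs_hold(s, strike=45.0, premium=5.0):
    """How much better (+) or worse (-) the covered call does vs. just holding the stock."""
    return premium - max(s - strike, 0.0)

print(overwrite_vs_hold(0))      # crash: better off, but capped at the +5 premium
print(overwrite_vs_hold(100))    # rip: 50 worse than just holding
print(overwrite_vs_hold(1000))   # 950 worse ... the "unboundedly sadder" leg
```

The benefit on the downside can never exceed the premium; the regret on the upside has no ceiling.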

Do not benchmark your opinion of the trade to the stock-grinds-up-to-my-short-strike scenario.

The Main Takeaways

  1. In the single stock game, you cannot afford to NOT get piggish results on the upside since most single stocks have awful long-term returns. You will be sad if you invest in securities with unlimited upside but systematically truncate that upside.
  2. Here’s a link to the document I wish I wrote. It’s highly intuitive. It does better than explain: it shows how you are incinerating money if you sell calls below what they are worth, even when you are “just overwriting”.


Final Word

It’s possible your advisor doesn’t totally grok the concept as laid out in QVR’s document or even what I wrote about. They have been bombarded with so much callsplainin’ that the discourse has been vocally one-sided. This post is probably in vain, but perhaps one RIA at a time, we can move past “selling options for income” as they internalize that:

  • The price of the option is central to the proposition.
  • Since what drives price is complex, any discussion about the attractiveness of overwriting becomes more nuanced.

As far as option promoters and authors who treat an entire premium as passive income? Clowns.

If I’m aggressive in saying that, it’s because the overwriting fetish is so widespread; there’s nothing to do but make people feel bad about a naive, unsound practice that hinges on “you’ll be happy anyway, even if you lose”. That’s utter garbage. The difference between a winning poker player and a losing poker player might be a single big blind per hour. You cannot afford to just piss away expectancy.

So when you see these promoters you can safely dismiss them as charlatans. We need fewer of those these days.

You’re welcome for the very simple, reductionist negative screen. I just saved you many hours of brain damage, a trip to Orlando for that “Make $10k Per Week” options seminar, and the $899 “course materials” emblazoned with a pic of someone who probably looks like me2 with slicked-back hair in a rented Lambo. You can smell the Drakkar Noir from the glossy page.

Actual option traders don’t wear suits. And they don’t tell you to sell calls for income.