Momentum Psychology

“when assets are expensive, make assets,” they said.

cc @cobie pic.twitter.com/QKCNm2BGRd

— Corey Hoffstein 🏴‍☠️ (@choffstein) January 18, 2022

His tweet brought my attention to @cobie and his masterful description of psychology.

I also appreciated Josh Brown’s take on the sell-off in so-called growth or momentum names. Here’s an excerpt from Jan 31’s It’s not over yet.

I’m less interested in the real-time action. Focus on the evergreen psychology instead:

Where do bounces come from in a midst of a correction?

Sometimes it’s just that stocks have fallen too far for sellers to want to keep selling. This isn’t bullish. In fact, this type of bounce can suck people back in by creating the appearance that the worst is over. Growth stocks in particular. Because belief dies hard and enthusiasm for cutting edge technologies fades slowly, not suddenly. Which mean the give-up process is long and drawn out – even after a stock is cut in half sometimes the worst is still yet to come. The slow bleed after is often worse than the initial shocking drop that preceded it.

Over at Verdad Capital, Dan Rasmussen revisits their “Bubble 500” list of overpriced growth stocks, originally created in the Summer of 2020. It’s filled with money-losing companies working in exciting areas of technology such as electric vehicles and gene editing therapy and so on. Needless to say, this list of bubble stocks has gotten absolutely destroyed year-to-date, after having run straight up in Verdad’s face through the middle of 2021. Dan explains two very important things in his update this week: The first is that sell-offs for growth stocks differ from sell-offs for value stocks in one very important way:

This breakdown is significant, especially for growth stocks. Remember, growth stocks trend, and value stocks mean revert. The psychology is simple. People hear about a hot stock that’s gone up 3x, they buy some, it goes up 2x, they buy more: the whole attraction of buying a hot growth stock is the historic return trajectory. Value stocks are the opposite: you do well buying them when they’re down…

This idea is counterintuitive – that some stocks actually become worse buys as they are falling to lower prices, but the explanation is psychological, not financial. Stocks trading at excessive valuations require a fan base to sustain their share prices. That fan base is often a bandwagon-jumping melange of traders and investors who are attracted to recent gains. Yes, they’ll latch onto the fundamental story, but the fact that the stock has been and currently is going up is the main thing. When the stock breaks, so too does the fandom. And when the fan base moves on to greener pastures or runs out of money, a new fan base will not form for this stock with its chart in decline. Broken growth stocks become orphans. There is no natural place for them to find a home.

Momentum is a divergent strategy while “value” is a mean-reverting strategy. Several years ago the research team at OSAM published edifying papers on how these approaches work. I wrote a summary here:

✍️ Notes on OSAM’s Factors from Scratch (6 min read)

Value works by fading overreaction. Momentum is attributed to underreaction. In a name trending higher, the sellers are discounting the substance of new information too aggressively. In dork world, we call this anchoring. If you pay attention to “anomalies” you may recognize the concept of post-earnings drift as an acute example of anchoring. Wikipedia even has an entry for it.

Fear or FOMO in markets cuts both ways. On the way down, we fear a loss of wealth. On the way up we fear social embarrassment — we aren’t keeping up with our neighbors. We are caught between self-preservation and shame. I wonder if being part of the herd is any consolation on the way down while everyone loses. Or is this just another miserable psychological asymmetry inseparable from speculation?

Anyway, I don’t have much to add. Investing requires you to be honest about your desires, constraints, and emotional tolerance. If you can get honest with yourself, you can initiate a plan that you can stick to. You want to avoid ad-hoc decisions with the bulk of your savings (I’m not gonna poo poo on gambling with 1 or 2% of your wealth, especially if it suppresses wider risk-seeking behavior. Agustin Lebron’s Laws of Trading has a provocative section about “risk set points” that operate like weight set points. If your life becomes too dull in one way you spice it up in another and vice versa. Maybe Alex Honnold’s portfolio is all in bonds 🤷🏽).

Oh and just a quick observation that you can ponder in the context of those flashy, earningless momentum stocks. If you start at $1 and double 6x you get to $64. If the stock then drops 50% to $32, you’ve only erased 1 doubling.
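If you want to see the arithmetic, here is a minimal sketch. The function name and the extra 90% drawdown case are mine, added purely for illustration.

```python
import math

def doublings_remaining(start: float, doublings: int, drawdown: float) -> float:
    """Doublings above `start` that survive a fractional drawdown from the peak."""
    peak = start * 2 ** doublings
    price = peak * (1 - drawdown)
    return math.log2(price / start)

peak = 1 * 2 ** 6          # $1 doubled 6 times
after_drop = peak * 0.5    # a 50% drop from the peak

print(peak, after_drop)                  # 64 32.0
print(doublings_remaining(1, 6, 0.5))    # 5 of the 6 doublings are still intact
print(doublings_remaining(1, 6, 0.9))    # even a 90% drop leaves more than 2 doublings
```

A stock that is "cut in half" can still be sitting on most of its run-up, which is why the halving by itself says little about whether it is cheap.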

Be careful knife catching.

Moontower #142

First, a giant thank you for reading this letter. This is Moontower’s 3-year anniversary. This week the 3,000th subscriber joined. Thank you to the 45 people or so who agreed to read that very first issue.

Writing online allowed me to unlock myself in ways that I couldn’t without your support. So truly, thank you.

Ok friends, let’s proceed.

This week, I published the sequel to There’s Gold in Them Thar Tails.

It begins with a recap of part 1:

We saw that an explosion of choice, whether it’s job or college applicants, songs to listen to, or athletes to recruit, has made selection increasingly difficult.
A natural response is to narrow the field by filtering more narrowly. We can do this by making selection criteria stricter or deploying smarter algorithms and recommendation engines.
This leads to increased reliance on legible measurements for filtering.
Goodhart’s law expects that the measures themselves will become the target, increasing the pressure on candidates to optimize for narrow targets that are imperfect proxies or predictors of what the measure was filtering for.
Anytime we filter, we face a trade-off between signal (“My criteria is finding great candidates”) and diversity. This is also known as the bias-variance trade-off.
Diversity is an essential input to progress. Nature’s underlying algorithm of evolution penalizes in-breeding.
In addition to a loss of diversity, signal decays as you get closer to the extremes. This is known as tail divergence. The signal can even flip (ie Berkson’s Paradox).
The point where the signal noise overwhelms the variance in the candidates is an efficient cutoff. Beyond that threshold, selectors should think more creatively than “just raise the bar”.

Part 1 ends with a discussion of strategies for selectors and selectees.

Part 2 extends the discussion with what tail divergence says about life and investing.

✍️There’s Gold In Them Thar Tails: Part 2 (24 min read)

It’s a long post including footnotes, but there is a large section about options trading that will only appeal to masochists.

The post roadmap:

We begin with the challenge of scaling our moral intuitions up to an age where our ethics must be explicitly coded. AI and automation require making species-level questions less rhetorical.
The prescription is humility. The simple math of regression shows this as correlations break down or invert in the extremes. The CAPM to Hedging post was actually a diversion that ended up being a stand-alone post as I was writing the tiny math section in Part 2.
From there we move into trading and investing strategies to exploit our misunderstanding of extremes. 2 words: respect path. We start with a story of a famous investor/governor who stubbornly didn’t respect path.
We then talk about familiar path-respecting approaches to investing: care with leverage, appreciating “rebalance timing luck” (hugs to @choffstein) and finally thinking about path vs terminal value.
This opens the door to a discussion of trade expressions and the need to map them tightly to our isolated trade premises. I use options to demonstrate 3 path-aware approaches: static, dynamic, combined.
The combined approach is touched on briefly. It’s technical but not complicated and lends itself more to a video (maybe one day). It’s also an example of why it’s useful to understand option structures and the basic arbitrage relationships.
Then we move to general investing styles that respect path: venture and “gorilla” investing. They both know what they don’t know about extreme outcomes and construct strategies that are robust to that reality. I’m not an expert in those approaches but was drawn to their common link of manufacturing convexity. Convexity is not volatility or leverage. It’s the slope of your p/l steepening in the direction of the market because your position size changes.
When your signals are weak as they are for extreme outcomes, you want to preserve convexity into the unknown. If you can do that, you can funnel wider. This can be higher yielding than tuning your signals harder.
Review and concluding remarks — Happy prospecting!

Money Angle

Speaking of Corey Hoffstein:

“when assets are expensive, make assets,” they said.

cc @cobie pic.twitter.com/QKCNm2BGRd

— Corey Hoffstein 🏴‍☠️ (@choffstein) January 18, 2022

His tweet brought my attention to @cobie and his masterful description of psychology.

I also appreciated Josh Brown’s take on the sell-off in so-called growth or momentum names. Here’s an excerpt from Jan 31’s It’s not over yet.

I’m less interested in the real-time action. Focus on the evergreen psychology instead:

Where do bounces come from in a midst of a correction?

Sometimes it’s just that stocks have fallen too far for sellers to want to keep selling. This isn’t bullish. In fact, this type of bounce can suck people back in by creating the appearance that the worst is over. Growth stocks in particular. Because belief dies hard and enthusiasm for cutting edge technologies fades slowly, not suddenly. Which mean the give-up process is long and drawn out – even after a stock is cut in half sometimes the worst is still yet to come. The slow bleed after is often worse than the initial shocking drop that preceded it.

Over at Verdad Capital, Dan Rasmussen revisits their “Bubble 500” list of overpriced growth stocks, originally created in the Summer of 2020. It’s filled with money-losing companies working in exciting areas of technology such as electric vehicles and gene editing therapy and so on. Needless to say, this list of bubble stocks has gotten absolutely destroyed year-to-date, after having run straight up in Verdad’s face through the middle of 2021. Dan explains two very important things in his update this week: The first is that sell-offs for growth stocks differ from sell-offs for value stocks in one very important way:

This breakdown is significant, especially for growth stocks. Remember, growth stocks trend, and value stocks mean revert. The psychology is simple. People hear about a hot stock that’s gone up 3x, they buy some, it goes up 2x, they buy more: the whole attraction of buying a hot growth stock is the historic return trajectory. Value stocks are the opposite: you do well buying them when they’re down…

This idea is counterintuitive – that some stocks actually become worse buys as they are falling to lower prices, but the explanation is psychological, not financial. Stocks trading at excessive valuations require a fan base to sustain their share prices. That fan base is often a bandwagon-jumping melange of traders and investors who are attracted to recent gains. Yes, they’ll latch onto the fundamental story, but the fact that the stock has been and currently is going up is the main thing. When the stock breaks, so too does the fandom. And when the fan base moves on to greener pastures or runs out of money, a new fan base will not form for this stock with its chart in decline. Broken growth stocks become orphans. There is no natural place for them to find a home.

✍️ Notes on OSAM’s Factors from Scratch (6 min read)

Be careful knife catching.

From My Actual Life

I wrote this 1 year ago but I’m reprinting it with updated ages. I flew into NYC on St. Patrick’s day this week, 6 years after this story. I spent yesterday at my nephew’s birthday in NJ.

St. Patrick’s Day now reminds me of a story that is now 6 years old…

March 17, 2016. I flew into NYC for a 36-hour business trip. I was hopping around the city meeting with bank derivative sales desks. Routine relationship maintenance. I planned poorly. I was late to every meeting since you can’t cross 5th Ave during the St Patty’s parade.

Anyway, that evening I was at dinner as a client. When I went to the restroom I checked my phone. My family chat was blowing up.

My sister just had a baby.

I hadn’t told my east coast fam I was in NYC because it was just a quick trip. But right then, I called my mom in NJ and stunned her with the knowledge that I was an hour away in NYC.

When I returned to the table, I excused myself from dinner, hopped on a bus to my childhood house in Hazlet, borrowed my mom’s car and drove down to Jersey Shore Medical Center.

It was close to midnight.

When I walked into the hospital room I’ll never forget my sister’s look of ‘what are you doing here?’

I got to meet my new nephew, spent an hour chatting with my sis and her husband, and made it back to NYC with enough time to grab my bags and get back to JFK.

Since then, St Patrick’s Day has meant much more than day drinking.

Happy 6th birthday to my nephew!

There’s Gold In Them Thar Tails: Part 2

This is Part 2 of a discussion of how sourcing talent or outcomes in the tails or extremes of a distribution calls for our selection criteria to embrace more variance than searches in the heart of a distribution. To catch up please read There’s Gold In Them Thar Tails: Part 1.

If you can’t be bothered here’s the gist:

We saw that an explosion of choice, whether it’s job or college applicants, songs to listen to, or athletes to recruit, has made selection increasingly difficult.
A natural response is to narrow the field by filtering more narrowly. We can do this by making selection criteria stricter or deploying smarter algorithms and recommendation engines.
This leads to increased reliance on legible measurements for filtering.
Goodhart’s law expects that the measures themselves will become the target, increasing the pressure on candidates to optimize for narrow targets that are imperfect proxies or predictors of what the measure was filtering for.
Anytime we filter, we face a trade-off between signal (“My criteria is finding great candidates”) and diversity. This is also known as the bias-variance trade-off.
Diversity is an essential input to progress. Nature’s underlying algorithm of evolution penalizes in-breeding.
In addition to a loss of diversity, signal decays as you get closer to the extremes. This is known as tail divergence. The signal can even flip (ie Berkson’s Paradox).
The point where the signal noise overwhelms the variance in the candidates is an efficient cutoff. Beyond that threshold, selectors should think more creatively than “just raise the bar”.

At the end of part 1, there were strategies for both the selector and the selectees to increase diversity to improve outcomes in the extremes.

If narrower filters are less effective in the tails (ie more noise, weaker correlations between criteria and match quality), we should be intentional about the randomness we introduce to the process. A 1500 SAT is a noisy predictor of “largest alumni donor 20 years from now”. Instead, accept the 1350 SAT from the homeschooled kid in Argentina. Experiment with criteria and let chance retroactively hint at divergent indicators that you would never have thought to test. One of the benefits of such an experiment is that if you are methodical about how you introduce chance you can study the results for a hidden edge. If nobody else has internalized this thinking because they think it’s too risky (it’s not…the signal of the tighter filter had already degraded), then you have an opportunity to leap ahead of your competitors who underestimate the optionality in trying many recipes and keeping the ones that taste good. You tolerate some mayonnaise liver sandwiches before you discover pb&j.

In part 2, we reflect on what tail divergence says about life and investing.

Where Instincts Fail

Tail divergence is the simple observation that attributes that correlate with certain outcomes lose their predictive ability as we get into the extremes. If you are 6’7, you’re better at basketball than most of the population. But you couldn’t set foot on the hardwood with the lowly Rockets’ 12th man. Taken further, Berkson’s Paradox shows that it’s possible for the correlation to flip. LessWrong thinks the flippening may be causal because of too much of a good thing:

Maybe being taller at basketball is good up to a point, but being really tall leads to greater costs in terms of things like agility… Maybe a high IQ is good for earning money, but a stratospherically high IQ has an increased risk of productivity-reducing mental illness. Or something along those lines.

The safest generalization to absorb:

When speculating about the tails of a distribution your intuition is less reliable.

If you can pinpoint causality, that’s a bonus. Simply realizing your guesses about extremes are random is an advantage. It splits your brain wide open to get your imagination oxygen.

Behavioral psychology recognizes the usefulness of heuristics to make judgements while highlighting how “biases” such as framing can short-circuit our “System 1” machinery. Intuition is a useful guide when we have deep experience in a domain, but we should seek external data (base rates) or guidance when we stray from the mundane.

If our intellectual adventures take us from “mediocristan” to “extremistan” then data is not necessarily a helpful tour guide. It can even be harmful if it encourages a false sense of security or a load-bearing assumption that turns out to be hollow ¹.

A recent example of intuition failing in an extreme scenario still stings. When Covid first started spreading in the US, asset prices and city rents dove lower. Financial markets stabilized and began recovering when the government committed to replacing lost demand with an unprecedented fiscal package for an unprecedented event. My suburban house shot up 15% in value as locked-down city dwellers wanted more space. Seeing the divergence between home price and rentals, I quickly diagnosed the home price bump as a premium needed to absorb a sudden, but transitory urban exodus until we could get a vaccine. While it wasn’t the main consideration for selling, the “trade setup” was not lost on me. My intuition in this extreme scenario couldn’t have fathomed that the price would shoot 20% more (and still going, ughh) through where I sold as the lockdowns lifted. My trading intuition degrades less gracefully than I’d like to admit as the orbits get further from financial options.

Moral Intuition

As technology and science fiction converge, it would be dangerous to lazily extrapolate how we handle routine computer-enabled behavior to edge cases. If you have ever played dark forms of “would you rather?” then you are already familiar with the so-called trolley problem:

credit: abpradio.com

The Conversation explains the so-called trolley problem in the context of self-driving cars:

The car approaches a traffic light, but suddenly the brakes fail and the computer has to make a split-second decision. It can swerve into a nearby pole and kill the passenger, or keep going and kill the pedestrian ahead.

This is spiky terrain. What is the value of a life? This is not a novel dilemma. In Tails Explained, I show how courts use probabilities of accidental (ie rare) deaths to estimate tort damages. What is novel is the scale of these considerations once robots take the wheel. The giant fields of AI safety and ethics are proof that scaling up tort law is not going to cut it. We are forced to explicitly study realms that ancient moralities only needed to consider rhetorically.

In Spot The Outlier, Rohit writes:

the systems we’d developed to intuit our way through our lives have difficulty with contrived examples of various trolley problems, but that’s mainly because our intuitions work in the 80% of cases where the world is similar to what we’ve seen before, and if the thought experiment is wildly different (e.g., Nozick’s pleasure machine) our intuitions are no longer a reliable guide.

In The Tails Coming Apart As A Metaphor For Life, Slatestarcodex says:

This is why I feel like figuring out a morality that can survive transhuman scenarios is harder than just finding the Real Moral System That We Actually Use. There’s a potentially impossible conceptual problem here, of figuring out what to do with the fact that any moral rule followed to infinity will diverge from large parts of what we mean by morality.

A wave of exponential automation threatens to capsize our moral rafts. Slatestar invokes one of my favorite paragraphs² of all-time to make his point.

When Lovecraft wrote that “we live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far”, I interpret him as talking about the region from Balboa Park to West Oakland on the map above [This is a metaphor for moral territory he builds in the full post].

Go outside of it and your concepts break down and you don’t know what to do.

The full opening paragraph of The Call of Cthulhu deserves your eyes:

The most merciful thing in the world, I think, is the inability of the human mind to correlate all its contents. We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far. The sciences, each straining in its own direction, have hitherto harmed us little; but some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality, and of our frightful position therein, that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.

Slatestar edits Lovecraft:

The most merciful thing in the world is how so far we have managed to stay in the area where the human mind can correlate its contents.

This is not an optimistic outlook for our ability to reconcile our locally based morality with a species-level perspective. Reasoning about extremes is more futile than we’d like to think. As we search for outliers, we need humility.

Even The Math Prescribes Humility

Let’s translate tail divergence to math terms. We discussed how SAT scores have predictive power for GPA. The issue is that this power loses efficacy as we get to the top tier of GPAs, just as being tall starts to tell us less about the best basketball players once we are dealing with the sample that has made it to the NBA.

This loss of signal manifests as a correlation breakdown over some range of the X or explanatory variable. This is the result of the error terms or variance in a regression increasing or decreasing over some range. The fancy word for this is “heteroscedasticity”.

See this made-up example from 365DataScience:

The variance of the errors visibly changes as we move from small values of X to large values.

It starts close to the regression line and goes further away. This would imply that, for smaller values of the independent and dependent variables, we would have a better prediction than for bigger values. And as you might have guessed, we really don’t like this uncertainty.

Ordinary least squares (ie OLS) regression is a common technique for computing a correlation. However, equal variance (homoscedasticity) is one of the 5 assumptions embedded in OLS. Tail divergence is evidence that the data set violates this assumption, so we shouldn’t be surprised when the filters we used in the meat of the distributions lose efficacy in the extremes.

If we broke the regression into 2 separate lines, one for the low to middle range of SAT scores and one for the top decile of SAT scores, we could compute different correlations to GPA. If the tails diverge, we would see a lower correlation for the higher range. Even correlations as high as 80% have discouragingly little explanatory power.

For the derivation, see From CAPM To Hedging.
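As a quick illustration of why an 80% correlation is less impressive than it sounds, here is a minimal sketch. It leans on the standard result for a simple linear fit: the unexplained standard deviation of y is its total standard deviation times sqrt(1 - correlation²). The function name is my own shorthand for that quantity.

```python
import math

def risk_remaining(correlation: float) -> float:
    """Fraction of y's standard deviation left unexplained by a linear fit on x."""
    return math.sqrt(1.0 - correlation ** 2)

for rho in (0.5, 0.8, 0.9):
    print(
        f"correlation {rho:.0%}: "
        f"variance explained {rho ** 2:.0%}, "
        f"risk remaining {risk_remaining(rho):.0%}"
    )
```

At an 80% correlation the fit explains 64% of the variance, yet roughly 60% of the volatility in the outcome is still unexplained.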

We shouldn’t be surprised when the most successful person from your 8th grade class wasn’t a candidate for the “most likely to succeed” ribbon. The qualities that informed that vote leave a lot of “risk remaining” when trying to predict the top performers in the wide-open game of life.

Since the nature of extremes is untamed, we need humility. This is true, but abstract. What does “humility” mean practically? It means making decisions that are robust to the lack of determinism in the tails. In fact, we can construct approaches that actively seek to harness the variance in the tails.

The world of trading and investing is a perfect sandbox to explore such approaches.

Take Advantage of Poor Tail Intuition In Investing

I know the heading is ironic.

Let’s see if we can use “option-like” approaches to use the divergence or uncertainty in the tails to our advantage.

Respect Path

Rohit summarized the argument succinctly:

If measurement is too strict, we lose out on variance.
If we lose out on variance, we miss out on what actually impacts outcomes.

Tails are unpredictable by the same models that might be well-suited for routine scenarios. In fact, rare outcomes can be stubbornly resistant to description by any models in a complex system. The robust response to this situation is not to lean into our models but to relax the filters in favor of diversity, which increases our chance of capturing an outcome nobody has foreseen, because, by definition, nobody’s model could have predicted it (and therefore bid it up) in the first place.

How do you do that?

2 words: Respect. Path.

Recall David Epstein’s research-based suggestion from part 1:

One practice we’ve often come back to: not forcing selection earlier than necessary. People develop at different speeds, so keep the participation funnel wide, with as many access points as possible, for as long as possible. I think that’s a pretty good principle in general, not just for sports.

What does this mean in a trading context?

This is easy to explain by its opposite. Let’s rewind a decade. Jon Corzine managed to blow up MF Global by focusing on the belief that European bonds (remember the Greek bond crisis?) would pay out in the end and placing that bet with extreme leverage. While the bonds eventually paid out, the margin calls buried MF Global. This is a common story. I chose it because it exemplifies how a lack of humility is the murder weapon.

The moment you employ leverage, you are worshiping at the altar of path. Corzine refused to make the appropriate sacrifices to the gods. He focused on the terminal value of the bonds. A focus so myopic, Corzine still stubbornly clings to the idea that he was right. [I once went to dinner with an option trader who worked closely with Corzine. He described him as both smart and unfazed in his path-blindness. I’d like to take issue with “smart” but he’s the one giving a fortune away, so I’ll just shut up.]

He might be rich, but if you were a stakeholder or client in MF Global, he’s a villain. Let’s not be like Jon Corzine.

Ways To Respect Path

Treat leverage with respect

The most common forms of financial leverage we employ are mortgages. The primary path risk here is needing to re-locate suddenly and potentially needing to sell at a bad time. If there are many potential forks on your horizon, the liquidity in renting can be worth it³.

“Rebalance timing luck”

This is a term coined by Corey Hoffstein in his paper The Dumb (Timing) Luck of Smart Beta. First of all, this topic is central to any analysis of performance. You can have 10 different trend-following strategies with the same approximate rules but if they vary in their execution by a single day, the impact of luck can be tyrannical. Imagine one strategy was long oil the day it went negative, another strategy got out of the position one day earlier. Is the difference in performance predictive? It’s a bedeviling issue for allocators trying to parse historical returns.

If timing is not part of your alpha, then leaving it to chance can swamp the edge you worked so hard to find, capture, and market to investors. This is a recipe for disappointment for either the manager (who gets unlucky) or the investor who chose the fund from a crop of competitors based on noise.

Respecting path means smoothing the effect of rebalance timing luck. This is commonly done by dividing a single strategy into multiple strategies differing only by their rebalance schedule. The ensemble will average the luck across executions, hopefully adhering the results closer to its intended expression.
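Here is a toy sketch of that ensemble idea. Everything in it (the function names, the daily signal, the 4-day rebalance period) is a made-up illustration rather than the method from Corey’s paper: each “sleeve” follows the same signal but only refreshes its position on its own rebalance days, and the ensemble holds the average.

```python
from statistics import mean

def held_position(signal, period, offset):
    """Daily position of a sleeve that copies the signal only on its rebalance
    days (offset, offset + period, ...). It is flat (0.0) before its first rebalance."""
    positions, held = [], 0.0
    for day, target in enumerate(signal):
        if day >= offset and (day - offset) % period == 0:
            held = target
        positions.append(held)
    return positions

def ensemble_position(signal, period, n_sleeves):
    """Average position across sleeves whose rebalance days are evenly staggered."""
    offsets = [i * period // n_sleeves for i in range(n_sleeves)]
    sleeves = [held_position(signal, period, off) for off in offsets]
    return [mean(day_positions) for day_positions in zip(*sleeves)]

# Hypothetical trend signal: long for 3 days, then flips short on day 3.
signal = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, -1.0]

print(held_position(signal, period=4, offset=0))  # stays long through day 3, flips on day 4
print(held_position(signal, period=4, offset=3))  # happens to rebalance on the flip day
print(ensemble_position(signal, period=4, n_sleeves=4))
```

The two single sleeves disagree about day 3 purely because of when they rebalance. The ensemble transitions gradually, so no single rebalance date decides the outcome.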

Path vs terminal value thinking

Corzine had a terminal value opinion (“if I hold these bonds to maturity I’ll get paid”). Still, any trade that is marked-to-market must still weather path. Leverage makes the trade acutely fragile with respect to path. Even if his bet was a good one at the time, the expression was negligent because it did not properly reflect his constraints.

It’s critical that the expression of a bet clings closely to its thesis. If you want to bet on the final outcome of a trade, you need to insulate the expression from path. Similarly, you can bet on path while being indifferent to the final outcome. For example, a momentum investor may devise a rule-based strategy to levitate with an inflating bubble but exit before holding the bag. These participants bet on path not terminal value. The past few years have glorified such a game of hot potato.

Whether this game of hot potato is really a game of Russian roulette depends on the expression. Many momentum strategies use stops or trailing stops to escape a trade where the trend has petered out or reversed. This expression mimics a long option position. They are creating unbounded upside and limiting their downside. This expression is banking on a dangerous assumption: liquidity. They are constructing a “soft” option presumably because they think it’s cheaper than purchasing a financial or what I call a “hard” or contractual option.

Let’s ignore realized volatility which is a first order determinant of whether the option is cheaper. The biggest problem is gap risk. Soft-option constructions assume continuity. But we know technology breaks, markets close, stocks get halted, countries invade each other, exchanges cancel trades. Pricing gap risk is impossible. That’s why derivative traders say the only hedge for an option is a similar option. Trading strategies are said to be robust to model risk if they contain offsetting exposures to the same model. If you’re short a call option on TSLA the only real hedge is to be long a different TSLA call. Reliance on the mathematical model cancels out.
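To make gap risk concrete, here is a minimal sketch with made-up numbers. It assumes a stop can only fill at a price that actually trades, so a gap through the stop fills at the gapped price. The entry, stop level, price paths, and put premium are all hypothetical.

```python
def stop_fill_price(prices, stop):
    """Fill price of a stop order: the first traded price at or below `stop`.
    Returns the last price if the stop is never touched."""
    for price in prices:
        if price <= stop:
            return price
    return prices[-1]

entry, stop = 100.0, 90.0
put_premium = 3.0  # hypothetical cost of a 90-strike put (the "hard" option)

continuous_path = [100.0, 97.0, 94.0, 91.0, 90.0, 88.0]  # price trades at every level
gapped_path = [100.0, 97.0, 94.0, 60.0, 55.0]            # price gaps through the stop

soft_loss_continuous = entry - stop_fill_price(continuous_path, stop)
soft_loss_gapped = entry - stop_fill_price(gapped_path, stop)
hard_max_loss = (entry - stop) + put_premium  # worst case for stock plus put, on any path

print(soft_loss_continuous)  # 10.0
print(soft_loss_gapped)      # 40.0
print(hard_max_loss)         # 13.0
```

The soft option looks cheaper on the continuous path, but its loss is only as bounded as the liquidity behind the stop. The contractual option costs premium up front and caps the loss regardless of path.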

Zooming in on options (feel free to skip and jump down to Investing for Path)

Some market participants focus on terminal value or the “long run” while others are focused on path. Option prices are consensus mechanisms that balance both views. I discussed this in What The Widowmaker Can Teach Us About Trade Prospecting And Fool’s Gold:

The nat gas market is very smart. The options are priced in such a way that the path is highly respected. The OTM calls are jacked, because if we see H gas trade $10, the straddle will go nuclear.

Why? Because it has to balance 2 opposing forces.

1. It’s not clear how high the price can go in a true squeeze or shortage
2. The MOST likely scenario is the price collapses back to $3 or $4.

Let me repeat how gnarly this is.

The price has an unbounded upside, but it will most likely end up in the $3-$4 range.

Try to think of a strategy to trade that.

Good luck.

- Wanna trade verticals? You will find they all point right back to the $3 to $4 range.
- Upside butterflies which are the spread of call spreads (that’s not a typo…that’s what a fly is…a spread of spreads. Prove it to yourself with a pencil and paper) are zeros.

The market places very little probability density at high prices but this is very jarring to people who see the jacked call premiums.

That’s not an opportunity. It’s a sucker bet.
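The “spread of spreads” claim in the excerpt can also be checked numerically instead of with pencil and paper. Here is a minimal Python sketch, assuming hypothetical strikes of 8, 9 and 10 and looking only at expiration payoffs (premiums are ignored, and the function names are mine):

```python
def call_payoff(spot, strike):
    """Expiration payoff of one long call."""
    return max(spot - strike, 0.0)


def call_spread_payoff(spot, low_strike, high_strike):
    """Long the low-strike call, short the high-strike call."""
    return call_payoff(spot, low_strike) - call_payoff(spot, high_strike)


def butterfly_payoff(spot, k1, k2, k3):
    """Long 1 k1 call, short 2 k2 calls, long 1 k3 call."""
    return (call_payoff(spot, k1)
            - 2 * call_payoff(spot, k2)
            + call_payoff(spot, k3))


# Hypothetical strikes, chosen only for illustration
K1, K2, K3 = 8.0, 9.0, 10.0

# Compare the fly to "long the 8/9 call spread, short the 9/10 call spread"
# on a grid of expiration prices from 0 to 20 in steps of 0.5.
for step in range(41):
    spot = step / 2
    fly = butterfly_payoff(spot, K1, K2, K3)
    spread_of_spreads = (call_spread_payoff(spot, K1, K2)
                         - call_spread_payoff(spot, K2, K3))
    assert abs(fly - spread_of_spreads) < 1e-12

print("The butterfly matches the spread of call spreads at every grid point.")
```

The fly only pays between the outer strikes, which is why its price is a rough read on how much probability the market places in that zone.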

Investors with different time horizons often trade with each other. It’s even possible they have the same long-term views but Investor A thinks X is overbought in the near-term and sells to Investor B who just wants to buy-and-hold. Investor A is hoping to buy X back cheaper. They are trying to time the market and generate trading P/L, expecting to find a more attractive entry to X later. Perhaps A is a trader more than an investor. A is obsessively conscious of near-term opportunity costs or hurdle rates. As an options trader, I am generally more focused on path than terminal value.

Let’s see how trade expression varies with your lens of terminal value vs path.

Static Expressions

A static trade expression means you put your trade on and leave it alone until some pre-defined catalyst. For options this is typically expiration. The reason you might do this is you are aware that you cannot predict the path but do not want to be shaken out of the position because you like the odds the market is offering on the terminal value of a proposition. To use natural gas, suppose the gas futures surge to $6 amidst a polar vortex but you think there is a 25% chance the price falls to $4.50 by expiration.

Suppose you can buy a vertical spread that pays 4-1 on that proposition. The bet is positive expectancy so you decide to take it. This is a discrete bet. The worst-case scenario is losing your premium. You can size the trade by feel (I’m willing to risk 1% to make 4%) or some version of Kelly sizing. Instead of trading towards a target amount of risk (whether that’s delta, vega, etc) you budget a fixed dollar amount towards it and let it ride. I refer to this type of bet as “risk-budgeting”.
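To make the arithmetic concrete, here is a minimal Python sketch of the expectancy and a Kelly-style size for that bet. It assumes the vertical behaves like a binary bet (it either pays the full 4-1 or loses the premium), which is a simplification since a real spread can settle between its strikes. The function names are mine.

```python
def expectancy_per_dollar(p_win, odds):
    """Expected profit per $1 of premium on a binary bet paying `odds`-to-1."""
    return p_win * odds - (1.0 - p_win)


def kelly_fraction(p_win, odds):
    """Kelly fraction of bankroll for a binary bet: (b*p - q) / b."""
    return (p_win * odds - (1.0 - p_win)) / odds


P_WIN = 0.25  # the 25% chance from the example
ODDS = 4.0    # the spread pays 4-1

edge = expectancy_per_dollar(P_WIN, ODDS)
full_kelly = kelly_fraction(P_WIN, ODDS)

print(f"Expected profit per $1 risked: {edge:.2f}")
print(f"Full Kelly fraction of bankroll: {full_kelly:.4f}")
```

Under these assumptions full Kelly is larger than the “risk 1% to make 4%” size by feel, which fits the habit of betting some fraction of Kelly when the probability estimate itself is uncertain.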

When “risk-budgeting” a trade you specify a fixed bet size and you do not use leverage or pseudo-leverage (for example taking a short option position which demands margin). The point is to set-it-and-forget-it.

These types of trades were a small minority of my allocations, but they are the easiest to manage. By design, you are not getting cute with the expression, because you expect the path to your possible outcome to be hairy. This is a self-aware strategy for respecting path.

Dynamic Expressions

Most of my trades were actively managed. Running a large options portfolio means lots of churn as you whack-a-mole opportunities. You find more attractive positions to warehouse than what’s currently on the books, or perhaps you are adding to get to a more full-size position.

The key is most of the focus is on path not terminal value. Sometimes I’m buying vol because I have a view on volatility, but often I’m buying vol if I think there are going to be more vol buyers. The first kind of buying is a hybrid of path and terminal value thinking, but the second type of vol buying has a momentum mindset. My view on realized vol takes a backseat to my view on flows if I think the option demand will exceed supply at current levels of implied volatility.

Other dynamic trade expressions:

1. Implied sentiment
  
  Another path-aware expression is to bet on the expectations embedded in prices. I might load up on oil calls not because I think oil is going to $200, but because I think the awareness that such a price is possible can emerge due to some catalyst (“saber-rattling”). I’m thinking in terms of path not terminal value when my thesis is “sentiment can go from apathy to fear”. I’m betting on a change in the Overton Window. The change in sentiment can increase call option implied vols and even the futures. But the option trade expression is a purer play than the futures.
  
  [The number of ways an oil future can rise is greater than the number of drivers to push oil call skew higher, so the call options isolate the thesis better by being directly levered to it. Agustin Lebron’s 3rd Law Of Trading: Only take the risks you are paid to take.]
2. Owning the wing
  
  Tail options are on average “expensive” in actuarial terms. But there are several reasons why I do not short them.
  1. “Average” is hiding a lot of detail. The excess premium in those options can be proportionally small to what those options can be worth conditional on stressed states of the world. Buying them when they are relatively cheap to their own elevated premiums can be worthwhile, especially if those options put you in the driver’s seat when the world starts melting down. If you are the only one with bullets in a warzone, there’s a good chance you have them because the terminal-value-Jon-Corzine crowd underestimated path. Then you can sell the options “closing” at truly outlandish prices. I want the tails because I don’t want to be running a trading business with a prime broker’s trapdoor beneath me.
  2. I’m not smart enough to know when to sell tail options opening. I buy them when they are relatively cheap (which usually still means expensive to Corzine brains) and I sell them closing when they go nuclear. Like when you throw some insane offer out there and it gets taken. As a rule you don’t want to sell wings to someone who spent more than a few moments thinking about it or used a spreadsheet or model or calculator or star chart. You sell them to people who are forced to buy them. When Goldman blows their customer out they don’t haggle.
    
    In practice, ratio put spreads look attractive to terminal value people who like to “buy the one and sell the two” because their breakeven is so “far” out-of-the-money and they get to win on medium drawdowns. I often like to sell the 1 and buy the 2 because conditional on the 1×2 “getting there”, the 2 are going to be untouchable.
    
    [The buyer of the one in a 1×2 is happiest in the grinding trend scenario where strike vols underperform the skew.]
  3. In The “No Easy Trade” Principle I explain how implied market parameters do not vary as widely as realized parameters because markets are discounting machines⁴.
    
    Markets bet on mean reversion. Vols often underreact when they are rising (or falling) as the regime changes. These turns can be great path trades. They are momentum opportunities to lift or hit slower participants who are anchored to the prior regime. These opportunities are very profitable since you are not only putting the bet on the right way, but you are able to get liquidity from stale actors. (The trouble with many opportunities is getting liquidity — if you know something is going up but everyone else does too, your signal is valid but insufficiently differentiated. Turning every measly 5 lot offer into a new bid makes the market more efficient without extracting a reward for it. In fact, if you do that, you don’t understand expectancy or the principle of maximization. Your job isn’t to correct incorrect markets. It’s to make money. The overlap is imperfect.) The challenge is you somehow need to not be anchored yourself ⁵.
  4. Humility is recognizing that the craziest event has yet to happen. Market shocks are a feature. They look different every time because we prepare for the last war. The instruments that measure our vitals become the targets themselves. Tail options provide volatility convexity, or exposure to “vol of vol”. You don’t need to know the nature of the next shock to know that you will have wanted vol convexity. See Finding Vol Convexity.

Combining Expressions

I’ll mention this for completeness but it’s a topic I should probably do a video for. It’s not complicated but it’s a bit technical for a post like this. When running an options book, it’s possible to treat some of the positions dynamically and some of them statically. In practice, I “remove” line items that have well-defined risks from my position at the most recent mark-to-market value so that I do not incorporate their Greeks into my book. I don’t hedge them with the rest of the pile.

For example, if I notice an out-of-the-money put spread on my books, instead of dynamically managing a position that was short a tail, I’d put the spread in another account and sell the corresponding delta hedge associated with it. Going forward it would not generate any Greeks in my main risk view so there’s no need to hedge (remember hedging is a cost). The risk is sequestered to the premium. Let’s say it’s $75,000 worth of put spreads. The expectancy of the spread is presumably zero, so it’s like having a simple over/under bet on the books. If expiration goes my way I get to make a multiple of that, but I know the worst (and most likely) case is losing $75k which given the size of the book is noise. If my capital swamps the risk, there’s no point in hedging it especially since it’s short a tail that’s sensitive to vol of vol.

Investing for path

VCs

Venture capital is a strategy that is robust to path. The fact that the portfolio marks are fairy dust helps, but in this context is not important. Why is venture a strategy that exploits divergence in the tails?

Because from its construction, it admits it doesn’t know much. If you believe you are sampling from start-ups that have a power-law distribution (admittedly a big “if”), then the correct strategy is indeed to “spray and pray”⁶.

Byrne Hobart piggybacks Jerry Neumann in his explanation:

One of my favorite blog posts on venture returns is Jerry Neumann’s power laws in venture. His key point is that if venture returns follow a power-law distribution, average returns rise indefinitely as you get a bigger sample set. There is no well-defined mean! If you measure adult height, you quickly converge on 5’9” for American men and 5’4” for American women. You will find outliers, but they’re equally common at both ends of the distribution. But if you measure startup investing returns, you’ll keep getting tripped up: flop, failure, failure, flop, Google, fad, fraud, freaky scandal, Facebook…

Does this imply that the ideal strategy for venture is to invest in as many companies as possible? If you’re sampling from a power-law distribution, that’s what you should do.

Lux Capital partner Josh Wolfe’s approach epitomizes the spirit of searching for gold in the tails. On Invest Like The Best, he explained his investing beliefs:

Confident that curiosity, following leads, and relentlessness will lead you to the next idea.
Confident you won’t know when or how you happen upon the idea.
Confident that the idea lies in the edges of companies that are doing innovative things, often from first principles or science, and very few people are looking there.

These principles propagate from a commitment to benefitting from optionality and positive convexity of non-linear relationships.

The key line follows:

When analyzing how they found deals it only made linear, narrative sense after the fact.

This is reinforced in On Contrarianism, where I quote Wolfe as well as Marc Andreessen and trader Agustin Lebron on why the best investments start out controversial. The gist is that an idea must be so radical and far-fetched that it doesn’t get bid up while also being possible. The intersection of great ideas after-the-fact that sound dumb before-the-fact is nearly invisible. Most ideas people think are dumb are, indeed, dumb. Venture understands this and systematically wraps a sound process around a low hit rate.

“Gorilla” Investing

Gorilla investing is another strategy designed to look like a long option. The gist of it is to invest an equal amount in a list of candidates that are competing for a giant market. As the winners start pulling away, you shed the losers and reallocate the proceeds back into the winners.

Since it rebalances away from losers into winners, it explicitly bets against mean reversion. It’s a divergent strategy that growth investors employ in winner-take-all sectors⁷.

The strategy requires extensive judgment, but I highlight it as another example of an investing algorithm with roots in epistemic humility. If you want to learn more about this strategy see the notes for Gorilla Game or pick up the book.

Conclusion

Like venture or Rohit’s advice on recruiting, gorilla investing casts a wide net from a sufficiently narrowed field and lets attrition decide where to allocate more. In Where Does Convexity Come From? I explain that the essence of convexity is a non-linear p/l resulting from a change in your position size in the same direction as the return of your position. Your exposure to a winning trade grows the more it wins.

Byrne writes:

Since venture success is defined by dealflow, i.e. by whether or not you have a chance to invest in the hottest companies, the main function of the Series A investment is to get a chance to invest in Series B and Series C and so on. Arguably, the better the fund, the more of its real value today consists of pro-rata rights rather than the investments themselves.

That’s a general case of positive convexity: the better the situation, the higher your exposure.

This is the essence of capturing the upside when our signals struggle to parse winners from an exclusive field. If we cannot predict what will happen in the tails, the next best thing is the ability to increase our exposure to momentum when it’s going our way. This begins with humility and funneling wider than our instincts suggest. From that point, we let actual performance provide us with incremental information on what works and what doesn’t.

Contrast this with a model that takes itself more seriously than tail correlations warrant. The model is filtering prematurely. We don’t look for tomorrow’s star athletes amongst the best 8-year-olds because we know puberty is a reshuffling machine.

Keep in mind:

Correlations break down or invert in the extreme
Make your selections robust to path, or possibly able to take advantage of it.
Systematize finding gold in diversity. There’s a decent chance others won’t be looking there.

Happy prospecting!


Moontower #141

Let’s start with a question from Twitter:

If you had to our your whole net worth into a single company right now (public or private), what would you pick?

Berkshire Hathaway is not allowed as an answer pic.twitter.com/ruCjbC6bZO

— Patrick OShaughnessy (@patrick_oshag) March 4, 2022

This is a provocative question. Patrick was clever to disallow Berkshire.

As I was working on the second part of There’s Gold In Them Thar Tails, I got distracted by that tweet.

My Reaction To The Question

I don’t know anything about picking stocks. I do know about the nature of stocks which makes this question scary. Why?

Stocks don’t last forever
Many stocks go to zero. The distribution of many stocks is positively skewed which means there’s a small chance of them going to the moon and a reasonable chance that they go belly-up. The price of a stock reflects its mathematical expectation. Since the downside is bounded by zero and the upside is infinite, for the expectation to balance, the probability of the stock going down can be much higher than our flawed memories would guess. Stock indices automatically rebalance, shedding companies that lose relevance and value. So the idea that stocks go up over time is really that stock indices go up over time, even though individual stocks have a nasty habit of going to zero. For more see Is There Actually An Equity Premium Puzzle?.
Diversification is the only free lunch
The first point hinted at my concern with the question. I want to be diversified. Markets do not pay you for non-systematic risk. In other words, you do not get paid for risks that you can hedge. All but the most fundamental risks can be hedged with diversification. See Why You Don’t Get Paid For Diversifiable Risks. To understand how diversifiable risks get arbed out of the market ask yourself who the most efficient holder of a particular idiosyncratic risk is? If it’s not you, then you are being outbid by someone else, or you’re holding the risk at a price that doesn’t make sense given your portfolio choices. Read You Don’t See The Whole Picture to see why.
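The first point, that a fair price can hide a high probability of loss, is easy to see with a toy example. This is a minimal Python sketch under loud assumptions: the stock has only two outcomes (zero or a moonshot), there is no discounting, and the prices are hypothetical numbers I made up.

```python
def implied_prob_up(price, up_value, down_value=0.0):
    """Probability of the up outcome that makes `price` equal the expected value.

    Solves price = p * up_value + (1 - p) * down_value for p.
    """
    return (price - down_value) / (up_value - down_value)


PRICE = 100.0  # hypothetical current price
MOON = 500.0   # hypothetical value if the company wins big

p_up = implied_prob_up(PRICE, MOON)

print(f"Implied chance of the moonshot: {p_up:.0%}")
print(f"Implied chance of going to zero: {1 - p_up:.0%}")
```

The bigger the possible upside relative to today’s price, the more of the probability mass has to sit on the losing side for the expectation to balance.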

My concerns reveal why Berkshire would be an obvious choice. Patrick ruled it out to make the question much harder. Berkshire is a giant conglomerate. Many would have chosen it because it’s run by masterful investors Warren Buffett and Charlie Munger. But I would have chosen it because it’s diversified. It is one of the closest companies I could find to an equity index. Many people look at the question and think about where their return is going to be highest. I have no edge in that game. Instead, I want to minimize my risk by diversifying and accepting the market’s compensation for broad equity exposure.

In a sense, this question reminds me of an interview question I’ve heard.

You are gifted $1,000,000 dollars. You must put it all in play on a roulette wheel. What do you do?

The roulette wheel has negative edge no matter what you do. Your betting strategy can only alter the distribution. You can be crazy and bet it all on one number. Your expectancy is negative but the payoff is positively skewed…you probably lose your money but have a tiny chance at becoming super-rich. You can try to play it safe by risking your money on most of the numbers, but that is still negative expectancy. The skew flips to negative. You probably win, but there’s a small chance of losing most of your gifted cash.

I would choose what’s known as a minimax strategy which seeks to minimize the maximum loss. I would spread my money evenly on all the numbers, accept a sure loss of 5.26%. The minimax response to Patrick’s question is to find the stock that is the most internally diversified.
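The roulette numbers can be checked with a short Python sketch. The 5.26% figure implies an American wheel, so the sketch assumes 38 pockets and the standard 35-to-1 payout on a single number. The function name is mine.

```python
from fractions import Fraction

POCKETS = 38        # assumption: American wheel (1-36 plus 0 and 00)
PAYOUT_TO_ONE = 35  # assumption: standard payout on a single number


def spread_bet(bankroll, numbers_covered):
    """Bet the whole bankroll in equal stakes on `numbers_covered` distinct numbers.

    Returns (probability of a hit, P/L on a hit, P/L on a miss, expected P/L).
    """
    stake = Fraction(bankroll, numbers_covered)
    p_hit = Fraction(numbers_covered, POCKETS)
    # The winning stake comes back with 35x winnings; every other stake is lost.
    pnl_hit = stake * (PAYOUT_TO_ONE + 1) - bankroll
    pnl_miss = Fraction(-bankroll)
    expected = p_hit * pnl_hit + (1 - p_hit) * pnl_miss
    return p_hit, pnl_hit, pnl_miss, expected


BANKROLL = 1_000_000
for n in (1, 18, 35, 38):
    p_hit, pnl_hit, pnl_miss, expected = spread_bet(BANKROLL, n)
    print(f"{n:>2} numbers: P(hit)={float(p_hit):.3f}  "
          f"hit={float(pnl_hit):>13,.0f}  miss={float(pnl_miss):>13,.0f}  "
          f"EV={float(expected / BANKROLL):.4%}")
```

The expected loss is the same for every row. Only the shape of the distribution changes, from a lottery ticket on one number to a certain small loss when all 38 are covered.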

This led me to write a post launching into the basics of regression, correlation, beta hedging and risk. Especially the concept of “risk remaining” which contains practical and surprising intuition.

It is a topic that affects traders and investors. It also ties back poignantly to the ideas of tail divergence I’m writing about in part 2 of There’s Gold In Them Thar Tails.

The little detour involves math but I move slowly and try to offer footholds of intuition along the way.

[Trying to simplify difficult topics is absolutely my intention. If I’m writing over your head, I’m doing something wrong. Tell me. Help me help you.]

Here you go:

✍️From CAPM To Hedging (16 min read)

Money Angle

I recently read Laws Of Trading by Agustin Lebron. It’s exceptional.

Here’s my review. There’s a link with my notes at the end.

If I ran a trading firm this would be Day 1 reading. After finding out where the bathroom is and filling out your W-4, you would be handed this book and told to finish it by tomorrow. After 1 year on the job, you would be required to re-read it. Many sentences in this book read as off-hand or connective remarks but are deeply insightful. They are the kind of thing a novice would miss but that tells a veteran they are reading a deeply experienced professional. As a veteran of options trading, I found this feature lifts the book from informative to delightful. Trading is the art of decision-making turned into a high-rep game. It requires:

multi-level thinking

sound epistemology

discipline

self-awareness

self-honesty

alignment

humility

curiosity

competitiveness

collaboration

creativity

comparing

It is deeply intertwined with technology, math, and economic reasoning. This book is an instant classic. The rules in the book are reductions of vast, hard-fought institutional knowledge and the leading edge of thinking about risk. The author combined his training and experience at Jane Street, a legendary quant trader and market-making firm, with a broad intellectual acumen. Both his engineering background and affinity for liberal arts and philosophy come through to create a guide that transcends a single discipline. Personally, reading this book left me with nostalgia as the type of thinking was the water I swam in when I was at SIG (Jane Street’s lineage traces to SIG alumni who became under-the-radar legends themselves). My second personal feeling is, “damn, I wish I wrote that book.” Except I couldn’t. The author is an elite synthesizer with a nuanced comprehension for coding, organizational behavior, interviewing, and data analysis.

My notes are below. They are sparse compared to the depth of insight crammed into 250 pages. Every chapter covers a “rule”, situates that rule in the context of finance, then applies it to decisions all people face in the course of life.

✍️Notes on The Laws Of Trading (Notion.MoontowerMeta)

Last Call

I’ve watched this a lot. What a natural.

Bursting with his dad's charm https://t.co/OqEVjerQB7

— Kris (@KrisAbdelmessih) March 7, 2022

From CAPM To Hedging

Let’s start with a question from Twitter:

If you had to our your whole net worth into a single company right now (public or private), what would you pick?

Berkshire Hathaway is not allowed as an answer pic.twitter.com/ruCjbC6bZO
— Patrick OShaughnessy (@patrick_oshag) March 4, 2022

This is a provocative question. Patrick was clever to disallow Berkshire. In this post, we are going to use this question to launch into the basics of regression, correlation, beta hedging and risk.

Let’s begin.

My Reaction To The Question

I don’t know anything about picking stocks. I do know about the nature of stocks which makes this question scary. Why?

Stocks don’t last forever
Many stocks go to zero. The distribution of many stocks is positively skewed which means there’s a small chance of them going to the moon and a reasonable chance that they go belly-up. The price of a stock reflects its mathematical expectation. Since the downside is bounded by zero and the upside is infinite, for the expectation to balance, the probability of the stock going down can be much higher than our flawed memories would guess. Stock indices automatically rebalance, shedding companies that lose relevance and value. So the idea that stocks go up over time is really that stock indices go up over time, even though individual stocks have a nasty habit of going to zero. For more see Is There Actually An Equity Premium Puzzle?.
Diversification is the only free lunch
The first point hinted at my concern with the question. I want to be diversified. Markets do not pay you for non-systematic risk. In other words, you do not get paid for risks that you can hedge. All but the most fundamental risks can be hedged with diversification. See Why You Don’t Get Paid For Diversifiable Risks. To understand how diversifiable risks get arbed out of the market ask yourself who the most efficient holder of a particular idiosyncratic risk is? If it’s not you, then you are being outbid by someone else, or you’re holding the risk at a price that doesn’t make sense given your portfolio choices. Read You Don’t See The Whole Picture to see why.

In a sense, this question reminds me of an interview question I’ve heard.

You are gifted $1,000,000 dollars. You must put it all in play on a roulette wheel. What do you do?

I would choose what’s known as a minimax strategy which seeks to minimize the maximum loss. I would spread my money evenly on all the numbers, accept a sure loss of 5.26%.⁸ The minimax response to Patrick’s question is to find the stock that is the most internally diversified.

Berkshire Vs The Market

I don’t have an answer to Patrick’s question. Feel free to explore the speculative responses in the thread. Instead, I want to dive further into my gut reaction that Berkshire would be a reasonable proxy for the market. If we look at the mean of its annual returns from 1965 to 2001, the numbers are gaudy. Its CAGR was 26.6% vs the SP500 at 11%. Different era. Finding opportunities at the scale Buffett needs to move the needle has been much harder in the past 2 decades.

Buffett has been human for the past 20 years. This is a safer assumption than the hero stats he was putting up in the last half of the 20th century.

The mean arithmetic returns and standard deviations validate my hunch that Berkshire’s size and diversification ⁹ make it behave like the whole market in a single stock.

Let’s add a scatterplot with a regression.

If you tried to anticipate Berkshire’s return, your best guess might be its past 20 year return, distributed similarly to its prior volatility. Another approach would be to see this relationship to the SP500 and notice that a portion of its return can simply be explained by the market. It clearly has a positive correlation to the SP500. But just how much of the relationship is explained by SP500? This is a large question with practical applications. Specifically, it underpins how market-neutral traders think about hedges. If I hedge an exposure to Y with X, how much risk do I have remaining? To answer this question we will go on a little learning journey:

Deriving sensitivities from regressions in general
Interpreting the regression
CAPM: Applying regression to compute the “risk remaining of a hedge”

On this journey you can expect to learn the difference between beta and correlation, build intuition for how regressions work, and see how market exposures are hedged.

Unpacking The Berkshire Vs SP500 Regression

A regression is simply a model of how an independent variable influences a dependent variable. Use a regression when you believe there is a causal relationship between 2 variables. Spurious correlations are correlations that appear to be causal because they can be tight. The regression math may even suggest that’s the case. I’m sorry. Math is just a tool. It requires judgement. The sheer number of measurable quantities in the world guarantees an infinite list of correlations that serve as humor not insight¹⁰.

The SP500 is steered by the corporate earnings of the largest public companies (and in the long-run the Main Street economy¹¹) discounted by some risk-aware consensus. Berkshire is big and broad enough to inherit the same drivers. We accept that Berkshire’s returns are partly driven by the market and partly due to its own idiosyncrasies.

Satisfied that some of Berkshire’s returns are attributable to the broader market, we can use regression to understand the relationship. In the figure above, I had Excel simply draw a line that best fit the scatterplot with SP500 being the independent variable, or X, and Berkshire returns being the dependent, or Y. The best fit line (there are many kinds of regression but we are using a simple linear regression) is defined the same way any line is: by a slope and an intercept.

The regression equation should remind you of the generic form of a line y = mx + b where m is the slope and b is the intercept.

In a regression:

y=α+βx

where:

y = dependent variable (Berkshire returns)

x = independent variable (SP500 returns)

α = the intercept (a constant)

β = the slope or sensitivity of the Y variable based on the X variable

If you right-click on a scatterplot in Excel you can choose “Add Trendline”. It will open the below menu where you can set the fitted line to be linear and also check a box to “Display Equation on chart”.

This is how I found the slope and intercept for the Berkshire chart:

y = .6814x + .0307

Suppose the market returns 2%:

Predicted Berkshire return = .6814 * 2% + 3.07%

Predicted Berkshire return = 4.43%

So based on actual data, we built a simple model of Berkshire’s returns as a function of the market.

It’s worth slowing down to understand how this line is being created. Conceptually it is the line that minimizes the squared errors between itself and the actual data. Since each point has 2 coordinates, we are dealing with the variance of a joint distribution. We use covariance instead of variance but the concepts are analogous. With variance we square the deviations from a mean. For covariance, we multiply the distance of each X and Y in a coordinate from their respective means: (xᵢ – x̄)(yᵢ -ȳ)

Armed with that idea, we can compute the regression line by hand with the following formulas:

β or slope = covar(x,y)/ var(x)

α or intercept = ȳ – βx̄
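Here is a minimal Python sketch of those two formulas. The return series are hypothetical numbers I made up for illustration, not the actual SP500 and Berkshire data behind the chart, and the function names are mine.

```python
# Hypothetical annual returns, NOT the data used in this post
market = [0.10, -0.05, 0.20, 0.15, -0.10]  # x, the independent variable
stock = [0.08, 0.00, 0.12, 0.14, -0.04]    # y, the dependent variable


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (alpha, beta) of the least-squares line y = alpha + beta * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-deviations and squared deviations. Dividing both by n
    # would give covar(x, y) and var(x), but the n cancels in the ratio.
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum_xy / sum_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


alpha, beta = fit_line(market, stock)
print(f"slope (beta) = {beta:.4f}, intercept (alpha) = {alpha:.4f}")

# Prediction for a 2% market return, mirroring the example above
print(f"predicted stock return = {alpha + beta * 0.02:.4f}")
```

A useful sanity check is that the fitted line always passes through the point (x̄, ȳ), which is exactly what the intercept formula enforces.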

We will look at the full table of this computation later to verify Excel’s regression line. Before we do that, let’s make sure that this model is even helpful. One standard we could use to determine if the model is useful is if it performs better than the cheapest naive model that says:

Our predicted Berkshire return is simply the mean return from the sample.

The green arrows in this picture represent the error between this simple model and the actual returns.

This naive model of summing the squared differences from the mean of Berkshire’s returns is exactly the same as variance. You are computing squared differences from a mean. If you take the square root of the average of the squared differences you get a standard deviation. In this simple model, where our prediction is simply the mean, our volatility is 16.5%, the volatility of Berkshire’s returns over the 20 years.

In the regression context, the total variance of the dependent variable from its mean is known as the Total Sum of Squares or TSS.

The point of using regression though is we can make a better prediction of Berkshire’s returns if we know the SP500’s returns. So we can compare the mean to the fitted line instead of the actual returns. The sum of those squared differences is known as the Regression Sum Of Squares or RSS. This is the sum of squared deviations between the mean and fitted predictions instead of the actual returns. If there is tremendous overlap between the RSS and TSS, then we think much of the variance in X explains the variance of Y.

The last quantity we can look at is the Error Sum of Squares or ESS. These are the deviations from the actual data to the predicted values represented by our fitted line. This represents the unexplained portion of Y’s variance.

Let’s use 2008’s giant negative return to show how TSS, RSS, and ESS relate.

The visual shows:

TSS = RSS + ESS

We can compute the sum of these squared deviations simply from their definitions:

TSS (aka variance)	Σ(actual-mean)²
ESS (sum of errors squared)	Σ(actual-predicted)²
RSS (aka TSS – ESS)	Σ(predicted-mean)²
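The three definitions can be sketched in Python as follows. The data is again hypothetical rather than the actual return table, and the naming follows this post (RSS is the regression or explained sum, ESS is the error sum); be aware that some textbooks swap those two abbreviations.

```python
# Hypothetical annual returns, NOT the data used in this post
market = [0.10, -0.05, 0.20, 0.15, -0.10]  # x
stock = [0.08, 0.00, 0.12, 0.14, -0.04]    # y


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(market), mean(stock)

# Least-squares slope and intercept
beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(market, stock))
        / sum((x - x_bar) ** 2 for x in market))
alpha = y_bar - beta * x_bar

# Fitted values, one per observation
predicted = [alpha + beta * x for x in market]

tss = sum((y - y_bar) ** 2 for y in stock)                    # actual vs mean
ess = sum((y - p) ** 2 for y, p in zip(stock, predicted))     # actual vs predicted
rss = sum((p - y_bar) ** 2 for p in predicted)                # predicted vs mean

r_squared = rss / tss

print(f"TSS = {tss:.6f}")
print(f"RSS + ESS = {rss + ess:.6f}")
print(f"R-squared = {r_squared:.4f}")
```

The identity TSS = RSS + ESS holds exactly for a least-squares line with an intercept. It is not guaranteed for an arbitrary line drawn through the data.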

The only other quantities we need are variances and covariances to compute β or slope of the regression line.

In the table below:

ŷ = the predicted value of Berkshire’s return aka “y-hat”

x̄ = mean SP500 return aka “x-bar”

ȳ = mean Berkshire return aka “y-bar”

β = .40 / .59 = .6814

α = ȳ – βx̄ = 10.6% – .6814 * 11.1% = 3.07%

This yields the same regression equation Excel spit out:

y=α+βx

ŷ = 3.07% + .6814x

R-Squared

We walked through this slowly as a learning exercise, but the payoff is appreciating the R². Excel computed it as 52%. But we did everything we need to compute it by hand. Go back to our different sum of squares.

TSS or variance of Y = .52

ESS or sum of squared difference between actual data and the model = .25

Re-arranging TSS = RSS + ESS we can see that RSS = .27

Which brings us to:

R² = RSS/TSS = .27/.52 = 52%

Same as Excel!

R² is the regression sum of squares divided by the total variance of Y. It is called the coefficient of determination and can be interpreted as:

The variability in Y explained by X

So based on this small sample, 52% of Berkshire’s variance is explained by the market, as proxied by the SP500.

Correlation

Correlation, r (or if you prefer Greek, ρ) can be computed in at least 2 ways. It’s the square root of R².

r = √R² = √.52 = .72

We can confirm this by computing correlation by hand according to its own formula:

r = covar(x,y) / (σ_x × σ_y)

Substituting:

r = covar(x,y) / sqrt(var(x) × var(y))

Looking at the table above we have all the inputs:

r = .40 / sqrt(.59 x .52)

r = .72
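Here is a small sketch of the two routes to correlation, assuming made-up return series rather than the post's data. One caveat the code makes explicit: the square root of R² only gives the size of r, so the sign is taken from the slope.

```python
import numpy as np

# Hypothetical annual returns, invented for illustration only
x = np.array([0.10, -0.37, 0.26, 0.15, 0.02, 0.16, 0.32, 0.14])   # market
y = np.array([0.29, -0.32, 0.03, 0.21, -0.05, 0.17, 0.33, 0.27])  # stock

# Route 1: covariance normalized by both standard deviations
r_formula = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Route 2: square root of R-squared from the fitted line
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
r_squared = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
r_from_r2 = np.sign(slope) * np.sqrt(r_squared)   # sign comes from the slope

print(f"r from formula:   {r_formula:.4f}")
print(f"r from sqrt(R^2): {r_from_r2:.4f}")
```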

Variance is an unintuitive number. By taking the square root of variance, we arrive at a standard deviation which we can actually use.

Similarly, covariance is an intermediate computation lacking intuition. By normalizing it (ie dividing it) by the standard deviations of X and Y we arrive at correlation, a measure that holds meaning to us. It is bounded by -1 and +1. If the correlation is .72 then we can make the following statement:

If x is 1 standard deviation above its mean, I expect y to be .72 standard deviations above its own mean.

It is a normalized measure of how one variable co-varies versus the other.

How Beta And Correlation Relate

Beta, β, is the slope of the regression equation.

Correlation is the square root of R² or coefficient of determination.

Beta actually embeds correlation within it.

Look closely at the formulas:

β = cov(x,y) / σ_x²

r = cov(x,y) / (σ_x * σ_y)

Watch what happens when we divide β by r.

β / r = σ_y / σ_x

Whoa.

Beta equals correlation times the ratio of the standard deviations.
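A quick numeric check of that identity, using the rounded inputs quoted earlier in the post (covariance .40, variance of x .59, variance of y .52). Because the inputs are rounded, the beta here comes out near .68 rather than exactly the .6814 shown above.

```python
import math

# Rounded inputs quoted in the post
cov_xy, var_x, var_y = 0.40, 0.59, 0.52

beta = cov_xy / var_x                    # slope of the regression line
r = cov_xy / math.sqrt(var_x * var_y)    # correlation
vol_ratio = math.sqrt(var_y / var_x)     # sigma_y / sigma_x

print(f"beta          = {beta:.4f}")
print(f"r * vol ratio = {r * vol_ratio:.4f}")
```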

The significance of that insight is about to become clear as we move from our general use of regression to the familiar CAPM regression. From the CAPM formula we can derive the basis of hedge ratios and more!

We have done all the heavy lifting at this point. The reward will be a set of simple, handy formulas that have served me throughout my trading career.

Let’s continue.

From Regression To CAPM

The famous CAPM pricing equation is a simple linear regression stipulating that the return of an asset is a function of the risk free rate, a beta to the broader market, plus an error term that represents the security’s own idiosyncratic risk.

Rᵢ = Rբ + β(Rₘ – Rբ) + Eᵢ

where:

Rᵢ = security total return

Rբ = risk-free rate

β = sensitivity of security’s return to the overall market’s excess return (ie the return above the risk-free rate)

Eᵢ = the security’s unique return (aka the error or noise term)

Since the risk-free rate is a constant, let’s scrap it to clean the equation up.

This is the variance equation for this security:

Recall that beta is the vol ratio * correlation:

We can use this to factor the “market variance” term.

Plugging this form of “variance due to the market” back into the variance equation:

This reduces to the prized equation: The “risk remaining” formula which is the proportion of a stock’s volatility due to its own idiosyncratic risk.

This makes sense. R² is the amount of variance in a dependent variable attributable to the independent variable. If we subtract that proportion from 1 we arrive at the “unexplained” or idiosyncratic variance. By taking the square root of that quantity, we are left with unexplained volatility or “risk remaining”.
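The chain of steps above can be written out as a sketch. The notation is my own assumption: σ_i, σ_m and σ_E are the volatilities of the security, the market and the error term, r is the correlation between the security and the market, and the error term is assumed to be uncorrelated with the market (the usual regression assumption). For a regression with a single explanatory variable, R² = r².

```latex
\begin{align*}
R_i &= \beta R_m + E_i
  && \text{(risk-free rate dropped)} \\
\sigma_i^2 &= \beta^2 \sigma_m^2 + \sigma_E^2
  && \text{(variance of the security)} \\
\beta &= r \, \frac{\sigma_i}{\sigma_m}
  && \text{(beta is correlation times the vol ratio)} \\
\beta^2 \sigma_m^2 &= r^2 \sigma_i^2
  && \text{(variance due to the market)} \\
\sigma_i^2 &= r^2 \sigma_i^2 + \sigma_E^2
  && \text{(plugging back in)} \\
\frac{\sigma_E}{\sigma_i} &= \sqrt{1 - r^2}
  && \text{(risk remaining)}
\end{align*}
```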

Let’s use what we’ve learned in a concrete example.

From CAPM To Hedge Ratios

Let’s return to Berkshire vs the SP500. Suppose we are long $10mm worth of BRK.B and want to hedge our exposure by going short SP500 futures.

We want to compute:

How many dollars worth of SP500 to get short
The “risk remaining” on the hedged portfolio

How many dollars of SP500 do we need to short?

Before we answer this, let’s consider a few ways we can hedge with SP500.

Dollar weighting

We could simply sell $10mm worth of SP500 futures which corresponds to our $10mm long in BRK.B. Since Berkshire and the SP500 have similar volatility this is a reasonable approach. But suppose we were long TSLA instead of BRK.B. Assuming TSLA was sufficiently correlated to the market (say .70 like BRK.B), the SP500 hedge would be “too light”.

Why?

Because TSLA is about 3x as volatile as the SP500. If the SP500 fell 1 standard deviation, we expect TSLA to fall .70 standard deviations. Since TSLA’s standard deviations are much larger than the SP500’s, we would be tragically underhedged. Our TSLA long would lose much more money than our short SP500 position would make because we are not short enough dollars of SP500.

Vol weighting

Dollar weighting is clearly naive if there are large differences in volatility between our long and short. Let’s stick with the TSLA example. If TSLA is 3x as volatile as the SP500 then if we are long $10mm TSLA, we need to short $30mm worth of SP500.

Uh oh.

That’s going to be too much. Remember the correlation. It’s only .70. The pure vol-weighted hedge only makes sense if the correlation is 1. If the SP500 drops one standard deviation, we expect TSLA to drop only .70 standard deviations, not a full standard deviation. In this case, we will have made too much money on our hedge, but if the market had rallied 1 standard deviation our oversized short would have been “heavy”. We would lose more money on the short than we gained on our TSLA long. Again, not properly hedged.

Beta weighting

At last, we arrive at the Goldilocks solution. We use the beta or slope of the linear regression to weight our hedge. Since beta equals correlation * vol ratio we are incorporating both vol and correlation weighting into our hedge!

I made up vols and correlations to complete the summary tables below. The key is seeing how much the prescribed hedge ratios can vary depending on how you weight the trades.

Beta weighting accounts for both relative volatilities and the correlation between names. Beta has a one-to-many relationship to its construction. A beta of .5 can come from:
- A .50 correlation but equal vols
- A .90 correlation but vol ratio of .56
- A .25 correlation but vol ratio of 2
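A short sketch of both ideas: the three recipes for a beta of roughly .5, and the hedge size each weighting scheme prescribes for the $10mm TSLA example (vol ratio of 3, correlation of .70, as assumed in the text). The helper name `beta_from` is just for illustration.

```python
def beta_from(corr, vol_ratio):
    """Beta = correlation * (vol of the position / vol of the hedge)."""
    return corr * vol_ratio

# Three different constructions that land on a beta of about .5
combos = [(0.50, 1.00), (0.90, 0.56), (0.25, 2.00)]
for corr, vol_ratio in combos:
    print(f"corr={corr:.2f}, vol ratio={vol_ratio:.2f} -> beta={beta_from(corr, vol_ratio):.2f}")

# Hedging $10mm of TSLA with SP500 under each weighting scheme
long_notional = 10_000_000
corr, vol_ratio = 0.70, 3.0

dollar_hedge_notional = long_notional                            # ignores vol and correlation
vol_hedge_notional = long_notional * vol_ratio                   # ignores correlation
beta_hedge_notional = long_notional * beta_from(corr, vol_ratio) # uses both

print(f"dollar-weighted short: ${dollar_hedge_notional:,.0f}")
print(f"vol-weighted short:    ${vol_hedge_notional:,.0f}")
print(f"beta-weighted short:   ${beta_hedge_notional:,.0f}")
```

The beta-weighted figure sits between the other two, which is the “too light” versus “too heavy” intuition from the sections above.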

It’s important to decompose betas because the correlation portion is what determines the “risk remaining” on a hedge. Let’s take a look.

How much risk remains on our hedges?

We are long $10,000,000 of TSLA

We sell $21,000,000 of SP500 futures as a beta-weighted hedge.

Risk remaining is the volatility of TSLA that is unexplained by the market.

R² is the amount of variance in the TSLA position explained by the market.
1-R² is the amount of variance that remains unexplained
The vol remaining is sqrt(1-R²)

Risk (or vol) remaining = sqrt (1-.7²) = 71%

TSLA annual volatility is 45% so the risk remaining is 71% * 45% = 32.14%

32.14% of $10,000,000 of TSLA = $3,214,000

So if you ran a hedged position, within 1 standard deviation, you still expect $3,214,000 worth of noise!

Remember correlation is symmetrical. The correlation of A to B is the same as the correlation of B to A (you can confirm this by looking at the formula).

Beta is not symmetrical because it’s correlation * σ_dependent / σ_independent

Yet risk remaining only depends on correlation.

So what happens if we flipped the problem and tried to hedge $10,000,000 worth of SP500 with a short TSLA position?

First, this is conceptually a more dangerous idea. Even though the correlation is .70, we are less likely to believe that TSLA’s variance explains the SP500’s variance. Math without judgement will impale you on a spear of overconfidence.
I’ll work through the example just to be complete.

To compute beta we flip the vol ratio from 3 to 1/3 then multiply by the correlation of .7

Beta of SP500 to TSLA is .333 * .7 = .233

If we are long $10,000,000 of SP500, we sell $2,333,000 of TSLA. The risk remaining is still 71% but it is applied to the SP500 volatility of 15%.

71% x 15% = 10.71% so we expect 10.71% of $10,000,000 or $1,071,000 of the SP500 position to be unexplained by TSLA.
I’m re-emphasizing: math without judgement is a recipe for disaster. The formulas are tools, not substitutes for reasoning.
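Both directions of the hedge can be checked with one small helper. This is only a sketch using the made-up vols and correlation from the example (45% for TSLA, 15% for the SP500, correlation of .70); `hedge_summary` is a hypothetical name, not a library function.

```python
import math

def hedge_summary(long_notional, vol_long, vol_hedge, corr):
    """Beta-weighted hedge size and the 1-standard-deviation dollar noise left over."""
    beta = corr * vol_long / vol_hedge            # beta of the long to the hedge
    hedge_notional = beta * long_notional         # dollars of the hedge to short
    risk_remaining = math.sqrt(1 - corr ** 2)     # fraction of the long's vol left
    residual_dollars = risk_remaining * vol_long * long_notional
    return beta, hedge_notional, risk_remaining, residual_dollars

TSLA_VOL, SPX_VOL, CORR = 0.45, 0.15, 0.70   # assumed inputs from the example
NOTIONAL = 10_000_000

tsla_hedged = hedge_summary(NOTIONAL, TSLA_VOL, SPX_VOL, CORR)  # long TSLA, short SP500
spx_hedged = hedge_summary(NOTIONAL, SPX_VOL, TSLA_VOL, CORR)   # long SP500, short TSLA

for label, (beta, hedge, rr, resid) in [("long TSLA ", tsla_hedged),
                                        ("long SP500", spx_hedged)]:
    print(f"{label}: beta={beta:.3f}, short ${hedge:,.0f}, "
          f"risk remaining={rr:.0%}, residual=${resid:,.0f}")
```

The betas differ by a factor of 9 between the two directions, while the risk remaining percentage is identical because it depends only on the correlation. The dollar figures land within rounding of the ones worked out above.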

Changes in Correlation Have Non-Linear Effects On Your Risk

Hedging is tricky. You can see that risk remaining explodes rapidly as correlation falls.

If correlation is as high as .86, you already have 50% risk remaining!
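To see the non-linearity, here is a minimal sketch that tabulates sqrt(1 - r²) for a handful of correlations. The grid of correlations is arbitrary and chosen only for illustration.

```python
import math

def risk_remaining(corr):
    """Fraction of volatility left unexplained at a given correlation."""
    return math.sqrt(1 - corr ** 2)

for corr in [1.00, 0.99, 0.95, 0.90, 0.86, 0.80, 0.70, 0.50, 0.00]:
    print(f"correlation {corr:.2f} -> risk remaining {risk_remaining(corr):.0%}")
```

Note how quickly the number climbs near the top of the range: the first few points of correlation lost below 1.00 cost far more hedge quality than the same change lower down.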

In practice, a market maker may:

group exposures to the most related index (they might have NDX, SPX, and IWM buckets for example)
offset deltas between exposures as they accumulate
and hedge the remaining deltas with futures.

You might create risk tolerances that stop you from, say, being long $50mm worth of SPX and short $50mm of NDX, leaving you exposed to the underlying factors which differentiate these indices. Even though they might be tightly correlated intraday, the correlation changes over time and your risk remaining can begin to swamp your edge.

The point of hedging is to neutralize the risks you are not paid to take. But hedging is costly. Traders must always balance these trade-offs in the context of their capital, risk tolerances, and changing correlations.

Review

I walked slowly through topics that are familiar to many investors and traders. I did this because the grout in these ideas often triggers an insight or newfound clarity of something we thought we understood.

This is a recap of important ideas in this post:

Variance is a measure of dispersion for a single distribution. Covariance is a measure of dispersion for a joint distribution.
Just as we take the square root of variance to normalize it to something useful (standard deviation, or in a finance context — volatility), we normalize covariance into correlation.
Intuition for a positive(negative) correlation: if X is N standard deviations above its mean, Y is r * N standard deviations above(below) its mean.
Beta is r * the vol ratio of Y to X. In a finance context, it allows us to convert a correlation from a standard deviation comparison to a simple elasticity. If beta = 1.5, then if X is up 2%, I expect Y to be up 3%.
Correlation is symmetrical. Beta is not.
R² is the variance explained by the independent variable. Risk remaining is the volatility that remains unexplained. It is equal to sqrt(1-R²).
There is a surprising amount of risk remaining even if correlations are strong. At a correlation of .86, about 50% of the volatility remains unexplained (even though only about 26% of the variance does)!
Don’t compute robotically. Reason > formulas.

Beware.

Least squares linear regression is only one method for fitting a line. It only works for linear relationships. Its application is fraught with pitfalls. It’s important to understand the assumptions in any models you use before they become load-bearing beams in your process.

References:

The table in this post was entirely inspired by Rahul Pathak’s post Anova For Regression.

For the primer on regression and sum of squares I read these 365 DataScience posts in the following order:

Getting Familiar with the Central Limit Theorem and the Standard Error
How To Perform A Linear Regression In Python (With Examples!)
The Difference between Correlation and Regression
Sum of Squares Total, Sum of Squares Regression and Sum of Squares Error
Measuring Explanatory Power with the R-squared
Exploring the 5 OLS Assumptions for Linear Regression Analysis
(I strongly recommend reading this post before diving in on your own.)

Moontower #140

Friends,

I’m in SoCal again this weekend visiting with family and getting a couple nights away with Yinh.

I didn’t get around to publishing part 2 of how to think about finding opportunities in the extremes of the distribution. For background see last week’s There’s Gold In Them Thar Tails: Part 1.

This week I’ll share a thought I jotted down after reading a rousing post by scientist Michael Nielsen. Hopefully, this is a dose of local, actionable inspiration.

Unlock One Another: The Right Compliment At The Right Time (6 min read)

This post is not science. It’s not rigorous. It is a simple belief, both self-evident and load-bearing. Itself the proof of its premise because believing it is my own generative force.

Stated as I see it:

The closest thing we have to a perpetual motion machine is inspiration.

Inspiration creates its own energy for action.
Action creates information.
Information generates inspiration.

Repeat.

A finance-dork way of saying this is inspiration is the cheapest source of capital.

One of the ideas economist Tyler Cowen is recognized for comes from his short post, The high-return activity of raising others’ aspirations, where he writes:

At critical moments in time, you can raise the aspirations of other people significantly, especially when they are relatively young, simply by suggesting they do something better or more ambitious than what they might have in mind. It costs you relatively little to do this, but the benefit to them, and to the broader world, may be enormous.

This is in fact one of the most valuable things you can do with your time and with your life.

I’m interested in education and how people learn. There’s nothing more invigorating than the moment of empowerment in a child’s eye when they realize “they can”. As a parent, my proudest moments are the goofy smiles on the boys’ faces when they found themselves able to do what they didn’t think they could. Swim their first lap, add in their head, not panic when they got stuck on a zipline (my 7-year-old was calmer than I would have been).

Learning is the receipt you get for courage.

Courage is virtue. It takes courage to see clearly. To empathize. To put aside your preconceptions. To not give into malformed ideas about yourself or others without a challenge. To face your insecurities. To step outside your comfort zone.

I’m as fallible as the next person but I try to live in a way that takes what Cowen says seriously. It’s something I try to keep top of mind especially when I can feel my patience fray. That’s when I need to recruit that belief the most. This is part of being charitable. Giving people credit for wanting to be better. Sometimes a jerk is just a jerk. But sometimes a jerk is someone who wants to be better but doesn’t know how. They are scared but don’t know it. Behind that defense mechanism is an insecure soul that once crawled on all fours, just like you. I don’t want to let go of the rope until the last second when it’s clear they want to take me over the cliff with them. Sometimes I do. I can’t live up to my own ideals.

But I and all of us must continue to try. Noah Smith, a writer and professor, explains why (emphasis mine):

I think our society has moved a huge amount in the direction of meritocracy — of being open to talent. I think we’re really good at that at this point. But I think our pursuit of meritocracy has caused us to neglect a few important things. One is ambition; the people whose talent we discover are the people who come to us, who shove their talent in our faces, because their parents instilled drive and ambition and confidence in them. But there are a lot of talented people out there whose abilities never get discovered because no one ever told them they should aim high, or because they didn’t have parents to push them, or because they simply lacked confidence. My brother-in-law grew up poor in a trailer park, no one in his family had ever been to college. But my sister instilled him with a little more ambition, and he just graduated from a top law school. Without the luck of meeting my sister, he might still be in a trailer park! So our system is so focused on setting up these tournaments for ambitious people that we fail to go out and nurture the ambition of people who have undiscovered talent...A successful society rests on a broad foundation of human capital; it does not place all its hopes on a thin sliver of genius. I see too many people in Silicon Valley — both liberals and conservatives — tacitly accept the notion that only a few people have real potential. And maybe that’s because venture-funded software is such a winner-take-all market. I don’t know. But that’s not the attitude that will bring this country a broad industrial renaissance or social revitalization.

Scientist Michael Nielsen offers an idea anyone can borrow. Nielsen contends that if you give specific compliments to people instead of generic platitudes you are capable of doing far more good than you think. It kicks off a spiral of inspiration in its target. It can validate what they think they are good at, a source of energy that pays off 10-fold as they lean even harder into their gifts. And if that recipient didn’t realize they had some special gift in the first place? You just hit’em with a defibrillator. They just gasped to life.

And maybe. For the first time.

I leave you with his essay. It hit hard because my love language is compliments and since I’m not special I assume it is for many people. It’s a simple thing you can do for others. It takes being present. A dash of vulnerability. And a few words.

On Volitional Philanthropy (a short essay!)

by Michael Nielsen

T. E. Lawrence, the English soldier, diplomat and writer, possessed what one of his biographers called a capacity for enablement: he enabled others to make use of abilities they had always possessed but, until their acquaintance with him, had failed to realize. People would come into contact with Lawrence, sometimes for just a few minutes, and their lives would change, often dramatically, as they activated talents they did not know they had.

Most of us have had similar experiences. A wise friend or acquaintance will look deeply into us, and see some latent aspiration, perhaps more clearly than we do ourselves. And they will see that we are capable of taking action to achieve that aspiration, and hold up a mirror showing us that capability in crystalline form. The usual self-doubts are silenced, and we realize with conviction: “yes, I can do this”.

This is an instance of volitional philanthropy: helping expand the range of ways people can act on the world.

I am fascinated by institutions which scale up this act of volitional philanthropy.

Y Combinator is known as a startup incubator. When friends began participating in early batches, I noticed they often came back changed. Even if their company failed, they were more themselves, more confident, more capable of acting on the world. This was a gift of the program to participants [1]. And so I think of Y Combinator as volitional philanthropists.

For a year I worked as a Research Fellow at the Recurse Center. It’s a three-month long “writer’s retreat for programmers”. It’s unstructured: participants are not told what to do. Rather, they must pick projects for themselves, and structure their own path. This is challenging. But the floundering around and difficulty in picking a path is essential for growing one’s sense of choice, and of responsibility for choice. And so creating that space is, again, a form of volitional philanthropy.

There are institutions which think they’re in the volitional philanthropy game, but which are not. Many educators believe they are. In non-compulsory education that’s often true. But compulsory education is built around fundamental denials of volition: the student is denied choice about where they are, what they are doing, and who they are doing it with. With these choices denied, compulsory education shrinks and constrains a student’s sense of volition, no matter how progressive it may appear in other ways.

There is something paradoxical in the notion of helping someone develop their volition. By its nature, volition is not something which can be given; it must be taken. Nor do I think “rah-rah” encouragement helps much, since it does nothing to permanently expand the recipient’s sense of self. Rather, I suspect the key lies in a kind of listening-for-enablement, as a way of helping people discover what they perhaps do not already know is in themselves. And then explaining honestly and realistically (and with an understanding that one may be in error) what it is one sees. It is interesting to ask both how to develop that ability in ourselves, and in institutions which can scale it up.

[1] It is a median effect. I know people who start companies who become first consumed and then eventually diminished by the role. But most people I’ve known have been enlarged.

Note, by the way, that I work at Y Combinator Research, which perhaps colours my impression. On the other hand, I’ve used YC as an example of volitional philanthropy since (I think) 2010, years before I started working for YCR.

Money Angle

The Moontower Money Wiki is a project to help people who don’t know how to invest their savings. I plan to turn these write-ups into a series that starts with the “nature of investing” and holds their hand through implementation. The final form of the series is TBD. Maybe in-person lectures, exercises, videos. I don’t know how this will unfold.

Right now I’m just interested in helping people think better about investing. The first step of that is to help people unlearn the garbage they are bombarded with because of FOMO, punditry, and “democratization” apps laced with dopamine.

Investing is not about “engagement”. It’s actually brain damage if you cannot anchor yourself to goals and plans. People are not wired to navigate random number generators, so we need to form a qualitative basis for why we invest in the first place and how investing actually leads to returns.

Many readers here are sophisticated, so it will be beneath them. Yet, I want to make something even HSers can understand. When I complete a post for it, I will share it in this letter. It takes me longer than I’d like to get anything done so I won’t venture a guess on how often I’ll publish one.

So after all that blathering here’s the most recent write-up:

The Challenge Of Outperformance (Link)

Last Call

Today is also an opportunity for global inspiration.

Our friend Tina and her organization All Hands And Hearts are procuring buses to evacuate children from Ukraine to Poland which is accepting refugees with open arms.

Every little bit helps. Every bus they procure can make many runs back and forth. 300 kids per run. The money is being used for buses, diapers, food and water.

Re-tweet to spread the word as well.

Dear Twitter friends,
🧵
I am very involved with a natural disaster relief charity. We go into action after hurricane, earthquakes, tsunami’s etc. While this is outside of our usual scope, we have decided to go into 🇺🇦 Ukraine and help evacuate orphans by bus to 🇵🇱 Poland.
— Tina (@moreproteinbars) March 2, 2022

I hope you all can consider this very worthy cause. Thank you! And please retweet! https://t.co/aMg89ofNvp
— Tina (@moreproteinbars) March 2, 2022

I just have to shout out fellow fintwitter, Jessica. Mostly known for shitposting (and math wizardry when she feels like it), Jess is absolutely boss when it’s time for action.

Thank you for all your support. We raised over $100,000 for Ukraine. Hopefully many of you contimue to donate.

As a side note, most of the big donations were via crypto, so I'll eat some crow on all the bad things I've said about crypto over the years. https://t.co/Xg9wjUaOvY
— Jessica Nutt (@JessicaNutt96) February 28, 2022

From My Actual Life

The world feels big and scary. Everyone deals with it in their own ways. For better or worse, I keep my focus on what I think I can handle but try to do my best within that narrow aperture.

A few personal thoughts I had this week.

Via @nateliason

This is my default state. I still barely look at the news.

I oscillate between feeling guilty about this (this isn't a privilege thing, I was the same way when I was younger) and secure in my ignorance.

I put almost everything in the too hard pile. pic.twitter.com/oUFLNJ5Wiu
— Kris (@KrisAbdelmessih) February 28, 2022

An old issue of Daily Dirtnap once quipped something like this.

Money, family, friends, health, sleep

Choose 3.

When I listen to interviews the intro resume to everyone is a scroll.

I think interviewers should ask the old Gamora question:

What did it cost you?
— Kris (@KrisAbdelmessih) March 4, 2022

I think it's hard to relate to guests without some idea of the toll it took.

And if the point is that they are not relatable maybe trying to learn lessons from "elite performers" is silly in the first place
— Kris (@KrisAbdelmessih) March 4, 2022

Yinh reminds me that you can’t have all the things at the same time. Matthew McConaughey (you should listen to the Greenlights audiobook or see my takeaways from his commencement speech…Wooderson’s self-help advice stands with the best) recognizes that as well. He recommends “checking in” with each category intermittently to see how well you are tracking compared to where you’d like to be. It’s inevitable that you will lag in various categories at various times. A smattering of conscious effort, even if contrived, can keep you from orphaning a category you once told yourself matters.

Stay groovy!

Moontower #139

If you were accepted to a selective college or job in the 90s, have you ever wondered if you’d get accepted in today’s environment? I wonder myself. It leaves me feeling grateful because I think the younger version of me would not have gotten into Cornell or SIG today. Not that I dwell on this too much. I take Heraclitus at his word that we do not cross the same river twice. Transporting a fixed mental impression of yourself into another era is naive (cc the self-righteous certain they’d be on the right side of history on every topic). Still, my self-deprecation has teeth. When I speak to friends with teens I hear too many stories of sterling resumes bulging with 3.9 GPAs, extracurriculars, and Varsity sport letters, being warned: “don’t bother applying to Cal”.

A close trader friend explained his approach. His daughter is a high achiever. She’s also a prolific writer. Her passion is the type all parents hope their children will be lucky enough to discover. My friend recognizes that the bar is so high to get into a top school that acceptance above that bar is a roulette wheel. With so much randomness lying above a strict filter, he de-escalates the importance of getting into an elite school. “Do what you can, but your life doesn’t depend on the whim of an admissions officer”. She will lean into getting better at what she loves wherever she lands. This approach is not just compassionate but correct. She’s thought ahead, got her umbrella, but she can’t control the weather.

My friend’s insight that acceptance above a high threshold is random is profound. And timely. I had just finished reading Rohit Krishnan’s outstanding post Spot The Outlier, and immediately sent it to my friend.

I chased down several citations in Rohit’s post to improve my understanding of this topic.

In the post, we tie together:

Why the funnels are getting narrower
The trade-offs in our selection criteria
The nature of the extremes: tail divergence
Strategies for the extremes

✍️ There’s Gold In Them Thar Tails: Part 1 (13 min read)

Money Angle

A collection of links

✍️ Matt Levine’s “Everything Is Securities Fraud” Compilation (Link)
by @RabbiJacob16

I’ve suggested multiple times on Twitter that someone should compile Matt Levine’s Money Stuff articles by topic and sell it as a book. RabbiJacob16 compiled all of Levine’s writing on his “Everything is Securities Fraud” theme and made it free.

✍️ What explains the rise of AMMs? (17 min read)
by Haseeb Qureshi

This is an ancient post by crypto standards (1.5 yrs old) yet remains a lucid explanation of automated market-making. You’ll find lots of fun reading on Haseeb’s site regardless.

I also published 2 posts that are exhaust from an article I’m writing. I’ll publish that next week.

✍️ Notes From Mauboussin’s “Who Is On The Other Side?” (7 min read)

These are notes from a report where:

Michael describes a taxonomy of inefficiencies, supported by a rich vein of academic research. The goal is to have a clear idea of why efficiency is constrained and why we believe we have an opportunity to generate an attractive return after an adjustment for risk.

✍️ On Contrarianism (3 min read)

A collection of quotes and ideas on the importance and difficulty of being contrarian.

From My Actual Life

I saw War on Drugs Friday night at Bill Graham in SF with Jake.

It’s Friday.. yeah pic.twitter.com/MieAkXK0Mk

— Jake (@EconomPic) February 26, 2022

The song Under The Pressure (below) is all rise, but when the drop comes, that’s an ethereal experience live. Both times I’ve seen WoD that moment stays with me.

There’s Gold In Them Thar Tails: Part 1

If you were accepted to a selective college or job in the 90s, have you ever wondered if you’d get accepted in today’s environment? I wonder myself. It leaves me feeling grateful because I think the younger version of me would not have gotten into Cornell or SIG today. Not that I dwell on this too much. I take Heraclitus at his word that we do not cross the same river twice. Transporting a fixed mental impression of yourself into another era is naive (cc the self-righteous who think they’d be on the right side of history on every topic). Still, my self-deprecation has teeth. When I speak to friends with teens I hear too many stories of sterling resumes bulging with 3.9 GPAs, extracurriculars, and Varsity sport letters, being warned: “don’t bother applying to Cal”.

I chased down several citations in Rohit’s post to improve my understanding of this topic.

In this post, we will tie together:

Why the funnels are getting narrower
The trade-offs in our selection criteria
The nature of the extremes: tail divergence
Strategies for the extremes

We will extend the discussion in a later post with:

What this means for intuition in general
Applications to investing

Why Are The Funnels Getting Narrower?

The answer to this question is simple: abundance.

In college admissions, the number of candidates in aggregate grows with the population. But this isn’t the main driver behind the increased selectivity. The chart below shows UC acceptance rates plummeting as total applications outstrip admits.

The spread between applicants and admissions has exploded. UCLA received almost 170k applications for the 2021 academic year! Cal receives over 100k applicants for about 10k spots. Your chances of getting in have cratered in the past 20 years. Applications have lapped population growth due to a familiar culprit: connectivity. It is much easier to apply to schools today. The UC system now uses a single boilerplate application for all of its campuses.

This dynamic exists everywhere. You can apply to hundreds of jobs without a postage stamp. Artists, writers, analysts, coders, designers can all contribute their work to the world in a permissionless way with as little as a smartphone. Sifting through it all necessitated the rise of algorithms — the admissions officers of our attention.

Trade-offs in Selection Criteria

There’s a trade-off between signal and variance. What if Spotify employed an extremely narrow recommendation engine indexed solely on artist? If listening to Enter Sandman only led you to Metallica’s deepest cuts, the engine is failing to aid discovery. If it indexed by “year”, you’d get a lot more variance since it would choose across genres, but headbangers don’t want to listen to Color Me Badd. This prediction fails to delight the user.

Algorithms are smarter than my cardboard examples but the tension remains. Our solution to one problem exacerbates another. Rohit describes the dilemma:

The solution to the problem of discovery is better selection, which is the second problem. Discovery problems demand you do something different, change your strategy, to fight to be amongst those who get seen.

There’s plenty of low-hanging fruit to find recommendations that reside between Color Me Badd and St. Anger. But once it’s picked, we are still left with a vast universe of possible songs for the recommendation engine to choose from.

Selection problems reinforce the fact that what we can measure and what we want to measure are two different things, and they diverge once you get past the easy quadrant.

In other words, it’s easy enough to rule out B students, but we still need to make tens of thousands of coinflip-like decisions between the remaining A students. Are even stricter exams an effective way to narrow an unwieldy number of similar candidates? Since in many cases predictors poorly map to the target, the answer is probably no. Imagine taking it to the extreme and setting the cutoff to the lowest SAT score that would satisfy Cal’s expected enrollment. Say that’s 1400. This feels wrong for good reasons (and this is not even touching the hot stove topic of “fairness”). Our metrics are simply imperfect proxies for who we want to admit. In mathy language, we can say the best person at Y (our target variable) is not likely to come from the best candidates we screened if the screening criteria, X, is an imperfect correlate of success (Y).
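To make the "mathy language" concrete, here is a minimal simulation sketch. Everything in it is my assumption rather than anything from Rohit's post: the screen X and the outcome Y are drawn as standard normals with a correlation of 0.5, and the function name `best_survives_screen` is hypothetical. It asks how often the single best candidate on Y makes it through a cutoff on X as the cutoff gets stricter.

```python
import math
import random


def best_survives_screen(admit_rate, rho=0.5, n=1000, trials=200, seed=3):
    """Fraction of trials in which the best-at-Y candidate passes a top-X screen.

    Assumption: X (the screen) and Y (the outcome) are standard normal with
    correlation rho. Real admissions data will not look this tidy.
    """
    rng = random.Random(seed)
    k = max(1, int(n * admit_rate))  # number of candidates admitted
    hits = 0
    for _ in range(trials):
        pool = []
        for _ in range(n):
            x = rng.gauss(0, 1)
            y = rho * x + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
            pool.append((x, y))
        best = max(pool, key=lambda c: c[1])  # the true star on Y
        admitted = sorted(pool, key=lambda c: c[0], reverse=True)[:k]
        hits += (best in admitted)
    return hits / trials


if __name__ == "__main__":
    for rate in (0.5, 0.1, 0.01):
        print(f"admit top {rate:.0%} on X -> star admitted in "
              f"{best_survives_screen(rate):.0%} of trials")
```

Under these assumptions a loose screen almost always keeps the star, while a very strict one frequently cuts them. The exact percentages depend entirely on the correlation you plug in.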

The cost of this imperfect correlation is a loss of diversity or variance. Rohit articulates the true goal of selection criteria (emphasis mine):

Since no exam perfectly captures the necessary qualities of the work, you end up over-indexing on some qualities to the detriment of others. For most selection processes the idea isn’t to get those that perfectly fit the criteria as much as a good selection of people from amongst whom a great candidate can emerge.

This is even true in sports. Imagine you have a high NBA draft pick. A great professional must endure 82 games (plus a long playoff season), fame, money, and most importantly, a sustained level of unprecedented competition. Until the pros, they were kids. Big fish in small ponds. If you are selecting for an NBA player with narrow metrics, even beyond the well-understood requisite screens for talent, then those metrics are likely to be a poor guide to how the player will handle such an outlier life. The criteria will become more squishy as you try to parse the right tail of the distribution.

In the heart of the population distribution, the contribution to signal of increasing selectivity is worth the loss of variance. We can safely rule out B students for Cal and D3 basketball players for the NBA. But as we get closer to elite performers, at what point should our metrics give way to discretion? Rohit provides a hint:

When the correlation between the variable measured and outcome desired isn’t a hundred percent, the point at which the variance starts outweighing the mean error is where dragons lie!

Nature Of The Extremes: Tail Divergence

To appreciate why the signal of our predictive metrics becomes random at the extreme right tail, we start with these intuitive observations via LessWrong:

Extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6’7″ is very tall, it lies within a couple of standard deviations of the median US adult male height – there are many thousands of US men taller than the average NBA player, yet are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded, they aren’t the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart, but their intelligence is not in step with their income (their cognitive ability is around +3 to +4 SD above the mean, yet their wealth is much higher than this).

The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa).

The post uses simple scatterplots to demonstrate. Here are 2 self-explanatory charts.

LessWrong continues: Given a correlation, the envelope of the distribution should form some sort of ellipse, narrower as the correlation goes stronger, and more circular as it gets weaker.

If we zoom into the far corners of the ellipse, we see ‘divergence of the tails’: as the ellipse doesn’t sharpen to a point, there are bulges where the maximum x and y values lie with sub-maximal y and x values respectively:

Say X is SAT score and Y is college GPA. We shouldn’t expect that the person with highest SATs will earn the highest GPA. SAT is an imperfect correlate of GPA. LessWrong’s interpretation is not surprising:

The fact that a correlation is less than 1 implies that other things matter to an outcome of interest. Although being tall matters for being good at basketball, strength, agility, hand-eye-coordination matter as well (to name but a few). The same applies to other outcomes where multiple factors play a role: being smart helps in getting rich, but so does being hard working, being lucky, and so on.
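The SAT/GPA intuition can be checked with a toy simulation. This is a sketch under my own assumptions (X and Y are standard normals with a chosen correlation `rho`, and `top_match_rate` is a hypothetical helper name), not something taken from the LessWrong post. It counts how often the person with the highest X is also the person with the highest Y.

```python
import math
import random


def top_match_rate(rho, n=500, trials=200, seed=0):
    """Share of trials where the max-X individual is also the max-Y individual.

    Assumption: X and Y are standard normal with correlation rho.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best_x = best_y = -math.inf
        arg_x = arg_y = -1
        for i in range(n):
            x = rng.gauss(0, 1)
            # Y shares a fraction rho of X; the rest is independent "other stuff".
            y = rho * x + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
            if x > best_x:
                best_x, arg_x = x, i
            if y > best_y:
                best_y, arg_y = y, i
        hits += (arg_x == arg_y)
    return hits / trials


if __name__ == "__main__":
    for rho in (0.3, 0.6, 0.9, 1.0):
        print(f"rho={rho}: top X is also top Y in {top_match_rate(rho):.0%} of trials")
```

Only a perfect correlation guarantees the match. In this setup, anything less leaves room for the "other things" to decide who ends up on top.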

Pushing this even further, if we zoom in on the extreme of a distribution we may find correlations invert! This scatterplot via Brilliant.org shows a positive correlation over the full sample (pink) but a negative correlation for a slice (blue).

This is known as Berkson’s Paradox and can appear when you measure a correlation over a “restricted range” of a distribution (for example, if we restrict our sample to the best 20 basketball players in the world we might find that height is negatively correlated to skill if the best players were mostly point guards).

[I’ve written about Berkson’s Paradox here. Always be wary of someone trying to show a correlation from a cherry-picked range of a distribution. Once you internalize this you will see it everywhere! I’d be charitable to the perpetrator. I suspect it’s usually careless thinking rather than a nefarious attempt to persuade.]
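Here is a hedged sketch of how a restricted range can flip the sign of a correlation. The setup is my assumption, not the Brilliant.org data: two traits with a positive correlation of 0.5 across the whole population, and an "elite" slice chosen as the top 5% on the combined score x + y. The helper names `pearson` and `full_vs_slice` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def full_vs_slice(rho=0.5, n=20000, top_frac=0.05, seed=1):
    """Return (correlation over everyone, correlation inside the elite slice).

    Assumption: the slice is selected on the sum x + y, as a stand-in for
    "made it to the top on overall ability".
    """
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        pts.append((x, y))
    pts.sort(key=lambda p: p[0] + p[1], reverse=True)
    elite = pts[: int(n * top_frac)]
    full = pearson([p[0] for p in pts], [p[1] for p in pts])
    sliced = pearson([p[0] for p in elite], [p[1] for p in elite])
    return full, sliced


if __name__ == "__main__":
    full, sliced = full_vs_slice()
    print(f"full sample: {full:+.2f}, elite slice: {sliced:+.2f}")
```

Inside the slice, anyone short on one trait must be long on the other to have made the cut, which is what drags the within-slice correlation negative.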

Strategies For The Extremes

In 1849, assayer Dr. M. F. Stephenson shouted ‘There’s gold in them thar hills’ from the steps of the Lumpkin County Courthouse in a desperate bid to keep the miners in Georgia from heading west to chase riches in California. We know there’s gold in the tails of distributions but our standard filters are unfit to sift for them.

Let’s pause to take inventory of what we know.

As the number of candidates or choices increases we demand stricter criteria to keep the field to a manageable size.
At some cutoff, in the extreme of a distribution, selection metrics can lead to random or even misleading predictions. ¹²

I’ll add a third point to what we have already established:
Evolution in nature works by applying competitive pressures to a diverse population to stimulate adaptation (a form of learning). Diversity is more than a social buzzword. It’s an essential input to progress. Rohit implicitly acknowledges the dangers of inbreeding when he warns against putting folks through a selection process that reflexively molds them into rule-following perfectionists rather than those who are willing to take risks to create something new.

With these premises in place we can theorize strategies for both the selector and the selectee to improve the match between a system’s desired output (the definition of success depends on the context) and its inputs (the criteria the selector uses to filter).

Selector Strategies

We can continue to rely on conventional metrics to filter the meat of the distribution for a pool of candidates. As we get into the tails, our adherence to and reverence for measures should be put aside in favor of increasing diversity and variance. Remember the output of an overly strict filter in the tail is arbitrary anyway. Instead we can be deliberate about the randomness we let seep into selections to maximize the upside of our optionality.

Rohit summarizes the philosophy:

Change our thinking from a selection mindset (hire the best 5%) to a curation mindset (give more people a chance, to get to the best 5%).

Practically speaking this means selectors must widen the top of the funnel then…enforce the higher variance strategy of hire-and-train.

Rohit furnishes examples:

Tyler Cowen’s strategy of identifying unconventional talent and placing small but influential bets on the candidates. This is easier to say than do but Tony Kulesa finds some hints in Cowen’s template.
The Marine Corps famously funnels wide electing not to focus so much on the incoming qualifications, but rather look at recruiting a large class and banking on attrition to select the right few.
Investment banks and consulting firms hire a large group of generically smart associates, and let attrition decide who is best suited to stick around.

David Epstein, author of Range and The Sports Gene, has spent the past decade studying the development of talent in sports and beyond. He echoes these strategies:

One practice we’ve often come back to: not forcing selection earlier than necessary. People develop at different speeds, so keep the participation funnel wide, with as many access points as possible, for as long as possible. I think that’s a pretty good principle in general, not just for sports.

I’ll add 2 meta observations to these strategies:

The silent implication is the upside of matching the right talent to the right role is potentially massive. If you were hiring someone to bag groceries the payoff to finding the fastest bagger on the planet is capped. An efficient checkout process is not the bottleneck to a supermarket’s profits. There’s a predictable ceiling to optimizing it to the microsecond. That’s not the case with roles in the above examples.
Increasing adoption of these strategies requires thoughtful “accounting” design. High stakes busts, whether they are first round draft picks or 10x engineers, are expensive in time and money for the employer and candidate. If we introduce more of a curation mindset, cast wider nets and hire more employees, we need to understand that the direct costs of doing that should be weighed against the opaque and deferred costs of taking a full-size position in expensive employees from the outset.

Accrual accounting is an attempt to match a business’ economic mechanics to meaningful reports of stocks and flows so we extract insights that lead to better bets. Fully internalized, we must recognize that some amount of churn is expected as “breakage”. Lost option premiums need to be charged against the options that have paid off 100x. If an organization fails to design its incentive and accounting structures in accordance with curation/optionality thinking it will be unable to maintain its discipline to the strategy.

Selectee Strategies

For the selectee trying to maximise their own potential there are strategies which exploit the divergence in the tails.

To understand, we first recognize that in any complicated domain, the effort to become the best is not linear. You could devote a few years to becoming an 80th or 90th percentile golfer or chess player. But in your lifetime you wouldn’t become Tiger or Magnus. The rewards to effort decay exponentially after a certain point. Anyone who has lifted weights knows you can spend a year progressing rapidly, only to hit a plateau that lasts just as long.

The folk wisdom of the 80/20 rule captures this succinctly: 80% of the reward comes from 20% of the effort, and the remaining 20% of the reward requires 80% effort. The exact numbers don’t matter. Divorced from contexts, it’s more of a guideline.

This is the invisible foundation of Marc Andreessen and Scott Adams’s career advice to level up your skills in multiple domains. Say coding and public speaking or writing plus math. If it’s exponentially easier to get to the 90th percentile than the 99th then consider the arithmetic¹³.

a) If you are in the 99th percentile you are 1 in 100.

b) If you are top 10% in 2 different (technically uncorrelated) domains then you are also 1 in 100 because 10% x 10% = 1%

It’s exponentially easier to achieve the second scenario because of the effort scaling function.
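The "(technically uncorrelated)" caveat is doing real work in that arithmetic, and a quick sketch shows why. The assumptions here are mine: two skills modeled as standard normals, "top 10%" meaning above the 90th percentile of each, and a hypothetical helper `share_top_decile_in_both`. When the skills are independent, about 1 person in 100 is top-decile in both. When they are correlated, the combination is more common and so less rare.

```python
import math
import random
from statistics import NormalDist


def share_top_decile_in_both(rho, n=100_000, seed=2):
    """Fraction of a simulated population in the top 10% of two skills at once.

    Assumption: both skills are standard normal with correlation rho.
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(0.90)  # 90th percentile of a standard normal
    both = 0
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = rho * a + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        both += (a > cutoff and b > cutoff)
    return both / n


if __name__ == "__main__":
    print("independent skills:", share_top_decile_in_both(0.0))
    print("correlated skills :", share_top_decile_in_both(0.6))
```

The practical read, under these assumptions: stacking skills that rarely travel together is what makes the 10% x 10% math work in your favor.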

If this feels too stifling you can simply follow your curiosity. In Why History’s Greatest Innovators Optimized for Interesting, Taylor Pearson summarizes the work of Juergen Schmidhuber which contends that curiosity is the desire to make sense of, or compress, information in such a way that we make it more beautiful or useful in its newly ordered form. If learning (or as I prefer to say – adapting) is downstream from curiosity we should optimize for interesting.

Lawrence Yeo unknowingly takes the baton in True Learning Is Done With Agency, with his practical advice. He tells us to truly learn we must:

decouple an interest from its practical value. Instead of embarking on something with an end goal in mind, you do it for its own sake. You don’t learn because of the career path it’ll open up, but because you often wonder about the topic at hand.

…understand that a pursuit truly driven by curiosity will inevitably lend itself to practical value anyway. The internet has massively widened the scope of possible careers, and it rewards those who exercise agency in what they pursue.

Conclusion

Rohit’s essay anchored Part 1 of this series. I can’t do better than let his words linger before moving on to Part 2.

If measurement is too strict, we lose out on variance.

If we lose out on variance, we miss out on what actually impacts outcomes.

If we miss what actually impacts outcomes, we think we’re in a rut.

But we might not be.

Once you’ve weeded out the clear “no”s, then it’s better to bet on variance rather than trying to ascertain the true mean through imprecise means.

We should at least recognize that our problems might be stemming from selection efforts. We should probably lower our bars at the margin and rely on actual performance [as opposed to proxies for performance] to select for the best. And face up to the fact that maybe we need lower retention and higher experimentation.

Looking Ahead

In Part 2, we will explore what divergence in the tails can tell us about life and investing.


On Contrarianism

A collection of ideas and quotes on contrarianism

On the necessity of contrarianism for divergent results

“Mimicking the herd invites regression to the mean.”

Here’s a simple axiom to live by: If you do what everyone else does, you’re going to get the same results that everyone else gets. This means that, taking out luck (good or bad), if you act average, you’re going to be average. If you want to move away from average, you must diverge. You must be different. And if you want to outperform others, you must be different and correct. As Munger would say, “How could it be otherwise?”

Charlie Munger

Why the best investments start out controversial

Josh Wolfe said the best performing companies in the portfolio were the companies that were heavily disagreed upon but 1 person enthusiastically saw the value.

The reason outsize returns never follow consensus is explained by Agustin Lebron in his book The Laws Of Trading:

You’ve seen that financial markets are extremely competitive arenas. Good ideas won’t be obvious ones, but rather subtle and hard to find. This requires a process that rewards creative thinking. Paradoxically, ideas that at first glance seem clearly to be good ones probably aren’t. The best, most creative ideas have the strange peculiarity of being half-great and half-terrible.

In the venture capital world, there is a truism that you should never fund obviously good ideas. This makes sense: if something is clearly a good idea for a business, then (a) why do they need your investment, and (b) why isn’t someone already doing it? Looking back to the beginning of this chapter and our definition of edge, you can see the connection. Good venture capital investments are either ones that only you can see, or ones that only you can access.

We are left in the uncomfortable position of trying to create a process and a culture that rewards crazy ideas, at least to some extent, but not so crazy that you waste time and effort tilting at windmills. But then again, no one said this was going to be easy.

This is very difficult because an idea needs to be radical but most radical ideas are failures.

“If something is already consensus then money will have already flooded in and the profit opportunity is gone. And so by definition in venture capital, if you are doing it right, you are continuously investing in things that are non-consensus at the time of investment. And let me translate ‘non-consensus’: in sort of practical terms, it translates to crazy. You are investing in things that look like they are just nuts.” “The entire art of venture capital in our view is the big breakthrough for ideas. The nature of the big idea is that they are not that predictable.” “Most of the big breakthrough technologies/companies seem crazy at first: PCs, the internet, Bitcoin, Airbnb, Uber, 140 characters. It has to be a radical product. It has to be something where, when people look at it, at first they say, ‘I don’t get it, I don’t understand it. I think it’s too weird, I think it’s too unusual.’”

Marc Andreessen

The visual case for contrarianism

These charts are from 25iq and Oaktree respectively.

The danger of consensus thinking broadly

If you pursue ideas and opportunities which are popular and sensible today, you are on the path to being commoditized.

-Taylor Pearson in The Overton Window and How Creative Business Ideas Arise

Costanza

While hilarious, simply doing the opposite of what your instinct tells you doesn’t fully translate to the real world. However, what does work in the real world is doing things which are contrarian and correct. This is what makes great startups, great investments, great decisions. If it was obvious, it would have been done already and there’d be no margin, no opportunity. Lots of things are contrarian, but relatively few are contrarian and correct.

-Blas Moros in Costanza’s Law of Contrast

In Betting

You need to “beat the spread”. Heavy favorites are priced in.
When filling out an NCAA bracket, buying lotto tickets, or playing Daily Fantasy, the expectancy of a strategy is conditional on how much you win when you are correct. If you choose strategies that are likely to be followed by others, the fact that you “must split the pot” changes whether it was a good play in the first place.
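The pot-splitting point is easy to see with made-up numbers. This is a sketch, and every figure in it is hypothetical: a 1,000 pot shared evenly among everyone who picked the winner, a favorite with a 40% chance that 60 other entrants also picked, and a longshot with a 10% chance that only 5 others picked. The function name `expected_payout` is mine.

```python
def expected_payout(p_win, pot, others_on_same_pick):
    """Expected winnings when the pot is split evenly among all correct entrants.

    Simplifying assumption: the number of other entrants on your pick is known.
    """
    return p_win * pot / (1 + others_on_same_pick)


if __name__ == "__main__":
    pot = 1000  # hypothetical pool size
    favorite = expected_payout(p_win=0.40, pot=pot, others_on_same_pick=60)
    longshot = expected_payout(p_win=0.10, pot=pot, others_on_same_pick=5)
    print(f"favorite: {favorite:.2f}, longshot: {longshot:.2f}")
```

With these invented inputs the favorite is worth roughly 6.56 per entry and the longshot roughly 16.67. The more likely pick is the worse bet once you account for who you share the pot with.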

Assorted quotes

Consensus and alpha don’t mix

-Patrick O’Shaughnessy

What important truths do you believe that few people agree with you on?

-Peter Thiel