my aversion to trading implied skew

First of all, free subs to moontower.ai can access a few tools and reading materials as well as the community but they cannot post and can’t see analytics.

Here’s a question that was posted in the community this week:

I was reading thru an old tweet of yours on trading skew. The tl;dr of the tweet was don’t trade skew… Given I am in a masochistic mood, how would one go about backtesting a skew trading strategy?

I had 2 ideas, which I’d love to get your thoughts on.

Idea 1:

X asset 25d 3M normalized put skew is in the 100th percentile, sell a 25d put strike, delta hedged
hedge the delta daily or at some discrete interval
check how this strat would have performed assuming the trade is held until expiry

Idea 2:

X asset 25d 3M normalized put skew is in the 100th percentile, sell a 25d put strike, delta hedged
wait until normalized skew returns to some threshold, for example 75th percentile
hedge the delta daily but close out the trade as soon as the threshold is hit

Lots of questions, but the main ones are:

for idea 1, does your pnl depend on implied skew vs realized skew (similar to implied vol vs realized vol). How would you measure this?
for idea 2, does your pnl depend on a combo of realized skew (for as long as the trade is held) as well as surface repricing (ie selling at 100th percentile implied skew and closing out at 75th percentile). The thought of measuring this gives true masochists vibes, but how would you?
I wonder if the juice is worth the squeeze? Meaning, assuming you built the foundation to measure/test all the above, is there really any pnl in it / are you better off focusing on VRP trades?

My response:

As a matter of practicality, I think the test should be more in the vein of idea #1.

If you consider skew percentiles, the difference between the 25th and 75th percentile could be some absolutely small number like 2 vega points. And the level of skew itself measured by percentile is sensitive to the percentile lookback such that the range you are trading over is just quite small. Your interim p/l will be the sum of the implied vol change plus the realized delta hedging p/l.

But consider this…let’s say you sell the 25d put and it becomes a 50d put but the skew normalizes. That skew metric is no longer referencing your position. You have a floating vs fixed problem. In other words, you can’t really trade implied skew directly.
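To make the floating vs fixed problem concrete, here is a minimal sketch. It assumes plain Black-Scholes deltas, zero rates, a flat 25% vol and a 3-month expiry, all of which are illustrative choices rather than anything from the question. The helper names (`bs_put_delta`, `strike_for_put_delta`) are hypothetical.

```python
from math import erf, log, sqrt


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_put_delta(spot, strike, vol, t_years, rate=0.0):
    """Black-Scholes put delta (negative number)."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t_years) / (vol * sqrt(t_years))
    return norm_cdf(d1) - 1.0


def strike_for_put_delta(spot, target_delta, vol, t_years):
    """Bisection search for the strike whose put delta equals target_delta."""
    lo, hi = 0.2 * spot, 2.0 * spot
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        # Put delta gets more negative as the strike rises.
        if bs_put_delta(spot, mid, vol, t_years) < target_delta:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


spot, vol, t_years = 100.0, 0.25, 0.25  # hypothetical inputs
strike = strike_for_put_delta(spot, -0.25, vol, t_years)
entry_delta = bs_put_delta(spot, strike, vol, t_years)
# Spot falls to the strike: the put you sold is now roughly a 50-delta option.
delta_after_drop = bs_put_delta(strike, strike, vol, t_years)

print(f"strike sold: {strike:.2f}, delta at entry: {entry_delta:.3f}")
print(f"delta of the same strike after spot falls to it: {delta_after_drop:.3f}")
```

The "25d put skew" metric keeps referencing whatever strike is 25 delta today, while the position stays pinned to the strike that was sold.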

Your results are basically going to come down to path. Your interim p/l is going to get marked based on the IV of the fixed strike you have on and that in turn is going to influence the delta you hedged on.

The delta you hedge on is going to have a large impact on your final p/l so it’s not just where does the stock go but what deltas were imputed along the way. For example suppose you run a model with spot/vol correlation embedded in the SP500…this will generate higher OTM put deltas.

If the market trends down you will win on this, and vice versa. However, if you used B-S deltas you will get hurt as the market goes down, and vice versa. And even then, you will def get hurt on the marks, but if the stock expires near the short strike you will probably still win by expiration even though the mark-to-market path is hairy.

I used to work with a big oil options trader that would on a monthly basis stick a hedged 1-month risk reversal in a separate account and hedge it on B-S deltas. My point is that is an active choice that influences the results. Another choice could be to hedge on deltas that don’t incorporate implied skew at all but just use ATM vols.

Overall, testing the idea, even with a Monte Carlo, is a great way to get the shape of the problem and, more importantly, to see how the parameters you choose impact the p/l path.
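As a starting point for that kind of test, here is a rough Monte Carlo sketch of a short put that is delta hedged daily. Everything in it is an assumption made for illustration: zero rates and zero drift, lognormal paths with a constant 20% realized vol, a put sold at 25% implied, and Black-Scholes deltas computed with whatever `hedge_vol` you pass in. It does not model skew or surface repricing. It only isolates how the vol you hedge on changes the p/l distribution.

```python
import random
import statistics
from math import erf, exp, log, sqrt


def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_put(spot, strike, vol, t_years):
    """Black-Scholes put price and delta with zero rates."""
    d1 = (log(spot / strike) + 0.5 * vol**2 * t_years) / (vol * sqrt(t_years))
    d2 = d1 - vol * sqrt(t_years)
    price = strike * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return price, norm_cdf(d1) - 1.0


def simulate_short_put(hedge_vol, implied_vol=0.25, realized_vol=0.20,
                       spot0=100.0, strike=92.0, days=63, n_paths=1000, seed=7):
    """Return (mean, stdev) of expiry p/l for a daily-hedged short put."""
    rng = random.Random(seed)  # same seed means the same paths for every hedge_vol
    dt = 1.0 / 252.0
    premium, _ = bs_put(spot0, strike, implied_vol, days * dt)
    pnls = []
    for _ in range(n_paths):
        spot, hedge_pnl = spot0, 0.0
        for day in range(days):
            t_left = (days - day) * dt
            _, put_delta = bs_put(spot, strike, hedge_vol, t_left)
            # Short put is long delta, so the hedge holds put_delta shares (a short).
            z = rng.gauss(0.0, 1.0)
            new_spot = spot * exp(-0.5 * realized_vol**2 * dt + realized_vol * sqrt(dt) * z)
            hedge_pnl += put_delta * (new_spot - spot)
            spot = new_spot
        payoff = max(strike - spot, 0.0)
        pnls.append(premium - payoff + hedge_pnl)
    return statistics.mean(pnls), statistics.stdev(pnls)


for hedge_vol in (0.15, 0.20, 0.25):
    mean_pnl, std_pnl = simulate_short_put(hedge_vol)
    print(f"hedge vol {hedge_vol:.0%}: mean p/l {mean_pnl:6.3f}, stdev {std_pnl:6.3f}")
```

Changing `hedge_vol`, the hedge frequency, or adding drift to the paths is the point of the exercise: the premium collected is identical in every run, but the path of deltas is not.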

I’m not kidding when I say skew trading is masochism. If oil is $75 and has massive put skew and the market drifts down to $55 and the skew gets hammered (so say the 40 puts don’t perform) but you sold the 60 put, what skew did is irrelevant. All that will matter is how fast the market went to $55 and what deltas you were running on the $60 strike along the way.

The weirder the distribution the crazier this is. I’ve seen nat gas option traders blow out being long put skew on a 15% drop in the underlying because they used too high of an implied option delta and they delta hedged several times on the way down.

Had they run a lower vol and delta OR hedged less they might have survived. There’s not much lesson from this other than…sometimes a 15% selloff is interpreted by the market as “stabilizing” and sometimes it’s destabilizing and that is what’s gonna dictate the options behavior.

entitled to that alien life

In a moment of procrastination I went to Twitter and found something to spout off on.

It’s about the coverage of the election.

Ha, ha, yea right. I’d rather piss razors than talk about politics, thank you very much.

I did peek at Twitter and though I’ve seen some story about people making 6-figures feeling broke for the millionth time, I failed to contain my need to pop off this time. Actually, the triggering source material is totally innocent. It’s a nice post by Ben Carlson, one of the first finance bloggers I started reading about a decade ago when I first started trying to learn about investing (I’m a trader and trading is not investing).

I tweeted his excerpt with my thoughts. I reprinted a lightly edited version below the screenshot. It’s in reference to HENRYs (high-earner-not-yet-rich), an acronym nobody in the history of the world has ever uttered without a derisive smirk. In some circles, they are known as the “working rich”.

I’m sorry but if you’re actually rich you don’t think of these people as rich and if you’re a regular person you don’t consider these people working class. This acronym is the personal finance version of the “finger cuffs” nickname memorialized in Chasing Amy. You’re under 50, live near the coast, and are worth $5-$15mm. Congratulations, the inflation you moan about also went into your income.

Let’s just get to the sauce already.

https://awealthofcommonsense.com/2024/10/high-earner-not-yet-rich/

The tweet thread:

This post is fine, reasonable posture. …but if I may piggyback off some of the numbers in the post… I want to repeat what the post says but with slightly different emphasis which leads to a large difference in framing.

For example, it talks about people in the top 5% of income or wealth being disappointed, and you’d be better off going in with lower expectations.

When I think the harsh but true reality is being in the top 5% when you measure against the whole population just… isn’t that special.

When I was a kid, being rich was like Robin Leach stuff. It felt totally unattainable. And you know what? It is. Rich, as defined by its day, is unattainable.

Of course, someone will say, “Actually, it is attainable,” and my argument is… it’s not for people who complain that they don’t feel rich or whatever. Because the mindset that comes with Robin Leach-level desire would never accept the top 5% as being entitled to anything. It’s a failure.

So when the article says to lower your expectations, my reframing is: get serious about what being excellent means. You’re not even close if you’re a top 5%er. So either get serious about what it takes to be rich, or be seriously gracious for what you’ve gotten. All the woes of inflation and yadda yadda hit everyone.

You’re still stack-ranked in the same relative sense. Moaning is so unbecoming; it feels like such a confession of how duped you were by thinking school was real life or that X dollars mapped to Y.

I just don’t feel like being rich is something I should ever expect unless I’m so badass that I’d be comfortable being at a table with people that are unmistakably badass.

You’re a PM at a tech company and you don’t feel rich? Of course you don’t. You shouldn’t. You’re not scarce AF. You haven’t taken a massive risk and come out on the other side.

The fact that there are normal people who became very rich by being in the right place at the right time shouldn’t influence your expectation of that likelihood.

(Houses are so expensive where I am not because everyone is a celeb… it’s because I live within 50 miles of “I’ve been at Google/Pinterest/CRM/Dash since pre-IPO.” That’s called hitting the lotto. Most of the people in tech who have been around for 20 years have been bouncing around from one high-paying job to the next but falling short of the exponential. I use this analogy a lot.)

To get where I got in my career required flipping heads 10x in a row and tremendous effort. That earned me comfort. But to become one of the bosses who are ballplayer-rich, I’m probably staring down another chain of 10 in a row. I don’t get that life in this life. I’m lucky to be anywhere near the one I got, but it’s utterly unremarkable, and this is inclusive of being very good at my job.

Smart, hardworking, etc. Table stakes. You’re not special. Comparing your 5%-ness with the masses is nothing more than a poorly adaptive exercise in self-flattering benchmarking.

I suspect the mambas are looking to transcend comparison and would never use it to self-soothe or justify. If you want to feel rich, expect alien powers from yourself instead of expecting that being way above average should entitle you to an alien life.

No looking over the fence at others.

Look in the mirror. Are you doing truly alien shit?

boredom risk

If you know about education and want to write about it, I’m just letting you know you should go for it. There’s an audience. That Principles of Learning jam I put out Wednesday got me a lot of inbound. Which I never would have predicted. It was an exercise in organizing my thoughts on a topic I find inexhaustibly interesting — learning faster and more efficiently. If I wasn’t loving building moontower.ai, I’d be pulling on that thread harder.

[If you are operating at the intersection of machine learning and human learning and want to connect hit me up. I have a close friend at the tip of that spear who I’d be asking for a job if I wasn’t being feral.]

Anyway, parents wanna know about this stuff based on my email inbound. Just sayin.

In that vein, I’m going to share a response I sent to someone who recently enrolled their 7th grader in Math Academy. The parent is concerned that by going faster (even in a selective private school that the child is already in) that boredom could become an issue.

I’ll be honest. I hadn’t considered that angle. But it’s a totally legitimate one considering that MA’s Justin Skycak addresses it directly in:

The Greatest Educational Life Hack: Learning Math Ahead of Time (5 min read)

Justin frames it in terms of risk and reward. That’s a valid approach. But for some kids, it’s still too conservative. Pre-learning is the ultimate option on having doors open that simply won’t any other way because you are compressing time.

In one of Paul Graham’s best essays, How To Make Wealth, he talks about the decision to join a startup in such terms. In Startup = Growth it’s spelled out: “raising money lets you choose your growth rate”. If the kid enjoys going faster, let ‘er rip. They aren’t aware of the option you’re giving them but they might thank you later.

[I wish I could make copies of myself to do all the stuff I want to do but at the same time, I consciously don’t want to burn the candle at both ends right now. There are doors that are closed because I didn’t go faster when the cost of going faster from a family POV was lower. But I wasn’t inspired to go faster then. Anyway, I’m not writing for therapy here, but if I’m projecting my own illusions you should at least have the disclaimer. ]

I suspect the downside is not especially sensitive to pre-learning anyway. Even if a motivated or math-inclined kid didn’t pre-learn they’re gonna be bored. The teacher will introduce a topic, the kid will get it immediately and still need to wait for others to catch up.

Something I tell my kids, and I don’t express it in any type of subversive tone but just as a matter of fact…you can’t let the pace of school dictate what you think is a normal pace. School is built for everyone, but if you are good at sports you wouldn’t expect to move at the pace of the average kid in your class. You’d have a coach and play “up” or on a club team.

The subconscious message we pick up everywhere, especially in school, is that there’s a correct pace. But the error bars around that pace are massive. We know our children so deferring to what’s best for the average when we have specific info (whether they need more help or more stimulation) is wasting info. As always, it’s sound decision-making hygiene to consider the outside view but adjust it for your circumstances.

One last caveat: if I found out my bored kid was working on his fantasy football model underneath his textbook at school, my reflex would be “Sweet, show me what you got so far. But also you better get an A+.”

Moontower #245

Friends,

Money Angle

That’s that.

It’s an excuse to insert some Velvet Revolver.

Money Angle For Masochists

Portfolio Vol 101

I trimmed the video from Thursday down to a 15 minute section that gives an education as well as a step by step implementation of computing portfolio vol. There’s even a little detour into dispersion.

From My Actual Life

This week I found the thing my 8-year-old thinks is the funniest thing in the world.

Live action Butthead.

I had never seen this skit before but Max and I were down a rabbit hole of SNL cast members breaking character after we watched a few segments from Nate Bargatze last week.

There is a great moment of character breakage in the skit above. We’ve watched it at least 15 times this week.

The rest of our week was kept light by the fact that Max is now walking around the house imitating Bill Hader doing impressions of Arnold Schwarzenegger.

You don’t wanna miss this one:

Max is also loving Fred Armisen, Kristen Wiig, and especially The Californians skits which are god-tier imo. I gotta catch Max on video doing Arnold…“Cahm ahn, show mee your leedahship cape-ah-beeleeties!”

Stay Groovy

☮️

Volatility term structure from multiple angles (part 1)

The post Dragonfly Eyes served as a broad preamble to our exploration today. Be like a dragonfly: look through multiple lenses.

We’re going to expand our thinking about volatility term structure to see why it’s a diamond with several facets — and most interestingly — why multiple ways of looking at it are not all correlated.

We are going to consider volatility term structure in a few ways. The differences will make the value of multiple lenses self-evident. The source of the differences has highly practical ramifications for 3 tasks:

risk management
surface modeling
trade prospecting

I would be surprised if even an experienced trader didn’t walk away folding the Rubik’s cube known as vol term structure in their head. If anything, I’m sure a seasoned trader can find some interview questions embedded in the concepts to bounce off candidates.

If you are a novice trader, you will still benefit. There’s nothing more than arithmetic in here. The value of hacking ideas from several vantage points will be obvious plus you will learn some basic transformations and measures that the more experienced folk take for granted.

About this post

The post is the first of a 2-parter. It won’t all fit in the single email view plus I’ve been under the weather this week. All the background work is done but I’m running on fumes to write it all up.
It’s a semi-Socratic progression of “show don’t tell” which serves to make the lessons your own.
We will use GLD vol data for the past year which was whimsically chosen. I didn’t snoop at the data first.
We get into implications for your own procedures.*
Finally, I talk about how and why I will extend the analysis.

*The word “your” prompts a reasonable question — who’s this for? I’m imagining a trader or risk manager at a prop shop/asset manager or an extremely sophisticated retail option trader. The material comes from pragmatism and experimentation. A durable way of seeing based on lots of pain. This is the stuff of salt mines. The way traders think.

Where does this intersect with quant and formal risk management?

Everywhere.

Quants may have a different language and set of methods for computation but the concerns are the same. I’ve said it before, but the caricature of the ivory tower theoretical physics PhD without street smarts is foreign to me. All of the gigabrain quants I’ve worked with were both practical and exceptional at asking questions. Their priority was reality.

Their knowledge becomes indispensable as risk management scales and portfolios become far more complex across multiple strategies and asset classes. I have little to add to the code-level minutiae of implementing a large-scale risk OS. I pretty much operated at the frontier of how much I could keep in my head at once but as combinations expand exponentially, well, you’re gonna need a bigger boat.

Let’s open with a “simple” question:

If you buy a 6 month/1 month straddle spread on an equity or ETF, are you long vega?

(To be linguistically clear, you are buying the 6 month straddle and shorting the 1 month)

If I’m asking the question you already know there’s more to this than simply “6 month option vega is > 1 month option vega” so YES.

You can see where I’m going with this if I use an analogy question. If I buy X dollars NVDA and short Y dollars of SPY, am I long the market if X = Y?

The question comes back to beta. Beta is a function of correlation and vol ratio. Just like with equities to a benchmark, the correlation of vol changes across a term structure is usually quite strong. We are mostly concerned with how much one vol moves with respect to another vol.

We don’t expect 2-year implied vol to move as much as 1-week implied vol. In the NVDA question, beta tells us the ratio of X to Y to be “market neutral”.
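A quick sketch of the beta arithmetic, using beta = correlation x (vol of the asset / vol of the benchmark). All the inputs below are made-up numbers for illustration, not estimates for NVDA, SPY, or any real term structure.

```python
def beta(correlation, vol_asset, vol_benchmark):
    """Beta as correlation times the ratio of volatilities."""
    return correlation * vol_asset / vol_benchmark


# Hypothetical equity inputs: NVDA vol 50%, SPY vol 15%, correlation 0.6.
nvda_beta = beta(correlation=0.6, vol_asset=0.50, vol_benchmark=0.15)
long_nvda_dollars = 100_000
spy_short_for_neutral = long_nvda_dollars * nvda_beta

# Same idea applied to vols: hypothetical daily std dev of IV changes,
# 1.5 points for a 1-week vol vs 0.4 points for a 6-month vol.
front_vol_beta = beta(correlation=0.85, vol_asset=1.5, vol_benchmark=0.4)

print(f"NVDA beta to SPY: {nvda_beta:.2f}")
print(f"SPY short needed against $100k NVDA: ${spy_short_for_neutral:,.0f}")
print(f"1-week vol beta to 6-month vol: {front_vol_beta:.2f}")
```

With these assumed inputs, X = Y dollars leaves you long the market, in the same way that equal raw vegas in two months do not leave you flat vol.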

Going back to the vol question. It’s true that you are long vega if you buy a 6 month straddle and short a 1 month straddle. You are long “click vega”. If the entire term structure parallel shifted higher 2 points you will win 2 x [net vega].

But vols don’t generally move in lockstep across the term structure. Instead, it’s common to weight the vols by their sensitivity to a fulcrum month and then re-scale all your monthly vegas by the weights.

So the answer to the riddle might be YES you are long vega but it’s not necessarily the most helpful answer. If vol parallel shifts up, you will definitely win and the idea that you are long vega was certainly true in this unusual dynamic. It is reasonable to expect 1-month vol to increase faster than the 6-month vol.

This brings us to our next question.

If the front month vol increases by 1 point, how much does the 6-month vol need to increase to merely break even?

Hint: For an ATM (well technically at-the-forward) straddle the only things that affect the vega are spot price and DTE

This is an equity or ETF so the spot price is the same for both months. Note this would not be true for futures which have a curve of different underlyings.

The difference in vega is proportional to sqrt(dte). In this case the sqrt (6 / 1) = 2.45

[If this is not clear, recall that the ATF straddle approximation from The MAD Straddle is straddle = .8Sσ√T.

Straddle vega is just change in straddle price per 1 point change in vol. We just re-arrange the formula:

straddle/σ = .8S√T

Since we are comparing 2 months, .8* S cancels out and we are left with the vegas being proportional to √T]

If 1-month vol increases by 1 point, then the back month vol needs to increase by 1/2.45 or .41 to keep the straddle spread price constant (ignoring theta).

If 6-month vol increases by more (less) than .41, we make (lose) money on the vol expansion.
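The break-even arithmetic can be checked with the straddle approximation straddle = .8Sσ√T. The spot price and the starting 20% vol below are arbitrary assumptions. Only the ratio of the two expiries matters for the result.

```python
from math import sqrt


def atf_straddle(spot, vol, t_years):
    """At-the-forward straddle approximation: 0.8 * S * sigma * sqrt(T)."""
    return 0.8 * spot * vol * sqrt(t_years)


def straddle_vega(spot, t_years):
    """Change in straddle price per 1 vol point (0.01) under the approximation."""
    return 0.8 * spot * sqrt(t_years) * 0.01


spot = 100.0  # hypothetical
front_t, back_t = 1 / 12, 6 / 12

vega_ratio = straddle_vega(spot, back_t) / straddle_vega(spot, front_t)
breakeven_back_move = 1.0 / vega_ratio  # vol points of 6-month vol per 1 point of 1-month vol

# Long the 6-month straddle, short the 1-month straddle, theta ignored.
spread_before = atf_straddle(spot, 0.20, back_t) - atf_straddle(spot, 0.20, front_t)
spread_after = (atf_straddle(spot, 0.20 + breakeven_back_move / 100, back_t)
                - atf_straddle(spot, 0.21, front_t))

print(f"vega ratio (6M / 1M): {vega_ratio:.2f}")
print(f"break-even 6M vol move: {breakeven_back_move:.2f} points")
print(f"spread change: {spread_after - spread_before:.6f}")
```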

Whether we are long or short vega is more ambiguous than it appears from our headline “click vega” measure.

√t Vega Scaling

Just like beta-weighting a basket of stocks allows us to group directional exposure into equivalent SPX delta, it’s common to weight vega as a function of √dte. You may choose a “fulcrum” term such as 180 days to anchor the definition of vega and then re-scale each month’s vega by √(180/DTE), so a 1-month vega counts for about 2.45x its raw amount in 6-month terms.

This is review from Understanding Vega Risk:

This kind of scaling allows us to summarize the position with a statement like:

“If 6-month vol increases (decreases) by 1 point, I expect to lose (make) $13k”

This would not be obvious if you are looking at the sum of raw vega.
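Here is a small sketch of how the weighted number can disagree with raw vega. The two positions are hypothetical. The weight is taken from the break-even logic earlier: if a 1-point move in 1-month vol is matched by a .41 point move in 6-month vol, then a 1-point move in the 180-day fulcrum maps to a √(180/DTE) point move in each month, so raw vega is multiplied by that factor.

```python
from math import sqrt

FULCRUM_DTE = 180  # assumed anchor term


def weighted_vega(raw_vega, dte, fulcrum_dte=FULCRUM_DTE):
    """Express raw vega in fulcrum-equivalent terms under sqrt(t) scaling."""
    return raw_vega * sqrt(fulcrum_dte / dte)


# Hypothetical book: long 6-month vega, short a smaller amount of 1-month vega.
positions = [
    {"dte": 180, "raw_vega": 10_000.0},
    {"dte": 30, "raw_vega": -6_000.0},
]

raw_net = sum(p["raw_vega"] for p in positions)
weighted_net = sum(weighted_vega(p["raw_vega"], p["dte"]) for p in positions)

print(f"raw net vega:      {raw_net:>10,.0f}")
print(f"weighted net vega: {weighted_net:>10,.0f}")
```

In this made-up book the raw sum says long vega while the weighted sum says short, which is exactly the kind of statement the raw number hides.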

√t scaling is not pulled out of a hat. It corresponds to a world where straddle spreads are constant (again, ignoring theta). Armed with that straddle approximation formula, it is simple to prove that to yourself.
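For anyone who wants the proof written out, here is one way to sketch it from the straddle approximation. The notation is mine: ν_i is the raw vega in month i, T_f is the fulcrum term, and theta is ignored as above.

```latex
\begin{aligned}
\text{straddle}_i &\approx 0.8\, S\, \sigma_i \sqrt{T_i} \\
\Delta\left(\text{straddle}_2 - \text{straddle}_1\right)
  &\approx 0.8\, S \left( \sqrt{T_2}\,\Delta\sigma_2 - \sqrt{T_1}\,\Delta\sigma_1 \right) = 0
  \;\Longrightarrow\; \Delta\sigma_1 = \Delta\sigma_2 \sqrt{T_2 / T_1} \\
\text{p/l} &= \sum_i \nu_i\, \Delta\sigma_i
  = \Delta\sigma_f \sum_i \nu_i \sqrt{T_f / T_i}
\end{aligned}
```

The last line is the weighted vega: if every straddle spread stays constant, each month's vol move is pinned to the fulcrum move, and the whole book collapses to one number.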

There’s nothing gospel about this weighting scheme. In the moontower.ai app users can view daily vol changes scaled to several choices of tenors. The point is that by normalizing at all, it is easy to see which straddle spreads changed. Implied volatility is a shortcut to get to a price and prices are what your p/l depends on. If you have a time spread on, the change in its price is the thing you care about.

Any vol weighting scheme you choose will not be perfectly accurate (if it were you could literally predict how the vols would change relative to each other, in which case you should have already chucked your phone in the ocean from the hammock of your private island). But it’s a gigantic improvement in risk monitoring from raw vega. Of course, any summary measure is a trade-off between convenience and resolution. The explicit trade-off with √t scaling:

Benefit: Easy to see changes in time spread prices across the term structure. Highly intuitive and interpretable.

Drawback: Inaccurate. Said differently — there’s lots of room to improve the accuracy of the weights from empirical data which will lead to better understanding of vega risk.

Personally, I think dialing in better weights would increase cognitive load. You’ll need to think about what regime or lookback the updating weights are drawn from. The inaccuracies of the constant straddle spread assumption are “the devil I already know”.

[At the risk of wasting digital ink, it should be obvious that designing metrics always depends on user context. Building infra and dashboards is an exercise in being a product manager that serves many clients: traders, risk managers, accounting, and back office.]

√t scaling is a meaningful but rough improvement in measuring vega because it weights each month differently. Constant straddle spread is an implicit dynamic embedded in the calculation. It’s an assumption. So how do vols across a term structure actually move relative to one another?

Let’s see what we can discover by hackin’ on some data.

Viewing term structure

We are used to looking at charts like this:

That’s a snapshot of the SPX IV term structure. It’s an ascending shape. We use the term “steepness” to refer to the ratio of 1M/6M vol. In this case it is less than 1.0 since 6-month vol is at a premium to 1-month.

It’s a common shape (in oil we referred to this as the “droopy penis”). It’s relatively quiet, near vols are subdued, back months expect mean reversion to longer-term averages.

In early August, this would have been sharply descending with a steepness much greater than 1.0 as the near term stress and high realized vol were baked into the front of the IV term structure, which sloped down to vols suggesting the markets will eventually calm down.

Instead of a snapshot, a time series can help us capture all that motion. We are going to use about 1 year of GLD data (10/2/23-9/23/24) for the rest of this post.

This is a daily time series of 10d, 30d, 90d, 6M IV. The darker blue line is the 10d IV and the sparkly blue line is the 6m IV. You can see how the 10d IV is itself quite volatile sometimes sagging below the 6m IV (“the droopy penis”) and sometimes shooting way above it into a backwardated or inverted term structure.

This is an instructive way to see term structure behavior albeit highly zoomed out.

Let’s use another chart for a closer look. We will include 2 time series:

The 1M/6M vol ratio
The 1M/6M implied forward vol

A forward vol represents the implied amount of volatility that exists between the 1M and 6M expirations. (It sounds complicated but 1 month VIX future settles to “what will the VIX, a 30d forward looking measure, be in 1 month”)

You can use the free moontower.ai forward vol calculator to play with the idea and read about the concept.

The central point is that forward vol is another way to consider the difference in relative volatility between 2 expirations. Seeing things in multiple ways is the focus of this post.
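For reference, a minimal sketch of the usual forward vol arithmetic. It assumes variance is additive in time, which is the standard textbook approach and not necessarily how the moontower.ai calculator is implemented. The 12% and 15% inputs are hypothetical.

```python
from math import sqrt


def forward_vol(near_vol, near_t, far_vol, far_t):
    """Implied vol between two expiries, assuming variance adds across time."""
    fwd_var = (far_vol**2 * far_t - near_vol**2 * near_t) / (far_t - near_t)
    if fwd_var < 0:
        raise ValueError("term structure implies negative forward variance")
    return sqrt(fwd_var)


# Hypothetical term structure: 1M vol 12%, 6M vol 15%.
one_month, six_month = 1 / 12, 6 / 12
fwd = forward_vol(0.12, one_month, 0.15, six_month)
ratio = 0.12 / 0.15

print(f"1M/6M vol ratio: {ratio:.2f}")
print(f"1M-6M forward vol: {fwd:.2%}")
```

With a front vol well below the 6-month vol, the forward comes out above the 6-month vol itself, since the last five months have to make up for the quiet first month.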

What stands out about this chart?

Here’s what I see:

The forward vol is sometimes high and sometimes low regardless of the ratio.

Let’s say you get an 80 on a test, but there’s a prediction market on your final grade in the class that is trading for 90. The market is implying that you are going to be acing the rest of your tests.

Conceptually the math here is similar — when the front month vol is low and the vol ratio is far below 1.0 (steep term structure) the forward vol can and often does look quite high. The market makes it expensive to lock in cheap volatility for a long time even when the vols are low (the expense will manifest as “roll down”).

But look at early March — not only was the vol ratio low, the forward got crushed. If you only look at vol ratio, you missed this.

Implied forwards are an orthogonal or complementary measure of relative volatility that is additive to your perspective.
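A toy illustration of why the two measures are not redundant: two hypothetical days with an identical 1M/6M ratio but different vol levels produce different forward vols. The forward calculation again assumes variance is additive in time, and the vol inputs are invented.

```python
from math import sqrt


def forward_vol(near_vol, near_t, far_vol, far_t):
    """Implied vol between two expiries, assuming variance adds across time."""
    fwd_var = (far_vol**2 * far_t - near_vol**2 * near_t) / (far_t - near_t)
    return sqrt(fwd_var)


one_month, six_month = 1 / 12, 6 / 12

# Hypothetical (1M vol, 6M vol) pairs with the same ratio of 0.8.
day_a = (0.120, 0.150)
day_b = (0.096, 0.120)

ratio_a = day_a[0] / day_a[1]
ratio_b = day_b[0] / day_b[1]
fwd_a = forward_vol(day_a[0], one_month, day_a[1], six_month)
fwd_b = forward_vol(day_b[0], one_month, day_b[1], six_month)

print(f"day A: ratio {ratio_a:.2f}, forward vol {fwd_a:.2%}")
print(f"day B: ratio {ratio_b:.2f}, forward vol {fwd_b:.2%}")
```

A screen that only tracks the ratio would call these two days the same. The forward vol shows that the price of the 1M-6M window is not.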

I’m a big fan of waiting for imperfectly correlated signals lining up to size up.

Remember, dragonfly eyes.

Next week we will continue with part 2 where we:

deconstruct the nature of this relationship further
consider what the differences between lenses means for finding opportunities and measuring risk
substantiate both the how and why of extending the analysis
- I plan to actually publish the extended analysis as well, but it won’t fit in next week’s letter on top of the rest of this

moontower.ai testimonial and some of my thoughts

One of our pro subs sent us a note this weekend:

Subscriber update:

I’ve had a really successful first month with my moontower built trades. I’ve just been buying cheap vol verticals and diagonals, time spreads and selling expensive credit spreads, fine-tuning with the skew screen, playing the macro uptrend.

Been helping me see the surface better, especially because never knew how to look at the surface before. Same with skew.

This is my novice approach (8 year experience, retail).

Thanks for constantly improving the site.

A challenge that we totally anticipated in opening moontower to subscribers is retail options trading tools are incentivized to dumb down options. We are always trying to balance the challenge of simplification with the reality of options — they are always about volatility.

The primer and mission plan documents are a bridge between a vol-lens understanding of options and actually using them. They outline a step-by-step progression that constructs your vol opinion so you can marry it to your directional bias.

If you are bullish and vol is high you can sell OTM puts or covered calls. If the vol is cheap you could buy puts and stock or you can buy OTM calls. If vol is cheap and call skew is expensive you can buy call spreads. Regardless of your directional bias, if the vol stands out as cheap or expensive you can create an expression of the trade that aligns both a stock and vol opinion.

If the vol isn’t interesting or highly ambiguous, you’ll see that too in our cross-sectional lens. Passing on a trade that you otherwise would have done if you had less info is a profitable counterfactual. With the tools I’ve shown people “If you like selling X, you must LOVE selling Y” which is another way to say, “hey that thing you want to sell is relatively cheap and now you can see why”.

So when our subscriber sent that message, I wanted to know specifically how the tools helped. Here’s the response [symbols redacted]:

To be honest. I literally just followed the explainers you guys made. I focus on cheap long term, cheap short term, and expensive short term. Then check the skew for 30 day or 180 day then 60 day.

I found [ABC] and [XYZ] (underlying I will trade often cuz of liquidity) both were in the cheap long term basket. ITM was cheap by skew. Then looked at short term skew and found it was expensive OTM, so sold short term, hence the diagonals. [XYZ] is always on my radar, but I never really knew what was cheap or expensive. I loved that the software quantitatively showed me that. [ABC] is not always on my radar, but the site pointed me in the right direction and I did the same thing.

Also traded an [QRS] iron condor for profit. Found it was expensive short term ATM. Saw that it was cheaper relatively OTM. Boom iron condor for me.

Then I saw [XYZ]’s profile changed. Short term became cheap ATM. OTM became relatively more expensive. Bought an inverse condor. Did not hold it to profit. Sold it cuz of bleed. Next day it moved into a profit zone (my bad).

I have some [DEF] calendars that are green, but not closed. Software helped me formulate the hypothesis of selling elevated vol before the election and buying post election vol. We’ll see on that one.

Had a very profitable [LMN] credit spread. Again, software showed me that ATM was expensive and 25 Delta was relatively cheap, so I did a credit spread between those deltas. Then I had a profitable debit spread just from buying cheaper skew and selling more expensive skew all in an elevated IV time series. That also worked for me.

The main things I feel I did right was 1) not try to fight an broad market up trend. 2) opened my mind 3) took some risk employing stuff that I am learning (but always wanted to understand better).

This reply highlights a point I make to people that ask me if it’s the right tool for them.

If you are serious about using options the direct cost of the sub is irrelevant. It’s a dime on 100 equity option contracts over the course of a year. Just as the cost of a book is irrelevant. The true cost is in the time to invest in examining markets. But if you’re serious, you are already through that gate.

There is no version of actual options or any trading that doesn’t take effort. This is an irreducible truth. I’ve had consulting calls with people who hated that message — but so and so says X is possible. And just like the realtor who promises the highest price for listing your house, it’s more profitable to tell people what they want to hear.

I’ve refunded those consulting calls graciously. I’m not trying to convert anyone.

If you are serious, moontower.ai is here to help.

We offer 2 pricing plans:

$100/month billed annually

$150/month billed quarterly

A premium subscription to moontower.substack is included for free ($180/yr value).

The Principles of Learning Fast

Friends,

In September I wrote about signing my 6th grader and myself up for courses on mathacademy.com. It’s been a month and we’re addicted and competing with each other to level up in our respective leagues by gaining XP. A unit of XP approximates “one minute of focused effort by a serious but imperfect student”.

[I’ve turned a bunch of readers onto this site just as I was turned on to it by another reader and now I got peeps texting me questions about it or telling me about their kids’ progress. You love to see it. Random side note — I have a good friend who just moved from my neighborhood to Austin because he’s deep in the education/AI intersection and the weird city is the scenius for education experimentation. I mentioned it to him and let’s just say he knew all about it from different angles. What he told me only got me even more stoked about what mathacademy is doing fwiw.]

In that post, I pasted links to 30 articles that I planned to read by the site’s chief quant Justin Skycak after already doing a fair bit of reading on the blog. I’ve plowed thru the 30 articles and then some, which is still just a fraction of what’s on there.

I’m personally highly interested in the entire topic of using AI to develop talent and learn at rates that were previously unthinkable. I have a large unfinished document with years of insights that I’ve pulled together from various sources that probably won’t see the light of day. For education I’m a big fan of writers like Scott H. Young, Cedric Chin, Matt Bateman, and Freddie deBoer. You can search the substack for all the times I’ve referenced their work and I have plenty more in backlog. I’ve also harped on the degree to which SIG’s education was extremely well-mapped out from a pedagogical point of view. It wasn’t until I heard Todd Simkin explain the educational influences that informed how they taught that I appreciated the extent to which education theory underpinned their methods.

See:

🔗Educational Ideas Inspired By Seymour Papert’s Constructionism (Moontower)
🔗Notes From Todd Simkin On The Knowledge Project (Moontower)
🔗General & Childhood Education Articles (Moontower)

I’m adding Justin to my list of must-reads. After spending most of Sunday with the blog, I’ve synthesized a much more condensed version of Principles of Learning except it’s fully based on Justin’s insights.

I reached out to him when I first discovered the site and made my interest in what he’s doing as plain as possible. I told him:

I think being born on 3rd is to get exposure to someone when you are young who shows just how self imposed our speed limits are.

He hadn’t heard it put that way before.

I harp on this stuff. You’ve seen it on my affirmations page.

The wealth you give a youth is self-efficacy. A chance to match their abilities to the needs of communities they find themselves in as they get older. Autonomy and confidence through competence.

When I say “speed limit” I’m not referring to speed only, or even necessarily. It’s more about limits in general. In athletics, you can’t be Lebron no matter what you do. But whatever your limit is, it’s further than you think. It goes without saying that finding your limit requires brutal effort and commitment…but however far that gets you, personalized instruction will get you even further.

If a great teacher/mentor/coach will get you further than the frontier that caps out at a given level of effort, then that role has insane leverage. The very act of pushing through a previously-conceived frontier will increase your motivation and effort as you see what’s possible.

There was a Washington Post article several years ago referring to “America’s most advanced math program” in Pasadena. The kids were crushing the AP Calc BC exam in 8th grade.

Who were the teachers?

The founders of mathacademy.com

The Math Academy began as a tutoring program run by husband-and-wife duo Jason and Sandy Roberts before being formally adopted into the PUSD curriculum in 2017.

Seen narrowly, mathacademy is an AI program that helps you learn math faster.

I think this is to miss what’s coming.

The instruction portion of the personalized coach is being automated.

I’m fairly convinced that we aren’t too far from knowledge not just being democratized (I mean Wikipedia already exists) but structured for delivery on incredibly effective, personalized rails.

Before someone’s reactance reflex gets all buzzy, I don’t mean “education” will be solved by a robot. Instruction is simply one component of education. Motivation, support, guidance, as well the type of story-telling and conversation that relates classroom learning to the world and others is as human-based an activity as a warm hug. But if the price for personalized instruction craters, the secondary effects are going to be large and visible.

At scale, we are going to find out just how many kids are capable of finishing Calc BC by grade 8 or publishing a novel in middle school. We hear those stories now and we dismiss them as “genius” or “privileged”.

But what if a low price for personalized instruction tells us we’re wrong about this? There will always be examples of genius or privilege. But if stories of insane achievement start multiplying amongst broken-English immigrants or other groups who are not advantaged in any way EXCEPT in motivation then you’ll know that the things Justin is writing about turned out to be true.

The price of personalized instruction falling is not a panacea. The cost is only a bottleneck after basic needs like stability and safety are met. But the cost is an active bottleneck for all but the rich once those needs are met. Even expensive schools are only incrementally better on truly personalized instruction (their primary advantage might be the compression of the classroom range to a higher functioning average but that’s not the same as personalized instruction so much as a release from tolerating a small number of disproportionally disruptive students).

I’m fascinated by mathacademy because of what it telegraphs — a future of cheap personalized instruction. I’m not picturing slicker edtech apps here. This is a glimpse of something different.

Libraries were free. The internet is free, convenient, and wider reaching. Sal Khan is a prophet who built on its rails. Well, the tracks are being upgraded.

The trains are going to go faster.

The full document can be found as a moontower guide:

🎓Principles of Learning Fast

It’s a living document that I’ll add to over time.

This condensed version hits most of the highlights based on what I’ve read so far. I pontificate like a blowhard at the end a bit more.

Maximizing the Learning Rate: A Neuroscience-Informed Approach to Education

The objective function of the educational strategy outlined below is to maximize the learning rate—helping students acquire and retain knowledge more effectively. There are certainly great programs for independent learning out there but the objective in this discussion is to leverage technology and cog sci to progress through levels of mastery faster.

What Neuroscience Has Taught Us About the Brain

These are some of the most durable findings in cognitive science.

Neuroplasticity: The brain’s ability to rewire itself through new experiences is one of the most significant findings in neuroscience. Neuroplasticity means that the brain continually adjusts its neural connections in response to new learning. This allows learners to develop new skills and adapt to challenges. Methods like deliberate practice, particularly the “effortful repetition” and “successive refinement” aspects, repeatedly strengthen neural pathways until tasks become second nature (Talent Development vs Traditional Schooling).
Dopamine and Motivation: Neuroscience has shown that dopamine, a neurotransmitter, plays a critical role in motivation and reward-based learning. When learners experience success, dopamine is released, reinforcing the behavior and encouraging continued effort. This makes motivation a crucial component of the learning process, as it directly influences how willing a learner is to persevere through challenges.
Working Memory and Its Limitations: The brain’s working memory, or the ability to hold and manipulate information temporarily, is limited. Overloading this system can impede learning, as the brain can only focus on a few pieces of information at once. Techniques like chunking—breaking down complex tasks into smaller, more manageable units—can help mitigate this overload (When Should You Do Math in Your Head vs Writing It Out on Paper?).
The Science of Forgetting: One of the most critical insights from cognitive psychology is the concept of forgetting curves. The theory, which dates back to Hermann Ebbinghaus’s pioneering research, shows that learners forget newly acquired information rapidly unless there is some form of reinforcement. The brain’s natural tendency to forget is often visualized in a forgetting curve, which steeply declines in the hours or days after learning.

Forgetting Curves and Memory Decay: Ebbinghaus’s forgetting curve demonstrates that without review or rehearsal, retention of new knowledge drops quickly over time. However, the rate of forgetting slows down when learners engage in retrieval practice and spaced repetition—both of which can flatten the curve, leading to more durable retention.

Spaced Repetition Leads to Automaticity: Over time, repeated retrieval practice pushes learners toward automaticity—the ability to recall information effortlessly. Once information is retrieved enough times across spaced intervals, it becomes deeply embedded in long-term memory. Efficiency is achieved through repeated activation and myelination – a process where neural pathways are coated with a substance called myelin, increasing the speed and efficiency of signal transmission.

This is the key loop:

Retrieval practice > Automaticity > Reduced demand on working memory >

The learner frees up cognitive resources for more complex tasks, facilitating better problem-solving and higher-order thinking.

“Automaticity frees up cognitive resources that would otherwise be consumed by basic recall tasks, allowing for higher-order cognitive tasks to take place in the working memory.”

Implications for Implementation

The implications of neuroscience and research on forgetting curves for learning are vast. Here’s how these insights translate into effective learning strategies:

Retrieval Practice and Minimizing Forgetting: The act of retrieving information from memory, rather than passively reviewing material, significantly boosts retention. Each successful retrieval attempt strengthens neural pathways and makes the knowledge more durable. As learners engage in retrieval, they disrupt the forgetting curve and prolong the retention of knowledge (Which Cognitive Psychology Findings Are Solid, That Can Be Used to Help Students Learn Better?).
Spaced Repetition for Long-Term Retention: By structuring review sessions at increasingly spaced intervals, learners allow time for memory consolidation. This reduces the steep decline of the forgetting curve, especially in the early stages of learning. Over time, the intervals between repetitions can be extended without significant loss in retention, enabling efficient long-term learning. The use of spaced repetition systems (SRS) has demonstrated significant improvements in student performance (Optimized, Individualized Spaced Repetition in Hierarchical Knowledge Structures).
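A toy simulation makes the "intervals can be extended without significant loss" claim easier to see. This is a sketch under stated assumptions: the exponential form of the forgetting curve, the `stability` parameter, and the idea that each review multiplies stability by a fixed `boost` are simplifications I chose for illustration. They are not fitted to Ebbinghaus's data and are not Math Academy's algorithm.

```python
import math


def retention(days_since_review, stability):
    """Assumed exponential forgetting curve: fraction retained after a gap."""
    return math.exp(-days_since_review / stability)


def expanding_schedule(first_gap=1.0, growth=2.0, reviews=5):
    """Review days where each gap is `growth` times the previous gap."""
    day, gap, days = 0.0, first_gap, []
    for _ in range(reviews):
        day += gap
        days.append(day)
        gap *= growth
    return days


def simulate(review_days, stability=1.0, boost=2.0):
    """Retention just before each review.

    Assumption: every review multiplies memory stability by `boost`.
    """
    last, out = 0.0, []
    for day in review_days:
        out.append(retention(day - last, stability))
        stability *= boost
        last = day
    return out


days = expanding_schedule()
print(days)            # [1.0, 3.0, 7.0, 15.0, 31.0]
print(simulate(days))  # the same retention value before every review
```

Under these assumptions the gaps double from 1 day to 16 days while retention just before each review stays flat, because stability grows as fast as the gap. If `boost` were smaller than `growth`, retention would sag at each review, which is the practical argument for tuning the schedule to the individual learner.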

Common Misconceptions in Learning

It’s easy to fall into widely accepted beliefs about how people learn, but research has debunked many of these ideas. Here are a few myths that might surprise you:

Learning Styles: Contrary to popular belief, the idea that individuals have specific “learning styles” (e.g., visual, auditory, kinesthetic) and that teaching should be tailored to these styles is unsupported by research. While students may have preferences, these preferences do not significantly improve learning outcomes. Instead, using varied teaching methods that engage multiple senses enhances learning for all students (Why is the EdTech Industry So Damn Soft?). Veritasium has also called this the “biggest myth in education”.
The Myth of Productive Struggle: While allowing learners to struggle through difficult problems might seem beneficial, research has shown that this is often counterproductive, particularly for novices. Without proper guidance, prolonged struggle leads to frustration and disengagement. Scaffolding and explicit instruction provide the necessary support to avoid cognitive overload and enable meaningful progress (What’s the Best Way to Teach Math: Explicit Instruction or Less Guided Learning?).
Discovery Learning vs. Direct Instruction: The idea that students should learn concepts through self-discovery has been largely debunked, especially for beginners. Direct instruction, which provides clear guidance and support, has proven far more effective in most learning scenarios. Discovery learning works well for experts but can leave novices overwhelmed and unproductive, a paradoxical finding known as the “expertise reversal effect”. (The Pedagogically Optimal Way to Learn Math).
The Illusion of Comprehension: Learners often mistake familiarity with material for true understanding—a phenomenon known as the illusion of comprehension. Just because something feels familiar doesn’t mean the learner can apply it effectively. Combatting this requires practices like retrieval practice and interleaving, which force deeper engagement with the material (Which Cognitive Psychology Findings Are Solid, That Can Be Used to Help Students Learn Better?).

What Pedagogy Research Has Taught Us

Pedagogy research provides practical strategies that align with neuroscience insights, helping us understand how to optimize learning environments:

Deliberate Practice: One of the most well-established findings in educational research is the importance of deliberate practice. Unlike passive or rote learning, deliberate practice focuses on honing specific skills through effortful repetition and immediate feedback. This approach helps students achieve automaticity, where foundational skills become second nature and free up cognitive resources for more complex problem-solving. This is why “deliberate practice” is regarded as the most effective training technique across talent domains (The Pedagogically Optimal Way to Learn Math).
Worked Examples to Reduce Cognitive Load: Especially in subjects like mathematics, worked examples are invaluable for novice learners. By showing step-by-step problem-solving processes, worked examples reduce cognitive load, allowing learners to focus on understanding the process rather than inventing solutions. This strategy is effective in reducing overwhelm, a key barrier to learning (Which Cognitive Psychology Findings Are Solid, That Can Be Used to Help Students Learn Better?).
Active Learning for Deeper Understanding: Research consistently shows that active learning—engaging students in activities like problem-solving, discussion, and teaching others—leads to better retention and understanding than passive learning methods like lectures. However, this active engagement must be paired with direct instruction, especially for novices, to prevent cognitive overload (Why is the EdTech Industry So Damn Soft?).
Interleaving Practice: Interleaving, or mixing different topics or skills within a study session, forces the brain to continually retrieve and apply information, strengthening neural connections. While it may feel harder for learners, this desirable difficulty improves long-term retention and the ability to transfer knowledge to new contexts (Which Cognitive Psychology Findings Are Solid, That Can Be Used to Help Students Learn Better?).

Connecting It All: The Flywheel of Competence, Confidence, and Motivation

When neuroscience and pedagogy principles are applied in tandem, they create a reinforcing cycle that propels students toward continuous growth and mastery:

Competence: Effective learning techniques, such as deliberate practice and retrieval practice, build competence. As learners master fundamental skills, they achieve automaticity, allowing them to perform basic tasks effortlessly, freeing up mental resources for tackling more advanced problems (Automaticity for Cognitive Efficiency).
Confidence: With growing competence comes confidence. When learners see themselves succeeding—whether it’s mastering a math concept or improving in a skill—they are more likely to tackle new challenges with a positive mindset. This confidence feeds into their willingness to engage with difficult tasks (Recreational Mathematics: Why Focus on Projects Over Puzzles).
Motivation: Confidence breeds motivation. As students become more confident in their abilities, they are more driven to continue learning. This motivation reinforces their engagement in deliberate practice, completing the flywheel and leading to greater competence over time. Accountability, whether through structured learning programs or paid educational platforms, also plays a role in keeping learners committed to their goals.

Key points and clarifications from select posts

Recreational Mathematics: Why Focus on Projects Over Puzzles (2 min read)
There’s only so much fun you can have trying to follow another person’s footsteps to arrive at a known solution. There’s only so much confidence you can build from fighting against a problem that someone else has intentionally set up to be well-posed and elegantly solvable if you think about it the right way.
The Situation with AI in STEM Education (11 min read)
The major limitation of LLMs in education is their reliance on student-initiated questions. Effective teachers don’t simply answer questions; they guide students through a structured learning process, scaffolding information and addressing knowledge gaps. LLMs, like ChatGPT, primarily respond to prompts, lacking the pedagogical ability to anticipate a student’s needs or direct their learning path.

The promise of AI in education overemphasizes the role of “explanation”. Scaffolding and learning management are equally important. He cautions against prioritizing AI’s ability to engage in conversational dialogue over its capacity to deliver well-structured, personalized learning experiences.
Optimized, Individualized Spaced Repetition in Hierarchical Knowledge Structures (22 min read)
Theoretical Maximum Learning Efficiency: In physics, nothing can travel faster than the speed of light. It is the theoretical maximum speed that any physical object can attain. A universal constant. In the context of spaced repetition, there is an analogous concept: theoretical maximum learning efficiency which posits that in a perfectly encompassed body of knowledge, it’s theoretically possible to achieve mastery through continuously learning new, progressively advanced topics without ever explicitly reviewing old material. This idea, while theoretical, underscores the power of leveraging knowledge interconnectedness.

Importance of Encompassing Graphs (as opposed to prerequisite graphs): Unlike prerequisite graphs which show learning dependencies, encompassing graphs map how practicing advanced topics reinforces prior knowledge. Constructing these graphs is a laborious, manual process requiring significant domain expertise, highlighting the importance of expert-designed learning pathways.
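Here is a rough sketch of how an encompassing graph could reduce explicit review. The topic names and the graph are hypothetical, and I am assuming that practicing a topic gives full, transitive review credit to everything it encompasses. A real system would presumably give partial credit; that detail is not something the summary above specifies.

```python
# Hypothetical encompassing graph: topic -> simpler topics it exercises.
ENCOMPASSES = {
    "solve_quadratics": ["factor_polynomials", "multiply_binomials"],
    "factor_polynomials": ["multiply_binomials"],
    "multiply_binomials": ["integer_arithmetic"],
    "integer_arithmetic": [],
}


def implicitly_reviewed(topic, graph):
    """All topics that get implicit practice when `topic` is practiced."""
    seen, stack = set(), list(graph.get(topic, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(graph.get(t, []))
    return seen


def needs_explicit_review(due, practiced_today, graph):
    """Due topics not covered, directly or implicitly, by today's practice."""
    covered = set(practiced_today)
    for t in practiced_today:
        covered |= implicitly_reviewed(t, graph)
    return sorted(set(due) - covered)


due = ["multiply_binomials", "integer_arithmetic", "plot_points"]
print(needs_explicit_review(due, ["solve_quadratics"], ENCOMPASSES))
# ['plot_points']
```

In this toy graph one advanced session knocks out two of the three due reviews. The limiting case, where every due topic is always covered by something new, is the "theoretical maximum learning efficiency" described above.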
Talent Development vs Traditional Schooling (12 min read)

Orthogonality of Talent Development and Schooling: Traditional schooling, with its age-based grouping and standardized curricula, often fails to effectively nurture talent. This stark contrast emphasizes the need for specialized approaches outside the traditional classroom setting. Talent development is not only different from schooling, but in many cases completely orthogonal to schooling: “For one portion of our sample, talent development and schooling were almost two separate spheres of their life. … Usually the student made the adjustments, resolving the conflict by doing all that was a part of schooling and then finding the additional time, energy, and resources for talent development. … Mathematicians found and worked through special books and engaged in special projects and programs outside of school. Sometimes the schools or particular teachers made minor adjustments to dissipate the conflict. Mathematicians were sometimes excused from a class they were too advanced for and allowed to work on their own in the library. Sometimes they were accelerated one grade as a concession to their outside learning. … Whether the individual or the school made these adjustments, it was clear that these adjustments minimized conflict but did little to assist in talent development. The individual was able to work at both schooling and talent development, although with minimum interaction. … Talent development and schooling were isolated from one another. Schooling did not assist in talent development, but in these instances it did not interfere with talent development.”

Individualized Instruction in Talent Development: Unlike the group-focused approach of schools, talent development thrives on personalized instruction, tailoring learning tasks to individual needs and ensuring mastery before moving on. This distinction underscores the importance of personalized learning pathways in maximizing potential.

A useful reminder from You Will Never Achieve Your Goals Unless You Transform Yourself Into a Person Who is Capable of Achieving Them:

You want to do something that sets you apart? You’re going to have to work harder than most.

Actually, let’s re-print the entire post:

The #1 confusion that I hear when people ask me about math, ML/AI, startups, etc., is they think there’s a way to achieve outsized success without putting in an outsized amount of work.

You want to do something that sets you apart? You’re going to have to work harder than most. There is no way around it.

You think you can get good at math by watching YouTube videos?

Develop cutting-edge ML/AI by asking ChatGPT to code it up for you?

Put a dent in the universe working 40 hours per week?

If you think any of those things, then you will never achieve your goals because you will never transform yourself into a person who is capable of achieving them.

And guess what? It’s not enough to simply work hard.

To achieve outsized success, it’s critical to not only put in enough time/effort, but also to work productively.

You have to work hard AND work smart.

And furthermore, work in a direction where you have some competitive advantage (or, at least, you’re not at a disadvantage).

Part of this work involves engaging in activities that maximize the likelihood of you getting some lucky breaks.

You have to work to maximize your luck surface area.

I have a friend from my college days that always used to say ridiculous catch-phrases with his personal mix of cheekiness and seriousness. Like if you didn’t go for the 5th set of squats he’d just dismiss you with “I guess I’ll just take you off my list of successful people for today”.

It was a phrase a bunch of us still parrot to this day in a joking way. Oh you didn’t moisturize after your shower? Off the list. Didn’t drink your coffee black? Just go to bed now and try again tomorrow.

The grindset earns its parody. But we don’t mock mediocrity because it’s suffered enough. We apologize for it readily but rarely our own. But we might bend over backward to apologize for others’ mediocrity. The reasons can range from genuine concern to signaling to grift. I don’t want to paint the motivation with a broad brush. Regardless of the motivation, excessive sympathizing without actually getting your hands dirty is just patronizing. If you care, then help.

A lot of education problems are not education problems so much as just family or stability problems. That can range from abuse to just having parents that are consistently crap decision-makers. Public school, as maligned as it often is, can be a refuge. A chance to get inspired by a great teacher or an authority figure whose influence counteracts the brainworms that might come from home life.

If you come from stability, this is hard to see and might even sound offensive. But kids are not possessions. It’s why we have laws to protect them but you’re free to stick your silverware in the microwave if you want. Where the lines of interference lie are legal matters (and by extension political — it comes with living in a representative government…shrug).

My belief is that education experimentation is good because making progress on instruction is a high-leverage activity. The fact that it is not evenly distributed because the spread is rate-limited by non-educational obstacles is not a reasonable objection to innovation. Of course, you can be skeptical or bearish. Hell, the education world is a mass of twisted hot metal. But resignation to extending an uninspiring status quo or accepting low standards is anything but progressive.

[There’s probably some smart-sounding argument that goes “you can’t fix education because you can’t fix society” to which you can only wonder, then what are we even doing here?]

When it comes to teaching and coaching there’s a delicate balance of toughness and love. It’s like parenting to be honest. It’s hard because it often hurts plus it’s mired in bureaucracy.

On the supply side, great teachers might be scarce because the right mix of tough but fair + smart is just scarce in the population and now we have to choose a subset of those people. If you want great teachers you’re asking for a legion of special individuals. Attracting special individuals requires a special effort to recruit, train, support, and enable. We get what we are willing to pay for. I don’t have a full understanding of the frictions that make our spend inefficient, but addressing them is independent of trying to make inroads with technologies that improve instructional efficiency.

🍰Justin has plenty of criticism on technology by the way: Why is the EdTech Industry So Damn Soft? (11 min read)

If you don’t see technology as being more than incrementally useful (at least on a longer-time scale) then you’ve given up.

Because we aren’t going to get much better without it.

(Collectively at least. The resourceful are going to have robot tutors if they can’t afford human ones. When you get down to it, it’s your move either way.)

Backsolving Your Ride On Earth

We are in the midst of interviewing developers for moontower.ai (link to job description) to join our team of 3 on a PT basis. While I’ve probably forgotten more about options than a healthy person should ever learn, I know nothing about company-building. Emi Gal, my co-founder, has built 2 companies. He founded a software biz while still in college in the late 2000s. His hunch was “serving ads in internet videos” was going to be a big thing. Which sounds ridiculous today, but prescient when YouTube was only 2 years old, streaming was grainy, and smartphones were about to crown from Apple’s fallopian tube. He sold that first company after a decade and is currently the founder and CEO of Ezra where Yinh and I get our full-body MRIs (if you want a discount I know a guy).

When I wrote the Culture of 37signals, I mentioned how when I stumbled upon their manifesto and company handbook it reminded me so much of Emi and his business principles:

Emi’s a huge fan of them. He’s read all their books (and recommended Rework and Shape Up to me) but we also talked about how he came to many of the same conclusions while running his first company. I think that’s why reading the 37signals philosophy conjured Emi so strongly — the focused, can-do, undistracted spirit wrapped in a deep care for a holistic well-being which enables you to be excellent, rather than being at odds with professional commitment & performance.

Today, I’ll share another article we keep pinned in our shared digital workspace.

https://sahillavingia.com/work (8 min read)

Sahil is the founder of Gumroad. This article is an extremely candid look at how he arrived at his business approach. It echoes many of 37Signals’ values. It’s also up front about the drawbacks.

A few of the major points:

Gumroad’s “Freedom at all Costs” model prioritizes flexibility, autonomy, and work-life balance over traditional corporate structures and rapid growth.
- Flexibility and compensation: Gumroad offers competitive hourly rates ranging from $50 to $250, depending on the role. Employees track their hours and invoice weekly. Daniel Vassallo is an entrepreneur who gave them 10 hours a week for $120k/year.
- Minimum viable culture: Gumroad’s culture is intentionally lean, lacking traditional perks and social events, focusing instead on providing flexibility and autonomy.
Emphasis on written communication (37Signals and Amazon are famous for this)
- “Instead of having meetings, people ‘talk’ to each other via GitHub, Notion, and (occasionally) Slack, expecting responses within 24 hours.”
- “Everyone writes well, and writes a lot.”
Potential drawbacks:
- Limited growth opportunities: Gumroad’s structure offers limited traditional career advancement paths.
- Lack of traditional benefits: Gumroad does not offer benefits like healthcare or laptops
- The remote-first, asynchronous nature of work is isolating for many.

The topic of remote vs in-person gets people riled up. People talk their own book. Maybe not as directly as say a Class A office building investor who desperately needs ~~asses in seats~~ tenants but there are just many businesses that rely on large workforces. If you need a large workforce, your hiring standards are going to reflect that. Which means reaching down into mediocrity. To people who need to be babysat in-person. This is not a matter of judging, it’s just reality. Many people aren’t intrinsically motivated. They feel alienated by work that means nothing more than a paycheck. Many, maybe even most, didn’t even have a chance.

[You are very fortunate in this world to become properly matched to how you make a living so I don’t want to sound harsh. The whole topic of matching is deeply important — I think it’s the thing all parents hope for — give your kids lots of exposure to stuff so they have the best chance of matching to what they are good at. It’s not a silver bullet, but the lack of satisfactory matching will be a ball-and-chain for life. It’s not a recipe for thriving.]

My view is so anodyne it’s a stretch to call it a view — every business lies on a spectrum of how critical it is to be in-person or not. For businesses that straddle the line, it’s a matter of trade-offs. If you make them explicit as Sahil does, then your team will be a self-selected group whose preference sliders are in agreement. With more choices on how we can work, individuals can better match to a cadence that fits their personal frontier just like some people prefer startups to big companies. Being well-matched to the cadence and culture of your work environment seems like a key ingredient for loyalty and productivity.

My spicier take is that folks who get triggered by the wider array of work options have a hazing mentality. “I had to struggle through crappy options, you should too.” Weird. When I wanted a dishwasher it was a luxury, now that innovation has made it a commodity, I’m…annoyed?

Have you ever sensed that some people get offended that someone else might choose to work less or remotely at the cost of more money or advancement? As if this is a form of entitlement even though this “lazy” person’s preferences still come with a cost. I call it learned helplessness on the part of the triggered. They grinded through a bunch of regrettable choices and want to inflict the cruelty on others. To be charitable, I think in their heart, they are mad at their own narrow desires. They feel trapped, maybe even duped, by them, but they’re so pot-committed that they rationalize that this is the only way.

The greatest freedom is to be easy to please. You know someone who is high maintenance. But then it must follow that you know someone lower maintenance. It follows that there must always be someone less burdened than you by their desires.

There is someone out there who thinks they are outscoring you while you are blissfully unaware of the sport they’re playing. Jordan invented fake rivals tactically. Consciously. Some people create them to protect their egos and choices unconsciously.

My absolute favorite line from Mad Men (and possibly any TV show) : r/madmen

The lyrics to this song have always been goated:

When you get down to it, the labor market is just that — a market. Price, inclusive of concessions in how one can work, comes down to scarcity and bargaining position. Sometimes the rockstar costs a ton of money. Sometimes they’d work for less if they can come and go as they please. When bosses whine about employees, it just sounds like they’re bidding below the market and frustrated they aren’t getting filled.

It’s like moaning about stocks being overpriced so you can’t get a better risk-adjusted passive return. I get it, you want things to be easier. Get in line.

On a related note, Paul Millerd just released his new book: Good Work.

I’ve known Paul for years now. I’ve written quite a bit about his first book Pathless Path which is deeply insightful and personal.

There’s a bit of a backstory to the new book’s subtitle: “reclaiming your inner ambition”. If you follow Paul, you’ll know he pushes back against life scripts in a major way. If you aren’t paying attention, you think he sounds like a slacker urging people not to work. In the book he recounts the call we had back in 2021 (we actually threw the video camera on, which I believe was the first time I ever “recorded” for a podcast. It was totally off-the-cuff.) I told him what I saw: that he was deeply ambitious. Stopped him in his tracks. He had not thought of himself that way.

I saw someone who was very deliberate about his choices. It takes a lot of nerve to do that. Nerve is ambition. The gall to believe you can get what you want from your time here.

(A lot of what is coded as “ambition” is actually a retreat from ambition. Paper-clip maximization as the path of least resistance. Until it hits the ultimate ceiling: finding out its weights have been tuned to a local maximum.

There’s a time to be a hammer and a time to navel-gaze. It’s the diabolical explore/exploit trade-off, the multi-armed bandit problem, whatever you want to call it. I wish I had the wisdom to toggle between them well. It’s hard to judge even in hindsight, never mind in real-time. One of these life-is-indifferent-to-your-desire-for-a-precise-recipe truths. If you are aware that you lean too much towards navel-gazing, like I do, or towards being a hammer, then you can likely benefit by consciously compensating.)

Paul went through a health scare in his 20s.

Those have a way of focusing you.

You realize time is limited and everyone is too concerned with themselves to actually care about you. I mean this in a good way — like nobody notices your bad hair day. As Morgan Housel learned as a valet, nobody cares about the person driving a Lambo. All they do is imagine themselves in a Lambo. At best we are props in the stories other people tell.

Once you figure this out, you can get on with the actual business of designing your life. Choosing your priorities instead of compiling the default program loaded by your upbringing or what society has put in front of your face which is neither random nor timeless — they’re just the messages that had enough financial ROI to justify their transmission.

Paul was on a traditional route. The MIT → consultant → grad school pipeline. There’s no problem with that unless you equate that with getting an A in life and find yourself disappointed when you discover there’s no grades, there’s no teacher, there’s no gold star. Or worse, that a gold star is a BMW that you serve instead of it serving you.

And that’s the point. Is your time on earth serving the stuff you want to serve or serving an appearance in a world where nobody’s paying attention anyway?

Paul just backsolved. He loves travel. Having a family. And control over his compromises. It’s just an equation. Do the work that is valuable enough to sustain what you want that also aligns with your talent.

[Again, the importance of matching — Paul is very smart and can do lots of work that pays super well, but because he focused this ability into designing exactly the life he wants instead of choosing the money-is-pure-optionality-except-I’ll-never-exercise-any-of-the-options-because-I’m-hostage-to-collecting-options trap, he deployed his ability surgically to customize his experience on this grand ride called earth. Health scares remind you that options have expiry dates.]

Paul doesn’t pretend it’s easy. You must be ruthless about unlearning unexamined desires that come from our greatest but also overrated fear: social rejection. But anything worth something is not easy. You’re gonna bust your ass either way. You shouldn’t question if it’s worth it. If you do, you might be living someone else’s life.

I’ve been messing with Google’s NotebookLM this week.

☕Aside: This NotebookLM thing is pretty cool. You can upload up to 50 links/books/papers/video/documents and it will synthesize briefings and study guides from all the material. You can spar with it. Ask it questions. I fed it a book and asked it to surface all the paradoxes and ironies in the authors’ arguments and it pointed out several thoughtful contradictions. I gave it my Q3 review post from Wednesday and had it turn it into a conversational podcast. It even did a solid job of pronouncing my name. You can just give it Wikipedia articles and have it materialize the content into an audio interview. Andrej did this and uploaded the episodes to Spotify.

https://x.com/karpathy/status/1841594123381571863

Moontower #244

Friends,

Money Angle

My friend Taylor sent me this terrific paper:

Investing In The Unknown and Unknowable (2006)
Richard Zeckhauser

It opens with the story of how David Ricardo made a fortune buying British government bonds just four days before the Battle of Waterloo, even though he had no special military insight. His success wasn’t based on analyzing the military odds but on understanding market inefficiencies:

Competition was thin
The seller was eager
He bet on the fact that his windfall, if Napoleon lost, would be much greater than what he stood to lose if Napoleon won.

I talked about this idea in my own way on Corey’s podcast. It’s counterintuitive but you can actually have more confidence in your judgement of a hairy situation once you realize it’s unlikely that your counterparty knows more than you do.

The Ricardo example is a pretty good analog for why I bought teeny oil puts in early 2020 before the Covid shutdowns. If I’m wrong, it costs a minuscule bid-ask spread, and the put seller doesn’t know anything more than I do about the risk of a pandemic.

If a situation lends itself to analyzing reams of data I’m likely to stay away. I’ll know my effort is substandard to the effort the other side would make in taking the bet. When oil went negative, historical data was not a useful guide. The playing field is more level. It’s my reasoning against someone else’s. It’s not that I know something, it’s that the disparity in what I can know compared to the counterparty is small and in fact I might have the edge. So if the risk reward is favorable and the logic at worst is a toss-up, then my confidence is higher relative to a typical situation where someone else might have crunched all the permutations. My relative advantage (disadvantage) is higher (lower) in the low-info world.

The paper’s meta-lesson is how important the concept of adverse selection is. If you recall A Jane Street Alum Teaches Trading, Ricki Heicklen makes the case that understanding adverse selection is the most important part of trading. It is the thing that Jane, SIG, etc. are obsessed with: what is my edge conditional on getting filled? It might still be positive but it’s always less, in mathematical expectation, than the edge you computed before conditioning on the fill.

She has a great post with lots of day-to-day examples too:

Toward a Broader Conception of Adverse Selection (10 min read)

The Zeckhauser paper is terrific. I’ll just share this one excerpt with my emphasis:

Let us posit that you are 100% sure that an asset is worth more to you than to the person who holds it, indeed 50% more. But assume that she knows the true value to her, and that it is uniformly distributed on [0,100], that is, her value is equally likely to be 0, 1, 2, … 100. In a famous game due to Bazerman and Samuelson (1983), hereafter BS, you are to make a single bid. She will accept if she gets more than her own value. What should you bid?

When asked in the classroom, typical bids will be 50 or 60, and few will bid as low as 20. Students reason that the item will be worth 50 on average to her, hence 75 to them. They bid to get a tidy profit. The flaw in the reasoning is that the seller will only accept if she will make a profit. Let’s make you the bidder. If you offer 60, she will not sell if her value exceeds 60. This implies that her average value conditional on selling will be 30, which is the value of the average number from 0 to 60. Your expected value will be 1.5 times this amount, or 45. You will lose 15 on average, namely 60-45, when your bid is accepted. It is easy to show that any positive bid loses money in expectation.

The moral of this story is that people, even people in decision analysis and finance classrooms, where these experiments have been run many times, are very poor at taking account of the decisions of people on the other side of the table. There is also a strong tendency to draw the wrong inference from this example, once its details are explained. Many people conclude that you should never deal with someone else who knows the true value, when you know only the distribution. In fact, BS offer an extreme example, almost the equivalent of an optical illusion. You might conclude that when your information is very diffuse and the other side knows for sure, you should not trade even if you have a strong absolute advantage. That conclusion is wrong. For example, if the seller’s true value is uniform on [1,2] and you offer 2, you will buy the object for sure, and its expected value will be 1.5 times 1.5 = 2.25. The difference between this example and the one with the prior on [0,1] is that here the effective information discrepancy is much smaller. To see this, think of a uniform distribution from [100,101]; there is virtually no discrepancy. (In fact, bidding 2 is the optimal bid for the [1,2] example, but that the extreme bid is optimal also should not be generalized.)

The general lesson is that people are naturally very poor at drawing inferences from the fact that there is a willing seller on the other side of the market. Our instincts and early training lead us not to trust the other guy, because his interests so frequently diverge from ours. If someone is trying to convince you that his second-hand car is wondrous, skepticism and valuing your own information highly helps. However, in their study of the heuristics that individuals employ to help them make decisions, Tversky and Kahneman (1974) discovered that individuals tend to extrapolate heuristics from situations where they make sense to those where they do not.
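The arithmetic in the excerpt is easy to check in a few lines. What follows is a minimal sketch, assuming a continuous uniform distribution for the seller's value (the excerpt uses the integers 0 to 100, which gives nearly the same numbers). The function name `bid_outcome` and its parameters are my own labels for illustration, not anything from the paper.

```python
def bid_outcome(bid, lo, hi, premium=1.5):
    """Seller's value V ~ Uniform[lo, hi]; the asset is worth premium * V to the bidder.

    The seller accepts only when the bid exceeds her value.
    Returns (probability of a fill,
             expected profit given a fill,
             unconditional expected profit).
    """
    if bid <= lo:
        return 0.0, 0.0, 0.0          # she never sells below her lowest possible value
    cap = min(bid, hi)                # values above the bid never trade
    p_fill = (cap - lo) / (hi - lo)
    mean_value_if_sold = (lo + cap) / 2   # average of the values that DO trade
    profit_if_filled = premium * mean_value_if_sold - bid
    return p_fill, profit_if_filled, p_fill * profit_if_filled


if __name__ == "__main__":
    # The classroom case: value uniform on [0, 100], bid 60.
    print(bid_outcome(60, 0, 100))

    # Every positive bid on [0, 100] loses money in expectation.
    print(all(bid_outcome(b, 0, 100)[2] < 0 for b in range(1, 201)))

    # The [1, 2] case: scan a grid of bids and report the best one.
    grid = [1 + i / 100 for i in range(101)]
    best = max(grid, key=lambda b: bid_outcome(b, 1, 2)[2])
    print(best, bid_outcome(best, 1, 2))
```

Under these assumptions a bid of 60 on [0,100] fills 60% of the time and loses 15 per fill, matching the excerpt. On [1,2] a bid of 2 always fills and earns 0.25 (2.25 of value for a price of 2). The only thing that changed between the two cases is how much the fill itself tells you about her value.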

On page 24, Zeckhauser has a more proactive spin reminding us that the problem is symmetrical — it’s rare for the other side to play an optimal strategy, which can offer opportunities for informed investors.

Money Angle For Masochists

An answer to prior reader mailbag question:

❓What’s your take on this: SPX 1M implied correlation trading at the 2nd percentile vs the last three years, while NQ is in the 25th percentile? Meanwhile, SPX IV is in the 30th percentile and NQ is in the 40th. My initial thought is that the component implied vols (IV) are inflated, and we’re seeing realized vol (RV) underperform IV this earnings cycle. Could short SPX vol be getting hedged with NQ vol, potentially in anticipation of NVDA’s earnings?

I’d back up and start with what you’re measuring. Are your implied correlation numbers stripped of earnings vols? SPX might have a larger proportion of earnings events in the coming month, which could be skewing the numbers. Correlation tends to look lower when earnings are approaching because single-stock vols rise, making it crucial to use base vols (vols stripped of earnings) to get a cleaner estimate.

In practice, models might not account for this, as there’s a lot of upstream judgement in cleaning the implied corrs for earnings, so traders might prefer to look at dirty metrics and just understand how they can be wrong. Dispersion traders will have a “memory” of high or low dirty correlations around historical earnings seasons and how they played out.
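To see why earnings-inflated single-stock vols push the number down, here is a rough sketch using the textbook single-parameter approximation, where one flat pairwise correlation is backed out of index variance and weighted component vols. This is an assumption about the calculation, not necessarily how your data source computes it. The four-name basket, the weights, and every vol level below are made-up numbers, and index vol is held fixed purely for illustration.

```python
def implied_correlation(index_vol, weights, component_vols):
    """Back out one flat pairwise correlation rho from

        index_var = sum(w_i^2 s_i^2) + rho * [ (sum w_i s_i)^2 - sum(w_i^2 s_i^2) ]

    where the bracketed term equals 2 * sum over i<j of w_i w_j s_i s_j.
    """
    weighted = [w * s for w, s in zip(weights, component_vols)]
    own_variance = sum(x * x for x in weighted)       # diagonal terms
    cross_terms = sum(weighted) ** 2 - own_variance   # off-diagonal terms at rho = 1
    return (index_vol ** 2 - own_variance) / cross_terms


if __name__ == "__main__":
    weights = [0.25, 0.25, 0.25, 0.25]   # hypothetical equal-weight basket
    index_vol = 0.20                     # hypothetical index implied vol

    base_vols = [0.30] * 4               # component vols stripped of earnings
    dirty_vols = [0.34] * 4              # same names with earnings vol left in

    print("clean implied corr:", round(implied_correlation(index_vol, weights, base_vols), 3))
    print("dirty implied corr:", round(implied_correlation(index_vol, weights, dirty_vols), 3))
```

With index vol unchanged, the higher component vols can only be reconciled by a lower correlation, so the dirty number prints well below the clean one. That is the mechanical effect to rule out before reading a 2nd-percentile print as a signal.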

Here’s a terrific interview with fellow vol trader bud Gary Selz that he recorded this summer with Jeff Malec at RCM.

Volatility Vultures: Hunting for Options Talent with Gary Selz of Zero Delta (YouTube)

In this episode of the Derivative we chat with Gary Selz, CIO and Co-Portfolio Manager of Zero Delta Funds. Gary shares his background growing up in Chicago and studying electrical engineering at Northwestern University. He discovered options trading through a financial engineering course and was introduced to a Chicago prop trading firm. Gary discusses his experience training as a new trader at the prop firm. He explains how traders are given time and support to learn before getting their own book to trade. Gary reflects on the diverse career paths that can lead traders to prop shops, from poker players to accountants. The conversation covers Gary’s transition from trading to investing his own money in volatility strategies. This led him to co-found Zero Delta Funds and launch a fund seeking talented volatility traders from across the globe and not always where you’d expect. Gary highlights their process of finding under-the-radar traders internationally and evaluating their sophistication. Gary and Jeff discuss various aspects of options trading, including the evolution of the market landscape. They analyze single stock versus index volatility trading. Gary shares insights on the current opportunity set and speculates on potential future market catalysts. Come join us as we dig deeper into option and vol trading and into the mindset of successful volatility traders.

Stay Groovy

☮️

Moontower Weekly Recap