You Think You’re Trading Vol… But Are You Even?

Option amateurs underappreciate the role of funding in pricing derivatives. Professional options traders need to be obsessed with funding costs because they are trading for tiny, often sub-penny, margins.

Here’s a simple example to demonstrate the tyrannical effect of funding on pricing:

What is a 1-year American at-the-forward call option on a non-div paying, 20% implied vol, $100 stock worth?

You need to feed the model an interest rate to get an answer. You look at the yield curve and see a 5% rate (making this up) for 1 year. This yields a forward price of $105 (we can hand-wave simple vs compounded rates for this purpose).

Imagine the bid-ask for this call is 40 cents wide: $7.80 – $8.20.

If you buy on the bid and sell on the offer you make a .40 profit. Easy-peasy.

Now imagine you buy the bid and hedge the position until expiry. What implied vol did you buy?

The first thing to recognize is that you will be shorting the stock to hedge. Assuming it’s easy to borrow, you are still not going to receive a 5% rate on the cash proceeds. Your prime broker needs to earn its margin. If 5% is the risk-free rate, let’s assume they pay you 4.5% on cash balances. Conversely, the prime broker will lend at 5.5% (this is known as the “long rate” and it’s the rate you finance long positions at). If you sell the call on the offer you will need to pay that rate to finance the shares you buy.

Uh oh.

If you buy the call you need to use a 4.5% rate in the model to back out an implied vol and if you sell the call you need to use a 5.5% rate in the model. You can see where this is going.

  • If you buy the call on the bid you are paying 20.06% implied vol.
  • If you sell the call on the offer you are selling 19.95% implied vol.

(Check the math if you want)

You think you’re trading vol but because of the bid-ask spread on your funding rate, you are basically trading the same implied vol even if you buy the bid and sell the ask. Rho is the sensitivity of the option price for a 1% change in the interest rate. The vega of an option is the sensitivity of its price for a 1-point change in volatility.

The rho of this call option is 46 cents vs a vega of 40 cents.
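For anyone who does want to check the math, here is a minimal sketch. It assumes a plain Black-Scholes model with continuously compounded rates and treats the American call as European (early exercise of a call on a non-dividend stock is not optimal when rates are positive). The function names are illustrative, not from any library. Because the post hand-waves simple vs compounded rates, expect the implied vols to land near the quoted figures rather than exactly on them.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1_d2(spot, strike, vol, rate, years):
    sd = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / sd
    return d1, d1 - sd

def bs_call(spot, strike, vol, rate, years):
    """Black-Scholes call value, non-dividend stock, continuous rate (assumption)."""
    d1, d2 = d1_d2(spot, strike, vol, rate, years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

def vega_per_vol_point(spot, strike, vol, rate, years):
    """Price change for a 1-point (0.01) change in volatility."""
    d1, _ = d1_d2(spot, strike, vol, rate, years)
    return spot * norm_pdf(d1) * math.sqrt(years) * 0.01

def rho_per_rate_point(spot, strike, vol, rate, years):
    """Price change for a 1% (0.01) change in the interest rate."""
    _, d2 = d1_d2(spot, strike, vol, rate, years)
    return strike * years * math.exp(-rate * years) * norm_cdf(d2) * 0.01

def implied_vol(price, spot, strike, rate, years, lo=0.01, hi=1.0):
    """Bisection search; works because the call price rises with vol."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, mid, rate, years) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

SPOT, STRIKE, YEARS = 100.0, 105.0, 1.0
BID, ASK = 7.80, 8.20

# Buying the call means shorting stock and earning the 4.5% "short rate".
iv_bid = implied_vol(BID, SPOT, STRIKE, 0.045, YEARS)
# Selling the call means buying stock financed at the 5.5% "long rate".
iv_ask = implied_vol(ASK, SPOT, STRIKE, 0.055, YEARS)

vega = vega_per_vol_point(SPOT, STRIKE, 0.20, 0.05, YEARS)
rho = rho_per_rate_point(SPOT, STRIKE, 0.20, 0.05, YEARS)

print(f"vol paid on the bid:  {iv_bid:.2%}")
print(f"vol sold on the offer: {iv_ask:.2%}")
print(f"vega: {vega:.2f}  rho: {rho:.2f}")
```

The point to look for is that the vol bought on the bid is not lower than the vol sold on the offer, even though the prices are 40 cents apart.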

A 1% difference in funding rate (ie 4.5% vs 5.5%) is an institutional level bid-ask. It can be much worse for retail.

If you are trying to make markets you think you’re trading vol but are you even?

Pricing and carrying longer-dated options is crucially dependent on funding costs and the bid-ask spreads might not even be wide enough to compensate a market maker for their funding spread. Another way of saying this: the market-maker with such a 1% wide funding rate is making a 20% “choice” market in the vol. If the bid-ask was tighter they would be bidding a higher vol than they were offering!

(Again this assumes they hold and manage the position as opposed to spreading the options off by say buying one call and selling another or having the privileged position of just getting ping-ponged on their posted bid-ask all day)

It’s Not The Merit It’s The Price

My past self makes me cringe.

I remember a weekend Yinh and I spent in Big Sur before having kids. We stayed at a resort/hotel place for free in exchange for listening to the timeshare spiel. I’m just pushing back on every point, complaining about the math this poor lady on the bottom-of-the-realtor-totem-pole is conveniently ignoring. Looking back, I’m genuinely sorry to have been acting myself in that moment.

When you feel your blood pressure rising you can channel some grace by just thinking of someone you know who would be smooth in that situation. The aspirational move here is to just smile and nod. I had the situation exactly backward: it was me who was embarrassing himself, not her with the canned pitch, as pushy and nonsensical as it was.

Luckily I have this moon letter thing as an outlet for my teeth-grinding financial complaints. I’m over the timeshare sales thing (well, actually I just pay for a room and save myself the grief. I admit this feels more like a hair dryer solution than addressing the root of my anger) and onto another: I can’t stand when a life insurance salesperson pretends they are doing god’s work by telling me about their widow client’s big settlement. I’m not against buying insurance; I have car insurance and life insurance. But I’m against motte-and-bailey persuasion techniques. If a widow getting paid is deemed a self-congratulatory act of corporate benevolence then Warren Buffett is the priest of puts, a hokey paragon of virtue, backstopping markets with the heart of a patriot. Ok.

Defending life insurance by focusing on the settlements that get paid out is as silly as branding calls sold as income. And for the same reason — there is no consideration of price. Let’s compare:

Defense of insurance: “Look at the settlement the policyholder received. It has so many zeros in it.”

Rebuttal: That would be true even if the insurance cost twice as much. So the issue isn’t whether there would be a settlement; it’s the proposition on the whole.

Defense of covered calls: “The premium you collect is extra income, and if the calls go in-the-money you’ll be happy anyway”

Rebuttal: This would be true even if I sold the calls for 1/2 the price that I actually sold them for.

In other words, both of these defenses are empty words because they skirt the defining point:

It’s not the merit of the idea — it’s the price.

The wrong price will ruin any proposition. Ideas without prices are worthless. “It’s a good idea to brush your teeth.” But if brushing your teeth took 8 hours a day, you’re better off pulling them all and getting implants.

“It’s a good idea to get insurance” has the invisible qualifier “assuming the price is reasonable”. From there we can debate “reasonable” and we should. But I assure you the percentage of time spent in a life insurance consultation that’s devoted to decomposing its cost is not commensurate to how important it is in the decision.

Money Angle

Let’s harp on this “merit cannot exist independent of price” idea. We’ll return to insurance for a moment.

The griftiness of insurance sales as a function of complexity is an inverted U curve. Term insurance is not complex, it’s highly competitive and low margin. Private placements, which I’ve written about, are sold to very wealthy people who likely have a CFO-type managing their money. It’s the midwit crowd from all ends of the income spectrum that express their snowflake exceptionalism in exactly the wrong place and end up paying for their agents’ kids’ private school tuition.

Many insurance products are complex and seriously difficult to understand — every now and then I’ll take a hard look at one and just think, “they expect the average person to comprehend what’s actually going on inside this black box?!” And of course, the answer is “no”. That’s actually the point.

Here’s a tip — run away if you can’t understand the insurance product better than the salesperson. This is not as high a bar as you think. Salespeople are experts at sales not financial engineering. If they weren’t selling annuities they’d be selling cars or homes. (It’s a blanket statement so there are exceptions — but you know who will agree with me the most? Nerdy advisors who don’t have perfect teeth. This is the old Taleb bit “surgeons shouldn’t look like surgeons”.)

When I look at insurance products, especially structured products, I look for the options embedded in them. The cost of these options is opaque. Many of them have analogs in the listed options markets, but ultimately the ones buried in insurance policies resemble illiquid flex options with long-dated maturities and substantial padding added to their prices. If you wanted to be rigorous about valuing an insurance policy you’d need to know everything from the value of these hidden options to how much credit risk to discount the various issuers’ policies by. Apples-to-apples comparisons are impossible. This de-commoditizes the products, giving unscrupulous salespeople ample room to practice their dark art.

An aside about options thinking

I know someone who negotiates and prices leases for commercial office space. They work on huge leases with clients like FAANG. One of the things they mentioned was how they would try to embed provisions in leases which were basically hard-to-price options. The person also spent a couple years with an options market-making group and is generally very quantitative — I would use the person for math help regularly.

I also know of a few wildly successful option traders who did quite well in personal RE investing by structuring options with potential sellers (one of these stories was focused on an ex-colleague of mine which was discussed in a certain big city’s media post-GFC).

And one more related bit — an option manager I know is friends with a fund manager who deals exclusively in the pre-IPO share market. This is a class of funds that provide liquidity to late-stage VC portfolio company employees. The manager was able to help the fund manager by showing them how a particular option embedded in their structures was deeply mispriced.

A final aside on the usefulness of option thinking…in Option Theory As A Pillar Of Decision-Making, I include this:

Getting to The Price

A current example of the need to assess a proposition by understanding its price comes from the boom in covered-call ETFs. Jason Zweig of the WSJ recently published:

Why Investors Are Piling Into Funds That Promise Not to Beat the Stock Market (paywalled)

After great returns last year, covered-call funds are all the rage among income-oriented investors. But their high yields aren’t a free lunch.

The article covers the explosion in AUM in covered-call funds like the JPMorgan Equity Premium Income ETF (JEPI) or Global X Nasdaq 100 Covered Call ETF (QYLD).

These ETFs manage roughly $20B and $6B in AUM respectively.

We’ll talk about QYLD because its holdings are published while JEPI is a discretionary, actively managed ETF. (But I still want to know who gets to hungry-hungry hippo those option orders!).

QYLD sells covered calls on the Nasdaq 100. That means it sells a call option while owning the underlying index. If you buy 100 shares of QQQ and sell a call option you could do the same thing. That’s not an argument against this product though. Ease is a valid use case for a product.

More background: it sells the 1-month at-the-money call as opposed to out-of-the-money calls which is what people generally think of with covered-call strategies (when I was just a boy they called these “buy-writes” but I haven’t heard that term since Arrested Development was on the air).

I’ve addressed “selling options for income” as euphemistic, sales-led framing. I’m not necessarily opposed to selling options but when you brand it as “income” you are blatantly misrepresenting reality. You are pretending the option premium is income when the bulk of it is just the fair discounted weighted average of a set of possible futures. My bone with the marketing pitch is that there’s no discussion of price. Again, whether this is a good strategy depends on price and the price isn’t static. (I feel like I’ve force-fed you like foie gras on this topic. If I have to hear about this “strategy” from one more medical professional I had better be sedated on an operating table so I can finally drown it out.)

When the marketers show me the level of implied correlations they are selling in the calls then we can have a good-faith conversation. Or how about when they tell me who the buyer for those calls is? Because I can assure you there’s no natural buyer. The boys and girls buying those calls are only doing so because they are too cheap. They didn’t wake up in the morning and think “I’m not going to look at prices, I just think owning call options that go to zero is a reasonable way to invest my money.” You know what traders are thinking when they see the marketers’ pitch: “Thank you for stocking the pond, we’ll be waiting”.

And they will be waiting. Market-makers are lions in the bush who know the dinner’s migration patterns. Unlike lions, they need to be discreet. You can’t just pounce and scare everyone off. You don’t want to make a scene. So they pre-position.

The market-makers’ pre-positioning serves a dual purpose.

  1. It spreads the market impact over a longer window of liquidity. This is actually pro-social: it’s “markets properly working”. The telegraphed order is not as scary even though it’s a large size because the end of it is known and there’s no adverse selection risk. It’s what’s known as a “dumb” or uninformed order. It’s not reasonable to expect zero market impact because unless there’s someone who wants to buy all these options, the pool of greeks needs to be absorbed by a get-paid-to-warehouse-risk-in-exchange-for-profit entity. The market is just an auction for that clearing price and the greeks dropped on the market will be recycled in adjacent markets emanating from the original disturbance. (I.e. the market makers will buy vega from you and sell it in some other correlated market where the entire proposition presents an attractive relative value play. It’s just a big web. Market-makers are the silk between the nodes.)
  2. You want the option seller to get filled near the offer so they feel good about the fill. That’s what it means to “not leave a scene”. So now that you are short vol 3 days ahead of the anticipated arrival of the order, knowing that the current vol level incorporates the impact of your own selling, you are ready to buy the new supply “in line”. Remember this is not frontrunning. It’s a probabilistic bet. The market-makers have no fiduciary duty to the fund (as opposed to actual frontrunning where the broker trades ahead of an order they control). Market-makers want the brokers to “feel” like they got a good fill. There are no fingerprints. A TCA that looks at execution price vs arrival price is already benchmarked to a mid-market price that has been faded to absorb the flow.

What does this mean for the cost of something like QYLD?

A napkin math approach


  • At the current AUM, they sell about 5,000 NDX at-the-money call options (equivalent to 200,000 QQQ options) every month.
  • Implied volatility is about 25% so the fund collects 2.89% of the index level in premium monthly. (Can you see how ridiculous it is to call this income? Would you call it income regardless of how little premium it collected? What if the option was in-the-money and they collected the same amount of premium? Conflating premium with income is a timeshare tactic, except it’s pushed by corporations who know better, not Jane “it’s this job or dogfood for dinner” Doe.)
  • The ATM call is pure extrinsic value.

The question is how much vol slippage can we expect on that order. I asked around and a full vol point seems like a reasonable estimate. Because of the “setting the table” pre-positioning effect it’s hard to get a perfect answer. So we’ll use 1 vol point and you can adjust the final analysis by changing it.

If there is 1 full vol click of slippage and the option you sell is pure extrinsic, then you are losing:

1 vol point / 25 vol points x 2.89% of AUM x 12 months in annual slippage.

That’s 139 bps in annual slippage. That needs to be added to the 60 bp expense ratio for the fund.

So you are paying 1.99% per year for a beta-like exposure created with vanilla products. And the alleged income is not income. It’s a correctly priced option premium in one of the most liquid equity index markets in the world.
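The napkin math is easy to rerun with your own slippage estimate. The sketch below assumes the 2.89% monthly premium comes from the common at-the-money approximation (option value ≈ 0.4 × vol × √time, as a fraction of the index level), which reproduces the figure quoted above. The variable names are illustrative.

```python
import math

def atm_option_pct(vol, years):
    """Approximate ATM option value as a fraction of spot: 0.4 * vol * sqrt(T).

    This is a rule of thumb (an assumption here), not an exact model price.
    """
    return 0.4 * vol * math.sqrt(years)

implied_vol = 0.25          # 25 vol points
slippage_vol_points = 1.0   # the "asked around" estimate; change this to taste
expense_ratio_bps = 60.0

monthly_premium = atm_option_pct(implied_vol, 1 / 12)

# The option is pure extrinsic value, so premium scales linearly with vol:
# giving up 1 of 25 vol points gives up 1/25th of the premium.
monthly_slippage = monthly_premium * slippage_vol_points / (implied_vol * 100)

annual_slippage_bps = monthly_slippage * 12 * 10_000
total_cost_bps = annual_slippage_bps + expense_ratio_bps

print(f"monthly premium: {monthly_premium:.2%} of index level")
print(f"annual slippage: {annual_slippage_bps:.0f} bps")
print(f"all-in cost:     {total_cost_bps:.0f} bps")
```

Halving the slippage assumption to half a vol point roughly halves the slippage term, while the expense ratio stays fixed.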

Even if I grant you a 10% VRP (variance-risk-premium is an idea that options are bid beyond their fair value for any number of reasons like convexity-preference, hedging demand, or the possibility that markets allocate prices according to efficient portfolios and single assets being mispriced might not be from a portfolio point-of-view) that means the alleged income is 10% of what the marketers claim.

This whole trend in covered-call ETFs feels more like an innovation for getting paid for commoditized exposures in a fee-compressed landscape than an innovation that actually improves investing outcomes.

An (Overly) Candid Opinion

I’m not some socialist arguing against giving people an abundance of choice. I just want to remind you that no smart-sounding idea gets a free pass without consideration of its cost. And my own wholly personal opinion is you are paying a lot for convenience here. Plus the more AUM these things get the worse the slippage.

A saying I repeat too much: Asset management is the vitamin industry. It sells placebos. It sells noise as signal.

The proliferation of option products seems like something devised by products people not alpha people, a complaint I’d charge against most of the asset management world (which probably means I’m being too harsh but also I’m not criticizing any single firm — I don’t even know anything about these large fund companies because they were not part of my career genealogy. To me, they were always just the names of customers). Another reason I should be softer on all this is that, in aggregate, active management is critical. But there’s a paradox of thrift thing where we should (and this is dark) encourage it for others but not subscribe ourselves.

If you are truly obsessed and love investing then you can figure out your own way and maybe I’m just a faint admonishing voice in the background that you mostly ignore (I do hope I help you think better around the edges at least). But for the casual investor who’s targeted by pitches and thinks they are missing out, you are given permission to live FOMO-free. There’s nothing to see except a midwit trap.

[And definitely don’t look at these. Gag me.

Actually, any TSLA options mm wants to gag me for raining on their parade. That should tell you something.]

Using Log Returns And Volatility To Normalize Strike Distances

Basic Review

Consider a $100 stock. In a simple return world, $150 and $50 are each 50% away. They are equidistant. But in compounded return world they are not. $150 is closer. This blog post will progress from an understanding of natural logs to normalizing the distance of asset strikes.

The use of log returns in financial and derivatives modeling is useful because investing contexts usually involve re-investing your capital. In other words, the growth process is multiplicative, not additive. But if it’s multiplicative we find ourselves needing to specify a compounding interval. This is an invitation to attach a cumbersome asterisk to every model.

Logarithms offer an elegant solution — they allow us to standardize an assumption:  returns are continuously compounded.

If you are uncomfortable already, these short primer posts will help you catch up. And don’t worry, we will revisit HS math intuitively in this post before getting to the main course.

  • In Examples Of Comparing Interest Rates With Different Compounding Intervals, we saw how to convert back and forth between simple returns and compounded returns by dividing a holding period into different intervals.
  • In Understanding Log Returns we showed how log returns are an extreme case of compounded returns: they assume that compounding occurs continuously. In other words as you divide the holding period into smaller and smaller intervals, you find a rate that is smaller than the growth rate for the entire holding period. If the growth from $1 to $2 is fixed then the more compounding periods there are, the lower the rate must be in order for $1 to end up being $2.

Math Class Made Intuitive

You probably remember hearing about the constant e and the natural log from math class. You also repressed it. Because it was taught poorly.

Understanding e

We’ll turn to BetterExplained:

e is NOT just a number!

Describing e as “a constant approximately 2.71828…” is like calling pi “an irrational number, approximately equal to 3.1415…”. Sure, it’s true, but you completely missed the point. Pi is the ratio between circumference and diameter shared by all circles. It is a fundamental ratio inherent in all circles and therefore impacts any calculation of circumference, area, volume, and surface area for circles, spheres, cylinders, and so on.

e is the base rate of growth shared by all continually growing processes. e lets you take a simple growth rate (where all change happens at the end of the year) and find the impact of compound, continuous growth, where every nanosecond (or faster) you are growing just a little bit. 

e shows up whenever systems grow exponentially and continuously: population, radioactive decay, interest calculations, and more.

Just like every number can be considered a scaled version of 1 (the base unit), every circle can be considered a scaled version of the unit circle (radius 1), and every rate of growth can be considered a scaled version of e (unit growth, perfectly compounded).

So e is not an obscure, seemingly random number. e represents the idea that all continually growing systems are scaled versions of a common rate.

Let’s say our basic unit of time is a year.

e is the constant that says “if I start with $1 and continuously compound at a rate of 100%, how much do I end up with…$2.71828”

Understanding the natural logarithm (ln)

It’s true that the natural log is the inverse of an exponential of base e just as logs answer the question “what power do I raise 10 to in order to get to X?”. But defining the natural log as an inverse is circular not intuitive. Again, we turn to BetterExplained. From Demystifying the Natural Logarithm (ln):

The natural log gives you the time needed to reach a certain level of growth.

e and the Natural Log are twins:

e^x is the amount we have after starting at 1.0 and growing continuously for x units of time

ln(x) is the time to reach amount x, assuming we grew continuously from 1.0

If e is about growth, the natural log (ln) is about how much time it takes to achieve that growth.

The Natural Log is About Time

    • e^x lets us plug in time and get growth.
    • ln(x) lets us plug in growth and get the time it would take.

For example:

    • e^3 is 20.08. After 3 units of time, we end up with 20.08 times what we started with.
    • ln(20.08) is about 3. If we want growth of 20.08, we’d wait 3 units of time (again, assuming a 100% continuous growth rate).
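The "twins" relationship is quick to verify numerically. The sketch below uses only Python’s standard `math` module; the helper name `compound` is illustrative. It shows 100% growth compounded over more and more intervals approaching e, and the natural log recovering the time from the growth.

```python
import math

def compound(rate, years, periods_per_year):
    """Terminal value of $1 compounded `periods_per_year` times a year."""
    return (1 + rate / periods_per_year) ** (periods_per_year * years)

# 100% growth for 1 year, sliced into ever smaller compounding intervals.
for n in (1, 12, 365, 1_000_000):
    print(f"{n:>9} periods: {compound(1.0, 1, n):.5f}")
print(f"        e itself: {math.e:.5f}")

# e^x: plug in time, get growth.
growth = math.exp(3)
# ln(x): plug in growth, get time.
time_needed = math.log(20.08)
print(f"e^3 = {growth:.2f}, ln(20.08) = {time_needed:.3f}")
```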

Let’s apply e and natural logs to asset returns to understand how to normalize distances.

Normalizing Distance

Let’s return to the $100 stock. We said $150 is closer than $50 in the world of compounding. Let’s assume our growth occurs over 3 years. Here’s a summary of simple returns vs annually compounded returns (or CAGR):

So far so good. The compounded returns are lower than the simple average return. Since log returns are just compounded returns sampled continuously we’d expect them to be even lower.

The total log return is indeed lower than the total simple return.

We can also see that in logspace -50% total return is “further” away than up 50%. This is the first encounter we get with the concept of distance where we see that 50% in either direction is not the same. But by the end of this post, you will learn how to normalize even 2 log returns that look the same, but don’t mean the same thing.

But before that, we need to complete our understanding of log returns. We saw that the 3-year total log returns are lower than the 3-year total simple returns. To go further, I pose the question:

Can you compute the annualized log returns?

Pattern-matching the computations for average simple returns and CAGR, it appears we have 2 choices respectively:

  1. Total log return / 3, or
  2. (1 + Total log return)^(1/3) – 1

Remember what e and ln mean in the first place:

The expression e^x is a total quantity of growth. It’s actually assumed to be e^(1 * x) where the 1 represents 100% continuously compounded growth and x represents a unit of time. The natural log or ln(e^x) then solves for how much time (ie x) it took to arrive at the total quantity of growth assuming 100% continuous compounding.

A key insight is that we don’t need to assume a 100% rate and x to be time. We can simply think of x as the product of “rate multiplied by time”. This allows us to substitute any rate for the assumed rate of 100% to find the time. Once again we turn to BetterExplained:

We can use their logic to return to our question: Can you compute the annualized log returns from these total 3-year  log returns?

Down Case:

log return = -69%

rate x time = -69%

rate x 3 = -69%

The annualized rate must be -23.1%

To annualize log returns, we simply take the total log return and divide by the number of years!

The complete summary table:

All is right in the world…the more compounding intervals we divide the total period into the lower the return must be. Continuous compounding represents the most intervals we can slice the period into and therefore it is the smallest rate.
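The rows of such a summary can be recomputed directly from the $100 starting price and the 3-year horizon used above. This is a minimal sketch; the function and key names are illustrative.

```python
import math

START_PRICE = 100.0
YEARS = 3

def return_summary(end_price):
    """Total and annualized returns under three compounding conventions."""
    ratio = end_price / START_PRICE
    total_simple = ratio - 1
    total_log = math.log(ratio)
    return {
        "total_simple": total_simple,
        "avg_simple": total_simple / YEARS,   # simple average per year
        "cagr": ratio ** (1 / YEARS) - 1,     # annually compounded
        "total_log": total_log,
        "annual_log": total_log / YEARS,      # continuously compounded
    }

down = return_summary(50.0)
up = return_summary(150.0)

for label, row in (("$50", down), ("$150", up)):
    print(label, {k: f"{v:.1%}" for k, v in row.items()})
```

In both the up and down case the ordering is the same: the annualized log return is below the CAGR, which is below the simple average.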

Recapping so far:

  • Compounded rates are lower than simple rates for the same total return
  • Log returns are convenient measuring sticks because we just assume continuous compounding
  • e tells us how much continuously compounded growth we get if we know the time period and rate
  • The natural log can tell us:
    • How much time we needed at a given rate to achieve that growth
    • What rate we needed for a given time period to achieve that growth

Normalizing Distances For Volatility

Let’s return to the $100 stock and assume continuous compounding. What price on the downside is the equivalent of the stock moving up $20? By now, we understand, the equivalent downside move is less than $20 away. Let’s compute the equivalent distances in log space.

ln(120/100) = 18.23%

We solve for a negative 18.23% log return:

ln(x/100) = -18.23%

x/100 = e^(-18.23%)

x = .8333 * 100 = $83.33

If the stock starts at $100 then $120 and $83.33 are equidistant in log space.
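The same steps generalize to any upside price. A small sketch, with an illustrative function name:

```python
import math

def log_mirror(spot, up_price):
    """Downside price that is as far from spot in log space as up_price is."""
    log_return = math.log(up_price / spot)
    return spot * math.exp(-log_return)

print(f"{log_mirror(100, 120):.2f}")  # mirror of $120
print(f"{log_mirror(100, 150):.2f}")  # mirror of $150
```

Applied to the opening example, the log-space mirror of $150 is about $66.67, not $50, which is another way of seeing that $50 is the further strike.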

We want to take this further. To compare distances, especially in different assets, we want to normalize for volatility.

Volatility is just another word for standard deviation. A 10% log return in BTC means a lot less than a 10% log return in 5-year Treasury notes. We should measure log returns in terms of how many standard deviations away a specified amount of growth is. Note, this is exactly what the concept of a z-score is in statistics. It tells us how far away from the mean a particular observation is.

Let’s stick with our $100 stock and give it a volatility of 18.23%.

  • A 1 standard deviation move to the upside in 1 year is $120
  • A 1 standard deviation move to the downside in 1 year is $83.33

If we define K as a strike price, we can back into a general formula for how far K is from the spot price in terms of standard deviations. Let’s define all our variables first:

K = strike price

S = Spot price

σ = volatility

t = time (in years)

We start with an intuitive expression for a Z-score using our variables:

Z = ln(K/S) / (σ√t)

We can confirm this makes sense with numbers from the previous example. We’ll set t to 1 (ie 1 year) and the Z-score is 1 corresponding to 1 standard deviation:

ln(120/100) / (18.23% × √1) = 18.23% / 18.23% = 1

The formula makes sense. In English, it says “divide the distance in logspace by the annualized volatility scaled to 1 year”.

This simply validated the expression for Z-score. We still want to define any strike price, K, as a function of its volatility and time.

Algebra ensues:

Z × σ√t = ln(K/S)

K = Se^(Zσ√t)

Setting Z = 1 gives K = Se^(σ√t):

  • If you input a positive volatility number, the formula spits out what a 1 standard deviation up move is.
  • If you input a negative volatility number, the formula spits out what a 1 standard deviation down move is.
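The verbal recipe above ("divide the distance in logspace by the annualized volatility scaled to time") and its inverse can be sketched in a few lines. The function names are illustrative, and drift is ignored at this stage, as in the text so far.

```python
import math

def z_score(strike, spot, vol, years):
    """How many standard deviations the strike is from spot, in log space."""
    return math.log(strike / spot) / (vol * math.sqrt(years))

def strike_at_z(spot, vol, years, z=1.0):
    """Inverse of z_score: the strike sitting z standard deviations away."""
    return spot * math.exp(z * vol * math.sqrt(years))

SPOT, VOL, YEARS = 100.0, 0.1823, 1.0

print(f"z of the 120 strike:  {z_score(120, SPOT, VOL, YEARS):.3f}")
print(f"+1 standard deviation: {strike_at_z(SPOT, VOL, YEARS, 1.0):.2f}")
print(f"-1 standard deviation: {strike_at_z(SPOT, VOL, YEARS, -1.0):.2f}")
```

Passing `z=-1.0` plays the same role as feeding the formula a negative volatility number.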

If you recall, the big insight from earlier:

The expression e^x is a total quantity of growth…we don’t need to assume a 100% rate and x to be time. We can simply think of x as the product of “rate multiplied by time”.

This fact can allow us to decompose the Z-score expression to account for the fact that our underlying stock process has both:

  1. a drift component (option theory uses the risk-free rate for reasons that are beyond this post)
  2. a random component drawn from a distribution defined by a mean (spot + drift) and volatility.

Defining the expressions:

  • Risk-free rate or drift = r
  • The mean of the distribution (aka the “forward”) = Se^(rt)
  • The standard deviation scaled to time = σ√t

The Z-score formulas that incorporate drift for 1 standard deviation up and down respectively:

  • K_up = Se^(rt + σ√t)
  • K_down = Se^(rt – σ√t)

[The rate in the e^x portion is part drift and part random. Why do we combine them with addition instead of multiplication? Because the time portion affects each component differently. We can’t double the variance and halve the time because time also factors into the drift (ie the interest rate)]

Let’s wrap with an example, this time including the drift.

Set r = 5% and t = 1

Fwd = 100e^(.05) = $105.13

If we are just considering the one standard deviation around the mean (as opposed to a full standard deviation up or down) this is the theoretical stock distribution:
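The one-standard-deviation band around the forward follows from the K_up and K_down expressions. The sketch below assumes the 18.23% volatility carries over from the earlier $100 stock example (the text only sets r and t here). The function name is illustrative.

```python
import math

def one_sd_band(spot, rate, vol, years):
    """Return (K_down, forward, K_up) for a 1 standard deviation move with drift."""
    drift = rate * years
    sd = vol * math.sqrt(years)
    forward = spot * math.exp(drift)
    k_up = spot * math.exp(drift + sd)
    k_down = spot * math.exp(drift - sd)
    return k_down, forward, k_up

# r = 5%, t = 1 from the text; vol = 18.23% is assumed from the prior example.
k_down, forward, k_up = one_sd_band(spot=100.0, rate=0.05, vol=0.1823, years=1.0)

print(f"1 sd down: {k_down:.2f}")
print(f"forward:   {forward:.2f}")
print(f"1 sd up:   {k_up:.2f}")
```

Note that the band is symmetric around the forward in log space, not in dollars.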

What’s the point of all this?

For anyone within sneezing distance of a derivatives desk, these are rudiments. These computations are the meaning behind the Black-Scholes z-scores (d1 and d2) and probabilities. These standardizations are critical for comparing vol surfaces. If you can’t contextualize how far away a price is, you cannot make meaningful comparisons between option volatilities and therefore prices.

If you only trade linear instruments because you are a well-adjusted human then hopefully you still found this lesson helpful. Seeing math from different angles is like filling in the grout in the tiles of your mental processing. You can measure the distance (or accumulated growth, positive or negative) in log space to account for compounding. You can standardize comparisons by using the asset’s vol as a measuring stick. And after all that, if you still don’t enjoy this, you can feel better about your life choices to do work that doesn’t rely on it.

If you do rely on understanding this stuff, hopefully you got e^(.00995) – 1 better today.

Understanding Log Returns

If you draw a simple return at random from a normal (ie bell curve) distribution and compound it over time, the resultant wealth distribution will be lognormally distributed with the center of mass corresponding to the CAGR return.

Imagine your total 1-year return is 10%. So your terminal wealth is 1.10.

If you compounded monthly to end up at a terminal wealth of 1.10 we can compute the monthly compounding rate as:

1.10 ^ (1/12) – 1 = .797% per month or annualized (ie x12) = 9.57%

Let’s instead compound daily to end up with a terminal wealth of 1.10.

1.10 ^ (1/365) – 1 = .026% or annualized (x365) = 9.53%

The more frequently we compound while keeping the total return the same the lower the compounded rate or average rate that prevails to get us from initial to terminal wealth.

Log returns are returns compounded continuously (as if you were going to compound even more frequently than every single second but at a tiny rate). When we annualize that rate as we did in the prior examples we end up with a log return.

Or simply:

Ln(1.10) = 9.53%

Similar after rounding to just compounding daily.
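The convergence is easy to see by varying the compounding frequency while holding terminal wealth at 1.10. A minimal sketch, with an illustrative function name:

```python
import math

def annualized_rate(terminal_wealth, periods_per_year):
    """Per-period compounding rate that reaches terminal_wealth in 1 year, times periods."""
    per_period = terminal_wealth ** (1 / periods_per_year) - 1
    return per_period * periods_per_year

for label, n in (("annual", 1), ("monthly", 12), ("daily", 365), ("hourly", 8760)):
    print(f"{label:>8}: {annualized_rate(1.10, n):.4%}")

print(f"     log: {math.log(1.10):.4%}")
```

Each step toward more frequent compounding lowers the annualized rate, and the log return is the floor the sequence approaches.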

Let’s say your $1 grows to $1.50 after 1 year, then

  • your simple return is 50%
  • your log return is ln(1.5) = 40.5%

This chart reveals 2 facts:

  1. Log returns are always smaller than simple returns just as compounded returns are lower than simple returns. This makes sense because log returns are just compounding where the interval between compounding is reduced to zero so it takes a lower rate applied more frequently to get to the same total return.
  2. Higher volatility (ie the larger changes) means a wider gap between the simple and log return. Again, reminiscent of the formula relating geometric and arithmetic returns.

The chart raises a question. We know that volatility increases the gap between simple and compounded returns but why is this exacerbated on the downside? There was nothing in the formula (CAGR = Arithmetic Mean – .5 * σ²) that points to any such asymmetry.

The answer lies in an illusion.

In the chart, 1.5 and .5 appear to be equidistant away. They are both 50% away, right?

That’s true…but only in simple terms!

In compounded terms, .50 is “further away” than 1.5.

A thought exercise will make this clear:

If I start at 100 and can only move in increments of 10%, I can get to 150 in 5 moves.

100 * 1.10 * 1.10 * 1.10 * 1.10 * 1.10 = 161

But on the downside, compounding by a fixed amount means more moves to cover the same absolute distance.

100 * .9⁵ = 59

In fact, I need 2 more moves to “cross” 50. With 7 moves I finally get to 47.8

The chart masks the fact that in logspace .5 is much further than 1.5 and therefore to have moved 50% from the start the volatility (ie the move size) must have been higher. And that’s exactly what the log returns show:

Price | Simple Return | Log Return
$50 | -50% | -69%
$150 | +50% | +41%

$50 is further away in logspace corresponding to a higher compounded volatility. If the volatility is higher, the gap between the simple and log-returns is wider.
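If you want to poke at the thought exercise yourself, here's a rough sketch. The function name `moves_to_cross` is made up for this illustration; the 10% step and the $50/$150 targets come from the example above.

```python
import math

def moves_to_cross(target, step, start=100.0):
    """Count fixed-percentage moves needed to reach or pass target from start."""
    going_up = target > start
    factor = 1 + step if going_up else 1 - step
    level, moves = start, 0
    while (level < target) if going_up else (level > target):
        level *= factor
        moves += 1
    return moves

for price in (50, 150):
    simple = price / 100 - 1
    log_return = math.log(price / 100)
    print(f"${price}: simple {simple:+.0%}, log {log_return:+.1%}, "
          f"10% moves needed: {moves_to_cross(price, 0.10)}")
```

The count of moves is a crude stand-in for distance in logspace: more moves to get there means the price is further away in compounded terms.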

Application to options

The analogy to options: the x-axis in this chart behaves like a strip of strike prices because the points are equal absolute distances apart. They are not equidistant in logspace!

We make the x-axis equidistant in logspace by making the log returns 10% apart.

Now we can chart the log returns on the x-axis. The distance of each total return from the diagonal shows the divergence between the log returns and simple return. It widens as you expect as we get to larger move sizes, but the chart is more symmetrical because the distance between the “strikes” is now normalized to compounded returns. 

Geometric vs Arithmetic Mean In The Wild


In ‘Well What Did You Expect’? we learned:

  • Mathematical “expectation” is a simple average or arithmetic mean of various outcomes weighted by their probability
  • Arithmetic means are familiar. Your average score in a class is the sum of your test scores divided by the number of tests. If you score 85, 90, 98  your average for the class is:  (85+90+98)/3 = 91

    Note the scores are weighted equally. Here’s what the number sentence looks like without factoring out the 1/3:

    .33 * 85+ .33 * 90 + .33 * 98 = 91

    If the final test is worth 50% of the total grade the weighted average is computed: .25 * 85 + .25 * 90 + .50 * 98  = 92.75

    Whether we are weighting the results equally or not, we are still computing the average by summing, then dividing.

  • Geometric means are like arithmetic means except quantities are multiplied instead of summed. Since investing is the process of earning a return and reinvesting the total proceeds we are multiplying, not summing results. If you invest $100 at 10% for 5 years your final wealth is given by:

    $100 * (1.10) * (1.10) * (1.10) * (1.10) * (1.10)  or simply $100 * (1.10)⁵ = $161.05

    In life, we often know the ending amount and the initial investment but want to know “what was my average growth rate per year?”

    The answer to that question is not the simple arithmetic average but the geometric average because we were re-investing or multiplying our capital each year by some rate. That rate is known as the CAGR or “compound annual growth rate”

    If we start with $100 and have $161.05 after 5 years we compute the geometric average in an analogous way to arithmetic averages, but instead of dividing by the number of years, we take the Nth root of our total growth where N is the number of years we compounded for.

    CAGR for 5 years = ($161.05/$100) ^ (1/5) -1 = 10% 

    [we subtract that 1 at the end to remove our starting capital and just have the rate]

  • CAGR vs Simple Average Returns

With investing we are almost always re-investing our capital. That means our capital is being multiplied by a rate from one period to the next. When we want to know the average rate, we really want to pick the geometric average not the arithmetic one (there are other types of averages too like the harmonic average!). We want to compute the CAGR.

As a last proof that the CAGR and simple arithmetic average are different we can revisit the example above. If we compound an initial capital of $100 at 10% per year for 5 years we end up with $161.05 for a total return of 61.05%.

If we compute the simple average:

61.05% / 5 = 12.2%

This is higher than the CAGR of 10%

This is a consistent result. The geometric mean is always lower than the arithmetic mean!
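The comparison above fits in a few lines. This is only a sketch using the $100 at 10% for 5 years example; the function name `cagr` is my own label.

```python
def cagr(start, end, years):
    """Geometric average growth rate: the Nth root of total growth, minus 1."""
    return (end / start) ** (1 / years) - 1

start, years = 100.0, 5
end = start * 1.10 ** years          # terminal wealth after compounding
total_return = end / start - 1       # cumulative return over the 5 years
simple_average = total_return / years

print(f"terminal wealth: ${end:.2f}")
print(f"simple average:  {simple_average:.2%}")
print(f"CAGR:            {cagr(start, end, years):.2%}")
```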

How much lower?

It depends on how volatile the investment is. The reason is intuitive.

Imagine making 50% and losing 50%. The order doesn’t matter. You have net lost 25% of your initial capital.

The formula that relates the arithmetic mean and CAGR:

CAGR = Arithmetic Mean – .5 * σ²


σ = annualized volatility


This Is Not Just Theoretical

I grabbed SP500 total returns by year going from 1926-2023. Here’s what you find:

Simple arithmetic mean of the list: 12.01%

Standard deviation of returns: 19.8%

These are actual sample stats.

What did an investor experience?

If you start with $100 and let it compound over those 97 years, you end up with $1,151,937. 

What’s the CAGR?

CAGR = ($1,151,937 / $100)^(1/97) – 1 

CAGR = 10.12%

These are the actual historical results. An average annual return of 12.01% translated to an investor’s lived experience of compounding their wealth at 10.12% per year. 

Comparing the sample to theory

If you knew in advance that the stock market would increase 12.01% per year and you used the CAGR formula with our sample arithmetic mean return and standard deviation, what compound annual growth rate would you predict?

CAGR = Arithmetic Mean – .5 * σ²

CAGR = 12.01% – .5 * 19.8%²

CAGR = 10.06%

An average arithmetic return of 12.01% at 19.8% vol predicted a CAGR of 10.06% vs an actual result of 10.12%

Not too shabby. 

I used the same parameters to run a simulation where every year you draw a return from a normal distribution with mean 12% and standard deviation of 19.8% and compounded for 97 years.  

I ran it 10,000 times. (Github code — it works but you’ll go blind)
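For readers who'd rather not open the repo, here is a bare-bones stand-in for that kind of simulation. To be clear, this is not the linked code. It's a sketch using only the standard library, with an arbitrary seed, and it floors each year's growth factor at zero (my assumption) so a freak draw below -100% means ruin rather than negative wealth.

```python
import random
import statistics

def simulate_terminal_wealth(mean=0.12, vol=0.198, years=97, start=100.0,
                             n_sims=10_000, seed=0):
    """Compound one normally distributed simple return per year, n_sims times."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        wealth = start
        for _ in range(years):
            # Assumption: a growth factor can't go below zero.
            wealth *= max(0.0, 1 + rng.gauss(mean, vol))
        results.append(wealth)
    return results

wealth = simulate_terminal_wealth()
median_wealth = statistics.median(wealth)
mean_wealth = statistics.fmean(wealth)
median_cagr = (median_wealth / 100) ** (1 / 97) - 1

print(f"median CAGR:            {median_cagr:.2%}")
print(f"median terminal wealth: ${median_wealth:,.0f}")
print(f"mean terminal wealth:   ${mean_wealth:,.0f}")
```

The mean is driven by a handful of enormous paths, so expect it to bounce around from seed to seed far more than the median does.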

Theoretical expectations

CAGR = median return = mean return – .5 * σ²

CAGR = .12 – .5 * .198² = 10.04% 

Median terminal wealth = 100 * (1+ CAGR)^ (N years)

Median terminal wealth = $100 * (1+ .1004)^ (97) = $1,072,333

Arithmetic mean wealth = 100 * (1+ mean return)^ (N years)

Arithmetic mean wealth = $100 * (1+ .12)^ (97) = $5,944,950
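The theoretical numbers are quick to reproduce. A small sketch using the same inputs (12% mean, 19.8% vol, 97 years, $100 start); the variable names are mine.

```python
mean_return, vol, years, start = 0.12, 0.198, 97, 100

# Approximation from the text: CAGR = arithmetic mean - 0.5 * variance
cagr_estimate = mean_return - 0.5 * vol ** 2

median_wealth = start * (1 + cagr_estimate) ** years
mean_wealth = start * (1 + mean_return) ** years

print(f"estimated CAGR:         {cagr_estimate:.2%}")
print(f"median terminal wealth: ${median_wealth:,.0f}")
print(f"mean terminal wealth:   ${mean_wealth:,.0f}")
print(f"mean / median:          {mean_wealth / median_wealth:.1f}x")
```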

The sample results from 10,000 sims

The median sample CAGR: 10.19%

The median sample terminal wealth = $1,225,590

The mean terminal wealth: $5,952,373

Summary Table 

The most salient observation:

The median terminal wealth, the result of compounding, is much less than what simple returns suggest. When you are presented with an opportunity to invest in something with an IRR or expected return of X, your actual return if you keep re-investing will be lower than if you take the simple average of the annual returns.

If the investment is highly volatile…it will be much lower. 

The distribution of terminal wealth

The nice thing about simulating this process 10,000x is we can see the wealth distribution not just the mean and median outcomes.

Remember the assumptions:

  • Drawing a random sample from a normal distribution with a mean of 12% and standard deviation of 19.8%
  • Assume we fully re-invest our returns for 97 years

And our results:

  • The median sample CAGR: 10.19%

  • The median sample terminal wealth = $1,225,590

  • The mean terminal wealth: $5,952,373

This was the percentile distribution of terminal wealth:

The mean wealth outcome is 5x the median wealth outcome due to a 2% gap between the arithmetic and geometric returns. The geometric return compounded corresponds exactly to the median terminal wealth which is why we use CAGR, a measure that includes the punishing effect of volatility. 

In terms of mathematical expectation, if you lived 10,000 lives, on average your terminal wealth would be nearly $6mm but in the one life you live, the odds of that happening are less than 20%.

The chart was calculated from this table:

Percentile Wealth 97-year CAGR
0.95 $22,323,532 13.5%
0.9 $12,048,311 12.8%
0.85 $7,955,791 12.3%
0.8 $5,601,855 11.9%
0.75 $4,098,451 11.6%
0.7 $3,210,573 11.3%
0.65 $2,480,813 11.0%
0.6 $1,981,453 10.7%
0.55 $1,604,153 10.5%
0.5 $1,275,987 10.2%
0.45 $1,009,583 10.0%
0.4 $804,035 9.7%
0.35 $627,807 9.4%
0.3 $476,756 9.1%
0.25 $357,112 8.8%
0.2 $257,498 8.4%
0.15 $186,552 8.1%
0.1 $115,257 7.5%
0.05 $58,646 6.8%

Note that 20% of the time, your $100 compounded for 97 years turns into $257,498 or less, which is a CAGR of 8.4% or worse. That threshold is 1/5 of the median and roughly 1/20 of the mean. Ouch.

So when someone says the stock market returns 10% per year because they looked at the average return in the past, realize that after adjusting for volatility and the fact that you will be re-investing your proceeds (a multiplicative process), you should expect something closer to 8% per year. 

And one last thing…you should be able to see how rates of return, when compounded for long periods of time, lead to dramatic differences in wealth. Taxes and fees are percentages of returns or invested assets. Make sure you are spending them on things you can’t get for free (like beta).

A Question I Wonder About

If you draw a simple return at random from a normal (ie bell curve) distribution and compound it over time, the resultant wealth distribution will be lognormally distributed with the center of mass corresponding to the CAGR return.

We saw that theory, simulation and reality all agreed. 

Or did they?

The simulation and theory were mechanically tied. I drew a random return from N [μ=12%, σ = 19.8%] and compounded it. But reality also agreed.

It may have been a coincidence. Let me explain. 

Stock market returns are not normally distributed. They are well-understood to differ from normal because they have a heavy fat-left tail and negative skew.

  1. The fat-left tail describes the tendency for returns to exhibit extreme (ie multi-standard deviation) moves more frequently than the volatility would suggest.
  2. Negative skew means that large moves are biased toward the downside.

These scary qualities are counterbalanced by the fact that the stock market goes up more often than it goes down. In the 97-year history I used to compute the stats, positive years outnumbered negative years 71-26 or nearly 3-1. 

The average return, whichever average you care to look at, is the result of this tug-of-war between scary qualities and a bias toward heads. With the distribution not being a normal bell curve, it feels suspicious that the relationship between CAGR and arithmetic mean returns conformed so closely to theory.

I have some intuitions about negative skew (that’s a long overdue post sitting in my drafts that I need to get to) that tell me that in the presence of lots of negative skew, volatility understates risk in a way that would artificially and optically narrow the gap between CAGR and mean return. By extension, I would expect that the measured CAGR of the last 97 years would have been lower relative to the theory’s prediction. 

But we did not see that.

I have 2 ideas why the CAGR was held up as expected, despite non-normal features that should penalize CAGR relative to mean return. 

  1. Path

    In Path: How Compounding Alters Return Distributions, we saw that trending markets actually reduce the volatility tax that causes CAGRs to lag arithmetic returns. It’s the “choppy” market that goes up and down by the same percent that leaves you worse off for letting your capital compound instead of rebalancing back to your original position size. The volatility tax or “variance drain” occurs when the chop happens more than trends (holding volatility constant of course). But since the stock market has gone up nearly 3x as often as it went down perhaps this trend compounding “bonus” offset the punitive negative skew effect on CAGR. 

  2. What negative skew?
    Year type | Qty | Avg Return | St Dev of Returns
    Up years | 71 | 21.3% | 12.7%
    Dn years | 26 | -13.4% | 11.4%

    Using annual point-to-point returns, I’m not seeing negative skew. 

I’ve exhausted my bandwidth for this topic so I’ll leave it to the hive. Hit me up with your guesses. 



Well What Did You “Expect”?

Here’s a simple coin flip game. It costs $1 to play.

  • Heads: you get paid an additional $1 (ie 100% return)
  • Tails: you lose $.90

The expectancy of the game is $.05 or 5%.

We compute expectancy:

.5 * $1.00 + .5 * (-$.90)

It’s exactly the same calculation as a weighted average or arithmetic mean. This is a useful computation for many simple one-off decisions. Like should I buy an airline ticket for $1000 or the refundable fare for $1,100?

If there’s a 10% chance I need a refund then the extra $100 saves me $1,100.

10% * $1,100 = $110 which is greater than the $100 surcharge. My breakeven probability is about 9% ($100 / $1,100).

It’s tempting to use this logic in investing. Let’s say you expect the stock market to return 7% per year on average for 40 years. Start with $100 and plug in numbers:

$100 * 1.07⁴⁰ = $1497

Yay, you expect to have about 15x your starting capital after 40 years!

Eh. Sort of.

See, the word “expect” means something a bit different in math terms than it does in colloquial terms.

If I bet $1 on that coin game I theoretically expect to have $1.05 after 1 trial. In reality, I’m either going to end up with $2 when I double up or $.10 when I lose.

Another example:

I roll a die. If it comes up “1”, I win $600. Otherwise, nothing happens. Theoretically, I expect to win $100:

1/6 * $600 + 5/6 * $0 = $100

But if I asked you what you “expect” to happen if you play this game…you “expect” to win nothing. You only win 1/6 of the time after all.

Back to the investing example.

Investing is not a one-off game. It’s a compounding game where you plow your total capital back into the sausage machine to get that 7%

That’s why we use 1.07⁴⁰.

You are counting on your $100 growing by 1.07 * 1.07 * 1.07…

So that 15x number…that’s mathematical expectancy the same way the dice game is worth $100 or the coin game is worth $1.05 even though those outcomes are never actually experienced.

What you expect to happen in the colloquial sense of the term is the geometric mean. The arithmetic average is a measure of centrality when you sum the results and divide by the number of results. (In our examples you are summing results weighted by their probabilities, but you are still summing). The geometric mean corresponds to the median result of a compounding process. Compounding means “multiplying not summing”. The median is the measure that maps to our colloquial use of “expected” because it’s the 50/50 point of the distribution. That’s the number you plan life around.

The theoretical arithmetic mean result of playing the lotto might be losing 50% of your $2 Powerball ticket (which is another way of saying you are paying 2x what the ticket is mathematically worth). The median result is you lit your cash on fire. You plan your life around the median, especially when it’s far away from the mean. We’ll come back to that.

With investing we are multiplying our results from one year to the next together. The geometric mean is what you actually “expect” in the colloquial sense of the term. The geometric mean is more familiarly known as the CAGR or ‘compound annual growth rate’.

What is the relationship between the arithmetic mean to the geometric mean? This is the same exact question as “what is the relationship of mathematical expectancy and the CAGR?”

It’s an important question since that theoretical arithmetic mean is only expected if we live thousands of lives (actually there are ways to experience the arithmetic mean without relying on reincarnation. This is pleasant news because what good is being rich if you come back a pony.) We want to focus on the CAGR, which is much closer to what we might experience.

It turns out that number is lower.

How much lower? It depends on how volatile the investment is. The formula that relates the arithmetic mean and CAGR:

CAGR = Arithmetic Mean – .5 * σ²


σ = annualized volatility

If an investment earned 7% per year with a standard deviation(ie volatility) of 20% you can estimate the CAGR as follows:

CAGR = .07 – .5 * .20² = .05

In arithmetic expectancy, over 40 years you expect to earn 1.07⁴⁰ = 15. You expect to have 15x’d your money.

But the median outcome, which corresponds to the geometric mean is 1.05⁴⁰ = 7.

7x is much closer to what you “expect” in the colloquial sense of the term. Less than 1/2 the arithmetic expectation!
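A short sketch of that arithmetic. The 7% mean, 20% vol and 40 years come from the example; the sweep over other volatility levels at the end uses values I picked just to show how fast the squared term bites.

```python
def estimated_cagr(arith_mean, vol):
    """Approximation from the text: CAGR = arithmetic mean - 0.5 * vol squared."""
    return arith_mean - 0.5 * vol ** 2

years = 40
g = estimated_cagr(0.07, 0.20)

print(f"estimated CAGR:         {g:.2%}")
print(f"arithmetic expectation: {1.07 ** years:.1f}x")
print(f"median outcome:         {(1 + g) ** years:.1f}x")

# Hypothetical vol levels, same 7% arithmetic mean
for vol in (0.10, 0.20, 0.30, 0.40):
    rate = estimated_cagr(0.07, vol)
    print(f"vol {vol:.0%}: CAGR {rate:+.2%}, 40-year multiple {(1 + rate) ** years:.2f}x")
```

Keep in mind the formula is an approximation, so treat the high-vol rows as directional rather than exact.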

The formula tells us that the arithmetic and geometric mean (“CAGR”) will diverge by the volatility. And that volatility term is squared…which means the divergence is extremely sensitive to the volatility.

This is a table of CAGRs where you can see the destructive power of volatility:

Why is volatility so impactful on a compounded return?

An easy way to see the impact of high volatility is to imagine making 50% and losing 50%. The order doesn’t matter. You have net lost 25% of your initial capital.

We can compute the geometric mean by weighting each possibility by its frequency in the exponent (in this case the exponents must sum to 2 because that’s the sample space — up and down):

.5¹ x 1.5¹ = .75 (that is the result over both periods; the per-period geometric mean is .75^(1/2) ≈ .866)

Go back to the first game in the post. You invest $1 in a coin game. Heads to make 100%, tails you lose 90%. This game had a positive arithmetic expectancy of 5%.

What is our arithmetic expectancy if we compound (ie re-invest) by playing twice? The possible outcomes are:

HT: 2 x .1 = .2

HH: 2 x 2 = 4

TH: .1 x 2 = .2

TT: .1 x .1 = .01

Since each scenario is equally likely (25% each) the arithmetic expectancy is simply the average = 1.1025

This jibes with 1.05² = 1.1025

The average arithmetic return compounds as expected.

But our lived (median) experience is much worse. The median result is .20, a loss of 80%!

We could have seen that by computing the geometric mean:

2¹ x .1¹ = .20
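You can brute-force this by enumerating every path. A sketch under the same rules as the coin game (heads doubles your stake, tails leaves you with 10% of it); the function name `terminal_wealths` is mine.

```python
from itertools import product
from statistics import mean, median

outcomes = {"H": 2.0, "T": 0.1}   # wealth multiplier per flip

def terminal_wealths(n_plays):
    """Terminal wealth per $1 for every equally likely sequence of flips."""
    results = []
    for path in product(outcomes, repeat=n_plays):
        wealth = 1.0
        for flip in path:
            wealth *= outcomes[flip]
        results.append(wealth)
    return results

two_plays = terminal_wealths(2)
print(f"mean after 2 plays:   {mean(two_plays):.4f}")
print(f"median after 2 plays: {median(two_plays):.2f}")
```

Try a larger `n_plays` and watch the mean keep growing while the median heads toward zero.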

Driving the point home with an extreme example

Consider a super favorable bet.

You roll a die:

  • Any number except a ‘6’: 10x your bet
  • Roll a ‘6’: Lose your entire bet

The arithmetic expectancy is ridiculous.

5/6 x 10 + 1/6 x 0 = 8.33x your bet, or a return of more than 700%

But if you keep reinvesting your proceeds in this bet, you will go bust as soon as the 6 comes up. The median experience is a total loss, even though the arithmetic expectancy compounded is wildly positive. If you played this game 20 times in a row you’d [arithmetically] expect to make ~ 700%²⁰.

But you have a 97.4% chance of going broke because you need “not a 6” to come up 20 times in a row = 1 – (5/6)²⁰

That arithmetic expectancy of ~ 700%²⁰ is being driven by the single scenario where the 6 never comes up (that occurs 2.6% of the time). In that case, your p/l is $10²⁰ or between a quintillion and sextillion dollars.

But the geometric mean is 0 because multiplying over the 6 sample spaces:

10⁵ x 0¹ = 0
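Here's the same dice bet in code. It's a sketch that treats "10x your bet" as a wealth multiplier of 10 per surviving roll, which is my reading of the example.

```python
p_not_six = 5 / 6
rolls = 20

p_survive = p_not_six ** rolls        # never rolling a 6 in 20 tries
p_bust = 1 - p_survive
wealth_if_survive = 10 ** rolls       # per $1 staked, every roll pays 10x

# All of the expectation lives in the single surviving path.
expected_wealth = p_survive * wealth_if_survive
median_wealth = 0 if p_bust > 0.5 else wealth_if_survive

print(f"chance of going broke: {p_bust:.1%}")
print(f"expected wealth:       {expected_wealth:.3e}")
print(f"median wealth:         {median_wealth}")
```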

I chose such extreme examples because nothing illustrates volatility like all-or-nothing bets. The intuition you need to keep is that high volatility means you should expect to lose your money even if the arithmetic expectancy is high.

As soon as you start re-investing (ie compounding) your results are going to be governed by that geometric mean which hates volatility.

For the people who tout lotto ticket investments like crypto or transformative technologies with talks of “asymmetrical upside” or “super positive expectancy” remember even if they might be right, the most likely scenario is they lose all their money on that investment. Even literal lotto tickets can tip into positive expectancy. When that happens how much do you put into it?

Exactly. Not much. Because you know what to expect.

The role of rebalancing and diversification

Investing is not a one-off game. You always re-invest. By re-balancing, you “create” more lives by not concentrating your wealth in a single bucket which swamps the rest of your portfolio as it grows. If you never rebalanced BTC on the way up it would have eventually become nearly 100% of your portfolio and then 2022 happened.

If you don’t ever rebalance you are effectively praying that “not a 6” comes up for the 40 years you are compounding wealth. It’s not as extreme as that because market volatility isn’t as extreme as dice or coins. But the principle holds.

You only get one life so you care about the median. Diversification plus rebalancing gives you the god-perspective of getting to invest a fraction of your wealth into many lives.

Keep in mind — rebalancing is not changing your overall expectancy; it’s changing the distribution of returns by pushing the median return (geometric mean or CAGR) up to your theoretical arithmetic return. This trade-off is not free. If you rebalance you don’t get the 1000x payoff that occurs when a single concentrated position hits 50 heads in a row.

Money Angle For Masochists

Imagine a $100 stock that can either go up or down 25% every year.

It’s 50/50 to be up or down.

Let’s look at the distribution of the stock after 4 years (with the probabilities of each price below it)

Look at the extremes after 4 years:

  • $31.64

    A -25% CAGR over 4 years = cumulative loss of 68%

  • $244.14

    A +25% CAGR over 4 years = cumulative gain of 144%

If you sumproduct every terminal probability by terminal price you get $100. And yet, while the stock is fairly valued at $100, after 4 years, you have lost money in 11/16th of scenarios (~69%). The right tail is driving the fair value of $100 while most paths take the stock lower.

This is the mathematical nature of compounding. The most likely outcomes are lower even if the stock is fairly priced.
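The tree is small enough to build by hand, but a sketch in code makes it easy to change the inputs. The function names are my own. The `expected_payoff` helper does no discounting, which assumes the zero-rate world this toy example implies (the $100 fair value equals the expected terminal price).

```python
from math import comb

def terminal_distribution(spot=100.0, move=0.25, years=4):
    """(price, probability) pairs for a 50/50 up-or-down tree."""
    dist = []
    for ups in range(years + 1):
        price = spot * (1 + move) ** ups * (1 - move) ** (years - ups)
        prob = comb(years, ups) / 2 ** years
        dist.append((price, prob))
    return dist

def expected_payoff(dist, payoff):
    """Probability-weighted payoff at expiry (assumption: no discounting)."""
    return sum(prob * payoff(price) for price, prob in dist)

dist = terminal_distribution()
fair_value = sum(price * prob for price, prob in dist)
prob_loss = sum(prob for price, prob in dist if price < 100)

for price, prob in dist:
    print(f"${price:8.2f}  {prob:.4f}")
print(f"fair value: ${fair_value:.2f}, chance of finishing below $100: {prob_loss:.2%}")
```

To take a crack at the questions below, pass a payoff such as `lambda s: max(s - strike, 0)` to `expected_payoff`.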

In the real world, stocks don’t just flip up and down like coins. The probabilities are not 50/50 and there aren’t just 2 buckets they can rest in from one year to the next. The beauty of option surfaces is they allow us to separate the probabilities from the distance of the buckets (and the number of buckets is continuous…there’s no price the stock is not allowed to go to).

Here’s some homework you can do with the above data:

  1. What’s the 4-year $146.48 strike call worth? 1
  2. What’s the value of the 4-year $75 strike put? 2
  3. How about the 4-year $125 call? 3

Bonus Questions

Imagine this stock is an ETF and there’s a 2x levered version (which means it’s 2x as volatile) of it.

  • What strike call on the levered ETF is equivalent to the $146.48 strike on the unlevered ETF? 4 (Hint: It’s further than $46.48 OTM)
  • What’s the value of the call at that strike? 5
  • If I was a market-maker and I got lifted at fair value on the 2x levered ETF 4-year 200 strike call and I go buy the regular ETF 4-year 150 calls to cover my risk, how many do I need to buy to be perfectly hedged? (Assume you can buy them for what they’re worth…you have enough information to compute their fair value). 6

If you got through this then you have a new appreciation for how far certain prices are from a spot price and how it depends on time and volatility!

Starting from basics like the volatility tax, progressing to how path influences the volatility tax (trends are more like a volatility rebate and choppiness is a tax….the ratio of trend to chop will determine the ultimate cost of the volatility), and finally bridging these concepts to Black-Scholes, this series will take your understanding of compounding and how returns work to a deeper level.

  1. The Volatility Drain
  2. Path: How Compounding Alters Return Distributions
    [Between this post and the bonus questions you can start to see why pricing OTM options on levered ETFs given a liquid options market on the unlevered version is an application of these concepts]
  3. Solving A Compounding Riddle With Black-Scholes

Shout Out To Matt Hollerbach

Despite trading options for nearly 20 years at the time, it wasn’t until 2019 that I thought really hard about compounding. I knew how to manipulate formulas and how it related to options but it wasn’t until I discovered Matt’s work that I started to see it from a new angle. Matt makes it approachable and builds up insights in small steps. His blog inspired mine, especially many of my earlier posts. The entire blog is worth spending time working through. It’s similar to what I’ve said about gambling — it’s a place where you will learn how to think about risk and return far better than what finance texts will teach.

These are all-time great ones:

Trend Following is Hot Air

Investing Games

Solving the Equity Premium Puzzle, and Uncovering a Huge Flaw in Investment Theory

It’s painful to watch the median (or should I say average) “investor” reason about how markets work because without these intuitions (you don’t need to know formulas necessarily) you are innumerate. That’s like being illiterate but for like numbers and stuff. And the deficiency is as obvious as illiteracy is to a literate person.

The good news is we can all get better.

Understanding Implied Forwards

These are not trick questions:

Suppose you have an 85 average on the first 4 tests of the semester. There’s one test left. All tests have an equal value in your final score. You need a 90 average for an A in the class.

What do you need on the last test to get an A in the class?

What is the maximum score you can get for the semester?

If you are comfortable with the math you have the prerequisites required to learn about a useful finance topic — implied forwards!

Implied forwards can help you:

  • find trading opportunities
  • understand arbitrage and its limits

We’ll start in the world of interest rates.

The Murkiness Of Comparing Rates Of Different Maturities

Consider 2 zero-coupon bonds. One that matures in 11 months and one that matures in 12 months. They both mature to $100.

Scenario A: The 11-month bond is trading for $92 and the 12-month bond is trading for $90.

What are the annualized yields of these bonds if we assume continuous compounding? 1

Computing the 12-month yield

r = ln($100/$90) = 10.54%

Computing the 11-month yield

r = ln($100/$92) * 12/11 = 9.10%

This is an ascending yield curve. You are compensated with a higher interest rate for tying up your money for a longer period of time.

But it is very steep.

You are picking up 140 extra basis points of interest for just one extra month.

Let’s do another example.

Scenario B: We’ll keep the 12-month bond at $90 but say the 11-month bond is trading for only $91.

Computing the 11-month yield

r = ln($100/$91) * 12/11 = 10.29%

So now the 11-month bond yields 10.29% and the 12-month bond yields 10.54%

You still get paid more for taking extra time risk but maybe it looks more reasonable. It’s kind of hard to reason about 25 bps for an extra month. It’s murky.
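The yield math for both scenarios, as a quick sketch. The function name `continuous_yield` is mine; the prices are the ones from scenarios A and B.

```python
import math

def continuous_yield(price, months, face=100.0):
    """Annualized continuously compounded yield of a zero-coupon bond."""
    return math.log(face / price) * 12 / months

y12 = continuous_yield(90, 12)
for scenario, price_11m in [("A", 92), ("B", 91)]:
    y11 = continuous_yield(price_11m, 11)
    print(f"Scenario {scenario}: 11-month {y11:.2%}, 12-month {y12:.2%}, "
          f"pickup {(y12 - y11) * 1e4:.0f} bps")
```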

Think back to the test score question this post opened with. There is another way of looking at this if we use a familiar concept — the weighted average.

The Implied Forward Interest Rate

We can think of the 12-month rate as the average rate over all the intervals. Just like a final grade is an average of the individual tests.

We can decompose the 12-month rate into the average of an 11-month rate plus a month-11 to month-12 forward rate:

“12-month” rate = “11-month” rate + “11 to 12-month” forward rate

Let’s return to scenario A:

12-month rate = 10.54%

11-month rate = 9.1%

Compute the “11 to 12-month” forward rate like a weighted average:

10.54% x 12 = 9.1% x 11 + Forward Rate₁₁₋₁₂ x 1

Forward Rate₁₁₋₁₂ = 26.37%

We knew that 140 bps was a steep premium for one month but when you explicitly compute the forward you realize just how obnoxious it really is.

How about scenario B:

12-month rate = 10.54%

11-month rate = 10.29%

Compute the “11 to 12-month” forward rate like a weighted average:

10.54% x 12 = 10.29% x 11 + Forward Rate₁₁₋₁₂ x 1

Forward Rate₁₁₋₁₂ = 13.26%
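Both forwards drop out of the same weighted-average rearrangement. A sketch with function names of my own choosing; it works from the bond prices so the unrounded yields feed the calculation.

```python
import math

def continuous_yield(price, months, face=100.0):
    return math.log(face / price) * 12 / months

def implied_forward_rate(short_rate, short_months, long_rate, long_months):
    """Solve long_rate * long_months = short_rate * short_months + forward * gap."""
    gap = long_months - short_months
    return (long_rate * long_months - short_rate * short_months) / gap

y12 = continuous_yield(90, 12)
forwards = {}
for scenario, price_11m in [("A", 92), ("B", 91)]:
    y11 = continuous_yield(price_11m, 11)
    forwards[scenario] = implied_forward_rate(y11, 11, y12, 12)
    print(f"Scenario {scenario}: 11-12 month forward = {forwards[scenario]:.2%}")
```

The same function handles any pair of maturities, not just adjacent months.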

Arbitraging The Forward Rate (Sort Of)

It’s common to have a dashboard that shows term structures. But the slopes between months can be optically underwhelming with such a view. Seeing that the implied forward rate is 13.26% feels more profound than seeing a 25 bps difference between month 11 and month 12.

You may be thinking, “this forward rate is a cute spreadsheet trick, but it’s not a rate that exists in the market.”

Let’s take a walk through a trade and see if we can find this rate in the wild.

The first step is just to ground ourselves in a basic example before we understand what it means to capture some insane forward rate.

Consider a flat-term structure:

[Note: the forward rate should be 10.54% but because I’m computing YTM on a bond price that only goes to 2 decimal places we are getting an artifact. It’s immaterial for these demonstrations]

Now let’s look back at the steep term structure from scenario A:

With an 11-month rate of 9.10% and a 12-month rate of 10.54% we want to borrow at the shorter-term rate and lend at the longer-term rate. That means selling the nearer bond and buying the longer bond.

When you study asset pricing, one of the early lessons is to step through the cash flows. This is the basis of arbitrage pricing theory (APT), a way of thinking about asset values according to their arbitrage or boundary conditions. As opposed to other pricing models, for example CAPM, someone using APT says the price of an asset is X because if it weren’t there would be free money in the world. By walking through the cash flows, they would then show you the free money2. The fair APT price is the one for which there is no free money.

Stepping Thru The Cash Flows

Let’s see how this works:

  1. We short the 11-month bond at $92
  2. We buy 1.022 12-month bonds for $90. We can buy 1.022 of the cheaper bonds from the proceeds of selling the more expensive $92 bond. The net cash flow or outlay is $0.
  3. Spend the next 11 months surfing.

At the 11-month maturity

We will need $100 to pay the bondholder of the 11-month bond so we sell 12-month bonds.

But for what price?

Well, let’s say the prevailing 1-month interest rate matches the 10.49% we saw in the flat term structure world, rather than the 26.37% implied by the 11-12 month forward when we initiated the trade.

In that case, the bonds we own are worth $99.13.

[With one month to maturity we compute the continuous YTM: ln(100/99.13) * 12 = 10.49%]

If we sell 1.009 of our bonds at $99.13 we can raise the $100 to pay back the loan. We are left with .0134 bonds.

At the 12-month maturity

Our stub of .0134 bonds mature and we are left with $1.34.

So what was our net return?

Hmm, lemme think, carry the one, uh — infinite!

We did a zero cash flow trade at the beginning. We didn’t lay out any money and ended with $1.34.

That’s what happens when you effectively shorted a 26.37% forward rate but the one-month rate has rolled down to something normal, in this case about 10.50%

[In real life there are all kinds of frictions. You know, like collateral when you short bonds.]

Summary table:

What if somehow, that crazy 26.37% “11-12 month forward rate” didn’t roll down to a reasonable spot rate but actually turned out to be a perfect prediction of what the 1-month rate would be in 11 months?

Let’s skip straight to the summary table.

Note the big difference in this scenario: the bond with 1 month remaining until maturity is only worth $97.83 (corresponding to that 26.33% yield, ignore small rounding). So you need to sell all 1.022 of the bonds to raise $100 to pay back the loan.

Besides frictions, you can see why this is definitely not an arbitrage — if the 1-month rate spiked even higher than 26.33% the price of the bonds would be lower than $97.83. You would have sold all 1.022 of your bonds and still not been able to repay the $100 you owe!

So the “borrow short, lend long” trade is effectively a way to short a 1-month forward at 26.33%. It might be a good trade but it’s not free money.
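The cash flows in both scenarios are easy to replay in a few lines. This is only a sketch of the frictionless trade described above: the function name and the $97.00 "rates spike even higher" price are my own illustrative choices, and the $92 and $90 starting prices are the ones used in the trade setup.

```python
import math

def borrow_short_lend_long(short_price, long_price, long_price_at_11m, face=100.0):
    """P&L of shorting one 11-month bond and buying 12-month bonds with the proceeds.

    Frictionless sketch: no collateral, no bid-ask, no financing spread.
    """
    bonds_bought = short_price / long_price        # 92 / 90, about 1.022 bonds for zero net outlay
    bonds_to_sell = face / long_price_at_11m       # bonds we must sell to repay $100 at month 11
    stub = bonds_bought - bonds_to_sell            # what is left after repaying the loan
    if stub >= 0:
        return stub * face                         # the stub matures at $100 per bond in month 12
    return stub * long_price_at_11m                # shortfall: we sold everything and still owe cash

# Continuously compounded 11-12 month forward rate implied at initiation
implied_forward = math.log(92 / 90) * 12

# 99.13 and 97.83 are the two scenarios in the text; 97.00 is a hypothetical worse case
for price in (99.13, 97.83, 97.00):
    one_month_rate = math.log(100 / price) * 12
    pnl = borrow_short_lend_long(92, 90, price)
    print(f"bond at {price:.2f} | 1-month rate {one_month_rate:.2%} | P&L {pnl:+.2f}")
```

The break-even sits right around the implied forward: any realized 1-month rate below it leaves you with a stub of bonds, any rate above it leaves you short of the $100 you owe.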

Still, this exercise shows how our measure of the forward is a tradeable level!

[If you went through the much more arduous task of adjusting for all the real-world frictions and costs you would impute a forward rate that better matched what you considered to be a “tradeable price”. The principle is the same, the details will vary. I was not a fixed-income trader and own all the errors readers discover.]

The Implied Forward Implied Volatility

Now you’re warmed up.

Like interest rates, implied volatilities have a term structure. Every pair of expiries has an implied forward volatility. The principle is the same. The math is almost the same.

With interest rates we were able to do the weighted average calculation by multiplying the rates by the number of days or fraction of the year. That’s because there is a linear relationship between time and rates. If you have an un-annualized 6-month rate, you simply double it to find the annualized rate. You can’t do that with volatility.3

The solution is simple. Just square all the implied volatility inputs so they are variances. Variance is proportional to time so you can safely multiply variance by the number of days. Take the square root of your forward variance to turn it back into a forward volatility.

Consider the following hypothetical at-the-money volatilities for BTC:

                            Expiry 1    Expiry 2
Implied Vol                 40%         42%
Variance (Vol²)             .16         .1764
Time to Expiry (in days)    20          30

Let’s compute the 20-to-30 day implied forward volatility. We follow the same pattern as the weighted test averages and weighted interest rate examples.

The decomposition where DTE = “days to expiry”:

“variance for 30 days” = “variance for 20 days” + “variance from day 20 to 30”

Expiry2 variance * DTE_Expiry2 = Expiry1 variance * DTE_Expiry1 + Forward variance_20-30 * Days_20-30

Re-arrange for forward variance:

Fwd Variance_20-30 = (Expiry2 variance * DTE_Expiry2 – Expiry1 variance * DTE_Expiry1) / Days_20-30

Fwd Variance_20-30 = (.1764 * 30 – .16 * 20) / 10

Fwd Variance_20-30 = .2092

Turning variance back into volatility:

√.2092 = 45.7%

If the 20-day option implies 40% vol and the 30-day option implies 42% vol, then it makes sense that the vol between 20 and 30 days must be higher than 42%. The 30-day volatility includes 40% vol for the first 20 days, so the time contained in the 30-day option that DOES NOT overlap with the 20-day option must be high enough to pull the entire 30-day vol up to 42%.

This works in reverse as well. If the 30-day implied volatility were lower than the 20-day vol, then the 20-30 day forward vol would need to be lower than the 30-day volatility.
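If you prefer code to algebra, here is a minimal sketch of the same calculation. The function name and the choice to return NaN for a negative forward variance are mine, not a standard convention.

```python
import math

def forward_vol(vol_near, days_near, vol_far, days_far):
    """Implied forward volatility between two expiries.

    Works in variance space because variance, not volatility, scales with time.
    """
    total_var_near = vol_near ** 2 * days_near
    total_var_far = vol_far ** 2 * days_far
    fwd_var = (total_var_far - total_var_near) / (days_far - days_near)
    if fwd_var < 0:
        # The far expiry holds less total variance than the near one
        return float("nan")
    return math.sqrt(fwd_var)

# The BTC example: 40% vol for 20 days, 42% vol for 30 days
print(f"{forward_vol(0.40, 20, 0.42, 30):.1%}")  # 45.7%
```

Flip the inputs so the 30-day vol is below the 20-day vol (say 42% then 40%) and the forward vol comes out below both.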

The Arbitrage Lower Bound of a Calendar Spread

The fact that the second expiry includes the first expiry creates an arbitrage condition (at least in equities). An American-style time spread cannot be worth less than 0. In other words, a 50 strike call with 30 days to expiry cannot be worth less than a 50 strike call with 20 days to expiry.

Here’s a little experiment (use ATM options, it will not work if the options are far OTM and therefore have no vega):

Pull up an options calculator where you make a time spread worth 0.

I punched in a 9-day ATM call at 39.6% vol and a 16-day ATM call at 29.70001% vol. These options are worth the same (for the $50 strike ATM they are both worth $1.24).

Now compute the implied forward vol.

                            Expiry 1    Expiry 2
Implied Vol                 39.6%       29.70001%
Variance (Vol²)             .157        .088
Time to Expiry (in days)    9           16

You can predict what happens when we weight the variance by days:

Expiry1 = .157 * 9 = 1.411

Expiry2 = .088 * 16 = 1.411

Expiry 2 has the same total variance as Expiry 1 which means there is zero implied variance between day 9 and day 16.

The square root of zero is zero. That’s an implied forward volatility of zero!

A possible interpretation of zero implied forward vol:

The market expects a cash takeover of this stock to close no later than day 9 with 100% probability.

A Simple Tool To Build

With a list of expirations and corresponding ATM volatility, you can construct your own forward implied volatility matrix:
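A bare-bones version of such a matrix, as a sketch. The term structure below is hypothetical (the first two points reuse the BTC example, the rest are made up), and the dictionary-of-pairs layout is just one convenient way to hold the result.

```python
import math

def forward_vol_matrix(term_structure):
    """Map every (near_days, far_days) pair to its implied forward vol.

    term_structure: list of (days_to_expiry, atm_vol) tuples sorted by days.
    """
    matrix = {}
    for i, (d1, v1) in enumerate(term_structure):
        for d2, v2 in term_structure[i + 1:]:
            fwd_var = (v2 * v2 * d2 - v1 * v1 * d1) / (d2 - d1)
            # A negative forward variance means the far expiry is "too cheap" to be consistent
            matrix[(d1, d2)] = math.sqrt(fwd_var) if fwd_var >= 0 else float("nan")
    return matrix

# Hypothetical ATM vols by days to expiry
hypothetical_term_structure = [(20, 0.40), (30, 0.42), (60, 0.45), (90, 0.46)]
matrix = forward_vol_matrix(hypothetical_term_structure)

for (near, far), vol in matrix.items():
    print(f"{near:>3}d -> {far:>3}d forward vol: {vol:.1%}")
```

In a spreadsheet the same thing is a triangular grid with the near expiry on one axis and the far expiry on the other.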


Like the interest rate forward example, there’s no arbitrage in trying to isolate the forward volatility unless you can buy a time spread for zero.4

For most of the past decade, implied volatility term structures have been ascending (or “contango” for readers who once donned a NYMEX or CBOT badge). If you sell a fat-looking time spread you have a couple major “gotchas” to contend with:

  1. Weighting the trade
    If you are short a 1-to-1 time spread you are short vega, long gamma, and paying theta. This is not inherently good or bad. But you need a framework for choosing which risks you want and at what price (that statement is basically the bumper sticker definition of trading imbued simultaneously with truth and banality). If you want to bet on the time spread narrowing, ie the forward vol declining, then you need to ratio the trades. The end of Moontower On Gamma discusses that. Even then, you still have problems with path-dependence because the gamma profile of the spread will change as soon as the underlying moves. The reason people trade variance swaps is that the gamma profile of the structure is constant over a wide range of strikes providing even exposure to the realized volatility. Sure you could implement a time spread with variance swaps, but you get into idiosyncratic issues such as bilateral credit risk and greater slippage.
  2. The bet, like the interest rate bet, comes down to what the longer-dated instrument does outright. You were trying to isolate the forward vol, but as time passes your net vega grows until eventually the front month expires and you are left with a naked vol position in the longer-dated expiry and your gamma flips from highly positive to negative (assuming the strikes were still near the money).

Term structure bets are usually not described as bets on forward volatility but more in the context of harvesting a term premium as time passes and implied vols “roll down the term structure”. This is a totally reasonable way to think of it, but using an implied forward vol matrix is another way to measure term premiums.

The Wider Lessons


Forward vols represent another way to study term structures. Since term structures can shift, slope, and twist you can make bets on the specific movements using outright vega, time spreads, and time butterflies respectively. A tool to measure forward vols is a thermometer in a doctor’s bag. How do we conceptually situate such tools in the greater context of diagnosis and treatment?

Here’s my personal approach. Recognize that there are many ways to skin a cat, this is my own.

  1. I use dashboards with cross-sectional analysis as the top of an “opportunity funnel”. You could use highly liquid instruments to calibrate to a fair pricing of parameters (skew, IV risk premium, term premium, wing pricing, etc) in the world at any one point in time. This is not trivial and why I emphasize that trading is more about measurement than prediction. To compare parameters you need to normalize across asset types.
    To demonstrate just how challenging this is, an interview question I might ask is:

    Price a 12-month option on an ETF that holds a rolling front-month contract on the price of WTI crude oil5

    I wouldn’t need the answer to be bullseye accurate. I’m looking for the person’s understanding of arbitrage-pricing theory which is fundamental to being able to normalize comparisons between financial instruments. The answer to the question requires a practical understanding of replicating portfolios, walking through the time steps of a trade, and computing implied forward vols on assets with multiple underlyers. (Beyond pricing, actually trading such a derivative requires understanding the differences in flows between SEC and CFTC-governed markets and who the bridges between them are.)

  2. The contracts or asset classes that “stick out” become a list of candidates for research. There are 2 broad steps for this research.
    • Do these “mispriced” parameters reveal an opportunity or just a shortcoming in your normalization?
      Sleuthing the answer to that may be as simple as reading something publicly available or could require talking to brokers or exchanges to see if there’s something you are missing. If you are satisfied to a degree of certainty commensurate with the edge in the opportunity that you are not missing anything crucial, then you can move to the next stage of investigation.
    • Understanding the flow
      What flow is causing the mispricing? What’s the motivation for the flow? Is it early enough to bet with it? Is it late enough to bet against it? You don’t want to trade the first piece of a large order but you will not get to trade the last piece either. That piece will either be fed to the people who got hurt trading with the flow too early, as a favor from the broker who ran them over (trading is a tit-for-tat iterated game), or internalized by the bank who controls the flow and knows the end is near.

  3. Execute

Suppose you determine that the term structure is too cheap compared to a “fair term structure” as triangulated by an ensemble of cross-sectional measurements. Perhaps, there is a large oil refiner selling gasoline calls to hedge their inventory (like covered calls in the energy world). You can use the forward vol matrix to drill down to the expiry you want to buy. “Ah, the 9-month contract looks like the best value according to the matrix. Let’s pull up a montage and see if it’s really there. Let’s see what the open interest is?…”

As you examine quotes from the screens or brokers, you may discover that the tool is just picking up a stale bid/ask or wide market, and that the cheapest term isn’t really liquid or tradeable. This isn’t a problem with the tool, it’s just a routine data screening pitfall. The point is that tools of this nature can help you optimize your trade expression in the later stage of the funnel.


This discussion of forward vols was like month 1 learning at SIG. It’s foundational. It’s also table stakes. Every pro understands it. I’m not giving away trade secrets. I am not some EMH maxi6 but I’ll say I’ve been more impressed than not at how often I’ll explore some opportunity and be discouraged to know that the market has already figured it out. The thing that looks mispriced often just has features that are overlooked by my model. This doesn’t become apparent until you dig further, or until you put on a trade only to get bloodied by something you didn’t account for as a particular path unfolds.

This may sound so negative that you may wonder why I even bother writing about this on the internet. Most people are so far out of their depth, is this even useful? My answer is a confident “yes” if you can learn the right lesson from it:

There is no silver bullet. Successful trading is the sum of doing many small things correctly including reasoning. Understanding arbitrage-pricing principles is a prerequisite for establishing what is baked into any price. Only from that vantage point can one then reason about why something might be priced in a way that doesn’t make sense and whether that’s an opportunity or a trap7. By slowly transforming your mind to one that compares any trade idea with its arbitrage-free boundary conditions or replicating portfolio/strategy, you develop an evergreen lens to ever-changing markets.

You may only gain or handle one small insight from these posts. But don’t be discouraged. Understanding is like antivenom. It takes a lot of cost and effort to produce a small amount8. If you enjoy this process despite its difficulty then it’s a craft you can pursue for intellectual rewards and profit.

If profit is your only motivation, at least you know what you’re up against.

Examples Of Comparing Interest Rates With Different Compounding Intervals

Simple Interest

If you pay someone $90 today and they promise to give you $100 in 12 months, you are making a loan. This is the same idea as buying a bond. To back out the simple interest rate (ie assuming no compounding) we solve for r:

90 * (1+r) = 100

r = 100/90 – 1

r = 11.11%


If the loan was only for 6 months, then we’d annualize the interest rate by multiplying by 2 (12 months / 6 months) for a rate of 22.22%

Compound Interest

Let’s return to the 12-month loan and say that the rate is compounded semi-annually. Then the computation is:

90 * (1+r/2)² = 100

r/2 = (100/90)^0.5 – 1

r = 10.82%

If you compound more frequently than annually, it makes sense that the implied interest rate is lower. Consider the path of the principal + accrued interest:

Compounding semi-annually means interest gets credited at the 6-month mark. So the rate for the next 6 months is being applied to the higher accrued value amount which means the implied rate to end up at $100 (the same way the simple interest case ends up $100) must be lower than the simple interest case.

Continuous Interest

We can compound interest more frequently. Quarterly, monthly, daily. Since the number we are backing out, namely the implied rate, is being applied to a growing basket of principal + accrued interest at each checkpoint (I think of the compounding interval as a checkpoint where the accrued interest is rolled into the remaining loan balance), the implied rate to end up at $100 must be smaller. If we take this logic to the extreme and keep cutting the time interval into smaller increments we eventually hit the limit of Δt → 0. The derivatives world models everything in continuous time finance so interest rates get the same treatment.

Mechanically, the math is no harder.

To compute the continuously compounded interest rate we still just solve for r:

90 * e^(rt) = 100

t is a fraction of a year. So for the 12-month case:

90 * e^(r*1) = 100

e^r = 100/90

r ln(e) = ln(100/90)

r = 10.54%

As expected, this is a lower implied rate than the 11.11% simple rate and the 10.82% semi-annual rate. Again, because we are compounding continuously.

Annualizing remains easy. If $90 grows to $100 in just 6 months, we compute the continuously compounded rate as follows:

90 * e^(r*1/2) = 100

e^(r*1/2) = 100/90

r *1/2 = ln(100/90)

r = 21.07%

This can be contrasted with the 22.22% 6-month loan using simple interest we computed earlier.
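All of the rates computed in this appendix come from the same two numbers ($90 growing to $100) under different compounding assumptions. A small sketch makes the comparison mechanical; the function names are mine.

```python
import math

def simple_rate(pv, fv, years):
    """Annualized rate assuming no compounding."""
    return (fv / pv - 1) / years

def periodic_rate(pv, fv, years, periods_per_year):
    """Annualized rate assuming interest is credited periods_per_year times a year."""
    n_periods = periods_per_year * years
    return periods_per_year * ((fv / pv) ** (1 / n_periods) - 1)

def continuous_rate(pv, fv, years):
    """Annualized rate assuming continuous compounding."""
    return math.log(fv / pv) / years

print(f"simple, 12 months:      {simple_rate(90, 100, 1.0):.2%}")
print(f"semi-annual, 12 months: {periodic_rate(90, 100, 1.0, 2):.2%}")
print(f"continuous, 12 months:  {continuous_rate(90, 100, 1.0):.2%}")
print(f"simple, 6 months:       {simple_rate(90, 100, 0.5):.2%}")
print(f"continuous, 6 months:   {continuous_rate(90, 100, 0.5):.2%}")
```

Try `periodic_rate` with 12 or 365 periods per year and you can watch the implied rate drift down toward the continuous figure.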

Application To Real Life

Note in all these cases, $90 is growing to $100. We are just seeing that the implied rate depends on the compounding assumption. In real life, when you see “compounded daily” or “compounded monthly” and so on, you are now equipped with the tools to compare rates on an apples-to-apples basis. If a rate is lower but compounds more frequently than another rate the relative value between both loans is ambiguous.

APYs disclosed on financial products make yields comparable. But now you understand how APYs convert different rate schedules into a single measure.

An Example Of Using Probability To Build An Intuition For Correlation

The power of negative correlations becomes clear when you see how rebalancing increases your expected compounded return. This isn’t intuitive to a typical investor, especially a retail one.

I’ve tried to make it easier to understand:

One of my favorite finance educators recently wrote an absolute must-read thread on this topic.

He creates a model with 2 simplifying features:

  • There are only 2 stocks
  • They are rebalanced to equal weight

You can use the intuition from this exercise to guide your portfolio thinking more broadly. It’s beautifully done and you should work through it carefully not just for the intuition but the practical knowledge of how to compute an expected return in a compounding context. However, there is a part I struggled with that I want to zoom in on because I’ve never before seen it presented as @10kdiver does it:

He converts probability to an estimate of correlation!

This is really cool. But because both my struggle and the learnings of the thread are important, I have a dual purpose in writing this post.

  1. The meta-lesson

    This is the easy one:

    When I read the post, it was easy to nod along thinking “yep, that makes sense…ok, ok, got it”. Except I didn’t “get it”. I couldn’t reconstruct the logic on my own on a blank sheet of paper, which means I didn’t learn it. Paradoxically, this demonstrates how good @10kdiver’s explanation was. Extrapolate this paradox to many things you think you learned by reading and you will have internalized a useful life lesson: get your hands dirty to actually learn.

  2. Diving into the probability math I struggled with.

    Let’s do it…

Zooming In: The Probability Basis For Correlation


Example computation for CAGR (also seen in tweet #4):

CAGR_A = ((1+A_up_size)^(A_prob_up*hold_period)*(1+A_down_size)^(A_prob_down*hold_period))^(1/hold_period)-1
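Here is the same spreadsheet formula as a small Python function. The inputs are an assumption on my part: a stock that is up 30% in 80% of years and down 50% in the other 20% reproduces the ~7.39% single-stock CAGR quoted from the thread later in this post, so I use those numbers here.

```python
def cagr(up_size, down_size, prob_up, hold_period=25):
    """CAGR when the stock has exactly its expected share of up and down years."""
    prob_down = 1 - prob_up
    growth = ((1 + up_size) ** (prob_up * hold_period)
              * (1 + down_size) ** (prob_down * hold_period))
    return growth ** (1 / hold_period) - 1

# Assumed inputs: +30% in up years, -50% in down years, up 80% of the time
stock_cagr = cagr(up_size=0.30, down_size=-0.50, prob_up=0.80)
print(f"{stock_cagr:.2%}")
```

Notice that hold_period cancels out: the CAGR is just (1+up)^p_up * (1+down)^p_down – 1.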

Define the probability space

We are focusing on tweets 6-10 in particular. The summary matrix:

Understanding the boxes:

Start with the logic: “what would the probability space look like if they were perfectly correlated?”

  • Top left box = X (This corresponds to both up)

They would go up together 80% of the time if they were perfectly correlated. We generalize the “probability of stocks up together” as X.

  • Top right box = .8-X (This corresponds to B up, A down)

Since stock B goes up 80% of the time, and X of that is when both are up, the probability that B goes up while A goes down is .8 – X

  • Bottom left box = .8 – X  (This corresponds to A up, B down)

Since stock A goes up 80% of the time, and X of that is when both are up, the probability that A goes up while B goes down is .8 – X

  • Bottom right box = X – .6 (This corresponds to both down)

With one box left it’s easy, we know all the boxes must sum to 100% probability.

100% – [X + 80% -X + 80% – X] = X – .6

We called the probability of moving up together X. We set the matrix up using the simple case of the stocks being perfectly correlated (ie moving up together 80% of the time). But they don’t need to be perfectly correlated. So now we can find the range of X, a joint probability, that is internally consistent with each stock’s individual probability of going up.

What is the probability range of X ie “how often the stocks move together”?

Upper bound

X is defined as “how often they move up together”. Another way to think of this: the upper bound of the joint probability is the smaller of the two stocks’ individual probabilities of going up.

Let’s change the numbers and pretend stock A goes up 50% of the time and stock B goes up 80% of the time. Then 50% is the upper bound of how often they can both go up together. (Stock A is the limiting reagent here, it can’t move up more than 50% of the time). So the minimum of their “up” probabilities represents an upper bound on X.

Back to the original example, the upper bound of how often these stocks move together is 80% because the minimum of either stock’s individual probability of going up is 80%. Mathematically this is

.8 – X ≥ 0 so:

Upper bound of X = 80%

Lower bound

Proceeding with the logic that no box can be negative, the bottom right box (X – .6) cannot be less than zero, so X cannot be less than 60%. This represents the least co-movement possible given the stocks’ probabilities.

Lower bound of X = 60%

Think of it this way: if there were 10 trials, each stock could have 2 down years. If they were maximally correlated the stocks would share the same 2 down years. If they were minimally correlated they would never go down at the same time. The probability of both stocks going down simultaneously would be zero, but since the 4 down years would be spread out over 10 years, the pair of stocks would only go up simultaneously 60% of the time.


The probability of the stocks moving together, X, is bounded as:

60% ≤ X ≤ 80%

X is not a correlation. X is a probability. The fact that the stocks can co-move from 60-80% of the time maps to a correlation.

A Key Insight

For two up/down outcomes like these, a zero correlation means the 2 variables are independent! If they are independent, the joint probability is a simple product of their individual probabilities.

That’s why the 0 correlation point corresponds to 64%:

X = .8 x .8 = 64%

Loosely Mapping Probability to Correlation

If you’re feeling spry, you can use the probability space and covariance math to compute the actual correlation. But, we can estimate the rough shape of the correlation using zero correlation (statistical independence corresponding to X = 64%, the joint probability of both stocks going up together) as the fulcrum.
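For the spry, here is a sketch of that covariance math. It treats each stock's "up" as a 0/1 indicator; since each stock only has two possible returns, the correlation of the indicators is the same as the correlation of the returns. The function name and the tolerance on the bounds check are my own.

```python
import math

def correlation_from_joint(x, p_a=0.8, p_b=0.8):
    """Correlation of two up/down stocks given X = P(both up)."""
    lower = max(0.0, p_a + p_b - 1)   # no box in the matrix can be negative
    upper = min(p_a, p_b)             # the "limiting reagent" bound
    tol = 1e-12                       # guard against floating point noise at the edges
    if not (lower - tol <= x <= upper + tol):
        raise ValueError(f"X must be between {lower:.2f} and {upper:.2f}")
    covariance = x - p_a * p_b        # E[A*B] - E[A]*E[B] for 0/1 indicators
    return covariance / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))

for x in (0.60, 0.64, 0.70, 0.80):
    print(f"X = {x:.0%} -> correlation {correlation_from_joint(x):+.2f}")
```

With both stocks up 80% of the time, the mapping is a straight line: correlation = (X – .64) / .16, running from –0.25 at X = 60% to +1 at X = 80%.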

Look back at tweet #10 to see the extremes:

At the lowest correlation, corresponding to a co-movement of 60% frequency:

  • The correlation is slightly negative. It’s below the 64% independence point.
  • The stocks NEVER go down together.
  • The stocks move in opposite directions 40% of the time
  • When the stocks do move together, it’s up.
  • The stocks have a negative correlation despite being up together 60% of the time.

At the highest correlation point, corresponding to 80% frequency of co-movement:

  • The stocks go up 80% of the time together
  • They go down 20% of the time together
  • They never move in opposite directions.
  • The magnitude of the max positive correlation is greater than the magnitude of the maximum negative correlation since the independence point is near the lower end of the range.

Rebalancing Benefits Improve As Correlations Fall

The thread heats up again in tweet #17 by identifying the possible values of the portfolio rebalanced to 50/50 at the end of a year.

In tweet #18, those states are weighted by the probabilities to generate expected values of the portfolio, which can finally be used to compute the CAGR of the portfolio if rebalanced annually.

The lower the value of X (the joint probability of the stocks moving up together), the lower the correlation.

The lower the correlation, the higher the expected value of a rebalanced portfolio.

The remainder of the thread speaks for itself:

  • When X = 60% (ie, strongly negative correlation), we have:
    • Without re-balancing: $1 –> $5.94
    • With re-balancing: $1 –> $17.85 (>3x as much!), over the same 25 years.
    • Thus, negative correlations + re-balancing can be a powerful combination.

  • If we do this well, our portfolio can end up getting us a HIGHER return than any single stock in it! We just saw an example with 2 stocks. Each got us only ~7.39%. But a 50/50 re-balanced portfolio of them got us ~12.22%. When I first saw this, I couldn’t believe it!

    [Moontower note: in practice, portfolios usually have many names and a variety of weighting schemes. While the intuition is similar the math is more complex and you are now looking at a matrix of pairwise correlations, assets with varying volatilities and therefore different weights in the portfolio]

  • This is the ESSENCE of diversification. We minimize correlations, so our portfolio nearly always has both risen and fallen stocks. We “cash in” on this gap via re-balancing — ie, we periodically sell over-valued stocks and put the money into under-valued ones.

  • Negative correlations aren’t strictly necessary. We could use stocks with zero — or even positive — correlation. But the MORE heavily correlated our stocks, the LESS “bang for the buck” we get from re-balancing.
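The thread's numbers can be reproduced with a short sketch. As before, the +30% / –50% move sizes are my assumption (they reproduce the ~7.39%, ~12.22%, $5.94 and $17.85 figures quoted above); the function name is mine.

```python
def rebalanced_cagr(x, up=0.30, down=-0.50, p_up=0.80):
    """CAGR of a 50/50 portfolio rebalanced every year, given X = P(both up)."""
    p_both_up = x
    p_mixed = 2 * (p_up - x)              # one stock up, the other down (two boxes)
    p_both_down = 1 - p_both_up - p_mixed
    r_mixed = 0.5 * up + 0.5 * down       # portfolio return when the stocks split
    growth = ((1 + up) ** p_both_up
              * (1 + r_mixed) ** p_mixed
              * (1 + down) ** p_both_down)
    return growth - 1

for x in (0.60, 0.64, 0.70, 0.80):
    annual = rebalanced_cagr(x)
    print(f"X = {x:.0%}: CAGR {annual:.2%}, $1 grows to ${(1 + annual) ** 25:.2f} in 25 years")
```

At X = 80% the stocks are perfectly correlated, the "mixed" boxes are empty, and rebalancing adds nothing over holding either stock alone.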

Wrapping Up

The idea that the benefits of rebalancing improve with falling correlations is common knowledge in professional circles. Still, the intuition is elusive. The sheer size of the effect on total CAGR is shocking.

Until @10kdiver’s thread, I hadn’t seen a mapping from probability, which is intuitive, to correlation, which is fuzzy (recall that when the 2 stocks had a negative correlation they still went up together 60% of the time!)

When I read the thread, I found myself nodding along but I needed to walk through it to fully appreciate the math. That’s a useful lesson on its own.

If you found this post helpful, I use another of @10kdiver’s threads to show how we can solve a compounding probability problem using option theory:

Solving A Compounding Riddle With Black-Scholes (13 min read)

Bet Sizing Is Not Intuitive

Humans are not good bettors.

It takes effort both in study and practice to become more proficient. But like anything hard, most people won’t persevere. Devoting some cycles to improve will arm you with a rare arrow in your quiver as you go through life.

Skilled betting demands 2 pivotal actions:

  1. Identifying attractive propositions

    This can be coded as “positive expected value” or “good risk/reward”. There is no strategy that turns a bad proposition into an attractive one on its own merit (as opposed to something like buying insurance which is a bad deal in isolation but can make sense holistically). For example, there is no roulette betting strategy that magically turns its negative EV trials into a positive EV session.

  2. Effective bet sizing

    Once you are faced with an attractive proposition, how much do you bet? While this is also a big topic we can make a simple assertion — bad bet sizing is enough to ruin a great proposition. This is a deeper point than it appears. By sizing a bet poorly, you can fumble away a certain win. You cannot afford to get bet sizing dramatically wrong.

Of these 2 points, the second one is less appreciated. Bet sizing is not very intuitive.

To show that, we will examine a surprising study.

The Haghani-Dewey Biased Coin Study

In October 2016, Richard Dewey and Victor Haghani (of LTCM infamy) published a study titled:

Observed Betting Patterns on a Biased Coin (Editorial from the Journal of Portfolio Management)

The study is a dazzling illustration of how poor our intuition is for proper bet sizing. The link goes into depth about the study. I will provide a condensed version by weaving my own thoughts with excerpts from the editorial.

The setup

  • 61 individuals start with $25 each. They can play a computer game where they can bet any proportion of their bankroll on a coin. They can choose heads or tails. They are told the coin has a 60% chance of landing heads. The bet pays even money (i.e. if you bet $1, you either win or lose $1). They get 30 minutes to play.
  • The sample was largely composed of college-age students in economics and finance and young professionals at financial firms. We had 14 analyst and associate-level employees of two leading asset management firms.

Your opportunity to play

Before continuing with a description of what an optimal strategy might look like, we ask you to take a few moments to consider what you would do if given the opportunity to play this game. Once you read on, you’ll be afflicted with the curse of knowledge, making it difficult for you to appreciate the perspective of our subjects encountering this game for the first time.

If you want to be more hands-on, play the game here.

Devising A Strategy

  1. The first thing to notice is betting on heads is positive expected value (EV). If x is your wager:

    EV = 60% (x) – 40% (x) = 20% (x)

    You expect to earn 20% of your wager per coin flip.

  2. The next observation is the betting strategy that maximizes your total expected value is to bet 100% of your bankroll on every flip. 

  3. But then you should notice that this also maximizes your chance of going broke. On any single flip, you have a 40% chance of losing your stake and being unable to continue this favorable game.

  4. What if you bet 50% of your bankroll on every flip?

    In the typical outcome you will lose 97% of your wealth (as opposed to a nearly 100% chance of going bust if you had bet your full bankroll). 97% sounds like a lot! How does that work?

    If you bet 50% of your bankroll on 100 flips you expect 60 heads and 40 tails. 

    If you make 50% on 60 flips, and lose 50% on 40 flips, your ending bankroll as a multiple of your starting bankroll is:

1.5^60 x 0.50^40 = .033

You will be left with 3% of your starting cash! This is because heads followed by tails, or vice versa, results in a 25% loss of your bankroll (1.5 * 0.5 = 0.75).

This is a significant insight on its own. Cutting your bet size dramatically from 100% per toss to 50% per toss left you in a similar position — losing all or nearly all your money.
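You can check this for any bet size with a couple of lines. This sketch assumes you get exactly the expected mix of heads and tails (60/40 over 100 flips), which is a stand-in for the typical path, not the average of all paths. The function name is mine.

```python
def typical_multiple(fraction, p_heads=0.60, flips=100):
    """Ending bankroll multiple if you bet a fixed fraction and see the expected heads/tails mix."""
    heads = p_heads * flips
    tails = flips - heads
    return (1 + fraction) ** heads * (1 - fraction) ** tails

for fraction in (0.10, 0.20, 0.30, 0.50, 1.00):
    print(f"bet {fraction:.0%} of bankroll -> {typical_multiple(fraction):.3f}x")
```

The order of the heads and tails doesn't matter because multiplication commutes, which is why a fixed fraction makes the outcome depend only on the count of each.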

Optimal Strategy

There’s no need for build-up. There’s a decent chance any reader of this blog has heard of the Kelly Criterion which uses the probabilities and payoffs of various outcomes to compute an “optimal” bet size. In this case, the computation is straightforward — the optimal bet size as a fraction of the bankroll is 20%, matching the edge you get on the bet.

Since the payoff is even money the Kelly formula reduces to 2p -1 where p = probability of winning.

2 x 60% – 1 = 20%
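For payoffs other than even money, the usual statement of the Kelly fraction is f = p – (1 – p)/b, where b is the net odds received on a win. A quick sketch (function names are mine) shows it collapsing to 2p – 1 when b = 1, and that 20% sits at the peak of the expected log growth per flip:

```python
import math

def kelly_fraction(p_win, net_odds=1.0):
    """Kelly bet size as a fraction of bankroll; net_odds is what you win per $1 staked."""
    return p_win - (1 - p_win) / net_odds

def log_growth(fraction, p_win=0.60, net_odds=1.0):
    """Expected log growth of the bankroll per flip when betting a fixed fraction."""
    return (p_win * math.log(1 + net_odds * fraction)
            + (1 - p_win) * math.log(1 - fraction))

print(f"Kelly fraction: {kelly_fraction(0.60):.0%}")
for fraction in (0.10, 0.20, 0.30, 0.50):
    print(f"bet {fraction:.0%}: log growth per flip {log_growth(fraction):+.4f}")
```

Betting 50% shows up with a negative growth rate, which is the 97% loss from the previous section in another guise.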

The clever formula developed by Bell Labs researcher John Kelly:

provides an optimal betting strategy for maximizing the rate of growth of wealth in games with favorable odds, a tool that would appear a good fit for this problem. Dr. Kelly’s paper built upon work first done by Daniel Bernoulli, who resolved the St. Petersburg Paradox— a lottery with an infinite expected payout—by introducing a utility function that the lottery player seeks to maximize. Bernoulli’s work catalyzed the development of utility theory and laid the groundwork for many aspects of modern finance and behavioral economics. 

The emphasis refers to the assumption that a gambler has a log utility of wealth function. In English, this means the more money you have the less a marginal dollar is worth to you. Mathematically it also means that the magnitude of pain from losing $1 is greater than the magnitude of joy from gaining $1. This matches empirical findings for most people. They are “loss-averse”.

How did the subjects fare in this game?

The paper is blunt:

Our subjects did not do very well. Suboptimal betting came in all shapes and sizes: overbetting, underbetting, erratic betting, and betting on tails were just some of the ways a majority of players squandered their chance to take home $250 for 30 minutes play.

Let’s take a look, shall we?

Bad results and strange behavior

Only 21% of participants reached the maximum payout of $250, well below the 95% that should have reached it given a simple constant percentage betting strategy of anywhere from 10% to 20%.

  • 1/3 of the participants finished with less money than the $25 they started with. (28% went bust entirely!)
  • 67% of the participants bet on tails at some point. The authors forgive this somewhat conceding that players might be curious if the tails really are worse, but 48% bet on tails more than 5 times! Many of these bets on tails occurred after streaks of heads suggesting a vulnerability to gambler’s fallacy.
  • Betting patterns and debriefings also found prominent use of martingale strategies (doubling down after a loss).
  • 30% of participants bet their entire bankroll on one flip, raising their risk of ruin from nearly 0% to 40% in a lucrative game!

Just how lucrative is this game?

Having a trading background, I have an intuitive understanding that this is a very profitable game. If you sling option contracts that can have a $2 range over the course of their life and collect a measly penny of edge, you have razor-thin margins. The business requires trading hundreds of thousands of contracts a week to let the law of averages assure you of profits.

A game with a 20% edge is an astounding proposition.

Not only did most of our subjects play poorly, they also failed to appreciate the value of the opportunity to play the game. If we had offered the game with no cap [and] assume that a player with agile fingers can put down a bet every 6 seconds, 300 bets would be allowed in the 30 minutes of play. The expected gain of each flip, betting the Kelly fraction, is 4% [Kris clarification: 20% of bankroll times 20% edge].

The expected value of 300 flips is $25 * (1 + 0.04)^300 = $3,220,637!

In fact, they ran simulations for constant bet fractions of 10%, 15%, and 20% (half Kelly, 3/4 Kelly, full Kelly) and found a 95% probability that the subjects would reach the $250 cap!
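You can get a feel for that result with a rough Monte Carlo of your own. To be clear, this is my sketch and not the authors' simulation: it assumes 300 flips, a player who always bets heads, and none of the game's practical details such as minimum bet sizes. The function name, trial count, and seed are arbitrary.

```python
import random

def prob_reach_cap(fraction, trials=2000, flips=300, start=25.0,
                   cap=250.0, p_heads=0.60, seed=0):
    """Estimate the chance a constant-fraction bettor hits the cap within the allotted flips."""
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        bankroll = start
        for _ in range(flips):
            stake = fraction * bankroll
            bankroll += stake if rng.random() < p_heads else -stake
            if bankroll >= cap:
                hits += 1
                break
    return hits / trials

# Uncapped expected value when betting the Kelly fraction, as in the quote above
expected_uncapped = 25 * 1.04 ** 300

for fraction in (0.10, 0.15, 0.20, 1.00):
    print(f"bet {fraction:.0%}: reached $250 in {prob_reach_cap(fraction):.0%} of trials")
```

The all-in bettor needs four heads in a row to get from $25 past $250, so they do occasionally get there. They just go bust the rest of the time.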

Instead, just over 20% of the subjects reached the max payout.

Editorialized Observations

  • Considering how lucrative this game was, the performance of the participants is damning. That nearly one-third risked the entire bankroll is anathema to traders who understand that the #1 rule of trading (assuming you have a positive expectancy business) is survival.

  • Only 5 out of the 61 finance-educated participants were familiar with Kelly betting. And 2 out of the 5 didn’t consider using it. A game like this is the context it’s tailor-made for!
  • The authors note that the syllabi of MIT, Columbia, Chicago, Stanford, and Wharton MBA programs do not make any reference to betting or Kelly topics in their intro finance, trading, or asset-pricing courses. 

  • Post-experiment interviews revealed that betting “a constant proportion of wealth” seemed to be a surprisingly unintuitive strategy to participants. 

Given that many of our subjects received formal training in finance, we were surprised that the Kelly criterion was virtually unknown among our subjects, nor were they able to bring other tools (e.g., utility theory) to the problem that would also have led them to a heuristic of constant-proportion betting. 

These results raise important questions. If a high fraction of quantitatively sophisticated, financially trained individuals have so much difficulty in playing a simple game with a biased coin, what should we expect when it comes to the more complex and long-term task of investing one’s savings? Given the propensity of our subjects to bet on tails (with 48% betting on tails on more than five flips), is it any surprise that people will pay for patently useless advice? What do the results suggest about the prospects for reducing wealth inequality or ensuring the stability of our financial system? Our research suggests that there is a significant gap in the education of young finance and economics students when it comes to the practical application of the concepts of utility and risk-taking.

Our research will be worth many multiples of the $5,574 winnings we paid out to our 61 subjects if it helps encourage educators to fill this void, either through direct instruction or through trial-and-error exercises like our game. As Ed Thorp remarked to us upon reviewing this experiment, “It ought to become part of the basic education of anyone interested in finance or gambling.”

I will add my own concern. It’s not just individual investors we should worry about. Their agents in the form of financial advisors or fund managers, even if they can identify attractive propositions, may undo their efforts by poorly sizing opportunities by either:

  1.  falling far short of maximizing

    Since great opportunities are rare, failing to optimize can be more harmful than our intuition suggests…making $50k in a game you should make $3mm is one of the worst financial errors one could make.

  2. overbetting an edge

    There isn’t a price I’d play $100mm Russian Roulette for
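Both errors show up in the coin game's own numbers. The sketch below computes the expected log growth per flip for a few bet sizes, again assuming a 60% heads coin and even-money payouts. Expected log growth is the standard yardstick behind the Kelly criterion, and `log_growth` is just my name for it.

```python
import math

def log_growth(fraction, p_heads=0.60):
    """Expected log growth of bankroll per flip when betting `fraction` on heads."""
    q_tails = 1.0 - p_heads
    return p_heads * math.log(1.0 + fraction) + q_tails * math.log(1.0 - fraction)

# 5% is timid, 20% is full Kelly, 40% and 60% are overbets.
for fraction in (0.05, 0.10, 0.20, 0.40, 0.60):
    print(f"bet {fraction:.0%} of bankroll -> log growth per flip {log_growth(fraction):+.4f}")
```

Growth peaks at the 20% Kelly fraction. Bet much less and you leave most of the compounding on the table. Bet roughly double Kelly or more and the growth rate turns negative, meaning you expect to grind your bankroll down in a game that is stacked in your favor.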

Getting these things correct requires proper training. In Can Your Manager Solve Betting Games With Known Solutions?, I wonder if the average professional manager can solve problems with straightforward solutions. Never mind the complexity of assessing risk/reward and proper sizing in investing, a domain that epitomizes chaotic, adversarial dynamics.

Nassim Taleb was at least partly referring to the importance of investment sizing when he remarked, “If you gave an investor the next day’s news 24 hours in advance, he would go bust in less than a year.”

Furthermore, effective sizing is not just about analytics but discipline. It takes a team culture of truth-seeking and emotional checks to override the biases that we know about. Just knowing about them isn’t enough. The discouraged authors found:

…that without a Kelly-like framework to rely upon, our subjects exhibited a menu of widely documented behavioral biases such as illusion of control, anchoring, overbetting, sunk-cost bias, and gambler’s fallacy.


Take bet sizing seriously. A bad sizing strategy squanders opportunity. With a little effort, you can get better at maximizing the opportunities you find, rather than needing to keep finding new ones that you risk fumbling.

You need to identify good props and size them well. Both abilities are imperative. It seems most people don’t realize just how critical sizing is.

Now you do.