Understanding Implied Forwards

These are not trick questions:

Suppose you have an 85 average on the first 4 tests of the semester. There’s one test left. All tests have an equal value in your final score. You need a 90 average for an A in the class.

What do you need on the last test to get an A in the class?

What is the maximum score you can get for the semester?
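If you want to check your answers, the arithmetic is a weighted average in disguise. A quick Python sketch:

```python
# Weighted-average arithmetic behind the test-score questions
# (4 tests at an 85 average, 1 equally weighted test remaining).
current_avg, tests_taken, total_tests = 85, 4, 5

# Score needed on the last test for a 90 final average:
needed = 90 * total_tests - current_avg * tests_taken

# Maximum possible final average (acing the last test):
max_avg = (current_avg * tests_taken + 100) / total_tests

print(needed)   # 110 -- more than a perfect score allows
print(max_avg)  # 88.0
```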

If you are comfortable with the math, you have the prerequisites to learn about a useful finance topic — implied forwards!


Implied forwards can help you:

  • find trading opportunities
  • understand arbitrage and its limits

We’ll start in the world of interest rates.

The Murkiness Of Comparing Rates Of Different Maturities

Consider 2 zero-coupon bonds. One that matures in 11 months and one that matures in 12 months. They both mature to $100.

Scenario A: The 11-month bond is trading for $92 and the 12-month bond is trading for $90.

What are the annualized yields of these bonds if we assume continuous compounding?1
Computing the 12-month yield

r = ln($100/$90) = 10.54%
Computing the 11-month yield

r = ln($100/$92) * 12/11 = 9.10%
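Both computations fit in a few lines of Python (`zero_yield` is a helper name of my own):

```python
import math

# Continuously compounded annualized yield of a $100-face zero-coupon bond:
# price * exp(r * t) = 100  =>  r = ln(100 / price) / t, with t in years
def zero_yield(price, months):
    return math.log(100 / price) * 12 / months

print(round(zero_yield(90, 12), 4))  # 0.1054
print(round(zero_yield(92, 11), 4))  # 0.091
```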

This is an ascending yield curve. You are compensated with a higher interest rate for tying up your money for a longer period of time.

But it is very steep.

You are picking up 140 extra basis points of interest for just one extra month.

Let’s do another example.

Scenario B: We’ll keep the 12-month bond at $90 but say the 11-month bond is trading for only $91.
Computing the 11-month yield

r = ln($100/$91) * 12/11 = 10.29%

So now the 11-month bond yields 10.29% and the 12-month bond yields 10.54%.

You still get paid more for taking extra time risk but maybe it looks more reasonable. It’s kind of hard to reason about 25 bps for an extra month. It’s murky.

Think back to the test score question this post opened with. There is another way of looking at this if we use a familiar concept — the weighted average.

The Implied Forward Interest Rate

We can think of the 12-month rate as the average rate over all the intervals. Just like a final grade is an average of the individual tests.

We can decompose the 12-month rate into a weighted average of an 11-month rate and a month-11 to month-12 forward rate:

“12-month” rate = time-weighted average of the “11-month” rate and the “11 to 12-month” forward rate

Let’s return to scenario A:

12-month rate = 10.54%

11-month rate = 9.1%
Compute the “11 to 12-month” forward rate like a weighted average:

10.54% x 12 = 9.1% x 11 + Forward Rate(11-12) x 1

Forward Rate(11-12) = 26.37%

We knew that 140 bps was a steep premium for one month but when you explicitly compute the forward you realize just how obnoxious it really is.
How about scenario B:

12-month rate = 10.54%

11-month rate = 10.29%
Compute the “11 to 12-month” forward rate like a weighted average:

10.54% x 12 = 10.29% x 11 + Forward Rate(11-12) x 1

Forward Rate(11-12) = 13.26%
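The same weighted-average calculation, coded up (function names are my own):

```python
import math

def zero_yield(price, months):
    # continuously compounded annualized yield: price * exp(r*t) = 100
    return math.log(100 / price) * 12 / months

def implied_forward(near_price, near_months, far_price, far_months):
    # weighted-average decomposition:
    # r_far * far_months = r_near * near_months + fwd * (far_months - near_months)
    r_near = zero_yield(near_price, near_months)
    r_far = zero_yield(far_price, far_months)
    return (r_far * far_months - r_near * near_months) / (far_months - near_months)

print(round(implied_forward(92, 11, 90, 12), 4))  # scenario A: 0.2637
print(round(implied_forward(91, 11, 90, 12), 4))  # scenario B: 0.1326
```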

Arbitraging The Forward Rate (Sort Of)

It’s common to have a dashboard that shows term structures. But the slopes between months can be optically underwhelming with such a view. Seeing that the implied forward rate is 13.26% feels more profound than seeing a 25 bps difference between month 11 and month 12.

You may be thinking, “this forward rate is a cute spreadsheet trick, but it’s not a rate that exists in the market.”

Let’s take a walk through a trade and see if we can find this rate in the wild.

The first step is just to ground ourselves in a basic example before we understand what it means to capture some insane forward rate.

Consider a flat term structure:

[Note: the forward rate should be 10.54% but because I’m computing YTM on a bond price that only goes to 2 decimal places we are getting an artifact. It’s immaterial for these demonstrations]

Now let’s look back at the steep term structure from scenario A:

With an 11-month rate of 9.10% and a 12-month rate of 10.54% we want to borrow at the shorter-term rate and lend at the longer-term rate. That means selling the nearer bond and buying the longer bond.

When you study asset pricing, one of the early lessons is to step through the cash flows. This is the basis of arbitrage pricing theory (APT), a way of thinking about asset values according to their arbitrage or boundary conditions. As opposed to other pricing models, for example CAPM, someone using APT says the price of an asset is X because if it weren’t there would be free money in the world. By walking through the cash flows, they would then show you the free money2. The fair APT price is the one for which there is no free money.

Stepping Thru The Cash Flows

Let’s see how this works:

  1. We short the 11-month bond at $92
  2. We buy 1.022 12-month bonds for $90 each. We can buy 1.022 of the cheaper bonds with the proceeds of selling the more expensive $92 bond. The net cash outlay is $0.
  3. Spend the next 11 months surfing.

At the 11-month maturity

We will need $100 to pay the bondholder of the 11-month bond so we sell 12-month bonds.

But for what price?

Well, let’s say the prevailing 1-month interest rate matches the 10.49% we saw in the flat term structure world, ie the rate implied by the 11-to-12 month forward when we initiated the trade.

In that case, the bonds we own are worth $99.13.

[With one month to maturity we compute the continuous YTM: ln(100/99.13) * 12 = 10.49%]

If we sell 1.009 of our bonds at $99.13 we can raise the $100 to pay back the loan. We are left with .0134 bonds.
At the 12-month maturity

Our stub of .0134 bonds mature and we are left with $1.34.

So what was our net return?

Hmm, lemme think, carry the one, uh — infinite!

We did a zero cash flow trade at the beginning. We didn’t lay out any money and ended with $1.34.

That’s what happens when you effectively short a 26.37% forward rate and the one-month rate rolls down to something normal, in this case about 10.50%.

[In real life there are all kinds of frictions — you know, like posting collateral when you short bonds.]
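The steps above, as a frictionless sketch (rates and prices taken from the example):

```python
import math

# Scenario A cash flows, step by step.
qty_long = 92.0 / 90.0                  # buy ~1.022 12-month bonds with the
                                        # proceeds of shorting the 11-month bond

# Month 11: assume the 1-month rate has rolled down to ~10.49%
one_month_rate = 0.1049
price_1m_left = 100 * math.exp(-one_month_rate / 12)   # ~99.13

qty_sold = 100 / price_1m_left          # sell just enough to repay the $100 owed
stub = qty_long - qty_sold              # ~0.0134 bonds left over

# Month 12: the stub matures to cash -- profit on zero initial outlay
print(round(stub * 100, 2))  # 1.34
```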

Summary table:

What if somehow, that crazy 26.37% “11-12 month forward rate” didn’t roll down to a reasonable spot rate but actually turned out to be a perfect prediction of what the 1-month rate would be in 11 months?

Let’s skip straight to the summary table.

Note the big difference in this scenario: the bond with 1 month remaining until maturity is only worth $97.83 (corresponding to that 26.33% yield, ignore small rounding). So you need to sell all 1.022 of the bonds to raise $100 to pay back the loan.

Besides frictions, you can see why this is definitely not an arbitrage — if the 1-month rate spiked even higher than 26.33% the price of the bonds would be lower than $97.83. You would have sold all 1.022 of your bonds and still not been able to repay the $100 you owe!

So the “borrow short, lend long” trade is effectively a way to short a 1-month forward at 26.33%. It might be a good trade but it’s not free money.

Still, this exercise shows how our measure of the forward is a tradeable level!

[If you went through the much more arduous task of adjusting for all the real-world frictions and costs you would impute a forward rate that better matched what you considered to be a “tradeable price”. The principle is the same, the details will vary. I was not a fixed-income trader and own all the errors readers discover.]

The Implied Forward Implied Volatility

Now you’re warmed up.

Like interest rates, implied volatilities have a term structure. Every pair of expiries has an implied forward volatility. The principle is the same. The math is almost the same.

With interest rates we were able to do the weighted average calculation by multiplying the rates by the number of days or fraction of the year. That’s because there is a linear relationship between time and rates. If you have an un-annualized 6-month rate, you simply double it to find the annualized rate. You can’t do that with volatility.3

The solution is simple. Just square all the implied volatility inputs so they are variances. Variance is proportional to time so you can safely multiply variance by the number of days. Take the square root of your forward variance to turn it back into a forward volatility.

Consider the following hypothetical at-the-money volatilities for BTC:

                          Expiry 1    Expiry 2
Implied Vol               40%         42%
Variance (Vol²)           .16         .1764
Time to Expiry (days)     20          30

Let’s compute the 20-to-30 day implied forward volatility. We follow the same pattern as the weighted test averages and weighted interest rate examples.

The decomposition where DTE = “days to expiry”:

“variance for 30 days” = “variance for 20 days” + “variance from day 20 to 30”

Expiry2 variance * DTE(Expiry2) = Expiry1 variance * DTE(Expiry1) + Forward variance(20-30) * Days(20-30)

Re-arrange for forward variance:

Fwd Variance(20-30) = (Expiry2 variance * DTE(Expiry2) – Expiry1 variance * DTE(Expiry1)) / Days(20-30)

Fwd Variance(20-30) = (.1764 * 30 – .16 * 20) / 10

Fwd Variance(20-30) = .2092

Turning variance back into volatility:

√.2092 = 45.7%

If the 20-day option implies 40% vol and the 30-day option implies 42% vol, then it makes sense that the vol between 20 and 30 days must be higher than 42%. The 30-day volatility includes 42% vol for 20 days, so the time contained in the 30-day option that DOES NOT overlap with the 20-day option must be high enough to pull the entire 30-day vol up.

This works in reverse as well. If the 30-day implied volatility were lower than the 20-day vol, then the 20-30 day forward vol would need to be lower than the 30-day volatility.
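The whole calculation fits in a small helper (`forward_vol` is my own name):

```python
import math

def forward_vol(vol1, dte1, vol2, dte2):
    # Total variance is additive across time:
    # vol2^2 * dte2 = vol1^2 * dte1 + fwd_var * (dte2 - dte1)
    fwd_var = (vol2**2 * dte2 - vol1**2 * dte1) / (dte2 - dte1)
    return math.sqrt(fwd_var)

# BTC example: 40% vol for 20 days, 42% vol for 30 days
print(round(forward_vol(0.40, 20, 0.42, 30), 3))  # 0.457
```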

The Arbitrage Lower Bound of a Calendar Spread

The fact that the second expiry includes the first expiry creates an arbitrage condition (at least in equities). An American-style time spread cannot be worth less than 0. In other words, a 50 strike call with 30 days to expiry cannot be worth less than a 50 strike call with 20 days to expiry.

Here’s a little experiment (use ATM options, it will not work if the options are far OTM and therefore have no vega):

Pull up an options calculator where you make a time spread worth 0.

I punched in a 9-day ATM call at 39.6% vol and a 16-day ATM call at 29.70001% vol. These options are worth the same (for the $50 strike ATM they are both worth $1.24).

Now compute the implied forward vol.

                          Expiry 1    Expiry 2
Implied Vol               39.6%       29.70001%
Variance (Vol²)           .157        .088
Time to Expiry (days)     9           16

You can predict what happens when we weight the variance by days:

Expiry 1 total variance = .157 * 9 = 1.411

Expiry 2 total variance = .088 * 16 = 1.411

Expiry 2 has the same total variance as Expiry 1, which means there is zero implied variance between day 9 and day 16.

The square root of zero is zero. That’s an implied forward volatility of zero!
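Checking the arithmetic (I use 29.7% rather than the calculator's 29.70001%):

```python
# The zero-value time spread: both expiries carry the same total variance.
var1 = 0.396**2 * 9    # 9-day option at 39.6% vol
var2 = 0.297**2 * 16   # 16-day option at ~29.7% vol

print(round(var1, 3), round(var2, 3))    # 1.411 1.411

fwd_var = (var2 - var1) / (16 - 9)       # implied variance between day 9 and 16
print(round(abs(fwd_var), 6))            # 0.0 -> forward vol of zero
```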

A possible interpretation of zero implied forward vol:

The market expects a cash takeover of this stock to close no later than day 9 with 100% probability.

A Simple Tool To Build

With a list of expirations and corresponding ATM volatility, you can construct your own forward implied volatility matrix:


Like the interest rate forward example, there’s no arbitrage in trying to isolate the forward volatility unless you can buy a time spread for zero.4

For most of the past decade, implied volatility term structures have been ascending (or “contango” for readers who once donned a NYMEX or CBOT badge). If you sell a fat-looking time spread you have a couple major “gotchas” to contend with:

  1. Weighting the trade
    If you are short a 1-to-1 time spread you are short vega, long gamma, and paying theta. This is not inherently good or bad, but you need a framework for choosing which risks you want and at what price (that statement is basically the bumper-sticker definition of trading, imbued simultaneously with truth and banality). If you want to bet on the time spread narrowing, ie the forward vol declining, then you need to ratio the trades. The end of Moontower On Gamma discusses that. Even then, you still have problems with path-dependence because the gamma profile of the spread will change as soon as the underlying moves. The reason people trade variance swaps is that the gamma profile of the structure is constant over a wide range of strikes, providing even exposure to realized volatility. Sure, you could implement a time spread with variance swaps, but then you run into idiosyncratic issues such as bilateral credit risk and greater slippage.
  2. The bet, like the interest rate bet, comes down to what the longer-dated instrument does outright. You were trying to isolate the forward vol, but as time passes your net vega grows until eventually the front month expires. You are then left with a naked vol position in the longer-dated expiry, and your gamma flips from highly positive to negative (assuming the strikes are still near the money).

Term structure bets are usually not described as bets on forward volatility but rather in the context of harvesting a term premium as time passes and implied vols “roll down the term structure”. This is a totally reasonable way to think of it, but an implied forward vol matrix is another way to measure term premiums.

The Wider Lessons


Forward vols represent another way to study term structures. Since term structures can shift, slope, and twist, you can bet on those specific movements using outright vega, time spreads, and time butterflies respectively. A tool to measure forward vols is like a thermometer in a doctor’s bag. How do we conceptually situate such tools in the greater context of diagnosis and treatment?

Here’s my personal approach. Recognize that there are many ways to skin a cat; this is mine.

  1. I use dashboards with cross-sectional analysis as the top of an “opportunity funnel”. You could use highly liquid instruments to calibrate a fair pricing of parameters (skew, IV risk premium, term premium, wing pricing, etc) in the world at any one point in time. This is not trivial, and it’s why I emphasize that trading is more about measurement than prediction. To compare parameters you need to normalize across asset types.
    To demonstrate just how challenging this is, an interview question I might ask is:

    Price a 12-month option on an ETF that holds a rolling front-month contract on the price of WTI crude oil5

    I wouldn’t need the answer to be bullseye accurate. I’m looking for the person’s understanding of arbitrage-pricing theory which is fundamental to being able to normalize comparisons between financial instruments. The answer to the question requires a practical understanding of replicating portfolios, walking through the time steps of a trade, and computing implied forward vols on assets with multiple underlyers. (Beyond pricing, actually trading such a derivative requires understanding the differences in flows between SEC and CFTC-governed markets and who the bridges between them are.)

  2. The contracts or asset classes that “stick out” become a list of candidates for research. There are 2 broad steps for this research.
    • Do these “mispriced” parameters reveal an opportunity or just a shortcoming in your normalization?
      Sleuthing the answer to that may be as simple as reading something publicly available, or it could require talking to brokers or exchanges to see if there’s something you are missing. If you are satisfied, to a degree of certainty commensurate with the edge in the opportunity, that you are not missing anything crucial, then you can move to the next stage of investigation.
    • Understanding the flow
      What flow is causing the mispricing? What’s the motivation for the flow? Is it early enough to bet with it? Is it late enough to bet against it? You don’t want to trade the first piece of a large order, but you will not get to trade the last piece either. (That piece will either be fed to the people who got hurt trading with the flow too early, as a favor from the broker who ran them over — trading is a tit-for-tat iterated game — or internalized by the bank that controls the flow and knows the end is near.)

3. Execute

Suppose you determine that the term structure is too cheap compared to a “fair term structure” triangulated by an ensemble of cross-sectional measurements. Perhaps there is a large oil refiner selling gasoline calls to hedge their inventory (like covered calls in the energy world). You can use the forward vol matrix to drill down to the expiry you want to buy. “Ah, the 9-month contract looks like the best value according to the matrix. Let’s pull up a montage and see if it’s really there. Let’s see what the open interest is…”

As you examine quotes from the screens or brokers, you may discover that the tool is just picking up a stale bid/ask or wide market, and that the cheapest term isn’t really liquid or tradeable. This isn’t a problem with the tool; it’s just a routine data-screening pitfall. The point is that tools of this nature can help you optimize your trade expression in the later stage of the funnel.


This discussion of forward vols was like month 1 learning at SIG. It’s foundational. It’s also table stakes. Every pro understands it. I’m not giving away trade secrets. I am not some EMH maxi6 but I’ll say I’ve been more impressed than not at how often I’ll explore some opportunity and be discouraged to know that the market has already figured it out. The thing that looks mispriced often just has features that are overlooked by my model. This doesn’t become apparent until you dig further, or until you put on a trade only to get bloodied by something you didn’t account for as a particular path unfolds.

This may sound so negative that you may wonder why I even bother writing about this on the internet. Most people are so far out of their depth, is this even useful? My answer is a confident “yes” if you can learn the right lesson from it:

There is no silver bullet. Successful trading is the sum of doing many small things correctly including reasoning. Understanding arbitrage-pricing principles is a prerequisite for establishing what is baked into any price. Only from that vantage point can one then reason about why something might be priced in a way that doesn’t make sense and whether that’s an opportunity or a trap7. By slowly transforming your mind to one that compares any trade idea with its arbitrage-free boundary conditions or replicating portfolio/strategy, you develop an evergreen lens to ever-changing markets.

You may only glean one small insight from these posts. But don’t be discouraged. Understanding is like antivenom. It takes a lot of cost and effort to produce a small amount8. If you enjoy the process despite its difficulty then it’s a craft you can pursue for intellectual rewards and profit.

If profit is your only motivation, at least you know what you’re up against.

Examples Of Comparing Interest Rates With Different Compounding Intervals

Simple Interest

If you pay someone $90 today and they promise to give you $100 in 12 months, you are making a loan. This is the same idea as buying a bond. To back out the simple interest rate (ie assuming no compounding) we solve for r:

90 * (1+r) = 100

r = 100/90 – 1

r = 11.11%


If the loan were only for 6 months, we’d annualize the interest rate by multiplying by 2 (12 months / 6 months) for a rate of 22.22%.

Compound Interest

Let’s return to the 12-month loan and say that the rate is compounded semi-annually. Then the computation is:

90 * (1+r/2)² = 100

r/2 = (100/90)^0.5 – 1

r = 10.82%

If you compound more frequently than annually, it makes sense that the implied interest rate is lower. Consider the path of the principal + accrued interest:

Compounding semi-annually means interest gets credited at the 6-month mark. The rate for the next 6 months is then applied to this higher accrued value, which means the implied rate needed to end up at $100 (the same place the simple interest case ends up) must be lower than the simple interest rate.

Continuous Interest

We can compound interest more frequently: quarterly, monthly, daily. Since the rate we are backing out is applied to a growing pile of principal + accrued interest at each checkpoint (I think of the compounding interval as a checkpoint where the accrued interest is rolled into the remaining loan balance), the implied rate to end up at $100 must be smaller. If we take this logic to the extreme and keep cutting the time interval into smaller increments we eventually hit the limit of Δt → 0. The derivatives world models everything in continuous time, so interest rates get the same treatment.

Mechanically, the math is no harder.

To compute the continuously compounded interest rate we still just solve for r:

90 * e^(r*t) = 100

t is a fraction of a year. So for the 12-month case:

90 * e^(r*1) = 100

e^r = 100/90

r ln(e) = ln(100/90)

r = 10.54%

As expected, this is a lower implied rate than the 11.11% simple rate and the 10.82% semi-annual rate. Again, because we are compounding continuously.

Annualizing remains easy. If $90 grows to $100 in just 6 months, we compute the continuously compounded rate as follows:

90 * e^(r*1/2) = 100

e^(r*1/2) = 100/90

r * 1/2 = ln(100/90)

r = 21.07%

This can be contrasted with the 22.22% 6-month loan using simple interest we computed earlier.
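All the examples above condensed into one sketch:

```python
import math

# $90 grows to $100 over t years; back out the annualized rate under
# each compounding convention.
def simple_rate(t):        # 90 * (1 + r*t) = 100
    return (100 / 90 - 1) / t

def semiannual_rate(t):    # 90 * (1 + r/2)^(2*t) = 100
    return 2 * ((100 / 90) ** (1 / (2 * t)) - 1)

def continuous_rate(t):    # 90 * exp(r*t) = 100
    return math.log(100 / 90) / t

print(round(simple_rate(1.0), 4), round(semiannual_rate(1.0), 4), round(continuous_rate(1.0), 4))
# 0.1111 0.1082 0.1054
print(round(simple_rate(0.5), 4), round(continuous_rate(0.5), 4))
# 0.2222 0.2107
```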

Application To Real Life

Note in all these cases, $90 is growing to $100. We are just seeing that the implied rate depends on the compounding assumption. In real life, when you see “compounded daily” or “compounded monthly” and so on, you are now equipped with the tools to compare rates on an apples-to-apples basis. If a rate is lower but compounds more frequently than another rate the relative value between both loans is ambiguous.

APYs disclosed on financial products make yields comparable. But now you understand how APYs convert different rate schedules into a single measure.

An Example Of Using Probability To Build An Intuition For Correlation

Negative correlation is powerful: you can see it in how rebalancing increases your expected compounded return. This isn’t intuitive to the typical investor, especially a retail one.

I’ve tried to make it easier to understand:

One of my favorite finance educators recently wrote an absolute must-read thread on this topic.

He creates a model with 2 simplifying features:

  • There are only 2 stocks
  • They are rebalanced to equal weight

You can use the intuition from this exercise to guide your portfolio thinking more broadly. It’s beautifully done and you should work through it carefully not just for the intuition but the practical knowledge of how to compute an expected return in a compounding context. However, there is a part I struggled with that I want to zoom in on because I’ve never before seen it presented as @10kdiver does it:

He converts probability to an estimate of correlation!

This is really cool. But because I struggled, and because the lessons of the thread are important, I have a dual purpose in writing this post.

  1. The meta-lesson

    This is the easy one:

    When I read the post, it was easy to nod along thinking “yep, that makes sense…ok, ok, got it”. Except I didn’t actually “got it”. I couldn’t reconstruct the logic on my own on a blank sheet of paper, which means I didn’t learn it. Paradoxically, this demonstrates how good @10kdiver’s explanation was. Extrapolate this paradox to the many things you think you learned by reading and you will have internalized a useful life lesson — get your hands dirty to actually learn.

  2. Diving into the probability math I struggled with.

    Let’s do it…

Zooming In: The Probability Basis For Correlation


Example computation for CAGR (also seen in tweet #4):

CAGR_A = ((1+A_up_size)^(A_prob_up*hold_period)*(1+A_down_size)^(A_prob_down*hold_period))^(1/hold_period)-1
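The spreadsheet formula translated to Python. The +30%/-50% move sizes are my assumption, chosen because they reproduce the ~7.39% single-stock CAGR the thread quotes later:

```python
# CAGR of a stock that goes up `prob_up` of the time, compounded over
# `hold_period` years. Move sizes of +30%/-50% are assumed for illustration.
def cagr(up_size, down_size, prob_up, hold_period):
    prob_down = 1 - prob_up
    growth = (1 + up_size) ** (prob_up * hold_period) * \
             (1 + down_size) ** (prob_down * hold_period)
    return growth ** (1 / hold_period) - 1

print(round(cagr(0.30, -0.50, 0.80, 25), 4))  # 0.0739
```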

Define the probability space

We are focusing on tweets 6-10 in particular. The summary matrix:

Understanding the boxes:

Start with the logic: “what would the probability space look like if they were perfectly correlated?”

  • Top left box = X (This corresponds to both up)

If the stocks were perfectly correlated they would go up together 80% of the time. We generalize the probability of the stocks going up together as X.

  • Top right box = .8 – X (This corresponds to B up, A down)

Stock B goes up 80% of the time in total, and it goes up together with A with probability X, so the probability that B goes up while A goes down is .8 – X.

  • Bottom left box = .8 – X (This corresponds to A up, B down)

By the same logic, stock A goes up 80% of the time, so the probability that A goes up while B goes down is .8 – X.

  • Bottom right box = X – .6 (This corresponds to both down)

With one box left it’s easy: all the boxes must sum to 100% probability.

100% – [X + (80% – X) + (80% – X)] = X – 60%

We called the probability of moving up together X. We set the matrix up using the simple case of the stocks being perfectly correlated (ie moving up together 80% of the time). But they don’t need to be perfectly correlated. So now we can find the range of X, a joint probability, that is internally consistent with each stock’s individual probability of going up.

What is the range of X, ie “how often the stocks move up together”?

Upper bound

X is defined as “how often they move up together”. Another way to think of this: the upper bound of the joint probability is the smaller of the two stocks’ individual “up” probabilities.

Let’s change the numbers and pretend stock A goes up 50% of the time and stock B goes up 80% of the time. Then 50% is the upper bound of how often they can both go up together. (Stock A is the limiting reagent here; it can’t be up more than 50% of the time.) So the minimum of their “up” probabilities is an upper bound on X.

Back to the original example: the upper bound of how often these stocks move up together is 80%, because the minimum of either stock’s individual probability of going up is 80%. Mathematically, no box can be negative, so:

.8 – X ≥ 0

Upper bound of X = 80%

Lower bound

Proceeding with the logic that no box can be negative, the bottom right box (X – .6) cannot be negative, so X cannot be less than 60%. This represents the least co-movement possible given the stocks’ probabilities.

Lower bound of X = 60%

Think of it this way: if there were 10 trials, each stock could have 2 down years. If they were maximally correlated the stocks would share the same 2 down years. If they were minimally correlated they would never go down at the same time. The probability of both stocks going down simultaneously would be zero, but since the 4 down years would be spread out over 10 years, the pair of stocks would only go up simultaneously 60% of the time.


The probability of the stocks moving up together, X, is bounded:

60% ≤ X ≤ 80%

X is not a correlation. X is a probability. The fact that the stocks can co-move from 60-80% of the time maps to a correlation.

A Key Insight

A zero correlation means the 2 variables are independent (true here because each outcome is binary)! If they are independent, the joint probability is the simple product of their individual probabilities.

That’s why the 0 correlation point corresponds to 64%:

X = .8 x .8 = 64%

Loosely Mapping Probability to Correlation

If you’re feeling spry, you can use the probability space and covariance math to compute the actual correlation. But we can estimate the rough shape of the mapping using zero correlation (statistical independence, corresponding to X = 64%, the joint probability of both stocks going up together) as the fulcrum.
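Here is that covariance math as a small sketch. Since each stock's yearly return takes only two values, the correlation of the up/down indicators is the same as the correlation of the returns themselves:

```python
# Map the joint probability X = P(both up) to a correlation, with each
# stock up 80% of the time.
p = 0.8

def indicator_corr(x):
    # cov = P(both up) - p^2; each up/down indicator has variance p*(1-p)
    return (x - p * p) / (p * (1 - p))

# X = .64 (independence) maps to ~0 correlation; the extremes:
print(round(indicator_corr(0.60), 2))  # -0.25  (least co-movement)
print(round(indicator_corr(0.80), 2))  # 1.0    (most co-movement)
```

Note the asymmetry: the maximum correlation (+1.0) is much larger in magnitude than the minimum (-0.25), because the independence point sits near the lower end of the 60-80% range.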

Look back at tweet #10 to see the extremes:

At the lowest correlation, corresponding to a co-movement of 60% frequency:

  • The correlation is slightly negative. It’s below the 64% independence point.
  • The stocks NEVER go down together.
  • The stocks move in opposite directions 40% of the time
  • When the stocks do move together, it’s up.
  • The stocks have a negative correlation despite being up together 60% of the time.

At the highest correlation point, corresponding to 80% frequency of co-movement:

  • The stocks go up 80% of the time together
  • They go down 20% of the time together
  • They never move in opposite directions.
  • The magnitude of the max positive correlation is greater than the magnitude of the maximum negative correlation since the independence point is near the lower end of the range.

Rebalancing Benefits Improve As Correlations Fall

The thread heats up again in tweet #17 by identifying the possible values of the portfolio rebalanced to 50/50 at the end of a year.

In tweet #18, those states are weighted by the probabilities to generate expected values of the portfolio, which can finally be used to compute the CAGR of the portfolio if rebalanced annually.

The lower the value of X (the joint probability of the stocks moving up together), the lower the correlation.

The lower the correlation, the higher the expected value of a rebalanced portfolio.

The remainder of the thread speaks for itself:

  • When X = 60% (ie, strongly negative correlation), we have:
    • Without re-balancing: $1 –> $5.94
    • With re-balancing: $1 –> $17.85 (>3x as much!), over the same 25 years.
    • Thus, negative correlations + re-balancing can be a powerful combination.

  • If we do this well, our portfolio can end up getting us a HIGHER return than any single stock in it! We just saw an example with 2 stocks. Each got us only ~7.39%. But a 50/50 re-balanced portfolio of them got us ~12.22%. When I first saw this, I couldn’t believe it!

    [Moontower note: in practice, portfolios usually have many names and a variety of weighting schemes. While the intuition is similar the math is more complex and you are now looking at a matrix of pairwise correlations, assets with varying volatilities and therefore different weights in the portfolio]

  • This is the ESSENCE of diversification. We minimize correlations, so our portfolio nearly always has both risen and fallen stocks. We “cash in” on this gap via re-balancing — ie, we periodically sell over-valued stocks and put the money into under-valued ones.

  • Negative correlations aren’t strictly necessary. We could use stocks with zero — or even positive — correlation. But the MORE heavily correlated our stocks, the LESS “bang for the buck” we get from re-balancing.

Wrapping Up

The idea that rebalancing benefits improve as correlations fall is common knowledge in professional circles. Still, the intuition is elusive, and the sheer size of the effect on total CAGR is shocking.

Until @10kdiver’s thread, I hadn’t seen a mapping from probability, which is intuitive, to correlation, which is fuzzy (recall that when the 2 stocks had a negative correlation they still went up together 60% of the time!).

When I read the thread, I found myself nodding along but I needed to walk through it to fully appreciate the math. That’s a useful lesson on its own.

If you found this post helpful, I use another of @10kdiver’s threads to show how we can solve a compounding probability problem using option theory:

Solving A Compounding Riddle With Black-Scholes (13 min read)

Bet Sizing Is Not Intuitive

Humans are not good bettors.

It takes effort both in study and practice to become more proficient. But like anything hard, most people won’t persevere. Devoting some cycles to improve will arm you with a rare arrow in your quiver as you go through life.

Skilled betting demands 2 pivotal actions:

  1. Identifying attractive propositions

    This can be coded as “positive expected value” or “good risk/reward”. There is no strategy that turns a bad proposition into an attractive one on its own merit (as opposed to something like buying insurance which is a bad deal in isolation but can make sense holistically). For example, there is no roulette betting strategy that magically turns its negative EV trials into a positive EV session.

  2. Effective bet sizing

    Once you are faced with an attractive proposition, how much do you bet? While this is also a big topic we can make a simple assertion — bad bet sizing is enough to ruin a great proposition. This is a deeper point than it appears. By sizing a bet poorly, you can fumble away a certain win. You cannot afford to get bet sizing dramatically wrong.

Of these 2 points, the second one is less appreciated. Bet sizing is not very intuitive.

To show that, we will examine a surprising study.

The Haghani-Dewey Biased Coin Study

In October 2016, Richard Dewey and Victor Haghani (of LTCM infamy) published a study titled:

Observed Betting Patterns on a Biased Coin (Editorial from the Journal of Portfolio Management)

The study is a dazzling illustration of how poor our intuition is for proper bet sizing. The link goes into depth about the study. I will provide a condensed version by weaving my own thoughts with excerpts from the editorial.

The setup

  • 61 individuals start with $25 each. They can play a computer game where they can bet any proportion of their bankroll on a coin. They can choose heads or tails. They are told the coin has a 60% chance of landing heads. The bet pays even money (i.e. if you bet $1, you either win or lose $1). They get 30 minutes to play.
  • The sample was largely composed of college-age students in economics and finance and young professionals at financial firms. It included 14 analyst and associate-level employees of two leading asset management firms.

Your opportunity to play

Before continuing with a description of what an optimal strategy might look like, we ask you to take a few moments to consider what you would do if given the opportunity to play this game. Once you read on, you’ll be afflicted with the curse of knowledge, making it difficult for you to appreciate the perspective of our subjects encountering this game for the first time.

If you want to be more hands-on, play the game here.

Devising A Strategy

  1. The first thing to notice is betting on heads is positive expected value (EV). If x is your wager:

    EV = 60% (x) – 40% (x) = 20% (x)

    You expect to earn 20% per coin flip. 

  2. The next observation is the betting strategy that maximizes your total expected value is to bet 100% of your bankroll on every flip. 

  3. But then you should notice that this also maximizes your chance of going broke. On any single flip, you have a 40% chance of losing your stake and being unable to continue this favorable game. 

  4. What if you bet 50% of your bankroll on every flip?

    On average you will lose 97% of your wealth (as opposed to nearly 100% if you had bet your full bankroll). 97% sounds like a lot! How does that work?

    If you bet 50% of your bankroll on 100 flips you expect 60 heads and 40 tails. 

    If you make 50% on 60 flips, and lose 50% on 40 flips your expected p/l:

1.5^60 x 0.5^40 = .033

You will be left with 3% of your starting cash! This is because heads followed by tails, or vice versa, results in a 25% loss of your bankroll (1.5 * 0.5 = 0.75).

This is a significant insight on its own. Cutting your bet size dramatically from 100% per toss to 50% per toss left you in a similar position — losing all or nearly all your money.
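The arithmetic above generalizes to any constant bet fraction. A minimal Python sketch (the 60/40 flip split and the bet sizes are from the example above):

```python
def wealth_multiple(bet_fraction, wins, losses):
    """Ending wealth as a multiple of starting bankroll, betting a
    constant fraction of bankroll on even-money flips."""
    return (1 + bet_fraction) ** wins * (1 - bet_fraction) ** losses

# Betting 50% per flip over the expected 60 heads / 40 tails:
print(round(wealth_multiple(0.50, 60, 40), 3))  # 0.033 -> left with ~3% of your cash

# Betting the full bankroll: a single tails wipes you out
print(wealth_multiple(1.00, 60, 40))  # 0.0
```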

Optimal Strategy

There’s no need for build-up. There’s a decent chance any reader of this blog has heard of the Kelly Criterion which uses the probabilities and payoffs of various outcomes to compute an “optimal” bet size. In this case, the computation is straightforward — the optimal bet size as a fraction of the bankroll is 20%, matching the edge you get on the bet.

Since the payoff is even money the Kelly formula reduces to 2p -1 where p = probability of winning.

2 x 60% – 1 = 20%

The clever formula developed by Bell Labs researcher John Kelly:

provides an optimal betting strategy for maximizing the rate of growth of wealth in games with favorable odds, a tool that would appear a good fit for this problem. Dr. Kelly’s paper built upon work first done by Daniel Bernoulli, who resolved the St. Petersburg Paradox— a lottery with an infinite expected payout—by introducing a utility function that the lottery player seeks to maximize. Bernoulli’s work catalyzed the development of utility theory and laid the groundwork for many aspects of modern finance and behavioral economics. 

The emphasis refers to the assumption that a gambler has a log utility of wealth function. In English, this means the more money you have the less a marginal dollar is worth to you. Mathematically it also means that the magnitude of pain from losing $1 is greater than magnitude of joy from gaining $1. This matches empirical findings for most people. They are “loss-averse”.
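Since the study’s game is even money with p = 60%, the Kelly math above fits in a few lines. A sketch (the growth-rate function is the standard expected-log-wealth criterion):

```python
from math import log

def kelly_even_money(p):
    """Kelly fraction for an even-money bet with win probability p."""
    return 2 * p - 1

def growth_rate(f, p):
    """Expected log-growth per flip when betting fraction f of bankroll."""
    return p * log(1 + f) + (1 - p) * log(1 - f)

p = 0.60
print(round(kelly_even_money(p), 2))  # 0.2 -> bet 20% of bankroll

# Kelly maximizes log growth; overbetting destroys it:
for f in (0.10, 0.20, 0.50):
    print(f, round(growth_rate(f, p), 4))
```

Note the 50% bettor’s growth rate comes out negative, consistent with the earlier example where they ended with ~3% of their cash.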

How did the subjects fare in this game?

The paper is blunt:

Our subjects did not do very well. Suboptimal betting came in all shapes and sizes: overbetting, underbetting, erratic betting, and betting on tails were just some of the ways a majority of players squandered their chance to take home $250 for 30 minutes play.

Let’s take a look, shall we?

Bad results and strange behavior

Only 21% of participants reached the maximum payout of $250, well below the 95% that should have reached it given a simple constant-percentage betting strategy of anywhere from 10% to 20%.

  • 1/3 of the participants finished with less money than the $25 they started with. (28% went bust entirely!)
  • 67% of the participants bet on tails at some point. The authors forgive this somewhat conceding that players might be curious if the tails really are worse, but 48% bet on tails more than 5 times! Many of these bets on tails occurred after streaks of heads suggesting a vulnerability to gambler’s fallacy.
  • Betting patterns and debriefings also found prominent use of martingale strategies (doubling down after a loss).
  • 30% of participants bet their entire bankroll on one flip, raising their risk of ruin from nearly 0% to 40% in a lucrative game!

Just how lucrative is this game?

Having a trading background, I have an intuitive understanding that this is a very profitable game. If you sling option contracts that can have a $2 range over the course of their life and collect a measly penny of edge, you have razor-thin margins. The business requires trading hundreds of thousands of contracts a week to let the law of averages assure you of profits.

A game with a 20% edge is an astounding proposition.

Not only did most of our subjects play poorly, they also failed to appreciate the value of the opportunity to play the game. If we had offered the game with no cap [and] assume that a player with agile fingers can put down a bet every 6 seconds, 300 bets would be allowed in the 30 minutes of play. The expected gain of each flip, betting the Kelly fraction, is 4% [Kris clarification: 20% of bankroll times 20% edge].

The expected value of 300 flips is $25 * (1 + 0.04)^300 = $3,220,637!

In fact, they ran simulations for constant bet fractions of 10%, 15%, and 20% (half Kelly, 3/4 Kelly, full Kelly) and found a 95% probability that the subjects would reach the $250 cap!

Instead, just over 20% of the subjects reached the max payout.
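The $3.2mm figure and the cap probability are easy to check. A quick sketch (the flip count, Kelly fraction, and $250 cap are from the excerpt; the simulation is my own rough check):

```python
import random

# EV of 300 uncapped flips at the Kelly fraction: 20% of bankroll x 20% edge = 4%/flip
print(round(25 * 1.04 ** 300))  # ~3.2 million

def hits_cap(n_flips=300, f=0.20, p=0.60, start=25.0, cap=250.0):
    """Simulate one session of constant-fraction betting; True if the cap is hit."""
    bankroll = start
    for _ in range(n_flips):
        bankroll *= (1 + f) if random.random() < p else (1 - f)
        if bankroll >= cap:
            return True
    return False

random.seed(1)
trials = 2_000
p_hat = sum(hits_cap() for _ in range(trials)) / trials
print(round(p_hat, 2))  # high -- roughly 95% per the paper's simulations
```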

Editorialized Observations

  • Considering how lucrative this game was, the performance of the participants is damning. That nearly one-third risked the entire bankroll is anathema to traders who understand that the #1 rule of trading (assuming you have a positive expectancy business) is survival.

  • Only 5 out of the 61 finance-educated participants were familiar with Kelly betting, and 2 of those 5 didn’t consider using it, even though a game like this is exactly the context it’s tailor-made for!
  • The authors note that the syllabi of MIT, Columbia, Chicago, Stanford, and Wharton MBA programs do not make any reference to betting or Kelly topics in their intro finance, trading, or asset-pricing courses. 

  • Post-experiment interviews revealed that betting “a constant proportion of wealth” seemed to be a surprisingly unintuitive strategy to participants. 

Given that many of our subjects received formal training in finance, we were surprised that the Kelly criterion was virtually unknown among our subjects, nor were they able to bring other tools (e.g., utility theory) to the problem that would also have led them to a heuristic of constant-proportion betting. 

These results raise important questions. If a high fraction of quantitatively sophisticated, financially trained individuals have so much difficulty in playing a simple game with a biased coin, what should we expect when it comes to the more complex and long-term task of investing one’s savings? Given the propensity of our subjects to bet on tails (with 48% betting on tails on more than five flips), is it any surprise that people will pay for patently useless advice? What do the results suggest about the prospects for reducing wealth inequality or ensuring the stability of our financial system? Our research suggests that there is a significant gap in the education of young finance and economics students when it comes to the practical application of the
concepts of utility and risk-taking.

Our research will be worth many multiples of the $5,574 winnings we paid out to our 61 subjects if it helps encourage educators to fill this void, either through direct instruction or through trial-and-error exercises like our game. As Ed Thorp remarked to us upon reviewing this experiment, “It ought to become part of the basic education of anyone interested in finance or gambling.”

I will add my own concern. It’s not just individual investors we should worry about. Their agents in the form of financial advisors or fund managers, even if they can identify attractive propositions, may undo their efforts by sizing opportunities poorly, either by:

  1.  falling far short of maximizing

    Since great opportunities are rare, failing to optimize can be more harmful than our intuition suggests…making $50k in a game you should make $3mm is one of the worst financial errors one could make.

  2. overbetting an edge

    There isn’t a price I’d play $100mm Russian Roulette for.

Getting these things correct requires proper training. In Can Your Manager Solve Betting Games With Known Solutions?, I wonder if the average professional manager can solve problems with straightforward solutions. Nevermind the complexity of assessing risk/reward and proper sizing in investing, a domain that epitomizes chaotic, adversarial dynamics.

Nassim Taleb was at least partly referring to the importance of investment sizing when he remarked, “If you gave an investor the next day’s news 24 hours in advance, he would go bust in less than a year.”

Furthermore, effective sizing is not just about analytics but discipline. It takes a team culture of truth-seeking and emotional checks to override the biases that we know about. Just knowing about them isn’t enough. The discouraged authors found:

…that without a Kelly-like framework to rely upon, our subjects exhibited a menu of widely documented behavioral biases such as illusion of control, anchoring, overbetting, sunk-cost bias, and gambler’s fallacy.


Take bet sizing seriously. A bad sizing strategy squanders opportunity. With a little effort, you can get better at maximizing the opportunities you find, rather than needing to keep finding new ones that you risk fumbling.

You need to identify good props and size them well. Both abilities are imperative. It seems most people don’t realize just how critical sizing is.

Now you do.

Another Kind Of Mean

Let’s use this section to learn a math concept.

We begin with a question:

You drive to the store and back. The store is 50 miles away. You drive 50 mph to the store and 100 mph coming back. What’s your average speed in MPH for the trip?

[Space to think about the problem]




[If you think the answer is 75 there are 2 problems worth pointing out. One of them is you have the wrong answer.]




[The other is that 75 is the obvious gut response, but since I’m asking this question, you should know that’s not the answer. If it’s not the answer that should clue you in to think harder about the question.]




[You’re trying harder, right?]




[Ok, let’s get on with this]

The answer is 66.67 MPH

If you drive 50 MPH to a store 50 miles away, then it took 60 minutes to go one way.

If you drive 100 MPH on the way back you will return home in half the time or 30 minutes.

You drove 100 miles in 1.5 hours or 66.67 MPH

Congratulations, you are on the way to learning about another type of average or mean.

You likely already know about the first 2 of the 3 so-called Pythagorean means.

  • Arithmetic mean

    Simple average. Used when trying to find a measure of central tendency in a set of values that are added together.

  • Geometric mean

    The geometric mean or geometric average is a measure of central tendency for a set of values that are multiplied together. One of the most common examples is compounding. Returns and growth rates are just fractions multiplied together. So if you have 10% growth then 25% growth you compute:

    1 x 1.10 x 1.25 = 1.375

    If you computed the arithmetic mean of the growth rates you’d get 17.5% (the average of 10% and 25%).

    The geometric mean however answers the question “what is the average growth rate I would need to multiply each period by to arrive at the final return of 1.375?”

    In this case, there are 2 periods.

    To solve we do the inverse of the multiplication by taking the root of the number of periods, or 1.375^(1/2) – 1 = 17.26%

    We can check that 17.26% is in fact the CAGR or compound average growth rate:

    1 x 1.1726 x 1.1726 = 1.375

    Have a cigar.
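The geometric-mean steps above can be expressed in a short Python sketch (growth rates from the example):

```python
def geometric_mean_growth(rates):
    """Average compound growth rate implied by a sequence of period returns."""
    total = 1.0
    for r in rates:
        total *= 1 + r                      # compound the period returns
    return total ** (1 / len(rates)) - 1    # n-th root, back to a rate

g = geometric_mean_growth([0.10, 0.25])
print(round(g, 4))             # 0.1726 -> the 17.26% CAGR
print(round((1 + g) ** 2, 3))  # 1.375 -> recovers the total return
```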

The question about speed at the beginning of this section actually calls for using a 3rd type of mean:

The harmonic mean

The harmonic mean is computed by taking the average of the reciprocals of the values, then taking the reciprocal of that number to return to the original units.

That’s wordy. Better to demonstrate the 2 steps:

  1. “Take the average of the reciprocals”

    Instead of averaging MPH, let’s average hours per mile then convert back to MPH at the end:

    50 MPH = “it takes 1/50 of an hour to go a mile” = 1/50 HPM
    100 MPH = “it takes 1/100 of an hour to go a mile” = 1/100 HPM

    The average of 1/50 HPM and 1/100 HPM = 1.5/100 HPM

  2. “Take the reciprocal of that number to return to the original units”

    Flip 1.5/100 HPM to 100/1.5 MPH. Voila, 66.67 MPH
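The two steps above, as a minimal Python sketch (speeds from the trip example):

```python
def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Equal distances driven at 50 mph and 100 mph:
print(round(harmonic_mean([50, 100]), 2))  # 66.67

# Sanity check against the trip itself: 100 miles in 1.5 hours
print(round(100 / 1.5, 2))  # 66.67
```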

Ok, right now you are thinking “Wtf, why is there a mean that deals with reciprocals in the first place?”

If you think about it, all means are computed with numbers that are fractions. You just assume the denominator of the numbers you are averaging is 1. That is fine when each number’s contribution to the final weight is equal, but that’s not the case with an MPH problem. You are spending 2x as much time at the lower speed as at the higher speed! This pulls the average speed over the whole trip towards the lower speed. So you get a true average speed of 66.67, not the 75 that your gut gave you.

I want to pause here because you are probably a bit annoyed about this discovery. Don’t be. You have already won half the battle by realizing there is this other type of mean with the weird name “harmonic”.

The other half of the battle is knowing when to apply it. This is trickier. It relies on whether you care about the numerator or denominator of any number. And since every number has a numerator and denominator, it feels like you might always want to ask if you should be using the harmonic mean.

I’ll give you a hint that will cover most practical cases. If you are presented with a whole number that is a multiple, but the thing you actually care about is a yield or rate then you should use the harmonic mean. That means you convert to the yield or rate first, find the arithmetic average which is muscle memory for you already, and then convert back to the original units.


  • When you compute the average speed for an entire trip you actually want to average hours per mile (a rate) rather than miles per hour (a multiple), then convert back to mph at the end. Again, this is because your periods of time at each speed are not equal.
  • You can’t average P/E ratios when trying to get the average P/E for an entire portfolio. Why? Because the contribution of high P/E stocks to the average of the entire portfolio P/E is lower than for lower P/E stocks. If you average P/Es, you will systematically overestimate the portfolio’s total P/E! You need to do the math in earnings yield space (ie E/P). @econompic wrote a great post about this and it’s why I went down the harmonic mean rabbit hole in the first place:

    The Case for the Harmonic Mean P/E Calculation (3 min read)

  • Consider this example of when MPG is misleading and you actually want to think of GPM. From Percents Are Tricky:

    Which saves more fuel?

    1. Swapping a 25 mpg car for one that gets 60 mpg
    2. Swapping a 10 mpg car for one that gets 20 mpg

    [Jeopardy music…]

    You know it’s a trap, so the answer must be #2. Here’s why:

    If you travel 1,000 miles:

    1. A 25mpg car uses 40 gallons. The 60 mpg vehicle uses 16.7 gallons.
    2. A 10 mpg car uses 100 gallons. The 20 mpg vehicle uses 50 gallons.

    Even though you improved the MPG efficiency of car #1 by more than 100%, we save much more fuel by replacing less efficient cars. Go for the low-hanging fruit. The illusion suggests we should switch ratings from MPG to GPM or, to avoid decimals, Gallons Per 1,000 Miles.

  • The Tom Brady “deflategate” controversy also created statistical illusions based on which rate was used. You want to spot anomalies by looking at fumbles per play, not plays per fumble.

    Why Those Statistics About The Patriots’ Fumbles Are Mostly Junk (14 min read)
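Circling back to the portfolio P/E bullet: a sketch with two hypothetical stocks held in equal dollar weights (the P/E numbers are made up for illustration):

```python
def portfolio_pe(pes, weights):
    """Portfolio P/E via earnings yields: average the E/Ps, then flip back."""
    earnings_yield = sum(w / pe for w, pe in zip(weights, pes))
    return 1 / earnings_yield

pes = [10, 40]         # hypothetical P/E ratios
weights = [0.5, 0.5]   # equal dollar weights

print(round(portfolio_pe(pes, weights), 1))      # 16.0 (harmonic -- the true portfolio P/E)
print(sum(w * p for w, p in zip(weights, pes)))  # 25.0 (arithmetic -- overstates it)
```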

The most important takeaway is that whenever you are trying to average a rate, yield, or multiple, consider:

a) taking the average of the numbers you are presented with, and

b) doing the same computation with their reciprocals, then flipping it back to the original units. That’s all it takes to compute both the arithmetic mean and the harmonic mean.

If you draw the same conclusions about the variable you care about, you’re in the clear.

Just knowing about harmonic means will put you on guard against making poor inferences from data.

For a more comprehensive but still accessible discussion of harmonic means see:

On Average, You’re Using the Wrong Average: Geometric & Harmonic Means in Data Analysis: When the Mean Doesn’t Mean What You Think it Means (20 min read)
by @dnlmc

This post is so good, that I’m not sure if I should have just linked to it and not bothered writing my own. You tell me if I was additive.

Greeks Are Everywhere

The option greeks everyone starts with are delta and gamma. Delta is the sensitivity of the option price with respect to changes in the underlying. Gamma is the change in that delta with respect to changes in the underlying.

If you have a call option that is 25% out-of-the-money (OTM) and the stock doubles in value, you would observe the option graduating from a low delta (when the option is 25% OTM a 1% change in the stock isn’t going to affect the option much) to having a delta near 100%. Then it moves dollar for dollar with the stock.

If the option’s delta changed from approximately 0 to 100% then gamma is self-evident. The option delta (not just the option price) changed as the stock rallied. Sometimes we can even compute a delta without the help of an option model by reasoning about it from the definition of “delta”. Consider this example from Lessons From The .50 Delta Option where we establish that delta is best thought of as a hedge ratio:

Stock is trading for $1. It’s a biotech and tomorrow there is a ruling:

  • 90% of the time the stock goes to zero
  • 10% of the time the stock goes to $10

First take note, the stock is correctly priced at $1 based on expected value (.90 x $0 + .10 x $10). So here are my questions.

What is the $5 call worth?

  • Back to expected value:
    90% of the time the call expires worthless.
    10% of the time the call is worth $5.

.9 x $0 + .10 x $5 = $.50

The call is worth $.50

Now, what is the delta of the $5 call?

$5 strike call = $.50

Delta = (change in option price) / (change in stock price)

  • In the down case, the call goes from $.50 to zero as the stock goes from $1 to zero.
    Delta = $.50 / $1.00 = .50
  • In the up case, the call goes from $.50 to $5 while the stock goes from $1 to $10.
    Delta = $4.50 / $9.00 = .50

The call has a .50 delta

Using The Delta As a Hedge Ratio

Let’s suppose you sell the $5 call to a punter for $.50 and to hedge you buy 50 shares of stock. Each option contract corresponds to a 100 share deliverable.

  • Down scenario P/L:
    Short Call P/L = $.50 x 100 = $50
    Long Stock P/L = -$1.00 x 50 = -$50

    Total P/L = $0

  • Up scenario P/L:
    Short Call P/L = -$4.50 x 100 = -$450
    Long Stock P/L = $9.00 x 50 = $450

    Total P/L = $0

Eureka, it works! If you hedge your option position on a .50 delta your p/l in both cases is zero.

But if you recall, the probability of the $5 call finishing in the money was just 10%. It’s worth restating. In this binary example, the 400% OTM call has a 50% delta despite only having a 10% chance of finishing in the money.
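The whole biotech example can be verified in a few lines (all prices and probabilities from the example above):

```python
# Binary stock: 90% of the time -> $0, 10% of the time -> $10; trades at EV = $1
p_up, s_up, s_dn, s0, strike = 0.10, 10.0, 0.0, 1.0, 5.0

call_value = p_up * max(s_up - strike, 0) + (1 - p_up) * max(s_dn - strike, 0)
print(call_value)  # 0.5

# Delta as a hedge ratio: change in option / change in stock
delta = (max(s_up - strike, 0) - call_value) / (s_up - s0)
print(delta)  # 0.5

# Hedge check: short 1 contract (100-share deliverable), buy delta x 100 shares
shares = delta * 100
pls = []
for s_final in (s_up, s_dn):
    call_pl = (call_value - max(s_final - strike, 0)) * 100  # short the call
    stock_pl = (s_final - s0) * shares                       # long the stock
    pls.append(call_pl + stock_pl)
print(pls)  # [0.0, 0.0] -- flat in both scenarios
```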

The Concept of Delta Is Not Limited To Options


Futures have deltas too. If the SPX cash index increases by 1%, the SP500 futures go up 1%. They have a delta of 100%.

But let’s look closer.

The fair value of a future is given by:

Future = Seʳᵗ


S = stock price

r = interest rate

t = time to expiry in years

This formula comes straight from arbitrage pricing theory. If the cash index is trading for $100 and 1-year interest rates are 5% then the future must trade for $105.13

100e^(5% * 1) = $105.13

What if it traded for $103?

  • Then you buy the future, short the cash index at $100
  • Earn $5.13 interest on the $100 you collect when you short the stocks in the index.
  • For simplicity imagine the index doesn’t move all year. It doesn’t matter if it did move since your market risk is hedged — you are short the index in the cash market and long the index via futures.
  • At expiration, your short stock position washes with the expiring future which will have decayed to par with the index or $100.
  • [Warning: don’t trade this at home. I’m handwaving details. Operationally, the pricing is more intricate but conceptually it works just like this.]
  • P/L computation:
    You lost $3 on your futures position (bought for $103, expired at $100).
    You broke even on the cash index (shorted at $100 and bought back at $100).
    You earned $5.13 in interest.

    Net P/L: $2.13 of riskless profit!

You can walk through the example of selling an overpriced future and buying the cash index. The point is to recognize that the future must be priced as Seʳᵗ to ensure no arbitrage. That’s the definition of fair value.
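Here is the cheap-future arbitrage as code, step by step (numbers from the example; interest is continuously compounded as in the fair-value formula):

```python
from math import exp

spot, rate, t = 100.0, 0.05, 1.0
fair = spot * exp(rate * t)
print(round(fair, 2))  # 105.13

# Future offered at $103: buy it, short the cash index, invest the proceeds
future_px = 103.0
futures_pl = spot - future_px           # future decays to par ($100) at expiry
short_stock_pl = 0.0                    # shorted at $100, covered at $100
interest = spot * (exp(rate * t) - 1)   # interest earned on short-sale proceeds

print(round(futures_pl + short_stock_pl + interest, 2))  # 2.13 of riskless profit
```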

You may have noticed that a future must have several greeks. Let’s list them:

  • Theta: the future decays as time passes. If it was a 1-day future it would only incorporate a single day’s interest in its fair value. In our example, the future was $103 and decayed to $100 over the course of the year as the index was unchanged. The daily theta is exactly worth 1 day’s interest.
  • Rho: The future’s fair value changes with interest rates. If the rate was 6% the future would be worth $106.18. So the future has $1.05 of sensitivity per 100 bps change in rates.
  • Delta: Yes, the future even has a delta with respect to the underlying! Imagine the index doubled from $100 to $200. The new future fair value, assuming 5% interest rates, would be $210.25. Invoking “rise over run” from middle school:

    delta = change in future / change in index
    delta = (210.25 – 105.13) / (200 – 100)
    delta = 105%

    That holds for small moves too. If the index increases by 1%, the future increases by 1.05%

  • Gamma: 0. There is no gamma. The delta doesn’t change as the stock moves.
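Each of these greeks falls out of the fair-value formula by bumping one input at a time. A sketch (theta shown as one day’s decay; rho as the change per 100 bps):

```python
from math import exp

def future_fv(spot, r, t):
    """Fair value of a future under continuous compounding: S * e^(r*t)."""
    return spot * exp(r * t)

spot, r, t = 100.0, 0.05, 1.0
f0 = future_fv(spot, r, t)
print(round(f0, 2))  # 105.13

theta_1d = future_fv(spot, r, t - 1 / 365) - f0               # one day of decay
rho = future_fv(spot, r + 0.01, t) - f0                       # per 100 bps of rates
delta = (future_fv(2 * spot, r, t) - f0) / (2 * spot - spot)  # rise over run

print(theta_1d < 0)     # True: the future decays toward spot
print(round(rho, 2))    # ~1.06 (about the $1.05 quoted above, which used rounded prices)
print(round(delta, 2))  # 1.05 -> the future moves 1.05% per 1% index move
```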

Levered ETFs

Levered and inverse ETFs have both delta and gamma! My latest post dives into how we compute them.

✍️The Gamma Of Levered ETFs (8 min read)

This is an evergreen reference that includes:

  • the mechanics of levered ETFs
  • a simple and elegant expression for their gamma
  • an explanation of the asymmetry between long and short ETFs
  • insight into why shorting is especially difficult
  • the application of gamma to real-world trading strategies
  • a warning about levered ETFs
  • an appendix that shows how to use deltas to combine related instruments

And here’s some extra fun since I mentioned the challenge of short positions:


Bonds have delta and gamma. They are called “duration” and “convexity”. Duration is the sensitivity of the bond price with respect to interest rates. Borrowing from my older post Where Does Convexity Come From?:

Consider the present value of a note with the following terms:

Face value: $1000
Coupon: 5%
Schedule: Semi-Annual
Maturity: 10 years

Suppose you buy the bond when prevailing interest rates are 5%. If interest rates go to 0, you will make a 68% return. If interest rates blow out to 10% you will only lose 32%.

It turns out that as interest rates fall, you actually make money at an increasing rate. As rates rise, you lose money at a decreasing rate. So again, your delta with respect to interest rates changes. In bond world, the equivalent of delta is duration. It’s the answer to the question “how much does my bond change in value for a 1% change in rates?”

So where does the curvature in bond payoff come from? The fact that the bond duration changes as interest rates change. This is reminiscent of how the option call delta changed as the stock price rallied.

The red line shows the bond duration when yields are 10%. But as interest rates fall we can see the bond duration increases, making the bond even more sensitive as rates decline. The payoff curvature is a product of your position becoming increasingly sensitive to rates. Again, contrast with stocks, where your position’s sensitivity to the price stays constant.
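A sketch of the bond math, assuming the note’s terms above and a flat, semi-annually compounded yield:

```python
def bond_price(face, coupon_rate, years, y, freq=2):
    """PV of a level-coupon bond discounted at a flat yield y."""
    c = face * coupon_rate / freq            # coupon per period
    n = int(years * freq)                    # number of periods
    d = 1 + y / freq                         # per-period discount factor
    return sum(c / d ** i for i in range(1, n + 1)) + face / d ** n

p0 = bond_price(1000, 0.05, 10, 0.05)
print(round(p0, 2))  # 1000.0 -- priced at par when yield = coupon

# Roughly the ~32% loss described above (exact figure depends on compounding conventions)
print(round(bond_price(1000, 0.05, 10, 0.10) / p0 - 1, 3))

def duration(y, bump=0.0001):
    """Effective duration: % price change per unit change in yield."""
    up = bond_price(1000, 0.05, 10, y + bump)
    dn = bond_price(1000, 0.05, 10, y - bump)
    return -(up - dn) / (2 * bump * bond_price(1000, 0.05, 10, y))

# Duration grows as rates fall -> the position gets MORE sensitive: that's convexity
print([round(duration(y), 2) for y in (0.10, 0.05, 0.02)])
```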


Companies have all kinds of greeks. A company at the seed stage is pure optionality. Its value is pure extrinsic premium to its assets (or book value). In fact, you can think of any corporation as the premium of the zero strike call.

[See a fuller discussion of the Merton model on Lily’s Substack which is a must-follow. We talk about similar stuff but she’s a genius and I’m just old.]

Oil drillers are an easy example. If a driller can pull oil out of the ground at a cost of $50 a barrel but oil is trading for $25 it has the option to not drill. The company has theta in the form of cash burn but it still has value because oil could shoot higher than $50 one day. The oil company’s profits will be highly levered to the oil price. With oil bouncing around $20-$30 the stock has a small delta; if oil is $75, the stock will have a high delta. This implies the presence of gamma since the delta is changing.


One of the reasons I like boardgames is they are filled with greeks. There are underlying economic or mathematical sensitivities that are obscured by a theme. Chess has a thin veneer of a war theme stretched over its abstraction. Other games like Settlers of Catan or Bohnanza (a trading game hiding under a bean farming theme) have more pronounced stories but as with any game, when you sit down you are trying to reduce the game to its hidden abstractions and mechanics.

The objective is to use the least resources (whether those are turns/actions, physical resources, money, etc) to maximize the value of your decisions. Mapping those values to a strategy to satisfy the win conditions is similar to investing or building a successful business as an entrepreneur. You allocate constrained resources to generate the highest return, best-risk adjusted return, smallest loss…whatever your objective is.

Games mine a variety of mechanics (awesome list here) just as there are many types of business models. Both game mechanics and business models ebb and flow in popularity. With games, it’s often just chasing the fashion of a recent hit that has captivated the nerds. With businesses, the popularity of models will oscillate (or be born) in the context of new technology or legal environments.

In both business and games, you are constructing mental accounting frameworks to understand how a dollar or point flows through the system. On the surface, Monopoly is about real estate, but un-skinned it’s a dice game with expected values that derive from probabilities of landing on certain spaces times the payoffs associated with the spaces. The highest value properties in this accounting system are the orange properties (ie Tennessee Ave) and red properties (ie Kentucky). Why? Because the jail space is a sink in an “attractor landscape” while the rents are high enough to kneecap opponents. Throw in cards like “advance to nearest utility”, “advance to St. Charles Place”, and “Illinois Ave” and the chance to land on those spaces over the course of a game more than offsets the Boardwalk haymaker even with the Boardwalk card in the deck.

In deck-building games like Dominion, you are reducing the problem to “create a high-velocity deck of synergistic combos”. Until you recognize this, the opponent who burns their single coin cards looks like a kamikaze pilot. But as the game progresses, the compounding effects of the short, efficient deck create runaway value. You will give up before the game is over, eager to start again with X-ray vision to see through the theme and into the underlying greeks.

[If the link between games and business raises an antenna, you have to listen to Reid Hoffman explain it to Tyler Cowen!]

Wrapping Up

Option greeks are just an instance of a wider concept — sensitivity to one variable as we hold the rest constant. Being tuned to estimating greeks in business and life is a useful lens for comprehending “how does this work?”. Armed with that knowledge, you can create dashboards that measure the KPIs in whatever you care about, reason about multi-order effects, and serve the ultimate purpose — make better decisions.

The Gamma Of Levered ETFs

Levered ETFs use derivatives to amplify the return of an underlying index. Here’s a list of 2x levered ETFs. For example, QLD gives you 2x the return of QQQ (Nasdaq 100). In this post, we will compute the delta and gamma of levered ETFs and see what they mean for investors and traders.

Levered ETF Delta

In options, delta is the sensitivity of the option premium to a change in the underlying stock. If you own a 50% delta call and the stock price goes up by $1, you make $.50. If the stock went down $1, you lost $.50. Delta, generally speaking, is a rate of change of p/l with respect to how some asset moves. I like to say it’s the slope of your p/l based on how the reference asset changes.

For levered ETFs, the delta is simply the leverage factor. If you buy QLD, the 2x version of QQQ, you get 2x the return of QQQ. So if QQQ is up 1%, you earn 2%. If QQQ is down 1%, you lose 2%. If you invest $1,000 in QLD your p/l acts as if you had invested $2,000.

$100 worth of QLD is the equivalent exposure of $200 of QQQ.

Your dollar delta is $200 with respect to QQQ. If QQQ goes up 1%, you make 1% * $200 QQQ deltas = $2

The extra exposure cuts both ways. On down days, you will lose 2x what the underlying QQQ index returns.

The takeaway is that your position or delta is 2x the underlying exposure.

Dollar delta of levered ETF = Exposure x Leverage Factor

In this case, QLD dollar delta is $200 ($100 x 2).

Note that QLD is a derivative with a QQQ underlyer.

Levered ETF Gamma

QLD is a derivative because it “derives” its value from QQQ. $100 exposure to QLD represents a $200 exposure to QQQ. In practice, the ETF’s manager offers this levered exposure by engaging in a swap with a bank that guarantees the ETF’s assets will return the underlying index times the leverage factor. For the bank to offer such a swap, it must be able to manufacture that return in its own portfolio. So in the case of QLD, the bank simply buys 2x the ETF’s NAV in notional so that its delta, or slope of p/l, matches the ETF’s promise.

So if the ETF has a NAV of $1B, the bank must maintain exposure of $2B QQQ deltas. That way, if QQQ goes up 10%, the bank makes $200mm which it contributes to the ETF’s assets so the new NAV would be $1.2B.

Notice what happened:

  • QQQ rallied 10% (the reference index)
  • QLD rallied 20% (the levered ETF’s NAV went from $1B –> $1.2B)
  • The bank’s initial QQQ delta of $2B has increased to $2.2B.

Uh oh.

To continue delivering 2x returns, the bank’s delta needs to be 2x the ETF’s assets or $2.4B, but it’s only $2.2B! The bank must buy $200M worth of QQQ deltas (either via QQQs, Nasdaq futures, or the basket of stocks).

If we recall from options, gamma is the change in delta due to a change in stock price. The bank’s delta as a multiple of NAV went from 2 (ie $2B/$1B) to 1.833 ($2.2B/$1.2B). It got shorter deltas in a rising market –> negative gamma!

The bank must dynamically rebalance its delta each day to maintain a delta of 2x the ETF’s assets. And the adjustment means it must buy deltas at the close of an up day in the market or sell deltas at the close of a down day. Levered ETFs, therefore, amplify price moves. The larger the daily move, the larger the rebalancing trades need to be!
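The rebalancing mechanics above can be sketched in a few lines (numbers from the $1B NAV example; this is an illustration of the logic, not how any particular bank books it):

```python
leverage = 2
nav = 1_000_000_000        # ETF NAV at the start of the day
index_return = 0.10        # the reference index rallies 10%

bank_delta = leverage * nav                      # $2B of QQQ exposure
drifted_delta = bank_delta * (1 + index_return)  # drifts up to $2.2B
new_nav = nav * (1 + leverage * index_return)    # ETF delivers 2x -> $1.2B

required_delta = leverage * new_nav              # bank now needs $2.4B
rebalance = required_delta - drifted_delta       # buy ~$200M at the close

print(rebalance)   # ~200,000,000 -> the bank buys $200M of QQQ deltas
```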

I’ve covered this before in Levered ETF/ETN tool, where I give you this spreadsheet to compute the rebalancing trades.

From Brute Force To Symbols

There was confusion on Twitter about how levered ETFs worked recently and professor @quantian stepped up:

Junior PM interview question: An X-times leveraged fund tracks an underlying asset S. After time T, S has moved to ST = (1+dS)S0. The initial delta is of course X. What is the portfolio gamma, defined as (dDelta)/(dS), as a function of X?

Despite correctly understanding how levered and inverse ETFs work I struggled to answer this question with a general solution (ie convert the computations we brute-forced above into math symbols). It turns out the solution is a short expression and worth deriving to find an elegant insight.

@quantian responded to my difficulty with the derivation.

I’ll walk you through that slowly.
Mapping variables to @quantian’s question:
  • NAV = 1

You are investing in a levered ETF that starts with a NAV of 1

  • X = The leverage factor

The bank needs to have a delta of X to deliver the levered exposure. For a 2x ETF, the bank’s initial delta will be 2 * NAV = 2

  • S = the underlying reference index

The dynamic:

  • When S moves, the bank’s delta will no longer be exactly X times the NAV. Its delta changed as S changed. That’s the definition of gamma.
  • When S moves, the bank needs to rebalance (buy or sell) units of S to maintain the desired delta of X. The rebalancing amount is therefore the change in delta or gamma.

Let’s find the general formula for the gamma (ie change in delta) in terms of X. Remember X is the leverage factor and therefore the bank’s desired delta.

The general formula for the gamma as a function of the change in the underlying index is, therefore:
X (X – 1)
where X = leverage factor

There are 2 key insights when we look at this elegant expression:

  1. The gamma, or imbalance in delta due to the move, is proportional to the square of the leverage factor. The more levered the ETF, the larger the delta adjustment required. If there is no leverage (like SPY to the SPX index), the leverage factor is 1 and the gamma is 0 because 1 (1-1) = 0

  2. The asymmetry of inverse ETFs — they require larger rebalances for the same size move! Imagine a simple inverse ETF with no leverage.

-1 (-1 – 1) = 2

A simple inverse ETF has the same gamma as a double long ETF.

Consider how a double short ETF has a gamma of 6:

-2 (-2 -1) = 6 
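A quick sketch tabulating X(X-1) across leverage factors (my own illustration) makes the asymmetry visible:

```python
# Gamma of a levered ETF as a function of its leverage factor X.
def levered_etf_gamma(x: float) -> float:
    return x * (x - 1)

for x in (3, 2, 1, -1, -2, -3):
    print(f"{x:+d}x ETF -> gamma of {levered_etf_gamma(x)}")

# +1x (unlevered, e.g. SPY) -> 0
# +2x -> 2, and -1x -> 2: the simple inverse matches the double long
# +3x -> 6, and -2x -> 6: the double short matches the triple long
```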

When I admit that I had only figured out the rebalancing quantities by working out the mechanics by brute force in Excel, @quantian had a neat observation:

I originally found this by doing the brute force Excel approach! Then I plotted it and was like “hm, that’s just a parabola, I bet I could simplify this”

X² – X shows us that the gamma of an inverse ETF is equivalent to the gamma of its counterpart long of one degree higher. For example, a triple-short ETF has the same gamma as a 4x long. Or a simple inverse ETF has the gamma of a double long. The fact that a 1x inverse ETF has gamma at all is a clue to the difficulty of running a short book…when you win, your position size shrinks, and the effect is compounded by the fact that your position is shrinking even faster relative to your growing AUM as your shorts profit!

I’ve explained this asymmetry before in The difficulty with shorting and inverse positions as well as the asymmetry of redemptions:

  • As the reference asset rallies, position size gets bigger and AUM drops due to losses. As reference asset falls, position size shrinks while AUM increase due to profits.
  • Redemptions can stabilize rebalance requirements in declines and exacerbate them in rallies. Redemptions reduce shares outstanding and, in turn, AUM, while in both cases triggering the fund’s need to buy the reference asset, which is stabilizing after declines but not after rallies. In other words, profit-taking is stabilizing while puking is de-stabilizing.

Rebalancing In Real Life

The amount of the rebalance from our derivation is:

X(1 + X ΔS) – X (1+ ΔS)


X = leverage factor

ΔS = percent change in underlying index

Another way to write that is:

X (X-1) (ΔS)

In our example, 2 * (2-1) * 10% = .20, an imbalance of 20% of the original NAV!
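We can check that the long-form expression collapses to X(X-1)ΔS with the numbers above (a sketch assuming a single smooth move over the period):

```python
def rebalance_long_form(x: float, ds: float) -> float:
    # required delta X(1 + X*ds) minus the drifted delta X(1 + ds)
    return x * (1 + x * ds) - x * (1 + ds)

def rebalance_short_form(x: float, ds: float) -> float:
    return x * (x - 1) * ds

x, ds = 2, 0.10
print(rebalance_long_form(x, ds))    # ~0.20 -> 20% of the original NAV
print(rebalance_short_form(x, ds))   # same answer from the simplified formula
```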

In practice, the size of the rebalance trade is of practical use. If an index is up or down a lot as you approach the end of a trading day then you can expect flows that exacerbate the move as levered ETFs must buy on up days and sell on down days to rebalance. It doesn’t matter if the ETF is long or inverse, the imbalance is always destabilizing in that it trades in the same direction as the move. The size of flows depends on how much AUM levered ETFs are holding but they can possibly be mitigated by profit-taking redemptions.

During the GFC, levered financial ETFs had large rebalance trades amidst all the volatility in bank stocks. Estimating, frontrunning, and trading against the rebalance to close was a popular game for traders who understood this dynamic. Years later levered mining ETFs saw similar behavior as precious metals came in focus in the aftermath of GFC stimulus. Levered energy ETFs, both in oil and natural gas, have ebbed and flowed in popularity. When they are in vogue, you can try to estimate the closing buy/sell imbalances that accompany highly volatile days.

Warning Label

Levered ETFs are trading tools that are not suitable for investing. They do a good job of matching the levered return of an underlying index intraday. The sum of all the negative gamma trading is expensive as the mechanical re-balancing gets front-run and “arbed” by traders. This creates significant drag on the levered ETF’s assets. In fact, if the borrowing costs to short levered ETFs were not punitive, a popular strategy would be to short both the long and short versions of the same ETF, allowing the neutral arbitrageur to harvest both the expense ratios and negative gamma costs from tracking the index!

ETFs such as USO or VXX which hold futures are famous for bleeding over time. That blood comes from periods when the underlying futures term structure is in contango and the corresponding negative “roll” returns (Campbell has a timeless paper decomposing spot and roll returns titled Deconstructing Futures Returns: The Role of Roll Yield). This is a separate issue from the negative gamma effect of levered or inverse ETFs.

Some ETFs combine all the misery into one simple ticker. SCO is a 2x-levered, inverse ETF referencing oil futures. These do not belong in buy-and-hold portfolios. Meth heads only please.

[The amount of variance drag that comes from levered ETFs depends on the path, which makes the options especially tricky. I don’t explain how to price options on levered ETFs, but this post is a clue to the complication — Path: How Compounding Alters Return Distributions]

Key Takeaways

  • Levered ETFs are derivatives. Their delta changes as the underlying index moves. This change in delta is the definition of gamma. 

  • Levered and inverse ETFs have “negative gamma” in that they must always rebalance in a destabilizing manner — in the direction of the underlying move.
  • The required rebalance in terms of the fund’s NAV is:

X (X-1) (ΔS)

  • The size of the rebalance is proportional to the square of the leverage factor. The higher the leverage factor the larger the rebalance. For a given leverage factor, inverse ETFs have larger gammas. 

  • The drag that comes from levered ETFs means they will fail to track the desired exposure on long horizons. They are better suited to trading or short-term risk management.

Appendix: Using Delta To Summarize Exposures

We can see that delta is not limited to options, but is a useful way to denote exposures in derivatives generally. It allows you to sum deltas that reference the same underlying to compute a net exposure to that underlying.

Consider a portfolio:

  • Short 2000 shares of QQQ
  • Long 1000 shares of QLD
  • Long 50 1 month 53% delta calls

By transforming exposures into deltas then collapsing them into a single number we can answer the question, “what’s my p/l if QQQ goes up 1%?”

We want to know the slope of our portfolio vis a vis QQQ.

A few observations:

  • I computed net returns for the portfolio based on the gross (absolute value of exposures)
  • The option exposure is just the premium, but what we really care about is the delta coming from the options. Even though the total premium is <$37k, the largest delta is coming from the options position.
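Since the original table didn’t survive the formatting, here is a sketch of the same exercise. The share quantities and deltas are from the list above; the prices ($310 for QQQ, $62 for QLD) are hypothetical stand-ins:

```python
qqq_price = 310.0   # hypothetical price
qld_price = 62.0    # hypothetical price

# each entry: (units, QQQ dollar delta per unit)
positions = {
    "short 2000 QQQ":     (-2000, qqq_price * 1.0),         # delta-1 instrument
    "long 1000 QLD":      (1000,  qld_price * 2.0),         # 2x leverage factor
    "long 50 .53d calls": (50,    0.53 * 100 * qqq_price),  # 100 share multiplier
}

net_dollar_delta = sum(qty * dd for qty, dd in positions.values())
print(net_dollar_delta)          # net QQQ-equivalent dollar exposure
print(net_dollar_delta * 0.01)   # answer to "what's my p/l if QQQ goes up 1%?"
```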

Moontower on Gamma

The first option greek people learn after delta is gamma. Recall that delta represents how much an option’s price changes with respect to share price. That makes it a convenient hedge ratio. It tells you the share equivalent position of your option position. So if an option has a .50 delta, its price changes by $.50 for a $1.00 change in the stock price. Calls have positive deltas and puts have negative deltas (ie puts go down in value as the stock price increases). If you are long a .50 delta call option and want to be hedged, you must be short 50 shares of the stock (options refer to 100 shares of underlying stock). For small moves in the stock, your call and share position p/l’s will offset because you are “delta neutral”.

This is true for small moves only. “Small” is a bit wishy-washy because small depends on volatility, and this post is staying away from that much complexity. Instead, we want to focus on how your delta changes as the stock moves. This is vital because if your option delta changes, then your equivalent share position changes. If your position size changes, then the same $1 move in the stock no longer produces the same p/l each time. If I’m long 50 shares of a stock, I make the same amount of money for each $1 change. But if I’m long 50 shares equivalent by owning a .50 delta option, then as the stock increases my delta increases as the option becomes more in-the-money. That means the next $1 change in the stock produces $60 of p/l instead of just $50. We know that deep in-the-money options have a 1.00 delta, meaning they act just like the stock (imagine a 10 strike call expiring tomorrow when the stock is trading for $40. The option price and stock price will move perfectly in lockstep. The option has 100% sensitivity to the change).

A call option can go from .50 delta to 1.00 delta. Gamma is the change in delta for the change in stock. Suppose you own a .50 delta call and the stock goes up by $1. The call is solidly in-the-money and perhaps its new delta is .60. That change in delta from .50 to .60 for a $1 move is known as gamma. In this case, we say the option has .10 gamma per $1. So if the stock goes up $1, the delta goes up by .10.

While this is mechanically straightforward, some of the lingo around gamma is confusing. People spout phrases like “a squared term”, “curvature”, “convexity”. I’ve written about what convexity is and isn’t because I’ve seen it trip up people who should know better. See Where Does Convexity Come From?. In this post, we will demystify the relationship of these words to “gamma”. In the process, you will deeply improve your understanding of options’ non-linear nature.

How the post is laid out:


  • Acceleration
  • The squared aspect of gamma
  • Dollar gamma
  • Constant gamma
  • Strikeless products
  • How gamma scales with price and volatility
  • Gamma weighting relative value trades

You already understand “curvature”. I’ll prove it to you.

You wake up tomorrow morning and see a bizarre invention in your driveway. An automobile with an unrivaled top speed.  You take it on an abandoned road to test it out. Weirdly, it accelerates slowly for a racecar. Conveniently for me, it makes the charts I’m about to show you easy to read.

You are traveling at 60 mph.

Imagine 2 scenarios:

  1. You maintain that constant speed.
  2. You accelerate such that after 1 minute you are now traveling at 80 mph. Assume your acceleration is smooth. That means over the 60 seconds it takes to reach 80 mph, your speed increases equally every second. So after 3 seconds, you are traveling 61 mph, at 6 seconds you are moving 62 mph. Eventually at 60 seconds, you are traveling 80 mph.


In the acceleration case, what was your average speed or velocity during that minute?

Since the acceleration was smooth, the answer is 70 mph.

How far did you travel in each case?

Constant velocity: 60 mph x 1/60 of an hour = 1 mile

Accelerate at 20 mph per minute: 70 mph average x 1/60 of an hour ≈ 1.17 miles

If the acceleration is smooth, we can take the average velocity over the duration and multiply it by the duration to compute the distance traveled.

Let’s now continue accelerating this supercar at the same 20 mph-per-minute rate for the next 15 minutes and see how far we travel. Compare this to a vehicle that maintains 60 mph for the whole trip. The table uses the same logic — the average speed for the last minute assumes a constant acceleration rate.
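The table’s logic can be reproduced in a few lines (a sketch of the same minute-by-minute arithmetic):

```python
speed = 60.0      # mph at the start
accel = 20.0      # mph gained per minute, smoothly
distance = 0.0    # miles traveled by the accelerating car

for minute in range(15):
    avg_speed = speed + accel / 2   # average speed during this minute
    distance += avg_speed / 60      # mph x (1/60 of an hour)
    speed += accel

constant_distance = 60.0 * (15 / 60)  # the 60 mph car

print(distance)            # ~52.5 miles for the accelerating car
print(constant_distance)   # 15.0 miles at a constant 60 mph
```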

Let’s zoom in on the cumulative distance traveled at each minute:

We found it! Curvature.

Curvature is the adjustment to the linear estimate of distance traveled that we would have presumed if we assumed our initial speed was constant. Let’s map this analogy to options.

  • Time –> stock price

    How much time has elapsed from T₀ maps to “how far has the stock moved from our entry?”

  • Velocity –> delta

    Delta is the instantaneous slope of the p/l with respect to stock price, just as velocity is the instantaneous speed of the car.

  • Acceleration –> gamma

    Acceleration is the change in our velocity just as gamma is the change in delta.

  • Cumulative distance traveled –> cumulative p/l

    Distance = velocity x time. Since the velocity changes, multiply the average velocity by time. In this case, we can double-check our answer by looking at the table. We traveled 52.5 miles in 15 minutes, or 210 mph on average. That corresponds to our speed at the midpoint of the journey — 7.5 minutes into the 15-minute trip.
    P/l = average position size x change in stock price. Just as our speed was changing, our position size was changing!

Delta is the slope of your p/l. That’s how I think about position sizes. Convexity is non-linear p/l that results from your position size varying. Gamma mechanically alters your position size as the stock moves around.

The calculus that people associate with options is simply the continuous expression of these same ideas. We just worked through them step-wise, minute by minute taking discrete averages for discrete periods.

Intuition For the Squared Aspect Of Gamma

Delta is familiar to everyone because it exists in all linear instruments. A stock is a linear instrument. If you own 100 shares and it goes up $1, you make $100. If it goes up $10, you make $1,000. The position size is weighted by 1.00 delta (in fact bank desks that trade ETFs and stocks without options are known as “Delta 1 desks”).  Since you just multiply by 1, the position size is the delta. If you’re long 1,000 shares of BP, I say “you’re long 1,000 BP deltas”. This allows you to combine share positions and option positions with a common language. If any of the deltas come from options that’s critical information since we know gamma will change the delta as the stock moves.

If your 1,000 BP deltas come from:

  • 500 shares of stock
  • 10 .50 delta calls

that’s important to know. Still, for a quick summary of your position you often just want to know your net delta to have an idea of what your p/l will be for small moves.

If you have options, that delta will not predict your p/l accurately for larger moves. We saw that acceleration curved the total distance traveled. The longer you travel the larger the “curvature adjustment” from a linear extrapolation of the initial speed. Likewise, the gamma from options will curve your p/l from your initial net delta, and that curvature grows the further the stock moves.

If you have 1,000 BP deltas all coming from shares, estimating p/l for a $2 rally is easy — you expect to make $2,000.

What if your 1,000 BP deltas all come from options? We need to estimate a non-linear p/l because we have gamma.

Let’s take an example from the OIC calculator.

The stock is $28.35

This is the 28.5 strike call with 23 days to expiry. It’s basically at-the-money.

It has a .50 delta and .12 of gamma. Let’s accept the call value of $1.28 as fair value.

Here’s the setup:

Initial position = 20 call options.

What are your “greeks”?

    • Delta  =  1,000

      .50 x 20 contracts x 100 share multiplier

    • Gamma =  240

      .12 x 20 contracts x 100 share multiplier

(the other greeks are not in focus for this post)

The greeks describe your exposures. If you simply owned 1,000 shares of BP you know the slope of your p/l per $1 move…it’s $1,000. That slope won’t change.

But what about this option exposure? What happens if the stock increases by $1, what is your new delta and what is your p/l?

After $1 rally:

    • New delta = 1,240 deltas

      .62 x 20 contracts x 100 share multiplier

      Remember that gamma is the change in delta per $1 move. That tells us if the stock goes up $1, this call will increase .12 deltas, taking it from a .50 delta call to a .62 delta call.

That’s fun. As the stock went up, your share equivalent position went from 1,000 to 1,240.

Can you see how to compute your p/l by analogizing from the accelerating car example?

[It’s worth trying on your own before continuing]

Computing P/L When You Have Gamma 

Your initial delta is 1,000. Your terminal delta is 1,240.

(It’s ok to assume gamma is smooth over this move just as we said the acceleration was smooth for the car.)

Your average delta over the move = 1,120

1,120 x $1 = $1,120

You earned an extra $120 vs a basic share position for the same $1 move. That $120 of extra profit is curvature from a simple extrapolation of delta p/l. Since that curvature is due to gamma it’s best to decompose the p/l into a delta portion and a gamma portion.

  • The delta portion is the linear estimate of p/l = initial delta of 1,000 x $1 = $1,000
  • The gamma portion of the p/l is the same computation as the acceleration example:

Your gamma represents the change in delta over the whole move. That’s 240 deltas of change per $1. So on average, your delta was higher by 120 over the move. So we scale the gamma by the move size and divide by 2. That represents our average change in delta which we multiply by the move size to compute a “gamma p/l”.


Γ = position weighted gamma = gamma per contract x qty of contracts x 100 multiplier

ΔS = change in stock price

Gamma p/l = (Γ x ΔS / 2) x ΔS

We can re-write this as ½ Γ (ΔS)² to make the non-linearity obvious — gamma p/l is proportional to the square of the stock move!
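Putting the decomposition into code for the BP example (delta and gamma are the position-weighted greeks from above):

```python
delta = 1000     # initial share-equivalent position
gamma = 240      # change in delta per $1 move
move = 1.0       # the stock rallies $1

delta_pnl = delta * move               # linear estimate
gamma_pnl = 0.5 * gamma * move ** 2    # curvature adjustment
total_pnl = delta_pnl + gamma_pnl

print(delta_pnl, gamma_pnl, total_pnl)   # 1000.0 120.0 1120.0
```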

Generalizing Gamma: Dollar Gamma

In investing, we normally don’t speak about our delta or equivalent share position. If I own 1,000 shares of a $500 stock that is very different than 1,000 shares of a $20 stock. Instead, we speak about dollar notional. Those would be $500,000 vs $20,000 respectively. Dollar notional or gross exposures are common ways to denote position size. Option and derivative traders do the same thing. Instead of just referring to their delta or share equivalent position, they refer to their “dollar delta”. It’s identical to dollar notional, but preserves the “delta” vocabulary.

It is natural to compute a “delta 1%” which describes our p/l per 1% move in the underlying.

For the BP example:

  • Initial dollar delta = delta x stock price = 1,000 x $28.35 = $28,350 dollar deltas
  • Δ1% = $28,350/100 = $283.50

    You earn $283.50 for every 1% BP goes up.

Gamma has analogous concepts. Thus far we have defined gamma in the way option models define it — change in delta per $1 move. We want to generalize gamma calculations to also deal in percentages. Let’s derive dollar gamma continuing with the BP example.

  1. Gamma 1%

    Gamma per $1 = 240

    Of course, a $1 move in BP is over 3.5% ($1/$28.35). To scale this to “gamma per 1%” we multiply the gamma by 28.35/100, since a 1% move is $.2835.

    Gamma 1% = 240 * .2835 = 68.04

    So for a 1% increase in BP, your delta gets longer by 68.04 shares.

  2. Dollar gamma

    Converting gamma 1% to dollar gamma is simple. Just multiply by the share price.

    By substituting for gamma 1% from the above step, we arrive at the classic dollar gamma formula: $Gamma = gamma x spot² / 100

Let’s use BP numbers.

$Gamma = 240 * 28.35² / 100 = $1,929

The interpretation:

A 1% rally in BP, leads to an increase of 1,929 notional dollars of BP due to gamma. 

Instead of speaking of how much our delta (equivalent share position) changes, you can multiply dollar gamma by percent changes to compute changes in our dollar delta.
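The chain of conversions for BP, in code (all numbers from the example above):

```python
gamma = 240      # position-weighted gamma per $1
spot = 28.35     # BP share price

delta_1pct = 1000 * spot / 100    # dollar delta per 1% move
gamma_1pct = gamma * spot / 100   # shares gained per 1% rally
dollar_gamma = gamma_1pct * spot  # equivalently gamma * spot**2 / 100

print(delta_1pct)     # ~283.50 of p/l per 1% move
print(gamma_1pct)     # ~68.04 shares gained per 1% move
print(dollar_gamma)   # ~1,929 dollars of new exposure per 1% move
```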

Generalizing Gamma P/L For Percent Changes

In this section, we will estimate gamma p/l for percent changes instead of $1 changes. Let’s look at 2 ways.

The Accelerating Car Method

The logic flows as follows (again, using the BP example):

  • If a 1% rally leads to an increase of $1,929 of BP exposure then, assuming gamma is smooth, a 3.5% rally (or $1) will lead to an increase of $6,751 of BP length because 3.5%/1% * $1,929
  • Therefore the average length over the move is $3,375 (ie .5 * $6,751) due to gamma
  • $3,375 * 3.5% = $118 (This is very close to the $120 estimate we computed with the original gamma p/l formula. This makes sense since we followed the same logic…multiply the average position size due to gamma times the move size.)
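Redoing the steps with exact numbers (the text rounds the $1 move to 3.5%; $1/$28.35 is closer to 3.53%, which is why this lands on $120 rather than $118):

```python
dollar_gamma = 1928.93     # from the BP example
move_pct = 1 / 28.35       # a $1 rally expressed as a percent ~ 3.53%

new_length = dollar_gamma * (move_pct * 100)  # extra exposure at the end of the move
avg_length = new_length / 2                   # average extra length over the move
gamma_pnl = avg_length * move_pct

print(gamma_pnl)   # ~120, matching the 0.5 * gamma * (move)**2 computation
```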

The Algebraic Method

We can adapt the original gamma p/l formula for percent changes.

We start with a simple identity. To turn a price change into a percent we simply divide by the stock price. If a $50 stock increased $1 it increased 2%

If we substitute the percent change in the stock for the dollar change in the stock, we must balance the identity by multiplying by the stock price:

We can double-check that this works with our BP example. Recall that the initial stock price is $28.35:

This also checks out with the gamma p/l we computed earlier.


Constant Gamma 

In all the explanations, we assume gamma is smooth or constant over a range of prices. This is not true in practice. Option gammas peak near the ATM strike. Gamma falls to zero as the option goes deep ITM or deep OTM. When you manage an option book, you can sum your positive or negative gammas across all your inventory to arrive at a cumulative gamma. The gamma of your net position falls as you move away from your longs and can flip negative as you approach shorts. This means gamma p/l estimates are rarely correct, because gamma calculations themselves are instantaneous. As soon as the stock moves, time passes, or vols change your gamma is growing or shrinking.

This is one of the most underappreciated aspects of vol trading for novices. Vanilla options, despite the name, are diabolical because of path dependence. If you buy a straddle for 24% vol and vol realizes 30% there’s no guarantee you make money. If the stock makes large moves with a lot of time to expiration or when the straddle is not ATM then those moves will get multiplied by relatively low amounts of dollar gamma. If the underlying grinds to a halt as you approach expiration, especially if it’s near your long strikes, you will erode quickly with little hope of scalping your deltas.

Skew and the correlation of realized vol with spot introduce distributional effects to vol trading and may give clues to the nature of path dependence. As a trader gains more experience, they move from thinking simply in terms of comparing implied to realized vol to trying to understand what the flows tell us about the path and distribution. The wisdom that emerges after years of trading a dynamically hedged book is that the bulk of your position-based p/l (as opposed to trading or market-making) will come from a simple observation: were you short options where the stock expired and long where it didn’t?

That’s why “it’ll never get there” is not a reason to sell options. If you hold deltas against positions, you often want to own the options where the stock ain’t going and vice versa. This starts to really sink in around year 10 of options trading.

Strikeless Products

The path-dependent nature of vanilla options makes speculating on realized vol frustrating. Variance swaps are the most popular form of “strikeless” derivatives that have emerged to give investors a way to bet on realized vols without worrying about path dependence. Conceptually, they are straightforward. If you buy a 1-year variance swap implying 22% vol, then any day that the realized vol exceeds 22% you accrue profits and vice versa (sort of¹). The details are not important for our purpose, but we can use what we learned about gamma to appreciate their construction.

A derivative market cannot typically thrive if there is no replicating portfolio of vanilla products that the manufacturer of the derivative can acquire to hedge its risk. So if variance swaps exist, it must mean there is a replicating portfolio that gives a user a pure exposure to realized vol. The key insight is that the product must maintain a fairly constant gamma over a wide range of strikes to deliver that exposure. Let’s look at the dollar gamma formula once again.

We can see that gamma is proportional to the square of the stock price. While the gamma of an option depends on volatility and time to expiration, the higher the strike price the higher the “peak gamma”. Variance swaps weight a strip of options across a wide range of strikes in an attempt to maintain a smooth exposure to realized variance. Because higher strike options have a larger “peak” gamma, a common way to replicate the variance swap is to overweight lower strikes to compensate for their smaller peak gammas. The following demonstrates the smoothness of the gamma profile under different weighting schemes.

Examples of weightings:

Constant = The replicating strip holds the same number of 50 strike and 100 strike options

1/K = The replicating strip holds 2x as many 50 strike options vs 100 strike

1/K² = The replicating strip holds 4x as many 50 strike options vs 100 strike

Note that the common 1/K² weighting means variance swap pricing is highly sensitive to skew since the hedger’s portfolio weights downside puts so heavily. This is also why the strike of variance swaps can be much higher than the ATM vol of the same term. It reflects the cost of having constant gamma even as the market sells off. That is expensive because it requires owning beefy, sought-after low delta puts.
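To see why 1/K² is the standard weighting, here is a numerical sketch (my own construction, not from the post) that sums Black-Scholes dollar gammas over a log-spaced strip of strikes; the 30% vol, 1-year maturity, and 20-to-500 strike range are arbitrary illustrative choices:

```python
import math

def bs_gamma(spot, strike, vol, t):
    # Black-Scholes gamma with zero rates
    d1 = (math.log(spot / strike) + 0.5 * vol**2 * t) / (vol * math.sqrt(t))
    pdf = math.exp(-0.5 * d1**2) / math.sqrt(2 * math.pi)
    return pdf / (spot * vol * math.sqrt(t))

def strip_dollar_gamma(spot, weight, vol=0.30, t=1.0):
    # strikes from 20 to 500, log-spaced so the tails contribute ~nothing
    strikes = [20 * math.exp(i * math.log(25) / 200) for i in range(201)]
    total = 0.0
    for k_lo, k_hi in zip(strikes, strikes[1:]):
        k = 0.5 * (k_lo + k_hi)
        total += weight(k) * bs_gamma(spot, k, vol, t) * spot**2 / 100 * (k_hi - k_lo)
    return total

for s in (80, 100, 125):
    print(s, strip_dollar_gamma(s, weight=lambda k: 1 / k**2))
# the three totals are nearly identical: constant dollar gamma across spot.
# Swap in weight=lambda k: 1.0 (or 1/k) and the flatness disappears.
```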

How Gamma Scales With Price, Volatility, and Time

Having an intuition for how gamma scales is useful when projecting how your portfolio will behave as market conditions or parameters change. A great way to get a feel for this is to tinker with an option calculator.  To demonstrate the effects of time, vol, and price, we hold 2 of the 3 constant and vary the 3rd.

Assume the strike is ATM for each example.

Here are a few rules of thumb for how price, vol, and time affect gamma.

  • If one ATM price is 1/2 the other, the lower-priced stock will also have 1/2 the dollar gamma. The effect is linear: the lower-priced option has more gamma per contract, but dollar gamma weights exposure by the square of the price, netting out to a linear relationship.
  • If one ATM volatility is 1/2 the other, the dollar gamma is inversely proportional to the ratio of the vols (ie 1/vol ratio).
  • If an option has 1/2 as much time until expiration, it will have √2 times more gamma (gamma scales with the square root of the time ratio).

Stated differently:

  • Spot prices have linearly proportional differences in gamma. The lower price has less dollar gamma.
  • Volatility has inverse proportionality in gamma. The higher vol has less dollar gamma.
  • Time is inversely proportional to gamma with a square root scaling. More time means less dollar gamma.
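These rules of thumb can be sanity-checked against the Black-Scholes ATM gamma (a sketch I’m adding; zero rates, and the base case of $100 spot, 20% vol, 3 months to expiry is arbitrary):

```python
import math

def atm_dollar_gamma(spot, vol, t):
    # Black-Scholes ATM gamma with zero rates: d1 = vol * sqrt(t) / 2
    d1 = 0.5 * vol * math.sqrt(t)
    pdf = math.exp(-0.5 * d1**2) / math.sqrt(2 * math.pi)
    gamma = pdf / (spot * vol * math.sqrt(t))
    return gamma * spot**2 / 100

base = atm_dollar_gamma(100, 0.20, 0.25)

print(atm_dollar_gamma(50, 0.20, 0.25) / base)   # ~0.5: half the price, half the dollar gamma
print(atm_dollar_gamma(100, 0.40, 0.25) / base)  # ~0.5: double the vol, ~half the dollar gamma
print(atm_dollar_gamma(100, 0.20, 1.00) / base)  # ~0.5: 4x the time, ~half the dollar gamma
```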

Gamma-weighting Relative Value Trades

As you weight relative value trades these heuristics are handy (they’re also the type of thing interviews test your intuition for).

Some considerations that pop out if you choose to run a gamma-neutral book:

  • Time spreads are tricky. You need to overweight the deferred months and since vega is positively proportional to root time, you will have large net vega exposures if you try to trade term structure gamma-neutral.
  • Stocks with different vols. You need to overweight the higher vol stocks to be vega-neutral, but their higher volatility comes with higher theta. Your gamma-neutral position will have an unbalanced theta profile. This will be the case for inter-asset vol spreads but also intra-asset. Think of risk reversals that have large vol differences between the strikes.
  • Overweighting lower-priced stocks to maintain gamma neutrality does not tend to create large imbalances because spot prices are positively proportional to other greeks (higher spot –> higher vega, higher dollar gamma, higher theta all else equal).

Weighting trades can be a difficult topic. How you weight trades really depends on what convergence you are betting on. If you believe vols move as a fixed spread against each other then you can vega-weight trades. If you believe vols move relative to each other (ie fixed ratio — all vols double together) then you’d prefer theta weighting.

I’ve summarized some of Colin Bennett’s discussion on weighting here. The context is dispersion, but the intuitions hold.

Finally, this is an example of a top-of-funnel tool to spot interesting surfaces. The notations on it tie in nicely with the topic of weightings. The data is ancient and besides the point.

Wrapping Up

Gamma is the first higher-order greek people are exposed to. Like most of my posts, I try to approach it intuitively. I have always felt velocity and acceleration are the easiest bridges to understanding p/l curvature. While the first half of the post is intended for a broad audience, the second half is likely too advanced for novices and too rudimentary for veterans. If it helps novices who are trying to break into the professional world, I’ll consider that a win. I should add that in Finding Vol Convexity I apply the concept of convexity to implied volatility. You can think of that as the “gamma of vega”. In other words, how does an option’s vega change as volatility changes?

I realize I wrote that post, which is more advanced than this one, in the wrong order. Shrug.

Can Your Manager Solve Betting Games With Known Solutions?

One of the best threads I’ve seen in a while. It’s important because it shows how betting strategies vary based on your goals.

In the basic version, the “Devil’s Card Game” is constrained by the rule that you must bet your entire stack each time.

You can maximize:

  1. expectation
  2. utility (in the real world Kelly sizing is the instance of this when utility follows a log function)
  3. the chance of a particular outcome.

At the end of the thread, we relax the bet sizing rules and allow the player to bet any fraction of the bankroll they’d like. This is a key change.

It leads to a very interesting strategy called backward induction. In markets, the payoffs are not well-defined. But this game features a memory because it is a card game without replacement. Like blackjack. You can count the possibilities.

The thread shows how the backward induction strategy blows every other strategy out of the water.

If we generalize this, you come upon a provocative and possibly jarring insight:

The range of expectations simply based on betting strategies is extremely wide.

That means a good proposition can be ruined by an incompetent bettor. Likewise, a poor proposition can be somewhat salvaged by astute betting.

I leave you with musings.

  1. Is it better to pair a skilled gambler with a solid analyst or the best analyst with a mid-brow portfolio manager?
  2. How confident are you that the people who manage your money would pick the right betting strategy for a game with a known solution?

Maybe allocators and portfolio managers should have to take gambling tests. If analytic superiority is a source of edge, the lack of it is not simply an absence of one type of edge. It’s actually damning because it nullifies any other edge over enough trials assuming markets are competitive (last I checked that was their defining feature).

If You Make Money Every Day, You’re Not Maximizing

This is an expression I heard early in my trading days. In this post, we will use arithmetic to show what it means in a trading context, specifically the concept of hedging.

I didn’t come to fully appreciate its meaning until about 5 years into my career. Let’s start with a story. It’s not critical to the technical discussion, so if you are a robot feel free to beep boop ahead.

The Belly Of The Trading Beast

Way back in 2004, I spent time on the NYSE as a specialist in about 20 ETFs. A mix of iShares and a relatively new name called FEZ, the Eurostoxx 50 ETF. I remember the spreadsheet and pricing model to estimate a real-time NAV for that thing, especially once Europe was closed, was a beast. I also happened to have an amazing trading assistant that understood the pricing and trading strategy for all the ETFs assigned to our post. By then, I had spent nearly 18 months on the NYSE and wanted to get back into options where I started.

I took a chance.

I let my manager who ran the NYSE floor for SIG know that I thought my assistant should be promoted to trader. Since I was the only ETF post on the NYSE for SIG, I was sort of risking my job. But my assistant was great and hadn’t come up through the formal “get-hired-out-of-college-spend-3-months-in-Bala” bootcamp track. SIG was a bit of a caste system that way. It was possible to cross over from external hire to the hallowed trader track, but it was hard. My assistant deserved a chance and I could at least advocate for the promotion.

This would leave me in purgatory. But only briefly. Managers talk. Another manager heard from my current manager that I was looking for a fresh opportunity. He asked me if I wanted to co-start a new initiative. We were going to the NYMEX to trade futures options. SIG had tried twice previously to break into those markets but could not gain traction. The expectations were low. “Go over there, try not to lose too much money, and see what we can learn. We’ll still pay you what you would have expected on the NYSE”.

This was a lay-up. A low-risk opportunity to start a business and learn a new market. And get back to options trading. We grabbed a couple clerks, passed our membership exams, and took inventory of our new surroundings.

This was a different world. Unlike the AMEX, which was a specialist system, the NYMEX was open outcry. Traders here were more aggressive and dare I say a bit more blue-collar (appearances were a bit deceiving to my 26-year-old eyes, there was a wide range of diversity hiding behind those badges and trading smocks. Trading floors are a microcosm of society. So many backstories. Soft-spoken geniuses were shoulder-to-shoulder with MMA fighters, ex-pro athletes, literal gangsters or gunrunners, kids with rich daddies, kids without daddies). We could see how breaking in was going to be a challenge. These markets were still not electronic. Half the pit was still using paper trading sheets. You’d hedge deltas by hand-signaling buys and sells to the giant futures ring where the “point” clerk taking your order was also taking orders from the competitors standing next to you. He’s been having beers with these other guys for years. Gee, I wonder where my order is gonna stand in the queue?

I could see this was going to be about a lot more than option math. This place was 10 years behind the AMEX’s equity option pits. But our timing was fortuitous. The commodity “super-cycle” was still just beginning. Within months, the futures would migrate to Globex leveling the field. Volumes were growing and we adopted a solid option software from a former market-maker in its early years (it was so early I remember helping their founder correct the weighted gamma calculation when I noticed my p/l attribution didn’t line up to my alleged Greeks).

We split the duties. I would build the oil options business and my co-founder who was more senior would tackle natural gas options (the reason I ever got into natural gas was because my non-compete precluded me from trading oil after I left SIG). Futures options have significant differences from equity options. For starters, every month has its own underlyers, breaking the arbitrage relationships in calendar spreads you learn in basic training. During the first few months of trading oil options, I took small risks, allowing myself time to translate familiar concepts to this new universe. After 6 months, my business had roughly broken even and my partner was doing well in gas options. More importantly, we were breaking into the markets and getting recognition on trades.

[More on recognition: if a broker offers 500 contracts, and 50 people yell “buy em”, the broker divvies up the contracts as they see fit. Perhaps his bestie gets 100 and the remaining 400 get filled according to some mix of favoritism and fairness. If the “new guy” was fast and loud in a difficult-to-ignore way, there is a measure of group-enforced justice that ensures they will get allocations. As you make friends and build trust by not flaking on trades and taking your share of losers, you find honorable mates with clout who advocate for you. Slowly your status builds, recognition improves, and the system mostly self-regulates.]

More comfortable with my new surroundings, I started snooping around. Adjacent to the oil options pit was a quirky little ring for product options — heating oil and gasoline. There was an extremely colorful cast of characters in this quieter corner of the floor. I looked up the volumes for these products and saw they were tiny compared to the oil options but they were correlated (gasoline and heating oil or diesel are of course refined from crude oil. The demand for oil is mostly derivative of the demand for its refined products. Heating oil was also a proxy for jet fuel and bunker oil even though those markets also specifically exist in the OTC markets). If I learned anything from clerking in the BTK index options pit on the Amex, it’s that sleepy pits keep a low-profile for a reason.

I decided it was worth a closer look. We brought a younger options trader from the AMEX to take my spot in crude oil options (this person ended up becoming a brother and business partner for my whole career. I repeatedly say people are everything. He’s one of the reasons why). As I helped him get up to speed on the NYMEX, I myself was getting schooled in the product options. This was an opaque market, with strange vol surface behavior, flows and seasonality. The traders were cagey and clever. When brokers who normally didn’t have business in the product options would catch the occasional gasoline order and have to approach this pit, you could see the look in their eyes. “Please take it easy on me”.

My instincts turned out correct. There was edge in this pit. It was a bit of a Rubik’s cube, complicated by the capital structure of the players. There were several tiny “locals” and a couple of whales who to my utter shock were trading their own money. One of the guys, a cult legend from the floor, would not shy away from 7 figure theta bills. Standing next to these guys every day, absorbing the lessons in their banter, and eventually becoming their friends (one of them was my first backer when I left SIG) was a humbling education that complemented my training and experience. It illuminated approaches that would have been harder to access in the monoculture I was in (this is no shade on SIG in any way, they are THE model for how to turn people into traders, but markets offer many lessons and nobody has a monopoly on how to think).

As my understanding and confidence grew, I started to trade bigger. Within 18 months, I was running the second-largest book in the pit, a distant second to the legend, but my quotes carried significant weight in that corner of the business. The oil market was now rocking. WTI was on its way to $100/barrel for the first time, and I was seeing significant dislocations in the vol markets between oil and products1. This is where this long-winded story re-connects with the theme of this post.

How much should I hedge? We were stacking significant edge and I wanted to add as much as I could to the position. I noticed that the less capitalized players in the pit were happy to scalp their healthy profits and go home relatively flat. I was more brash back then and felt they were too short-sighted. They’d buy something I thought was worth $1.00 for $.50 and be happy to sell it out for $.70. In my language, that’s making 50 cents on a trade, to lose 30 cents on your next trade. The fact that you locked in 20 cents is irrelevant.

You need to be a pig when there’s edge because trading returns are not uniform. You can spend months breaking even, but when the sun shines you must make as much hay as possible. You don’t sleep. There’s plenty of time for that when things slow down. They always do. New competitors will show up and the current time will be referred to as “the good ole’ days”. Sure enough, that is the nature of trading. The trades people do today are done for 1/20th the edge we used to get.

I started actively trading against the pit to take them out of their risk. I was willing to sacrifice edge per trade, to take on more size (I was also playing a different game than the big guy who was more focused on the fundamentals of the gasoline market, so our strategies were not running into one another. In fact, we were able to learn from each other). The other guys in the pit were hardly meek or dumb. They simply had different risk tolerances because of how they were self-funded and self-insured. My worst case was losing my job, and that wasn’t even on the table. I was transparent and communicative about the trades I was doing. I asked for a quant to double-check what I was seeing.

This period was a visceral experience of what we learned about edge and risk management. It was the first time my emotions were interrupted. I wanted assurance that the way I was thinking about risk and hedging was correct so I could have the fortitude to do what I intellectually thought was the right play.

This post is a discussion of hedging and risk management.

Let’s begin.

What Is Hedging?

Investopedia defines a hedge:

A hedge is an investment that is made with the intention of reducing the risk of adverse price movements in an asset. Normally, a hedge consists of taking an offsetting or opposite position in a related security.

The first time I heard about “hedging”, I was seriously confused. Like if you wanted to reduce the risk of your position, why did you have it in the first place? Couldn’t you just reduce the risk by owning less of whatever was in your portfolio? The answer lies in relativity. Whenever you take a position in a security you are placing a bet. Actually, you’re making an ensemble of bets. If you buy a giant corporation like XOM, you are also making oblique bets on GDP, the price of oil, interest rates, management skill, politics, transportation, the list goes on. Hedging allows you to fine-tune your bets by offsetting the exposures you don’t have a view on. If your view was strictly on the price of oil you could trade futures or USO instead. If your view had nothing to do with the price of oil, but something highly idiosyncratic about XOM, you could even short oil against the stock position.

Options are popular instruments for implementing hedges. But even when used to speculate, this is an instance of hedging bundled with a wager. The beauty of options is how they allow you to make extremely narrow bets about timing, the size of possible moves, and the shape of a distribution. A stock price is a blunt summary of a proposition, collapsing the expected value of changing distributions into a single number. A boring utility stock might trade for $100. Now imagine a biotech stock that is 90% to be worth 0 and 10% to be worth $1000. Both of these stocks will trade for $100, but the option prices will be vastly different 2.
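A quick sketch makes the point concrete (the probabilities come from the paragraph above; the 100-strike call comparison is my own illustration):

```python
# Two $100 stocks with identical expected value but very different
# distributions. The biotech is 90% worthless, 10% worth $1000.
biotech = {0: 0.90, 1000: 0.10}

ev_stock = sum(px * p for px, p in biotech.items())
print(round(ev_stock))  # 100 -> same price as the boring utility

# A hypothetical 100-strike call: near worthless on the utility,
# but carrying a large expected payoff on the biotech.
ev_call = sum(max(px - 100, 0) * p for px, p in biotech.items())
print(round(ev_call))   # 90 per share of expected payoff
```

The stock prices are identical; only the option prices reveal the difference in the distributions.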

If you have a differentiated opinion about a catalyst, the most efficient way to express it will be through options. They map most directly to the reaction you expect. If you think a $100 stock can move $10, but the straddle implies $5, you can make 100% on your money in a short window of time. Annualize that! Go a step further. Suppose you have an even finer view — you can handicap the direction. Now you can score a 5 or 10 bagger allocating the same capital to call options only. Conversely, if you do not have a specific view, then options can be an expensive, low-resolution solution. You pay for specificity just like parlay bets. The timing and distance of a stock’s move must collaborate to pay you off.

So options, whether used explicitly for hedging or for speculating actually conform to a more over-arching definition of hedging — hedges are trades that isolate the investor’s risk.

The Hedging Paradox

If your trades have specific views or reasons, hedging is a good idea. Just like home insurance is a good idea. Whether you are conscious of it or not, owning a home is a bundle of bets. Your home’s value depends on interest rates, the local job market, and state policy. It also depends on some pretty specific events. For example, “not having a flood”. Insurance is a specific hedge for a specific risk. In The Laws Of Trading, author and trader Agustin Lebron states rule #3:

Take the risks you are paid to take. Hedge the others.

He’s reminding you to isolate your bets so they map as closely as possible to your original reason for wanting the exposure.

You should be feeling tense right about now. “Dude, I’m not a robot with a Terminator HUD displaying every risk in my life and how hedged it is!”

Relax. Even if you were, you couldn’t do anything about it. Even if you had the computational wherewithal to identify every unintended risk, it would be too expensive to mitigate3. Who’s going to underwrite the sun not coming up tomorrow? [Actually, come to think of it, I will. If you want to buy galactic continuity insurance ping me and I’ll send you a BTC address].

We find ourselves torn:

  1. We want to hedge the risks we are not paid to take.
  2. Hedging is a cost

What do we do?

Before getting into this I will mention something a certain, beloved group of wonky readers are thinking: “Kris, just because insurance/hedging on its own is worth less than its actuarial value, the diversification can still be accretive at the portfolio level especially if we focus on geometric not arithmetic returns…rebalancing…convexi-…”[trails off as the sound of the podcast in the background drowns out the thought]. Guys (it’s definitely guys), I know. I’m talking net of all that.

As the droplets of caveat settle the room like nerd Febreze, let’s see if we can give this conundrum a shape.

Reconciling The Paradox

This is a cornerstone of trading:

Edge scales linearly, risk scales slower

[As a pedagogical matter, I’m being a bit brusque. Bear with me. The principle and its demonstration are powerful, even if the details fork in practice.]

Let’s start with coin flips:

[A] You flip a coin 10 times, you expect 5 heads with a standard deviation of 1.584.

[B] You flip 100 coins you expect 50 heads with a standard deviation of 5.

Your expectancy scaled with N. 10x more flips, 10x more expected heads.

But your standard deviation (ie volatility) only grew by √10 or 3.16x.

The volatility or risk only scaled by a factor of √N while expectancy grew by N.
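The scaling is easy to verify with the binomial formulas (a minimal sketch in Python):

```python
import math

# Edge scales with N, risk scales with sqrt(N): fair-coin head counts.
for n in (10, 100):
    expected = n * 0.5                 # expected heads grows linearly
    stdev = math.sqrt(n * 0.5 * 0.5)   # binomial stdev grows with sqrt(N)
    print(f"{n:3d} flips: expect {expected:.0f} heads, stdev {stdev:.2f}")
```

Ten times the flips, ten times the expected heads, but only √10 ≈ 3.16x the standard deviation.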

This is the basis of one of my most fundamental posts, Understanding Edge. Casinos and market-makers alike “took a simple idea and took it seriously”. Taking this seriously means recognizing that edges are incredibly valuable. If you find an edge, you want to make sure to get as many chances to harvest it as possible. This has 2 requirements:

  1. You need to be able to access it.
  2. You need to survive so you can show up to collect it.

The first requirement requires spotting an opportunity or class of opportunities, investing in its access, and warehousing the resultant risk. The second requirement is about managing the risk. That includes hedging and all its associated costs.

The paradox is less mystifying as the problem takes shape.

We need to take risk to make money, but we need to reduce risk to survive long enough to get to a large enough number of bets on a sliver of edge to accumulate meaningful profits. Hedging is a drawbridge from today until your capital can absorb more variance.

The Interaction of Trading Costs, Hedging, and Risk/Reward

Hedging reduces variance, in turn improving the risk/reward of a strategy. This comes at a substantial cost. Every options trader has lamented how large of a line-item this cost has been over the years. Still, as the cost of survival, it is non-negotiable. We are going to hedge. So let’s pull apart the various interactions to gain intuition for the trade-offs. Armed with the intuition, you can then fit the specifics of your own strategies into a risk management framework that aligns your objectives with the nature of your markets.

Let’s introduce a simple numerical demonstration to anchor the discussion. Hedging is a big topic subject to many details. Fortunately, we can gesture at a complex array of considerations with a toy model.

The Initial Proposition

Imagine a contract that has an expected value of $1.00 with a volatility (i.e. standard deviation) of $.80. You can buy this contract for $.96 yielding $.04 of theoretical edge.

Your bankroll is $100.

[A quick observation so more advanced readers don’t have this lingering as we proceed:

The demonstration is going to bet a fixed amount, even as the profits accumulate. At first glance, this might feel foreign. In investing we typically think of bet size as a fraction of bankroll. In fact, a setup like this lends itself to Kelly sizing5. However, in trading businesses, the risk budget is often set at the beginning of the year based on the capital available at that time. As profits pile up, contributing to available capital, risk limits and bet sizes may expand. But such changes are more discrete than continuous so if we imagine our demonstration is occurring within a single discrete interval, perhaps 6 months or 1 year, this is a reasonable approach. It also keeps this particular discussion a bit simpler without sacrificing intuition.]

The following table summarizes the metrics for various trial sizes.

What you should notice:

  • Expected value grows linearly with trial size
  • The standard deviation of p/l grows slower (√N)
  • Sharpe ratio (expectancy/standard deviation) is a measure of risk-reward. Its progression summarizes the first 2 bullets…as trials increase the risk/reward improves
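Since the table itself is an image, here is a sketch of how its rows would be computed from the toy contract's $.04 edge and $.80 volatility (the specific trial sizes shown are my assumption):

```python
import math

EDGE = 0.04  # theoretical edge per contract ($)
VOL = 0.80   # standard deviation per contract ($)

def metrics(n_trials):
    ev = EDGE * n_trials            # expectancy grows linearly with N
    sd = VOL * math.sqrt(n_trials)  # p/l stdev grows with sqrt(N)
    return ev, sd, ev / sd          # sharpe = expectancy / stdev

for n in (1, 10, 100, 1000):
    ev, sd, sharpe = metrics(n)
    print(f"N={n:5d}  EV=${ev:7.2f}  stdev=${sd:6.2f}  sharpe={sharpe:.2f}")
```

Note that at 400 trials the sharpe reaches 1.0, a number we will meet again below.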

Introducing Hedges

Let’s show the impact of adding a hedge to reduce risk. Let’s presume:

  • The hedge costs $.01.

    This represents 25% of your $.04 of edge per contract. Options traders and market makers like to transform all metrics onto a per-contract basis. That $.01 could be made up of direct transaction costs and slippage.

    [In reality, there is a mix of drudgery, assumptions, and data analysis to get a firm handle on these normalizations. A word to the uninitiated: most of trading is not sexy stuff, but tons of little micro-decisions and iterations to create an accounting system that describes the economic reality of what is happening in the weeds. Druckenmiller and Buffett’s splashy bets get the headlines, but the magic is in the mundane.]

  • The hedge cuts the volatility in half.

Right off the bat, you should expect the sharpe ratio to improve — you sacrificed 25% of your edge to cut 50% of the risk.

The revised table:


  • Sharpe ratio is 50% higher across the board
  • You make less money.

Let’s do one more demonstration. The “more expensive hedge scenario”. Presume:

  • The hedge costs $.02

    This now eats up 50% of your edge.

  • The hedge reduces the volatility 50%, just as the cheaper hedge did.



  • The sharpe ratio is exactly the same as the initial strategy. Both your net edge and volatility dropped by 50%, affecting the numerator and denominator equally. 

  • Again the hedge cost scales linearly with edge, so you have the same risk-reward as the unhedged strategy you just make less money.

If hedging doesn’t improve the sharpe ratio because it’s too expensive, you have found a limit. Another way it could have been expensive is if the cost of the hedge stayed fixed at $.01 but the hedge only chopped 25% of the volatility. Again, your sharpe would be unchanged from the unhedged scenario but you just make less money.

We can summarize all the results in this chart.
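A sketch of that summary, computing the sharpe for each of the three scenarios (the trial size of 100 is arbitrary; the ranking holds for any N):

```python
import math

# (net edge, volatility) per contract for each scenario
scenarios = {
    "unhedged":        (0.04, 0.80),
    "cheap hedge":     (0.03, 0.40),  # $.01 cost, vol cut in half
    "expensive hedge": (0.02, 0.40),  # $.02 cost, vol cut in half
}

N = 100  # any trial size works; sharpe scales by sqrt(N) in every scenario
sharpes = {name: (edge * N) / (vol * math.sqrt(N))
           for name, (edge, vol) in scenarios.items()}

for name, s in sharpes.items():
    print(f"{name:16s} sharpe(N=100) = {s:.2f}")
```

The cheap hedge lifts the sharpe 50%; the expensive hedge lands exactly back on the unhedged sharpe, just with smaller profits.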

The Bridge

As you book profits, your capital increases. This leaves you with at least these choices:

  1. Hedge less since your growing capital is absorbing the same risk
  2. Increase bet size
  3. Increase concurrent trials

I will address #1 here, and the remaining choices in the ensuing discussion.

Say you want to hedge less. This is always a temptation. As we’ve seen, you will make money faster if you avoid hedging costs. How do we think about the trade-off between the cost of hedging and risk/reward?

We can actually target a desired risk/reward and let the target dictate if we should hedge based on the expected trial size.

Sharpe ratio is a function of trial size:

Sharpe = (E × N) / (σ × √N) = (E/σ) × √N
E = edge
σ = volatility
N = trials

If we target a sharpe ratio of 1.0 we can re-arrange the equation to solve for how large our trial size needs to be to achieve the target.

With E = $.04 and σ = $.80, E/σ = .05, so N = (1/.05)² = 400. If our capital and preferences allow us to tolerate a sharpe of 1 and we believe we can get at least 400 trials, then we should not hedge.

Suppose we don’t expect 400 chances to do our core trade, but the hedge that costs $.01 is available. What is the minimum number of trades we can do if we can only tolerate a sharpe as low as 1?

Using the same math as above with the hedged numbers (E = $.03, σ = $.40, E/σ = .075): (1/.075)² ≈ 178

The summary table:

If our minimum risk tolerance is a 1.5 sharpe, we need more trials:

If your minimum risk tolerance is 1.5 sharpe, and you only expect to do 2 trades per business day or about 500 trades per year, then you should hedge. If you can do twice as many trades per day, it’s acceptable to not hedge.
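These thresholds can be generalized into a small helper (a sketch; the function name is mine):

```python
import math

def min_trials(target_sharpe, edge, vol):
    """Smallest N such that sharpe = (edge*N)/(vol*sqrt(N)) >= target."""
    n = (target_sharpe * vol / edge) ** 2
    return math.ceil(round(n, 6))  # round off float noise before ceiling

# unhedged: edge $.04, vol $.80 | hedged ($.01 cost): edge $.03, vol $.40
for target in (1.0, 1.5):
    unhedged = min_trials(target, 0.04, 0.80)
    hedged = min_trials(target, 0.03, 0.40)
    print(f"target sharpe {target}: unhedged N >= {unhedged}, hedged N >= {hedged}")
```

This reproduces the thresholds in the text: 400 vs 178 trials for a sharpe of 1, and 900 vs 400 trials for a sharpe of 1.5.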

These toy demonstrations show:

  • If you have positive expectancy, you should be trading
  • The cost of a hedge scales linearly with edge, but volatility does not
  • If the cost of a hedge is less than its proportional risk-reduction you have a choice whether to hedge or not
  • The higher your risk tolerance the less you should hedge
  • The decision to dial back the hedging depends on your risk tolerance (as proxied by a measure of risk/reward) vs your expected sample size

Variables We Haven’t Considered

The demonstrations were simple but provide a mental template to contextualize the cost/benefit analysis of risk mitigation in your own strategies. We kept it basic by only focusing on 3 variables:

  • edge
  • volatility
  • risk tolerance as proxied by sharpe ratio

Let’s touch on additional variables that influence hedging decisions.


Bankroll

If your bankroll or capital is substantial compared to your bet size (perhaps you are betting far below Kelly or half-Kelly prescribed sizes) then it does not make sense to hedge. Hedges are negative expectancy trades that reduce risk.

We can drive this home with a sports betting example from the current March Madness tournament:

If you placed a $10 bet on St. Peters, by getting to the Sweet 16 you have already made 100x. You could lock it in by hedging all or part of it by betting against them, but the bookie vig would eat a slice of the profit. More relevant, the $1000 of equity might be meaningless compared to your assets. There’s no reason to hedge, you can sweat the risk. But what if you had bet $100 on St. Pete’s? $10,000 might quicken the ole’ pulse. Or what if you somehow happened upon a sports edge (just humor me) and thought you could put that $10k to work somewhere else instead of banking on an epic Cinderella story? If St. Pete’s odds for the remainder of the tourney are fair, then you will sacrifice expectancy by hedging or closing the trade. If you are rich, you probably just let it ride and avoid any further transaction costs.

If you are trading relatively small, your problem is that you are not taking enough risk. The reason professionals don’t take more risk when they should is not because they are shy. It’s because of the next 2 variables.

Capacity Per Trade

Many lucrative edges are niche opportunities that are difficult to access for at least 2 reasons.

  • Adverse selection

    There might only be a small amount of liquidity at dislocated prices (this is a common oversight of backtests) because of competition for edge. 

    Let’s return to the contract from the toy example. Its fair value is $1.00. Now imagine that related securities are getting bid up and the market for our toy contract is:

 bid  ask
.95 – 1.05

10 “up” (ie there are 10 contracts on the offer and 10 contracts bid for)

Based on what’s trading “away”, you think this contract is now worth $1.10.

Let’s game this out.

You quickly determine that the .95-1.05 market is simply a market-maker’s bid-ask spread. Market-makers tend to be large firms with tentacles in every related market to the ones they quote. It’s highly unlikely that the $1.05 offer is “real”. In other words, if you tried to lift it, you would only get a small amount of size.

What’s going on?

The market-maker might be leaving a stale quote to maximize expectancy. If a real sell order were to come in and offer at $1.00, the market maker might lift the size and book $.10 of edge to the updated theoretical value. 

Of course, there’s a chance they might get lifted on their $1.05 stale offer but they might honor only a couple contracts. This is a simple expectancy problem. If 500 lots come in offered at $1.00, and they lift it, they make $5,000 profit ($.10 x 500 x option multiplier of 100). If you lift the $1.05 offer and they sell you 10 contracts, they suffer a measly $50 loss. 

So if they believe there’s a 1% chance or greater of a 500 lot naively coming in and offering at mid-market then they are correct in posting the stale quote.
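The breakeven arithmetic from the last few paragraphs, as a sketch (treating the pick-off loss as certain, the worst case for the market-maker):

```python
# Expectancy of leaving the stale $1.05 offer up, using the numbers above.
MULT = 100                        # option contract multiplier

p_naive = 0.01                    # chance a 500-lot naively offers at $1.00
gain = 0.10 * 500 * MULT          # lift 500 lots $.10 below $1.10 fair value
loss = 0.05 * 10 * MULT           # honor 10 lots on the stale $1.05 offer

ev = p_naive * gain - loss        # assume the stale offer always gets sniped
print(round(ev, 2))               # ~0: 1% is roughly the breakeven probability
```

Any probability above 1%, or any chance the stale offer survives unsniped, tilts the expectancy positive.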

What do you do?

You were smart enough to recognize the game being played. You used second-order thinking to realize the quote was purposefully stale. In a sense, you are now in cahoots with the market maker. You are both waiting for the berry to drop. The problem is your electronic “eye” will be slower than the market-maker to snipe the berry when it comes in. Still, even if you have a 10% chance of winning the race, it still makes sense to leave the quote stale, rather than turn the offer. If you do manage to get at least a partial fill on the snipe, there’s no reason to hedge. You made plenty of edge, traded relatively small size, and most importantly know your counterparty was not informed!

As a rule, liquidity is poor when trades are juiciest. The adverse selection of your fills is most common in fast-moving markets if you do not have a broad, fast view of the flows. This is why a trader’s first questions are “Do I think I’m the first to have seen this order? Did someone with a better perch to see all the flow already pass on this trade?”

In many markets, if you are not the first you might as well be last. You are being arbed because there’s a better relative trade somewhere out there that you are not seeing.

[Side note: many people think a bookie or market-maker’s job is to balance flow. That can be true for deeply liquid instruments. But for many securities out there, one side of the market is dumb and one side is real. Markets are often leaned. Tables are set when certain flows are anticipated. If a giant periodic buy order gets filled at mid-market or even near the bid, look at the history of the quote for the preceding days. Market-making is not an exercise in posting “correct” markets. It’s a for-profit enterprise.]

  • Liquidity

    The bigger you attempt to trade at edgy prices, the more information you leak into the market. You are outsizing the available liquidity by allowing competitors to reverse engineer your thinking. If a large trade happens and immediately looks profitable to bystanders, they will study the signature of how you executed it. The market learns and copies. The edge decays until you’re flipping million dollar coins for even money as a loss leader to get a look at juicier flow from brokers. 

    As edge in particular trades dwindles, the need to hedge increases. The hedges themselves can get crowded or at least turn into a race.


Leverage

If a hedge, net of costs, improves the risk/reward of your position, you may entertain the use of leverage. This is especially tempting for high-sharpe trades that have low absolute rates of return or edge. Market-making firms embody this approach. As registered broker-dealers they are afforded gracious leverage. Their businesses are ultimately capacity constrained and the edges are small but numerous. The leverage combined with sophisticated diversification (hedging!) creates a suitable if not impressive return on capital.

The danger with leverage is that it increases sensitivity to path and “risk of ruin”. In our toy model, we assumed a Gaussian distribution. Risk of ruin can be hard to estimate when distributions have unknowable amounts of skew or fatness in their tails. Leverage erodes your margin of error.

General Hedging Discussion

As long as hedging, again net of costs, improves your risk/reward there is substantial room for creative implementation. We can touch on a few practical examples.

Point of sale hedging vs hedging bands

In the course of market-making, the primary risk is adverse selection. Am I being picked off? If you suspect the counterparty is “delta smart” (whenever they buy calls the stock immediately rips higher), you want to hedge immediately. This is a race condition with any other market makers who might have sold the calls and the bots that react to the calls being printed on the exchange. That is a point-of-sale hedge: an immediate response to a suspected “wired” order.

If you instead sold calls to a random, uninformed buyer you will likely not hedge. Instead, the delta risk gets thrown on the pile of deltas (ie directional stock exposures) the firm has accumulated. Perhaps it offsets existing delta risk or adds to it. Either way, there is no urgency to hedge that particular deal.

In practice, firms use hedging bands to manage directional risk. In a process similar to our toy demonstration, market-makers decide how much directional risk they are willing to carry as a function of capital and volatility. This allows them to hedge less, incurring lower costs along the way, and letting their capital absorb randomness. Just like the rich bettor who lets the St. Petersburg bet ride.
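The band mechanic can be sketched in a few lines. This toy (made-up trade sizes and band widths) just counts how often you pay to hedge under a tight band versus a wide one:

```python
import random

def hedge_count(band, n_trades=10_000, seed=42):
    """Accumulate random trade deltas and flatten the net position
    only when it breaches +/- band. Returns how many times we paid
    to hedge. Trade sizes and bands are made-up numbers."""
    rng = random.Random(seed)
    net, hedges = 0.0, 0
    for _ in range(n_trades):
        net += rng.choice([-1, 1]) * rng.uniform(0, 100)  # delta from a fill
        if abs(net) > band:
            net = 0.0      # hedge to flat, incurring one round of costs
            hedges += 1
    return hedges

tight = hedge_count(band=100)   # hedge every few trades
wide = hedge_count(band=1000)   # let capital absorb the noise
```

Widening the band trades transaction costs for carried risk; where to set it is the same dial the toy model tunes against capital and volatility.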

In The Risk-Reversal Premium, Euan Sinclair alludes to band-based hedging:

While this example shows the clear existence of a premium in the delta-hedged risk-reversal, this implementation is far from what traders would do in practice (Sinclair, 2013). Common industry practice is to let the delta of a position fluctuate within a certain band and only re-hedge when those bands are crossed. In our case, whenever the net delta of the options either drops below 20 or above 40, the portfolio is rebalanced by closing the position and re-establishing with the options that are now closest to 15-delta in the same expiration.

Part art, part science

Hedging is a minefield of regret. It’s costly, but the wisdom of offloading risks you are not paid for and conforming to a pre-determined risk profile is a time-tested idea. Here’s a dump of concerns that come to mind:

  • If you hedge long gamma, but let short gamma ride you are letting losers grow and cutting winners short. Be consistent. If your delta tolerance is X and you hedge twice a day, you can cut all deltas in excess of X at the same 2 times every day. This will remove discretion from the decision. (I had one friend who used to hedge to flat every time he went to the bathroom. As long as he was regular this seemed reasonable to me.)

  • Low net/high gross exposures are a sign of a hedged book. There are significant correlation risks under the hood. It’s not necessarily a red flag, but when paired with leverage, this should make you nervous. 

  • Are you hedging your daily, weekly, or monthly p/l? Measures of local risk like Greeks and spot/vol correlation are less trustworthy over longer timeframes. Spot/vol correlation (ie vol beta) is not invariant to price level, move size, and move speed. Longer timeframes provide larger windows for these variables to change. If oil vol beta is -1 (ie if oil rallies 1%, ATM vol falls 1%) do I really believe that the price going from 50 to 100 cuts the vol in half?

  • There are massive benefits to scale for large traders who hedge. The more flow they interact with, the more opportunity to favor anti-correlated or offsetting deltas because it saves them slippage on both sides. They turn everything they trade into a pooled delta or several pools of delta (so any tech name will be re-computed as an NDX exposure, while small-caps will be grouped as Russell exposures). This is efficient because they can accept the noise within the baskets and simply hedge each of the net SPX, NDX, and IWM exposures to flat once they reach specified thresholds.

    The second-order effect of this is subtle: it recursively makes markets more efficient. The best trading firms have the scale to bid closest to the clearing price for diversifiable risk6. This, in turn, allows them to grab even more market share, widening their advantage over the competition. If this sounds like big tech7, you are connecting the dots. 
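On the vol-beta point above, the arithmetic is worth making explicit. A minimal sketch (hypothetical numbers: 40% starting vol, a -1 relative vol beta) that naively compounds the local beta over the log-move from 50 to 100:

```python
import math

def extrapolated_vol(vol0, beta_pct, spot0, spot1):
    """Naively compound a local, relative vol beta over the log-return
    of a large spot move. Illustrative only: real vol betas are local
    estimates, not invariant to price level, move size, or speed."""
    one_pct_steps = 100 * math.log(spot1 / spot0)  # ~69 steps for a double
    return vol0 * math.exp(beta_pct / 100 * one_pct_steps)

# 40% oil vol, vol beta -1: each 1% rally knocks 1% (relative) off vol.
v = extrapolated_vol(vol0=0.40, beta_pct=-1.0, spot0=50, spot1=100)
# Compounded over 50 -> 100, the naive extrapolation halves vol to 20%.
```

That halving is exactly the implausible conclusion the question flags: the local estimate does not survive a move of that size.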
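The pooling mechanic described above can also be sketched simply; all names, bucket mappings, and thresholds here are hypothetical:

```python
# Map each name to its hedge bucket; anything unmapped defaults to SPX.
# Names, mappings, and the threshold are all hypothetical.
POOL_OF = {"AAPL": "NDX", "MSFT": "NDX", "TINY_CAP": "IWM"}

def pooled_hedges(positions, threshold=500):
    """positions: {name: share deltas}. Net deltas by pool and return
    the flattening hedge for any pool breaching the threshold."""
    pools = {}
    for name, delta in positions.items():
        bucket = POOL_OF.get(name, "SPX")
        pools[bucket] = pools.get(bucket, 0) + delta
    return {p: -net for p, net in pools.items() if abs(net) > threshold}

hedges = pooled_hedges({"AAPL": 800, "MSFT": -600, "TINY_CAP": 900, "KO": 100})
# NDX nets to +200 (inside the band, no trade); IWM needs a -900 hedge.
```

The offsetting AAPL and MSFT deltas never touch the market, which is the slippage savings that scale buys.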

Wrapping Up

The other market-makers in the product options pit were not wrong to hedge or close their trades as quickly as they did. They just had different constraints. Since they were trading their own capital, they tightly managed the p/l variance.

At the same time, if you were well-capitalized and recognized the amount of edge raining down in the market at the time, the ideal play was to take down as much risk as you could and either find a hedge with more basis risk (and therefore lower cost, since the more highly correlated hedges were bid for) or simply let the firm’s balance sheet absorb it.

Since I was being paid as a function of my own p/l there was not perfect alignment of incentives between me and my employer (who would have been perfectly fine with me not hedging). If I made a great bet and lost, it would have been the right play but I personally didn’t want to tolerate not getting paid.

Hedging is a cost. You need to weigh it against the benefit, and that artful equation is a function of:

  • risk tolerance at every level of stakeholder — trader, manager, investor
  • capital
  • edge
  • volatility
  • liquidity
  • adverse selection

Maximizing is uncomfortable. Almost unnatural. It calls for you to tolerate larger swings, but it allows the theoretical edge to pile up faster. This post offers guardrails for dissecting a highly creative problem.

But if you consistently make money, ask yourself how much you might be leaving on the table. If you are making great trades somewhere, are you locking them in with bad trades? If you can’t tell which side is the good one, that’s ok.

But if you know the story of your edge, there’s a good chance you can do better.