Do Professional Investors Understand Fees?

Fees Are In Focus


Giant fund manager/brokerages like Vanguard and Fidelity have made fees front and center. Like Walmart, if you are the lowest cost provider and wield blue whale scale, you are going to compete on price. Competition has spurred a race to the bottom on fees. With many investment choices commoditized, the focus on fees has served customers well. 

If I wanted to nit-pick, I might say investors don’t fully account for more opaque fees when choosing funds. These can swamp the management fees. Turnover, slippage costs, borrowing costs and abysmal sweep account rates all have significant impacts on net performance. These hidden costs are not easily reduced to a number that can be compared to a management fee. Hint: it’s a good place to search for how managers are able to drive fees to zero. But that’s a digression. I’m not especially interested in retail. Their financial advisors are doing a good job using steak and wine to box out the fund managers. There’s only so much fee to go around.


Allocators have a more difficult job. They devote teams to parsing alternative investments. A sea of private investments and complex hedge fund strategies. Within that context the allocators must construct portfolios that trade off tolerable risks against the probability of meeting their mandates.

The allocators rummage through a diverse mix of strategies each with their own mandates. Growth, wealth preservation, defensive, hedged alpha. A fund can be thought of as a payoff profile with an associated risk profile. A thoughtful allocator is crafting a portfolio like a builder. They want to know how the pieces interlock so the final product is useful and can withstand the eventual earthquake. 

A builder cannot think of materials without considering cost. Wood might make for a better floor than vinyl but at what price would you accept the inferior material? When builders estimate their costs they must consider not only the materials, but transportation costs and how the cost of labor may vary with the time required to install the material. 

So let’s go back to the allocators. If the menu they were choosing from wasn’t complicated enough, they must also evaluate the costs. This is a daunting topic. They face all the opaque costs the retail investors face. But since they are often investing in niche or custom strategies that are not necessarily under a public spotlight they have additional concerns. A basic due diligence process would review:

  • Which costs are allocated to the GPs vs the LPs
  • Liquidity schedules
  • Fund bylaws
  • Specific clauses like “most-favored-nations”
  • Netting risks

Unlike their retail counterparts, the professional investor’s day job is devoted not just to investments but to terms. Like our builder, this cannot be done faithfully without understanding the costs. Mutual funds sport fixed fees but complex investments often have incentive fees (a fee that is charged as a percentage of performance, sometimes with a hurdle), making them harder to evaluate. Regretfully, I suspect a meaningful segment of pros do not have a strong grasp on how fees affect their investments.

Understanding Fees

While it is challenging to price many of the features embedded in funds’ offering documents, there is little excuse for not understanding fees whether they are fixed or performance-based.  After all, if you are an investor this is one of the most basic levers that affect your net performance and does not rely on having skills. It’s a classic high impact, easy to achieve objective. It’s the best box in that prioritization matrix that floats around consulting circles. 

Let’s take a quick test. 

You have a choice to invest in 2 funds that have identical strategies.

They have the same Sharpe ratio of .5

There are 2 differences between the funds. The fee structure and volatility.

                     Fund A    Fund B
Expected Return      5%        15%
Annual Volatility    10%       30%
Annual Fee           1%        2%

Let’s assume the excess volatility is simply a result of leverage and that the leverage is free.

Which fund do you choose?

Normalizing Fees By Volatility

The correct way to think about this is to adjust the fee for volatility.

  • Fund A’s fee is 10% of its volatility (1% / 10%).
  • Fund B’s fee is 6.7% of its volatility (2% / 30%).

If you doubt that Fund B is cheaper from this reasoning you could simply sell Fund A and buy 1/3 as much of Fund B.

Let’s use real numbers. Suppose you had a $300,000 investment in Fund A. You would be paying 1% or $3,000 in fees.

Instead, invest $100,000 in Fund B. Your expected annual return and volatility in dollar terms would remain the same, but you would only pay 2% of $100k in fees or $2,000. Same risk/reward for 2/3 the price. Compound that.
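The arithmetic above can be sketched in a few lines. This is only an illustration using the numbers from the table; the helper names (`fee_per_unit_vol`, `equal_risk_dollars`) are mine, and it leans on the same assumption as the text that the extra leverage is free.

```python
# Fund terms from the table above; leverage is assumed to be free.
FUND_A = {"expected_return": 0.05, "vol": 0.10, "fee": 0.01}
FUND_B = {"expected_return": 0.15, "vol": 0.30, "fee": 0.02}


def fee_per_unit_vol(fund):
    """Annual fee divided by annual volatility."""
    return fund["fee"] / fund["vol"]


def equal_risk_dollars(dollars, from_fund, to_fund):
    """Dollars in to_fund carrying the same dollar volatility as dollars in from_fund."""
    return dollars * from_fund["vol"] / to_fund["vol"]


dollars_a = 300_000
dollars_b = equal_risk_dollars(dollars_a, FUND_A, FUND_B)

for name, fund, dollars in (("A", FUND_A, dollars_a), ("B", FUND_B, dollars_b)):
    print(
        f"Fund {name}: ${dollars:,.0f} invested, "
        f"expected gain ${dollars * fund['expected_return']:,.0f}, "
        f"dollar vol ${dollars * fund['vol']:,.0f}, "
        f"fee ${dollars * fund['fee']:,.0f}, "
        f"fee/vol {fee_per_unit_vol(fund):.3f}"
    )
```

The expected gain and the dollar volatility line up across the two rows. Only the fee differs.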

I am not alone in this observation. In his book Leveraged Trading, Rob Carver echoes that a fund’s fees can only be discussed in context with its volatility:

I calculate all costs in risk-adjusted terms: as an annual proportion of target risk. For target risk of 15%/year and costs of 1.5%/year, your risk-adjusted costs are 1.5%/15% = 0.10. This is how much of your gross Sharpe ratio will get eaten up by costs.
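Carver’s rule of thumb is easy to apply to the two funds from the quiz. A minimal sketch, assuming the .5 Sharpe ratio quoted earlier is gross of fees and that subtracting the risk-adjusted cost is a fair approximation of the net Sharpe; the function names are my own.

```python
def risk_adjusted_cost(annual_cost, annual_vol):
    """Cost as a proportion of annual risk, as in the quote above."""
    return annual_cost / annual_vol


def net_sharpe(gross_sharpe, annual_cost, annual_vol):
    """Gross Sharpe minus the slice eaten by costs (assumes the Sharpe is gross of fees)."""
    return gross_sharpe - risk_adjusted_cost(annual_cost, annual_vol)


carver_example = risk_adjusted_cost(0.015, 0.15)   # the 1.5% / 15% case from the quote
sharpe_a = net_sharpe(0.5, 0.01, 0.10)             # Fund A: 1% fee on 10% vol
sharpe_b = net_sharpe(0.5, 0.02, 0.30)             # Fund B: 2% fee on 30% vol

print(f"Carver example: {carver_example:.2f} of Sharpe lost to costs")
print(f"Fund A net Sharpe: {sharpe_a:.3f}")
print(f"Fund B net Sharpe: {sharpe_b:.3f}")
```

The fund with the higher headline fee keeps more of its Sharpe ratio.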


A Clue That Some Allocators Get This Wrong

Allocators will often target lower vol products for the same fee when a higher vol fund would do. To be fee-efficient they should prefer that managers run their strategies at a prudent maximum volatility, ideally at some point before they become overlevered or introduce possible path problems. There are many funds and CTAs that would just as easily target higher volatility for the same fee. Investors would be better off for 2 reasons:

  • Allocators could reduce their allocations

As we saw in the Fund B example, it is more fee-efficient for vol targeting to be done at the allocation level not the fund level.

  • Limit cash drag.

They would stop paying excess fees for a fund that had been forced to maintain large cash reserves since it was targeting a sub-optimal volatility. Why would an allocator be ok with paying fees for funds that are holding excessive t-bills?

If you are not convinced that investors’ preference for lower vol versions of strategies demonstrates a lack of fee numeracy then check out this podcast with allocator Chris Schindler.  As an investor at the highly sophisticated Ontario Teachers Pension he witnessed firsthand the folly of his contemporaries’ thinking around fees. While mingling at conferences he would hear other investors bragging that they never pay fees above a certain threshold.

As we saw from our example, these brags are self-skewers, revealing how poorly these investors understood the relationship between fees and volatility. Not surprisingly, these very same investors would be invested in bond funds and paying optically low nominal fees. Sadly, once normalized for volatility, these fees proved to be punitively high.

This brings us to our next section. How would you like to pay for low volatility or defensive investments?

Tests to Compare Fixed Fee Funds with Incentive Fee Funds

A Low Volatility Example

Let’s choose between 2 identical funds which only vary by the fee structure.

Both funds expect to return 5% and have a 5% volatility. Yes, a Sharpe ratio of 1.

  • Fund A charges a fixed .75%
  • Fund B charges 10% of performance from when you invest. Fund B has a high watermark that crystallizes annually.

Which fund do you choose?

A Large Cap Equity Example

This time let’s choose between funds that have SPX-like features.

Both funds expect to return 7% and have a 16% volatility.

  • Fund A again charges a fixed .75%
  • Fund B again charges 10% of performance from when you invest. Fund B has a high watermark that crystallizes annually.

Which fund do you choose?

Studying The Impact Of Fee Structure

I wrote simulations to study the impact of fees on the test examples.

The universal setup:

  • Each fund holds the exact same reference portfolio
  • 10-year simulation using monthly returns
  • Random monthly returns drawn from a normal distribution
  • 1000 trials
  • Fixed Fee Fund charges .75% per year deducted quarterly
  • Incentive Fee Fund charges 10% of profits crystallized annually
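For anyone who wants to tinker, here is a rough sketch of what a setup like this could look like. It is not the code behind the charts. The mechanics are my assumptions: both funds start at a NAV of 1, the fixed fee comes out as .75%/4 of NAV at each quarter end, and the incentive fee is 10% of gains above the high watermark at each year end, with the watermark reset to the post-fee NAV.

```python
import random

FIXED_FEE = 0.0075      # .75% per year, deducted quarterly
INCENTIVE_RATE = 0.10   # 10% of profits above the high watermark
YEARS = 10


def cagr(final_nav, years):
    """Compound annual growth rate from a starting NAV of 1."""
    return final_nav ** (1.0 / years) - 1.0


def run_path(monthly_returns):
    """Return (portfolio, fixed fee fund, incentive fee fund) CAGRs for one path."""
    gross = fixed = incentive = 1.0
    high_watermark = 1.0
    for month, r in enumerate(monthly_returns, start=1):
        gross *= 1.0 + r
        fixed *= 1.0 + r
        incentive *= 1.0 + r
        if month % 3 == 0:                                   # quarter end
            fixed *= 1.0 - FIXED_FEE / 4.0
        if month % 12 == 0 and incentive > high_watermark:   # year end
            incentive -= INCENTIVE_RATE * (incentive - high_watermark)
            high_watermark = incentive                       # reset to post-fee NAV
    years = len(monthly_returns) / 12.0
    return cagr(gross, years), cagr(fixed, years), cagr(incentive, years)


def simulate(monthly_mean, monthly_sd, trials=1000, seed=0):
    """Run independent trials of normally distributed monthly returns."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        path = [rng.gauss(monthly_mean, monthly_sd) for _ in range(YEARS * 12)]
        results.append(run_path(path))
    return results


# Example: the low-volatility case (5% annual return, 5% annual vol).
low_vol = simulate(0.05 / 12, 0.05 / 12 ** 0.5)
share = sum(f > i for _, f, i in low_vol) / len(low_vol)
print(f"Fixed fee fund outperformed in {share:.0%} of low-vol trials")
```

Plotting the fixed fee CAGR minus the incentive fee CAGR against the portfolio CAGR for each trial gives a scatter of fee-driven relative performance. Different crystallization or fee-accrual conventions will shift the exact numbers.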

Case 1: Low-volatility 

Simulation parameters:

  • Monthly mean return of .42% (5% annual)
  • Monthly standard deviation of 1.44% (5% annually)
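The monthly parameters are consistent with the usual scaling convention: divide the annual return by 12 and the annual volatility by the square root of 12. A quick check, assuming that convention is what was used (`to_monthly` is just an illustrative helper):

```python
import math


def to_monthly(annual_return, annual_vol):
    """Scale the annual mean linearly and the annual volatility by sqrt(12)."""
    return annual_return / 12.0, annual_vol / math.sqrt(12.0)


monthly_mean, monthly_sd = to_monthly(0.05, 0.05)
print(f"monthly mean {monthly_mean:.2%}, monthly sd {monthly_sd:.2%}")
```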

This chart plots the annual outperformance of the fixed fee fund over the incentive fee fund against the annualized return of the portfolio they both own. The relative performance of the 2 funds is due to fees alone.


  • It takes a return of about 7% or higher for the fixed fee fund to outperform.
  • This makes sense. A 75 bp fee is difficult to overcome for a 5% vol asset.
  • If the asset returns 5% the performance fee would only be 50bps, and we can see how the difference in fees approximates the underperformance of the fixed fee fund at the 5% level of returns.

Case 2: Large Cap Equity Example

The universal setup remains the same. 

We modify the simulation parameters:

  • Monthly mean return of .58% (7% annual)
  • Monthly standard deviation of 4.62% (16% annually)


  • Most of the time the fixed fee fund outperforms. So long as the return is north of about 4% this is true.
  • The most the fixed fee fund can underperform is by the amount of the fixed fee. Consider the case in which both portfolios lose value every year. The incentive fee fund will never charge a fee, while you will get hit by the 75bps charge in the fixed fee fund. You can see these cases in the negative points on the left of the chart where the portfolio realizes an annual CAGR of -5%.
  • Conversely, the incentive fee can be very expensive since it captures a percentage of the upside. In cases where the underlying portfolio enjoys +20% CAGRs, the simple fixed fee fund is outperforming by about 150 bps per year. 

Bonus Case: The High Volatility Fund

Finally I will show the output for a low Sharpe, high volatility fund.

The universal setup remains the same. 

We modify the simulation parameters:

  • Monthly mean return of .42% (5% annual)
  • Monthly standard deviation of 10.10% (35% annually)


  • This case demonstrates how complicated the interactions of fees and volatility are. The fixed fee fund will massively outperform by even as much as 200bps per year when the portfolio compounds at 20% annually.
  • The fixed fee fund even outperforms at low to mid single-digit returns albeit modestly. 
  • The high volatility nature of the strategy means lots of negative simulations, thanks to geometric compounding (for further explanation I discuss it here). When a fund performs poorly you pay less incentive fees so it’s not surprising that in many of these cases the fixed fee fund underperforms by nearly the entire amount of the management fee.


Fixed Fees

  • Best when the volatility of the strategy is high and the returns are strong (again you are warned: most high volatility strategies don’t have strong returns because of geometric compounding).
  • The most a fixed fee investor can underperform an incentive fee investor is by the amount of the fixed fee.

Incentive Fees

  • Best when the strategy is low volatility or returns are negative. Or the asset is defensive in nature. For hedges or insurance like funds, you may prefer to pay a performance fee to minimize bleed.
  • The amount an incentive fee investor can underperform is technically unbounded since it’s a straight percent of profits.


  • Fee structures must be considered relative to the volatility and goals of the strategy. There are no absolutes. 
  • By dividing fixed fees by the fund’s volatility you can normalize and therefore compare fund fees on an apples-to-apples basis. Even seemingly low fixed fees can be very expensive when charged on low volatility funds. 
  • Incentive fees look like long options to the manager (which implies the investor is short this option). The investor has unbounded potential to underperform a fixed fee solution and can only outperform by the amount of the fixed fee (the left hand side of those charts). To further study the embedded optionality of incentive fees see Citigroup’s presentation.
  • Incentive fees are meant to align investors and management. Who can argue with “eat what you kill”? But they can also create bad incentives. If trapped below the high watermark, the manager has nothing to lose and may swing for the fences irresponsibly. In addition, a staff working at a fund that is underwater might be dusting off their resumes instead of focusing on getting back on track knowing that they need to work through uncompensated p/l before they see another bonus. 
  • Fixed fees can encourage management to diversify or hold more cash to lower the fund volatility. These maneuvers can be combined with heavy marketing in a strategy more colloquially known as “asset-gathering”.


Fees need to be considered in light of the strategy. This requires being thoughtful to understand the levers. Unless you are comparing 2 SP500 index funds, it’s rarely as simple as comparing the headline fees. If we all agree that fees are not only critical components of long-term performance but also one of the few things an allocator can control, then misunderstanding them is just negligent. A one size fee doesn’t fit all alternative investments, so a one size rule for judging fees cannot make sense either. Compared to the difficulty of sourcing investments and crafting portfolios, getting smart about fees is low-hanging fruit.

How Tails Constrain Investment Allocations

You would need to be living under a rock to not know about the importance of small probabilities on asset distributions. By 2020, every investor has been Talebed to death by his golden hammer. But knowing and understanding are not the same. I know it’s painful to give birth. But if I claimed more than that I’d end up only understanding what it felt like to be slapped in the face.

I’m hoping the above discussion of the devilish nature of small probabilities makes the seemingly academic topic of fat-tails more visceral. But if it didn’t I’m going to try to drive it home in the context of a real-life investing decision.

Step 1: Understand the impact of fat tails

I ran a simple Monte Carlo assuming the SPX has a 7% annual return (or “drift” if you prefer to sound annoying). I assumed a 16% annual vol or standard deviation and ran a lognormal process since we care about geometric returns. We’ll call this model the “naive simulation”. It does not have fat tails.

Based on these parameters, if you invest on January 1st:

  • You have a 5% chance of being down 23% at some point during the year.
  • You have a 50% chance of being down 7% at some point during the year.

Now be careful. These are not peak-to-trough drawdowns. They are actually a subset of drawdown since they are measured only with respect to your Jan 1st allocation. The chance of experiencing peak-to-trough drawdown of those sizes is actually higher, but these are the chances of your account being X% in the red.
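You can sanity-check the naive numbers without a simulation. The sketch below swaps the Monte Carlo for the standard closed-form result for the running minimum of a Brownian motion with drift (the reflection principle). Two assumptions of mine: the account is monitored continuously, and the 7% is an arithmetic drift, so the log drift is 7% minus half the variance.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_down_at_some_point(loss, drift=0.07, vol=0.16, years=1.0):
    """P(account is `loss` below its starting value at some point) under a lognormal process."""
    nu = drift - 0.5 * vol ** 2      # log drift (assumes 7% is the arithmetic drift)
    b = math.log(1.0 - loss)         # barrier in log space, a negative number
    s = vol * math.sqrt(years)
    return (
        norm_cdf((b - nu * years) / s)
        + math.exp(2.0 * nu * b / vol ** 2) * norm_cdf((b + nu * years) / s)
    )


for loss in (0.23, 0.07):
    print(f"P(down {loss:.0%} at some point in the year) = {prob_down_at_some_point(loss):.1%}")
```

With these inputs the probabilities land in the neighborhood of the 5% and 50% quoted above. A simulation with discrete daily or monthly steps will come in slightly lower because it can miss intraperiod touches.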

That’s the naive simulation. To estimate the odds in a fat-tailed distribution we can turn to the options market which implies negative skewness and excess kurtosis (ie fat tails). I used 1-year option prices on SPY. Option prices answer the question, “what are the chances of expiring at different prices?” not “what are the chances of returning X at any point in the next year?”. To estimate what we want we will need to use the pricing from strikes that correspond to the equivalent one-touch option. Walking through that is overkill for this purpose but hit me offline if you want to see how I kluged it.

Let’s cut to the market-implied odds.

  • You have a 5% chance of being down 39% at some point during the year.
  • You have a 50% chance of being down 11% at some point during the year.

Now you can see the impact of fat-tails: the gap between 23% and 39%. This is the impact of kurtosis in the options. Meanwhile, in the heart of the distribution, the downside moves from 7% to 11%. Not as dramatic and attributable to market skew.

When we shift probabilities in the tails of a distribution rather than in the meat, the impact on the payoffs is significant.

Repeating this insight in a different way may help your understanding. Consider tossing a pair of dice. Imagine playing a game that pays the fair odds for a roll (i.e. craps).

Now let’s chip the dice to change the probability of how they land.

  • In scenario 1, add 1% to the “7” and shave .5% from each tail.
  • In scenario 2, add 1% to the “7” and shave .5% from the meat, the “6” and “8”

By shaving from the tails we take a fair game and turn it into a negative 30% expected value per toss. This is far worse than almost any casino game you might play. By changing the tail probabilities the effect on the game is magnified because the odds are multiplied across an inversely proportional payoff!
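Here is one way to reproduce that figure. It rests on my reading of the game: you stake one unit on each affected total and get paid at the fair odds of unchipped dice, so a bet on a total with fair probability p returns 1/p when it hits. Exact fractions keep the arithmetic clean.

```python
from fractions import Fraction
from itertools import product

# Fair probabilities of each total from two six-sided dice.
fair = {}
for a, b in product(range(1, 7), repeat=2):
    fair[a + b] = fair.get(a + b, Fraction(0)) + Fraction(1, 36)


def ev_per_toss(shifts):
    """EV of staking one unit on each shifted total, paid at the fair odds."""
    # A unit stake at fair odds returns 1 / p_fair on a hit, so moving the true
    # probability by d changes that bet's expected value by d / p_fair.
    return sum(d / fair[total] for total, d in shifts.items())


scenario_1 = {7: Fraction(1, 100), 2: Fraction(-1, 200), 12: Fraction(-1, 200)}
scenario_2 = {7: Fraction(1, 100), 6: Fraction(-1, 200), 8: Fraction(-1, 200)}

print(f"Scenario 1 (shave the tails): {float(ev_per_toss(scenario_1)):+.1%}")
print(f"Scenario 2 (shave the meat):  {float(ev_per_toss(scenario_2)):+.1%}")
```

The same half-percent shave costs far more on the 2 and 12 because each lost hit would have paid 36 units back instead of 7.2.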

Step 2: How should tail sensitivity affect allocations?

By now, the danger of poorly estimating tail probabilities should be a bit more clear. How do we use this when making allocation decisions? After all, most of the time, whether they are 1% or 2% events, huge moves are not in play. But we must care because when these events hit the impact is huge.

Tail outcomes should dictate constraints based on what you can tolerate. I’ll work through a conservative framework so you can see the impact of naive tail probabilities versus market-implied tail probabilities. The exact answers don’t matter but I’m hopefully offering a way to make tail-thinking relevant to your allocation decisions.

Reasoning through sizing decisions

Suppose things are going well and you are able to save $50,000 per year after paying expenses. You decide that losing $50,000 in the stock market is the largest loss you can accept, reasoning that it’s a year’s worth of savings and that you could make up the lost sum next year. If you impose a constraint like that, well, the most you can allocate to stocks is $50,000. That’s too conservative, especially if you have accumulated several hundred thousand dollars in savings.

So you must relax your tolerance. You decide you are willing to accept a $50,000 loss 5% of the time or 1 in 20 years. Roughly a generation. If we use the naive model’s output that we lose 23% of our investment with 5% likelihood then the maximum we can allocate to stocks is $50,000/.23 = $217,000.

The naive model says we can allocate $217k to stocks and satisfy our tolerance of losing $50k with 5% probability. But if the market’s fat-tails are implied more accurately by the option skew, then our max allocation can only be $128k ($50,000/.39).

If we constrain our allocation by our sensitivity to extreme losses, the max allocation is extremely sensitive to tail probabilities. In this example, we simply varied the tail probability between a naive model using a mean and variance to a market-implied model which adjusted for skew and kurtosis. The recommended allocation based on our tolerance dropped a whopping 41% from $217k to $128k.
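The sizing rule here is just the tolerable loss divided by the loss you expect at your chosen probability. A minimal sketch with the numbers from this section; `max_allocation` is an illustrative name.

```python
def max_allocation(max_loss_dollars, loss_at_tolerance):
    """Largest position whose loss at the chosen probability stays within the dollar limit."""
    return max_loss_dollars / loss_at_tolerance


naive = max_allocation(50_000, 0.23)     # naive lognormal model: down 23% with 5% probability
implied = max_allocation(50_000, 0.39)   # option-implied model: down 39% with 5% probability
reduction = 1.0 - implied / naive        # about 41% with these inputs

print(f"naive model:    ${naive:,.0f}")
print(f"market-implied: ${implied:,.0f}")
print(f"reduction:      {reduction:.0%}")
```

Notice the reduction depends only on the ratio of the two tail losses (1 - 23/39), not on the $50,000.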

Many will point out that this approach is extremely conservative. Constraining your max loss tolerance to the amount of money you can save in a year seems timid. But the probabilities we used here did understate the risk. Again these were not peak-to-trough drawdown probabilities but the narrower chance of incurring losses on your start of year allocation. If we are thinking about the true experience of investing and how you actually feel it, you probably want to consider the higher drawdown probabilities which are out of scope for a piece like this. I know many financial advisors read this letter, and I’m curious how their allocation models reason through risk tolerance.

Current examples to consider in context of small probabilities

1) Bernie

There are market watchers who believe that electing Bernie Sanders would send us back to living in caves. Democrats are trading for about 40% to win the election. Bernie is trading at about 45% to win the nomination, implying an 18% chance to win the election. Market watchers who fear a Bernie presidency are either totally overstating his alleged market impact or the market is already discounting his odds. If the latter is true and the market is efficient, math dictates that it should shoot much higher in the event he loses.
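To see why the math forces a rally if the feared outcome fails to materialize, treat today’s price as a probability-weighted average of two states. The 40% and 45% come from the text, and multiplying them is the same simplification the text makes. The 30% drop is a purely hypothetical placeholder of mine, not a forecast, and `rally_if_he_loses` is an illustrative helper.

```python
P_DEMOCRATS_WIN = 0.40     # quoted in the text
P_BERNIE_NOMINEE = 0.45    # quoted in the text

p_bernie_president = P_DEMOCRATS_WIN * P_BERNIE_NOMINEE


def rally_if_he_loses(p_win, drop_if_win):
    """Implied jump if today's price is the probability-weighted average of two states."""
    # price_today = p_win * (1 - drop_if_win) * level_if_lose + (1 - p_win) * level_if_lose
    #             = level_if_lose * (1 - p_win * drop_if_win)
    return 1.0 / (1.0 - p_win * drop_if_win) - 1.0


HYPOTHETICAL_DROP = 0.30   # illustrative assumption only, not a figure from the text

print(f"Implied chance of a Bernie presidency: {p_bernie_president:.0%}")
print(f"Implied rally if he loses: {rally_if_he_loses(p_bernie_president, HYPOTHETICAL_DROP):.1%}")
```

The bigger the drop you believe in, the bigger the relief rally the current price must already embed.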

At 18%, Bernie is no longer in the tail of the distribution. So you could argue that as he went from single-digit probability to his current chances, either the market strongly re-calibrated his impact or the sustained rally in the meantime would otherwise have been much larger. One of these things must have happened, by mathematical necessity, as his odds shifted from a few percent to 18%.

Or there is a third option. The market never really believed that Bernie’s impact would be as deep as his detractors contend.

2) Tesla

We have all seen this stock double in the past month. There has been a lot of talk about far out-of-the-money call options trading on the stock. These are bets on the upside tails of the stock over relatively short time frames. I won’t comment too much on that other than to point out a different tail in the matter. All the credit for this observation goes to a friend who keenly remembered that a year ago the Saudis collared their position in TSLA. That means they bought puts and financed them by selling calls on the stock. Given the size of the move, the calls they sold are definitely deep in the money. This hedge likely cost them over 3 billion dollars. Billion with a “b”. That’s 6% of their projected government deficit. Their investment in TSLA stock was supposed to be a tail hedge against electric cars destroying demand for oil permanently. In the meantime, they got smoked hedging the hedge. The other tail in this story is going to be that of the official who recommended the hedge. This is a government that nearly executed a 13-year old for protesting. Fair warning to anyone looking to be an execution trader for the kingdom. You are probably short the mother of all puts. Make sure you are getting paid at least as much as a logger.

And one last TSLA note. This keen observation by Professor Bakshi.

Sometimes Keynes’ beauty contest doesn’t just judge beauty. It can create it.

Market Mutations

I recently described markets as biology not physics in recognition of how players adapt. Let’s discover 2 more opaque examples and their causes.

1) Structured products

Historically your bank would happily sell you an investment note which guarantees your principal (insofar as you are ok with your bank’s credit risk) and earns you a return which is linked to the return of an equity index. To manufacture this investment product the bank would invest in bonds and a portion of the interest income would be directed to buy call options on the index. There are more shortcuts they use to create the product (for example, the investor typically doesn’t capture the dividends which are a significant portion of the expected return), but the important thing to understand is these notes require enough interest income to finance the call options. With interest rates near zero in most of the world, banks have had to get more…creative.

To keep these notes promising attractive rates of return, the issuers buy insurance against a sell-off from the investors. Not explicitly of course. Instead they embed a feature that “knocks” your note out and exposes you to the losses if the reference index falls far enough. Yes, the prospectus spells this out. But for whatever reason, retail investors fail to wonder why an investment product can offer seemingly attractive returns in a low risk-free rate environment. They continue to gobble them up, not realizing they are self-financing these returns by underwriting catastrophic risk.

Here’s where it gets interesting. Since interest rates have never been this low and the aging populations of developed nations have never been this large, there is unprecedented demand for these notes. These products are intensely popular in Asia and Europe (a friend once quipped you could buy them at a 7-11 in Italy. I want to believe this because it sounds so ridiculous so I refuse to fact-check it). The issuing banks, who are not in the business of taking directional or outright volatility risk, must recycle the optionality that these notes spit off. The associated option flows from these popular products are correspondingly massive.

From a “market is biology” perspective, it’s useful to remember that anybody using historical data to make their case may not be fully appreciating that our current landscape includes a bunch of dormant, non-linear payoffs that kick in only when the market has already made a large down move. An extreme analogy would be like comparing NFL wide receivers through time without noticing that they got rid of pass interference rules.

Although the bulk of these notes have historically been tied to Asian indices like Korea, they are becoming increasingly linked to the SP500. Will the tail wag the dog? Let options fund manager Benn Eifert explain on his latest appearance on the Bloomberg Odd Lots episode titled How To Create Havoc In The U.S. Options Market. (Link)

2) How corporate governance responds to the age of passive indexing

Consider these points taken from Farnam Street Investments’ latest letter. (Link)

  • In 1965, the CEO-to-worker pay ratio was 20-to-1. By 2018, it had jumped to 278-to-1. How did pay structures get so lopsided? Shouldn’t someone have stepped in? Yes, someone should have stepped in: the owners of the companies. But if you’re a passive index holder, you abdicated that responsibility to Vanguard, Blackrock, State Street or Fidelity. It wasn’t a custodian like Vanguard’s job to mind the henhouse. It was the job of the owners of the company.

Hard Truth: If you own an index fund, you waive your right to complain about CEO compensation.

  • In 2019, Lyft went public. With the increased transparency of SEC filing, it was discovered the company had 46 million restricted stock units (RSU) outstanding. RSUs are a way to incentivize employees, but they can become a big bill for owners. In the case of Lyft, the RSUs would cost owners $2-4 billion, depending on the IPO price. This represented a 20-25% ownership stake of the company being granted to employees. Corporations who grant extravagant stock options do so at the expense of the owners. There are no free lunches.

Hard Truth: If you own an index fund, you waive your right to complain about option dilution.

  • From 2008-2017, the pharmaceutical giant Merck distributed 133% of profits back to shareholders via dividends and share buybacks. Yes, they paid out more than they took in. Those resources could have gone toward research, saving lives, and the next blockbuster drug. The strategy seems obviously shortsighted. How come no one stepped up to tell them to think long term? Analysis initiated by SEC Commissioner Robert Jackson Jr. revealed that in the eight days following a buyback announcement, executives on average sold five times as much stock as they had on an ordinary day. Management is effectively cashing out at the owners’ expense when they know the price will be supported by internal buybacks. How come no one is stopping them?

Hard Truth: If you own an index fund, you waive the right to complain about myopic corporate strategy and share buybacks.

  • Sir Winston Churchill once said, “Capitalism is the worst economic system, except for all the others.” That remains true, but proper capitalism requires thoughtful stewards, meritocratic outcomes, and engaged owners. If we all abdicate our responsibilities, we risk perversion of the system that’s created more positive effects for humanity than arguably any other single phenomenon. Hope is not lost as history tends to move in cycles. We’re in need of the pendulum to change direction.

Hard Truth: This too shall pass.

Investing Is Biology Not Physics

Since the 1980s, there has been a tradition of Wall Street luring physicists from academia. Option math has more in common with the laws of thermodynamics than it does with accounting. But if the nature of markets themselves resembled any science it would be biology. Markets are governed by predator-prey dynamics. Models are adaptive. The actors learn. Doublethink and tradecraft.

In physics, the rules are fixed. No matter how many of us use the laws of gravity to keep firmly planted on planet Earth, gravity doesn’t get crowded. It keeps me just as bound to its surface as it did the Neanderthals. In markets, if I raise a bunch of money by showing people that selling volatility “harvests” a risk premium and the strategy continues to work then people will give me money to do it even more. So the strategy’s assets will grow both via inflows and via returns. The only problem is that to continue delivering the same performance on the larger asset base the strategy needs to sell ever more options. Assumptions of market liquidity when a strategy manages X will not hold when the strategy manages 10x or 100x. That’s about as close as we get to a physical law in finance.

The nature of liquidity is biological. It is subject to the whims of masses. It is the physical point where the backtest meets reality. Reality is a recursive, perma-learning system, with constraints and desires whose steers are pulled by investors, politicians, and corporations.

One of the best discussions I’ve ever listened to about what this looks like in practice is investor Andy Redleaf on Ted Seides’ Capital Allocators podcast. Redleaf has been in the game for over 40 years and was an early options market maker when they were listed in the 1970s. Since then he has followed opportunities that present themselves as markets change. A true agnostic on the hunt for profitable niches. Especially niches with structural reasons for being extra profitable. The advantage of this approach is that when the reasons go away, you know it is safe to cut and run. The disadvantage is that you cannot be a one-trick pony. You need to keep finding easy games.

For the full discussion of market history, where sources of edge often lurk, investing challenges today, and why he bought a bank, check out the episode including my notes. (Link)

Susquehanna took their understanding of markets as biological to a logical recruiting conclusion — hire game players. Poker, Magic, chess, sports bettors. All games that require multi-order thinking and adapting to your environment. If you know anyone with a strong game background (and ideally some programming chops) check out Moontower reader Metaling Mage’s call for an intern. He’s a former Susquehanna PM.

You can reach out to him for details but it’s safe to say based on where he is now that this could be one of the most selective Wall Street internships on the markets side of the business.

Is There Actually An Equity Premium Puzzle?

The equity risk premium, or ERP, is defined as the excess return you get for investing in stocks over the risk-free rate. Simply, it’s the premium return you earn in exchange for dealing with path. The fact that you might experience a 20% drawdown every few years (with U.S. equity markets currently sitting on all-time highs it’s hard to believe that just 1 year ago the SP500 had a 20% drawdown). I admit this “no pain, no gain” explanation sounds a bit weird.

Student: Hey prof, why do I get paid extra for buying stocks instead of t-bills?

Master: Because if you weren’t offered a discounted price to buy stocks you wouldn’t. Duh.

Proof by induction can be unsatisfying. To be fair, my use of the word proof is straining its English definition. Instead, it’s typical to hear ERP referred to in the context of a puzzle since some economists with calculators decided that this roughly 6% historical premium has been excessive compared to what they would expect even risk-averse investors to demand.

Enter the Witch

But what if I told you that there is actually no ERP and therefore no puzzle. Well, you’d accuse me of heresy since I’m directly contradicting widely accepted financial orthodoxy. After all, I’m ignoring the fact that equities have in fact outperformed t-bills by a wide margin.

Let’s look at that assertion again — equities have outperformed t-bills by a wide margin.

Well, what do we mean by equities? Single stocks or indexes? This is where I let the witch take over. The heretic, BreakingTheMarket, who states:

The Equity Premium Puzzle has lasted for 37 years without anyone recognizing the market index doesn’t represent stocks.

Mistaken Equivalency

Turns out the existence of an ERP depends on your definition of equities, and an index of equities is not just equities. It’s a strategy. An index is a rule-based weighting that rebalances intermittently. The difference cannot be overstated. Why?

“Stocks” and the “Stock Market Index” are not the same thing and never have been. One is an asset class, the other is a trading strategy of that asset class. They don’t behave the same and don’t have the same properties, return, or standard deviation. You can’t use one to replace the other.

The math makes it clear.

When you compare the geometric return of stocks, not a stock index, you do not find an ERP!

The key here is that the historical volatility (standard deviation) of single stocks is .33, which is about twice what it has been for U.S. stock indices. Subtracting the variance drain from the roughly 6% arithmetic premium leaves 6% – .5 * (.33^2) ≈ .55%. He makes the case that a .55% premium is much more in line with what economists would predict, or small enough to dismiss as noise.

Enjoy the full post Solving the Equity Premium Puzzle, and Uncovering a Huge Flaw in Investment Theory. (Link)

How This Ties Together With What We Have Learned In The Past

As you digest this, there should hopefully be a comforting reinforcement of past ideas, namely:

  • When we deal with multiplicative processes, like returns that compound wealth, we care about geometric or logreturns not arithmetic returns because of the “volatility drain”. (Link)
  • Portfolio components are not perfectly correlated so when we rebalance, we capture a premium geometric return. (Link)
  • The imperfectly correlated aspect of a portfolio contributes to what Fernholz called the excess growth component that diversification earns when you are in logreturn space. (Link).

If we presume stock index volatility is only 17% (as opposed to the 33% for single stocks), we can use napkin math to make additional observations.

  • Index ERP is closer to 6% – .5 * (.17^2) = 4.56%…the extra 4% represents Fernholz’s “excess growth rate”. This is why some pros refer to diversification as the only “free lunch” in investing.
  • The average cross-correlation of stocks in an index can be approximated by the ratio of index variance to average weighted stock variance. Using our estimates, (.17^2) / (.33^2) = .27, which is in the ballpark of where long term average SP500 index correlations have realized (although option folks know how spiky that number can be, especially on short measures).
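The napkin math above is easy to reproduce. Here is a minimal sketch that plugs the estimates quoted in the text (6% arithmetic premium, 33% single-stock volatility, 17% index volatility) into the variance drain formula. The variable and function names are my own, chosen for illustration.

```python
# Napkin math for the equity premium, using the estimates quoted in the text.
arith_premium = 0.06  # roughly 6% historical arithmetic premium
stock_vol = 0.33      # single-stock volatility
index_vol = 0.17      # index volatility


def geometric_premium(arith, vol):
    """Arithmetic premium minus the variance drain (.5 * variance)."""
    return arith - 0.5 * vol ** 2


stock_erp = geometric_premium(arith_premium, stock_vol)
index_erp = geometric_premium(arith_premium, index_vol)

# The gap between the two is the excess growth earned by diversifying.
excess_growth = index_erp - stock_erp

# Approximate average cross-correlation: index variance / stock variance.
implied_corr = index_vol ** 2 / stock_vol ** 2

print(f"single-stock premium: {stock_erp:.4f}")
print(f"index premium:        {index_erp:.4f}")
print(f"excess growth:        {excess_growth:.4f}")
print(f"implied correlation:  {implied_corr:.4f}")
```

Note that the excess growth is just half the gap between the two variances, so it does not depend on the 6% input at all.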

Summing Up

ERP doesn’t exist if you look at stocks, only at stock indexes!

  • Researchers commonly mistake equivalency between a single asset and a portfolio:
    • Treasury bills (and bonds) are a single investment item. An equity market index (SP500 for the original study and many others) is a portfolio of many investments, whose composition changes all the time. They are not the same thing and shouldn’t be compared as if they are!

A Final Note

I chat with BreakingtheMarket on Twitter and follow his discussions with quants. So much of the merit of Twitter, and the internet in general, is the beauty of being able to learn and engage in conversations with talented, curious people whom you may not have found otherwise. Breaking the Market is not in finance. He’s an engineer with a strong math background who approached markets with a “beginner’s mind”. I don’t think it’s an accident that two of my favorite finance writers on the internet are scientifically minded people from a different field. I think the best finance blog is penned by another finance outsider, the pseudonymous Jesse Livermore. Jesse did his first interview this year and it’s worth checking out, along with his widely influential writing. (Link to interview with my notes)

How Much of Momentum Is Caused By Randomness?

Randomness In Momentum Everywhere (Link)

This post contends that randomness and rebalancing undoubtedly explain SOME of the findings in favor of a momentum effect.

Key Takeaways:

Rebalancing increases a portfolio’s returns

The more often you rebalance the greater the benefit. With the 30 components of the Dow, increasing the rebalancing frequency increases both the portfolio’s arithmetic and geometric returns.

Since most momentum studies examine portfolios which filter and rebalance momentum candidates, we would expect to see improvement over a passive benchmark due to rebalancing alone.

  • Consider one of the original momentum studies by Jegadeesh and Titman:

Their methodology is actually an equal-weight rebalancing scheme, with the 3 month “holding period”, serving as a 3 month rebalancing period, and a 6 month rebalancing period, a 9 month rebalancing period, and finally a 12 month rebalancing period. The finding that “momentum” is strongest over the shorter period and fades as the holding period grows is not a finding about momentum. It’s exactly what you would expect from random behavior when adjusting portfolio rebalancing frequency. Yes the slope of the momentum curve is much higher, but momentum stocks are also much, much more volatile than dow components.

This turns out to be a hint as to why momentum is “found everywhere”: the act of rebalancing is common to all the studies.

  • Note that finance blogger Jesse Livermore got close: momentum failed to work in individual securities but worked in indexes. Recall from the Fernholz discussion of EGRs that portfolios have better logreturns than the weighted average of their components because the cross-correlations reduce the variance of the basket. Arithmetic return and geometric return differ by an amount driven by the variance (roughly half of it).

Momentum is really a volatility screen

  • Imagine two groups of 50 stocks. The first has an average return of 5% but volatility of 25%. The second has an average return of 10%, but a volatility of 15%. If you let the stocks randomly produce returns for a short period, and then select the 10 best stocks, is your sample more likely to come from the first group or the second?
  • Because the first group is more volatile, it is more likely to have extreme losers and winners. Momentum is a gigantic volatility screen, more so than a “momentum” screen. The momentum screen will lean toward picking stocks with higher expected returns. But importantly it will also be filled with high volatility stocks even if they have average or poor returns.
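The thought experiment above can be run as a small simulation. This is a sketch under my own assumptions, not the post's method: returns are drawn from independent normal distributions, the "short period" is one month, and the annual figures are scaled to monthly by the usual square-root-of-time rule. All names are hypothetical.

```python
import random


def high_vol_share(trials=2000, top_n=10, seed=0):
    """Fraction of top-performing picks that come from the high-vol group."""
    rng = random.Random(seed)
    period = 1 / 12  # one-month formation window (an assumption)
    groups = [
        ("high_vol", 0.05, 0.25),  # 5% return, 25% volatility
        ("low_vol", 0.10, 0.15),   # 10% return, 15% volatility
    ]
    picked_high = 0
    for trial in range(trials):
        results = []
        for name, mean, vol in groups:
            for stock in range(50):
                # Scale annual mean and volatility down to the period.
                r = rng.gauss(mean * period, vol * period ** 0.5)
                results.append((r, name))
        # Keep the best performers and count how many are high-vol names.
        results.sort(reverse=True)
        winners = results[:top_n]
        picked_high += sum(1 for r, name in winners if name == "high_vol")
    return picked_high / (trials * top_n)


share = high_vol_share()
print(f"share of winners from the high-vol group: {share:.1%}")
```

Under these assumptions the lower-return, higher-volatility group supplies most of the "winners", which is the sense in which the screen selects for volatility.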

Fading momentum is explainable by geometric return math

Momentum is said to “fade over time” but this is exactly what happens with random returns as “All random compounded returns start out producing returns equivalent to the asset’s arithmetic returns. But with every repetition, the returns will converge toward a geometric return. A portfolio of stocks slows down this degradation of returns toward the geometric return, but it still happens.”

  • Note how a portfolio slows down the process of degradation vs single stocks
  • We already know momentum screens select high volatility stocks. High volatility stocks will inherently have a large spread between their arithmetic and geometric returns. Therefore, the shape of the momentum return stream over time isn’t really an anomaly at all, but is expected…You don’t need stock “momentum” to explain the results of the study. The rules of the strategy alone create the illusion of momentum, even with random coin flips.

Randomness as the benchmark

He concludes:

Technically, I’m not saying that randomness explains ALL of the momentum effect. It may. I’m saying randomness and rebalancing undoubtedly explain SOME of the findings of these papers. The process of selecting high volatility stocks and rebalancing them frequently produces most of “momentum’s” performance. If researchers compared their results to a random data set, they would see this.

How Math Is Sufficient To Explain Small Stock Outperformance

Takeaways from Diversification, Volatility, and Surprising Alpha by Fernholz et al. (Link)


  • It has been widely observed that capitalization-weighted indexes can be beaten by surprisingly simple, systematic investment strategies including equal and random-weighted portfolios.
  • This outperformance is generally attributed to beneficial factor exposures.
  • It turns out this outperformance needn’t invoke factors. It can be explained by stochastic math where correlation and variance play larger and more predictable roles than returns.
  • Portfolio logreturns can be decomposed into an average growth and an excess growth component. They argue the excess growth component plays the major role in explaining the outperformance of naïve portfolios.

Some basics

Let’s establish some basic definitions.

Stock Returns

There are 3 types of returns commonly used to describe growth rates. But they are not equal.

Arithmetic returns > Geometric Returns > Log Returns

This is important because only logarithmic returns are an unbiased estimate of expected long term returns.  In other words, arithmetic and geometric returns will overestimate expected growths in wealth.

  • Logreturn of an asset = Arithmetic return – .5 * variance
  • .5 * variance is known as the volatility drag or variance drain. I’ve discussed this here in simple terms.
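To make the three return types concrete, here is a small sketch using a hypothetical asset of my own choosing: it returns +30% or -10% with equal odds each period, which works out to a 10% arithmetic mean and 20% volatility. It compares the ".5 * variance" approximation with the exact figures.

```python
import math

# Hypothetical two-outcome asset with equal odds (illustration only).
up, down = 0.30, -0.10

arith = 0.5 * (up + down)
variance = 0.5 * ((up - arith) ** 2 + (down - arith) ** 2)

# Approximation from the text: logreturn = arithmetic return - .5 * variance.
approx_log = arith - 0.5 * variance

# Exact expected log return, and the geometric return it implies.
exact_log = 0.5 * (math.log1p(up) + math.log1p(down))
geometric = math.expm1(exact_log)

print(f"arithmetic return:      {arith:.4f}")
print(f"geometric return:       {geometric:.4f}")
print(f"log return (exact):     {exact_log:.4f}")
print(f"log return (approx):    {approx_log:.4f}")
```

The approximation is close but not exact, since it is only the first term of an expansion. The ordering arithmetic > geometric > log holds for this example.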

Portfolio Returns

The logreturn of a portfolio can be decomposed into 2 components:

Weighted avg of stock logreturns  + “excess growth rate” (aka EGR)

Understanding the EGR

EGR = (weighted average stock variance – portfolio variance) / 2

The relationship between stock variance and index variance

Now this part is not in the paper, but taking from my index options experience:

Portfolio variance = weighted average stock variance * average cross-correlation of the stocks

This is a common identity used to price index options. It makes intuitive sense.

  • If all components of the portfolio have a correlation of 1, the portfolio variance would be the same as that of the underlying stocks.
  • If you had a 2 stock portfolio and the correlation were -1, the portfolio variance would be zero. Imagine a basket comprised of 50% SPY and 50% inverse SPY. It would never move in price (assuming no fees, frictions, etc) regardless of how high SPY variance was.
  • For an average correlation < 1, the portfolio variance must be less than the average weighted stock variance.

The key insight: the lower the average correlation between the components the wider the spread between the portfolio and weighted average stock variances!
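Here is a quick check of that identity in a simplified setting. The assumptions are mine: every stock has the same weight, the same volatility, and the same pairwise correlation, which lets us write the exact portfolio variance in closed form and compare it with the shortcut. The function name is hypothetical.

```python
def equal_weight_portfolio_variance(n, vol, corr):
    """Exact variance of n equally weighted stocks sharing one volatility
    and one pairwise correlation (a simplifying assumption)."""
    weight = 1 / n
    own_terms = n * weight ** 2 * vol ** 2
    cross_terms = n * (n - 1) * weight ** 2 * corr * vol ** 2
    return own_terms + cross_terms


vol, corr = 0.33, 0.27
shortcut_vol = (vol ** 2 * corr) ** 0.5  # stock variance * correlation

for n in (2, 30, 500):
    exact_vol = equal_weight_portfolio_variance(n, vol, corr) ** 0.5
    print(f"n={n:>3}  exact vol={exact_vol:.4f}  shortcut vol={shortcut_vol:.4f}")

# Edge cases from the bullets above.
same_as_stock = equal_weight_portfolio_variance(10, vol, 1.0)
perfect_hedge = equal_weight_portfolio_variance(2, vol, -1.0)
```

The shortcut ignores each stock's own-variance term, so it understates variance for small baskets and converges as the number of names grows. With 500 names it lands near the 17% index volatility used earlier.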

Back to the paper…

Observations about the EGR

Looking at the formula again:

EGR = (weighted average stock variance – portfolio variance) / 2

  • EGR boosts the portfolio returns beyond that of its components since portfolio variance < weighted average stock variance
  • EGR boosts portfolio returns with lower correlations
  • EGR boosts portfolio returns with high stock variance
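The three observations can be checked numerically. This sketch combines the EGR formula with the index-options approximation from earlier (portfolio variance = average stock variance * average correlation), so treat it as a rough guide rather than the paper's exact computation. The function name and inputs are my own.

```python
def excess_growth_rate(avg_stock_vol, avg_corr):
    """EGR = (weighted average stock variance - portfolio variance) / 2,
    with portfolio variance approximated as stock variance * correlation."""
    avg_var = avg_stock_vol ** 2
    portfolio_var = avg_var * avg_corr  # approximation, not from the paper
    return (avg_var - portfolio_var) / 2


base = excess_growth_rate(0.33, 0.27)
lower_corr = excess_growth_rate(0.33, 0.10)   # lower correlation
higher_vol = excess_growth_rate(0.50, 0.27)   # more volatile stocks
no_benefit = excess_growth_rate(0.33, 1.00)   # perfectly correlated

print(f"base case:          {base:.4f}")
print(f"lower correlation:  {lower_corr:.4f}")
print(f"higher stock vol:   {higher_vol:.4f}")
print(f"correlation of 1:   {no_benefit:.4f}")
```

With the 33% volatility and .27 correlation estimates from the ERP discussion, the EGR comes out to roughly 4%.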

Relationships Between Market Cap, Logreturns, and Variance

The authors then use a rank based computation to show:

  • Logreturns of individual stocks do not vary by market cap.
  • Variances of individual stocks do vary by market cap. Smaller stocks are more volatile.

This prompts the great reveal:

Small stocks don’t have higher returns but have higher variances which boost EGRs. The volatility and interaction of the stocks is boosting the portfolios that contain them without any need to rely on factors! The increased volatility of the individual stocks did not earn them a risk premium when considered in isolation, but at the portfolio level they contributed to excess growth.


  •  The authors contend that the excess growth component can be estimated relatively easily, since its value depends only on variances, or relative variances, which are not difficult to determine in practice. The average growth component, however, is more difficult to estimate. 
  • Small stocks are riskier, and while this might mean higher single period arithmetic returns, long term investors care about logreturns. In logreturn space, individual stocks don’t contribute excess returns. This is at odds with conventional wisdom.
  • Instead, the excess returns are coming at the portfolio level via the small stocks’ contribution to “excess growth rates” (EGRs).
  • They tested the expectations of this stochastic portfolio math on 5 commonly employed weighting strategies, some more diversified and some less diversified than the capitalization-weighted portfolio, and the results confirmed these insights. In general, the more diversified portfolios outperform and the single less diversified portfolio underperforms, because the more diversified portfolios have a higher excess growth rate. This arises from the higher variances associated with the smaller stock exposure in these more diversified portfolios, and not because such stocks have inherently higher returns. This higher excess growth rate, in turn, increases the portfolios’ logarithmic return.

My Comments

  • The role of low correlation was not emphasized enough in the paper considering it drives the EGR by setting the gap between portfolio and average weighted stock variances.
  • You should read the paper if you’d like a refresher on computing arithmetic, geometric, and logreturns.
  • You should read the paper to see how they computed rankings since this work established that there was no relationship between logreturns and the market cap of a stock.

Your Portfolio Intuition Is Poor

Summary and takeaways from Bridge Alternatives’ Portfolio Intuition (Link)

Intuition Test


  • Your current portfolio has 5% return and 15% volatility for a Sharpe ratio of .33
  • You want to allocate 10% of your portfolio to a prospective asset
  • You want to maximize the Sharpe ratio of the resulting portfolio

Choose between A1 and A2

              A1       A2
Return        4.00%    4.00%
Volatility    7.96%    46.04%
Correlation   -.20     -.20

Unsurprisingly, most people prefer A1 since it has the same attributes as A2 with 1/6 the risk.

Now let’s run the numbers 

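Expected return of the new portfolio is the same whether we choose A1 or A2:

.90 * 5% + .10 * 4% = 4.9%

Volatility of the new portfolio if we choose A1:

sqrt( (.90^2 * .15^2) + (.10^2 * .0796^2) + (2 * .90 * .10 * -.20 * .15 * .0796) ) = 13.363%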

Sharpe ratio of original portfolio = .33

Sharpe ratio when we add A1 = .049/.13363 or .3667

The Sharpe ratio improved by about 10%

Now what is the Sharpe ratio if we add A2 instead of A1?

First, we must compute the volatility. Go ahead, plug and chug…

That’s right, the volatility is the same!

The volatility of the new portfolio is the same whether we add A1 or A2 which means the new combined portfolio has the same improvement to Sharpe whether we add A1 or A2. This is true despite A2 having a far worse Sharpe than A1! It is counterintuitive because portfolio math and the role of correlation are not intuitive.
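If you would rather not plug and chug by hand, here is a minimal sketch of the calculation using the standard two-asset portfolio formula. The inputs come from the intuition test above (5% return, 15% volatility, 10% allocation); the function name is my own.

```python
import math


def blend(asset_ret, asset_vol, corr, weight=0.10, base_ret=0.05, base_vol=0.15):
    """Return (return, volatility, Sharpe) after moving `weight` of the
    original portfolio into the prospective asset."""
    w_base = 1 - weight
    ret = w_base * base_ret + weight * asset_ret
    var = (
        (w_base * base_vol) ** 2
        + (weight * asset_vol) ** 2
        + 2 * w_base * weight * corr * base_vol * asset_vol
    )
    vol = math.sqrt(var)
    return ret, vol, ret / vol


ret_a1, vol_a1, sharpe_a1 = blend(0.04, 0.0796, -0.20)
ret_a2, vol_a2, sharpe_a2 = blend(0.04, 0.4604, -0.20)

print(f"A1: return={ret_a1:.4f} vol={vol_a1:.5f} Sharpe={sharpe_a1:.4f}")
print(f"A2: return={ret_a2:.4f} vol={vol_a2:.5f} Sharpe={sharpe_a2:.4f}")
```

Both assets produce the same combined volatility, so the combined Sharpe ratios match.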

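To see why, look at the formula for portfolio volatility, where w is a weight and vol is a volatility, with subscript 1 for the original portfolio and 2 for the prospective asset:

portfolio volatility = sqrt( (w1^2 * vol1^2) + (w2^2 * vol2^2) + (2 * w1 * w2 * correlation * vol1 * vol2) )

Let’s zoom in on the last 2 terms which come from adding the second asset:

(w2^2 * vol2^2) + (2 * w1 * w2 * correlation * vol1 * vol2)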

Plot of change in overall portfolio volatility vs volatility of prospective asset (A1 or A2)

As we increase the asset’s risk, the first term grows quadratically (with the square of the asset’s volatility), and the second term shrinks linearly (remember, the correlation is negative). It turns out that, at least temporarily, the shrinking effect from the negative correlation outweighs the quadratic term.

There are 2 observations to note once you are done reeling from the bizarre impact of correlation.

  1. When adding a negatively correlated asset to a portfolio its risk must be incredibly high before it starts to degrade the Sharpe ratio of the final portfolio.
  2. Notice how, at least until we hit the vertex, if we move from left to right, representing an increase in risk, we’re actually reducing return. Put differently, if we added risk and didn’t reduce return we’d deliver more than a 10% improvement; risk has a positive payoff here, which is very cool. There is a significant range where we are reducing the prospective asset’s Sharpe and actually reducing the volatility of the new portfolio.

More Preference Tests

              B1       B2       C1       C2       D1       D2
Return        10.54%   3.57%    9.33%    6.50%    6.43%    -2.64%
Volatility    20.00%   20.00%   27.50%   12.50%   10.00%   40.00%
Correlation   .80      -.20     .40      .40      .50      -.60

Most people agree:

  • B1 was slightly preferred to B2. For the same risk, B1 delivers much more return, though B2’s correlation is better.
  • C2 was preferred. Its Sharpe is higher (about 0.52 versus about 0.34).
  • D1 was preferred to D2. D1’s Sharpe ratio is much higher. D2’s return is negative.

The punchline, of course, is that every one of these assets improves the Sharpe of the portfolio by the same 10%. Your intuition would tell you to prefer an asset in the upper left green box since those assets have the best Sharpe (risk/reward), so it is probably uncomfortable to learn that the final portfolio is mathematically indifferent to all of these assets.
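You can verify the punchline with a few lines of code. This sketch applies the standard two-asset portfolio formula to the six assets in the table, assuming the same setup as the intuition test (original portfolio at 5% return and 15% volatility, with a 10% allocation to the new asset). The names are mine.

```python
import math

BASE_RET, BASE_VOL, WEIGHT = 0.05, 0.15, 0.10

# (return, volatility, correlation) for each prospective asset in the table.
ASSETS = {
    "B1": (0.1054, 0.200, 0.80),
    "B2": (0.0357, 0.200, -0.20),
    "C1": (0.0933, 0.275, 0.40),
    "C2": (0.0650, 0.125, 0.40),
    "D1": (0.0643, 0.100, 0.50),
    "D2": (-0.0264, 0.400, -0.60),
}


def combined_sharpe(asset_ret, asset_vol, corr):
    """Sharpe ratio of the portfolio after allocating WEIGHT to the asset."""
    w_base = 1 - WEIGHT
    ret = w_base * BASE_RET + WEIGHT * asset_ret
    var = (
        (w_base * BASE_VOL) ** 2
        + (WEIGHT * asset_vol) ** 2
        + 2 * w_base * WEIGHT * corr * BASE_VOL * asset_vol
    )
    return ret / math.sqrt(var)


results = {name: combined_sharpe(*params) for name, params in ASSETS.items()}
for name, sharpe in results.items():
    standalone = ASSETS[name][0] / ASSETS[name][1]
    print(f"{name}: standalone Sharpe={standalone:6.3f}  combined Sharpe={sharpe:.4f}")
```

The standalone Sharpe ratios are all over the place, yet the combined Sharpe ratios agree to about three decimal places.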

Correlation Is The Key

Here’s the same plot relating these equivalent portfolios by their respective correlations

As the correlation drops (corresponding to lines of “cooler” coloring), less return is required to deliver the same 10% improvement!

While Sharpe ratios are “mentally portable”, they are shockingly incomplete without being tied to correlation. To create a compact formula which links Sharpe ratios with correlation, it is helpful to view indifference curves.

Indifference Curves

RRR = Sharpe ratio of prospective asset
RRRb = Sharpe ratio of original portfolio
Relative RRR = RRR / RRRb

If Relative RRR > 1, the Sharpe ratio of the prospective asset is greater than the Sharpe ratio of the original portfolio.

The indifference curve represents an equivalent tradeoff between Sharpe ratio and correlation for various mixing weights. For example, the light green line assumes you will allocate 20% of the original portfolio to the prospective asset.


  • As the weight allocated to the asset increases (the lines move upward, from green to purple), the asset must be more performant in order to do no harm; it must be better relative to the portfolio. Put differently, as the role played by the asset increases, more is required of it, and that sounds about right.
  • A less performant asset, i.e. one with a worse Sharpe ratio than the original portfolio, can compensate with low or negative correlations.

Getting Practical

The investor’s natural question when evaluating a new asset or investment is:

“What is required from an asset (in terms of return, risk and correlation) in order to add value to my portfolio?”

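With math that can be verified in the paper’s appendix we find a very handy identity for the break-even point:

RRR = correlation * RRRb

where correlation is the correlation between the prospective asset and the original portfolio.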

This equality describes what’s required, in an absolute bare-minimum mathematical sense, of a prospective asset in order to do no harm. 

How to use it

For a given prospective Sharpe ratio, you very simply compute the maximum correlation the new asset can have to be accretive to the portfolio. For example, if the prospective asset has a Sharpe ratio of .10 and the original portfolio has a Sharpe ratio of .40 then the prospective asset requires correlation no greater than .25 (i.e. .10/.40).

For a given correlation, you can compute the minimum required Sharpe ratio of the new asset to improve the portfolio. If the correlation is .80 and the original portfolio has a Sharpe ratio of .70 then the prospective asset must have a Sharpe ratio of at least .56 (ie .80 x .70).
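Both uses of the rule fit in a few lines. The sketch below encodes the hurdle implied by the two worked examples (asset Sharpe versus correlation times portfolio Sharpe). The function names are hypothetical, and I am assuming the original portfolio has a positive Sharpe ratio.

```python
def max_correlation(asset_sharpe, portfolio_sharpe):
    """Highest correlation at which the asset still does no harm.
    Assumes portfolio_sharpe > 0."""
    return asset_sharpe / portfolio_sharpe


def min_sharpe(correlation, portfolio_sharpe):
    """Lowest standalone Sharpe the asset needs at a given correlation."""
    return correlation * portfolio_sharpe


def adds_value(asset_sharpe, correlation, portfolio_sharpe):
    """Binary filter: does the asset clear the correlation hurdle?"""
    return asset_sharpe > min_sharpe(correlation, portfolio_sharpe)


print(max_correlation(0.10, 0.40))  # first worked example in the text
print(min_sharpe(0.80, 0.70))       # second worked example in the text

# Asset D2 from the preference tests: negative return, negative correlation.
d2_sharpe = -0.0264 / 0.40
print(adds_value(d2_sharpe, -0.60, 0.05 / 0.15))
```

The last call shows why a negative Sharpe asset like D2 can still pass: its hurdle is negative too, because its correlation is negative.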

Insights and Caveats

  • Correlation is best understood as a sort of performance hurdle. For assets exhibiting low correlation, less is required of their standalone performance (i.e. return over risk), all else equal.
  • Prospective assets with a Sharpe ratio greater than the original portfolio are always additive.
  • If you happen to find a truly zero-correlation asset it will be additive as long as it has positive returns. And as we saw with asset D2, a negative Sharpe Ratio asset can be additive if it has a negative correlation!
  • This cannot be used to somehow rank prospective assets. It can only serve as a binary filter: yes or no. This might feel like a real limitation. Sharpe ratios are absolutely rankable. They are measurements of the same unit (risk). But as we’ve shown in this paper, those rankings are not indicative of their true value within the context of a portfolio. Making decisions based only on return and risk is like ranking runners based on their times without asking how far they ran. It doesn’t make sense. If you take away one thing from this paper, this should be it!

My Own Conclusions

  • Correlations make portfolio math extremely unintuitive.
  • Negative and low correlations can make poor or losing stand-alone investments great additions to a portfolio. The implications for the diversifying power of low or negative-yielding assets (bonds, cash, commodities, gold) are significant.
  • Highly volatile assets with a negative correlation are tamed and even subtractive to the total risk of a portfolio.
  • While the importance of low or negatively correlated assets is well known it’s possible it remains underappreciated.

Further reading

Breaking The Market’s outstanding post Optimal Portfolios For Two Assets

You will learn:

  • How to mix assets by comparing their geometric returns.
  • Correlation’s effect on portfolio construction is not linear.
    • The closer correlations are to 1 the more they impact the recommended mix.
    • Negative correlations are deeply valuable in portfolio construction, adding to the long term return. Positive correlations are harmful, limiting the benefit of diversification.
    • The mixing range for the geometric returns is the combination of each asset’s variance, expanded or contracted based on the correlation between the two assets.
    • Negative correlation is wonderful.


You can save your own copy here

Lesson from coin flip investing

The setup

  • You invest in 2 coins every week for the next 1000 weeks (19.2 yrs)
  • These coins pay a return each week
  • Every 4 weeks, you rebalance wealth equally between the 2 coins
  • Coins have an expected edge of 10%
  • Simulation is run 10,000x
  • Assume no transaction costs

Individual Coin Payouts

Coin           Win Payout   Loss Payout   Expected Annual Return   Expected Annual Volatility
A (Low Vol)    2.75%        2.50%         6.70%                    18%
B (High Vol)   8.25%        7.50%         21.5%                    54%

Results of the 2 Coin Portfolio¹

Strategy                  CAGR    Volatility   Median Return   Max Drawdown
Theoretical               14.1%   28.5%        10%²
Un-rebalanced simulated   17.9%   32%          6%              68%
Rebalanced simulated      13.9%   30%          9%              64%

Observations from many simulations like the one described

  1. The higher the portfolio volatility, the more the mean and median diverge
  2. Rebalancing pushes median returns closer to the theoretical mean
  3. The rebalancing benefit is positively correlated to the difference of volatility between the coins
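Here is a stripped-down version of the simulation you can run yourself. It is a sketch under assumptions the setup does not spell out: each coin is a fair 50/50 flip, the two coins are independent, and I run 500 paths instead of 10,000 to keep pure Python fast. Names like `simulate` and `annualized` are my own.

```python
import random
import statistics

# (win payout, loss payout) per week for each coin, from the table above.
COINS = {"A": (0.0275, -0.0250), "B": (0.0825, -0.0750)}
WEEKS = 1000
YEARS = WEEKS / 52


def simulate(paths=500, rebalance_every=4, seed=0):
    """Return terminal wealth lists for rebalanced and un-rebalanced portfolios,
    driven by the same coin flips so the comparison is apples to apples."""
    rng = random.Random(seed)
    rebalanced, buy_and_hold = [], []
    for path in range(paths):
        reb = [0.5, 0.5]   # wealth held in coin A and coin B, rebalanced
        hold = [0.5, 0.5]  # same starting split, never rebalanced
        for week in range(1, WEEKS + 1):
            for i, (win, loss) in enumerate(COINS.values()):
                # Assumption: fair coin, independent across coins and weeks.
                growth = 1 + (win if rng.random() < 0.5 else loss)
                reb[i] *= growth
                hold[i] *= growth
            if week % rebalance_every == 0:
                total = sum(reb)
                reb = [total / 2, total / 2]
        rebalanced.append(sum(reb))
        buy_and_hold.append(sum(hold))
    return rebalanced, buy_and_hold


def annualized(terminal_wealth):
    """Convert terminal wealth (starting from 1) to an annual growth rate."""
    return terminal_wealth ** (1 / YEARS) - 1


rebalanced, buy_and_hold = simulate()
median_reb = annualized(statistics.median(rebalanced))
median_hold = annualized(statistics.median(buy_and_hold))
print(f"median annual return, rebalanced:    {median_reb:.2%}")
print(f"median annual return, un-rebalanced: {median_hold:.2%}")
```

Exact figures will vary with the seed and path count, so don't expect them to match the table to the decimal. The thing to look for is the gap between the two medians.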

How much to wager when you have edge? (Hint: median not mean outcomes!)

Link: Rational Decision-Making under Uncertainty: Observed Betting Patterns on a Biased Coin

  • Optimal bet size as a fraction of bankroll is 2p-1 where p is the probability of winning¹. You will recognize this as the edge per trial reported as a percent. So a 60% coin has 20% edge.
  • The formula is a solution to a proportional betting system which implicitly assumes the gambler has log utility of wealth
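A brute-force check makes the 2p-1 result less mysterious. The sketch below assumes an even-money bet (win or lose exactly what you stake) and searches a grid of bet fractions for the one that maximizes expected log growth. The function name is mine.

```python
import math


def log_growth(fraction, p):
    """Expected log growth per flip when betting `fraction` of bankroll
    on an even-money coin that wins with probability p."""
    return p * math.log(1 + fraction) + (1 - p) * math.log(1 - fraction)


p = 0.60
grid = [i / 100 for i in range(100)]  # bet fractions 0.00 through 0.99
best_fraction = max(grid, key=lambda f: log_growth(f, p))

print(f"best fraction on the grid: {best_fraction:.2f}")
print(f"formula 2p - 1:            {2 * p - 1:.2f}")
print(f"log growth at the optimum: {log_growth(best_fraction, p):.4f}")
```

Betting more than the optimum lowers the growth rate, and betting enough more turns it negative even though every flip has positive expectancy.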

Imagine tossing a 60% coin 100x and starting with a $25 bankroll

Arithmetic Mean Land

The mean of one flip is 20% positive expectancy.

Optimal bet size is 20% of bankroll since you have .20 expectancy per toss

Increase in wealth per toss betting a Kelly fraction: 20% of bankroll x .20 expectancy = 4%

Expected (mean) value of game after 100 flips betting 20% of your wealth each time

$25 * (1+.04) ^ 100 = $1,262

Median Land

The median of one flip betting a Kelly fraction is (1.2^.60 * .8^.40 – 1) or 2%

Median value of game after 100 flips betting 20% of your wealth each time

25 * (1.2^60) * (.8^40) = $187.25!

Things to note

  • The median outcome by definition is the increase in utility since Kelly betting implicitly assumes the gambler has log utility
  • After 100 flips, the median outcome is only about 1/7 of the mean outcome ($187.25 vs $1,262)! The median outcome gives an idea of how much to discount the mean payoff. If your utility function is not a log function (i.e. does quadrupling your wealth make you twice as happy?) then a different Kelly fraction should be used.
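The mean and median figures above take only a few lines to reproduce. This sketch uses the numbers from the example ($25 bankroll, 100 flips, a 60% coin, betting 20% each time) and treats the median path as the one with exactly 60 wins and 40 losses. Variable names are my own.

```python
bankroll = 25.0
flips = 100
p = 0.60         # probability of winning each flip
fraction = 0.20  # Kelly fraction, 2p - 1

# Arithmetic mean land: expected growth per flip compounds at 4%.
mean_growth = 1 + fraction * (2 * p - 1)
mean_outcome = bankroll * mean_growth ** flips

# Median land: the typical path has p * flips wins and the rest losses.
wins = round(p * flips)
losses = flips - wins
median_outcome = bankroll * (1 + fraction) ** wins * (1 - fraction) ** losses

ratio = median_outcome / mean_outcome

print(f"mean outcome:   ${mean_outcome:,.2f}")
print(f"median outcome: ${median_outcome:,.2f}")
print(f"median / mean:  {ratio:.3f}")
```

The mean is dragged up by a small number of extremely lucky paths, which is why the typical player ends up far below it.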