How Math Is Sufficient To Explain Small Stock Outperformance

Takeaways from Diversification, Volatility, and Surprising Alpha by Fernholz et al. (Link)


  • It has been widely observed that capitalization-weighted indexes can be beaten by surprisingly simple, systematic investment strategies including equal and random-weighted portfolios.
  • This outperformance is generally attributed to beneficial factor exposures.
  • It turns out this outperformance needn’t invoke factors. It can be explained by stochastic portfolio math in which correlation and variance play larger and more predictable roles than returns.
  • Portfolio logreturns can be decomposed into an average growth component and an excess growth component. The authors argue the excess growth component plays the major role in explaining the outperformance of naive portfolios.

Some basics

Let’s establish some basic definitions.

Stock Returns

There are 3 types of returns commonly used to describe growth rates. But they are not equal.

Arithmetic Returns > Geometric Returns > Log Returns

This is important because only logarithmic returns are an unbiased estimate of expected long-term returns. In other words, arithmetic and geometric returns will overestimate expected growth in wealth.

  • Logreturn of an asset = Arithmetic return – .5 * variance
  • .5 * variance is known as the volatility drag or variance drain. I’ve discussed this here in simple terms.
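A quick Monte Carlo sketch of the volatility drag. It assumes independent, normally distributed weekly simple returns with a 10% annual arithmetic mean and 30% annual volatility; these numbers are illustrative, not from the paper. If the rule above holds, the average annualized log return of the compounded paths should land near 10% - .5 * 30%² = 5.5% rather than 10%.

```python
import math
import random


def average_log_growth(alpha, sigma, years=10, periods_per_year=52, n_paths=500, seed=1):
    """Average annualized log return across simulated wealth paths.

    Each period's simple return is drawn from a normal distribution with mean
    alpha / periods_per_year and standard deviation sigma / sqrt(periods_per_year),
    so the arithmetic return is alpha per year. Wealth compounds multiplicatively.
    """
    rng = random.Random(seed)
    mean = alpha / periods_per_year
    sd = sigma / math.sqrt(periods_per_year)
    n_periods = years * periods_per_year
    total = 0.0
    for _ in range(n_paths):
        log_wealth = 0.0
        for _ in range(n_periods):
            log_wealth += math.log(1.0 + rng.gauss(mean, sd))
        total += log_wealth / years  # annualized log return of this path
    return total / n_paths


alpha, sigma = 0.10, 0.30
print(f"arithmetic return:           {alpha:.3f}")
print(f"arithmetic - .5 * variance:  {alpha - 0.5 * sigma**2:.3f}")
print(f"simulated log return:        {average_log_growth(alpha, sigma):.3f}")
```

The simulated figure wobbles a little with the seed and path count, but it sits near 5.5%, not 10%.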

Portfolio Returns

The logreturn of a portfolio can be decomposed into 2 components:

Weighted avg of stock logreturns + “excess growth rate” (aka EGR)

Understanding the EGR

EGR = (weighted average stock variance – portfolio variance) / 2
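The decomposition can be checked numerically. This sketch uses a hypothetical 3-stock portfolio (returns, volatilities, correlations, and weights are all made up) held at constant weights, and applies the continuous-time rule logreturn = arithmetic return - .5 * variance to both the stocks and the portfolio. The portfolio logreturn computed directly and the weighted average of stock logreturns plus the EGR should match.

```python
def weighted_avg_variance(weights, cov):
    """Sum of w_i * variance_i."""
    return sum(w * cov[i][i] for i, w in enumerate(weights))


def portfolio_variance(weights, cov):
    """w' * Cov * w."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))


def excess_growth_rate(weights, cov):
    """EGR = (weighted average stock variance - portfolio variance) / 2."""
    return 0.5 * (weighted_avg_variance(weights, cov) - portfolio_variance(weights, cov))


# Hypothetical inputs
alphas = [0.08, 0.10, 0.12]  # arithmetic returns
vols = [0.20, 0.30, 0.45]
corr = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
cov = [[corr[i][j] * vols[i] * vols[j] for j in range(3)] for i in range(3)]
weights = [0.5, 0.3, 0.2]

stock_logs = [a - 0.5 * cov[i][i] for i, a in enumerate(alphas)]

# Direct: portfolio arithmetic return - .5 * portfolio variance
direct = sum(w * a for w, a in zip(weights, alphas)) - 0.5 * portfolio_variance(weights, cov)
# Decomposed: weighted average of stock logreturns + EGR
decomposed = sum(w * g for w, g in zip(weights, stock_logs)) + excess_growth_rate(weights, cov)

print(f"direct {direct:.5f}  decomposed {decomposed:.5f}  EGR {excess_growth_rate(weights, cov):.5f}")
```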

The relationship between stock variance and index variance

Now this part is not in the paper, but taking from my index options experience:

Portfolio variance ≈ (weighted average stock volatility)² * average cross-correlation of the stocks

This is a common approximation used to price index options. It makes intuitive sense.

  • If all components of the portfolio had a correlation of 1 and the same volatility, the portfolio variance would be the same as that of the underlying stocks.
  • If you had a 2-stock portfolio split 50/50 between equally volatile stocks and the correlation were -1, the portfolio variance would be zero. Imagine a basket comprised of 50% SPY and 50% inverse SPY. It would never move in price (assuming no fees, frictions, etc.) regardless of how high SPY variance was.
  • For an average correlation < 1, the portfolio variance must be less than the weighted average stock variance.

The key insight: the lower the average correlation between the components, the wider the spread between the portfolio variance and the weighted average stock variance!
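A sketch comparing exact portfolio variance with the index-option shortcut, written here in its usual form (weighted average stock volatility)² * average correlation. To keep it short it assumes every pair of stocks shares one correlation, and the 20-stock basket is hypothetical. The shortcut drops the own-variance terms, so for a small basket it runs a bit below the exact figure. The 50/50 SPY plus inverse SPY case, modeled as two equally volatile assets with correlation -1, comes out at exactly zero.

```python
def portfolio_variance(weights, vols, corr):
    """Exact portfolio variance from weights, vols, and a correlation matrix."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * corr[i][j] * vols[i] * vols[j]
        for i in range(n)
        for j in range(n)
    )


def approx_index_variance(weights, vols, avg_corr):
    """Index-option shortcut: (weighted average vol)^2 * average correlation."""
    avg_vol = sum(w * s for w, s in zip(weights, vols))
    return avg_corr * avg_vol**2


def constant_corr(n, rho):
    """Correlation matrix where every distinct pair has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


# 50% SPY + 50% inverse SPY: equal vols, correlation -1
print(portfolio_variance([0.5, 0.5], [0.20, 0.20], constant_corr(2, -1.0)))

# Hypothetical equal-weight basket of 20 stocks, 25% vol each, pairwise correlation 0.3
n = 20
weights = [1.0 / n] * n
vols = [0.25] * n
exact = portfolio_variance(weights, vols, constant_corr(n, 0.3))
approx = approx_index_variance(weights, vols, 0.3)
avg_stock_var = sum(w * s**2 for w, s in zip(weights, vols))
print(f"exact {exact:.5f}  shortcut {approx:.5f}  weighted avg stock variance {avg_stock_var:.5f}")
```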

Back to the paper…

Observations about the EGR

Looking at the formula again:

EGR = (weighted average stock variance – portfolio variance) / 2

  • EGR lifts the portfolio’s logreturn above the weighted average of its components’ logreturns, since portfolio variance < weighted average stock variance
  • EGR is larger when correlations are lower
  • EGR is larger when stock variances are higher
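Under a deliberately simplified assumption (an equal-weight basket of n equally volatile stocks that all share one pairwise correlation ρ), the EGR has a closed form that makes both effects visible: EGR = .5 * σ² * (1 - 1/n) * (1 - ρ). Lower ρ or higher σ raises it. A short sketch with a hypothetical 50-stock basket:

```python
def egr_equal_weight(n, vol, rho):
    """EGR of an equal-weight basket of n stocks that share one vol and one pairwise correlation.

    weighted average stock variance = vol^2
    portfolio variance              = vol^2 * (1/n + (1 - 1/n) * rho)
    so EGR = .5 * vol^2 * (1 - 1/n) * (1 - rho)
    """
    avg_stock_var = vol**2
    port_var = vol**2 * (1.0 / n + (1.0 - 1.0 / n) * rho)
    return 0.5 * (avg_stock_var - port_var)


for vol in (0.20, 0.40):
    for rho in (0.8, 0.5, 0.2):
        print(f"vol {vol:.0%}, rho {rho:.1f}: EGR {egr_equal_weight(50, vol, rho):.2%} per year")
```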

Relationships Between Market Cap, Logreturns, and Variance

The authors then use a rank-based computation to show:

  • Logreturns of individual stocks do not vary by market cap.
  • Variances of individual stocks do vary by market cap. Smaller stocks are more volatile.

This prompts the great reveal:

Small stocks don’t have higher logreturns, but they do have higher variances, which boost EGRs. The volatility and interaction of the stocks are boosting the portfolios that contain them without any need to rely on factors! The increased volatility of the individual stocks did not earn them a risk premium when considered in isolation, but at the portfolio level they contributed to excess growth.


  • The authors contend that the excess growth component can be estimated relatively easily, since its value depends only on variances, or relative variances, which are not difficult to determine in practice. The average growth component, however, is more difficult to estimate.
  • Small stocks are riskier, and while this might mean higher single-period arithmetic returns, long-term investors care about logreturns. In logreturn space, individual stocks don’t contribute excess returns. This is at odds with conventional wisdom.
  • Instead, the excess returns are coming at the portfolio level via the small stocks’ contribution to “excess growth rates” (EGRs).
  • They tested the predictions of this stochastic portfolio math on 5 commonly employed weighting strategies, some more diversified and some less diversified than the capitalization-weighted portfolio, and the results confirmed these insights. In general, the more diversified portfolios outperform and the single less diversified portfolio underperforms, because the more diversified portfolios have a higher excess growth rate. This arises from the higher variances associated with the greater small-stock exposure in these more diversified portfolios, and not because such stocks have inherently higher returns. This higher excess growth rate, in turn, increases the portfolios’ logarithmic return.
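To see the mechanism with no return differences at all, here is a hypothetical four-stock market (caps and volatilities invented for illustration) in which smaller caps are more volatile and every pair of stocks has correlation 0.3. Tilting toward the volatile small names by equal weighting raises the EGR from about 1.9% to about 3.3% per year here. If stock logreturns really don't vary with size, that EGR gap is the entire logreturn gap between the two portfolios.

```python
def egr(weights, vols, rho):
    """EGR assuming a single pairwise correlation rho (a simplification)."""
    n = len(weights)
    avg_var = sum(w * s**2 for w, s in zip(weights, vols))
    port_var = sum(
        weights[i] * weights[j] * vols[i] * vols[j] * (1.0 if i == j else rho)
        for i in range(n)
        for j in range(n)
    )
    return 0.5 * (avg_var - port_var)


# Hypothetical market: smaller caps are more volatile
caps = [50.0, 30.0, 15.0, 5.0]
vols = [0.20, 0.25, 0.35, 0.50]
rho = 0.3

cap_weights = [c / sum(caps) for c in caps]
equal_weights = [1.0 / len(caps)] * len(caps)

egr_cap = egr(cap_weights, vols, rho)
egr_equal = egr(equal_weights, vols, rho)
print(f"cap-weighted EGR:   {egr_cap:.2%}")
print(f"equal-weighted EGR: {egr_equal:.2%}")
```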

My Comments

  • The role of low correlation was not emphasized enough in the paper, considering it drives the EGR by setting the gap between the portfolio variance and the weighted average stock variance.
  • You should read the paper if you’d like a refresher on computing arithmetic, geometric, and logreturns.
  • You should read the paper to see how they computed rankings since this work established that there was no relationship between logreturns and the market cap of a stock.

Your Portfolio Intuition Is Poor

Summary and takeaways from Bridge Alternatives’ Portfolio Intuition (Link)

Intuition Test


  • Your current portfolio has 5% return and 15% volatility for a Sharpe ratio of .33
  • You want to allocate 10% of your portfolio to a prospective asset
  • You want to maximize the Sharpe ratio of the resulting portfolio

Choose between A1 and A2

Metric | A1 | A2
Return | 4.00% | 4.00%
Volatility | 7.96% | 46.04%
Correlation | -.20 | -.20

Unsurprisingly, most people prefer A1 since it has the same return and correlation as A2 with roughly 1/6 the risk.

Now let’s run the numbers 

Expected return of the new portfolio is the same whether we choose A1 or A2: 0.90 * 5% + 0.10 * 4% = 4.9%

Volatility of the new portfolio if we choose A1: sqrt(0.90² * 15%² + 0.10² * 7.96%² + 2 * 0.90 * 0.10 * (-0.20) * 15% * 7.96%) ≈ 13.36%

Sharpe ratio of original portfolio = .33

Sharpe ratio when we add A1 = .049/.13363 or .3667

The Sharpe ratio improved by about 10%

Now what is the Sharpe ratio if we add A2 instead of A1?

First, we must compute the volatility. Go ahead, plug and chug…

That’s right, the volatility is the same!

The volatility of the new portfolio is the same whether we add A1 or A2, which means the new combined portfolio gets the same Sharpe improvement either way. This is true despite A2 having a far worse Sharpe than A1! It is counterintuitive because portfolio math and the role of correlation are not intuitive.
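A short sketch that does the plug and chug for both assets, using the standard two-asset volatility formula and treating Sharpe as return divided by volatility, as the example does (no risk-free rate). Both mixes come out at about 4.9% return, 13.36% volatility, and a .3667 Sharpe ratio.

```python
import math


def two_asset_stats(r1, v1, r2, v2, rho, w2):
    """Return (expected return, volatility, Sharpe) after moving weight w2 into asset 2.

    Sharpe is return / volatility, matching the example (no risk-free rate).
    """
    w1 = 1.0 - w2
    ret = w1 * r1 + w2 * r2
    var = (w1 * v1) ** 2 + (w2 * v2) ** 2 + 2 * w1 * w2 * rho * v1 * v2
    vol = math.sqrt(var)
    return ret, vol, ret / vol


for name, vol in (("A1", 0.0796), ("A2", 0.4604)):
    ret, port_vol, sharpe = two_asset_stats(0.05, 0.15, 0.04, vol, -0.20, 0.10)
    print(f"{name}: return {ret:.4f}, vol {port_vol:.5f}, Sharpe {sharpe:.4f}")
```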

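To see why, look at the formula for the volatility of a two-asset portfolio, where w1 and w2 are the weights, σ1 and σ2 the volatilities, and ρ the correlation:

σp = sqrt(w1² * σ1² + w2² * σ2² + 2 * w1 * w2 * ρ * σ1 * σ2)

Let’s zoom in on the last 2 terms, which come from adding the second asset: w2² * σ2² + 2 * w1 * w2 * ρ * σ1 * σ2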

Plot of change in overall portfolio volatility vs volatility of prospective asset (A1 or A2)

As we increase the asset’s risk, the first term grows quadratically, and the second term shrinks linearly (remember, the correlation is negative). It turns out that, at least for a while, the shrinking effect from the negative correlation outweighs the quadratic term.

There are 2 observations to note once you are done reeling from the bizarre impact of correlation.

  1. When adding a negatively correlated asset to a portfolio its risk must be incredibly high before it starts to degrade the Sharpe ratio of the final portfolio.
  2. Notice how, at least until we hit the vertex, moving from left to right, which represents an increase in the asset’s risk, actually reduces the volatility of the overall portfolio. Put differently, if we added risk and didn’t reduce return we’d deliver more than a 10% improvement; risk has a positive payoff here, which is very cool. There is a significant range where we are reducing the prospective asset’s Sharpe and actually reducing the volatility of the new portfolio.
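Setting the derivative of those two terms with respect to the new asset's volatility σ2 to zero locates the vertex: 2 * w2² * σ2 + 2 * w1 * w2 * ρ * σ1 = 0, so σ2* = -w1 * ρ * σ1 / w2. With w1 = 0.9, w2 = 0.1, ρ = -0.20, and σ1 = 15%, that is 27%. A1 (7.96%) and A2 (46.04%) sit exactly 19.04 points on either side of it, which is why they produce the same portfolio volatility. A short sketch:

```python
import math


def portfolio_vol(sigma2, w1=0.9, w2=0.1, sigma1=0.15, rho=-0.20):
    """Volatility of the mixed portfolio as a function of the new asset's vol."""
    var = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    return math.sqrt(var)


def vertex(w1=0.9, w2=0.1, sigma1=0.15, rho=-0.20):
    """New-asset vol that minimizes portfolio vol (solves d(variance)/d(sigma2) = 0)."""
    return -w1 * rho * sigma1 / w2


s_star = vertex()
print(f"vertex at sigma2 = {s_star:.2%}")
for s in (0.0, 0.0796, s_star, 0.4604, 0.60):
    print(f"sigma2 {s:.2%} -> portfolio vol {portfolio_vol(s):.4%}")
```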

More Preference Tests

Metric | B1 | B2 | C1 | C2 | D1 | D2
Return | 10.54% | 3.57% | 9.33% | 6.50% | 6.43% | -2.64%
Volatility | 20.00% | 20.00% | 27.50% | 12.50% | 10.00% | 40.00%
Correlation | .80 | -.20 | .40 | .40 | .50 | -.60

Most people agree:

  • B1 was slightly preferred to B2. For the same risk, B1 delivers much more return, though B2’s correlation is better.
  • C2 was preferred. Its Sharpe is higher (about 0.52 versus about 0.34).
  • D1 was preferred to D2. D1’s Sharpe ratio is much higher, and D2’s return is negative.

The punchline, of course, is that every one of these assets improves the Sharpe of the portfolio by the same 10%. Your intuition would tell you to prefer an asset in the upper-left green box since those assets have the best Sharpe (risk/reward), so it is probably uncomfortable to learn that the final portfolio is mathematically indifferent to all of these assets.
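The table can be verified the same way. This sketch moves 10% of the original portfolio (5% return, 15% volatility) into each asset and prints the resulting Sharpe ratio, again as return over volatility with no risk-free rate. Each lands at about .3667, a 10% improvement over .3333.

```python
import math

BASE_RET, BASE_VOL, WEIGHT = 0.05, 0.15, 0.10

# (return, volatility, correlation to the existing portfolio) from the table above
assets = {
    "B1": (0.1054, 0.20, 0.80),
    "B2": (0.0357, 0.20, -0.20),
    "C1": (0.0933, 0.275, 0.40),
    "C2": (0.0650, 0.125, 0.40),
    "D1": (0.0643, 0.10, 0.50),
    "D2": (-0.0264, 0.40, -0.60),
}


def sharpe_after_adding(ret, vol, rho, w=WEIGHT):
    """Return / volatility of the portfolio after moving weight w into the new asset."""
    port_ret = (1 - w) * BASE_RET + w * ret
    port_var = ((1 - w) * BASE_VOL) ** 2 + (w * vol) ** 2 + 2 * (1 - w) * w * rho * BASE_VOL * vol
    return port_ret / math.sqrt(port_var)


base_sharpe = BASE_RET / BASE_VOL
for name, (ret, vol, rho) in assets.items():
    new_sharpe = sharpe_after_adding(ret, vol, rho)
    print(f"{name}: standalone Sharpe {ret / vol:+.2f}, portfolio Sharpe {new_sharpe:.4f} "
          f"({new_sharpe / base_sharpe - 1:+.1%})")
```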

Correlation Is The Key

Here’s the same plot relating these equivalent portfolios by their respective correlations

As the correlation drops (corresponding to lines of “cooler” coloring), less return is required to deliver the same 10% improvement!

While Sharpe ratios are “mentally portable”, they are shockingly incomplete without being tied to correlation. To create a compact formula which links Sharpe ratios with correlation, it is helpful to view indifference curves.

Indifference Curves

RRR = Sharpe ratio of prospective asset
RRRb = Sharpe ratio of original portfolio
Relative RRR = RRR / RRRb

If Relative RRR > 1, the Sharpe ratio of the prospective asset is greater than the Sharpe ratio of the original portfolio.

The indifference curve represents an equivalent tradeoff between Sharpe ratio and correlation for various mixing weights. For example, the light green line assumes you will allocate 20% of the original portfolio to the prospective asset.


  • As the weight allocated to the asset increases (the lines move upward, from green to purple), the asset must be more performant in order to do no harm; it must be better relative to the portfolio. Put differently, as the role played by the asset increases, more is required of it, and that sounds about right.
  • A less performant asset, i.e. one with a worse Sharpe ratio than the original portfolio, can compensate with low or negative correlations.

Getting Practical

The investor’s natural question when evaluating a new asset or investment is:

“What is required from an asset (in terms of return, risk and correlation) in order to add value to my portfolio?”

With math that can be verified in the paper’s appendix, we find a very handy identity:

RRR / RRRb = correlation (between the prospective asset and the original portfolio)

This equality describes what’s required, in an absolute bare-minimum mathematical sense, of a prospective asset in order to do no harm. 

How to use it

For a given prospective Sharpe ratio, you very simply compute the maximum correlation the new asset can have to be accretive to the portfolio. For example, if the prospective asset has a Sharpe ratio of .10 and the original portfolio has a Sharpe ratio of .40, then the prospective asset requires a correlation no greater than .25 (i.e. .10/.40).

For a given correlation, you can compute the minimum required Sharpe ratio of the new asset to improve the portfolio. If the correlation is .80 and the original portfolio has a Sharpe ratio of .70, then the prospective asset must have a Sharpe ratio of at least .56 (i.e. .80 x .70).
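A minimal sketch of the two rules, plus a brute-force check. The check is illustrative and rests on assumptions not in the original: a tiny 0.1% allocation, a hypothetical 20% volatility for the new asset, and the 5%/15% portfolio from the intuition test (Sharpe 1/3). The rule is the small-allocation (marginal) condition; as the indifference curves show, larger allocations demand more of the asset.

```python
import math


def max_correlation(asset_sharpe, portfolio_sharpe):
    """Highest correlation at which a small allocation still helps: rho < RRR / RRRb."""
    return asset_sharpe / portfolio_sharpe


def min_sharpe(correlation, portfolio_sharpe):
    """Lowest standalone Sharpe that helps at a given correlation: RRR > rho * RRRb."""
    return correlation * portfolio_sharpe


def sharpe_change(asset_sharpe, rho, w=0.001, base_ret=0.05, base_vol=0.15, asset_vol=0.20):
    """Change in portfolio Sharpe from a small allocation w (hypothetical inputs)."""
    asset_ret = asset_sharpe * asset_vol
    ret = (1 - w) * base_ret + w * asset_ret
    var = ((1 - w) * base_vol) ** 2 + (w * asset_vol) ** 2 + 2 * (1 - w) * w * rho * base_vol * asset_vol
    return ret / math.sqrt(var) - base_ret / base_vol


print(max_correlation(0.10, 0.40))  # the .10 vs .40 example
print(min_sharpe(0.80, 0.70))       # the .80 x .70 example
# At correlation 0.5 the hurdle is 0.5 * (1/3), about .167
print(sharpe_change(0.20, 0.5) > 0, sharpe_change(0.12, 0.5) < 0)
```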

Insights and Caveats

  • Correlation is best understood as a sort of performance hurdle. For assets exhibiting low correlation, less is required of their standalone performance (i.e. return over risk), all else equal.
  • Prospective assets with a Sharpe ratio greater than the original portfolio are always additive.
  • If you happen to find a truly zero-correlation asset it will be additive as long as it has positive returns. And as we saw with asset D2, a negative Sharpe Ratio asset can be additive if it has a negative correlation!
  • This cannot be used to rank prospective assets. It can only serve as a binary filter: yes or no. This might feel like a real limitation. Sharpe ratios are absolutely rankable, since they all measure the same thing (return per unit of risk). But as the paper shows, those rankings are not indicative of an asset’s true value within the context of a portfolio. Making decisions based only on return and risk is like ranking runners based on their times without asking how far they ran. It doesn’t make sense. If you take away one thing from this paper, this should be it!

My Own Conclusions

  • Correlations make portfolio math extremely unintuitive.
  • Negative and low correlations can make poor or losing standalone investments great additions to a portfolio. The implications for the diversifying power of low- or negative-yielding assets such as bonds, cash, commodities, and gold are significant.
  • Highly volatile assets with a negative correlation are tamed and even subtractive to the total risk of a portfolio.
  • While the importance of low or negatively correlated assets is well known, it may remain underappreciated.

Further reading

Breaking The Market’s outstanding post Optimal Portfolios For Two Assets

You will learn:

  • How to mix assets by comparing their geometric returns.
  • Correlation’s effect on portfolio construction is not linear.
    • The closer correlations are to 1 the more they impact the recommended mix.
    • Negative correlations are deeply valuable in portfolio construction, adding to the long term return. Positive correlations are harmful, limiting the benefit of diversification.
    • The mixing range for the geometric returns is the combination of each asset’s variance, expanded or contracted based on the correlation between the two assets.
    • Negative correlation is wonderful.
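To make the correlation points concrete, here is a sketch that is not necessarily BTM's method: it applies the standard continuous-time approximation (geometric return ≈ arithmetic return - .5 * variance) to a constant two-asset mix with hypothetical inputs. Setting the derivative to zero gives the growth-optimal weight in asset 1: w* = (μ1 - μ2 + σ2² - ρσ1σ2) / (σ1² + σ2² - 2ρσ1σ2). Lowering the correlation raises the best achievable geometric return.

```python
def geometric_return(w, mu1, s1, mu2, s2, rho):
    """Approximate log growth of a constant mix with weight w in asset 1."""
    arith = w * mu1 + (1 - w) * mu2
    var = (w * s1) ** 2 + ((1 - w) * s2) ** 2 + 2 * w * (1 - w) * rho * s1 * s2
    return arith - 0.5 * var


def growth_optimal_weight(mu1, s1, mu2, s2, rho):
    """Weight in asset 1 that maximizes geometric_return (unconstrained; undefined if s1 == s2 and rho == 1)."""
    return (mu1 - mu2 + s2**2 - rho * s1 * s2) / (s1**2 + s2**2 - 2 * rho * s1 * s2)


# Hypothetical assets: 6% / 15% vol and 8% / 30% vol (arithmetic returns)
for rho in (0.8, 0.2, -0.5):
    w = growth_optimal_weight(0.06, 0.15, 0.08, 0.30, rho)
    g = geometric_return(w, 0.06, 0.15, 0.08, 0.30, rho)
    print(f"rho {rho:+.1f}: weight in asset 1 {w:.3f}, geometric return {g:.2%}")
```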



Lesson from coin flip investing

The setup

  • You invest in 2 coins every week for the next 1000 weeks (19.2 yrs)
  • These coins pay a return each week
  • Every 4 weeks, you rebalance wealth equally between the 2 coins
  • Coins have an expected edge of 10% (each coin’s win payout is 10% larger than its loss payout)
  • Simulation is run 10,000x
  • Assume no transaction costs

Individual Coin Payouts

Coin | Weekly Win Payout | Weekly Loss Payout | Expected Annual Return | Expected Annual Volatility
A (Low Vol) | 2.75% | 2.50% | 6.70% | 18%
B (High Vol) | 8.25% | 7.50% | 21.5% | 54%

Results of the 2 Coin Portfolio

Strategy | CAGR | Volatility | Median Return | Max Drawdown
Theoretical | 14.1% | 28.5% | 10% | n/a
Un-rebalanced simulated | 17.9% | 32% | 6% | 68%
Rebalanced simulated | 13.9% | 30% | 9% | 64%

Observations from many simulations like the one described

  1. The higher the portfolio volatility, the more the mean and median diverge
  2. Rebalancing pushes median returns closer to the theoretical mean
  3. The rebalancing benefit grows with the difference in volatility between the coins
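A pure-Python Monte Carlo sketch of the two-coin setup. It is a reconstruction from the described setup, not the original simulation code, and it assumes fair 50/50 coins that flip independently, the weekly payouts from the table, and no costs. It reports median CAGRs for buy-and-hold versus rebalancing to 50/50 every 4 weeks, so its numbers should be close to, but not exactly, the table's.

```python
import random
import statistics

# (weekly win payout, weekly loss payout) for coin A (low vol) and coin B (high vol)
COINS = [(0.0275, 0.0250), (0.0825, 0.0750)]


def median_cagrs(n_sims=10_000, n_weeks=1000, rebalance_every=4, seed=42):
    """Median CAGR for buy-and-hold vs periodic 50/50 rebalancing.

    Both strategies see the same flips, so the comparison is apples to apples.
    """
    rng = random.Random(seed)
    years = n_weeks / 52
    hold_cagrs, rebal_cagrs = [], []
    for _ in range(n_sims):
        hold = [0.5, 0.5]   # never rebalanced
        rebal = [0.5, 0.5]  # reset to 50/50 every `rebalance_every` weeks
        for week in range(1, n_weeks + 1):
            for i, (win, loss) in enumerate(COINS):
                growth = 1 + win if rng.random() < 0.5 else 1 - loss
                hold[i] *= growth
                rebal[i] *= growth
            if week % rebalance_every == 0:
                half = sum(rebal) / 2
                rebal = [half, half]
        hold_cagrs.append(sum(hold) ** (1 / years) - 1)
        rebal_cagrs.append(sum(rebal) ** (1 / years) - 1)
    return statistics.median(hold_cagrs), statistics.median(rebal_cagrs)


# 10,000 paths matches the setup but is slow in pure Python; 500 keeps it quick.
hold_median, rebal_median = median_cagrs(n_sims=500)
print(f"median CAGR, un-rebalanced: {hold_median:.1%}")
print(f"median CAGR, rebalanced:    {rebal_median:.1%}")
```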

How much to wager when you have edge? (Hint: median not mean outcomes!)

Link: Rational Decision-Making under Uncertainty: Observed Betting Patterns on a Biased Coin

  • Optimal bet size as a fraction of bankroll on an even-money bet is 2p-1, where p is the probability of winning. You will recognize this as the edge per trial reported as a percent. So a 60% coin has a 20% edge.
  • The formula is a solution to a proportional betting system which implicitly assumes the gambler has log utility of wealth

Imagine tossing a 60% coin 100x and starting with a $25 bankroll

Arithmetic Mean Land

The mean of one flip is 20% positive expectancy.

Optimal bet size is 20% of bankroll since you have .20 expectancy per toss

Increase in wealth per toss betting a Kelly fraction: 20% of bankroll x .20 expectancy = 4%

Expected (mean) value of game after 100 flips betting 20% of your wealth each time

$25 * (1+.04) ^ 100 = $1,262

Median Land

The median growth per flip when betting a Kelly fraction is (1.2^.60 * .8^.40 – 1), or about 2%

Median value of game after 100 flips betting 20% of your wealth each time

$25 * (1.2^60) * (.8^40) = $187.25!

Things to note

  • The median outcome by definition is the increase in utility since Kelly betting implicitly assumes the gambler has log utility
  • After 100 flips, the median outcome is only about 1/7 of the mean outcome! The median outcome gives an idea of how much to discount the mean payoff. If your utility function is not a log function (i.e. does quadrupling your wealth make you twice as happy as doubling it?), then a different Kelly fraction should be used.
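A small sketch that reproduces the mean and median above and confirms by grid search that betting 2p - 1 = 20% maximizes the expected log growth per flip. It assumes an even-money coin, and for the median it assumes wins and losses land on their expected counts (60 and 40), as the calculation above does.

```python
import math


def kelly_fraction(p):
    """Kelly bet for an even-money coin that wins with probability p."""
    return 2 * p - 1


def log_growth_per_flip(f, p):
    """Expected log growth per flip when betting fraction f of bankroll."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)


p, bankroll, flips = 0.60, 25.0, 100
f = kelly_fraction(p)
# Mean: expected gain per flip = bet fraction * edge per unit bet
mean_final = bankroll * (1 + f * (2 * p - 1)) ** flips
# Median: 60 wins and 40 losses
median_final = bankroll * (1 + f) ** (p * flips) * (1 - f) ** ((1 - p) * flips)

# Brute-force check that f = 2p - 1 maximizes the log growth rate
grid = [i / 100 for i in range(100)]
best_f = max(grid, key=lambda x: log_growth_per_flip(x, p))

print(f"Kelly fraction {f:.2f}, grid-search optimum {best_f:.2f}")
print(f"mean after {flips} flips:   ${mean_final:,.2f}")
print(f"median after {flips} flips: ${median_final:,.2f}")
```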

Percents Are Tricky

Which saves more fuel?

1. Swapping a 25 mpg car for one that gets 60 mpg
2. Swapping a 10 mpg car for one that gets 20 mpg

[Jeopardy music…]

You know it’s a trap, so the answer must be #2. Here’s why:

If you travel 1,000 miles:

1. A 25 mpg car uses 40 gallons. The 60 mpg vehicle uses 16.7 gallons.
2. A 10 mpg car uses 100 gallons. The 20 mpg vehicle uses 50 gallons.

Even though you improved the MPG of car #1 by more than 100%, you save much more fuel by replacing the less efficient car (50 gallons versus about 23). Go for the low-hanging fruit. The illusion suggests we should switch ratings from MPG to GPM or, to avoid decimals, Gallons Per 1,000 Miles.
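The fuel arithmetic in a few lines, assuming the same 1,000-mile trip for both swaps. The percentage MPG gain and the gallons saved point in opposite directions.

```python
def gallons_used(mpg, miles=1_000):
    """Fuel burned over `miles` at a given MPG."""
    return miles / mpg


def fuel_saved(old_mpg, new_mpg, miles=1_000):
    """Gallons saved over `miles` by swapping a car with old_mpg for one with new_mpg."""
    return gallons_used(old_mpg, miles) - gallons_used(new_mpg, miles)


for old, new in ((25, 60), (10, 20)):
    print(f"{old} -> {new} mpg: MPG up {new / old - 1:.0%}, "
          f"saves {fuel_saved(old, new):.1f} gallons per 1,000 miles")
```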

Think you got it?

Give “deflategate” a go. The Patriots controversy brought attention to a similar illusion — plays per fumble versus fumbles per play.

If you deal with data analysis you have probably come across the problem of normalizing data by percents and the pitfalls of dividing by small numbers (margins, price returns, etc).

The MPG vs GPM illusion is clearer if you are comfortable with XY plots from 8th-grade math. Look at the slopes of x/1 versus 1000/x (in this case think of x as miles per gallon; its reciprocal is gallons per mile, which I multiplied by a constant 1,000 to make the graph scale more legible).

The Volatility Drain

I don’t want to torment you this week, but if you trust me, play along and you’ll be paid off with some non-obvious lessons.

Imagine the wish you made over the candles on your 10th birthday comes true. You are magically given $1,000,000. But there’s a catch. You must expose it to either of the following risks:

1) You must put it all on a single spin at the roulette wheel at the Cosmo. You can choose any type of bet you want. Sprinkle the wheel, pick a color, a lucky number, whatever you want.


2) You can put all the money in play on a roulette wheel that has 70% black spaces. Place any bet you want, but you must bet it all. And one more catch…you are required to play this roulette wheel 10x in a row, betting your whole bankroll, including gains, each time.

Think about what you want to do and why. Even if you cannot formalize your reasoning, take note of your intuition. I’ll wait.

Let’s proceed.

First of all, the correct answer for anyone without a private jet is #1. Just spread your million evenly, pay the Cosmo its roughly $52,600 toll, and try not to blow the rest of it before you get to McCarran. Those of you who computed the positive expected value of option #2 might feel torn.

Welcome to a constrained version of the St. Petersburg paradox.

The expected value of a single spin with a million dollars spread over the favorable blacks is $400,000 (.70 x $1,000,000 – .30 x $1,000,000). A giant 40% return.

But if you are forced to play the game 10x in a row, there is a 97% chance you will lose all your money (1 - .70^10).

What’s going on?

This problem highlights the difference between arithmetic or simple average return vs a compounded return. If you made 100% in an investment over 10 years, the arithmetic average would be 10% per year while the compounded annual return would be 7.2%. I won’t demonstrate the math, but you can always ask me or just Google it. The mechanics are not the point. An understanding of the implications will be, so hang on.
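For readers who do want the mechanics, a short sketch of the three calculations in this section: the single-spin expected value on the 70% wheel, the 10-spin ruin probability, and the 100%-over-10-years example.

```python
def single_spin_ev(stake=1_000_000, p_win=0.70):
    """Expected profit of one even-money bet of the full stake."""
    return p_win * stake - (1 - p_win) * stake


def ruin_probability(p_win=0.70, spins=10):
    """Chance of losing at least once (and therefore everything) when betting it all every spin."""
    return 1 - p_win**spins


def compound_annual_return(total_return, years):
    """Annualized compounded return for a cumulative return over `years`."""
    return (1 + total_return) ** (1 / years) - 1


print(f"single-spin EV: ${single_spin_ev():,.0f}")
print(f"ruin probability over 10 spins: {ruin_probability():.1%}")
print(f"100% over 10 years: arithmetic {1.00 / 10:.1%}/yr, "
      f"compounded {compound_annual_return(1.00, 10):.2%}/yr")
```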

In option #1, you will be in simple return land. In option #2, you are in compounded return land. Compounded returns are not intuitive, but they are much more important to your life. Let’s see why.

Sequencing and the geometric mean

  • Compound returns govern quantities that are sequenced, such as your net worth or portfolio. If you earn 10% this year, then lose 10% next year, you are net down 1%, right? While the arithmetic average return was 0% per year, your compound return is -.50% per year (.99^.5 – 1).
  • Let’s thicken the plot by increasing the volatility from 10% to 20%. If you win one and lose one, your arithmetic mean is 0, but now your compound return is -2% per year. Interesting.
  • Let’s turn to Breaking The Market to see what happens when we tilt the odds in our favor and really ramp the vol higher. In his game, a win earns 50%, while a loss costs you 40%.
    • The expected value of betting $1 on this game is 5%. But this is the arithmetic average. The geometric average is a loss of 5%!
    • If you played his game 20x, your mean outcome is positive but relies on the very unlikely cases in which you have an almost impossible winning streak. You usually lose money.
    • As BTM explains: Repeated games of chance have very different odds of success than single games. The odds of a series of bets (specifically a series of products, i.e. multiplication) are driven by, and trend toward, the GEOMETRIC average. Single bets, or a group of simultaneous bets (specifically a series of sums, i.e. addition), are driven by the ARITHMETIC average.
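BTM's 20-game example can be computed exactly rather than simulated, since the final wealth depends only on the number of wins. This sketch assumes a fair coin (50% chance of +50%, 50% chance of -40%) and that the whole bankroll rides on every flip. The mean multiple is 1.05^20, about 2.65x, while the median is 0.9^10, about 0.35x, and only about a quarter of the paths end with a profit.

```python
from math import comb


def outcome_distribution(n=20, win=0.50, loss=0.40, p=0.5):
    """Exact (wealth multiple, probability) pairs for n sequential all-in bets."""
    return [
        ((1 + win) ** k * (1 - loss) ** (n - k), comb(n, k) * p**k * (1 - p) ** (n - k))
        for k in range(n + 1)
    ]


dist = outcome_distribution()
mean_multiple = sum(m * prob for m, prob in dist)
median_multiple = dist[10][0]  # 10 wins, 10 losses: the middle of Binomial(20, 0.5)
p_profit = sum(prob for m, prob in dist if m > 1)

print(f"mean multiple:   {mean_multiple:.3f}")
print(f"median multiple: {median_multiple:.3f}")
print(f"chance of ending with a profit: {p_profit:.1%}")
```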

The most important insights to remember!

  • Arithmetic means are greater than geometric means; the disparity is a function of the volatility.
  • Mean returns are greater than median and modal returns (Wikipedia pic). In other words, even in positive expected value games, if the volatility is high and you bet the bulk of your bankroll, your most likely outcomes are much worse than the mean. 

Using this in real life

Step 1

Recognize compounded returns when you see them. We have already seen them in the domain of betting and investing. 

Consider these questions.

  • I want to raise the price of my product by 60%. How many customers can I lose while maintaining current revenue?
  • If CA experiences a net population outflow of 20% in the next 20 years, how much would it need to raise taxes on those who stayed behind to make up the shortfall?
  • If muscle burns 2x as many calories as fat and I lose 40% of my muscle mass, how many fewer calories will I burn while at rest?

After groping around with those, you may have found the general formula: the gain needed to offset a loss of X is X / (1-X)


If you lose 20%, you need to gain 25% to get back to even. Lose 50%, and you need 100% to get back to even. Lose 100%, and no gain can ever get you back. Look at the slope of that sucker as you pass 2/3.
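The formula as a small helper, run over a few of the losses above. A 37.5% customer loss is exactly offset by the 60% price increase, and a 20% population loss calls for a 25% tax increase on the remaining taxpayers (assuming revenue per head otherwise stays the same).

```python
def gain_to_recover(loss):
    """Gain needed to get back to even after losing a fraction `loss`: X / (1 - X)."""
    if not 0 <= loss < 1:
        raise ValueError("loss must be in [0, 1); after a 100% loss no gain gets you back")
    return loss / (1 - loss)


for loss in (0.20, 0.375, 0.50, 2 / 3, 0.90):
    print(f"lose {loss:.1%} -> need a {gain_to_recover(loss):.0%} gain")
```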

In other words, downside volatility is a death spiral. Let the brutality of the math sink in.

Why has nearly every real estate developer you know gone bust at some point? Because they are in the most cyclical business in the world and love leverage. Leverage amplifies the volatility of their returns by multiples. Compounded returns are positively skewed: a long right tail pulls the mean above the typical outcome. Mercifully for them, zero (aka bankruptcy) is an absorbing barrier.

Step 2

Protect Yourself

  • Diversify your bets. In the earlier casino example, if you could divide your million dollars into ten $100k bets, you would now have a basket of uncorrelated bets. If you bet 1/10th of your bankroll on each of 10 such wheels, you’d expect to make $400k in profit (7 wins out of 10 spins). With a standard deviation of 1.45 wins, you now have a 95% chance of getting at least 5 wins and breaking even on the bet, instead of a 97% chance of going bust in the version where you bet everything serially.
  • When a bet is very volatile, reduce your bet size. If you put 100% of your net worth into a 20% down payment on a home you lose half your net worth if housing prices ease 10%. In investing applications, variations of Kelly criterion are good starting points for bet sizing.
  • Remember that for parallel bets to not be exposed to disastrous volatility, your investments must not be highly correlated. Having a lot of investment in the stock market and in high-beta SF real estate simultaneously is an illusion of diversification. Likewise, if you own 10 businesses, you will likely want them in separate LLCs. For those in finance, you will immediately recognize the divergence in interests between a portfolio manager at a multi-strat fund and the fund’s GP. Izzy Englander wants his strategies to diversify each other while he gets paid on the assets, while the individual PM wants to take maximum risk. Izzy risks his net worth, the PM just her job. If you take one thing away from this paragraph: a basket of options is worth more than an option on a basket.
  • Insurance is by necessity a negative expected value purchase. You buy it because it ensures financial survival. In arithmetic return land it’s a bad deal, but if the insurance avoids ruin, it may have a profoundly positive effect on compounded returns which is what we actually care about.
  • Finally, the power of portfolio rebalancing. If you hold several uncorrelated assets, rebalancing periodically narrows the gap between the median and mean expected returns. This is more apparent when there are wide differences in the volatilities of your assets.
    • I ran a bunch of Monte Carlo sims on “coin flip assets” with positive drift. Some takeaways were a bit surprising.
      • If the volatility of your portfolio is about 9% per year, median returns are about 90% of the mean returns. At this level of volatility, rebalancing has little effect.
      • If the volatility of your portfolio is about 15% per year, median returns are about 50% of the mean returns if you rebalance.
      • Rebalancing actually lowers your mean returns when the volatility of the portfolio is high, even though it raises the median. My intuition is that by taking profits in the higher-volatility assets, rebalancing truncates the chance of compounding at insane rates, but it also cuts the volatility by so much that it provides a much more stable compounded return. The higher the volatility, the more of the mean return is driven by highly unlikely right-tail moves.
      • The impact of high volatility is stark. It is extremely destructive to compounded returns.
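The diversified casino bet from the first bullet above can be checked exactly with the binomial distribution, assuming the ten wheels are independent and each even-money $100k bet wins 70% of the time.

```python
from math import comb, sqrt

P_WIN, N_WHEELS, STAKE = 0.70, 10, 100_000  # ten independent $100k even-money bets


def prob_at_least(k_min, n=N_WHEELS, p=P_WIN):
    """P(at least k_min wins out of n independent bets)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


expected_profit = N_WHEELS * STAKE * (2 * P_WIN - 1)
sd_wins = sqrt(N_WHEELS * P_WIN * (1 - P_WIN))
p_break_even = prob_at_least(5)  # 5 wins and 5 losses nets to zero

print(f"expected profit: ${expected_profit:,.0f}")
print(f"std dev of wins: {sd_wins:.2f}")
print(f"P(at least break even): {p_break_even:.1%}")
```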
For finance folk and the curious
  • Compounded returns are positively skewed: a long right tail pulls the mean above the median and mode. Black-Scholes option models use a lognormal distribution to incorporate that insight. The higher the volatility, the greater the distance between the mean and mode of the investment. Example pic from Quora.
    • A recollection from the dot com bubble. Market watchers like to say the market was inefficient. The options market would disagree. Stock prices and volatilities were extremely high, reflecting the fact that nobody understood the ramifications of the internet. Had you looked at the option-implied distributions, it was not uncommon to see that a $250 stock had a modal implied price of $50. To be hand-wavy about it, the market was saying something like “AMZN has a 10% chance of being worth $2,050 and a 90% chance of being worth $50.” In other words, if you bought AMZN there was a 90% chance you were going to lose 80% of your money. If you are itching to get technical on the topic, Corey Hoffstein’s paper explores how risk-neutral probabilities relate to real-world probabilities.
    • For option wonks: assuming no carry costs, you’ll recall the concept of variance drain. For a lognormal stock price, the median is S * e^(-.5 * variance) and the mode is S * e^(-1.5 * variance), where variance = σ² * T. The higher the variance, the lower the median and mode! The distribution gets “squished to the left” as the probability the stock declines increases in exchange for a longer right tail, like we saw during the dotcom days.
    • The expensive skew embedded in SPX option prices reflects 2 realities. In a selloff, the average stock in the index will see its volatility increase, but more critically, the cross-correlation of the basket will increase. Since index variance is roughly (average stock volatility)² x correlation, there is a multiplicative effect of increasing either parameter. The extra rocket fuel comes from the parameters themselves being positively correlated to each other.
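A minimal sketch of the lognormal math behind the variance-drain bullet, assuming no carry (so the expected price stays at spot) and a hypothetical $100 stock over one year. As volatility rises, the mean stays put while the median and mode slide left.

```python
import math


def lognormal_stats(spot, sigma, t):
    """Mean, median, and mode of a zero-drift lognormal terminal price.

    Assumes log(S_T) ~ Normal(log(spot) - 0.5 * sigma^2 * t, sigma^2 * t).
    """
    v = sigma**2 * t
    mu = math.log(spot) - 0.5 * v
    mean = math.exp(mu + 0.5 * v)  # = spot
    median = math.exp(mu)          # = spot * exp(-0.5 * v)
    mode = math.exp(mu - v)        # = spot * exp(-1.5 * v)
    return mean, median, mode


for sigma in (0.20, 0.60, 1.00):
    mean, median, mode = lognormal_stats(100.0, sigma, 1.0)
    print(f"vol {sigma:.0%}: mean {mean:.1f}, median {median:.1f}, mode {mode:.1f}")
```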

Levered ETF/ETN tool

Use this tool to estimate how much a levered fund would need to buy or sell to maintain its mandated levered exposure. You should make a copy of the sheet for your own use.

A few points to consider:

  • AUM changes faster (in percentage terms) than the position size, by a multiple equal to the leverage factor
  • Inverse funds require larger adjustments than their long counterparts! A -3x SPY fund requires 2x the adjustment of a +3x SPY fund (and a -2x fund requires 3x the adjustment of a +2x fund).
  • For a more detailed explanation of why funds must adjust their positions, see my explanation of shorting.
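A minimal sketch of the kind of calculation involved (not the sheet itself), assuming a fund that resets to its target leverage once per day and ignoring fees, financing, and creations/redemptions. The required trade works out to AUM * r * L * (L - 1), where r is the underlying's return and L the leverage (negative for inverse funds), which is why inverse funds trade more.

```python
def rebalance_trade(aum, leverage, underlying_return):
    """Notional a daily-reset levered fund must trade after a one-day move.

    Positive = buy, negative = sell.
    """
    exposure_before = leverage * aum * (1 + underlying_return)  # position after the move
    new_aum = aum * (1 + leverage * underlying_return)
    exposure_target = leverage * new_aum
    return exposure_target - exposure_before  # = aum * r * leverage * (leverage - 1)


for lev in (2, -2, 3, -3):
    trade = rebalance_trade(aum=1_000_000_000, leverage=lev, underlying_return=0.01)
    print(f"{lev:+d}x fund, underlying +1%: trade {trade / 1e6:+.0f}mm")
```

After an up move, both long and inverse funds end up buying; after a down move, both sell.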


Are car leases confusing?

We leased a Toyota Highlander this year and found leases a bit trickier than meets the eye. Let’s have a look.

Think of a lease payment as having 2 components:
  1. Depreciation. Over a 36-month lease, you will ‘consume’ the portion of the vehicle’s value that it loses during the term.
  2. A loan. During the 36 months, you are borrowing the portion of the vehicle that has not been depreciated. A bank actually owns the vehicle, and you are borrowing the yet “unused” portion of it. This is the portion you pay interest on. Lease lingo calls the interest rate a “money factor”. To convert a money factor to APR, just multiply by 2400.
So how do you evaluate the cost of a lease?

There are 3 levers (sales price, residual and money factor) the salesperson can play with and the interaction of the levers is what makes shopping leases complicated. The “residual value” is the buy-out price of the lease at the end of the term, and is represented as a percentage of the purchase price.

Let’s pretend you are looking at a $50,000 car for a 36-month lease. Let’s say the residual is $30,000 (60%) and the money factor is .001 (i.e. 2.4% APR). Your lease payment has a depreciation component of $20,000, the amount of car you will consume, divided over 36 months, plus a financing charge of 2.4% divided by 12 months times the average amount of car outstanding ($40,000, halfway between $50,000 and $30,000). These numbers work out to $555 for the depreciation and $80 for the financing charge, for a total payment of about $635 per month.
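The same numbers in code, assuming the common lease convention finance charge = (price + residual) * money factor, which equals the APR/12 charge on the $40,000 average balance. Taxes and fees are ignored.

```python
def lease_payment(price, residual, money_factor, months=36):
    """Monthly lease payment (before taxes and fees).

    depreciation = (price - residual) / months
    finance      = (price + residual) * money_factor
    """
    depreciation = (price - residual) / months
    finance = (price + residual) * money_factor
    return depreciation, finance, depreciation + finance


def money_factor_to_apr(money_factor):
    """APR in percent."""
    return money_factor * 2400


dep, fin, total = lease_payment(50_000, 30_000, 0.001)
print(f"APR {money_factor_to_apr(0.001):.1f}%: depreciation ${dep:.2f} + finance ${fin:.2f} "
      f"= ${total:.2f}/month")
```

For a fixed money factor, raising the residual lowers the payment as long as the money factor is below 1/36, which any realistic rate is.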

It’s typical to want a higher residual which translates into less depreciation but here’s the catch — the higher the residual, the more car you are “borrowing”. So a high residual AND a high money factor will lead to smaller depreciation expenses but HIGHER financing charges making the lease more expensive than a lower residual lease.

You may find that if you buy the car outright you get quoted a different price. It may be the case that you could buy the car and sell it after 3 years, effectively creating your own lease, with more favorable economics but remember that the lease is an option to buy or as I prefer to call it — an option to sell (you can “put” it back to the dealer). For the finance inclined, it’s actually a put struck at the residual value. If the car is worth more in the secondary market in 3 years than the residual, you will buy the car at the residual and flip it. If not, you will simply ‘put’ it back to the dealer.

Here’s my spreadsheet allowing you to compare leasing, buying, and what we like to do — a one pay lease, where you make all the payments up front in exchange for a lower money factor. I put a field for “savings account rate” in it to be complete about your opportunity costs when borrowing less money. The calculator assumes no money down and no taxes. Taxes are state specific, and if you do put money down then whether it improves or detracts from the economics depends on whether your loan amount is at a higher or lower rate than your savings account.

This table shows the interaction between money factor and residual. For a given APR you always want a higher residual. It gets trickier when comparing across both axes.