hurst

In a random walk where trials are independent, variance scales linearly with time. Since standard deviation is the square root of variance, volatility scales with sqrt(T).

This sublinear power law scaling gets smuggled into option math that answers practical questions. For example, assuming implied vol is constant, a 12-month ATF straddle is twice the price of a 3-month ATF straddle because sqrt(12/3) = 2.

This scaling is commonly used to convert raw vega into weighted vega. Raw vega is an extremely low-resolution number. If you are long 50k of 12-month vega and short 40k of 3-month vega, it appears that you are long vol. But 12-month IV doesn’t whip around as much as 3-month IV, so this position will not act like it’s long vol on a large move higher in vol, because the term structure will not “parallel shift” higher. The 3-month will increase faster as the term structure steepens into a downward-sloping shape, one referred to as “inverted” or “backwardated”.

A simple way to modify raw vega is to normalize every expiry to a fixed DTE, for example 3 months, by scaling each month’s vega by 1/sqrt(T), where T is that expiry’s time to expiration relative to the reference. In that case, using the same math we did above, a 12-month vega is cut in half relative to the 3-month.

So your re-weighted vega is now short 15k vega instead of being long 10k vega!

12-month vega x scaling factor relative to 3m vega = +50k * 1/sqrt(12/3) = +25k

3-month vega x scaling factor relative to 3m vega = -40k * 1/sqrt(3/3) = -40k

Net: -15k
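The re-weighting above fits in a few lines of Python. This is a minimal sketch: the function name weighted_vega and the dictionary layout (months to expiry mapped to raw vega) are my own choices for illustration, not anything from a real risk system.

```python
import math

def weighted_vega(vega_by_months, ref_months=3.0):
    """Scale each expiry's raw vega by sqrt(ref_months / months) and sum.

    vega_by_months: hypothetical mapping of months-to-expiry -> raw vega.
    """
    return sum(v * math.sqrt(ref_months / m) for m, v in vega_by_months.items())

# Position from the text: long 50k of 12-month vega, short 40k of 3-month vega.
position = {12: 50_000, 3: -40_000}

raw = sum(position.values())
weighted = weighted_vega(position)

print(f"raw vega:          {raw:+,.0f}")
print(f"3m-weighted vega:  {weighted:+,.0f}")
```

Changing ref_months only rescales the total. The sign of the net position is what the weighting is really telling you.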

That volatility changes should move in proportion to 1/sqrt(T) is not a commandment brought down from Moses. It’s a convenient scaling factor that corresponds better, even if imperfectly, to empirical vol surface behavior. It also has a handy interpretation. If IVs change in proportion to 1/sqrt(T) then ATM time spreads are unchanged (net of theta). In other words, the 3-month/12-month straddle spread is unchanged in such a regime.

Again, this scaling doesn’t need to hold. Sometimes we have parallel shifts in term structure and sometimes term structures steepen faster or slower than sqrt(T) scaling would predict. But the scaling is still a better prediction than the raw vega measure, which would have you believe IVs from all months are directly comparable without adjusting for how slow long-dated IVs change or how fast a weekly IV can move.

Random walks and the derivative pricing theory built upon them assume returns are independent. In hindsight, random walks still exhibit stretches that can be labeled “trend” (like a run of heads) or “mean reversion” (a period of frequent alternation). But it’s one thing to label these stretches in hindsight and another to predict them.

It should be self-evident that being able to predict trends or reversion would be marvelously profitable for a directional trader. But, direction aside, it would be a gift to volatility traders as well. It would influence not only how they priced vertical spreads and time spreads but the deltas in their models and their delta-hedging strategies. In other words, it would change everything if you had an edge on the probability of the next move being up or down, even if you did not have an edge on the fair value of the stock (this would occur if you had an edge on probability but not on the magnitude of up move vs down move). Option structures allow fine-grained bets that can isolate probability from magnitude.

If an asset trends over weeks or months, you will underestimate its volatility by scaling its daily volatility by sqrt(T). That makes sense. If it trended, that’s similar to saying the moves were auto-correlated and therefore dependent. Again, this is descriptive, not predictive, but relating measures of volatility to this interdependence lets us see how sensitive option pricing is to the random walk assumption. A few articles I’ve written in this vein:

how a high implied vol can be cheap | 9 min read
The Option Market’s Point Spread (Part 2) | 11 min read
Thinking In N not T | 6 min read

These articles have a unifying concern. If prices are random, then sure, the power function that specifies how volatility scales is the familiar:

σ(T) = σ(1) × T^(1/2)

But if prices trend or mean-revert, the exponent is no longer 1/2. The general version of the power law is:

σ(T) = σ(1) × T^H

Over any historical sample, H can be observed to be something other than 1/2. For it to be 1/2 would mean that annualized volatility over 2 different sampling windows was identical. In hindsight, that will rarely occur. But it’s also true for any exponent you pick. It’s hard to make the persistent case for a value other than 1/2, especially when it carries the financial totem of randomness.

In Retail Options Trading, Euan Sinclair says markets aren’t random, but they’re close to random. The question of whether there’s enough life growing in the gap between “random” and “almost random” for a skilled hunter to eat is existential for professional investors’ careers.

We need to examine randomness.

Returning to the context of volatility scaling and its relationship to randomness, Euan reaches for a popular quant tool. The Hurst exponent. That’s why I picked H for the exponent in the general version of the volatility power law.

Euan’s definitions:

H = 0.5 is a random walk. No memory.
H < 0.5 is mean-reverting. Up tends to be followed by down.
H > 0.5 is trending, or “persistent.” Up tends to be followed by more up.

It’s time to do some learning moontower-style and start with the basics.

What The Hurst Exponent Actually Measures

Our Favorite Starting Point: Coin Flips

Flip a fair coin 100 times. Score +1 for heads, −1 for tails, and keep a running sum.

After 100 flips, how far from zero is that running sum?

Three stylized regimes to compare:

Perfectly correlated flips (every flip copies the last one): the running sum after 100 flips is ±100. It grows linearly with N.
Perfectly anti-correlated flips (+1, −1, +1, −1, …): the running sum never escapes ±1. It doesn’t grow with N at all.
Independent flips: the running sum lands around ±√N or in this case ±10.

Think of these as regimes that correspond to three scaling exponents:

Correlated (trending): N^1
Anti-correlated (mean-reverting): N^0
Independent (random walk): N^0.5

The exponent is the answer to “what power of N does the cumulative range scale with?”
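If you want to see the three regimes rather than take them on faith, here is a small simulation sketch. It assumes NumPy, an arbitrary seed, and 20,000 simulated independent sequences. For the correlated case I take the first flip to be heads.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
N = 100

# Perfectly correlated: every flip copies the first one (taken as +1 here).
correlated = np.ones(N, dtype=int)

# Perfectly anti-correlated: +1, -1, +1, -1, ...
anti = np.array([(-1) ** i for i in range(N)])

# Independent: many separate sequences of N fair flips, summed per sequence.
independent_sums = rng.choice([-1, 1], size=(20_000, N)).sum(axis=1)

corr_final = abs(correlated.sum())               # distance from zero after N flips
anti_max = np.abs(np.cumsum(anti)).max()         # farthest the running sum ever gets
indep_typical = independent_sums.std()           # typical distance, compare to sqrt(N)

print("correlated:     ", corr_final)
print("anti-correlated:", anti_max)
print("independent:    ", round(indep_typical, 2), "vs sqrt(N) =", N ** 0.5)
```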

Strip out the step size to isolate the regime

The ±1 coin gave a running sum with range around √N. If the coin paid ±10 instead, the range would be 10·√N. Bigger steps, bigger range. We want to strip out that distortion. If we measured price range on raw market data, a jumpy stock would always look more “trending” than a calm one, just because its steps are bigger. We’d be measuring volatility tangled up with regime, when we want regime alone.

The fix is to divide the range by the standard deviation of the steps: R/S

For the ±1 coin, R ≈ √N and S = 1, so R/S ≈ √N.

For the ±10 coin, R ≈ 10·√N and S = 10, so R/S ≈ √N. Same answer. The step size cancels out.

That’s the rescaled range. R/S only cares about the regime of the series, not its scale.
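A quick sanity check on the cancellation, sketched in Python with NumPy. The helper name rescaled_range is mine, and it uses the de-meaned definition of R that the next section introduces. The point is only that multiplying every step by 10 leaves R/S untouched.

```python
import numpy as np

def rescaled_range(steps):
    """R/S for one series: range of the de-meaned running sum over the step std dev."""
    steps = np.asarray(steps, dtype=float)
    walk = np.cumsum(steps - steps.mean())
    return (walk.max() - walk.min()) / steps.std()  # population std dev

rng = np.random.default_rng(1)  # arbitrary seed
flips = rng.choice([-1.0, 1.0], size=100)

rs_small = rescaled_range(flips)        # the +/-1 coin
rs_big = rescaled_range(10 * flips)     # the same flips paying +/-10

print(rs_small, rs_big)
```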

From coins to assets

Now we can adapt this to asset returns.

So we have two measurements over a window of T days of log returns:

S = the standard deviation of the returns (the step size in the coin example)
R = the range (max − min) of the cumulative sum of the de-meaned returns. How far the running total wandered between its high and its low.

We de-mean before computing R, so we strip out drift. We don’t care that the thing went up over the window, we care how it wandered around that trend. We divide by S to strip out the volatility scale.

The √T Benchmark

If returns are independent, R/S also grows like √T for the same underlying reason:

The variances of independent things add, so the spread grows by √T.

Now generalize it. Instead of forcing the exponent to be 0.5, let the data tell you:

R/S ~ T^H

H = 0.5: matches √T. Independent.
H > 0.5: R/S grows faster than √T. Trending. Moves reinforce each other.
H < 0.5: R/S grows slower than √T. Mean-reverting. Moves fight each other.

Reading H Off A Plot

The scaled range takes the functional form of a power law. If we take logs of both sides, the power law becomes a straight line, and the exponent H becomes the slope of the line.

log₂(R/S) = H · log₂(T) + constant

Compute R/S at a few different T’s, plot them log-log, and the slope is H. It doesn’t matter which type of log we use. We could choose log₁₀ or ln, but using log₂ gives a clean way to narrate it: every time you double T, R/S multiplies by 2^H.

H = 0.5: each doubling multiplies R/S by √2 ≈ 1.41
H = 1.0: each doubling doubles R/S
H = 0.0: each doubling leaves R/S untouched
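Here is a tiny sketch of why the slope trick works, using a made-up power law rather than market data. H_true and c are hypothetical values chosen for illustration. The fit recovers H regardless of the constant in front.

```python
import numpy as np

T = np.array([5, 10, 20, 40])
H_true, c = 0.43, 0.9            # hypothetical exponent and constant
rs = c * T ** H_true             # an exact power law, no noise

# Straight-line fit in log-log space: the slope is the exponent.
slope, intercept = np.polyfit(np.log2(T), np.log2(rs), 1)

# Each doubling of T multiplies R/S by 2**H.
doubling_ratios = rs[1:] / rs[:-1]

print("fitted slope:", round(slope, 4))
print("ratio per doubling:", np.round(doubling_ratios, 4), "vs 2**H =", round(2 ** H_true, 4))
```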

The Implementation Recipe

Pick several T’s (say 5, 10, 20, 40).
At each T, chop the sample into non-overlapping chunks. (see appendix)
For each chunk: de-mean, cumulative sum, R = max − min, S = std dev, then R/S.
Average R/S across the chunks at that T.
Fit a line through the (log₂T, log₂(R/S)) points. The slope is H.
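The recipe above translates almost line for line into Python. Treat this as a minimal sketch and not the notebook’s code: the names rescaled_range and hurst_rs are mine, leftover days that don’t fill a chunk are simply dropped, and the input is synthetic i.i.d. noise, not real returns.

```python
import numpy as np

def rescaled_range(chunk):
    """One chunk: de-mean, cumulative sum, R = max - min, S = std dev, return R/S."""
    walk = np.cumsum(chunk - chunk.mean())
    return (walk.max() - walk.min()) / chunk.std()  # population std dev

def hurst_rs(returns, windows=(5, 10, 20, 40)):
    """Classic R/S estimate of H: slope of log2(avg R/S) against log2(T)."""
    returns = np.asarray(returns, dtype=float)
    log_t, log_rs = [], []
    for t in windows:
        n_chunks = len(returns) // t                       # non-overlapping chunks
        chunks = returns[: n_chunks * t].reshape(n_chunks, t)
        avg_rs = np.mean([rescaled_range(c) for c in chunks])
        log_t.append(np.log2(t))
        log_rs.append(np.log2(avg_rs))
    slope, _intercept = np.polyfit(log_t, log_rs, 1)
    return slope

rng = np.random.default_rng(42)                 # arbitrary seed
returns = rng.normal(0.0, 0.01, size=2000)      # synthetic independent returns
h = hurst_rs(returns)
print("estimated H:", round(h, 3))
```

Don’t be alarmed if independent input does not hand back exactly 0.5 with windows this short. That gap is the estimator bias discussed later in the post.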

Worked Examples

Computing one R/S by hand

Take a single 5-day chunk of returns, in %: +1, +3, −2, +4, −1.

Mean: (1 + 3 − 2 + 4 − 1) / 5 = +1%
De-mean (subtract the mean from each): 0, +2, −3, +3, −2
Cumulative sum (running total of the de-meaned series): 0, +2, −1, +2, 0
R is the range of that running total: max − min = (+2) − (−1) = 3
S is the standard deviation of the original five returns ≈ 2.28 (population stdev, STDEV.P)
R/S = 3 / 2.28 ≈ 1.32

That 1.32 is one chunk’s R/S.

Notice that since √5 ≈ 2.24, this little stretch wandered less than a random walk would, so it reads mean-reverting.
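The hand calculation can be checked step by step in Python. This sketch assumes NumPy, whose default std() is the population version (the same convention as STDEV.P).

```python
import numpy as np

r = np.array([1.0, 3.0, -2.0, 4.0, -1.0])   # the 5-day chunk, in %

mean = r.mean()
demeaned = r - mean
walk = np.cumsum(demeaned)                  # running total of the de-meaned series
R = walk.max() - walk.min()
S = r.std()                                 # population std dev (ddof=0)
rs = R / S

print("mean:", mean)
print("de-meaned:", demeaned)
print("cumulative sum:", walk)
print("R:", R, " S:", round(S, 2), " R/S:", round(rs, 2))
```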

We just repeat this for several windows.

Say you’ve got 80 days of returns.

Compute R/S at T = 5, 10, 20, 40:

The Hurst exponent, H ≈ 0.43, is extracted as the slope of the log-log plot, which is a linear transformation of a power function.

H < 0.5 corresponds to mean-reversion. Every doubling of T multiplies R/S by 2^0.43 ≈ 1.35, a hair under the 1.41 you’d get from a pure random walk. The wandering is growing slower than random diffusion would predict.

Applications of H

If H isn’t 0.5, then √T annualization is wrong for that asset. H > 0.5 means your long-horizon vol is higher than √252 × daily vol claims. H < 0.5 means it’s lower.

The articles I linked to in the intro wrestle with this same idea but in a simpler point-to-point manner in the form of a trend ratio (ie vol sampled weekly ÷ vol sampled daily).

If you assume the asset is “self-similar,” so that the exponent H governs the scaling at every horizon, then besides looking for trend or mean-reversion strategies you can research a world of option relationships that are potentially mispriced if the assumption of independence is strongly embedded in volatility scaling models.

To be reductionist, my trend ratio calcs were a two-point estimate of H. Autocorrelation patches function as a lagged estimate of the same thing. Hurst is the version that uses the whole curve instead of two points or one lag.
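To make the “two-point estimate” idea concrete, here is one way to back an H out of a trend ratio. This is my own sketch under stated assumptions, not the exact calc from the linked articles: I assume both vols are annualized with the usual square-root rule, that a week is 5 trading days, and that vol over T days scales like T^H. Under those assumptions the ratio equals 5^(H − 0.5), so H falls out by taking logs.

```python
import math

def two_point_hurst(trend_ratio, days_per_period=5):
    """Implied H from (annualized vol sampled every `days_per_period` days)
    divided by (annualized vol sampled daily), assuming vol scales like T**H."""
    return 0.5 + math.log(trend_ratio) / math.log(days_per_period)

h_flat = two_point_hurst(1.00)    # weekly and daily vol agree
h_trend = two_point_hurst(1.10)   # weekly-sampled vol 10% higher than daily
h_revert = two_point_hurst(0.90)  # weekly-sampled vol 10% lower than daily

print(round(h_flat, 3), round(h_trend, 3), round(h_revert, 3))
```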

The assumption that markets are self-similar is wrong. The more wrong it is, the less you have to gain from Hurst vs point-to-point extrapolations, but all of this is dominated by the biggest elephant in the room. Can past data help you predict trend or mean-reversion at all? Which just circles back to Euan. If you are going to bother trading, you must believe, at worst, they are merely “almost random”.

A Sense Of Proportion

H looks like a number between 0 and 1, so a move from 0.50 to 0.55 feels insignificant. The vol-annualization lens is the cleanest way to debunk that.

Consider a stock with 1% daily vol.

At H = 0.50: 1% × 252^0.5 = 15.9% annual
At H = 0.55: 1% × 252^0.55 = 20.9% annual

A 0.05 bump in H means a 32% increase in annualized vol. This obviously affects your opinion of option prices but it’s also meaningful for position sizing and risk or VaR.
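The comparison is easy to reproduce for any H you care about. A minimal sketch, assuming 252 trading days and the T^H scaling used throughout this post (the function name annualize is mine):

```python
def annualize(daily_vol, hurst, days=252):
    """Scale a daily vol to an annual horizon using vol(T) = vol(1) * T**H."""
    return daily_vol * days ** hurst

base = annualize(0.01, 0.50)
bumped = annualize(0.01, 0.55)
pct_increase = bumped / base - 1

print(f"H = 0.50: {base:.1%}")
print(f"H = 0.55: {bumped:.1%}")
print(f"relative increase: {pct_increase:.0%}")
```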

Most equity-index Hurst estimates sit in a narrow-looking 0.45 to 0.55 band, but that “small” band obscures significant differences.

The Catch: The Naive Number Lies

Now go back to Sinclair’s warning, because this is where it earns its keep.

Classic R/S — the recipe above, the one in his book, the one everybody reaches for first — is biased. Run it on a series you know is a memoryless random walk, at a 252-day window, and it does not hand you back 0.5. It hands you back something noticeably higher. The estimator manufactures a little fake memory all on its own, before the data even gets a vote.

So when SPY’s rolling H sits below 0.5, you have to ask how much of that is the market and how much is the ruler. This isn’t a fringe complaint. Lo built a modified R/S statistic back in 1991 precisely because the classic version confuses genuine long memory with garden-variety short-range stuff like volatility clustering, and equity returns are drowning in volatility clustering.

The fix is not exotic. Simulate a big pile of random walks the same length as your estimation window, run the exact same R/S recipe on them, and see what H the estimator coughs up on data you built to have none. Whatever offset it shows is the lie. Subtract it. Now a true random walk reads 0.5, and a reading that survives the correction is one you can actually look at.
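Here is roughly what that calibration looks like in code. It is a sketch under my own assumptions, not the notebook’s implementation: Gaussian steps for the simulated walks, 500 simulations, windows of 5, 10, 20 and 40, and an arbitrary seed. The names hurst_rs and null_offset are hypothetical.

```python
import numpy as np

WINDOWS = (5, 10, 20, 40)  # assumed window lengths, same as the recipe above

def hurst_rs(returns, windows=WINDOWS):
    """Classic R/S slope: average R/S per window length, then a log-log fit."""
    returns = np.asarray(returns, dtype=float)
    log_t, log_rs = [], []
    for t in windows:
        n_chunks = len(returns) // t
        chunks = returns[: n_chunks * t].reshape(n_chunks, t)
        walk = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = walk.max(axis=1) - walk.min(axis=1)
        s = chunks.std(axis=1)                   # population std dev per chunk
        log_t.append(np.log2(t))
        log_rs.append(np.log2(np.mean(r / s)))
    slope, _intercept = np.polyfit(log_t, log_rs, 1)
    return slope

def null_offset(n_obs, n_sims=500, seed=0):
    """Average H the estimator reports on memoryless data, minus the true 0.5."""
    rng = np.random.default_rng(seed)
    sims = [hurst_rs(rng.standard_normal(n_obs)) for _ in range(n_sims)]
    return float(np.mean(sims)) - 0.5

rng = np.random.default_rng(7)
sample = rng.standard_normal(252)     # stand-in for one 252-day window of returns
raw_h = hurst_rs(sample)
offset = null_offset(len(sample))
corrected_h = raw_h - offset

print("raw H:", round(raw_h, 3), " offset:", round(offset, 3), " corrected H:", round(corrected_h, 3))
```

The simulated walks need to match your real setup (same sample length, same windows, same regression), otherwise the offset you subtract belongs to a different ruler.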

This is the same humility you already preach about your own VRP work. A single rolling-window H is one draw. Treating it as gospel is exactly the “sample size of 1” trap. Calibrate it or don’t believe it.

Sandbox

I’ve heard of many traders, including option traders, using Hurst in their research. It feels like it’s accelerated in the past 5 years. I didn’t take a harder look at it until Euan gave a brief intro to it in Retail Options Trading and LLMs made it easier to tutor yourself on a quant method. It’s a technique that’s well-known, but anecdotally I’ve heard a wide range of mileage from it (I’m guessing every pro option trader in a seat today has at least heard of it in trading contexts).

If autocorrelation and realized vol ratios at different frequencies are worth looking at, then Hurst is worth at least “spaghetti on the wall”. I built a Jupyter notebook to tinker using yfinance data. You can use it, fork it, whatever:

https://colab.research.google.com/github/Kris-SF/data-pipelines/blob/main/quant-analysis/hurst_analysis.ipynb

If I were to bring this “in the lab” to see how it can become a metric or even a signal, I’d start by tinkering to see how its output jibes with my intuition of how a certain asset behaved over a particular period.

Once I had a feel for it, I’d throw the metric up on a scatterplot against other metrics to develop a sense of what is normal. Are there any correlations between H and IV skews or IV term structures? How do changes in Hurst coincide with changes in realized vol? (RV is an input to R/S, and therefore ultimately to H, so maybe we are hunting for a residual variable to track.)

If you have organized data, in the world of LLMs all of this work is more fun and faster. For now, I hope this primer on Hurst was a digestible first step for explaining the theory behind it and why it can be relevant.

You can find additional notes below.

Appendix: What “chop into non-overlapping chunks” really means

T is a window length, just how many days of wandering you measure at once. You pick several because H isn’t a property of any single window. It’s the rate at which R/S grows as the window lengthens. A handful of T’s gives you points to fit a slope through.

You have 251 daily returns. You want one number, H. That’s the entire goal.

Pick a few window sizes: 5, 10, 20, 40.

For each window size you do the exact same thing:

T = 5: chop the 251 days into back-to-back groups of 5. You get 50 groups. Compute R/S for each group, then average all 50. That’s your R/S at 5.
T = 10: chop into groups of 10. You get 25 groups. R/S for each, average them. R/S at 10.
T = 20: groups of 20, so 12 groups. Average. R/S at 20.
T = 40: groups of 40, so 6 groups. Average. R/S at 40.

Now you have four points: (5, R/S@5), (10, R/S@10), (20, R/S@20), (40, R/S@40). Plot them log-log, draw the best-fit line, and the slope is H.
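The group counts above are just integer division. A small sketch follows. Note that dropping the leftover days at the end is my assumption about how to handle the remainder, since the text doesn’t say what to do with them.

```python
n_days = 251
windows = (5, 10, 20, 40)

group_counts = {}
leftovers = {}
for t in windows:
    groups, leftover = divmod(n_days, t)   # full groups, and days that don't fit
    group_counts[t] = groups
    leftovers[t] = leftover
    print(f"T = {t:>2}: {groups:>2} groups, {leftover:>2} leftover days dropped")
```

The leftover column is worth a glance: at the longer windows a bigger slice of the sample goes unused.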

You want enough windows to fit a line, but longer windows are made up of fewer blocks (like the T = 40 window), so they give you a shakier sample from which to compute an average R/S.

Appendix: Bias

The body said classic R/S reads high on a random walk.

The finite-sample problem

Even on a true coin-flip walk, average R/S over a short window doesn’t scale like exactly √T. Across short windows it grows faster than √T, which is what pushes the fitted slope above 0.5. Hurst, Anis, and Lloyd worked out the expected R/S of a random walk in closed form back in the 70s, so one fix is to divide your measured R/S by that expected value at each T before you fit. It’s conceptually similar to the familiar Bessel n−1 adjustment applied to sample variance: a known small-sample bias that you correct for directly.

Claude suggested 2 ways to apply a correction:

Use the closed-form expected R/S directly
Simulate a pile of random walks and measure what your exact regression spits out.

They differ because the log of an average isn’t the average of a log (Jensen’s inequality). The closed-form route leaves a residual bias of a few hundredths. The simulation route, because it runs the identical regression you use in practice, lands a true random walk back at 0.5.
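For reference, here is the closed-form route as a sketch. The formula is the Anis-Lloyd expression as I understand it, for independent Gaussian steps with S taken as the population standard deviation. Please check it against the original paper before leaning on it. The function name is mine.

```python
import math

def expected_rs_anis_lloyd(n):
    """Expected R/S of n independent Gaussian steps (Anis-Lloyd form, as I recall it):
    Gamma((n-1)/2) / (sqrt(pi) * Gamma(n/2)) * sum_{i=1}^{n-1} sqrt((n-i)/i)
    """
    # lgamma keeps the gamma ratio numerically stable for larger n.
    prefactor = math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2)) / math.sqrt(math.pi)
    return prefactor * sum(math.sqrt((n - i) / i) for i in range(1, n))

windows = (5, 10, 20, 40)
expected = {t: expected_rs_anis_lloyd(t) for t in windows}

# Slope a noiseless random walk would show between the shortest and longest window.
implied_slope = math.log(expected[40] / expected[5]) / math.log(40 / 5)

for t in windows:
    print(f"T = {t:>2}: expected R/S = {expected[t]:.3f}")
print("implied slope from T=5 to T=40:", round(implied_slope, 3))
```

To apply it, divide your measured average R/S at each T by expected[T] before the log-log fit, which is the correction the first route describes.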

After much back-and-forth, I took Claude’s rec and had the notebook use the simulation route.

The nice thing about LLMs is they know a lot of the academic history of a measure. Like I said this is a starting point for your own exploration.

Better estimators exist.

Classic R/S is the cleanest to teach and the weakest to trade. Lo’s modified R/S (1991) is built to ignore short-range dependence like volatility clustering, which plain R/S happily mislabels as memory. Detrended Fluctuation Analysis (Peng et al., 1994) is the workhorse in the econophysics literature. If you ever size a position off an H, cross-check it with one of those rather than lean on R/S alone.