alien concert

I want to share this amazing podcast episode recommended by musician and fellow option refugee Mat Cashman:

🎙️Bowie, Jazz, and the Unplayable Piano (Spotify)

This year is the 50th anniversary of the best-selling solo jazz album of all time: the live recording of Keith Jarrett’s Köln Concert. The concert should never have happened, but in his Cautionary Tales podcast, economist Tim Harford weaves it into research that makes you stop and think about the value of variance in breaking out of local maximums. [In the episode you’ll hear about fascinating RCTs of exams given in different fonts as well as a natural experiment that occurred during the 48-hour London tube strike in 2014.]

The episode, besides being highly entertaining, strikes a similar chord to Wednesday’s post, obvious subtle effect. As a matter of evolution and progress, we are long vol: we discard what doesn’t work and retain what does, which is how an option works, and options benefit from variance. Experimentation, contrived struggle against arbitrary constraints (when your guitar instructor tells you, “let’s see what happens if you can only play your low E string”) and shower thoughts are all forms of non-deterministic learning or growth. They are the source of “alien” solutions that afford us the possibility of step-change rather than incremental progress. Techniques for growth that appeal to our impulse towards coherence or explainability are sensible. But their payoff is limited because their legibility means they will also be overbid.

The willingness to take risk or look foolish or just be a bit weird is paradoxically useful when you’re trying to achieve conventional success.

[Counterrealization I haven’t fully thought out — a lot of risk-coded behavior these days is just herding and therefore is unlikely to come with the rewards you’d want for the risk.]

Harford emphasizes that “injecting randomness” into computer algorithms is common practice in a quest to avoid local maximums. I asked Gemini if RL (reinforcement learning) requires injecting randomness — lo and behold the explore-exploit problem shows up:

Reinforcement Learning (RL) extensively uses concepts of injecting randomness, primarily for the purpose of exploration.

Here’s why and how randomness is used in RL:

The Exploration-Exploitation Tradeoff: RL agents need to balance exploiting their current knowledge of the environment to get rewards with exploring new actions and states to discover better strategies or higher rewards.
Preventing Suboptimal Policies: Without exploration, agents might get stuck in local optima and fail to find the best possible strategy.
Examples:
- Epsilon-Greedy Strategy: A common method where the agent takes the best-known action most of the time (exploitation), but with a small probability (epsilon), it takes a random action (exploration).
- Adding Noise to Action Outputs: In continuous control tasks, noise (like Gaussian noise) can be added to the agent’s action outputs to encourage exploration.
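For the curious, here is a minimal sketch of the epsilon-greedy idea on a toy three-armed bandit. Everything in it is my own illustrative assumption rather than anything from the episode or from Gemini’s answer: the function name `epsilon_greedy_bandit`, the payoff means, and the unit-variance Gaussian rewards are all made up for the example.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a toy Gaussian bandit.

    true_means: hypothetical average payoff of each arm (unknown to the agent).
    Returns (estimated payoff per arm, number of pulls per arm).
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    n_arms = len(true_means)
    estimates = [0.0] * n_arms         # agent's running payoff estimate per arm
    counts = [0] * n_arms              # how often each arm has been pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: with probability epsilon, pick an arm at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: otherwise pick the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Noisy reward around the arm's true mean (standard deviation 1).
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    means = [0.2, 0.5, 1.0]            # the last arm is the best one
    for eps in (0.0, 0.1):
        est, pulls = epsilon_greedy_bandit(means, epsilon=eps)
        print(f"epsilon={eps}: pulls per arm = {pulls}")
```

With `epsilon=0.0` the agent never deliberately tries anything new, so it can settle on whichever arm happened to look acceptable first. A small `epsilon` pays a little in wasted pulls for the chance to discover the better arm, which is the long-vol trade in miniature.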

I’ve got a pile of dry-erase index cards scattered in my office as I try to organize a long essay on how explore/exploit relates to option theory and decision-making IRL, but the notes only seem to grow, especially when I come across something like this episode.

Nat’s tweet will need to haunt me for another day.

https://x.com/nateliason/status/1933532158489944148

On that note — Happy Father’s Day.

This McSweeney’s listicle caught me yesterday morning just as I was making coffee and listening to War on Drugs on Spotify’s Fleet Foxes radio (h/t Dave Nadig):

What Your Favorite Sad Dad Band Says About You
