AI Traders - Party at the Moontower

Any moontower.ai subscriber can prompt our trained agent. Even if you aren’t a subscriber, you can give it a try for free. Our team plans have included an API, but we just launched an MCP that lets users connect their own AIs to our API endpoints.

Article content — https://x.com/KrisAbdelmessih/status/2070224748848742758?s=20

This gives users maximum flexibility. We are tuning our agent on a regular basis, but if you prefer your own tool stack and AI you have that choice now.

We use evals for automatically RLHF’ing Moontower Agent and I also have a manual process where I give the agent and the MCP (using Claude Code) the same prompt, and then judge them myself. Very old-fashioned. I’ll share more about what we’re learning from this in the future, but in the meantime, here’s a relevant article from the market-making firm Optiver:

Where AI Trading Models Work and Where They Still Fall Short (4 min read)

Optiver’s Applied AI team did a different kind of eval. They gave several leading large language models the same assessments they give human interns and junior traders.

The results indicate where LLMs excel…

grasping trading theory
calculating fair value
recognizing risk

…and where they still stumble:

multi-step reasoning
updating beliefs on the fly
maximizing expected value under pressure

Even before AI dominated the conversation, traders were obsessed with learning from data. A common example is transaction analysis: looking at the trades you did, filtered by counterparty, venue, and method (i.e. voice/electronic), to suss out where you are most likely to be adversely selected. This is a hard problem even with structured data. For example, it might be straightforward to filter by how you do against live option orders (as opposed to delta-neutral packages), but there are so many possible permutations. Should I consider how the quote was framed before the order came in? Do I treat a resting order differently than if I’m hit or lifted? Does time of day matter?
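To make the filtering idea concrete, here is a minimal sketch of slicing trades by any combination of fields. Everything in it is an assumption for illustration: the field names, the trade records, and the use of "markout" (P&L per contract measured some fixed time after the fill) as the yardstick for adverse selection.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trade records; all fields and numbers are invented for illustration.
# markout: P&L per contract some fixed time after the fill (negative = likely adversely selected).
trades = [
    {"counterparty": "A", "venue": "floor", "method": "voice", "markout": -0.04},
    {"counterparty": "A", "venue": "floor", "method": "voice", "markout": -0.02},
    {"counterparty": "A", "venue": "screen", "method": "electronic", "markout": 0.01},
    {"counterparty": "B", "venue": "screen", "method": "electronic", "markout": 0.03},
    {"counterparty": "B", "venue": "floor", "method": "voice", "markout": 0.02},
]

def average_markout(trades, keys):
    """Group trades by the given fields; return {group: (mean markout, trade count)}."""
    buckets = defaultdict(list)
    for t in trades:
        buckets[tuple(t[k] for k in keys)].append(t["markout"])
    return {group: (mean(vals), len(vals)) for group, vals in buckets.items()}

# One of many possible cuts: counterparty x method.
by_cp_method = average_markout(trades, ["counterparty", "method"])

# Print the worst buckets first.
for group, (avg, n) in sorted(by_cp_method.items(), key=lambda kv: kv[1][0]):
    print(group, f"avg markout {avg:+.3f} over {n} trades")
```

Swapping the `keys` list is all it takes to try another cut, which is exactly why the permutations pile up so quickly. With this few trades per bucket, any single cut is mostly noise.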

But now consider the scope of the unstructured data problem. The counterfactual. The order a broker showed me that I passed on and that proceeded to trade without my participation. You’d need to record every phone call. (Actually, this is already done for compliance reasons. In fact, when I interned at a bank in 1995, one of my tasks was to change the giant reel of tape!) But you’d also need to link the audio of the order to the print when it hit the tape. Or track the fact that it never even traded. It’s like tracking the p/l of a non-trade that could have been. With transcription so cheap, this is feasible now, but it wasn’t when I was thinking about it. You could have traders note when they passed on a trade, but this would be so tedious that it was always a non-starter on a high-volume market-making desk.
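Here is a rough sketch of the linking step, assuming the transcription has already turned each passed-on order into a structured record. The record layout, the five-minute window, and the exact-match rule on symbol, quantity, and price are all hypothetical choices. A real matcher would need to be fuzzier.

```python
from datetime import datetime, timedelta

# Hypothetical orders I was shown and passed on (as if parsed from call transcripts).
passed_orders = [
    {"id": 1, "time": datetime(2024, 1, 2, 10, 0, 0), "symbol": "XYZ", "qty": 100, "price": 2.50},
    {"id": 2, "time": datetime(2024, 1, 2, 11, 0, 0), "symbol": "ABC", "qty": 50, "price": 1.10},
]

# Hypothetical prints from the tape.
tape = [
    {"time": datetime(2024, 1, 2, 10, 1, 30), "symbol": "XYZ", "qty": 100, "price": 2.50},
    {"time": datetime(2024, 1, 2, 13, 0, 0), "symbol": "ABC", "qty": 50, "price": 1.10},
]

def link_to_tape(order, tape, window=timedelta(minutes=5)):
    """Return the first print that matches the order's symbol, qty, and price
    and occurs within `window` after the order was shown; None if no such print."""
    for p in tape:
        same_terms = (
            p["symbol"] == order["symbol"]
            and p["qty"] == order["qty"]
            and p["price"] == order["price"]
        )
        elapsed = p["time"] - order["time"]
        if same_terms and timedelta(0) <= elapsed <= window:
            return p
    return None

linked = {o["id"]: link_to_tape(o, tape) for o in passed_orders}

for order_id, match in linked.items():
    status = "traded without me" if match else "no matching print in window"
    print(order_id, status)
```

Order 2 has a print with the same terms two hours later, which falls outside the window, so it is treated as unmatched. Where to draw that line is a judgment call, and it determines which non-trades you end up marking.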

My guess is that some trading shops might be doing things like this now (if not, you’re welcome for the idea). But this Optiver article made me wonder when trading rooms will be mic’d up. Jarvis listening to all the conversations, meetings, and debates to cheaply turn unstructured data into structured data.

Your voice, its quiver, your cadence, your pauses, your keystrokes, your glances, your heart rate. Insofar as humans will still be trading, it’s hard to imagine the data obsession that’s already penetrated the MLB not making its way to desk talent.

You’ll know singularity is close when the employee handbook stipulates bathroom breaks as the only acceptable cause to remove your electrodes. Buy stock in Gillette. Every man on a W2 will need to shave their chest for a clean connection.

Elm Wealth let AI compete with humans in their popular Crystal Ball Challenge. You can give it a try yourself:

https://crystal-ball.elmwealth.com/

Elm’s founder Victor Haghani:

A couple of weeks ago we let you loose on our Crystal Ball Challenge: tomorrow’s headlines, $1 million to trade in stocks and bonds, and four AI models to beat. Humans showed up in force, logging thousands of plays and adding over 1,500 entries on the leaderboard.

Here is how the AI models are doing against human players so far:

– Claude: winning 65% of the time
– ChatGPT: 50%, a coin flip
– Grok: 43%
– Gemini: 40%

Both the Wall Street Journal and The Economist covered the experiment this month, and both keyed on the same finding: the AIs are great at reading market-moving news, but they struggle to size their bets appropriately. Knowing what to trade turns out to be the easy part. Knowing how much is what trips them up.

If you have not played yet, three of the four AIs are losing more than half their matchups. Pick your fight. If you have played but not lately, your spot on the leaderboard might no longer be safe.

And finally, just before I scheduled this to send out I came across Dwarkesh’s:

The next big breakthrough will be AIs learning on the job

Subtitle: “Labs are throwing away the most valuable data”.

transcript
