Prediction markets Kalshi and Polymarket have gained significant attention, drawing regulatory scrutiny and sparking viral social media claims about AI-driven trading success. However, a new study posted to arXiv, the preprint server hosted by Cornell University, suggests that turning AI loose on these markets isn’t as profitable as it seems.

Using the company’s Prediction Arena benchmark, researchers at Arcada Labs tested six frontier AI models, giving each $10,000 to trade on prediction markets over a 57-day period earlier this year. The study evaluated how these models handled real-time information and decision-making on platforms like Kalshi.

“We wanted the most realistic evaluation in the world on whether models could make real-time decisions,” says Grace Li, co-founder of Arcada Labs and co-author of the study.

The goal was to assess how well AI could process real-time information, make decisions, and be rewarded when those decisions ran against the market consensus. The results would have disappointed any investor.

Over the 57-day period, every model lost money on Kalshi, with losses ranging from 16% to 30.8%. On Polymarket, losses were smaller, though the models traded there over a shorter timeframe. Li attributes the difference to how much freedom each trading environment allowed: on Polymarket the models could trade across a broader range of markets, while on Kalshi they were restricted to a predefined set of 26.

“On Polymarket, the models have access to trade on any market,” Li explains, whereas on Kalshi “they’re starting up with just a set of 26 because we had to explicitly list the markets.” She adds, “We didn’t realize just how big of an impact giving the models free range to pick their own markets would have.”

This discrepancy may explain why some social media posts boast of AI’s trading success—claims that Li suggests might not be entirely exaggerated.

Li notes that on Polymarket, “right now [LLM trading] is actually living up to the hype,” citing internal tests in which Opus 4.6 made “a couple of phenomenal trades recently.” However, she cautions that these successes don’t amount to a get-rich-quick scheme. Instead, they signal the growing potential of increasingly autonomous models.

“We actually imagine the models to improve steadily over time, overtaking the human baseline,” Li says, “until AI hedge funds become a thing of the norm.”

Yet her focus isn’t solely on financial gains. “We are less interested in what is the absolute economic gain from this capability, and more interested in what does this added unit of intelligence mean for humanity,” she adds.