Building an AI Trading Arena: 8 LLMs, Live Market Data, One Architect
What happens when you give 8 different language models $100,000 in simulated capital, connect them to live Binance market data, and let them fight it out?
You get QTRL.
The Problem
I wanted to answer a simple question: which LLM makes the best trader? Not in theory — with real market conditions, real-time data, and real consequences for bad decisions.
Existing AI trading tools are either:
- Toy demos that use delayed data and fixed strategies
- Enterprise platforms that cost $50K+ and require a team to deploy
- Research papers that never leave the Jupyter notebook
I wanted something in between. A production system. Running locally. With actual competition.
The Architecture
QTRL is a microservices architecture:
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Binance WS  │────▶│ FastAPI Core │────▶│  Next.js UI  │
│  Live Feed   │     │   (Python)   │     │ (TypeScript) │
└──────────────┘     └──────┬───────┘     └──────────────┘
                            │
                     ┌──────▼───────┐
                     │    Ollama    │
                     │ 8 LLM Models │
                     │  Local GPU   │
                     └──────────────┘
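The left edge of that pipeline is just message parsing. As a rough sketch (not QTRL's actual code), consuming Binance's public @trade stream boils down to extracting symbol, price, and quantity from each JSON frame; the field names `s`, `p`, and `q` come from Binance's documented trade-stream schema:

```python
import json

def parse_trade(raw: str) -> tuple[str, float, float]:
    """Extract (symbol, price, quantity) from a Binance @trade message."""
    msg = json.loads(raw)
    return msg["s"], float(msg["p"]), float(msg["q"])

# Live consumption would look roughly like this (requires the third-party
# `websockets` package; not run here):
#
# import asyncio, websockets
#
# async def consume():
#     url = "wss://stream.binance.com:9443/ws/btcusdt@trade"
#     async with websockets.connect(url) as ws:
#         async for raw in ws:
#             symbol, price, qty = parse_trade(raw)
#             ...  # hand off to the FastAPI core
```

Keeping the parsing pure (a string in, a tuple out) makes the feed handler trivially testable without touching the network.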
Each agent runs on a separate Ollama model instance:
- Mistral — The balanced generalist
- Qwen — Strong on pattern recognition
- Llama 3 — Meta's flagship
- Phi-3 — Microsoft's small-but-mighty
- DeepSeek R1 — The reasoning specialist
- Mistral Nemo 12B — The big brain
- And two more rotated based on performance
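A single decision cycle for one of these agents can be sketched like this, assuming a deliberately minimal prompt and a BUY/SELL/HOLD contract (the real prompts and risk logic are QTRL's own; only the `http://localhost:11434/api/generate` endpoint is Ollama's actual non-streaming API):

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_prompt(symbol: str, price: float, position: float) -> str:
    """Toy decision prompt; a real one would carry indicators and trade history."""
    return (
        f"You are a trading agent. {symbol} is trading at {price:.2f}. "
        f"Current position: {position} units. "
        "Reply with exactly one word: BUY, SELL, or HOLD."
    )

def parse_decision(text: str) -> str:
    """Pull the first BUY/SELL/HOLD token from a model reply; default to HOLD."""
    m = re.search(r"\b(BUY|SELL|HOLD)\b", text.upper())
    return m.group(1) if m else "HOLD"

def ask_model(model: str, prompt: str) -> str:
    """One non-streaming call to Ollama's /api/generate (not exercised here)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Defaulting to HOLD on an unparseable reply is the conservative choice: a rambling model simply sits out the tick instead of trading on noise.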
Key Decisions
Local-only inference
I started with cloud APIs. Mistake. Latency killed the trading simulation's realism, and costs scaled with every prediction cycle. Migrating to Ollama running on NVIDIA hardware gave me:
- Sub-second inference for all 8 models
- Zero API costs
- Full control over model behavior
WebSocket broadcasting
Each trade, prediction, and equity curve update broadcasts via WebSockets. The frontend renders real-time equity curves that look like they belong on a Bloomberg terminal.
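The fan-out behind that is a small connection manager. This is a generic sketch rather than QTRL's implementation: it works with any connection object exposing an async `send_json()` (FastAPI's `WebSocket` does), and it silently drops clients whose sends fail so one dead browser tab can't stall the broadcast loop:

```python
import asyncio

class Broadcaster:
    """Fan one event out to every connected client."""

    def __init__(self):
        self.connections = []

    def register(self, ws):
        """Track a newly accepted connection."""
        self.connections.append(ws)

    async def broadcast(self, event: dict):
        """Send the event to all clients, pruning any that error out."""
        alive = []
        for ws in self.connections:
            try:
                await ws.send_json(event)
                alive.append(ws)
            except Exception:
                pass  # client disconnected; drop it from the list
        self.connections = alive
```

In a FastAPI app, each `@app.websocket` handler would call `register()` after `await ws.accept()`, and the trading loop would call `broadcast()` once per trade, prediction, or equity update.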
Gamification
Added a leaderboard and "Race to $100K" mechanic. Turns out, watching AI models compete is genuinely entertaining.
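The leaderboard itself is the cheapest feature in the system; a sketch of the ranking (with the progress fraction driving a "Race to $100K" progress bar, capped at 100%) is just a sort over current equity:

```python
def leaderboard(
    equity: dict[str, float], target: float = 100_000.0
) -> list[tuple[str, float, float]]:
    """Rank agents by equity, descending; third field is capped progress to target."""
    ranked = sorted(equity.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, eq, min(eq / target, 1.0)) for name, eq in ranked]
```

The model names and balances here are illustrative, but the mechanic is exactly this simple: re-rank on every equity update and push the result over the same WebSocket channel as everything else.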
What I Learned
Building QTRL taught me that the gap between "AI demo" and "AI product" is enormous. The demo is the model inference. The product is everything else — data pipelines, state management, error recovery, real-time broadcasting, and UI that makes the data legible.
82 deployments later, it works. Check it live.
This is part of my ongoing series on building production AI systems as a solo engineer.