Reyyan Ahmed
February 15, 2026 · 2 min read

Building an AI Trading Arena: 8 LLMs, Live Market Data, One Architect

AI/ML · Trading · Architecture · LLMs

What happens when you give 8 different language models $100,000 in simulated capital, connect them to live Binance market data, and let them fight it out?

You get QTRL.

The Problem

I wanted to answer a simple question: which LLM makes the best trader? Not in theory — with real market conditions, real-time data, and real consequences for bad decisions.

Existing AI trading tools are either:

  • Toy demos that use delayed data and fixed strategies
  • Enterprise platforms that cost $50K+ and require a team to deploy
  • Research papers that never leave the Jupyter notebook

I wanted something in between. A production system. Running locally. With actual competition.

The Architecture

QTRL is a microservices architecture:

┌──────────────┐     ┌───────────────┐     ┌───────────────┐
│  Binance WS  │────▶│  FastAPI Core │────▶│  Next.js UI   │
│  Live Feed   │     │  (Python)     │     │  (TypeScript) │
└──────────────┘     └───────┬───────┘     └───────────────┘
                             │
                     ┌───────▼───────┐
                     │    Ollama     │
                     │  8 LLM Models │
                     │   Local GPU   │
                     └───────────────┘
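The left edge of that diagram is the ingestion step: Binance `trade` stream events arrive as JSON and get normalized before anything downstream sees them. A minimal sketch, where the `Tick` shape and `parse_trade` name are my illustration rather than QTRL's actual schema:

```python
import json
from dataclasses import dataclass

@dataclass
class Tick:
    """Internal tick format (illustrative, not QTRL's real schema)."""
    symbol: str
    price: float
    qty: float
    ts_ms: int

def parse_trade(raw: str) -> Tick:
    # Binance trade events carry symbol ("s"), price ("p"), quantity ("q"),
    # and trade time in ms ("T"); prices arrive as strings.
    msg = json.loads(raw)
    return Tick(symbol=msg["s"], price=float(msg["p"]),
                qty=float(msg["q"]), ts_ms=msg["T"])
```

Normalizing at the boundary means the FastAPI core and the agents never have to care that the exchange sends prices as strings.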

Each agent runs on a separate Ollama model instance:

  • Mistral — The balanced generalist
  • Qwen — Strong on pattern recognition
  • Llama 3 — Meta's flagship
  • Phi-3 — Microsoft's small-but-mighty
  • DeepSeek R1 — The reasoning specialist
  • Mistral Nemo 12B — The big brain
  • And two more rotated based on performance
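Whatever the model, each agent ultimately has to emit a trade action, and LLMs love to ramble instead of answering tersely. A defensive parse along these lines keeps a chatty reply from crashing the loop (the function name and HOLD fallback are my sketch, not QTRL's code):

```python
import re

def parse_decision(reply: str) -> str:
    """Extract the first recognized action from a model reply.

    Falls back to HOLD when the model returns prose with no clear action,
    since doing nothing is the safest default for a trading agent.
    """
    match = re.search(r"\b(BUY|SELL|HOLD)\b", reply.upper())
    return match.group(1) if match else "HOLD"
```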

Key Decisions

Local-only inference

I started with cloud APIs. Mistake. Latency killed the trading simulation's realism, and costs scaled with every prediction cycle. Migrating to Ollama running on NVIDIA hardware gave me:

  • Sub-second inference for all 8 models
  • Zero API costs
  • Full control over model behavior
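Talking to a local Ollama instance is a plain HTTP POST to its default endpoint on port 11434. A hedged sketch of the call, assuming the default `api/generate` endpoint and a non-streaming request (the `ask` helper is mine, not QTRL's):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because every model is served by the same local endpoint, running 8 agents is just 8 different `model` strings, with no per-provider SDKs or API keys.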

WebSocket broadcasting

Each trade, prediction, and equity curve update broadcasts via WebSockets. The frontend renders real-time equity curves that look like they belong on a Bloomberg terminal.

Gamification

Added a leaderboard and "Race to $100K" mechanic. Turns out, watching AI models compete is genuinely entertaining.
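The leaderboard itself is the simplest part: rank agents by current equity, best first. A one-liner sketch (the agent names here are placeholders, not live results):

```python
def leaderboard(equity: dict[str, float]) -> list[tuple[str, float]]:
    """Rank agents by current equity, highest first."""
    return sorted(equity.items(), key=lambda kv: kv[1], reverse=True)
```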

What I Learned

Building QTRL taught me that the gap between "AI demo" and "AI product" is enormous. The demo is the model inference. The product is everything else — data pipelines, state management, error recovery, real-time broadcasting, and UI that makes the data legible.

82 deployments later, it works. Check it live.


This is part of my ongoing series on building production AI systems as a solo engineer.