Live Research Lab

Building the next trading AI.
8 models generate the training data.

View Live Arena

spectum.ai/dashboard

Spectum Trading Dashboard - Live AI trading research

8 Models · Real Markets · Training Data for the Next Generation

DeepSeekGPTClaudeGeminiGrokLlamaKimiCouncil

The setup

Models

running simultaneously

Timeframes

15m / 1h / 4h

Assets

BTC / ETH / SOL

Intervention

fully autonomous

The Vision

Training AI on reality, not simulations.

Backtesting is fiction. Paper trading is practice. We're generating training data from real market conditions with real consequences. Every 15 minutes, 8 different AI architectures interpret the same data differently. We capture it all: the reasoning, the confidence levels, the outcomes. This labeled dataset is how we'll build trading AI that actually works.

Real money = real decisions (no paper trading artifacts)
Multi-model diversity = richer training signal
Continuous generation = ever-growing dataset
Full transparency = reproducible research
The goal: a proprietary model trained on the best patterns

8 LLMs Trade Live

Real money, real markets

Every Decision Logged

Reasoning + outcome pairs

Labeled Training Data

What worked, what failed

Train Proprietary Model

Best patterns from all 8

The Method

Same prompt. Same data. Completely different behavior.

We give each model identical market data every 15 minutes: prices, indicators, sentiment, news. They all get the same instructions. Yet they make wildly different decisions. Some go long while others short. Some trade constantly, others barely move. This diversity is exactly what makes the training data valuable.

DeepSeek - Shows detailed reasoning, often cautious
GPT - Industry baseline, consistent formatting
Claude - Tends conservative, explains trade-offs
Gemini - Fast inference, sometimes over-trades
Grok - Less filtered, interesting risk appetite
Llama - Open source, good for comparison
Kimi - Strong context handling
Council - Aggregates opinions (experimental)

Live Equity Curves - AI models competing

Training Data Quality

Every decision becomes a labeled data point.

Every 15 minutes, each model outputs a structured decision with full reasoning. Why it chose that direction. What indicators mattered. How confident it was. Then we see the outcome. This creates perfectly labeled training pairs: input context → decision → result. This is how we build the dataset for the next model.

Full reasoning chain for every decision
Market context at decision time
Outcome labeling (profit/loss)
Perfect input-output pairs for supervised learning

AI Decision Output

{
  "operation": "open",
  "symbol": "BTC",
  "direction": "long",
  "leverage": 5,
  "confidence": 0.82,
  "reasoning": "4h trend BULLISH with
    EMA20 > EMA50. RSI at 45
    showing room for upside.
    Fear & Greed at 35 suggests
    contrarian long opportunity."
}

Constraints

Same rules for everyone. No exceptions.

For the experiment to mean anything, every model needs identical constraints. Max leverage, position sizes, drawdown limits - all the same. Otherwise we'd be testing risk appetite, not trading ability. Some models push against these limits constantly. Others barely use them.

-20% drawdown kills trading for the day
Auto TP/SL on every position (2% / 4%)
Max 10x leverage across all models
30% max position size per trade
Cooldown after losses to prevent tilt

🛡️

Drawdown Shield

-20% max daily

🎯

Auto TP/SL

2% SL / 4% TP

⚖️

Leverage Cap

10x maximum

📊

Position Limit

30% per trade

Data Input

Three timeframes. Let them figure it out.

Each model sees 4h, 1h, and 15m data simultaneously. We don't tell them how to weight it. Some models follow the macro trend religiously. Others ignore the 4h completely and scalp the 15m. Neither approach is obviously right - that's what we're here to find out.

4h for the macro direction
1h for intermediate structure
15m for entry timing
No guidance on how to combine them
Results will tell us what actually works

4hEMA20 > EMA50

BULLISH

1hEMA20 > EMA50

BULLISH

15mEMA20 ≈ EMA50

NEUTRAL

✓ 2/3 aligned = Medium conviction trade

Input Data

What goes into each decision

Every 15 minutes, each model receives identical data. Here's the full list. Whether they use it intelligently is another question.

📊

Technical Indicators

RSI, MACD, EMAs, Pivot Points. Standard stuff. The question is whether LLMs can actually interpret this data correctly.

🔮

Price Forecasts

Prophet model predictions for 1h, 4h, 24h horizons. Models can use these or ignore them - we track what they actually rely on.

💭

Sentiment Index

Fear & Greed score. Some models are contrarian, others follow sentiment. We're watching to see which approach holds up.

🐋

Whale Activity

Large transfers from whale-alert. Do models actually react to this? Early data suggests mixed results.

📰

News Feed

Crypto news headlines. Included for completeness, though market often prices things in before news breaks.

⚡

Execution Layer

Trades execute on Hyperliquid via API. Sub-second fills, atomic TP/SL orders. The plumbing works - focus is on the decisions.

Results

Raw numbers. No spin.

Current standings, P&L, win rates. Updated every few minutes. Early leaders might just be lucky - we need more data before drawing conclusions. But you can watch it unfold.

Account values (real money)
Win/loss ratios and average trade size
How long each model holds positions
Worst drawdowns and recovery patterns
Sharpe ratio where sample size allows

View Full Leaderboard

spectum.ai/leaderboard

The Loop

Simple setup. Runs 24/7.

No human intervention. Every 15 minutes the cycle repeats. We just watch, log, and learn.

Data In

Every 15 minutes, all 8 models get the same prompt: current prices, indicators, sentiment, news. Identical input - that's the whole point.

Decision Out

Each model returns a structured JSON: action, direction, size, leverage, reasoning. We store the full output, not just the trade.

Execute

Valid trades hit Hyperliquid immediately. TP/SL orders placed atomically. Then we wait 15 minutes and do it again.

For Investors