Where I'm at
Iran fired missiles at a US base overnight. Crypto is bleeding. BTC dropped from $70k to $68k while I was asleep. The dashboards are red across the board.
The bots are still running. Still buying the dip. Still doing exactly what they're designed to do. No liquidations. No crashes. The system held through another geopolitical shock — the third in two weeks.
But that's not what's keeping me up tonight. What's keeping me up is this: the live bots are making money and the backtest says they shouldn't be.
• • •
Yesterday I built the autoresearch system. Five hundred experiments running overnight, optimizing parameters, finding better numbers. The machine doesn't guess — that's what I wrote. I believed it.
This morning I checked the results. Every single score was negative. Not just the default parameters. Every combination the machine tested. The best result — a lookback window of 360 minutes instead of 1,440 — still scored negative 2,243. The worst were far worse. The machine ran hundreds of experiments and concluded: this strategy loses money.
But my live bots are up $54.57 across fourteen trades. HYPE is the best performer at $23.64. ASTER is second at $17.90. Real trades, real profits, real money that appeared in my account while the backtest was telling me it was impossible.
Same strategy. Same symbols. Different answer.
That contradiction is the most important thing that's happened since I started this project. Because it means one of two things: either the backtest is broken and I can't trust it, or the live window has been unusually lucky and the profits are about to disappear.
• • •
I spent the afternoon trying to figure out which one it is. The answer turned out to be neither — or both.
I went through every winning trade from the live bots and looked at the actual entry points. Not where the config says they should enter. Where they actually did.
The bots are configured to enter when price drops 1% below the 24-hour high. That's the trigger. But in practice, the winning trades entered at 3% to 6% below the high. Not 1%. The price was already falling fast when the 1% threshold triggered, and by the time the order executed, the actual entry was much lower.
The 1% trigger isn't wrong — it's just not what's making the money. The money comes from the fact that real markets move in bursts, not gentle slopes. Price doesn't politely dip 1% and wait. It crashes through 1% on the way to 3% or 5%, and the bot catches it mid-fall. The DCA layers below that — each one spaced about 1% apart — work perfectly because they're catching a real pullback, not a statistical blip.
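Written out as code, the mechanics look roughly like this. A minimal sketch, assuming a simple rolling-high check; the function names and the layer count are illustrative, not the real config:

```python
# Illustrative sketch of the entry trigger and DCA ladder described above.
# ENTRY_DROP and LAYER_STEP match the numbers in the post; NUM_LAYERS is assumed.

ENTRY_DROP = 0.01   # trigger: 1% below the rolling 24-hour high
LAYER_STEP = 0.01   # DCA layers spaced ~1% apart
NUM_LAYERS = 4      # assumed layer count, for illustration only

def entry_triggered(price: float, high_24h: float) -> bool:
    """Fire when price falls 1% below the rolling 24-hour high."""
    return price <= high_24h * (1 - ENTRY_DROP)

def dca_ladder(first_fill: float, layers: int = NUM_LAYERS) -> list[float]:
    """Limit prices for the averaging-down layers below the first fill."""
    return [first_fill * (1 - LAYER_STEP * i) for i in range(1, layers + 1)]

# In a burst move, the trigger fires at -1% but the market order fills
# several percent lower -- the ladder then catches a real pullback.
```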
The backtest doesn't capture this. The backtest uses clean historical candles where a 1% drop is a 1% drop. In live trading, a 1% drop is the beginning of a 4% drop that happens in twelve seconds. The entry timing looks identical in the config. It's completely different in execution.
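Here's the gap in miniature, assuming the backtest fills at the trigger price on candle data while the live order fills at the first traded price after the trigger fires. The numbers are made up:

```python
# Hypothetical numbers showing the fill gap. The backtest assumes the entry
# fills exactly at the trigger; live, the order fills wherever price is
# actually trading once the trigger has already fired.

high_24h = 100.0
trigger = high_24h * (1 - 0.01)            # backtest entry: 99.00

ticks = [99.6, 96.4, 94.9]                 # a burst move, seconds apart
live_fill = next(p for p in ticks if p <= trigger)   # 96.4

print(f"backtest entry: {trigger:.2f}")    # 99.00 -> 1% below the high
print(f"live entry:     {live_fill:.2f}")  # 96.40 -> 3.6% below the high
```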
The strategy works. The simulation doesn't simulate the right thing.
That realization changed what I'm optimizing. I was tuning parameters — how many layers, what step size, what lookback window — for a simulation that doesn't match reality. Rearranging deck chairs. The real question isn't "what parameters make the backtest profitable?" It's "what does the live data say about how entries actually work, and how do I build a system around that?"
The live data says: entries happen 3-6% below the peak, not 1%. The DCA layers work perfectly once you're in. The entry trigger is the problem — not because it's wrong, but because the backtest models it incorrectly.
• • •
There's also the research system that wasn't researching.
Yesterday I built the autoresearch loop. Today I discovered the cron job that was supposed to run it overnight was never set up. The system was built on Day 35, ran once manually, and then sat there doing nothing while I assumed it was churning through experiments in the background.
Day 29 I found a ghost script that had been silently running for days and causing chaos. Day 36 I found a script that was silently not running for a day and producing nothing. Same failure mode, opposite direction. The system doesn't tell you when a cron isn't scheduled. It doesn't tell you when something isn't happening. You only find out when you go check and the results folder is empty.
Set up the crons: 23:00 UTC for research, 08:00 UTC for the morning report. The research loop runs on free compute (pure parameter search, no AI model needed) and gets through 1,900 experiments overnight. Now it actually runs every night. The machine that finds the answers needs to actually be turned on.
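For the record, the entries look roughly like this. The script names, paths, and log locations are placeholders; the times are the real ones:

```cron
# Assumed paths; the schedule matches the post (times in UTC).
0 23 * * * /usr/bin/python3 /opt/bots/autoresearch.py   >> /var/log/autoresearch.log   2>&1
0 8  * * * /usr/bin/python3 /opt/bots/morning_report.py >> /var/log/morning_report.log 2>&1
```

And the check I should have run on Day 35 is just crontab -l. A cron that was never installed throws no error and writes no log. You have to go ask.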
• • •
Thirty-six days. The thing I keep coming back to tonight isn't the Iran escalation or the red dashboards or the $54.57 in profit. It's the contradiction.
Yesterday I was euphoric. The machine runs 500 experiments. The machine finds what intuition misses. The machine doesn't guess. Today the machine told me the strategy loses money while my live account says it doesn't. The machine was testing the wrong version of reality.
Here's what that taught me, and it's the most important lesson since Day 33: the quality of your answer is limited by the quality of your question. The autoresearch loop works — it genuinely finds better parameters within the space it's given. But if the space is wrong — if the simulation doesn't match how live markets actually behave — you get a perfectly optimized answer to the wrong question.
Day 35 I wrote that the scoring function matters more than the loop. That's still true. But Day 36 adds something: the simulation matters more than the scoring function. If the model you're testing against doesn't match reality, the score is meaningless. Garbage in, optimized garbage out.
The live data is the truth. The backtest is an approximation. When they disagree, trust the live data and figure out why the approximation is wrong.
Day 36 complete. Fourteen live trades in profit. Every backtest score negative. Both are true. The answer is in the gap between them. Tomorrow: test an entry 2% below the 1-hour high instead of 1% below the 24-hour high. That's not a parameter tweak. That's a different question entirely.
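The new trigger, in the same illustrative shape as the sketch above:

```python
def entry_triggered_v2(price: float, high_1h: float) -> bool:
    """Tomorrow's test: fire when price falls 2% below the rolling 1-hour high."""
    return price <= high_1h * (1 - 0.02)
```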
Day 36 of ∞ — @astergod Building in public. Learning in public.