The Premise: Speed vs. Validation

This project was an experiment in accelerated engineering. I designed the proprietary feature sets based on years of market analysis, but the entire deployment pipeline—from data ingestion to social media posting—was prompt-engineered using LLMs.

This was not "vibe-coding" where I blindly trusted the output. I know how to write the code myself. Instead, I treated the LLM as a force multiplier to build a complex system in days rather than weeks. This approach drastically increases speed, but it also raises the stakes for validation. When you generate code at machine speed, you need to validate it with equal rigor.

The goal was to test a promising model live: connecting its internal logic to the Gemini API so it could "speak" its findings to the public in real time.
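In rough terms, the "speaking" layer is a thin wrapper around an LLM call. Below is a minimal sketch using the public google-generativeai Python package; the model name, prompt wording, and post_to_social helper are illustrative placeholders, not the production code.

import google.generativeai as genai

# Assumption: in practice the key would come from configuration, not a literal.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

def narrate_signal(signal: dict) -> str:
    """Turn a raw model signal into a short, public-facing summary."""
    prompt = (
        "Summarize this trading signal for a general audience in two sentences. "
        f"Direction: {signal['direction']}, confidence: {signal['confidence']:.2f}, "
        f"horizon: {signal['horizon_hours']} hours."
    )
    return model.generate_content(prompt).text

def post_to_social(text: str) -> None:
    """Placeholder for the social-media posting step."""
    print(text)

post_to_social(narrate_signal({"direction": "bearish", "confidence": 0.87, "horizon_hours": 24}))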

The Timeline: Predicting the Crash

The experiment ran for approximately three weeks in October 2025. Despite the backend issues discussed later, the model's behavior during this period was fascinating. It successfully navigated standard volatility and then, remarkably, predicted the massive "10/10 Crash" with high precision.

October 3, 2025

The Buy Signal

Early in the deployment, the system correctly identified a local dip. The proprietary features (visualized in the top chart) aligned to trigger a high-confidence entry.

October 9, 2025

The Exit (Sell Signal)

The day before the crash. While broader retail sentiment was neutral, Singularity's internal structure metrics degraded. The bot flipped bearish and exited all positions, effectively "going to cash" before the chaos.

October 10, 2025

The Crash Prediction

The Black Swan Event. Moments before the market bottomed out, the system posted this prediction. It accurately forecast the sharp downward spike followed by the structural recovery.

Post-Crash

The Recovery

Following the crash, the system continued to track the recovery, validating that its "mental model" of the market had remained intact through the volatility.

Forensics: The "<=" Bug

In backtesting, this model showed a correlation of 0.2+ (predicting returns 24 hours out), a number so good it was suspicious. To rule out a hidden flaw, I built a custom validator that regenerated historical features and compared them against the live bot's output.
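In rough terms, the validator replays the feature code over stored raw history and correlates the result with what the live bot actually logged. The sketch below assumes pandas plus hypothetical file names, column names, and a stand-in build_features; the real feature definitions are proprietary and not shown.

import pandas as pd

def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the proprietary feature generator (the same code the live bot ran)."""
    out = pd.DataFrame(index=raw.index)
    out["ret_1"] = raw["close"].pct_change()
    out["vol_24"] = out["ret_1"].rolling(24).std()
    return out

def validate(live_features_path: str, raw_prices_path: str) -> pd.Series:
    """Correlate the features the live bot logged against features regenerated from raw history."""
    live = pd.read_csv(live_features_path, index_col="timestamp", parse_dates=True)
    raw = pd.read_csv(raw_prices_path, index_col="timestamp", parse_dates=True)
    regenerated = build_features(raw).reindex(live.index)
    return live.corrwith(regenerated)  # per-feature correlation; ~1.0 means "the same code ran"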

The validator showed a 0.9999+ correlation between the backtest logic and the live run. I thought this meant the system was sound. I was wrong.

The Silent Failure

The bug was a single character in the preprocessing pipeline: using <= (less than or equal to) instead of < (less than) when slicing data by timestamp. This allowed the feature generator to peek exactly one data point into the future.
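Here is the shape of that off-by-one in a hypothetical pandas slice. The real pipeline, column names, and bar-stamping convention aren't shown in this post, so treat it as an illustration of the class of bug rather than the exact line.

import pandas as pd

def history_as_of(prices: pd.DataFrame, t: pd.Timestamp) -> pd.DataFrame:
    # Bugged: the row stamped exactly t is included, so a feature computed
    # "as of t" sees one bar that, at live decision time, is not yet known.
    return prices[prices.index <= t]

def history_as_of_fixed(prices: pd.DataFrame, t: pd.Timestamp) -> pd.DataFrame:
    # Fixed: only rows strictly before t are visible at decision time t.
    return prices[prices.index < t]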

Because the validator ran the same code as the live bot, it reproduced the bug perfectly. The code was consistent, but the features were fundamentally flawed relative to the reality the training data was meant to represent. The model was receiving "polluted" features in production, leading to a massive performance drop-off.

Backtest Correlation (Bugged): ~0.20
Live Run Correlation (Actual): ~0.014

The irony is that despite this massive degradation in statistical edge, the underlying logic (the "physics" of the market structure) was robust enough to still catch the 10/10 crash. It serves as a powerful lesson: unit tests check whether the code works; validation checks whether the premise is true.

Retrospective

I took the project offline to re-architect the pipeline. This experiment was one of my most sophisticated live tests to date. It proved that I can rapidly deploy complex AI agents using LLMs, but it also reinforced the need for independent, adversarial validation frameworks that don't just mimic the production code.
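As one sketch of what "doesn't just mimic the production code" can mean: instead of re-running the same slicing logic, an adversarial check can delete everything at or after the decision time from a copy of the raw data and assert that the feature vector the model would see is unchanged. The make_decision_features entry point and pandas timestamps below are assumptions for illustration.

import pandas as pd

def assert_causal(raw: pd.DataFrame, make_decision_features, t: pd.Timestamp) -> None:
    """Fail if the features handed to the model at time t depend on data at or after t."""
    with_future = make_decision_features(raw, t)
    without_future = make_decision_features(raw[raw.index < t], t)
    if not with_future.equals(without_future):
        raise AssertionError(f"Features at {t} change when data at/after {t} is removed: lookahead.")

A check like this would have flagged the <= slice immediately, because the bar stamped at the decision time disappears from the truncated copy and the "as of" features shift.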

Every failure teaches a little more about how to construct robust systems—and how to verify them.