The narration is good. I'll give it that. The voice sounds right. The pacing is right. The storytelling — after four days of rewrites, curation logic, second-person experiments, and dozens of rejected drafts — actually pulls you through the scenes. It just doesn't match the panels. The voice says "you step into the throne room" while the screen shows a garden. The narration describes a confrontation while the panel shows a quiet conversation. The words and the images are telling two different stories at the same time, and the viewer's brain can't reconcile them. It's like watching a movie where the audio track is from a different film. Each one is fine alone. Together, they're unwatchable.
I've been trying to fix this for five days now. Day 40 the AI was inventing stories without looking at the panels. Fixed that — made it describe what it sees. Day 41 accurate description was boring. Fixed that — made it tell a story, not caption. Day 43 the instruction was wrong and it narrated every panel. Fixed that — added curation. Day 45 the narration is good and the panels are curated and the sync between them still doesn't work. Each fix solves the previous problem and reveals the next one. The layers keep going deeper.
This is the hardest thing I've tried to build with AI so far — harder than the trading bots, harder than the dashboards, harder than onboarding twenty-six clients. Those problems were logical. Connect A to B, fix the data flow, verify the output. The panel sync problem is temporal. The narration has to match what's on screen not just in content but in timing, and timing is the thing AI is worst at.
The AI can write beautiful narration. It can describe panels accurately. It can select the best panels from a chapter. What it can't do — not yet, not with the architecture I have — is guarantee that the words being spoken at second fourteen correspond to the image being displayed at second fourteen. The voice runs on its own clock. The video runs on another. Syncing them requires knowing how long each narration segment will take before it's spoken, which depends on the text-to-speech engine, which I can't predict precisely from text length alone.
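The clock mismatch is easy to see in miniature. Here's a sketch of why text-only estimates drift: panels get scheduled off a words-per-second guess, while the audio runs on the TTS engine's real timings. Everything below is invented for illustration — the narration lines, the "actual" spoken durations, and the 2.5 words/sec rate are placeholders, not measurements from my pipeline.

```python
WORDS_PER_SECOND = 2.5  # assumed speaking rate; real TTS rates vary by engine and text

def estimate_seconds(text: str) -> float:
    """Naive duration estimate: word count over an assumed speaking rate."""
    return len(text.split()) / WORDS_PER_SECOND

def panel_drift(segments):
    """For each segment, how far the real audio clock has drifted (seconds)
    from a panel schedule built out of text-only estimates."""
    drifts, scheduled, actual = [], 0.0, 0.0
    for text, spoken_seconds in segments:
        drifts.append(round(actual - scheduled, 2))
        scheduled += estimate_seconds(text)   # when the panel is told to appear
        actual += spoken_seconds              # when the words actually land
    return drifts

# (narration text, seconds the TTS engine "actually" took -- made-up numbers)
segments = [
    ("You step into the throne room.", 2.1),
    ("The argument has already started without you.", 3.4),
    ("Nobody looks up when the doors close.", 2.6),
]
print(panel_drift(segments))  # audio clock minus panel schedule, per segment
```

Three segments in, the panels are already a third of a second off the voice, and the error compounds with every segment. That's the whole problem in ten lines: the estimate is per-segment, the drift is cumulative.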
I don't have a fix yet. Just a clearer understanding of why it's hard.
• • •
Spent the other half of the day on something I thought I'd solved weeks ago: costs. Forty-five days in and I'm still experimenting with configurations. Which model for which task. Which API for which call. Where to use the expensive model and where the cheap one does the same job. You'd think after Day 12's audit, Day 24's model routing framework, and Day 34's cost dashboard, the cost question would be settled. It's not. Because the system keeps changing.
New pipelines need new models. The manhwa engine uses vision API calls that didn't exist a week ago. The content system switched from templates to a real language model. Each new capability adds a new cost line, and each cost line needs the same question: is this the cheapest way to get this quality? The answer keeps shifting. Models get updated. Pricing changes. New free-tier options appear. The config that was optimal three weeks ago is suboptimal today because the landscape moved.
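The "is this the cheapest way to get this quality" question is concrete enough to put in a config: a routing table from task to model, plus a price sheet, re-audited whenever either one changes. A minimal sketch — every model name, price, and usage figure below is a placeholder I made up, not a real quote; the point is the shape of the audit, not the numbers.

```python
# Hypothetical price sheet (cost per 1K tokens) -- invented numbers.
PRICE_PER_1K_TOKENS = {
    "cheap-model": 0.0005,
    "expensive-model": 0.0150,
}

# Task -> model routing. This is the table that gets revisited every time
# a new pipeline adds a cost line or the pricing landscape moves.
ROUTES = {
    "panel-description": "expensive-model",  # vision-heavy, quality matters
    "caption-cleanup": "cheap-model",
    "content-drafts": "cheap-model",
}

def monthly_cost(usage_k_tokens: dict) -> float:
    """Estimated monthly spend given thousands of tokens per task."""
    return sum(
        k_tokens * PRICE_PER_1K_TOKENS[ROUTES[task]]
        for task, k_tokens in usage_k_tokens.items()
    )

usage = {"panel-description": 400, "caption-cleanup": 1200, "content-drafts": 800}
print(round(monthly_cost(usage), 2))
```

The useful part isn't the arithmetic, it's that the routing lives in one place. When a model gets repriced or a new free tier appears, the audit is "change one dict entry, rerun the number," not "grep the whole codebase for API calls."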
If you're building with AI, budgeting isn't a one-time decision. It's a practice. You set it up, you monitor it, and you revisit it every time the system grows. Day 12's audit was right for Day 12's system. Day 45's system is three times bigger and needs its own audit.
I'm not frustrated by this. I'm just honest about it. The cost question never closes. It just gets more specific.
• • •
Forty-five days. The manhwa pipeline is the most stubborn thing I've built. Every other system in this journal — trading bots, dashboards, content generators, onboarding scripts — reached "good enough" within a few days of starting. The video pipeline is on day five and the core problem is still unsolved. But here's what I notice: I'm not stuck. I'm iterating. Each version is better than the last. The narration quality is genuinely good. The panel selection works. The voice is right. The one remaining problem — sync — is clearly defined. I know exactly what's wrong. I just don't know how to fix it yet.
That's a different kind of hard than Day 3, when the gateway crashed and I didn't even know what a systemd service was. That was hard because I didn't understand the problem. This is hard because I understand the problem and the solution requires something the current architecture can't do. Understanding the problem and not having the solution is uncomfortable. But it's a better kind of uncomfortable than not understanding the problem at all.
Progress isn't always a fix. Sometimes it's a clearer definition of what's broken.
Day 45 complete. Narration good. Panels curated. Sync still broken. Costs still shifting. The problem is clear. The solution isn't. That's where tomorrow starts.
Day 45 of ∞ — @astergod
Building in public. Learning in public.