
Dante Took Hours. The Shahnameh Is Seven Times Bigger. So I Ran 1,000 AI Calls at Once.

9 June 2026

Earlier I wrote about the cost lesson behind my original-language reader. Short version: I almost spent hundreds generating the Divine Comedy on a flagship model, caught it one-sixth of the way through, and finished the whole thing plus a second book on a cheap "mini" model for pocket change. If you missed it, start there.

That post was about cost. This one is about the next wall: time.

Because then I took on something much bigger.

The book

Ferdowsi's Shahnameh. The Persian Book of Kings. It's one of the longest epic poems ever written by a single author: around 50,000 couplets across 777 sections, from the first mythical king down to the fall of the Sasanians.

For scale: the Divine Comedy is about 14,000 lines. The Shahnameh is roughly 100,000 lines, two per couplet. Call it seven times the size. It's now in the reader, in the original Persian, read right to left.

The wall nobody warns you about

Here's the thing about precomputing an AI layer over a book. Even on the cheap model, it's slow.

Every grammar note and every translation is its own API call, and a single book needs thousands of them. The Divine Comedy took hours. Notes from Underground took hours. Sent one at a time, thousands of calls add up fast.

Now multiply by seven. At that rate the Shahnameh was an overnight job. Maybe two nights. Cost was no longer the problem. Waiting was.

The bet: stop being polite

This job is embarrassingly parallel. The grammar note for one couplet does not depend on the note for the next one. They can all run at the same time. The only reason it was slow is that I was being polite, sending a handful of calls at a time.

So I stopped being polite. A handful of workers became sixty. Then four hundred. Then a thousand calls in flight at once.
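Here is a minimal Python sketch of what "a thousand calls in flight" means, assuming an asyncio-based client. `annotate_couplet` is a hypothetical stand-in that only sleeps to simulate network latency; it is not the reader's actual code or any provider's API. The semaphore plays the role of the worker count: raise it from 60 to 1,000 and more calls overlap.

```python
import asyncio
import random

# Hypothetical stand-in for one API call that writes the grammar note
# for a single couplet. A real version would call your model provider.
async def annotate_couplet(couplet_id: int) -> str:
    await asyncio.sleep(random.uniform(0.01, 0.03))  # simulated latency
    return f"note for couplet {couplet_id}"

async def annotate_all(couplet_ids: list[int], workers: int) -> list[str]:
    # The semaphore caps how many calls are in flight at the same time.
    sem = asyncio.Semaphore(workers)

    async def one(cid: int) -> str:
        async with sem:
            return await annotate_couplet(cid)

    # gather() returns results in input order, so notes line up with couplets.
    return await asyncio.gather(*(one(cid) for cid in couplet_ids))

if __name__ == "__main__":
    notes = asyncio.run(annotate_all(list(range(5000)), workers=1000))
    print(len(notes), notes[0])
```

Because no couplet's note depends on another's, the cap is the only thing the workers share.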

The speedup was close to linear until it wasn't:

  • 60 workers: about 190 couplets a minute.
  • 400 workers: about 1,350 a minute. Roughly seven times the 60-worker rate.
  • 1,000 workers: about 2,370 a minute. Roughly twelve times the 60-worker rate.

The returns taper off past a few hundred workers, because the bottleneck stops being how many calls you run at once and becomes how fast a single call comes back. But twelve times faster is twelve times faster. The biggest single run did 263 sections, almost 18,700 couplets, in under fifteen minutes. That one run moved about 39 million tokens.
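One way to read those numbers is Little's law: calls in flight = throughput times the time each one spends in flight. Assuming every worker stays busy and handles one couplet at a time (an assumption; I haven't said how couplets map to calls), a short Python calculation gives the implied time per couplet at each setting:

```python
# Measured rates from the runs above: workers -> couplets per minute.
RUNS = {60: 190, 400: 1350, 1000: 2370}

def speedup(workers: int, baseline_workers: int = 60) -> float:
    """Throughput relative to the 60-worker run."""
    return RUNS[workers] / RUNS[baseline_workers]

def implied_seconds_per_couplet(workers: int) -> float:
    """Little's law L = lambda * W, rearranged to W = L / lambda.
    Assumes all workers are busy and each handles one couplet at a time."""
    return workers / RUNS[workers] * 60

for w in RUNS:
    print(f"{w:>5} workers: {speedup(w):4.1f}x, "
          f"~{implied_seconds_per_couplet(w):.0f} s per couplet")
```

Under those assumptions the time per couplet sits around 18 to 19 seconds up to 400 workers, then stretches to about 25 seconds at 1,000. That is consistent with single-call latency becoming the bottleneck.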

The warning: money leaves fast

Now the part to feel in your gut.

At a thousand calls in parallel, money leaves your account fast.

Notes from Underground, the whole novella, cost about €13. The Shahnameh cost about €170. That tracks with the size. But the €170 did not trickle out over a quiet night. It went in roughly an hour of wall-clock time, and a single fifteen-minute run burned about €60.

Parallelism does not change what you spend. It changes how fast you spend it. The same bill that would have crept up overnight lands in minutes. That is the whole point, and it is also the whole risk. You can empty a budget before you think to check the meter.
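The arithmetic is worth doing once before a big run. Taking the figure above, about €60 in fifteen minutes, this small Python helper shows how long a budget survives at that burn rate. The €100 budget is just an example.

```python
def minutes_until_empty(budget_eur: float, spent_eur: float,
                        elapsed_min: float) -> float:
    """How long a budget lasts if spending continues at the observed rate."""
    burn_per_min = spent_eur / elapsed_min
    return budget_eur / burn_per_min

# About €60 in 15 minutes is €4 a minute, so a €100 budget lasts 25 minutes.
print(minutes_until_empty(budget_eur=100, spent_eur=60, elapsed_min=15))
```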

The cost driver is still generated output, same as the last post. This job writes long grammar notes and full sentence translations, and generated tokens cost far more per token than the text you feed in. A thousand workers just means a thousand of those happening at the same time.
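A minimal Python sketch of that cost model, with a hard budget cap. The per-million-token prices are hypothetical placeholders, not any provider's real rates; the only point is that output is priced higher than input, so long generated notes dominate the bill.

```python
from dataclasses import dataclass

# Hypothetical prices in euros per million tokens; substitute your provider's rates.
PRICE_IN_PER_M = 0.40
PRICE_OUT_PER_M = 1.60

def call_cost(tokens_in: int, tokens_out: int) -> float:
    """Cost of one API call under the placeholder prices above."""
    return (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000

@dataclass
class BudgetGuard:
    """Tells the dispatcher to stop starting new calls once spend reaches the cap."""
    cap_eur: float
    spent_eur: float = 0.0

    def record(self, tokens_in: int, tokens_out: int) -> None:
        self.spent_eur += call_cost(tokens_in, tokens_out)

    def allows_more(self) -> bool:
        return self.spent_eur < self.cap_eur

# A note-writing call: a short prompt in, a long note out.
print(call_cost(tokens_in=500, tokens_out=1_500))
```

With these placeholder prices, more than 90% of that example call's cost is output. One caveat for a thousand workers: calls already in flight when the cap trips still finish and bill, so the real stopping point can overshoot the cap by up to one call per worker.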

Rate limits, and the backoff that saves you

Two things keep this from being reckless.

First, you will hit rate limits. Every API account has caps: only so many requests and so many tokens per minute. Throw a thousand calls at most accounts and you go straight through the ceiling. The calls start bouncing back with "too many requests" (HTTP 429).

Second, the tool handles that for you. It runs an adaptive limiter. It starts at the worker count you set, and the moment it sees a rate-limit error it cuts the number of parallel calls in half. Then, as calls keep succeeding, it creeps back up. It is borrowed straight from how the internet manages congestion: back off hard when you hit a wall, ramp up gently when the road is clear.
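The tool's own limiter isn't reproduced here, so this is an independent Python sketch of the same idea, assuming asyncio and a hypothetical `RateLimited` exception that your client raises on an HTTP 429. It halves the concurrency limit on a rate-limit error, adds one after each success, and retries the throttled call instead of dropping it.

```python
import asyncio

class RateLimited(Exception):
    """Hypothetical: raise this from your API call on HTTP 429."""

class AIMDLimiter:
    """Caps calls in flight; halves the cap on a rate limit, adds one per success."""

    def __init__(self, start: int, floor: int = 1):
        # Create this inside a running event loop (e.g. within asyncio.run).
        self.limit = start
        self.ceiling = start      # never grow past the worker count you set
        self.floor = floor
        self.in_flight = 0
        self._cond = asyncio.Condition()

    async def run(self, make_call):
        """Run make_call() under the limit, retrying it after rate-limit errors."""
        while True:
            async with self._cond:
                await self._cond.wait_for(lambda: self.in_flight < self.limit)
                self.in_flight += 1
            outcome = "error"     # any other exception propagates unchanged
            try:
                result = await make_call()
                outcome = "ok"
                return result
            except RateLimited:
                outcome = "limited"
            finally:
                async with self._cond:
                    self.in_flight -= 1
                    if outcome == "ok":
                        self.limit = min(self.ceiling, self.limit + 1)  # additive increase
                    elif outcome == "limited":
                        self.limit = max(self.floor, self.limit // 2)   # multiplicative decrease
                    self._cond.notify_all()
            # Only reached after RateLimited: loop and retry the same call.
            # A real client might also sleep here or honor a Retry-After header.
```

One known refinement: with a thousand calls in flight, a burst of 429s can arrive together and halve the limit several times in a row. TCP avoids this by cutting at most once per round trip, and a production limiter would want a similar guard.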

So you set the worker count high and let the tool find your real limit. A big account runs flat out. A smaller account quietly throttles itself down to whatever it can sustain, instead of erroring out and dropping work. On my account, a thousand workers held steady with almost no rate-limit hits. A smaller account would settle lower on its own, without me changing a thing.
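To see why a smaller account settles on its own, here is a toy simulation in Python. It assumes a hypothetical account that tolerates 150 concurrent calls and rate-limits anything above that, and it updates once per round (halve if over, otherwise add one) rather than once per call, which is a simplification.

```python
def simulate_aimd(start: int, capacity: int, rounds: int) -> list[int]:
    """Per-round AIMD against a fixed concurrency capacity.
    Returns the limit after each round."""
    limit = start
    history = []
    for _ in range(rounds):
        if limit > capacity:          # over the cap: rate-limited, so halve
            limit = max(1, limit // 2)
        else:                         # a clean round: creep up by one
            limit += 1
        history.append(limit)
    return history

trace = simulate_aimd(start=1000, capacity=150, rounds=300)
print(trace[:6])
print(min(trace[50:]), max(trace[50:]))
```

Starting from 1,000, the limit drops to 500, 250, then 125 in three rounds, and from then on it saw-tooths between 75 and 151: never far from what the account can sustain, and never stuck at the number you typed in.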

The pattern has a name: additive increase, multiplicative decrease. Halve on failure, add one on success. It is the same logic TCP uses to keep the internet from collapsing under its own traffic.

The lesson

Two posts, two halves of one skill.

First, make each unit cheap. Calibrate on a small sample, pick the model that is good enough, not the most expensive one.
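Calibration is mostly linear extrapolation from a sample. As a Python sketch using the biggest run's figures from above (about 39 million tokens for roughly 18,700 couplets), the token count for all 50,000 couplets projects to roughly 104 million. The projection assumes the sample is representative of the whole book.

```python
def project(sample_total: float, sample_units: int, total_units: int) -> float:
    """Scale a quantity measured on a sample (cost, tokens) linearly to the full job."""
    return sample_total / sample_units * total_units

# Biggest run: ~39M tokens over ~18,700 couplets; the full epic is ~50,000 couplets.
tokens_full = project(sample_total=39_000_000, sample_units=18_700, total_units=50_000)
print(f"~{tokens_full / 1e6:.0f}M tokens for the whole epic")
```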

Then, when the work is independent, parallelize it to collapse the time. Cheap-per-unit and massively parallel is what turns a 50,000-couplet epic into an afternoon's work for the price of a nice dinner.

Just respect the second half. Parallelism compresses your spend into minutes. Set a budget. Watch the first minute of a big run. Lean on the backoff. Bet on parallelism when the work is independent and each piece is already cheap. That is exactly when it pays.

Read it

The reader is live and free: read it here. The Shahnameh just went in, alongside Dante and Dostoevsky. The code is open source under the AGPL: on GitHub.

I build in public. Subscribe to the newsletter to follow what ships next.