Streaming AI Responses on Cloudflare Workers
Jun 5, 2025 · 5 min read
I needed to stream AI responses from Cloudflare Workers to a frontend. The naive approach — wait for the full response, then send it — added seconds of latency. Here's the streaming setup that worked.
The Worker
Cloudflare Workers support the Web Streams API natively. The key is to pipe the AI provider's response stream directly through the Worker without buffering:
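A minimal sketch of such a pass-through Worker might look like the following. The upstream URL, the AI_API_KEY binding, and the injectable fetchImpl parameter are all assumptions made for illustration, and the sketch assumes the provider replies with a server-sent-events stream; adjust the headers and payload to whatever your provider actually expects.

```typescript
// Sketch only: UPSTREAM_URL and the AI_API_KEY binding are placeholders, not real values.
const UPSTREAM_URL = "https://ai-provider.example/v1/stream";

interface Env {
  AI_API_KEY: string; // assumed secret binding name
}

// fetchImpl is injectable purely so the handler can be exercised without a network.
async function handleRequest(
  request: Request,
  env: Env,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  // The prompt payload is small, so reading it in full is fine; only the reply is streamed.
  const payload = await request.text();

  const upstream = await fetchImpl(UPSTREAM_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.AI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: payload,
  });

  if (!upstream.ok || !upstream.body) {
    return new Response("Upstream request failed", { status: 502 });
  }

  // Hand the upstream ReadableStream straight to the new Response.
  // Nothing here awaits the full body, so chunks are forwarded as they arrive.
  return new Response(upstream.body, {
    status: 200,
    headers: {
      // Assumption: the provider streams SSE; fall back to that if no type is given.
      "Content-Type": upstream.headers.get("Content-Type") ?? "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}

export default {
  fetch: (request: Request, env: Env) => handleRequest(request, env),
};
```

The important part is that the handler never calls text() or json() on the upstream response; doing so would collect the whole reply in memory before anything is sent to the client.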
Testing with Curl
The -N (--no-buffer) flag turns off curl's own output buffering, so each chunk is printed as soon as it arrives.
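The same check can be made from code, which is also roughly what the frontend ends up doing. The helper below is a hypothetical sketch (readStream is not part of any library): it reads the response body chunk by chunk and hands each decoded piece to a callback as soon as it arrives.

```typescript
// Hypothetical helper: calls onChunk for every piece of text as it is received.
async function readStream(
  response: Response,
  onChunk: (text: string) => void,
): Promise<void> {
  if (!response.body) throw new Error("Response has no body to stream");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact when they span two chunks.
    onChunk(decoder.decode(value, { stream: true }));
  }

  // Flush any bytes the decoder was still holding back.
  const tail = decoder.decode();
  if (tail) onChunk(tail);
}
```

In a browser this would be called with the result of a fetch to the Worker's URL and a callback that appends the text to the page. If the callback fires once at the end instead of repeatedly, something between the provider and the client is buffering.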
The Pitfall: Response Buffering
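In my setup, Cloudflare's default response handling appeared to buffer output up to a certain size before flushing. If your chunks are small (common with token-by-token generation), the stream feels stuck. The fix that worked for me was to explicitly set Transfer-Encoding: chunked and to keep the connection alive with periodic heartbeats. Keep in mind that Transfer-Encoding is an HTTP/1.1 mechanism; on HTTP/2 connections framing is handled by the protocol itself, so there the heartbeats are the part that matters.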
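One way to add heartbeats is to wrap the upstream body in a second stream that injects a keep-alive message while the model is quiet. The sketch below is an assumption-laden illustration, not the exact code from my Worker: withHeartbeat is a hypothetical helper, the 15 second default is arbitrary, and it assumes the payload is server-sent events so that a comment line (one starting with a colon) is ignored by the client.

```typescript
// Hypothetical helper: forwards `source` unchanged and adds SSE comment pings in quiet periods.
function withHeartbeat(
  source: ReadableStream<Uint8Array>,
  intervalMs = 15_000, // arbitrary default, tune for your setup
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const reader = source.getReader();
  let timer: ReturnType<typeof setInterval> | undefined;
  // Only ping between complete events, so a ping never lands in the middle of one.
  let atBoundary = true;

  return new ReadableStream<Uint8Array>({
    start(controller) {
      timer = setInterval(() => {
        if (atBoundary) controller.enqueue(encoder.encode(": ping\n\n"));
      }, intervalMs);
    },
    async pull(controller) {
      try {
        const { done, value } = await reader.read();
        if (done) {
          clearInterval(timer);
          controller.close();
          return;
        }
        if (value.length > 0) {
          // An SSE event ends with a blank line, i.e. two "\n" bytes (10, 10).
          // Streams that use "\r\n" never match, so they simply get no pings.
          const n = value.length;
          atBoundary = n >= 2 && value[n - 1] === 10 && value[n - 2] === 10;
        }
        controller.enqueue(value);
      } catch (err) {
        // Stop the timer before the stream errors, otherwise a later tick would throw.
        clearInterval(timer);
        throw err;
      }
    },
    cancel(reason) {
      // The client went away: stop pinging and release the upstream connection.
      clearInterval(timer);
      return reader.cancel(reason);
    },
  });
}
```

In the Worker, the wrapped stream would be passed to the Response in place of the raw upstream body. The boundary check is deliberately conservative: if an upstream chunk ends mid-event, pings pause until the event is complete.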
Why This Matters
Streaming transforms the UX. Instead of a blank screen while the model thinks, users see the text appear token by token. The difference in perceived speed is dramatic even when the total generation time is the same.