Streaming AI Responses on Cloudflare Workers

Jun 5, 2025 · 5 min read

Cloudflare Workers · AI · Streaming · Serverless · Edge

I needed to stream AI responses from Cloudflare Workers to a frontend. The naive approach — wait for the full response, then send it — added seconds of latency. Here's the streaming setup that worked.

The Worker

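Cloudflare Workers support the Web Streams API natively. The key is to pass the AI provider's response stream directly through the Worker without buffering: the Worker returns a Response built around the upstream body instead of awaiting the full text first.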
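A minimal sketch of that pass-through, under a few assumptions: the upstream URL is a placeholder, the provider accepts a JSON POST body, and `handleRequest` and `Fetcher` are names I made up for illustration. The fetcher is injectable only so the handler can be exercised without a network call.

```typescript
// Hypothetical upstream endpoint; replace with your AI provider's streaming URL.
const UPSTREAM_URL = "https://ai-provider.example/v1/stream";

// Narrow fetch-like type so a fake can be passed in for local testing.
type Fetcher = (url: string, init: RequestInit) => Promise<Response>;

async function handleRequest(
  request: Request,
  fetcher: Fetcher = (url, init) => fetch(url, init),
): Promise<Response> {
  // Forward the client's request body to the provider (assumed JSON POST API).
  const upstream = await fetcher(UPSTREAM_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: await request.text(),
  });

  if (!upstream.ok || upstream.body === null) {
    return new Response("Upstream request failed", { status: 502 });
  }

  // Hand the upstream body stream straight to the client.
  // The body is never awaited as text, so nothing is buffered in the Worker.
  return new Response(upstream.body, {
    status: 200,
    headers: {
      "Content-Type": upstream.headers.get("Content-Type") ?? "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}

// Module-style Worker entry point.
export default {
  fetch: (request: Request) => handleRequest(request),
};
```

The important line is `new Response(upstream.body, ...)`: the Worker returns as soon as the upstream headers arrive, and chunks flow through as the provider produces them.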

Testing with Curl

Call the Worker with curl and pass the -N (--no-buffer) flag. It disables curl's own output buffering, so each chunk appears as it arrives.
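The same check can be made from the frontend side. The sketch below reads a streamed Response chunk by chunk and hands each decoded piece to a callback; `readChunks` is a hypothetical helper name, not part of any library.

```typescript
// Read a streamed Response incrementally, invoking onChunk for each decoded piece.
async function readChunks(
  response: Response,
  onChunk: (text: string) => void,
): Promise<void> {
  if (response.body === null) {
    throw new Error("Response has no body to stream");
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact when split across chunks.
    const text = decoder.decode(value, { stream: true });
    if (text.length > 0) onChunk(text);
  }

  // Flush any bytes the decoder was still holding.
  const tail = decoder.decode();
  if (tail.length > 0) onChunk(tail);
}
```

In a page, you would call it with the result of `fetch` against your Worker's URL and append each chunk to the DOM. If chunks arrive in one burst at the end, something between the model and the browser is buffering.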

The Pitfall: Response Buffering

Cloudflare's default response handling buffers up to a certain size before flushing. If your chunks are small (common with token-by-token generation), the stream feels stuck. The fix is to explicitly set Transfer-Encoding: chunked and keep the connection alive with periodic heartbeats.
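The heartbeat half of that fix can be sketched as a stream wrapper. This assumes the upstream stream is formatted as Server-Sent Events, where a line starting with a colon is a comment that clients ignore. The name `withHeartbeat`, the `: ping` payload, and the 15-second default are my own choices for illustration. Headers are left to whatever Response wraps the returned stream.

```typescript
const HEARTBEAT = new TextEncoder().encode(": ping\n\n");

// True when a chunk ends on a blank line, i.e. at an SSE event boundary.
function endsEvent(chunk: Uint8Array): boolean {
  const n = chunk.length;
  return n >= 2 && chunk[n - 1] === 0x0a && chunk[n - 2] === 0x0a;
}

// Wrap an SSE byte stream so a comment line is emitted every intervalMs.
function withHeartbeat(
  source: ReadableStream<Uint8Array>,
  intervalMs = 15_000,
): ReadableStream<Uint8Array> {
  const reader = source.getReader();
  let timer: ReturnType<typeof setInterval> | undefined;
  // Only inject heartbeats between events, never in the middle of one.
  let atBoundary = true;

  return new ReadableStream<Uint8Array>({
    start(controller) {
      timer = setInterval(() => {
        if (atBoundary) controller.enqueue(HEARTBEAT);
      }, intervalMs);
    },
    async pull(controller) {
      try {
        const { done, value } = await reader.read();
        if (done) {
          clearInterval(timer);
          controller.close();
          return;
        }
        if (value.length > 0) atBoundary = endsEvent(value);
        controller.enqueue(value);
      } catch (err) {
        // Stop the timer before the stream errors, or enqueue would throw later.
        clearInterval(timer);
        throw err;
      }
    },
    cancel(reason) {
      // Client disconnected: stop the timer and release the upstream.
      clearInterval(timer);
      return reader.cancel(reason);
    },
  });
}
```

In the Worker, this would wrap the provider's body before it is returned, for example `new Response(withHeartbeat(upstream.body), { headers })`. The boundary check is deliberately conservative: if an event is split across chunks, the heartbeat is skipped rather than risk corrupting the event.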

Why This Matters

Streaming changes how the response feels. Instead of a blank screen while the model generates, users see text appear token by token. The perceived speed difference is dramatic even when the total generation time is the same.