Text Streaming: 10x Speed Boost
Nov 5, 2024
Matthew Fedoseev
How We Achieved a 10x Speedup in Text Streaming with Simple Optimization
In our fast-paced world of AI-driven applications, ensuring high performance and scalability is essential. Here's how we tackled and resolved a significant performance bottleneck in our text streaming pipeline.
While working on a recent AI project that involved streaming text from endpoints, we noticed a significant bottleneck. Small chunks of text loaded almost instantly, but larger files (>1 MB) took nearly 25 seconds to stream. This was surprising, given that the data came from Redis deployed on-premises within the same network.
After some investigation, we identified the bottleneck:
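The original snippet is not reproduced in this text. As a rough sketch of the pattern described, assuming the payload arrives as a Python `str` and is handed to a streaming response via `iter()` (the names `naive_stream` and `payload` are hypothetical):

```python
from typing import Iterator


def naive_stream(content: str) -> Iterator[str]:
    """Hypothetical stand-in for passing iter(content) to a streaming response."""
    # Iterating over a str yields one character at a time,
    # so every character becomes its own chunk.
    return iter(content)


payload = "hello world"
chunks = list(naive_stream(payload))
print(chunks[:3])   # ['h', 'e', 'l']
print(len(chunks))  # 11
```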
At first glance, this implementation might seem fine. Indeed, it works well when streaming tokens from an LLM or binary file objects. However, when given a raw string, iter() yields it one character at a time, resulting in:
- Very small chunks (1 character per iteration)
- Excessive I/O operations
- Significant TCP overhead
These inefficiencies caused streaming delays for large files.
The Solution: Chunking the Data
To address this issue, we introduced a simple generator that splits the string into fixed-size chunks before streaming:
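The generator itself is not shown in this text. A minimal sketch of such a generator, assuming plain string slicing and the 8192-character default mentioned later (the name `chunk_string` is hypothetical), might look like this:

```python
from typing import Iterator


def chunk_string(text: str, chunk_size: int = 8192) -> Iterator[str]:
    """Yield successive slices of `text`, each at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


text = "a" * 20_000
sizes = [len(chunk) for chunk in chunk_string(text)]
print(sizes)  # [8192, 8192, 3616]
```

The generator would then be passed to the streaming response in place of `iter(content)`.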
This small change reduced the streaming time for a 1.2 MB text file from 25 seconds to about 2.3 seconds, roughly a 10x speedup.
Why Chunking Works
Chunking improves performance by:
- Reducing I/O Overhead: Each write carries thousands of characters instead of one, so far fewer send operations are needed.
- Balancing Memory and Speed: The chunk size (8192 characters, or 8 KB for single-byte text) falls within typical I/O buffer sizes (4 KB to 64 KB).
- Scalability: This approach scales seamlessly for both small and large files.
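To make the reduction in write count concrete, here is a back-of-the-envelope sketch. It assumes a payload of 1,200,000 characters (an approximation of the 1.2 MB file, whose exact character count is not given) and that each yielded chunk results in one write; `count_writes` is a hypothetical helper.

```python
import math


def count_writes(total_chars: int, chunk_size: int) -> int:
    """Number of chunks needed to cover total_chars at the given chunk size."""
    return math.ceil(total_chars / chunk_size)


total_chars = 1_200_000  # assumed size, not a measured value
per_char = count_writes(total_chars, 1)
per_chunk = count_writes(total_chars, 8192)
print(per_char)   # 1200000
print(per_chunk)  # 147
```

Under these assumptions, chunking cuts the number of writes by roughly four orders of magnitude.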
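For our use case, we chose 8192 characters per chunk as our default, but this parameter can be tuned based on your workload. Keep encoding in mind: the chunk size is counted in characters, not bytes, so with multi-byte characters (up to 4 bytes each in UTF-8) a chunk can be several times larger on the wire, and you may need to adjust the chunk size.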
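If the size on the wire matters more than the character count, one option is to encode first and chunk the resulting bytes. This is a sketch of an alternative, not the approach used above; it assumes UTF-8, and `chunk_bytes` is a hypothetical name.

```python
import codecs
from typing import Iterator


def chunk_bytes(text: str, chunk_size: int = 8192, encoding: str = "utf-8") -> Iterator[bytes]:
    """Encode once, then yield slices of at most chunk_size bytes."""
    data = text.encode(encoding)
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


text = "\u20ac" * 5000  # the euro sign takes 3 bytes in UTF-8: 15,000 bytes in total
chunks = list(chunk_bytes(text))
print([len(c) for c in chunks])  # [8192, 6808]

# A character may straddle a chunk boundary (8192 is not a multiple of 3),
# so a consumer that decodes chunk by chunk should decode incrementally.
decoder = codecs.getincrementaldecoder("utf-8")()
decoded = "".join(decoder.decode(c) for c in chunks) + decoder.decode(b"", final=True)
print(decoded == text)  # True
```

The trade-off is that byte chunks are no longer valid text on their own, so this only suits consumers that reassemble or incrementally decode the stream.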
This optimization was crucial for our AI project, enabling our text processing pipelines to efficiently handle both small and large files. By identifying and resolving the bottleneck, we not only achieved a significant performance boost but also future-proofed the system for growing demands. As AI applications continue to evolve, small yet impactful optimizations like these will be key to maintaining performance and meeting new challenges.