Performance Optimization
Strategies to improve response times and overall API performance.
Recommendations:
- Implement streaming - Use the streaming option for real-time responses in user-facing applications.
- Use global routing - Oraicle automatically routes requests to the nearest data center, but for latency-sensitive applications, consider specifying a preferred region.
- Parallelize independent requests - When multiple calls are needed, run them concurrently using your language’s async capabilities.
- Implement connection pooling - Reuse HTTP connections for multiple requests to reduce overhead.
- Consider model size vs. speed tradeoffs - Smaller models typically respond faster but may be less capable for complex tasks.
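The parallelization recommendation above can be sketched with Python's `asyncio`. The `fetch_completion` function below is a hypothetical stand-in for a real API call, simulated with a sleep so the concurrency pattern stands out:

```python
import asyncio

# Hypothetical stand-in for an async API call; in practice this would be
# an async HTTP request to the chat completions endpoint.
async def fetch_completion(prompt: str) -> str:
    await asyncio.sleep(0.1)  # simulate network latency
    return f"response to: {prompt}"

async def run_all() -> list:
    prompts = ["summarize doc A", "summarize doc B", "summarize doc C"]
    # Independent requests run concurrently; total wall time is roughly
    # one request's latency rather than the sum of all three.
    return await asyncio.gather(*(fetch_completion(p) for p in prompts))

results = asyncio.run(run_all())
```

`asyncio.gather` preserves the order of its inputs, so results can be matched back to their prompts by index.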
Example: Streaming Implementation
```python
import openai

response = openai.ChatCompletion.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a detailed explanation of neural networks"}
    ],
    stream=True  # Enable streaming for real-time responses
)

# Process the stream as chunks arrive
for chunk in response:
    content = chunk.choices[0].delta.get("content", "")
    if content:
        print(content, end="", flush=True)
```
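The connection-pooling recommendation can be sketched with the `requests` library's transport adapters. The pool sizes and the endpoint in the comment are illustrative assumptions, not values from this documentation:

```python
import requests
from requests.adapters import HTTPAdapter

# Reuse TCP/TLS connections across requests instead of reconnecting each time
session = requests.Session()
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10)  # sizes are illustrative
session.mount("https://", adapter)

# All requests made through this session now share the pool, e.g.:
# session.post("https://api.example.com/v1/chat/completions", json=payload)  # hypothetical endpoint
```

Keeping one long-lived `Session` per process avoids repeated TLS handshakes, which often dominate latency for short requests.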