Performance Optimization
Strategies to improve response times and overall API performance.
Recommendations:
- Implement streaming - Use the streaming option for real-time responses in user-facing applications.
- Use global routing - Oraicle automatically routes requests to the nearest data center, but for latency-sensitive applications, consider specifying a preferred region.
- Parallelize independent requests - When multiple calls are needed, run them concurrently using your language's async capabilities.
- Implement connection pooling - Reuse HTTP connections across requests to reduce per-request overhead.
- Consider model size vs. speed tradeoffs - Smaller models typically respond faster but may be less capable for complex tasks.
 
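The parallelization recommendation above can be sketched with Python's `asyncio`. The `fetch_completion` coroutine here is a hypothetical stand-in that simulates an API call's latency with a sleep; in a real application you would replace it with an async request to the API via an async HTTP client.

```python
import asyncio

# Hypothetical stand-in for a single API call; the sleep simulates
# network latency. Swap in your actual async client call here.
async def fetch_completion(prompt: str) -> str:
    await asyncio.sleep(0.1)
    return f"response to: {prompt}"

async def main() -> list:
    prompts = ["summarize doc A", "summarize doc B", "summarize doc C"]
    # gather() runs the independent requests concurrently, so total
    # wall time is roughly one request's latency rather than the sum
    # of all three. Results come back in the same order as the inputs.
    return await asyncio.gather(*(fetch_completion(p) for p in prompts))

results = asyncio.run(main())
```

This pattern only helps when the requests are genuinely independent; if one call's output feeds the next, they must still run sequentially.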
Example: Streaming Implementation
```python
import openai

response = openai.ChatCompletion.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a detailed explanation of neural networks"},
    ],
    stream=True,  # Enable streaming for real-time responses
)

# Process the stream, printing each chunk as it arrives
for chunk in response:
    content = chunk.choices[0].delta.get("content", "")
    if content:
        print(content, end="", flush=True)
```
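The connection-pooling recommendation can be sketched with the third-party `requests` library; the pool sizes and retry count below are illustrative choices, not Oraicle-specific values, and the endpoint in the comment is a placeholder.

```python
import requests
from requests.adapters import HTTPAdapter

# A Session keeps TCP/TLS connections alive between requests,
# avoiding a fresh handshake on every API call.
session = requests.Session()

# Size the pool for your expected concurrency (values are illustrative).
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=2)
session.mount("https://", adapter)

# Reuse this one session for all API calls, e.g.:
# session.post("https://<api-endpoint>/v1/chat/completions", json=payload)
```

Create the session once at application startup and share it across requests; constructing a new session per call defeats the purpose of pooling.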