
Performance Optimization

Strategies to improve response times and overall API performance.

Recommendations:

  • Implement streaming - Use the streaming option for real-time responses in user-facing applications.
  • Use global routing - Oraicle automatically routes requests to the nearest data center, but for latency-sensitive applications, consider specifying a preferred region.
  • Parallelize independent requests - When multiple calls are needed, run them concurrently using your language’s async capabilities.
  • Implement connection pooling - Reuse HTTP connections for multiple requests to reduce overhead.
  • Consider model size vs. speed tradeoffs - Smaller models typically respond faster but may be less capable for complex tasks.
For example, streaming a completion with the OpenAI-compatible Python SDK (legacy openai<1.0 interface):

import openai

response = openai.ChatCompletion.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a detailed explanation of neural networks"},
    ],
    stream=True,  # Enable streaming for real-time responses
)

# Process the stream chunk by chunk as tokens arrive
for chunk in response:
    content = chunk.choices[0].delta.get("content", "")
    if content:
        print(content, end="", flush=True)
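To illustrate the parallelization recommendation above, here is a minimal sketch of running independent requests concurrently with asyncio.gather. The fake_completion coroutine simulates an API call with a short sleep; in a real application you would instead await the SDK's async request method. All names below are illustrative, not part of any SDK:

```python
import asyncio

async def fake_completion(prompt: str) -> str:
    # Simulated API call; replace with the SDK's async request in practice
    await asyncio.sleep(0.1)
    return f"response to {prompt!r}"

async def run_all() -> list:
    prompts = ["Summarize doc A", "Summarize doc B", "Summarize doc C"]
    # gather() runs the coroutines concurrently, so total wall-clock time
    # is roughly one request's latency rather than the sum of all three
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

results = asyncio.run(run_all())
```

Because the calls are independent, the order of results matches the order of prompts, which makes it easy to pair each response back to its request.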
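For the connection-pooling recommendation, one common approach in Python is a long-lived requests.Session, which keeps TCP/TLS connections open and reuses them across calls. This is a hedged sketch: the endpoint URL, host, and header values below are hypothetical placeholders, not documented Oraicle values:

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# Keep up to 10 pooled connections per host; subsequent requests to the
# same host reuse an open connection instead of re-doing the TLS handshake
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10)
session.mount("https://", adapter)
session.headers.update({"Authorization": "Bearer YOUR_API_KEY"})

def complete(prompt: str) -> dict:
    # Hypothetical chat-completions endpoint, for illustration only
    resp = session.post(
        "https://api.oraicle.example/v1/chat/completions",
        json={
            "model": "deepseek-ai/DeepSeek-R1",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

Create the session once at application startup and share it across requests; creating a new session (or bare requests.post call) per request forfeits the pooling benefit.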