Performance Optimization
Strategies to improve response times and overall API performance.
Recommendations:
- Implement streaming - Use the streaming option for real-time responses in user-facing applications.
- Use global routing - Oraicle automatically routes requests to the nearest data center, but for latency-sensitive applications, consider specifying a preferred region.
- Parallelize independent requests - When multiple calls are needed, run them concurrently using your language’s async capabilities.
- Implement connection pooling - Reuse HTTP connections for multiple requests to reduce overhead.
- Consider model size vs. speed tradeoffs - Smaller models typically respond faster but may be less capable for complex tasks.
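The parallelization recommendation above can be sketched with Python's `asyncio`. The `fetch_completion` function below is a hypothetical stand-in for a real API call, simulated with a sleep so the concurrency pattern stands out:

```python
import asyncio

# Hypothetical stand-in for an async API call; in practice this would be
# an async HTTP request to the chat completions endpoint.
async def fetch_completion(prompt: str) -> str:
    await asyncio.sleep(0.1)  # simulate network latency
    return f"response to: {prompt}"

async def run_all() -> list:
    prompts = ["summarize doc A", "summarize doc B", "summarize doc C"]
    # Independent requests run concurrently; total wall time is roughly
    # one request's latency rather than the sum of all three.
    return await asyncio.gather(*(fetch_completion(p) for p in prompts))

results = asyncio.run(run_all())
```

`asyncio.gather` preserves the order of its inputs, so results can be matched back to their prompts by index.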
Example: Streaming Implementation
```python
import openai

response = openai.ChatCompletion.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a detailed explanation of neural networks"}
    ],
    stream=True  # Enable streaming for real-time responses
)

# Process the stream as chunks arrive
for chunk in response:
    content = chunk.choices[0].delta.get("content", "")
    if content:
        print(content, end="", flush=True)
```
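The connection-pooling recommendation can be sketched with the `requests` library's transport adapters. The pool sizes and the endpoint in the comment are illustrative assumptions, not values from this documentation:

```python
import requests
from requests.adapters import HTTPAdapter

# Reuse TCP/TLS connections across requests instead of reconnecting each time
session = requests.Session()
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10)  # sizes are illustrative
session.mount("https://", adapter)

# All requests made through this session now share the pool, e.g.:
# session.post("https://api.example.com/v1/chat/completions", json=payload)  # hypothetical endpoint
```

Keeping one long-lived `Session` per process avoids repeated TLS handshakes, which often dominate latency for short requests.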