Cost Optimization

Oraicle’s pay-per-usage model means you only pay for what you use, but the techniques below can reduce your costs further.

Recommendations:

  • Implement caching - Cache responses for identical or similar queries.
  • Use compression techniques - Summarize long documents before sending them as context.
  • Optimize prompt templates - Shorter, more focused prompts use fewer tokens while often producing better results.
  • Adjust temperature settings - Lower temperature values (0.1-0.4) typically produce more concise responses.
  • Implement token limits - Set appropriate max_tokens values to prevent unnecessarily long responses.
  • Batch processing - When generating embeddings for multiple texts, use batch operations rather than individual calls.
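The caching recommendation above can be sketched as a small in-memory cache keyed on the full request payload. This is an illustrative example, not part of the Oraicle API: the `cached_completion` helper and the dictionary cache are assumptions, and a production deployment would more likely use Redis or a similar store with a TTL.

```python
import hashlib
import json

# Minimal in-memory cache: request payload -> API response.
_cache = {}

def cache_key(model, messages, **params):
    """Build a stable key from all request parameters."""
    payload = json.dumps(
        {"model": model, "messages": messages, **params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(client, model, messages, **params):
    """Return a cached response for identical requests; call the API otherwise."""
    key = cache_key(model, messages, **params)
    if key not in _cache:
        _cache[key] = client.chat.completions.create(
            model=model, messages=messages, **params
        )
    return _cache[key]
```

Repeated identical queries then hit the cache instead of the paid endpoint; for "similar" (not identical) queries you would need a fuzzier key, such as an embedding-based lookup.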
For example, a cost-conscious request that combines several of these settings:

```python
from openai import OpenAI

client = OpenAI()  # point base_url/api_key at the Oraicle endpoint as needed

response = client.chat.completions.create(
    model="openbmb/MiniCPM3-4B",  # cost-efficient model
    messages=[
        {"role": "system", "content": "You are a concise assistant that gives brief, accurate answers."},
        {"role": "user", "content": "Explain quantum computing"},
    ],
    temperature=0.3,       # lower temperature for more focused output
    max_tokens=150,        # limit response length
    presence_penalty=0.6,  # discourage repetition
)
```
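For batch processing of embeddings, the idea is to send many texts per request rather than one request per text. A minimal sketch, assuming an OpenAI-compatible embeddings endpoint; the `embed_texts` helper, the model name, and the batch size are illustrative assumptions, not Oraicle-specific values:

```python
def embed_texts(client, texts, model="text-embedding-3-small", batch_size=100):
    """Embed many texts in batches instead of one API call per text.

    Assumes an OpenAI-compatible embeddings endpoint; the model name
    and batch size here are placeholders.
    """
    vectors = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        resp = client.embeddings.create(model=model, input=batch)
        vectors.extend(item.embedding for item in resp.data)
    return vectors
```

Batching cuts per-request overhead and keeps you under rate limits; tune `batch_size` to the provider's maximum input size.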