Cost Optimization
Oraicle’s pay-per-usage pricing means you are only charged for the requests and tokens you actually consume; the techniques below help you reduce that consumption further.
Recommendations:
- Implement caching - Cache responses for identical or similar queries so repeated questions never hit the API twice (see the caching sketch after this list).
- Use compression techniques - Summarize long documents before sending them as context (see the summarization sketch after this list).
- Optimize prompt templates - Shorter, more focused prompts use fewer tokens while often producing better results.
- Adjust temperature settings - Lower temperature values (0.1-0.4) typically produce more concise responses.
- Implement token limits - Set appropriate max_tokens values to prevent unnecessarily long responses.
- Batch processing - When generating embeddings for multiple texts, send them in a single batched request rather than one call per text (see the embedding sketch after this list).
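
Example: Caching Repeated Queries
A minimal in-memory caching sketch. It assumes the same OpenAI-compatible Python client used in the API call example below; the cached_chat and _cache_key helpers are illustrative, not part of Oraicle's SDK.

```python
import hashlib
import json

import openai

_cache = {}  # In-memory store; swap for Redis or a disk cache in production


def _cache_key(model, messages, **params):
    # Hash the full request so identical queries map to the same cache entry
    payload = json.dumps({"model": model, "messages": messages, **params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_chat(model, messages, **params):
    key = _cache_key(model, messages, **params)
    if key not in _cache:
        # Only a cache miss costs tokens
        _cache[key] = openai.ChatCompletion.create(model=model, messages=messages, **params)
    return _cache[key]
```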
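
Example: Compressing Long Context
A sketch of summarizing a document before using it as context. The prompt wording, the report.txt path, and the 300-token summary cap are illustrative assumptions, not Oraicle requirements.

```python
import openai


def summarize_document(text, max_summary_tokens=300):
    # One cheap, bounded call condenses the document...
    response = openai.ChatCompletion.create(
        model="openbmb/MiniCPM3-4B",
        messages=[
            {"role": "system", "content": "Summarize the following document in a few sentences."},
            {"role": "user", "content": text},
        ],
        temperature=0.2,
        max_tokens=max_summary_tokens,
    )
    return response["choices"][0]["message"]["content"]


# ...so later requests pay for a short summary instead of the full document
long_document = open("report.txt").read()  # hypothetical source document
context = summarize_document(long_document)
```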
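
Example: Batched Embeddings
A sketch of generating embeddings for several texts in one request. It assumes Oraicle exposes an OpenAI-compatible embeddings endpoint; the model name is a placeholder to replace with whichever embedding model you use.

```python
import openai

texts = ["first document", "second document", "third document"]

# One batched request instead of len(texts) separate calls
response = openai.Embedding.create(
    model="your-embedding-model",  # placeholder, not a specific Oraicle model
    input=texts,                   # the input field accepts a list of strings
)
embeddings = [item["embedding"] for item in response["data"]]
```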
 
Example: Cost-Efficient API Call
```python
import openai

response = openai.ChatCompletion.create(
    model="openbmb/MiniCPM3-4B",  # Cost-efficient model
    messages=[
        {"role": "system", "content": "You are a concise assistant that gives brief, accurate answers."},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    temperature=0.3,         # Lower temperature for more focused output
    max_tokens=150,          # Limit response length
    presence_penalty=0.6     # Discourage repetition
)
```
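Here the savings come from three settings working together: a smaller, cost-efficient model, a low temperature for focused output, and an explicit max_tokens cap of 150 that bounds the response length regardless of how verbose the model would otherwise be.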