Cost Optimization
Oraicle’s pay-per-usage model means you are billed only for the tokens you consume, and the techniques below help reduce that consumption further.
Recommendations:
- Implement caching - Cache responses for identical or similar queries (a minimal caching sketch follows this list).
- Use compression techniques - Summarize long documents before sending them as context.
- Optimize prompt templates - Shorter, more focused prompts use fewer tokens while often producing better results.
- Adjust temperature settings - Lower temperature values (0.1-0.4) typically produce more concise responses.
- Implement token limits - Set appropriate max_tokens values to prevent unnecessarily long responses.
- Batch processing - When generating embeddings for multiple texts, use batch operations rather than individual calls (see the batching sketch at the end of this section).
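As a caching sketch, a small in-memory store keyed on a normalized prompt avoids repeat calls entirely. This is a minimal illustration, not part of the Oraicle API: the `cached_completion` helper, the dict-based cache, and the key scheme are assumptions, and the client call mirrors the example later in this section.

```python
import hashlib
import openai

_response_cache = {}  # maps prompt hash -> cached response text (illustrative in-process cache)

def cached_completion(prompt, model="openbmb/MiniCPM3-4B"):
    # Normalize the prompt so trivially different queries hit the same cache entry
    key = hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()
    if key in _response_cache:
        return _response_cache[key]  # cache hit: no API call, no cost

    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
        max_tokens=150,
    )
    text = response["choices"][0]["message"]["content"]
    _response_cache[key] = text
    return text
```

In production you would typically replace the in-process dict with a shared store such as Redis and add an expiry policy, since cached answers can go stale.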
Example: Cost-Efficient API Call
```python
import openai

response = openai.ChatCompletion.create(
    model="openbmb/MiniCPM3-4B",  # Cost-efficient model
    messages=[
        {"role": "system", "content": "You are a concise assistant that gives brief, accurate answers."},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    temperature=0.3,        # Lower temperature for more focused output
    max_tokens=150,         # Limit response length
    presence_penalty=0.6    # Discourage repetition
)
```
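The batching recommendation from the list above can look like the sketch below: one embeddings request carries many texts, instead of one request per text, which cuts per-call overhead. This assumes an OpenAI-compatible embeddings endpoint; the model name is a placeholder, so substitute whichever embedding model your Oraicle deployment exposes.

```python
import openai

texts = [
    "What is quantum computing?",
    "Explain superposition in simple terms.",
    "How do qubits differ from classical bits?",
]

# One request for all texts instead of len(texts) separate calls
response = openai.Embedding.create(
    model="your-embedding-model",  # placeholder: use the embedding model available on your deployment
    input=texts,
)

# Results come back in the same order as the inputs
embeddings = [item["embedding"] for item in response["data"]]
```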