To configure the model, go to Build → Assistant → Select Assistant → LLM.
Choose LLM Model
CallHQ currently supports the following model providers:
- OpenAI: Models like GPT-4, GPT-3.5
- Google: Models like Gemini 1.5 Pro, Gemini 2.0 Flash
When selecting a model, you choose two things:
- The provider (e.g., `openai` or `google`)
- The model variant (e.g., `GPT-4` or `Gemini 2.0 Flash`)
- Use Gemini Flash for faster response times
- Use GPT-4 for richer context understanding
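If you configure assistants programmatically rather than through the dashboard, the selection might look something like the sketch below. The `ModelConfig` shape and its field names are illustrative assumptions, not the documented CallHQ API:

```typescript
// Hypothetical assistant model configuration -- the field names here are
// assumptions for illustration, not the documented CallHQ API.
interface ModelConfig {
  provider: "openai" | "google"; // which provider serves the model
  model: string;                 // the model variant
}

// A latency-sensitive assistant using Gemini Flash.
const fastAssistant: ModelConfig = {
  provider: "google",
  model: "gemini-2.0-flash",
};

// An assistant that favors nuance over speed.
const nuancedAssistant: ModelConfig = {
  provider: "openai",
  model: "gpt-4",
};
```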
Temperature
The temperature controls how creative or deterministic your assistant’s responses are:
- Lower values (e.g., 0.2): More focused and predictable.
- Higher values (e.g., 0.7+): More open-ended and creative.
The default value is 0.5.
Use lower values for customer service, and higher values for creative or conversational agents.
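To make that trade-off concrete, here is how two assistant roles might be tuned, reusing the same hypothetical config shape sketched above (the values follow the guidance in this section; the field names remain assumptions):

```typescript
// Illustrative temperature settings per use case; the config shape is the
// same hypothetical one sketched earlier, not the documented CallHQ API.
const supportAgent = {
  provider: "openai",
  model: "gpt-4",
  temperature: 0.2, // focused, predictable answers for customer service
};

const creativeAgent = {
  provider: "google",
  model: "gemini-1.5-pro",
  temperature: 0.8, // more open-ended, conversational responses
};
```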
Max Tokens
The Max Tokens field controls how much the assistant can say in a single response. Example: 50 tokens equals approximately 30–40 words.
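The 30–40 word figure follows from the rough rule of thumb that one token is about 0.6–0.8 English words. A quick sketch of that arithmetic (the ratios are general approximations, not CallHQ-specific constants):

```typescript
// Rough rule of thumb: one token is about 0.6-0.8 English words.
// These ratios are general approximations, not CallHQ-specific constants.
function estimateWordRange(maxTokens: number): [number, number] {
  return [Math.round(maxTokens * 0.6), Math.round(maxTokens * 0.8)];
}

console.log(estimateWordRange(50)); // [30, 40] -- the example above
```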
Best Practices
- Use OpenAI models when you need highly nuanced answers.
- Use Gemini Flash when latency is a concern (faster responses).
- Keep temperature between 0.4 and 0.6 for most assistants.
- Set Max Tokens based on your expected turn length and user tolerance.
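Putting these recommendations together, a latency-sensitive support assistant might be configured along these lines (again a hypothetical shape, assuming the fields from the earlier sketches):

```typescript
// Hypothetical config combining the best practices above (field names
// are assumptions, as in the earlier sketches).
const supportAssistant = {
  provider: "google",
  model: "gemini-2.0-flash", // latency matters -> Gemini Flash
  temperature: 0.5,          // balanced, inside the 0.4-0.6 range
  maxTokens: 150,            // roughly 90-120 words per spoken turn
};
```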