LLM Settings
Configure the reasoning engine that powers your assistant, including model provider, temperature, and max tokens.
The LLM tab controls the core reasoning capabilities of your assistant. This is where you choose which language model (LLM) to use and how it should behave during conversations.
To access this section:
Build → Assistant → Select Assistant → LLM
Choose LLM Model
CallHQ currently supports the following model providers:
- OpenAI: Models like GPT-4, GPT-3.5
- Google: Models like Gemini 1.5 Pro, Gemini 2.0 Flash
Use the dropdowns to select:
- The provider (e.g., `openai` or `google`)
- The model variant (e.g., `GPT-4` or `Gemini 2.0 Flash`)
Choose a model based on the nature of your assistant:
- Use Gemini Flash for faster response times
- Use GPT-4 for richer context understanding
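If you want to sanity-check a provider/model combination outside CallHQ, the same choice maps to a direct API call. A minimal sketch using OpenAI's official Python client (the prompt is illustrative, and `OPENAI_API_KEY` must be set in your environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Equivalent of picking provider "openai" and model "GPT-4" in the dropdowns
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the status of my order?"}],
)
print(response.choices[0].message.content)
```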
Temperature
The temperature controls how creative or deterministic your assistant’s responses are:
- Lower values (e.g., 0.2): More focused and predictable.
- Higher values (e.g., 0.7+): More open-ended and creative.
Default: 0.5
Use lower values for customer service, and higher values for creative or conversational agents.
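To see the effect in isolation, you can call the same model twice with different temperature values and compare the outputs. A minimal sketch against OpenAI's Python client (the prompt is illustrative):

```python
from openai import OpenAI

client = OpenAI()
prompt = [{"role": "user", "content": "Suggest a greeting for a support call."}]

for temperature in (0.2, 0.8):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=prompt,
        temperature=temperature,
    )
    # 0.2 tends to repeat near-identical phrasing across runs;
    # 0.8 produces noticeably more varied greetings.
    print(f"temperature={temperature}: {response.choices[0].message.content}")
```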
Max Tokens
The Max Tokens field controls how much the assistant can say in a single response.
Example: `50 tokens` equals approximately 30–40 words.
Capping the token count keeps responses concise and prevents them from overwhelming the user.
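The 30–40 words figure reflects a common rule of thumb of roughly 0.75 English words per token; the actual ratio varies by model and tokenizer. A small helper for turning a target word count into a Max Tokens value (the function name and ratio are our own, not part of CallHQ):

```python
def words_to_max_tokens(target_words: int, words_per_token: float = 0.75) -> int:
    """Estimate a Max Tokens budget for a desired response length.

    Assumes ~0.75 English words per token, a rough rule of thumb;
    the real ratio depends on the model's tokenizer.
    """
    return round(target_words / words_per_token)

print(words_to_max_tokens(35))  # -> 47 tokens for a ~35-word reply
```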
Best Practices
- Use OpenAI models when you need highly nuanced answers.
- Use Gemini Flash when latency is a concern (faster responses).
- Keep temperature between 0.4 and 0.6 for most assistants.
- Set Max Tokens based on your expected turn length and user tolerance.
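Putting these recommendations together, a latency-sensitive customer service assistant might be configured like the sketch below. The field names are illustrative, not CallHQ's actual schema:

```python
# Hypothetical settings object; field names are illustrative, not CallHQ's schema.
assistant_llm_settings = {
    "provider": "google",         # Gemini Flash for lower latency
    "model": "gemini-2.0-flash",
    "temperature": 0.4,           # focused, predictable customer-service tone
    "max_tokens": 60,             # keeps each turn to roughly 45 words
}
```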