Thinking Effort - Voxworks

Why Thinking Effort Matters

Language models can struggle with certain types of reasoning, particularly:

Numbers and calculations — Arithmetic, quantities, totals, percentages
Dates and scheduling — Day of week calculations, time differences, availability checks
Logic and comparisons — If/then reasoning, comparing options, eligibility checks
Multi-step reasoning — Tasks requiring several logical steps to reach a conclusion

These quantitative and logical tasks benefit significantly from deeper thinking. When the model has more time to reason, it makes fewer errors with numbers, dates, and complex logic. However, deeper thinking comes with a direct tradeoff: increased latency. Every step set to deep thinking adds delay before the assistant responds. In Voxworks, the latency differences between effort levels are subtle:

Fast — Approximately 200ms faster than normal
Normal — Baseline latency
Deep — Approximately 500ms slower than normal

These differences are small enough that they won’t be noticeable on an individual step, but they add up when many steps use deep thinking. The key is to use deep thinking strategically, on the steps that will benefit most from accuracy.
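To make the cumulative effect concrete, here is a minimal sketch that sums the approximate offsets quoted above across a flow. The function name and the ten-step flow are hypothetical, chosen only for illustration; the millisecond figures are the approximations from this page, not measured values.

```python
# Approximate latency offsets relative to normal effort, in milliseconds,
# using the figures quoted on this page.
LATENCY_OFFSET_MS = {"fast": -200, "normal": 0, "deep": 500}


def cumulative_offset_ms(step_efforts):
    """Sum the approximate latency offsets for a sequence of effort levels."""
    return sum(LATENCY_OFFSET_MS[effort] for effort in step_efforts)


# Hypothetical ten-step flow: deep effort on two steps versus on every step.
selective = ["fast", "normal", "normal", "deep", "normal",
             "fast", "normal", "deep", "normal", "fast"]
all_deep = ["deep"] * 10

print(cumulative_offset_ms(selective))  # 400
print(cumulative_offset_ms(all_deep))   # 5000
```

Under these assumptions, using deep effort on only two steps adds roughly 0.4 seconds across the whole conversation, while using it on every step adds roughly 5 seconds.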

What is Thinking Effort?

When the assistant generates a response, it can use different levels of reasoning:

Fast — Quick, direct responses for simple situations
Normal — Balanced reasoning for standard interactions
Deep — Deep reasoning for complex or important moments

Effort Levels

Level	Response Speed	Reasoning Depth	Best For
fast	Fastest	Surface-level	Simple acknowledgments, quick replies
normal	Balanced	Moderate	Standard conversation, most steps
deep	Slower	Deep	Complex questions, important decisions

When to Use Each Level

Fast

Use for steps where you want quicker responses:

Acknowledgments and confirmations
Simple follow-up questions
Transitions between topics
Routine conversation

User: "Yes, that time works."
Assistant: "Great! I'll send you a confirmation." [fast effort sufficient]

Normal (Default)

Use for:

Standard questions requiring context
Responses that need to incorporate multiple factors
Most conversational turns

Assistant: "What time works best for you next week?"
User: "How about Thursday afternoon?"

Deep

Use for:

Complex questions or objections
Sensitive topics requiring careful handling
Important decision points
When accuracy is critical
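The guidance above amounts to assigning one effort level per step, with normal as the default. The sketch below models that in plain Python. The `Step` class, the `thinking_effort` field, and the step names are hypothetical and are not Voxworks's actual configuration format.

```python
from dataclasses import dataclass

# The three effort levels described on this page.
VALID_EFFORTS = ("fast", "normal", "deep")


@dataclass
class Step:
    """Hypothetical model of a conversation step (not a Voxworks API)."""
    name: str
    thinking_effort: str = "normal"  # normal is the default level

    def __post_init__(self):
        # Reject anything other than the three documented levels.
        if self.thinking_effort not in VALID_EFFORTS:
            raise ValueError(f"unknown effort level: {self.thinking_effort!r}")


# Illustrative flow: an acknowledgment, a standard question, and an objection.
flow = [
    Step("confirm_time", thinking_effort="fast"),
    Step("ask_availability"),
    Step("handle_objection", thinking_effort="deep"),
]

for step in flow:
    print(step.name, step.thinking_effort)
```

Leaving `thinking_effort` unset on most steps keeps the flow at normal effort, so each fast or deep assignment stands out as a deliberate choice.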

Interaction with Other Settings

Combined With	Effect
Patient eagerness	Wait longer + think deeper = very deliberate
Keen eagerness	Fast effort is typical; deep effort adds delay
Patient silence tolerance	Deep effort makes sense — user is thinking too

Best Practices

Default to normal — Start with normal effort and adjust from there
Elevate strategically — Use deep effort for moments that matter
Consider step complexity — If a step has more than 3 conditions or requires quantitative/logical reasoning, consider using deep effort
Test response quality — Verify fast effort responses are still good
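The practices above can be read as a simple decision rule. The following sketch encodes it as a starting-point heuristic: the function and parameter names are hypothetical, and the result should still be checked against real response quality as recommended above.

```python
def suggest_effort(condition_count,
                   needs_quantitative_or_logical_reasoning,
                   is_simple_acknowledgment=False):
    """Suggest a starting effort level for a step (illustrative heuristic only)."""
    # More than 3 conditions, or numbers/dates/logic: consider deep effort.
    if needs_quantitative_or_logical_reasoning or condition_count > 3:
        return "deep"
    # Acknowledgments, confirmations, and transitions can usually run fast.
    if is_simple_acknowledgment:
        return "fast"
    # Otherwise, default to normal and adjust from there.
    return "normal"


print(suggest_effort(1, False, is_simple_acknowledgment=True))  # fast
print(suggest_effort(2, False))                                 # normal
print(suggest_effort(5, False))                                 # deep
print(suggest_effort(1, True))                                  # deep
```

The order of the checks is a design choice: accuracy needs take priority, so a step that involves calculation is suggested as deep even if it otherwise looks like a quick confirmation.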

Next Steps

Response Eagerness — Control response timing
Silence Tolerance — Handle idle users
Overview — See all conversation dynamics