Supported Models
Models are grouped by protocol tier: Tier 1 and Tier 2 models run locally; Tier 3 models are hosted API endpoints used as fallbacks.
Model                        Tier  Type          Context      VRAM / Latency
Llama 3 70B                  1     Local LLM     8k tokens    80 GB VRAM
Mixtral 8x7B                 2     Local LLM     32k tokens   48 GB VRAM
OpenAI GPT-4o                3     API fallback  128k tokens  ~1.2 s latency
Google Gemini 1.5 Pro        3     API fallback  1M tokens    ~2.5 s latency
Anthropic Claude 3.5 Sonnet  3     API fallback  200k tokens  ~1.5 s latency