Awesome Free LLM APIs

Admin

3 Jul, 2026

Free AI API Providers and Inference Gateways: 2026 Comprehensive Comparison Guide

An authoritative, structured repository of zero-cost AI model endpoints, context limits, modalities, and API rate constraints for developers and researchers.

Direct AI Model Providers

These companies develop proprietary or open-weight models and offer direct API developer keys with free operational tiers.

Cohere 🇨🇦

Base URL: https://api.cohere.com/v2 | Get API Key

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Model Name & ID	Context	Max Out	Modality	Rate Limits
Command A+ (218B) command-a-plus-05-2026	128K	4K	Text	20 RPM
Command A (111B) command-a-03-2025	256K	4K	Text	20 RPM
Command R+ command-r-plus-08-2024	128K	4K	Text	20 RPM
Command R command-r-08-2024	128K	4K	Text	20 RPM
Command R7B command-r7b-12-2024	128K	4K	Text	20 RPM

Google Gemini 🇺🇸

Base URL: https://generativelanguage.googleapis.com/v1beta | Get API Key

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products. [1]

Model Name & ID	Context	Max Out	Modality	Rate Limits
Gemini 3.5 Flash gemini-3.5-flash	1M	64K	Text + Image + Audio + Video	15 RPM, 1,500 RPD
Gemini 3.1 Flash-Lite gemini-3.1-flash-lite	1M	65K	Text + Image + Audio + Video	30 RPM, 1,500 RPD
Gemini 2.5 Flash gemini-2.5-flash	1M	65K	Text + Image + Audio + Video	15 RPM, 1,500 RPD
Gemini 2.5 Pro gemini-2.5-pro	2M	65K	Text + Image + Audio + Video	5 RPM, 50 RPD

Mistral AI 🇫🇷

Base URL: https://api.mistral.ai/v1 | Get API Key

Free "Experiment" plan, no credit card. ~1B tokens/month. Prompts may be used to improve models.

Model Name & ID	Context	Max Out	Modality	Rate Limits
Mistral Medium 3.5 (128B) mistral-medium-2604	256K	256K	Text + Image + Code	~1 RPS, 500K TPM
Mistral Small 4 mistral-small-2603	256K	256K	Text + Image + Code	~1 RPS, 500K TPM
Mistral Large 3 mistral-large-2411	256K	256K	Text	~1 RPS, 500K TPM
Mistral Nemo (12B) open-mistral-nemo	128K	128K	Text	~1 RPS, 500K TPM
Codestral codestral-2501	256K	256K	Code	~1 RPS, 500K TPM
Pixtral Large pixtral-large-2411	128K	128K	Text + Image	~1 RPS, 500K TPM

Z AI (Zhipu AI) 🇨🇳

Base URL: https://open.bigmodel.cn/api/paas/v4 | Get API Key

Permanent free models, no credit card required.

Model Name & ID	Context	Max Out	Modality	Rate Limits
GLM-4.7-Flash glm-4.7-flash	200K	128K	Text	1 concurrent request
GLM-4.6V-Flash glm-4.6v-flash	128K	~4K	Text + Image	1 concurrent request

Multi-Model Inference Gateways

These providers aggregate multiple open architecture models, optimizing processing speed through hardware innovations or token routing networks.

Cerebras 🇺🇸

Base URL: https://api.cerebras.ai/v1 | Explore Gateway

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap. 8K context cap on free tier.

Model Name & ID	Context	Max Out	Modality	Rate Limits
gpt-oss-120b gpt-oss-120b	128K (8K on free)	8K	Text	30 RPM, 14,400 RPD, 1M TPD
zai-glm-4.7 zai-glm-4.7	128K (8K on free)	8K	Text	10 RPM, 100 RPD, 1M TPD

Cloudflare Workers AI 🇺🇸

Base URL: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run | Explore Gateway

10,000 Neurons/day free. 50+ models available on free tier.

Model Name & ID	Context	Max Out	Modality	Rate Limits
@cf/meta/llama-3.3-70b-instruct-fp8-fast	131K	Shared w/ context	Text	10K neurons/day (shared)
@cf/meta/llama-4-scout-17b-16e-instruct	Up to 10M	Shared w/ context	Multimodal	10K neurons/day (shared)
@cf/openai/gpt-oss-120b	128K	Shared w/ context	Text	10K neurons/day (shared)
@cf/moonshotai/kimi-k2.7-code	262K	Shared w/ context	Text (code)	10K neurons/day (shared)
@cf/google/gemma-4-26b-a4b-it	256K	Shared w/ context	Text	10K neurons/day (shared)
@cf/zhipuai/glm-4.7-flash	131K	Shared w/ context	Text	10K neurons/day (shared)
@cf/mistralai/mistral-small-3.1-24b-instruct	128K	Shared w/ context	Text	10K neurons/day (shared)
@cf/deepseek-ai/deepseek-r1-distill-qwen-32b	32K	Shared w/ context	Text (reasoning)	10K neurons/day (shared)
+ 42 more models	Varies	Varies	Text, Image, Audio, Embeddings	10K neurons/day (shared)

GitHub Models 🇺🇸

Base URL: https://models.github.ai/inference | Explore Gateway

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Model Name & ID	Context	Max Out	Modality	Rate Limits
gpt-5 openai/gpt-5	200K	32K	Text	10 RPM, 50 RPD
gpt-4.1 openai/gpt-4.1	1M	32K	Text	10 RPM, 50 RPD
gpt-4.1-mini openai/gpt-4.1-mini	1M	32K	Text	15 RPM, 150 RPD
gpt-4o openai/gpt-4o	128K	16K	Text + Vision	10 RPM, 50 RPD
o4-mini openai/o4-mini	200K	100K	Text (reasoning)	10 RPM, 50 RPD
Llama-4-Scout-17B-16E meta/Llama-4-Scout-17B-16E	512K	~4K	Text + Vision	15 RPM, 150 RPD
Llama-4-Maverick-17B-128E meta/Llama-4-Maverick-17B-128E	256K	~4K	Text + Vision	10 RPM, 50 RPD
Meta-Llama-3.3-70B meta/Meta-Llama-3.3-70B	131K	~4K	Text	15 RPM, 150 RPD
DeepSeek-R1 deepseek/DeepSeek-R1	64K	8K	Text (reasoning)	15 RPM, 150 RPD
Mistral-Small-3.1 mistralai/Mistral-Small-3.1	128K	~4K	Text + Vision	15 RPM, 150 RPD
+ 35 more models	Varies	Varies	Text / Image	Varies by tier

Groq 🇺🇸

Base URL: https://api.groq.com/openai/v1 | Explore Gateway

Free tier, no credit card. Ultra-fast LPU inference. [2]

Model Name & ID	Context	Max Out	Modality	Rate Limits
llama-3.3-70b-versatile	131K	32K	Text	30 RPM, 1,000 RPD
llama-3.1-8b-instant	131K	131K	Text	30 RPM, 1,000 RPD
llama-4-scout-17b-16e-instruct	131K	8K	Text + Vision	30 RPM, 1,000 RPD
qwen3-32b	131K	131K	Text	30 RPM, 1,000 RPD
gpt-oss-120b	131K	32K	Text	30 RPM, 1,000 RPD

Hugging Face 🇺🇸

Base URL: https://router.huggingface.co/v1 | Explore Gateway

100K monthly Inference Provider credits for free users. Routes to Fireworks, Together, Hyperbolic, Nebius, Novita, DeepInfra and others. Thousands of models.

Model Name & ID	Context	Max Out	Modality	Rate Limits
Meta-Llama-3.1-8B-Instruct meta-llama/Meta-Llama-3.1-8B-Instruct	128K	~4K	Text	Credit-metered
Mistral-7B-Instruct-v0.3 mistralai/Mistral-7B-Instruct-v0.3	32K	~4K	Text	Credit-metered
Mixtral-8x7B-Instruct-v0.1 mistralai/Mixtral-8x7B-Instruct-v0.1	32K	~4K	Text	Credit-metered
Phi-3.5-mini-instruct microsoft/Phi-3.5-mini-instruct	128K	~4K	Text	Credit-metered
Qwen2.5-7B-Instruct Qwen/Qwen2.5-7B-Instruct	131K	~4K	Text	Credit-metered
+ thousands of community models	Varies	Varies	Text, Image, Audio, Embeddings	100K credits/month free

Kilo Code 🇺🇸

Base URL: https://api.kilo.ai/api/gateway | Explore Gateway

Free models with no credit card required. `kilo-auto/free` auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%). [5]

Model Name & ID	Context	Max Out	Modality	Rate Limits
x-ai/grok-code-fast-1:free	256K	—	Text (code)	~200 req/hr
minimax/minimax-m2.5:free	196K	8K	Text	~200 req/hr
bytedance-seed/dola-seed-2.0-pro:free	—	—	Text	~200 req/hr
nvidia/nemotron-3-super-120b-a12b:free	262K	32K	Text	~200 req/hr
arcee-ai/trinity-large-thinking:free	—	—	Text (reasoning)	~200 req/hr
openrouter/free	Varies	Varies	Text	~200 req/hr

LLM7.io 🇬🇧

Base URL: https://api.llm7.io/v1 | Explore Gateway

Zero-friction API gateway. No registration needed for basic access. 30+ models. GDPR-compliant.

Model Name & ID	Context	Max Out	Modality	Rate Limits
deepseek-r1-0528	—	—	Text (reasoning)	30 RPM (120 with token)
deepseek-v3-0324	—	—	Text	30 RPM (120 with token)
gemini-2.5-flash-lite	—	—	Text + Vision	30 RPM (120 with token)
gpt-4o-mini	—	—	Text + Vision	30 RPM (120 with token)
mistral-small-3.1-24b	32K	—	Text	30 RPM (120 with token)
qwen2.5-coder-32b	—	—	Text (code)	30 RPM (120 with token)
+ ~24 more models	Varies	Varies	Text	30 RPM (120 with token)

ModelScope 🇨🇳

Base URL: https://api-inference.modelscope.cn/v1 | Explore Gateway

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification. [6]

Model Name & ID	Context	Max Out	Modality	Rate Limits
Qwen/Qwen3.5-35B-A3B	—	—	Text	2,000 RPD total; <=500 RPD/model
Qwen/Qwen3.5-27B	—	—	Text	2,000 RPD total; <=500 RPD/model
+ API-Inference-enabled models	Varies	Varies	LLM, MLLM	Dynamic quotas + concurrency

NVIDIA NIM 🇺🇸

Base URL: https://integrate.api.nvidia.com/v1 | Explore Gateway

Free with NVIDIA Developer Program membership. 100+ models. Rate-limited (no daily token cap).

Model Name & ID	Context	Max Out	Modality	Rate Limits
deepseek-ai/deepseek-r1	128K	~163K	Text (reasoning)	~40 RPM
nvidia/nemotron-3-super-120b-a12b	262K	262K	Text	~40 RPM
nvidia/nemotron-3-nano-30b-a3b	128K	32K	Text	~40 RPM
nvidia/llama-3.1-nemotron-ultra-253b-v1	128K	4K	Text	~40 RPM
meta/llama-3.1-405b-instruct	128K	4K	Text	~40 RPM
qwen/qwen2.5-72b-instruct	128K	8K	Text	~40 RPM
google/gemma-4-31b	128K	8K	Text	~40 RPM
mistralai/mistral-large-2-instruct	128K	4K	Text	~40 RPM
minimax/minimax-m2.7	128K	8K	Text	~40 RPM
+ 90 more models	Varies	Varies	Text, Image, Video, Speech, Embeddings	~40 RPM

Ollama Cloud 🇺🇸

Base URL: https://api.ollama.com | Explore Gateway

Free tier with qualitative usage limits. 400+ models from Ollama library. Not OpenAI SDK-compatible. [3]

Model Name & ID	Context	Max Out	Modality	Rate Limits
gpt-oss:120b-cloud	128K	Model-dependent	Text	Session/weekly limits
deepseek-v3.1:671b-cloud	128K	Model-dependent	Text	Session/weekly limits
qwen3-coder:480b-cloud	128K	Model-dependent	Text (code)	Session/weekly limits
kimi-k2:1t-cloud	262K	Model-dependent	Text	Session/weekly limits
glm-4.6:cloud	128K	Model-dependent	Text	Session/weekly limits
deepseek-r1:cloud	128K	Model-dependent	Text (reasoning)	Session/weekly limits
+ 30 more cloud models	Varies	Varies	Text	Session/weekly limits

OpenRouter 🇺🇸

Base URL: https://openrouter.ai/api/v1 | Explore Gateway

~22 free models (marked with `:free` suffix). OpenAI SDK-compatible. [4]

Model Name & ID	Context	Max Out	Modality	Rate Limits
qwen/qwen3-coder:free	1M	262K	Text (code)	20 RPM, 200 RPD
nvidia/nemotron-3-ultra-550b-a55b:free	1M	65K	Text	20 RPM, 200 RPD
nvidia/nemotron-3-super-120b-a12b:free	1M	262K	Text	20 RPM, 200 RPD
openai/gpt-oss-120b:free	131K	131K	Text	20 RPM, 200 RPD
openai/gpt-oss-20b:free	131K	8K	Text	20 RPM, 200 RPD
meta-llama/llama-3.3-70b-instruct:free	131K	~16K	Text	20 RPM, 200 RPD
nousresearch/hermes-3-llama-3.1-405b:free	131K	~16K	Text	20 RPM, 200 RPD
google/gemma-4-31b-it:free	262K	32K	Multimodal	20 RPM, 200 RPD
poolside/laguna-m.1:free	262K	32K	Text	20 RPM, 200 RPD
qwen/qwen3-next-80b-a3b-instruct:free	262K	~32K	Text	20 RPM, 200 RPD
+ ~12 more free models	Varies	Varies	Text / Image	20 RPM, 200 RPD

OVHcloud AI Endpoints 🇫🇷

Base URL: https://oai.endpoints.kepler.ai.cloud.ovh.net/v1 | Explore Gateway

Free anonymous tier (no API key, no signup): 2 RPM per IP per model. 20+ open-weight models hosted in EU. OpenAI SDK-compatible. [7]

Model Name & ID	Context	Max Out	Modality	Rate Limits
Qwen3.5-397B-A17B	131K	~32K	Text	2 RPM (anonymous)
gpt-oss-120b	128K	~32K	Text	2 RPM (anonymous)
gpt-oss-20b	128K	~8K	Text	2 RPM (anonymous)
Meta-Llama-3_3-70B-Instruct	131K	~4K	Text	2 RPM (anonymous)
Llama-3.1-8B-Instruct	131K	~4K	Text	2 RPM (anonymous)
Qwen3.6-27B	131K	~32K	Text	2 RPM (anonymous)
Qwen3.5-9B	131K	~8K	Text	2 RPM (anonymous)
Qwen3-32B	131K	~32K	Text	2 RPM (anonymous)
Qwen3-Coder-30B-A3B-Instruct	262K	~32K	Text (code)	2 RPM (anonymous)
Qwen2.5-VL-72B-Instruct	128K	~8K	Text + Vision	2 RPM (anonymous)
Mistral-Small-3.2-24B-Instruct Mistral-Small-3.2-24B-Instruct-2506	128K	~4K	Text	2 RPM (anonymous)
Mistral-Nemo-Instruct-2407	128K	~4K	Text	2 RPM (anonymous)
Mistral-7B-Instruct-v0.3	32K	~4K	Text	2 RPM (anonymous)

SambaNova 🇺🇸

Base URL: https://api.sambanova.ai/v1 | Explore Gateway

Free tier, no credit card. Ultra-fast RDU inference. 20 RPM, 200K tokens/day. [8]

Model Name & ID	Context	Max Out	Modality	Rate Limits
DeepSeek-V3.1	128K	~8K	Text	20 RPM, 20 RPD, 200K TPD
DeepSeek-V3.2 (Preview)	128K	~8K	Text	20 RPM, 20 RPD, 200K TPD
Meta-Llama-3.3-70B-Instruct	128K	~8K	Text	20 RPM, 20 RPD, 200K TPD
gpt-oss-120b	128K	~8K	Text	20 RPM, 20 RPD, 200K TPD
MiniMax-M2.7	128K	~8K	Text	20 RPM, 20 RPD, 200K TPD
gemma-4-31B-it (Preview)	128K	~8K	Text	20 RPM, 20 RPD, 200K TPD

SiliconFlow 🇨🇳

Base URL: https://api.siliconflow.cn/v1 | Explore Gateway

Permanently free models, no credit card required. 200+ paid models also available.

Model Name & ID	Context	Max Out	Modality	Rate Limits
Qwen/Qwen3-8B	131K	131K	Text	30 RPM, 60K TPM
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	131K	Configurable	Text (reasoning)	30 RPM, 60K TPM

Platform Disclaimers & Operational Notes

Free tier not available in the EU, UK, or Switzerland (available regions).
Groq rate limits were reduced in 2026. Most models now get 1,000 RPD on the free tier (down from 14,400). Llama 4 Maverick has been deprecated. See rate limits.
Ollama Cloud measures usage by GPU time, not tokens or requests. Free tier described as "light usage" with session limits resetting every 5 hours and weekly limits every 7 days. Pro (50x more) and Max (250x more) plans available. Not OpenAI SDK-compatible; uses the Ollama API.
Free models default to 200 RPD per model. A one-time purchase of $10+ in credits unlocks 1,000 RPD for free models. OpenRouter also offers a Free Models Router (openrouter/free) and model fallbacks for chaining models in priority order. Free providers may log prompts for training.
Kilo Code free model list may change over time. nvidia/nemotron-3-super-120b-a12b:free is for trial use only — prompts are logged by NVIDIA. Auto-router kilo-auto/free routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%).
API-Inference is free for registered users. Current published limits are 2,000 requests/day per user (total across models), with per-model daily quotas dynamically adjusted and capped at 500; concurrency is also dynamically rate-limited. Requires Alibaba Cloud account binding and real-name verification (limits, intro).
OVHcloud AI Endpoints offers a permanent free anonymous tier (2 requests per minute per IP, per model) with no signup or API key required. Higher rate limits (400 RPM per Public Cloud project per model) require an API key and are billed pay-as-you-go per token; new Public Cloud accounts get up to $200 in free trial credits. Models are hosted in EU data centers.
SambaNova grants $5 in initial credits (valid 30 days) on top of the permanent free tier. The free tier itself persists indefinitely with 20 RPM, 20 RPD, and 200K TPD per model. No credit card required. OpenAI SDK-compatible.

Developer Glossary: API Rate Limit Metrics

RPM: Requests per minute
RPD: Requests per day
TPM: Tokens per minute
TPD: Tokens per day
RPS: Requests per second

Data compiled for architectural reference. Capabilities, deprecations, and endpoints reflect structural configurations accurate as of mid-2026.

Awesome Free LLM APIs

Free AI API Providers and Inference Gateways: 2026 Comprehensive Comparison Guide