Awesome Free LLM APIs

Free AI API Providers and Inference Gateways: 2026 Comprehensive Comparison Guide

An authoritative, structured repository of zero-cost AI model endpoints, context limits, modalities, and API rate constraints for developers and researchers.

Free AI API Providers and Inference Gateways: 2026 Comprehensive Comparison Guide

Direct AI Model Providers

These companies develop proprietary or open-weight models and offer direct API developer keys with free operational tiers.

Cohere πŸ‡¨πŸ‡¦

Base URL: https://api.cohere.com/v2 | Get API Key

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Model Name & ID Context Max Out Modality Rate Limits
Command A+ (218B)
command-a-plus-05-2026
128K 4K Text 20 RPM
Command A (111B)
command-a-03-2025
256K 4K Text 20 RPM
Command R+
command-r-plus-08-2024
128K 4K Text 20 RPM
Command R
command-r-08-2024
128K 4K Text 20 RPM
Command R7B
command-r7b-12-2024
128K 4K Text 20 RPM

Google Gemini πŸ‡ΊπŸ‡Έ

Base URL: https://generativelanguage.googleapis.com/v1beta | Get API Key

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products. [1]

Model Name & ID Context Max Out Modality Rate Limits
Gemini 3.5 Flash
gemini-3.5-flash
1M 64K Text + Image + Audio + Video 15 RPM, 1,500 RPD
Gemini 3.1 Flash-Lite
gemini-3.1-flash-lite
1M 65K Text + Image + Audio + Video 30 RPM, 1,500 RPD
Gemini 2.5 Flash
gemini-2.5-flash
1M 65K Text + Image + Audio + Video 15 RPM, 1,500 RPD
Gemini 2.5 Pro
gemini-2.5-pro
2M 65K Text + Image + Audio + Video 5 RPM, 50 RPD

Mistral AI πŸ‡«πŸ‡·

Base URL: https://api.mistral.ai/v1 | Get API Key

Free "Experiment" plan, no credit card. ~1B tokens/month. Prompts may be used to improve models.

Model Name & ID Context Max Out Modality Rate Limits
Mistral Medium 3.5 (128B)
mistral-medium-2604
256K 256K Text + Image + Code ~1 RPS, 500K TPM
Mistral Small 4
mistral-small-2603
256K 256K Text + Image + Code ~1 RPS, 500K TPM
Mistral Large 3
mistral-large-2411
256K 256K Text ~1 RPS, 500K TPM
Mistral Nemo (12B)
open-mistral-nemo
128K 128K Text ~1 RPS, 500K TPM
Codestral
codestral-2501
256K 256K Code ~1 RPS, 500K TPM
Pixtral Large
pixtral-large-2411
128K 128K Text + Image ~1 RPS, 500K TPM

Z AI (Zhipu AI) πŸ‡¨πŸ‡³

Base URL: https://open.bigmodel.cn/api/paas/v4 | Get API Key

Permanent free models, no credit card required.

Model Name & ID Context Max Out Modality Rate Limits
GLM-4.7-Flash
glm-4.7-flash
200K 128K Text 1 concurrent request
GLM-4.6V-Flash
glm-4.6v-flash
128K ~4K Text + Image 1 concurrent request

Multi-Model Inference Gateways

These providers aggregate multiple open architecture models, optimizing processing speed through hardware innovations or token routing networks.

Cerebras πŸ‡ΊπŸ‡Έ

Base URL: https://api.cerebras.ai/v1 | Explore Gateway

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap. 8K context cap on free tier.

Model Name & ID Context Max Out Modality Rate Limits
gpt-oss-120b
gpt-oss-120b
128K (8K on free) 8K Text 30 RPM, 14,400 RPD, 1M TPD
zai-glm-4.7
zai-glm-4.7
128K (8K on free) 8K Text 10 RPM, 100 RPD, 1M TPD

Cloudflare Workers AI πŸ‡ΊπŸ‡Έ

Base URL: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run | Explore Gateway

10,000 Neurons/day free. 50+ models available on free tier.

Model Name & ID Context Max Out Modality Rate Limits
@cf/meta/llama-3.3-70b-instruct-fp8-fast 131K Shared w/ context Text 10K neurons/day (shared)
@cf/meta/llama-4-scout-17b-16e-instruct Up to 10M Shared w/ context Multimodal 10K neurons/day (shared)
@cf/openai/gpt-oss-120b 128K Shared w/ context Text 10K neurons/day (shared)
@cf/moonshotai/kimi-k2.7-code 262K Shared w/ context Text (code) 10K neurons/day (shared)
@cf/google/gemma-4-26b-a4b-it 256K Shared w/ context Text 10K neurons/day (shared)
@cf/zhipuai/glm-4.7-flash 131K Shared w/ context Text 10K neurons/day (shared)
@cf/mistralai/mistral-small-3.1-24b-instruct 128K Shared w/ context Text 10K neurons/day (shared)
@cf/deepseek-ai/deepseek-r1-distill-qwen-32b 32K Shared w/ context Text (reasoning) 10K neurons/day (shared)
+ 42 more models Varies Varies Text, Image, Audio, Embeddings 10K neurons/day (shared)

GitHub Models πŸ‡ΊπŸ‡Έ

Base URL: https://models.github.ai/inference | Explore Gateway

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Model Name & ID Context Max Out Modality Rate Limits
gpt-5
openai/gpt-5
200K 32K Text 10 RPM, 50 RPD
gpt-4.1
openai/gpt-4.1
1M 32K Text 10 RPM, 50 RPD
gpt-4.1-mini
openai/gpt-4.1-mini
1M 32K Text 15 RPM, 150 RPD
gpt-4o
openai/gpt-4o
128K 16K Text + Vision 10 RPM, 50 RPD
o4-mini
openai/o4-mini
200K 100K Text (reasoning) 10 RPM, 50 RPD
Llama-4-Scout-17B-16E
meta/Llama-4-Scout-17B-16E
512K ~4K Text + Vision 15 RPM, 150 RPD
Llama-4-Maverick-17B-128E
meta/Llama-4-Maverick-17B-128E
256K ~4K Text + Vision 10 RPM, 50 RPD
Meta-Llama-3.3-70B
meta/Meta-Llama-3.3-70B
131K ~4K Text 15 RPM, 150 RPD
DeepSeek-R1
deepseek/DeepSeek-R1
64K 8K Text (reasoning) 15 RPM, 150 RPD
Mistral-Small-3.1
mistralai/Mistral-Small-3.1
128K ~4K Text + Vision 15 RPM, 150 RPD
+ 35 more models Varies Varies Text / Image Varies by tier

Groq πŸ‡ΊπŸ‡Έ

Base URL: https://api.groq.com/openai/v1 | Explore Gateway

Free tier, no credit card. Ultra-fast LPU inference. [2]

Model Name & ID Context Max Out Modality Rate Limits
llama-3.3-70b-versatile 131K 32K Text 30 RPM, 1,000 RPD
llama-3.1-8b-instant 131K 131K Text 30 RPM, 1,000 RPD
llama-4-scout-17b-16e-instruct 131K 8K Text + Vision 30 RPM, 1,000 RPD
qwen3-32b 131K 131K Text 30 RPM, 1,000 RPD
gpt-oss-120b 131K 32K Text 30 RPM, 1,000 RPD

Hugging Face πŸ‡ΊπŸ‡Έ

Base URL: https://router.huggingface.co/v1 | Explore Gateway

100K monthly Inference Provider credits for free users. Routes to Fireworks, Together, Hyperbolic, Nebius, Novita, DeepInfra and others. Thousands of models.

Model Name & ID Context Max Out Modality Rate Limits
Meta-Llama-3.1-8B-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct
128K ~4K Text Credit-metered
Mistral-7B-Instruct-v0.3
mistralai/Mistral-7B-Instruct-v0.3
32K ~4K Text Credit-metered
Mixtral-8x7B-Instruct-v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1
32K ~4K Text Credit-metered
Phi-3.5-mini-instruct
microsoft/Phi-3.5-mini-instruct
128K ~4K Text Credit-metered
Qwen2.5-7B-Instruct
Qwen/Qwen2.5-7B-Instruct
131K ~4K Text Credit-metered
+ thousands of community models Varies Varies Text, Image, Audio, Embeddings 100K credits/month free

Kilo Code πŸ‡ΊπŸ‡Έ

Base URL: https://api.kilo.ai/api/gateway | Explore Gateway

Free models with no credit card required. `kilo-auto/free` auto-router routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%). [5]

Model Name & ID Context Max Out Modality Rate Limits
x-ai/grok-code-fast-1:free 256K Text (code) ~200 req/hr
minimax/minimax-m2.5:free 196K 8K Text ~200 req/hr
bytedance-seed/dola-seed-2.0-pro:free Text ~200 req/hr
nvidia/nemotron-3-super-120b-a12b:free 262K 32K Text ~200 req/hr
arcee-ai/trinity-large-thinking:free Text (reasoning) ~200 req/hr
openrouter/free Varies Varies Text ~200 req/hr

LLM7.io πŸ‡¬πŸ‡§

Base URL: https://api.llm7.io/v1 | Explore Gateway

Zero-friction API gateway. No registration needed for basic access. 30+ models. GDPR-compliant.

Model Name & ID Context Max Out Modality Rate Limits
deepseek-r1-0528 Text (reasoning) 30 RPM (120 with token)
deepseek-v3-0324 Text 30 RPM (120 with token)
gemini-2.5-flash-lite Text + Vision 30 RPM (120 with token)
gpt-4o-mini Text + Vision 30 RPM (120 with token)
mistral-small-3.1-24b 32K Text 30 RPM (120 with token)
qwen2.5-coder-32b Text (code) 30 RPM (120 with token)
+ ~24 more models Varies Varies Text 30 RPM (120 with token)

ModelScope πŸ‡¨πŸ‡³

Base URL: https://api-inference.modelscope.cn/v1 | Explore Gateway

Free API-Inference for registered users. Requires Alibaba Cloud account binding + real-name verification. [6]

Model Name & ID Context Max Out Modality Rate Limits
Qwen/Qwen3.5-35B-A3B Text 2,000 RPD total; <=500 RPD/model
Qwen/Qwen3.5-27B Text 2,000 RPD total; <=500 RPD/model
+ API-Inference-enabled models Varies Varies LLM, MLLM Dynamic quotas + concurrency

NVIDIA NIM πŸ‡ΊπŸ‡Έ

Base URL: https://integrate.api.nvidia.com/v1 | Explore Gateway

Free with NVIDIA Developer Program membership. 100+ models. Rate-limited (no daily token cap).

Model Name & ID Context Max Out Modality Rate Limits
deepseek-ai/deepseek-r1 128K ~163K Text (reasoning) ~40 RPM
nvidia/nemotron-3-super-120b-a12b 262K 262K Text ~40 RPM
nvidia/nemotron-3-nano-30b-a3b 128K 32K Text ~40 RPM
nvidia/llama-3.1-nemotron-ultra-253b-v1 128K 4K Text ~40 RPM
meta/llama-3.1-405b-instruct 128K 4K Text ~40 RPM
qwen/qwen2.5-72b-instruct 128K 8K Text ~40 RPM
google/gemma-4-31b 128K 8K Text ~40 RPM
mistralai/mistral-large-2-instruct 128K 4K Text ~40 RPM
minimax/minimax-m2.7 128K 8K Text ~40 RPM
+ 90 more models Varies Varies Text, Image, Video, Speech, Embeddings ~40 RPM

Ollama Cloud πŸ‡ΊπŸ‡Έ

Base URL: https://api.ollama.com | Explore Gateway

Free tier with qualitative usage limits. 400+ models from Ollama library. Not OpenAI SDK-compatible. [3]

Model Name & ID Context Max Out Modality Rate Limits
gpt-oss:120b-cloud 128K Model-dependent Text Session/weekly limits
deepseek-v3.1:671b-cloud 128K Model-dependent Text Session/weekly limits
qwen3-coder:480b-cloud 128K Model-dependent Text (code) Session/weekly limits
kimi-k2:1t-cloud 262K Model-dependent Text Session/weekly limits
glm-4.6:cloud 128K Model-dependent Text Session/weekly limits
deepseek-r1:cloud 128K Model-dependent Text (reasoning) Session/weekly limits
+ 30 more cloud models Varies Varies Text Session/weekly limits

OpenRouter πŸ‡ΊπŸ‡Έ

Base URL: https://openrouter.ai/api/v1 | Explore Gateway

~22 free models (marked with `:free` suffix). OpenAI SDK-compatible. [4]

Model Name & ID Context Max Out Modality Rate Limits
qwen/qwen3-coder:free 1M 262K Text (code) 20 RPM, 200 RPD
nvidia/nemotron-3-ultra-550b-a55b:free 1M 65K Text 20 RPM, 200 RPD
nvidia/nemotron-3-super-120b-a12b:free 1M 262K Text 20 RPM, 200 RPD
openai/gpt-oss-120b:free 131K 131K Text 20 RPM, 200 RPD
openai/gpt-oss-20b:free 131K 8K Text 20 RPM, 200 RPD
meta-llama/llama-3.3-70b-instruct:free 131K ~16K Text 20 RPM, 200 RPD
nousresearch/hermes-3-llama-3.1-405b:free 131K ~16K Text 20 RPM, 200 RPD
google/gemma-4-31b-it:free 262K 32K Multimodal 20 RPM, 200 RPD
poolside/laguna-m.1:free 262K 32K Text 20 RPM, 200 RPD
qwen/qwen3-next-80b-a3b-instruct:free 262K ~32K Text 20 RPM, 200 RPD
+ ~12 more free models Varies Varies Text / Image 20 RPM, 200 RPD

OVHcloud AI Endpoints πŸ‡«πŸ‡·

Base URL: https://oai.endpoints.kepler.ai.cloud.ovh.net/v1 | Explore Gateway

Free anonymous tier (no API key, no signup): 2 RPM per IP per model. 20+ open-weight models hosted in EU. OpenAI SDK-compatible. [7]

Model Name & ID Context Max Out Modality Rate Limits
Qwen3.5-397B-A17B 131K ~32K Text 2 RPM (anonymous)
gpt-oss-120b 128K ~32K Text 2 RPM (anonymous)
gpt-oss-20b 128K ~8K Text 2 RPM (anonymous)
Meta-Llama-3_3-70B-Instruct 131K ~4K Text 2 RPM (anonymous)
Llama-3.1-8B-Instruct 131K ~4K Text 2 RPM (anonymous)
Qwen3.6-27B 131K ~32K Text 2 RPM (anonymous)
Qwen3.5-9B 131K ~8K Text 2 RPM (anonymous)
Qwen3-32B 131K ~32K Text 2 RPM (anonymous)
Qwen3-Coder-30B-A3B-Instruct 262K ~32K Text (code) 2 RPM (anonymous)
Qwen2.5-VL-72B-Instruct 128K ~8K Text + Vision 2 RPM (anonymous)
Mistral-Small-3.2-24B-Instruct
Mistral-Small-3.2-24B-Instruct-2506
128K ~4K Text 2 RPM (anonymous)
Mistral-Nemo-Instruct-2407 128K ~4K Text 2 RPM (anonymous)
Mistral-7B-Instruct-v0.3 32K ~4K Text 2 RPM (anonymous)

SambaNova πŸ‡ΊπŸ‡Έ

Base URL: https://api.sambanova.ai/v1 | Explore Gateway

Free tier, no credit card. Ultra-fast RDU inference. 20 RPM, 200K tokens/day. [8]

Model Name & ID Context Max Out Modality Rate Limits
DeepSeek-V3.1 128K ~8K Text 20 RPM, 20 RPD, 200K TPD
DeepSeek-V3.2 (Preview) 128K ~8K Text 20 RPM, 20 RPD, 200K TPD
Meta-Llama-3.3-70B-Instruct 128K ~8K Text 20 RPM, 20 RPD, 200K TPD
gpt-oss-120b 128K ~8K Text 20 RPM, 20 RPD, 200K TPD
MiniMax-M2.7 128K ~8K Text 20 RPM, 20 RPD, 200K TPD
gemma-4-31B-it (Preview) 128K ~8K Text 20 RPM, 20 RPD, 200K TPD

SiliconFlow πŸ‡¨πŸ‡³

Base URL: https://api.siliconflow.cn/v1 | Explore Gateway

Permanently free models, no credit card required. 200+ paid models also available.

Model Name & ID Context Max Out Modality Rate Limits
Qwen/Qwen3-8B 131K 131K Text 30 RPM, 60K TPM
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B 131K Configurable Text (reasoning) 30 RPM, 60K TPM

Platform Disclaimers & Operational Notes

  1. Free tier not available in the EU, UK, or Switzerland (available regions).
  2. Groq rate limits were reduced in 2026. Most models now get 1,000 RPD on the free tier (down from 14,400). Llama 4 Maverick has been deprecated. See rate limits.
  3. Ollama Cloud measures usage by GPU time, not tokens or requests. Free tier described as "light usage" with session limits resetting every 5 hours and weekly limits every 7 days. Pro (50x more) and Max (250x more) plans available. Not OpenAI SDK-compatible; uses the Ollama API.
  4. Free models default to 200 RPD per model. A one-time purchase of $10+ in credits unlocks 1,000 RPD for free models. OpenRouter also offers a Free Models Router (openrouter/free) and model fallbacks for chaining models in priority order. Free providers may log prompts for training.
  5. Kilo Code free model list may change over time. nvidia/nemotron-3-super-120b-a12b:free is for trial use only — prompts are logged by NVIDIA. Auto-router kilo-auto/free routes to minimax/minimax-m2.5:free (80%) and stepfun/step-3.5-flash:free (20%).
  6. API-Inference is free for registered users. Current published limits are 2,000 requests/day per user (total across models), with per-model daily quotas dynamically adjusted and capped at 500; concurrency is also dynamically rate-limited. Requires Alibaba Cloud account binding and real-name verification (limits, intro).
  7. OVHcloud AI Endpoints offers a permanent free anonymous tier (2 requests per minute per IP, per model) with no signup or API key required. Higher rate limits (400 RPM per Public Cloud project per model) require an API key and are billed pay-as-you-go per token; new Public Cloud accounts get up to $200 in free trial credits. Models are hosted in EU data centers.
  8. SambaNova grants $5 in initial credits (valid 30 days) on top of the permanent free tier. The free tier itself persists indefinitely with 20 RPM, 20 RPD, and 200K TPD per model. No credit card required. OpenAI SDK-compatible.

Developer Glossary: API Rate Limit Metrics

RPM
Requests per minute
RPD
Requests per day
TPM
Tokens per minute
TPD
Tokens per day
RPS
Requests per second

Data compiled for architectural reference. Capabilities, deprecations, and endpoints reflect structural configurations accurate as of mid-2026.

Previous Post
No comment
Add Comments
comment url
Check out our product collection on Shopee.
View Products