Multi-Provider AI Gateway
High-availability AI routing microservice
Overview
The AI Gateway is a Python FastAPI microservice that acts as a smart proxy in front of multiple AI providers. It selects the best provider based on availability, cost budget, and routing rules, and falls back automatically when a provider fails or exceeds its budget. Responses are semantically cached in Redis to cut AI costs by up to 40%.
Key Features
How It Works
Request Arrives
The NestJS backend sends a prompt to the AI Gateway REST endpoint along with a task type (score, parse, optimize, or test).
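As a rough illustration of the request shape, the sketch below validates a payload using only the standard library. In the actual service a Pydantic model would be the natural fit. The names `TaskType`, `GatewayRequest`, `from_payload`, and the field names `prompt` and `task_type` are assumptions made for this example, not the gateway's real schema.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(str, Enum):
    """The four task types named in the text."""
    SCORE = "score"
    PARSE = "parse"
    OPTIMIZE = "optimize"
    TEST = "test"


@dataclass(frozen=True)
class GatewayRequest:
    """Hypothetical request body; field names are assumptions."""
    prompt: str
    task_type: TaskType

    @classmethod
    def from_payload(cls, payload: dict) -> "GatewayRequest":
        prompt = payload.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError("prompt must be a non-empty string")
        # The Enum lookup raises ValueError for an unknown or missing task type.
        return cls(prompt=prompt, task_type=TaskType(payload.get("task_type")))
```

Rejecting unknown task types at the edge keeps the router and cache from ever seeing a request they cannot classify.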
Cache Check
The gateway computes a semantic hash of the prompt and checks Redis. If a similar request was made recently, it returns the cached response immediately, without calling a provider.
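The text does not say how the semantic hash is computed. One simple stand-in is to normalize the prompt (lowercase, strip punctuation, collapse whitespace) and hash the result, so that trivially different phrasings share a key. This is a normalized hash rather than true semantic similarity. The sketch uses an in-memory dictionary with a time-to-live in place of Redis, and the key prefix `aigw:cache:` and all function and class names are hypothetical.

```python
import hashlib
import re
import time
from typing import Optional


def cache_key(task_type: str, prompt: str) -> str:
    """Build a cache key from a normalized prompt (assumed normalization rules)."""
    normalized = re.sub(r"[^a-z0-9 ]+", " ", prompt.lower())
    normalized = " ".join(normalized.split())  # collapse runs of whitespace
    digest = hashlib.sha256(f"{task_type}:{normalized}".encode("utf-8")).hexdigest()
    return f"aigw:cache:{digest}"


class TTLCache:
    """In-memory stand-in for Redis with per-entry expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # entry is stale, treat as a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic() + self.ttl_seconds, value)
```

Including the task type in the key prevents a cached "parse" result from being returned for a "score" request on the same prompt.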
Provider Selection
If there is no cache hit, the router checks which providers are available and under budget, then picks the highest-priority one.
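The selection rule can be sketched as a filter followed by a sort. The `Provider` fields, the convention that a lower `priority` number means more preferred, and the budget figures in the usage example are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Provider:
    """Hypothetical provider record; field names are assumptions."""
    name: str
    priority: int      # lower number = more preferred (assumed convention)
    available: bool
    budget_usd: float
    spent_usd: float

    def under_budget(self) -> bool:
        return self.spent_usd < self.budget_usd


def eligible_providers(providers: Iterable[Provider]) -> List[Provider]:
    """Providers that are up and under budget, most preferred first."""
    usable = (p for p in providers if p.available and p.under_budget())
    return sorted(usable, key=lambda p: p.priority)


def select_provider(providers: Iterable[Provider]) -> Optional[Provider]:
    """Return the highest-priority eligible provider, or None if none qualifies."""
    ranked = eligible_providers(providers)
    return ranked[0] if ranked else None
```

For example, with Gemini over budget and OpenAI unavailable, the router would skip both and choose the next entry:

```python
providers = [
    Provider("gemini", 1, available=True, budget_usd=10.0, spent_usd=10.0),
    Provider("openai", 2, available=False, budget_usd=20.0, spent_usd=1.0),
    Provider("claude", 3, available=True, budget_usd=20.0, spent_usd=5.0),
]
chosen = select_provider(providers)  # the "claude" entry
```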
AI Call + Fallback
The gateway calls the selected provider. If the call fails (timeout or rate limit), it automatically retries with the next provider in the chain.
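A minimal sketch of the fallback loop follows. It assumes each provider is wrapped in a plain callable that raises a shared `ProviderError` on timeout or rate limit. Both exception classes and `call_with_fallback` are hypothetical names, and real provider SDKs raise their own exception types that a wrapper would need to translate.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider wrapper raises on timeout or rate limit."""


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def call_with_fallback(
    chain: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    errors: List[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next provider in the chain.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Only `ProviderError` triggers a fallback here, so programming errors such as a `TypeError` still surface instead of being silently retried.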
Log & Return
The gateway logs token count, cost, and latency to the usage_log table, then returns a structured JSON response to the NestJS backend.
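To make the logging step concrete, here is a sketch that times a call and writes one row per request. It uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs without a server. Only the table name `usage_log` comes from the text; the column names, the per-1,000-token pricing model, and the helper functions are assumptions.

```python
import sqlite3
import time


def create_usage_log(conn: sqlite3.Connection) -> None:
    # Column names are illustrative; only the table name appears in the source.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS usage_log (
            id INTEGER PRIMARY KEY,
            provider TEXT NOT NULL,
            task_type TEXT NOT NULL,
            total_tokens INTEGER NOT NULL,
            cost_usd REAL NOT NULL,
            latency_ms REAL NOT NULL
        )
        """
    )


def timed_call(call, prompt):
    """Run call(prompt) and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = call(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def log_usage(conn, provider, task_type, total_tokens, usd_per_1k_tokens, latency_ms):
    """Insert one usage row and return the computed cost in USD."""
    cost_usd = total_tokens / 1000.0 * usd_per_1k_tokens
    conn.execute(
        "INSERT INTO usage_log (provider, task_type, total_tokens, cost_usd, latency_ms) "
        "VALUES (?, ?, ?, ?, ?)",
        (provider, task_type, total_tokens, cost_usd, latency_ms),
    )
    conn.commit()
    return cost_usd
```

Summing `cost_usd` per provider from this table is one way the router's budget check could be fed, though the source does not say how spend is tracked.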
Tech Stack
Runtime
- Python 3.11
- FastAPI
- Uvicorn
AI Providers
- Google Gemini
- OpenAI GPT-4o
- Anthropic Claude
- Ollama (LLaMA)
Caching
- Redis
- Semantic similarity hashing
Queue
- Background task queue
- Async workers
Data
- PostgreSQL (usage logs)
- Pydantic schemas
Architecture
The gateway sits between the NestJS backend and all AI providers, acting as a smart router with caching, budget control, and async processing built in.
Data Flow
- 1. Request in → FastAPI validates schema → Router checks Redis cache for semantic match.
- 2. Cache miss → Router picks provider by priority and budget → calls AI provider API.
- 3. Provider fails → automatic fallback to next in chain (Gemini → OpenAI → Claude → Ollama).
- 4. Response → cached in Redis with semantic key → usage logged to PostgreSQL → returned to NestJS.
- 5. Heavy tasks (large resume processing) queued as async background jobs to avoid request timeout.
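The background-job step in the flow above can be sketched with `asyncio.Queue` and a small pool of worker tasks. The source does not name the queue technology, so this in-process queue is an assumption, as are the names `worker` and `run_jobs` and the default of two workers.

```python
import asyncio


async def worker(queue: asyncio.Queue, results: dict, handler) -> None:
    """Pull (job_id, payload) pairs off the queue until cancelled."""
    while True:
        job_id, payload = await queue.get()
        try:
            results[job_id] = await handler(payload)
        except Exception as exc:
            # Store the error so one bad job does not stop the worker.
            results[job_id] = exc
        finally:
            queue.task_done()


async def run_jobs(payloads, handler, worker_count: int = 2) -> dict:
    """Process all payloads with a fixed number of workers; return results by job id."""
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    workers = [
        asyncio.create_task(worker(queue, results, handler))
        for _ in range(worker_count)
    ]
    for job_id, payload in enumerate(payloads):
        queue.put_nowait((job_id, payload))
    await queue.join()  # wait until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

In a request handler, the endpoint would enqueue the job and return a job id right away, letting the client poll for the result instead of holding the HTTP connection open.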