
Multi-Provider AI Gateway

High-availability AI routing microservice

Python · FastAPI · Redis · Gemini · OpenAI · Claude

Overview

The AI Gateway is a Python FastAPI microservice that acts as a smart proxy in front of multiple AI providers. It selects the best provider based on availability, cost budget, and routing rules, and falls back automatically when a provider fails or exceeds its budget. Responses are semantically cached in Redis, cutting AI costs by up to 40%.

Key Features

Provider routing: Gemini → OpenAI → Claude → Ollama (configurable order)
Automatic fallback on provider timeout, rate limit, or error
Per-provider budget caps with live usage tracking
Semantic response caching via Redis (similarity-based key matching)
Async job queue for non-blocking resume processing
Resume parsing with structured JSON output
Usage logging to PostgreSQL for cost analytics
REST API consumed by the NestJS backend
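The configurable routing order and per-provider budget caps could be modeled roughly like this. This is a minimal sketch: the class name, field names, and budget figures are placeholders, not the gateway's actual schema.

```python
from dataclasses import dataclass

# Illustrative config model -- names, fields, and dollar amounts are
# assumptions, not the gateway's real schema.
@dataclass
class ProviderConfig:
    name: str
    priority: int        # lower number = tried first
    budget_usd: float    # spend cap for this provider
    spent_usd: float = 0.0

    @property
    def under_budget(self) -> bool:
        return self.spent_usd < self.budget_usd

# Default chain: Gemini -> OpenAI -> Claude -> Ollama
DEFAULT_CHAIN = [
    ProviderConfig("gemini", priority=1, budget_usd=50.0),
    ProviderConfig("openai", priority=2, budget_usd=100.0),
    ProviderConfig("claude", priority=3, budget_usd=100.0),
    ProviderConfig("ollama", priority=4, budget_usd=float("inf")),  # local, free
]
```

Changing the routing order then only means editing priorities in one place, and the budget check stays uniform across providers.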

How It Works

1. Request Arrives

The NestJS backend sends a prompt to the AI Gateway REST endpoint with a task type (score, parse, optimize, test).
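The request contract for this step might look as follows. This is a stdlib sketch for illustration; the real service presumably validates with Pydantic, and the field names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class TaskType(str, Enum):
    SCORE = "score"
    PARSE = "parse"
    OPTIMIZE = "optimize"
    TEST = "test"

@dataclass
class GatewayRequest:
    prompt: str
    task: TaskType

    @classmethod
    def from_dict(cls, payload: dict) -> "GatewayRequest":
        # TaskType(...) raises ValueError for unknown task types,
        # rejecting malformed requests at the boundary.
        return cls(prompt=payload["prompt"], task=TaskType(payload["task"]))
```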

2. Cache Check

The gateway computes a semantic hash of the prompt and checks Redis. If a similar request was made recently, it returns the cached response instantly.
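The cache lookup can be sketched like this. Note the assumptions: a plain dict stands in for Redis, and a normalized-prompt hash stands in for the gateway's real similarity-based matching (which would typically compare embeddings, not exact hashes).

```python
import hashlib

def semantic_key(prompt: str) -> str:
    # Stand-in for embedding-based similarity: normalize case and
    # whitespace so near-identical prompts collide on the same key.
    normalized = " ".join(prompt.lower().split())
    return "aigw:cache:" + hashlib.sha256(normalized.encode()).hexdigest()

cache = {}  # stands in for Redis (e.g. GET/SETEX with a TTL)

def get_or_set(prompt, compute):
    """Return (response, cache_hit); call the provider only on a miss."""
    key = semantic_key(prompt)
    if key in cache:
        return cache[key], True
    value = compute(prompt)
    cache[key] = value
    return value, False
```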

3. Provider Selection

On a cache miss, the router checks which providers are available and under budget, then picks the highest-priority one.
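The selection logic reduces to a filter plus a priority sort. A minimal sketch, assuming each provider record carries availability, spend, budget, and priority fields (the dict shape is illustrative):

```python
def select_provider(providers):
    """Return the first available, under-budget provider by priority,
    or None if every provider is down or over budget."""
    candidates = [
        p for p in providers
        if p["available"] and p["spent"] < p["budget"]
    ]
    if not candidates:
        return None
    # Lower priority number = tried first.
    return min(candidates, key=lambda p: p["priority"])
```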

4. AI Call + Fallback

The gateway calls the selected provider. If the call fails (timeout, rate limit, or error), it automatically retries with the next provider in the chain.
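The fallback chain is essentially a loop over providers in priority order. A sketch, assuming `call` is some function that invokes one provider and raises on failure (the real gateway would catch provider-specific timeout and rate-limit exceptions rather than bare `Exception`):

```python
def call_with_fallback(chain, call):
    """Try each provider in order; return (provider, response) from the
    first that succeeds, or raise if the whole chain fails."""
    errors = {}
    for provider in chain:
        try:
            return provider, call(provider)
        except Exception as exc:  # in practice: timeout / rate-limit / API errors
            errors[provider] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```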

5. Log & Return

The gateway logs token count, cost, and latency to the usage_log table, then returns a structured JSON response to the NestJS backend.
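The usage logging could look roughly like this. SQLite stands in for PostgreSQL here, and the column names are guesses based on the fields the page mentions (tokens, cost, latency), not the actual usage_log schema.

```python
import sqlite3

# In-memory SQLite stands in for the real PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE usage_log (
        provider   TEXT,
        task       TEXT,
        tokens     INTEGER,
        cost_usd   REAL,
        latency_ms REAL
    )
""")

def log_usage(provider, task, tokens, cost_usd, latency_ms):
    # One row per AI call; aggregating by provider/task gives
    # the cost-analytics view.
    conn.execute(
        "INSERT INTO usage_log VALUES (?, ?, ?, ?, ?)",
        (provider, task, tokens, cost_usd, latency_ms),
    )
    conn.commit()
```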

Tech Stack

Runtime

  • Python 3.11
  • FastAPI
  • Uvicorn

AI Providers

  • Google Gemini
  • OpenAI GPT-4o
  • Anthropic Claude
  • Ollama (LLaMA)

Caching

  • Redis
  • Semantic similarity hashing

Queue

  • Background task queue
  • Async workers

Data

  • PostgreSQL (usage logs)
  • Pydantic schemas

Architecture

The gateway sits between the NestJS backend and all AI providers, acting as a smart router with caching, budget control, and async processing built in.

Consumers
  • NestJS Backend: caller service

Gateway Core
  • FastAPI Server: routes & validation
  • Provider Router: priority + budget logic
  • Redis Cache: semantic hashing
  • Async Queue: background workers

AI Providers
  • Gemini API: primary · cheapest
  • OpenAI GPT-4o: fallback #1
  • Claude API: fallback #2
  • Ollama (LLaMA): local · free

Storage
  • PostgreSQL: usage logs & cost

Data Flow

  1. Request in → FastAPI validates schema → Router checks Redis cache for semantic match.
  2. Cache miss → Router picks provider by priority and budget → calls AI provider API.
  3. Provider fails → automatic fallback to next in chain (Gemini → OpenAI → Claude → Ollama).
  4. Response → cached in Redis with semantic key → usage logged to PostgreSQL → returned to NestJS.
  5. Heavy tasks (large resume processing) queued as async background jobs to avoid request timeouts.
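Step 5's background queue can be sketched with asyncio. The page doesn't specify the queue implementation, so this uses a plain `asyncio.Queue` with one worker; the job ID and payload are illustrative stand-ins for resume-processing jobs.

```python
import asyncio

async def worker(queue, results):
    # Drains jobs off the queue so request handlers can enqueue
    # and return immediately instead of blocking until completion.
    while True:
        job_id, payload = await queue.get()
        results[job_id] = "processed:" + payload  # stand-in for resume processing
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    results = {}
    task = asyncio.create_task(worker(queue, results))
    await queue.put(("job-1", "resume.pdf"))  # handler enqueues and returns
    await queue.join()   # wait here only for the demo; handlers wouldn't block
    task.cancel()
    return results
```

In production this role is often filled by FastAPI's `BackgroundTasks` for light work or a dedicated queue (e.g. a Redis-backed one) for heavy jobs.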
