Caching Strategies with Redis#
Introduction#
In the previous module, we covered REST API Design & Security, ensuring our application is structured and secure. However, security and structure mean nothing if your API is slow.
As your application scales, fetching data directly from the primary database (PostgreSQL/MySQL) for every request becomes a bottleneck. Redis (Remote Dictionary Server) solves this by storing frequently accessed data in memory, delivering sub-millisecond response times. This guide moves beyond basic key-value storage to professional caching strategies and optimization for modern AI applications.
Redis Fundamentals#
Redis is an in-memory data structure store, used as a database, cache, and message broker. Unlike traditional databases that write to disk (slow), Redis keeps everything in RAM (fast).
Core Data Structures#
You must choose the right tool for the job. Do not just use strings for everything.
Strings: Basic text or binary data. Ideal for caching HTML fragments, API responses (JSON), or session tokens.
Lists: Linked lists of strings. Ideal for Message Queues or Chat History (keeping the last N messages).
Hashes: Maps between string fields and string values. Perfect for storing User Profiles or Object attributes.
Sets: Unordered collection of unique strings. Good for Tags or Social Graph (followers).
Sorted Sets (ZSet): Unique strings ordered by a score. The natural fit for Leaderboards or Rate Limiting.
Basic Operations#
```python
import redis

# Connection
r = redis.Redis(host='localhost', port=6379, db=0)

# String: set a key with a 60s expiration (TTL)
r.set('user:123', '{"name": "Alice"}', ex=60)

# List: append a message to chat history
r.rpush('chat:session:abc', 'Hello!')
```
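The remaining structures from the list above map to equally direct commands. A minimal sketch, assuming a connected redis-py client like the `r` created above (key names are illustrative):

```python
def cache_profile(r, user_id, profile):
    # Hash: one field per attribute, so individual fields can be read or updated
    r.hset(f"user:{user_id}:profile", mapping=profile)

def tag_article(r, article_id, *tags):
    # Set: duplicate tags are ignored automatically
    r.sadd(f"article:{article_id}:tags", *tags)

def top_players(r, n=10):
    # Sorted Set: members come back ordered by score, highest first
    return r.zrevrange("leaderboard", 0, n - 1, withscores=True)
```

Populating the leaderboard is a single `r.zadd("leaderboard", {"alice": 1500, "bob": 1200})`; Redis keeps the ordering for you on every insert.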
Caching Patterns#
Cache-Aside (Lazy Loading)#
This is the most common pattern. The application code is responsible for loading data into the cache.
The Flow:
1. Application receives a request (e.g., GET /user/123).
2. Cache Hit (data exists in Redis):
   - Redis returns the data immediately.
   - Application returns the data to the user.
3. Cache Miss (data NOT in Redis):
   - Application queries the Database for the data.
   - Application writes the data to Redis with a TTL (Time To Live).
   - Application returns the data to the user.
Pros: Resilient to cache failure (the app falls back to the DB). Data is only cached when requested. Cons: The initial request (“Cold Start”) is slower, and cached data can be stale until the TTL expires.
```python
import json

# Assumes a module-level redis_client and a db session/ORM are already configured.
def get_user_profile(user_id):
    cache_key = f"user:{user_id}"

    # 1. Try the cache first
    cached_data = redis_client.get(cache_key)
    if cached_data:
        return json.loads(cached_data)

    # 2. Cache miss -> fall back to the database
    user = db.query(User).get(user_id)

    # 3. Write to cache (critical: always set a TTL)
    if user:
        redis_client.set(cache_key, json.dumps(user.dict()), ex=300)  # 5 min TTL
    return user
```
Write-Through vs Write-Behind#
Write-Through: Write to Cache AND DB together. Ensures consistency but adds write latency.
Write-Behind: Write to Cache immediately, update DB asynchronously. Extremely fast writes but risk of data loss if cache crashes before DB update.
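The write-through path can be sketched in a few lines. This is a minimal sketch, not a library API: `db.save` and the `redis_client`/`db` objects are hypothetical stand-ins for your own persistence layer.

```python
import json

def update_user_write_through(redis_client, db, user_id, data):
    """Write-Through: the write is acknowledged only after BOTH stores succeed."""
    db.save(user_id, data)  # 1. durable write to the primary database first
    # 2. refresh the cache so subsequent reads are consistent
    redis_client.set(f"user:{user_id}", json.dumps(data), ex=300)
    return data
```

Write-behind would instead return after the Redis `set` and push the DB write onto a background queue, which is why it risks losing acknowledged writes if Redis crashes first.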
TTL (Time to Live) & Cost Optimization#
Why use TTL? (Cost Management)#
Redis stores data in RAM, which is significantly more expensive than Disk storage (SSD/HDD). “Time to Live” is not just about data freshness; it is a critical strategy for Cost Optimization.
If you cache everything without expiration:
Memory Leak: Redis will eventually run out of RAM.
High Expense: Scaling Redis to 100GB+ of RAM is very costly compared to a 100GB Database.
Keyspace Bloat: Old, unused keys inflate the dataset, slowing operations such as persistence snapshots and eviction scans that active users depend on.
Cloud Costs (AWS ElastiCache): Managed services charge by the hour/node. If you don’t expire data, you are forced to upgrade to larger instances (vertical scaling) just to store stale data, significantly increasing your hourly bill.
Strategy: “Lease” the Cache#
Think of caching as “leasing” memory space. You do not own it forever; you rent it for a specific purpose.
Short Lease (Seconds/Minutes): For volatile data that changes fast or is only needed instantly (e.g., Real-time analytics, user session active state).
Medium Lease (Hours): For user-specific content that might be viewed multiple times in a session but isn’t permanent (e.g., Conversation History).
Eviction Policy: When Redis is full, it must delete data. Explicit TTLs help Redis delete the right data (expired) instead of randomly evicting useful data.
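Eviction behavior is configurable via `maxmemory-policy`. A common setup (values here are illustrative) pairs an explicit memory cap with a TTL-aware policy in redis.conf:

```conf
# redis.conf -- illustrative values
maxmemory 2gb
# Evict least-recently-used keys, but only among keys that have a TTL set.
# Keys without a TTL are never evicted under this policy.
maxmemory-policy volatile-lru
```

For a pure cache where every key is safe to drop, `allkeys-lru` is the usual choice; `volatile-lru` fits mixed workloads where some keys must persist.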
Cache Invalidation#
The hardest problem in Computer Science.
Time-based: Rely on TTL (Passive). This is the safest way to ensure you don’t pay for “dead” data.
Event-based: Explicitly delete the cache key when data is updated (Active). Example: When PATCH /user/123 is called, perform redis.delete("user:123").
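Event-based invalidation is usually a one-liner in the update path. A minimal sketch, where `db.update` and the `redis_client`/`db` objects are hypothetical stand-ins for your persistence layer:

```python
def update_user(redis_client, db, user_id, changes):
    """Event-based invalidation: write to the DB, then drop the stale key.
    The next read repopulates the cache via the cache-aside flow."""
    db.update(user_id, changes)
    # Delete rather than patch the cached copy: simpler, and avoids
    # races where a half-updated object gets cached.
    redis_client.delete(f"user:{user_id}")
```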
Caching for RAG Applications#
In the era of GenAI, Redis is critical for reducing LLM costs and latency.
1. Caching Chat History (Memory)#
LLMs are stateless. To have a conversation, you must send the entire chat history with every prompt. Fetching this from Postgres every time is inefficient.
Strategy: Store the “CONTEXT” window in Redis Lists.
Key: chat:{conversation_id}
Value: List of JSON objects [{"role": "user", "content": "..."}, ...]
Optimization: Use LTRIM to keep only the last 20 messages.
```python
import json

# Assumes a module-level redis_client is already configured.
def add_message(conv_id, role, content):
    key = f"chat:{conv_id}"
    msg = json.dumps({"role": role, "content": content})

    # Pipeline batches all three commands into a single round trip
    pipe = redis_client.pipeline()
    pipe.rpush(key, msg)
    pipe.ltrim(key, -20, -1)  # keep only the last 20 items
    pipe.expire(key, 86400)   # 24h TTL
    pipe.execute()
```
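Reading the window back before each prompt is the mirror image of the write path. A short companion sketch, again assuming a configured client passed in:

```python
import json

def get_context(redis_client, conv_id):
    """Return the cached context window as a list of {"role", "content"} dicts."""
    raw = redis_client.lrange(f"chat:{conv_id}", 0, -1)  # the whole (already trimmed) list
    return [json.loads(m) for m in raw]
```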
2. Semantic Caching (Embeddings)#
Users often ask similar questions (e.g., “Reset password?” vs “How to change password?”). Standard cache misses these because the strings are different.
Strategy:
Vectorize the user query (Embedding).
Search Redis Vector Store for similar past queries (Cosine Similarity > 0.95).
If found, return the cached answer. This saves an expensive LLM call.
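In production this search runs inside Redis via its vector search capability; the toy sketch below keeps the matching logic in plain Python to show the idea, with embeddings assumed to come from whatever model you already use.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    """In-process sketch: store (embedding, answer) pairs and answer a new
    query from the cache only if a past query is similar enough."""
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)

    def get(self, query_embedding):
        best, best_sim = None, 0.0
        for emb, answer in self.entries:
            sim = cosine(query_embedding, emb)
            if sim > best_sim:
                best, best_sim = answer, sim
        # Only a hit above the threshold saves the LLM call
        return best if best_sim >= self.threshold else None

    def put(self, query_embedding, answer):
        self.entries.append((query_embedding, answer))
```

The 0.95 threshold mirrors the figure above; too low and users get answers to the wrong question, too high and near-duplicate phrasings miss the cache.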
Summary#
Redis is not just a “fast database”. It is a strategic layer that protects your primary database and enhances user experience. Whether it’s caching user sessions, API responses, or LLM context, mastering Redis is essential for building high-performance, scalable systems.