Caching Strategies with Redis#

Introduction#

In the previous module, we covered REST API Design & Security, ensuring our application is structured and secure. However, security and structure mean little if your API is slow.

As your application scales, fetching data directly from the primary database (PostgreSQL/MySQL) for every request becomes a bottleneck. Redis (Remote Dictionary Server) solves this by storing frequently accessed data in memory, delivering sub-millisecond response times. This guide moves beyond basic key-value storage to professional caching strategies and optimization for modern AI applications.

Redis Fundamentals#

Redis is an in-memory data structure store, used as a database, cache, and message broker. Unlike traditional databases, which serve reads from disk (slow), Redis keeps its entire dataset in RAM (fast).

Core Data Structures#

You must choose the right tool for the job. Do not just use strings for everything.

  1. Strings: Basic text or binary data. Ideal for caching HTML fragments, API responses (JSON), or session tokens.

  2. Lists: Linked lists of strings. Ideal for Message Queues or Chat History (keeping the last N messages).

  3. Hashes: Maps between string fields and string values. Perfect for storing User Profiles or Object attributes.

  4. Sets: Unordered collection of unique strings. Good for Tags or Social Graph (followers).

  5. Sorted Sets (ZSet): Unique strings ordered by a score. Mandatory for Leaderboards or Rate Limiting.

Basic Operations#

import redis

# Connection (decode_responses returns str instead of raw bytes)
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

# String: Set Key with 60s expiration (TTL)
r.set('user:123', '{"name": "Alice"}', ex=60)

# List: Add message to chat history
r.rpush('chat:session:abc', 'Hello!')
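To see why Sorted Sets are the right tool for rate limiting, here is a sliding-window limiter sketched in pure Python. The names (`is_allowed`, `WINDOW_SECONDS`, `MAX_REQUESTS`) are illustrative, and a plain list stands in for the ZSet `ratelimit:{user_id}` so the logic runs without a server; the comments map each step to its real Redis command.

```python
import time

WINDOW_SECONDS = 60  # size of the sliding window
MAX_REQUESTS = 5     # allowed requests per window

def is_allowed(timestamps, now=None):
    """Return True if a new request fits in the window.

    timestamps: list of prior request times for one user
    (a stand-in for the ZSet, where each member is scored
    by its timestamp).
    """
    now = time.time() if now is None else now
    cutoff = now - WINDOW_SECONDS
    # Redis equivalent: ZREMRANGEBYSCORE key -inf cutoff
    timestamps[:] = [t for t in timestamps if t > cutoff]
    # Redis equivalent: ZCARD key
    if len(timestamps) >= MAX_REQUESTS:
        return False
    # Redis equivalent: ZADD key now now
    timestamps.append(now)
    return True

history = []
print([is_allowed(history, now=100.0 + i) for i in range(6)])
```

Because the score is the timestamp, expiring old requests is a single range deletion, which is exactly what makes ZSets a natural fit here.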

Caching Patterns#

Cache-Aside (Lazy Loading)#

This is the most common pattern. The application code is responsible for loading data into the cache.

The Flow:

  1. Application receives a request (e.g., GET /user/123).

  2. Cache Hit (Data exists in Redis):

    • Redis returns the data immediately.

    • Application returns data to the user.

  3. Cache Miss (Data NOT in Redis):

    • Application queries the Database for the data.

    • Application Writes the data to Redis with a TTL (Time To Live).

    • Application returns data to the user.

Pros: Resilient to cache failure (the app falls back to the DB); data is only cached when actually requested.

Cons: The initial request ("Cold Start") is slower.

def get_user_profile(user_id):
    cache_key = f"user:{user_id}"

    # 1. Try Cache
    cached_data = redis_client.get(cache_key)
    if cached_data:
        return json.loads(cached_data)

    # 2. Cache Miss -> DB
    user = db.query(User).get(user_id)
    if user is None:
        return None

    # 3. Write to Cache (Critical: Always set TTL)
    profile = user.dict()
    redis_client.set(cache_key, json.dumps(profile), ex=300) # 5 min TTL
    return profile

Write-Through vs Write-Behind#

  • Write-Through: Write to Cache AND DB simultaneously. Ensures consistency but slower write latency.

  • Write-Behind: Write to Cache immediately, update DB asynchronously. Extremely fast writes but risk of data loss if cache crashes before DB update.
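The trade-off between the two write paths can be sketched in pure Python. Plain dicts and a deque stand in for the cache, the primary database, and the async worker queue (no live Redis or SQL assumed); the function names are illustrative.

```python
from collections import deque

cache, database = {}, {}
write_queue = deque()  # pending DB writes for write-behind

def write_through(key, value):
    # Write to cache AND database in the same request:
    # consistent, but the caller waits on both writes.
    cache[key] = value
    database[key] = value

def write_behind(key, value):
    # Acknowledge after the cache write only; the DB write
    # is queued and applied later by a background worker.
    cache[key] = value
    write_queue.append((key, value))

def flush_write_queue():
    # The async worker: drains queued writes into the database.
    while write_queue:
        key, value = write_queue.popleft()
        database[key] = value

write_behind("user:123", {"name": "Alice"})
# Until the worker runs, the DB lags the cache -- this gap is
# the data-loss window if the cache node dies here.
flush_write_queue()
```

The window between `write_behind` and `flush_write_queue` is exactly where write-behind risks losing data.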

TTL (Time to Live) & Cost Optimization#

Why use TTL? (Cost Management)#

Redis stores data in RAM, which is significantly more expensive than Disk storage (SSD/HDD). “Time to Live” is not just about data freshness; it is a critical strategy for Cost Optimization.

If you cache everything without expiration:

  1. Memory Leak: Redis will eventually run out of RAM.

  2. High Expense: Scaling Redis to 100GB+ of RAM is very costly compared to a 100GB Database.

  3. Performance Degradation: Old, unused keys increase memory pressure and eviction work, potentially slowing down operations for active users.

  4. Cloud Costs (AWS ElastiCache): Managed services charge by the hour/node. If you don’t expire data, you are forced to upgrade to larger instances (vertical scaling) just to store stale data, significantly increasing your hourly bill.

Strategy: “Lease” the Cache#

Think of caching as “leasing” memory space. You do not own it forever; you rent it for a specific purpose.

  • Short Lease (Seconds/Minutes): For volatile data that changes fast or is only needed instantly (e.g., Real-time analytics, user session active state).

  • Medium Lease (Hours): For user-specific content that might be viewed multiple times in a session but isn’t permanent (e.g., Conversation History).

  • Eviction Policy: When Redis is full, it must delete data. Explicit TTLs help Redis delete the right data (expired) instead of randomly evicting useful data.
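The eviction behavior is configurable. A typical redis.conf fragment (the 2gb limit is illustrative):

```conf
# Cap total memory so Redis evicts instead of exhausting RAM
maxmemory 2gb
# volatile-lru: evict only keys that carry a TTL, least-recently-used first.
# allkeys-lru is the common alternative when every key is safe to evict.
maxmemory-policy volatile-lru
```

With `volatile-lru`, keys you deliberately set without a TTL are never evicted, which is another reason to put explicit TTLs on everything that is merely cached.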

Cache Invalidation#

The hardest problem in Computer Science.

  • Time-based: Rely on TTL (Passive). This is the safest way to ensure you don’t pay for “dead” data.

  • Event-based: Explicitly delete the cache key when data is updated (Active).

    • Example: When PATCH /user/123 is called, perform redis.delete("user:123").
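The event-based flow can be sketched in a few lines. Dicts stand in for the database and the Redis cache so it runs without live services; `patch_user` is an illustrative name.

```python
database = {"user:123": {"name": "Alice"}}
cache = {"user:123": '{"name": "Alice"}'}

def patch_user(user_id, fields):
    key = f"user:{user_id}"
    # 1. Update the source of truth first.
    database[key].update(fields)
    # 2. Invalidate: redis.delete("user:123") in real code.
    cache.pop(key, None)
    # The next read is a cache miss, so the cache-aside path
    # repopulates the key with fresh data.

patch_user(123, {"name": "Alicia"})
```

Deleting rather than updating the cached value keeps the write path simple: only the cache-aside read path ever serializes data into Redis.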

Caching for RAG Applications#

In the era of GenAI, Redis is critical for reducing LLM costs and latency.

1. Caching Chat History (Memory)#

LLMs are stateless. To have a conversation, you must send the entire chat history with every prompt. Fetching this from Postgres every time is inefficient.

Strategy: Store the “CONTEXT” window in Redis Lists.

  • Key: chat:{conversation_id}

  • Value: List of JSON objects [{"role": "user", "content": "..."}, ...]

  • Optimization: Use LTRIM to keep only the last 20 messages.

def add_message(conv_id, role, content):
    key = f"chat:{conv_id}"
    msg = json.dumps({"role": role, "content": content})

    pipe = redis_client.pipeline()
    pipe.rpush(key, msg)
    pipe.ltrim(key, -20, -1) # Keep only last 20 items
    pipe.expire(key, 86400)  # 24h TTL
    pipe.execute()

2. Semantic Caching (Embeddings)#

Users often ask similar questions (e.g., “Reset password?” vs “How to change password?”). Standard cache misses these because the strings are different.

Strategy:

  1. Vectorize the user query (Embedding).

  2. Search Redis Vector Store for similar past queries (Cosine Similarity > 0.95).

  3. If found, return the cached answer. This saves an expensive LLM call.
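The three steps above can be sketched as follows. `embed()` is a hypothetical stand-in for a real embedding model (normally an API call), returning fixed toy vectors so the flow is runnable, and a plain list stands in for a Redis vector index; `answer` and the 0.95 threshold mirror the strategy described above.

```python
import math

_TOY_VECTORS = {  # hypothetical embeddings for demonstration
    "reset password?": [0.9, 0.1, 0.0],
    "how to change password?": [0.88, 0.15, 0.02],
    "what is the weather?": [0.0, 0.1, 0.95],
}

def embed(text):
    return _TOY_VECTORS.get(text.lower(), [0.0, 0.0, 0.0])

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

semantic_cache = []  # (embedding, cached_answer) pairs

def answer(query, llm_call, threshold=0.95):
    vec = embed(query)                       # 1. Vectorize the query
    for cached_vec, cached_answer in semantic_cache:
        if cosine(vec, cached_vec) > threshold:  # 2. Similarity search
            return cached_answer             # 3. Hit: skip the LLM call
    result = llm_call(query)                 # Miss: pay for the LLM
    semantic_cache.append((vec, result))
    return result
```

With these toy vectors, "Reset password?" and "How to change password?" score above 0.95, so the second query is served from the cache without a second LLM call.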

Summary#

Redis is not just a “fast database”. It is a strategic layer that protects your primary database and enhances user experience. Whether it’s caching user sessions, API responses, or LLM context, mastering Redis is essential for building high-performance, scalable systems.
