# Assignment: LLM Observability with LangFuse & LangSmith

## Assignment Metadata
| Field | Description |
|---|---|
| Assignment Name | LLM Observability Implementation |
| Course | LLMOps and Evaluation |
| Project Name | |
| Estimated Time | 120 minutes |
| Framework | Python 3.10+, LangChain, LangFuse, LangSmith, OpenAI API |
## Learning Objectives

By completing this assignment, you will be able to:

- Configure LangFuse and LangSmith for LLM application tracing
- Implement callback handlers to capture execution flows
- Track token usage, latency, and costs per request
- Debug LLM chains using trace visualization and playgrounds
- Apply production best practices for sampling, PII handling, and alerting
## Problem Description

You are building a production-ready RAG chatbot application. Without observability, you face:

- **Black box execution**: no visibility into retrieval and generation steps
- **Cost overruns**: inability to track spending per user or feature
- **Performance issues**: difficulty identifying latency bottlenecks
- **Quality problems**: no systematic way to collect and analyze feedback

Your task is to instrument this application with comprehensive observability.
## Technical Requirements

### Environment Setup

- Python 3.10 or higher
- Required packages:
  - `langfuse>=2.0.0`
  - `langchain>=0.1.0`
  - `langchain-openai>=0.0.5`
  - `openai>=1.0.0`

### Accounts Required

- LangFuse Cloud account (free tier) OR a Docker setup for self-hosting
- LangSmith account (free tier available)
- OpenAI API key
## Tasks

### Task 1: LangFuse Integration (30 points)

**Set up the LangFuse environment:**

- Create a LangFuse Cloud account or deploy locally with Docker
- Configure API keys and environment variables
- Verify connectivity
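The configuration step above can be sketched like this (the key values are placeholders; copy the real ones from your LangFuse project settings):

```python
import os

# Placeholder credentials -- replace with the values from your
# LangFuse project settings (Settings -> API Keys).
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-your-public-key"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-your-secret-key"
# Point this at http://localhost:3000 if you self-host with Docker.
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
```

With the langfuse 2.x SDK you can then verify connectivity with `Langfuse().auth_check()`, which returns `True` when the keys and host are valid.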
**Implement tracing for a LangChain application:**

- Create a RAG chain with retrieval and generation steps
- Add a `CallbackHandler` to capture all traces
- Verify traces appear in the LangFuse dashboard
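One possible shape for the traced chain, assuming langfuse 2.x and a recent langchain; the prompt, model name, and inputs are illustrative:

```python
from langfuse.callback import CallbackHandler
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# The handler reads the LANGFUSE_* variables from the environment.
langfuse_handler = CallbackHandler()

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

# Passing the handler via config traces every step of the chain.
answer = chain.invoke(
    {
        "context": "LangFuse is an open-source LLM observability platform.",
        "question": "What is LangFuse?",
    },
    config={"callbacks": [langfuse_handler]},
)
```

In a real RAG chain the `context` input would come from your retriever; each retrieval and generation step then appears as a nested span in the LangFuse dashboard.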
**Implement cost tracking:**

- Capture token usage for each LLM call
- Calculate costs based on model pricing
- Display the cost breakdown per session
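LangFuse computes costs automatically for models it knows about, but the underlying arithmetic is worth implementing yourself for the per-session breakdown. A minimal sketch (the per-million-token prices are illustrative placeholders, not authoritative):

```python
# Illustrative prices in USD per one million tokens -- verify against
# current provider pricing before relying on these numbers.
PRICING = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one LLM call, from its token counts."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def session_cost(calls: list[dict]) -> float:
    """Total cost across all calls in a session."""
    return sum(
        call_cost(c["model"], c["input_tokens"], c["output_tokens"]) for c in calls
    )
```

Token counts for each call are available in the model response metadata (and in the traces themselves), so the session breakdown can be assembled from either source.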
**Document:**

- Screenshot of trace visualization in LangFuse
- Cost breakdown for at least 10 queries
### Task 2: LangSmith Integration (30 points)

**Configure LangSmith auto-tracing:**

- Set environment variables for automatic instrumentation
- Create a project for your application
- Verify traces are captured
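LangSmith's auto-instrumentation is driven entirely by environment variables; a sketch (the API key is a placeholder, and the project name is your choice):

```python
import os

# Any LangChain code run after this is traced automatically.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "lsv2_your-api-key"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "rag-chatbot-observability"
```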
**Build a RAG pipeline with detailed tracing:**

- Implement the document retrieval step
- Implement the LLM generation step
- Capture intermediate states
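For code outside LangChain's runnables, the `langsmith` SDK's `@traceable` decorator captures each step as a run. A sketch with a toy in-memory retriever and a stubbed generation step (both are stand-ins for your real components):

```python
from langsmith import traceable

DOCS = [
    "LangSmith is a platform for tracing and evaluating LLM applications.",
    "RAG combines document retrieval with LLM generation.",
]

@traceable(name="retrieve")
def retrieve(question: str) -> list[str]:
    # Toy keyword match; a real pipeline would query a vector store.
    words = question.lower().split()
    return [d for d in DOCS if any(w in d.lower() for w in words)]

@traceable(name="generate")
def generate(question: str, context: list[str]) -> str:
    # Stub for the LLM call; swap in your model client here.
    return f"Answer based on {len(context)} retrieved documents."

@traceable(name="rag_pipeline")
def rag_pipeline(question: str) -> str:
    context = retrieve(question)  # appears as a child run in the trace
    return generate(question, context)
```

Because the decorators nest, calling `rag_pipeline(...)` produces one trace with the retrieval and generation steps, including their intermediate inputs and outputs, as child runs.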
**Use the Playground for debugging:**

- Identify a failed or low-quality response
- Open the trace in the Playground
- Modify the prompt and re-run
- Document the improvement
**Create a test dataset:**

- Export 5 production traces to a dataset
- Run an evaluation on the dataset
- Compare results across prompt versions
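Exporting traces into a dataset can be done from the LangSmith UI or programmatically; a hedged sketch with the `langsmith` client (project and dataset names are examples, and exact client method signatures may vary across SDK versions):

```python
import itertools

from langsmith import Client

client = Client()

# Create a dataset and copy five recent root runs into it.
dataset = client.create_dataset(dataset_name="rag-regression-set")
runs = itertools.islice(
    client.list_runs(project_name="rag-chatbot-observability", is_root=True), 5
)
for run in runs:
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,
        dataset_id=dataset.id,
    )
```

Once the dataset exists, run each prompt version against it with LangSmith's evaluation tooling and compare the results side by side.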
### Task 3: Comparison Analysis (20 points)

Compare LangFuse vs. LangSmith based on your experience:
| Feature | LangFuse | LangSmith | Your Assessment |
|---|---|---|---|
| Setup complexity | | | |
| Trace visualization | | | |
| Cost tracking | | | |
| Debugging tools | | | |
| Self-hosting option | | | |
Write a recommendation (200-300 words):

- Which tool would you choose for different scenarios?
- What are the key trade-offs?
### Task 4: Production Best Practices (20 points)

**Implement sampling:**

- Configure 100% tracing for development
- Configure 5% sampling for production simulation
- Add a "High Importance" flag for error traces
**Implement PII handling:**

- Create a masking function for sensitive data
- Apply it to traces before sending them to observability tools
- Test with sample PII data
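A minimal masking sketch using regex patterns for the three PII types the hints mention (the patterns are deliberately simple; production systems usually combine regexes with dedicated PII-detection tooling):

```python
import re

# Ordered: card numbers first, so a 16-digit card is not partially
# consumed by the looser phone pattern.
PII_PATTERNS = [
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "[CARD]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\+?\d{1,3}[ -]?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}\b"), "[PHONE]"),
]

def mask_pii(text: str) -> str:
    """Replace emails, phone numbers, and card numbers with tokens
    before the text is attached to a trace."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Apply the function to inputs and outputs before they are handed to the tracing callbacks, so raw PII never leaves your application.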
**Design an alerting strategy:**

- Define thresholds for error rate, latency, and cost
- Document alert rules (pseudo-code or tool configuration)
- Create a runbook for each alert type
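The threshold definitions can be documented as executable pseudo-code; the numbers below are illustrative starting points, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class AlertThresholds:
    """Illustrative thresholds -- tune to your actual traffic."""
    max_error_rate: float = 0.05       # alert above 5% failed requests
    max_p95_latency_s: float = 10.0    # alert above 10 s p95 latency
    max_hourly_cost_usd: float = 5.0   # alert above $5/hour LLM spend

def check_alerts(metrics: dict, t: AlertThresholds = AlertThresholds()) -> list[str]:
    """Return the names of the alerts that fire for one metrics window."""
    fired = []
    if metrics["error_rate"] > t.max_error_rate:
        fired.append("error_rate")
    if metrics["p95_latency_s"] > t.max_p95_latency_s:
        fired.append("latency")
    if metrics["hourly_cost_usd"] > t.max_hourly_cost_usd:
        fired.append("cost")
    return fired
```

Each name returned by `check_alerts` maps to one runbook entry: who is notified, what to check first, and how to mitigate.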
## Submission Requirements

### Required Deliverables

- Source code (Jupyter notebook or Python scripts)
- `README.md` with setup and configuration instructions
- Screenshots of LangFuse traces and dashboard
- Screenshots of LangSmith traces and Playground usage
- Comparison analysis document
- Production best practices implementation
### Submission Checklist

- [ ] LangFuse traces are captured and visible
- [ ] LangSmith auto-tracing is working
- [ ] Cost tracking is implemented
- [ ] Playground debugging is demonstrated
- [ ] Comparison analysis is complete
- [ ] Production best practices are documented
## Evaluation Criteria

| Criteria | Points |
|---|---|
| LangFuse integration & tracing | 30 |
| LangSmith integration & debugging | 30 |
| Comparison analysis quality | 20 |
| Production best practices | 15 |
| Code quality and documentation | 5 |
| **Total** | **100** |
## Hints

- Start with LangSmith, since it requires minimal code changes (just environment variables)
- Use LangFuse's prompt management for version control of prompts
- When comparing tools, focus on real usage scenarios from your experience
- For PII masking, consider regex patterns for emails, phone numbers, and credit cards
- Set up alerts using webhook integrations or existing monitoring tools