Watcher
Getting Started
Installation
Prerequisites
System Requirements
Installation Steps
Verification
Quick Start Guide
Step 1: Create Your First Pipeline
Step 2: Start a Pipeline Execution
Step 3: Create Address Lineage
Step 4: Set Up Monitoring
Step 5: Configure Anomaly Detection
Step 6: Web Pages
Next Steps
Configuration
Environment Variables
Development Environment
Production Environment
Database Configuration
Connection Pool Default Settings
Redis Configuration
Connection Default Settings
Celery Default Configuration
Monitoring Configuration
Logfire Integration
Slack Notifications
Feature Flags
Auto-Create Anomaly Detection Rules
Profiling
API Reference
API Endpoints
Pipeline Management
Create or Get Pipeline
List Pipelines
Get Pipeline by ID
Update Pipeline
Pipeline Execution
Start Pipeline Execution
End Pipeline Execution
Get Pipeline Execution
Pipeline Types
Create or Get Pipeline Type
List Pipeline Types
Get Pipeline Type by ID
Address Management
Create or Get Address
List Addresses
Get Address by ID
Update Address
Address Types
Create or Get Address Type
List Address Types
Get Address Type by ID
Address Lineage
Create Address Lineage
Anomaly Detection
Create or Get Anomaly Detection Rule
List Anomaly Detection Rules
Get Anomaly Detection Rule by ID
Update Anomaly Detection Rule
Unflag Anomalies
Monitoring & Health
Check Timeliness
Check Freshness
Log Cleanup
Celery Queue Monitoring
Pydantic Data Models
Pipeline Models
PipelinePostInput
PipelinePostOutput
PipelinePatchInput
Pipeline Type Models
PipelineTypePostInput
PipelineTypePostOutput
PipelineTypePatchInput
Pipeline Execution Models
PipelineExecutionStartInput
PipelineExecutionStartOutput
PipelineExecutionEndInput
Address Models
AddressPostInput
AddressPostOutput
AddressPatchInput
Address Type Models
AddressTypePostInput
AddressTypePostOutput
AddressTypePatchInput
Address Lineage Models
AddressLineagePostInput
AddressLineagePostOutput
AddressLineageGetOutput
AddressLineageClosureGetOutput
Anomaly Detection Models
AnomalyDetectionRulePostInput
AnomalyDetectionRulePostOutput
AnomalyDetectionRulePatchInput
UnflagAnomalyInput
Monitoring Models
FreshnessPostOutput
TimelinessPostInput
TimelinessPostOutput
Log Cleanup Models
LogCleanupPostInput
LogCleanupPostOutput
Enums
AnomalyMetricFieldEnum
DatePartEnum
ValidatorModel
User Interface
Web Pages
Diagnostics Web Page
Lineage Graph Web Page
Reporting Dashboard Web Page
Interactive API Documentation
User Guides
Address Lineage
Understanding Address Lineage
Creating Lineage Relationships
Querying Lineage
Closure Table Pattern
Pipeline Integration
Managing Load Lineage Flag
Naming Conventions
Address Naming Convention
Address Type Organization
Anomaly Detection
How It Works
Statistical Analysis
Supported Metrics
Setting Up Anomaly Detection
Creating Detection Rules
Rule Configuration
Multiple Rules
Automatic Execution
Triggered Execution
No Manual Triggering Required
Anomaly Results
Understanding Results
Result Fields
Alert Notifications
Slack Alerts
Alert Configuration
Managing Anomalies
Viewing Anomalies
Updating Rules
Unflagging Anomalies
Adjusting Thresholds
Understanding the Data
Tuning Process
Best Practices
Rule Configuration
Threshold Selection
Monitoring Strategy
Common Scenarios
Data Volume Anomalies
Performance Anomalies
Throughput Anomalies
DML Operation Anomalies
Advanced Configuration
Auto-Creation Rules
Custom Database Querying
Index Utilization Patterns
Date-Based Queries
Time-Based Queries
Common Query Patterns
Pipeline Execution Analysis
Anomaly Detection Analysis
Lineage Analysis
Hierarchical Execution Analysis
Best Practices
Performance Considerations
Query Optimization
Safety Guidelines
Common Use Cases
Reporting and Analytics
Daily Pipeline Report Materialized View
Operational Monitoring
Data Quality Analysis
Advanced Patterns
Comparative Analysis
ETL Script Example
Overview
ETL Example Script
Key Features Demonstrated
Log Cleanup & Maintenance
How It Works
Configuration
API Usage
Cleanup Process
Scheduled Cleanup
Monitoring & Health Checks
Freshness Monitoring
Purpose
Configuration
Supported Time Units
Running Freshness Checks
Timeliness Monitoring
Purpose
Configuration
Running Timeliness Checks
Celery Queue Monitoring
Alert Thresholds
Scheduled Monitoring Tasks
System Health Monitoring
Diagnostics Dashboard
Pipeline Reporting Dashboard
Alerting Configuration
Slack Integration
Alert Types
Monitoring Strategy
Automated Monitoring
Load Testing
Trigger Load Tests
Load Test Scenarios
Performance Targets
Pipeline Management
Creating Pipelines
Basic Pipeline Creation
Pipeline Configuration
Managing Active Status
Pipeline Execution
Starting and Ending Executions
Execution Patterns
Pipeline Updates
Common Update Scenarios
Nested Pipeline Executions
Querying Nested Executions
Pipeline Organization
Pipeline Type Organization
Pipeline Naming Convention
Recommended Implementation - SDK
Installation
Key Processes
Define Pipeline Configuration
Initialize Watcher Client
Sync Pipeline Configuration
Track Pipeline Execution
Complete Example
Benefits of Using the SDK
Watermark Management
Understanding Watermarks
Watermark Patterns
Watermark Increment Logic
How to Use Watermarks
Practical Example
Reference
Architecture & Design
Design Philosophy
Configuration as Code
Efficiency & Performance
Scalability
Reliability
Observability
High-Level Architecture
FastAPI Framework
PostgreSQL Database
Celery Background Processing
Redis Message Broker
Docker Containerization
Logfire Integration
Performance Design Goals
Celery Tasks
Task Types
Regular Tasks
detect_anomalies_task
freshness_check_task
timeliness_check_task
address_lineage_closure_rebuild_task
pipeline_execution_closure_maintain_task
Scheduled Tasks
scheduled_freshness_check
scheduled_timeliness_check
scheduled_celery_queue_health_check
Task Configuration
Queue Management
Rate Limiting
Celery Beat Configuration
Retry Policies
Error Handling
Task Monitoring
Task Status Tracking
Progress Updates
Error Details
Database Schema
Core Tables
Pipeline
Pipeline Type
Pipeline Execution
Pipeline Execution Closure
Address
Address Type
Address Lineage
Address Lineage Closure
Monitoring Tables
Timeliness Pipeline Execution Log
Freshness Pipeline Log
Anomaly Detection
Anomaly Detection Rule
Anomaly Detection Result
Data Relationships
Hierarchical Relationships
Many-to-Many Relationships
Development - Kubernetes
Prerequisites
Installation Steps
Install kubectl
Install Helm
Verify Installation
Deploy to Kubernetes
Quick Start with Make Commands (Recommended)
Access Your Application
Stop the Development Environment
Manual Development Setup - Alternative
Start Dependencies
Build Docker Image
Deploy with Helm
Access the Application
Cleanup
Remove the Deployment
Index