A Python-based ETL pipeline that reads raw artist data from a CSV file, validates and deduplicates rows by artist ID, transforms column types, and outputs clean data to both a PostgreSQL database and ...
Implemented pandas-based cleaning rules in data_preprocessing.py, applied the transformations that turn salesorder.csv into clean_salesorder.csv, and tested the pipeline across multiple DAG runs.
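
The validation, deduplication by artist ID, and column type conversion described above could look roughly like the sketch below. It is an illustration only, not the contents of data_preprocessing.py: the column names (artist_id, name, begin_year), the sample rows, and the clean_artists helper are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical sample input; the real CSV schema is not given in the summary.
RAW_CSV = """artist_id,name,begin_year
1,Alice,1990
2,Bob,
1,Alice Duplicate,1991
x,Bad Id,2000
3,,1985
"""


def clean_artists(raw: pd.DataFrame) -> pd.DataFrame:
    """Validate, deduplicate by artist_id, and convert column types (assumed rules)."""
    df = raw.copy()

    # Coerce types first; values that cannot be parsed become missing.
    df["artist_id"] = pd.to_numeric(df["artist_id"], errors="coerce")
    df["begin_year"] = pd.to_numeric(df["begin_year"], errors="coerce").astype("Int64")
    df["name"] = df["name"].str.strip()

    # Validate: drop rows without a usable ID or with an empty name.
    df = df.dropna(subset=["artist_id"])
    df = df[df["name"].fillna("") != ""]

    # Deduplicate: keep the first row seen for each artist ID.
    df = df.drop_duplicates(subset="artist_id", keep="first")

    # IDs are complete at this point, so a plain integer dtype is safe.
    df = df.astype({"artist_id": "int64"})
    return df.reset_index(drop=True)


# Read every column as text so the cleaning rules control all conversions.
cleaned = clean_artists(pd.read_csv(io.StringIO(RAW_CSV), dtype=str))
print(cleaned)
```

Reading every column as text and coercing afterwards keeps malformed values visible as missing entries instead of failing the load. The nullable Int64 dtype lets begin_year stay an integer column even when some rows have no year.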