## Overview
Data Engineering focuses on designing, building, and optimizing systems for data collection, storage, and processing. I specialize in creating scalable ETL pipelines, managing data warehouses, and enabling robust analytics.
## Skills and Tools
- **ETL Pipelines**: Airflow, Luigi (see the DAG sketch after this list)
- **Data Warehousing**: BigQuery, Snowflake
- **Programming**: Python, SQL, Spark
- **Cloud Platforms**: AWS, GCP
- **Database Management**: PostgreSQL, MySQL
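
To make the Airflow item concrete, here is a minimal sketch of a daily extract-and-load DAG, assuming Airflow 2.4+; the `extract`/`load` callables and the `daily_etl` DAG id are illustrative placeholders, not a specific production pipeline.

```python
# Minimal daily ETL DAG sketch (assumes Airflow 2.4+).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull rows from a source system.
    return [{"id": 1, "value": 42}]


def load(**context):
    # Placeholder: push the extracted rows to the warehouse.
    rows = context["ti"].xcom_pull(task_ids="extract")
    print(f"Loading {len(rows)} rows")


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```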
## Featured Projects
- **ETL Optimization for Streaming Data** (see the sketch below)
  - Objective: Build an ETL pipeline for real-time data ingestion
  - Tools Used: Apache Kafka, Airflow, and BigQuery
  - Outcome: Reduced processing time by 40% and enabled near real-time analytics
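
A minimal sketch of the streaming leg of a pipeline like this, assuming confluent-kafka and google-cloud-bigquery; the `events` topic, consumer group, and table id are illustrative, not the production names.

```python
# Kafka -> BigQuery streaming ingest sketch.
import json

from confluent_kafka import Consumer
from google.cloud import bigquery

bq = bigquery.Client()
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # hypothetical broker
    "group.id": "etl-ingest",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])  # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        row = json.loads(msg.value())
        # Stream the event into BigQuery via the streaming insert API.
        errors = bq.insert_rows_json("project.dataset.events", [row])
        if errors:
            print("insert failed:", errors)
finally:
    consumer.close()
```

In practice you would buffer messages and insert them in batches rather than one row per event, since per-row streaming inserts are the main cost and latency lever here.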
- **ETL Optimization for Batch Data** (see the sketch below)
  - Objective: Build an ETL/ELT pipeline for asynchronous batch data from 300+ data collectors for Harvard University
  - Tools Used: Amazon Web Services (AWS), Google Cloud Platform (GCP), Python
  - Outcome:
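
A minimal sketch of one batch hop in a cross-cloud pipeline like this, assuming boto3 and google-cloud-bigquery; the bucket, object key, and table id are illustrative placeholders.

```python
# One batch hop: pull a collector's drop file from S3, load it into BigQuery.
import io

import boto3
from google.cloud import bigquery

s3 = boto3.client("s3")
bq = bigquery.Client()

# Fetch one collector's batch file from S3 (hypothetical bucket/key).
obj = s3.get_object(Bucket="collector-drops", Key="collector-001/2024-01-01.csv")
data = io.BytesIO(obj["Body"].read())

# Load the CSV bytes into a BigQuery table in a single load job.
job = bq.load_table_from_file(
    data,
    "project.dataset.collector_batches",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # infer the schema from the file
    ),
)
job.result()  # block until the load job finishes
```

Because the collectors deliver asynchronously, a job like this would typically be fanned out per arriving object (for example, triggered by an S3 event) rather than run on a fixed schedule.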
## Tutorials and Resources
- [Building Scalable ETL Pipelines](https://github.com/andy/etl-pipelines)
- [Optimizing SQL Queries for Data Warehousing](https://andy-blog.com/sql-tips)
## Achievements
- Google Cloud Professional Data Engineer (2024)
- Published "Scaling Data Pipelines in the Cloud" on Medium