## Overview
Data Engineering focuses on designing, building, and optimizing systems for data collection, storage, and processing. I specialize in creating scalable ETL pipelines, managing data warehouses, and enabling robust analytics.
## Skills and Tools
- **ETL Pipelines**: Airflow, Luigi
- **Data Warehousing**: BigQuery, Snowflake
- **Programming**: Python, SQL, Spark
- **Cloud Platforms**: AWS, GCP
- **Database Management**: PostgreSQL, MySQL
## Featured Projects
- **ETL Optimization for Streaming Data**
  - **Objective**: Built an ETL pipeline for real-time data ingestion.
  - **Tools Used**: Apache Kafka, Airflow, and BigQuery.
  - **Outcome**: Reduced processing time by 40% and enabled near real-time analytics.
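The streaming project above follows the classic extract–transform–load shape. The sketch below is a minimal, hypothetical illustration of that flow in plain Python; the function names, event schema, and in-memory "source" and "sink" are stand-ins I've invented for Kafka topics and BigQuery tables, not the production code:

```python
from datetime import datetime, timezone

def extract(events):
    """Yield raw events from an in-memory source (stand-in for a Kafka topic)."""
    yield from events

def transform(raw):
    """Parse and enrich each event; skip malformed records."""
    for event in raw:
        if "user_id" not in event or "amount" not in event:
            continue  # dead-letter handling omitted in this sketch
        yield {
            "user_id": event["user_id"],
            "amount_usd": round(float(event["amount"]), 2),
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        }

def load(rows, sink):
    """Append transformed rows to a sink (stand-in for a BigQuery table)."""
    sink.extend(rows)
    return len(sink)

events = [
    {"user_id": 1, "amount": "19.99"},
    {"bad": "record"},          # malformed; dropped by transform
    {"user_id": 2, "amount": "5"},
]
table = []
loaded = load(transform(extract(events)), table)
print(loaded)  # 2 — the malformed event is filtered out
```

Because `extract` and `transform` are generators, records stream through one at a time rather than being materialized as a full batch, which is the same property that keeps latency low in the real pipeline.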
## Tutorials and Resources
- [Building Scalable ETL Pipelines](https://github.com/andy/etl-pipelines)
- [Optimizing SQL Queries for Data Warehousing](https://andy-blog.com/sql-tips)
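One staple of SQL optimization for warehousing workloads is indexing filter columns. The snippet below is a small, self-contained demonstration using SQLite (chosen only because it ships with Python; the table and index names are made up for illustration): the query plan switches from a full scan to an index search once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Without an index, filtering on customer_id scans every row.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()

# With an index, the engine seeks directly to the matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()

print(plan_before[-1][-1])  # a SCAN of the orders table
print(plan_after[-1][-1])   # a SEARCH using idx_orders_customer
```

Columnar warehouses like BigQuery and Snowflake optimize differently (partitioning and clustering rather than B-tree indexes), but the habit of checking the query plan before and after a change carries over directly.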
## Achievements
- Google Cloud Professional Data Engineer (2024)
- Published "Scaling Data Pipelines in the Cloud" on Medium