## Overview
Data Engineering focuses on designing, building, and optimizing systems for data collection, storage, and processing. I specialize in creating scalable ETL pipelines, managing data warehouses, and enabling robust analytics.
## Skills and Tools
- **ETL Pipelines**: Airflow, Luigi (see the DAG sketch after this list)
- **Data Warehousing**: BigQuery, Snowflake
- **Programming**: Python, SQL, Spark
- **Cloud Platforms**: AWS, GCP
- **Database Management**: PostgreSQL, MySQL
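
To make the Airflow item concrete, here is a minimal sketch of a daily extract-and-load DAG, assuming Airflow 2.4+; the `extract`/`load` callables and the `daily_etl` DAG id are illustrative placeholders, not a specific production pipeline.

```python
# Minimal daily ETL DAG sketch (assumes Airflow 2.4+).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull rows from a source system.
    return [{"id": 1, "value": 42}]


def load(**context):
    # Placeholder: push the extracted rows to the warehouse.
    rows = context["ti"].xcom_pull(task_ids="extract")
    print(f"Loading {len(rows)} rows")


with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # load runs only after extract succeeds
```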
## Featured Projects
- **ETL Optimization for Streaming Data** (see the sketch below)
  - Objective: Build an ETL pipeline for real-time data ingestion
  - Tools Used: Apache Kafka, Airflow, and BigQuery
  - Outcome: Reduced processing time by 40% and enabled near real-time analytics
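
A minimal sketch of the streaming leg of a pipeline like this, assuming confluent-kafka and google-cloud-bigquery; the `events` topic, consumer group, and table id are illustrative, not the production names.

```python
# Kafka -> BigQuery streaming ingest sketch.
import json

from confluent_kafka import Consumer
from google.cloud import bigquery

bq = bigquery.Client()
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # hypothetical broker
    "group.id": "etl-ingest",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])  # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        row = json.loads(msg.value())
        # Stream the event into BigQuery via the streaming insert API.
        errors = bq.insert_rows_json("project.dataset.events", [row])
        if errors:
            print("insert failed:", errors)
finally:
    consumer.close()
```

In practice you would buffer messages and insert them in batches rather than one row per event, since per-row streaming inserts are the main cost and latency lever here.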
- **ETL Optimization for Batch Data** (see the sketch below)
  - Objective: Build an ETL/ELT pipeline for asynchronous batch data from 300+ data collectors for Harvard University
  - Tools Used: Amazon Web Services (AWS), Google Cloud Platform (GCP), Python
  - Outcome:
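
A minimal sketch of one batch hop in a cross-cloud pipeline like this, assuming boto3 and google-cloud-bigquery; the bucket, object key, and table id are illustrative placeholders.

```python
# One batch hop: pull a collector's drop file from S3, load it into BigQuery.
import io

import boto3
from google.cloud import bigquery

s3 = boto3.client("s3")
bq = bigquery.Client()

# Fetch one collector's batch file from S3 (hypothetical bucket/key).
obj = s3.get_object(Bucket="collector-drops", Key="collector-001/2024-01-01.csv")
data = io.BytesIO(obj["Body"].read())

# Load the CSV bytes into a BigQuery table in a single load job.
job = bq.load_table_from_file(
    data,
    "project.dataset.collector_batches",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # infer the schema from the file
    ),
)
job.result()  # block until the load job finishes
```

Because the collectors deliver asynchronously, a job like this would typically be fanned out per arriving object (for example, triggered by an S3 event) rather than run on a fixed schedule.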
## Tutorials and Resources
- [Building Scalable ETL Pipelines](https://github.com/andy/etl-pipelines)
- [Optimizing SQL Queries for Data Warehousing](https://andy-blog.com/sql-tips)
## Achievements
- Google Cloud Professional Data Engineer (2024)
- Published "Scaling Data Pipelines in the Cloud" on Medium