Latest revision as of 18:54, 25 November 2024
Overview
Data Engineering focuses on designing, building, and optimizing systems for data collection, storage, and processing. I specialize in creating scalable ETL pipelines, managing data warehouses, and enabling robust analytics.
Skills and Tools
- **ETL Pipelines**: Airflow, Luigi
- **Data Warehousing**: BigQuery, Snowflake
- **Programming**: Python, SQL, Spark
- **Cloud Platforms**: AWS, GCP
- **Database Management**: PostgreSQL, MySQL
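The tools above all orchestrate the same extract-transform-load pattern. As a minimal sketch of that pattern in plain Python (the `Event` record, the sample data, and the function names are illustrative, not taken from any of my pipelines):

```python
from dataclasses import dataclass

# Hypothetical record type; a real pipeline would pull these from an
# API, queue, or database connector rather than hard-coded samples.
@dataclass
class Event:
    user_id: int
    amount_cents: int

def extract() -> list[Event]:
    """Stand-in for a source connector (e.g. a database read)."""
    return [Event(1, 250), Event(2, 1200), Event(1, 300)]

def transform(events: list[Event]) -> dict[int, int]:
    """Aggregate spend per user -- the kind of step a DAG task would run."""
    totals: dict[int, int] = {}
    for e in events:
        totals[e.user_id] = totals.get(e.user_id, 0) + e.amount_cents
    return totals

def load(totals: dict[int, int]) -> list[tuple[int, int]]:
    """Stand-in for a warehouse write (e.g. a BigQuery load job)."""
    return sorted(totals.items())

rows = load(transform(extract()))
print(rows)  # [(1, 550), (2, 1200)]
```

In Airflow or Luigi, each of these three functions would become its own task, so failures can be retried per stage instead of rerunning the whole pipeline.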
Featured Projects
- ETL Optimization for Streaming Data
- Objective: Built an ETL pipeline for real-time data ingestion
- Tools Used: Apache Kafka, Airflow, and BigQuery
- Outcome: Reduced processing time by 40% and enabled near real-time analytics
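One technique behind near real-time processing on a stack like this is micro-batching: grouping incoming records so each downstream write amortizes its fixed cost. A stdlib-only sketch of the idea (the batch size and data are illustrative; this is not the project's actual code):

```python
def micro_batch(stream, batch_size=3):
    """Group incoming records into small batches so each downstream
    write (e.g. a warehouse insert) amortizes its fixed cost."""
    buf = []
    for record in stream:
        buf.append(record)
        if len(buf) == batch_size:
            yield list(buf)
            buf.clear()
    if buf:  # flush the final partial batch
        yield list(buf)

batches = list(micro_batch(range(7), batch_size=3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Kafka consumers apply the same trade-off through poll and flush settings: larger batches raise throughput, smaller ones cut end-to-end latency.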
- ETL Optimization for Batch Data
- Objective: Built an ETL/ELT pipeline for asynchronous batch data from 300+ data collectors for Harvard University
- Tools Used: Amazon Web Services (AWS), Google Cloud Platform (GCP), Python
- Outcome:
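Polling hundreds of collectors sequentially would make total runtime the sum of every request; fanning the requests out concurrently bounds it by the slowest one. A hedged `asyncio` sketch of that fan-out/fan-in shape (the collector endpoint and row counts are simulated, not the real Harvard data sources):

```python
import asyncio

async def fetch_collector(collector_id: int) -> dict:
    """Stand-in for polling one collector's batch endpoint."""
    await asyncio.sleep(0)  # placeholder for real network I/O
    return {"collector": collector_id, "rows": collector_id % 5 + 1}

async def gather_batches(n_collectors: int) -> int:
    """Fan out one request per collector, then fan the results back in."""
    results = await asyncio.gather(
        *(fetch_collector(i) for i in range(n_collectors))
    )
    return sum(r["rows"] for r in results)

total = asyncio.run(gather_batches(300))
print(total)  # 900
```

In production the same shape would add per-collector timeouts and retries, since one slow or failed endpoint must not stall the whole batch window.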
Tutorials and Resources
- [Building Scalable ETL Pipelines](https://github.com/andy/etl-pipelines)
- [Optimizing SQL Queries for Data Warehousing](https://andy-blog.com/sql-tips)
Achievements
- Google Cloud Professional Data Engineer (2024)
- Published "Scaling Data Pipelines in the Cloud" on Medium