Data Engineering: Difference between revisions

Latest revision as of 18:54, 25 November 2024

Overview

Data Engineering focuses on designing, building, and optimizing systems for data collection, storage, and processing. I specialize in creating scalable ETL pipelines, managing data warehouses, and enabling robust analytics.

Skills and Tools

**ETL Pipelines**: Airflow, Luigi
**Data Warehousing**: BigQuery, Snowflake
**Programming**: Python, SQL, Spark
**Cloud Platforms**: AWS, GCP
**Database Management**: PostgreSQL, MySQL

Featured Projects

1. ETL Optimization for Streaming Data

Objective: Built an ETL pipeline for real-time data ingestion
Tools Used: Apache Kafka, Airflow, and BigQuery
Outcome: Reduced processing time by 40% and enabled near real-time analytics
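
The page does not show how the streaming pipeline was implemented. As a rough illustration only, one common way to get near real-time loading is to group incoming events into small batches before writing them to the warehouse. The sketch below uses plain Python with no Kafka, Airflow, or BigQuery calls. The names `micro_batches`, `transform`, `run_pipeline`, the `user_id` field, and the `load` callback are all hypothetical and are not taken from the project.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


def micro_batches(events: Iterable[Any], max_size: int) -> Iterator[List[Any]]:
    """Group a stream of events into lists of at most max_size items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    batch: List[Any] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly partial, batch.
        yield batch


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical transform: drop empty fields and normalize user_id.

    Assumes every event carries a non-empty "user_id" field.
    """
    cleaned = {key: value for key, value in event.items() if value is not None}
    cleaned["user_id"] = str(cleaned["user_id"]).strip().lower()
    return cleaned


def run_pipeline(
    events: Iterable[Dict[str, Any]],
    load: Callable[[List[Dict[str, Any]]], None],
    max_size: int = 500,
) -> int:
    """Transform each micro-batch, pass it to load, and return rows loaded."""
    total = 0
    for batch in micro_batches(events, max_size):
        rows = [transform(event) for event in batch]
        load(rows)  # stand-in for a warehouse write; an assumption
        total += len(rows)
    return total
```

In a real deployment `events` would come from a message consumer and `load` would wrap a warehouse client. Both are left abstract here so the batching logic can be read and tested on its own.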

2. ETL Optimization for Batch Data

Objective: Build an ETL/ELT pipeline for asynchronous batch data from 300+ data collectors for Harvard University
Tools Used: Amazon Web Services (AWS), Google Cloud Platform (GCP), Python
Outcome:

Tutorials and Resources

[Building Scalable ETL Pipelines](https://github.com/andy/etl-pipelines)
[Optimizing SQL Queries for Data Warehousing](https://andy-blog.com/sql-tips)

Achievements

Google Cloud Professional Data Engineer (2024)
Published "Scaling Data Pipelines in the Cloud" on Medium

@@ Line 11: / Line 11: @@
 == Featured Projects ==
 ### ETL Optimization for Streaming Data
-* **Objective**: Built an ETL pipeline for real-time data ingestion.
+* '''Objective''': Built an ETL pipeline for real-time data ingestion
-* **Tools Used**: Apache Kafka, Airflow, and BigQuery.
+* '''Tools Used''': Apache Kafka, Airflow, and BigQuery
-* **Outcome**: Reduced processing time by 40% and enabled near real-time analytics.
+* '''Outcome''': Reduced processing time by 40% and enabled near real-time analytics
+### ETL Optimization for Batch Data
+* '''Objective''': Build an ETL/ELT pipeline for async batch data from 300+ data collectors for Harvard University
+* '''Tools Used''': Amazon Web Services (AWS), Google Cloud Platform (GCP), Python
+* '''Outcome''':
 == Tutorials and Resources ==