
Data Engineering

From Andy’s Data Science Wiki
Revision as of 18:54, 25 November 2024 by Admin

Overview

Data Engineering focuses on designing, building, and optimizing systems for data collection, storage, and processing. I specialize in creating scalable ETL pipelines, managing data warehouses, and enabling robust analytics.
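To make the extract, transform, and load steps concrete, here is a minimal sketch that uses only the Python standard library. The sample CSV, the `events` table, and the `extract`/`transform`/`load` function names are hypothetical. SQLite stands in for a warehouse such as BigQuery or Snowflake. This is an illustration of the pattern, not code from any of the projects listed on this page.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Hypothetical raw input; a real pipeline would pull this from an API, file drop, or queue.
RAW_CSV = """user_id,amount,country
1,19.99,US
2,,DE
3,5.50,us
"""

Record = Tuple[int, float, str]


def extract(text: str) -> Iterator[dict]:
    """Extract: parse raw CSV text into one dict per row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[dict]) -> Iterator[Record]:
    """Transform: skip rows missing an amount, cast types, normalize country codes."""
    for row in rows:
        if not row["amount"]:
            continue  # drop incomplete records rather than loading nulls
        yield int(row["user_id"]), float(row["amount"]), row["country"].upper()


def load(records: Iterable[Record], conn: sqlite3.Connection) -> int:
    """Load: insert cleaned records into a table; SQLite stands in for a warehouse."""
    batch: List[Record] = list(records)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", batch)
    conn.commit()
    return len(batch)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load(transform(extract(RAW_CSV)), conn))  # number of rows loaded
    print(conn.execute("SELECT country, SUM(amount) FROM events GROUP BY country").fetchall())
```

Because each stage is a generator or a plain function, the stages can be tested separately. A stage can also be swapped out, for example to read from cloud storage instead of an in-memory string, without touching the others.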

Skills and Tools

  • **ETL Pipelines**: Airflow, Luigi
  • **Data Warehousing**: BigQuery, Snowflake
  • **Programming**: Python, SQL, Spark
  • **Cloud Platforms**: AWS, GCP
  • **Database Management**: PostgreSQL, MySQL
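Both Airflow and Luigi describe a pipeline as a graph of tasks with dependencies, and they run each task only after its upstream tasks have finished. The sketch below illustrates that scheduling idea with Python's standard-library `graphlib` (Python 3.9+). It is not Airflow's or Luigi's API, and the task names (`extract_orders`, `transform`, and so on) are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def make_task(name: str, log: List[str]) -> Callable[[], None]:
    """Hypothetical task body: it only records that it ran."""
    def run() -> None:
        log.append(name)
    return run


def run_dag(deps: Dict[str, Set[str]], tasks: Dict[str, Callable[[], None]]) -> List[List[str]]:
    """Run tasks in dependency order and return the 'waves' of tasks that became ready together.

    Tasks in the same wave do not depend on each other, so an orchestrator could run
    them in parallel; this sketch runs them one after another for simplicity.
    """
    sorter = TopologicalSorter(deps)  # deps maps each task to the set of tasks it depends on
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        for name in ready:
            tasks[name]()
            sorter.done(name)  # unblocks any downstream tasks
        waves.append(ready)
    return waves


if __name__ == "__main__":
    deps = {
        "transform": {"extract_orders", "extract_users"},
        "load": {"transform"},
        "report": {"load"},
    }
    log: List[str] = []
    names = ["extract_orders", "extract_users", "transform", "load", "report"]
    tasks = {name: make_task(name, log) for name in names}
    print(run_dag(deps, tasks))
```

Each wave holds tasks whose dependencies are already satisfied. A production orchestrator adds retries, schedules, and distributed workers on top of this ordering.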

Featured Projects

      1. ETL Optimization for Streaming Data
  • Objective: Built an ETL pipeline for real-time data ingestion
  • Tools Used: Apache Kafka, Airflow, and BigQuery
  • Outcome: Reduced processing time by 40% and enabled near real-time analytics
      2. ETL Optimization for Batch Data
  • Objective: Built an ETL/ELT pipeline for batch data arriving asynchronously from 300+ data collectors for Harvard University
  • Tools Used: Amazon Web Services (AWS), Google Cloud Platform (GCP), Python
  • Outcome:
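
One common way to get near real-time analytics from a stream, as in the streaming project above, is to aggregate events over short, fixed (tumbling) time windows. The sketch below assumes events arrive in event-time order as hypothetical `(timestamp_seconds, key)` tuples. A plain Python list stands in for a Kafka consumer, so this is not the project's actual pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (event_time_seconds, key); hypothetical event shape


def tumbling_counts(events: Iterable[Event], window_s: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per key in fixed, non-overlapping time windows.

    Assumes in-order events: a window is emitted as soon as an event from a later
    window arrives, plus a final flush at the end of the stream. Windows with no
    events are skipped.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - (ts % window_s)  # align the event to the start of its window
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)


if __name__ == "__main__":
    stream = [(0, "click"), (3, "view"), (7, "click"), (12, "click"), (25, "view")]
    for window_start, window_counts in tumbling_counts(stream, window_s=10):
        print(window_start, window_counts)
```

Emitting each window as soon as it closes keeps results flowing continuously rather than waiting for a nightly batch. Handling late or out-of-order events would need watermarks or a grace period, which this sketch deliberately omits.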
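
For the batch project, pulling data from 300+ collectors is largely I/O-bound, so concurrent fetching with a cap on in-flight requests is one reasonable approach. The sketch below uses `asyncio` with a `Semaphore`. `fetch_batch`, the `UNREACHABLE` set, and the record shape are hypothetical stand-ins, not the project's actual AWS/GCP integration.

```python
import asyncio
from typing import Dict, List, Tuple

# Hypothetical collectors that fail, used to show partial-failure handling.
UNREACHABLE = {7, 42}


async def fetch_batch(collector_id: int) -> List[dict]:
    """Hypothetical fetch; in practice this might read an upload from cloud storage."""
    await asyncio.sleep(0)  # stand-in for network I/O
    if collector_id in UNREACHABLE:
        raise ConnectionError(f"collector {collector_id} unreachable")
    return [{"collector": collector_id, "value": collector_id * 2}]


async def ingest_all(
    collector_ids: List[int], max_concurrency: int = 20
) -> Tuple[Dict[int, List[dict]], List[int]]:
    """Fetch every collector's batch concurrently, with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(cid: int) -> List[dict]:
        async with sem:  # blocks while max_concurrency fetches are already running
            return await fetch_batch(cid)

    # return_exceptions=True keeps one failing collector from cancelling the whole run.
    results = await asyncio.gather(*(guarded(c) for c in collector_ids), return_exceptions=True)
    batches: Dict[int, List[dict]] = {}
    failed: List[int] = []
    for cid, result in zip(collector_ids, results):  # gather preserves input order
        if isinstance(result, Exception):
            failed.append(cid)
        else:
            batches[cid] = result
    return batches, failed


if __name__ == "__main__":
    batches, failed = asyncio.run(ingest_all(list(range(300))))
    print(len(batches), failed)
```

Returning the failed collector IDs alongside the successful batches lets a later run retry only the collectors that failed, instead of reloading all of them.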

Tutorials and Resources

Achievements

  • Google Cloud Professional Data Engineer (2024)
  • Published "Scaling Data Pipelines in the Cloud" on Medium