Results-oriented Data Engineer with over 4 years of experience designing and implementing data pipelines and ETL workflows across Azure and AWS environments. Builds robust data integration and transformation solutions in T-SQL, Python, and PySpark to support enterprise-scale data warehousing, with emphasis on data mapping, validation, and SQL query performance tuning in complex environments. Skilled at collaborating with cross-functional teams to troubleshoot database issues and ensure data accuracy in production. Committed to delivering scalable data solutions on schedule and to coding best practices, with domain experience in healthcare and finance.
- Developed multi-threaded Java ingestion jobs and Sqoop scripts to migrate data from FTP servers and Oracle to big data platforms.
- Designed and maintained scalable ETL workflows using Azure Data Factory, PySpark, and dbt to automate data processing from raw sources to Snowflake.
- Created Databricks workflows to extract data from SQL Server and securely transfer it to SFTP, optimizing transformation performance for healthcare-related datasets.
- Optimized Databricks jobs using caching, partitioning, and broadcast joins, reducing execution time and improving query performance (see the PySpark optimization sketch after this list).
- Developed Snowflake pipelines leveraging SnowSQL scripts and Snowpipe for automated ingestion and transformation of incremental datasets (a Snowpipe example follows this list).
- Implemented data governance policies including access control and audit logging within Azure Databricks to ensure compliance with standards.
- Integrated AWS DynamoDB with AWS Lambda to persist item data, and captured table changes via DynamoDB Streams for real-time backups and data integrity (see the Lambda handler sketch after this list).
- Designed and implemented ETL workflows using Talend, adhering to best practices for structured and semi-structured data pipelines.
- Developed and deployed Databricks ETL pipelines with Spark SQL and Python to transform data for downstream data warehouse consumption.
- Built Spark Streaming applications to process data in mini-batches, performing real-time transformations to drive streaming analytics.
- Used Kafka for distributed messaging, managing partitioned feeds and real-time event data for large-scale data aggregation.
- Developed scalable analytics components using Scala and Spark, implementing MapReduce jobs for complex data preprocessing and standardization.
- Designed and developed scalable data ingestion pipelines using Azure Data Factory and Spark SQL on Azure HDInsight for structured and unstructured data.
- Built a custom ELT logging framework in ADF to enhance monitoring and debugging of pipeline executions and accelerate root-cause analysis of failures.
- Developed Spark Streaming applications to process real-time Kafka messages and write transformed streams into HBase for low-latency analytics.
- Leveraged Databricks and Spark SQL for data extraction, transformation, and aggregation to support regulatory and compliance reporting.
- Automated CI/CD pipelines using Jenkins, Git, and Terraform to support cross-platform data engineering tasks and consistent code deployment.
- Engineered data workflows for banking products, integrating transaction data while implementing T-SQL triggers and exception handling for data accuracy.
- Designed and developed robust ETL pipelines using Azure Data Factory to ingest data from log files and business apps for warehouse loading.
- Built a reusable ETL framework to automate data migration from RDBMS systems to the Data Lake using Spark Data Sources and Hive.
- Developed and scheduled Airflow DAGs for ETL batch processing, enabling reliable data loading into Snowflake for enterprise analytics (an illustrative DAG follows this list).
- Integrated Azure Logic Apps with ADF pipelines and HTTP triggers to automate batch workflows and improve processing efficiency.
- Engineered Spark Streaming jobs to consume and format real-time packet data from Kafka topics into JSON for downstream use (see the streaming sketch after this list).
- Created multiple Databricks Spark jobs with PySpark and Spark SQL to support complex table-to-table transformations and data profiling (see the transformation sketch after this list).
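
The sketches below illustrate, under stated assumptions, the patterns referenced in the bullets above; none reproduces proprietary code, and all table, topic, and connection names are hypothetical placeholders. First, a minimal PySpark sketch of the Databricks optimization techniques: repartitioning on the join key, caching a reused DataFrame, and broadcasting a small dimension table to avoid a shuffle join.

```python
# Minimal sketch of the named optimization patterns; table and column names
# (raw.claims, ref.members, member_id) are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("optimization-sketch").getOrCreate()

# Large fact table: repartition on the join key so downstream shuffles align.
claims = spark.table("raw.claims").repartition(200, "member_id")

# Cache a DataFrame that several later actions reuse.
claims.cache()

# Small dimension table: broadcast it so the join needs no shuffle.
members = spark.table("ref.members")
enriched = claims.join(broadcast(members), on="member_id", how="left")

enriched.write.mode("overwrite").partitionBy("load_date").saveAsTable(
    "curated.claims_enriched"
)
```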
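Next, a hedged example of the Snowpipe ingestion pattern: SnowSQL DDL executed through the Snowflake Python connector. The stage, pipe, and table names are assumptions, and credentials would normally come from a secrets store rather than literals.

```python
# Sketch of an auto-ingesting Snowpipe definition for incremental loads.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",  # hypothetical account identifier
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

create_pipe = """
CREATE PIPE IF NOT EXISTS staging.orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO staging.orders_raw
  FROM @staging.orders_stage
  FILE_FORMAT = (TYPE = 'JSON');
"""

cur = conn.cursor()
cur.execute(create_pipe)  # Snowpipe then ingests new stage files automatically
cur.close()
conn.close()
```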
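The DynamoDB integration bullet refers to the common Streams-triggered Lambda pattern; here is a sketch under assumed table and field names, copying new item images to a backup table.

```python
# Illustrative Lambda handler for a DynamoDB Streams trigger; the backup
# table name and the flat item shape are assumptions for the sketch.
import boto3

dynamodb = boto3.resource("dynamodb")
backup_table = dynamodb.Table("orders_backup")  # hypothetical backup table

def handler(event, context):
    """Triggered by DynamoDB Streams; event['Records'] carries change events."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            new_image = record["dynamodb"].get("NewImage", {})
            # NewImage uses DynamoDB's typed JSON ({"S": "..."}); unwrap the
            # single-type wrapper (sufficient for flat items in this sketch).
            item = {k: list(v.values())[0] for k, v in new_image.items()}
            backup_table.put_item(Item=item)
    return {"processed": len(event["Records"])}
```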
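A minimal Airflow DAG for the scheduled Snowflake batch loads: the connection id, schedule, and COPY statement are illustrative, and the production DAGs also carried retries, alerting, and dependency sensors.

```python
# Sketch of a nightly batch load into Snowflake via the Snowflake provider.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="load_orders_to_snowflake",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",  # nightly batch window (assumed)
    catchup=False,
) as dag:
    load_orders = SnowflakeOperator(
        task_id="copy_orders",
        snowflake_conn_id="snowflake_default",  # assumed Airflow connection
        sql="COPY INTO analytics.orders FROM @staging.orders_stage "
            "FILE_FORMAT = (TYPE = 'CSV');",
    )
```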
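The Kafka-to-JSON streaming work followed the pattern below, shown here with Structured Streaming and an assumed packet schema; broker address, topic, and output paths are placeholders.

```python
# Sketch: consume a Kafka topic, parse the value payload against a known
# schema, and emit JSON files in micro-batches with checkpointing.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-json-sketch").getOrCreate()

packet_schema = StructType([
    StructField("device_id", StringType()),
    StructField("payload", StringType()),
    StructField("ts", LongType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "packets")
    .load()
    # Kafka delivers bytes; cast to string, then parse with the schema above.
    .select(from_json(col("value").cast("string"), packet_schema).alias("p"))
    .select("p.*")
)

query = (
    stream.writeStream.format("json")
    .option("path", "/data/out/packets")
    .option("checkpointLocation", "/data/chk/packets")
    .start()
)
query.awaitTermination()
```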
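Finally, a sketch of the table-to-table transformation and profiling pattern in Spark SQL, with placeholder source and target schemas; real jobs applied fuller profiling than the single null-key check shown.

```python
# Sketch: transform source tables with Spark SQL, run a basic profiling
# check, then publish the result table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("table-to-table-sketch").getOrCreate()

result = spark.sql("""
    SELECT c.customer_id,
           COUNT(*)      AS txn_count,
           SUM(t.amount) AS total_amount
    FROM raw.transactions t
    JOIN raw.customers c ON c.customer_id = t.customer_id
    GROUP BY c.customer_id
""")

# Basic profiling gate before publishing: no null join keys may survive.
assert result.filter("customer_id IS NULL").count() == 0

result.write.mode("overwrite").saveAsTable("curated.customer_txn_summary")
```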