Results-oriented Data Engineer with over 4 years of experience designing and implementing data pipelines and ETL workflows across Azure and AWS environments. Builds robust data integration and transformation solutions in T-SQL, Python, and PySpark to support enterprise-scale data warehousing, with emphasis on data mapping, validation, and SQL query performance tuning in complex environments. Skilled at collaborating with cross-functional teams to troubleshoot database issues and ensure data accuracy in production. Committed to delivering scalable data solutions on schedule and to coding best practices, with domain experience in healthcare and finance.
- Developed multi-threaded Java ingestion jobs and Sqoop scripts to migrate data from FTP servers and Oracle to big data platforms.
- Designed and maintained scalable ETL workflows using Azure Data Factory, PySpark, and dbt to automate data processing from raw sources to Snowflake.
- Created Databricks workflows to extract data from SQL Server and securely transfer it to SFTP, optimizing transformation performance for healthcare-related datasets.
- Optimized Databricks jobs using caching, partitioning, and broadcast joins, reducing execution time and improving query performance (see the PySpark optimization sketch after this list).
- Developed Snowflake pipelines leveraging SnowSQL scripts and Snowpipe for automated ingestion and transformation of incremental datasets (a Snowpipe example follows this list).
- Implemented data governance policies including access control and audit logging within Azure Databricks to ensure compliance with standards.
- Integrated AWS DynamoDB with AWS Lambda to persist item data, and captured table changes via DynamoDB Streams for real-time backups and data integrity (see the Lambda handler sketch after this list).
- Designed and implemented ETL workflows using Talend, adhering to best practices for structured and semi-structured data pipelines.
- Developed and deployed Databricks ETL pipelines with Spark SQL and Python to transform data for downstream data warehouse consumption.
- Built Spark Streaming applications to process data in mini-batches, performing real-time transformations to drive streaming analytics.
- Used Kafka for distributed messaging, managing partitioned feeds and real-time event data for large-scale data aggregation.
- Developed scalable analytics components using Scala and Spark, implementing MapReduce jobs for complex data preprocessing and standardization.
- Designed and developed scalable data ingestion pipelines using Azure Data Factory and Spark SQL on Azure HDInsight for structured and unstructured data.
- Built a custom ELT logging framework in ADF to enhance monitoring and debugging of pipeline executions and accelerate root-cause analysis of failures.
- Developed Spark Streaming applications to process real-time Kafka messages and write transformed streams into HBase for low-latency analytics.
- Leveraged Databricks and Spark SQL for data extraction, transformation, and aggregation to support regulatory and compliance reporting.
- Automated CI/CD pipelines using Jenkins, Git, and Terraform to support cross-platform data engineering tasks and consistent code deployment.
- Engineered data workflows for banking products, integrating transaction data while implementing T-SQL triggers and exception handling for data accuracy.
- Designed and developed robust ETL pipelines using Azure Data Factory to ingest data from log files and business apps for warehouse loading.
- Built a reusable ETL framework to automate data migration from RDBMS systems to the Data Lake using Spark Data Sources and Hive.
- Developed and scheduled Airflow DAGs for ETL batch processing, enabling reliable data loading into Snowflake for enterprise analytics (an illustrative DAG follows this list).
- Integrated Azure Logic Apps with ADF pipelines and HTTP triggers to automate batch workflows and improve processing efficiency.
- Engineered Spark Streaming jobs to consume and format real-time packet data from Kafka topics into JSON for downstream use (see the streaming sketch after this list).
- Created multiple Databricks Spark jobs with PySpark and Spark SQL to support complex table-to-table transformations and data profiling (see the transformation sketch after this list).
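
The sketches below illustrate, under stated assumptions, the patterns referenced in the bullets above; none reproduces proprietary code, and all table, topic, and connection names are hypothetical placeholders. First, a minimal PySpark sketch of the Databricks optimization techniques: repartitioning on the join key, caching a reused DataFrame, and broadcasting a small dimension table to avoid a shuffle join.

```python
# Minimal sketch of the named optimization patterns; table and column names
# (raw.claims, ref.members, member_id) are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("optimization-sketch").getOrCreate()

# Large fact table: repartition on the join key so downstream shuffles align.
claims = spark.table("raw.claims").repartition(200, "member_id")

# Cache a DataFrame that several later actions reuse.
claims.cache()

# Small dimension table: broadcast it so the join needs no shuffle.
members = spark.table("ref.members")
enriched = claims.join(broadcast(members), on="member_id", how="left")

enriched.write.mode("overwrite").partitionBy("load_date").saveAsTable(
    "curated.claims_enriched"
)
```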
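Next, a hedged example of the Snowpipe ingestion pattern: SnowSQL DDL executed through the Snowflake Python connector. The stage, pipe, and table names are assumptions, and credentials would normally come from a secrets store rather than literals.

```python
# Sketch of an auto-ingesting Snowpipe definition for incremental loads.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",  # hypothetical account identifier
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

create_pipe = """
CREATE PIPE IF NOT EXISTS staging.orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO staging.orders_raw
  FROM @staging.orders_stage
  FILE_FORMAT = (TYPE = 'JSON');
"""

cur = conn.cursor()
cur.execute(create_pipe)  # Snowpipe then ingests new stage files automatically
cur.close()
conn.close()
```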
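The DynamoDB integration bullet refers to the common Streams-triggered Lambda pattern; here is a sketch under assumed table and field names, copying new item images to a backup table.

```python
# Illustrative Lambda handler for a DynamoDB Streams trigger; the backup
# table name and the flat item shape are assumptions for the sketch.
import boto3

dynamodb = boto3.resource("dynamodb")
backup_table = dynamodb.Table("orders_backup")  # hypothetical backup table

def handler(event, context):
    """Triggered by DynamoDB Streams; event['Records'] carries change events."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            new_image = record["dynamodb"].get("NewImage", {})
            # NewImage uses DynamoDB's typed JSON ({"S": "..."}); unwrap the
            # single-type wrapper (sufficient for flat items in this sketch).
            item = {k: list(v.values())[0] for k, v in new_image.items()}
            backup_table.put_item(Item=item)
    return {"processed": len(event["Records"])}
```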
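A minimal Airflow DAG for the scheduled Snowflake batch loads: the connection id, schedule, and COPY statement are illustrative, and the production DAGs also carried retries, alerting, and dependency sensors.

```python
# Sketch of a nightly batch load into Snowflake via the Snowflake provider.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="load_orders_to_snowflake",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",  # nightly batch window (assumed)
    catchup=False,
) as dag:
    load_orders = SnowflakeOperator(
        task_id="copy_orders",
        snowflake_conn_id="snowflake_default",  # assumed Airflow connection
        sql="COPY INTO analytics.orders FROM @staging.orders_stage "
            "FILE_FORMAT = (TYPE = 'CSV');",
    )
```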
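The Kafka-to-JSON streaming work followed the pattern below, shown here with Structured Streaming and an assumed packet schema; broker address, topic, and output paths are placeholders.

```python
# Sketch: consume a Kafka topic, parse the value payload against a known
# schema, and emit JSON files in micro-batches with checkpointing.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-json-sketch").getOrCreate()

packet_schema = StructType([
    StructField("device_id", StringType()),
    StructField("payload", StringType()),
    StructField("ts", LongType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "packets")
    .load()
    # Kafka delivers bytes; cast to string, then parse with the schema above.
    .select(from_json(col("value").cast("string"), packet_schema).alias("p"))
    .select("p.*")
)

query = (
    stream.writeStream.format("json")
    .option("path", "/data/out/packets")
    .option("checkpointLocation", "/data/chk/packets")
    .start()
)
query.awaitTermination()
```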
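Finally, a sketch of the table-to-table transformation and profiling pattern in Spark SQL, with placeholder source and target schemas; real jobs applied fuller profiling than the single null-key check shown.

```python
# Sketch: transform source tables with Spark SQL, run a basic profiling
# check, then publish the result table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("table-to-table-sketch").getOrCreate()

result = spark.sql("""
    SELECT c.customer_id,
           COUNT(*)      AS txn_count,
           SUM(t.amount) AS total_amount
    FROM raw.transactions t
    JOIN raw.customers c ON c.customer_id = t.customer_id
    GROUP BY c.customer_id
""")

# Basic profiling gate before publishing: no null join keys may survive.
assert result.filter("customer_id IS NULL").count() == 0

result.write.mode("overwrite").saveAsTable("curated.customer_txn_summary")
```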