Senior Data Engineer with 6+ years of experience in cloud-native data engineering, specializing in AWS, PySpark, Snowflake, and Hadoop. Proven track record of building scalable ETL pipelines, reducing data processing time by 63%, and optimizing cloud data platforms for high-volume analytics. Strong expertise in data integration, automation, and distributed systems, with hands-on experience in Docker and modern data stacks.
- Migrated enterprise data warehouse workloads from Snowflake to an internal AWS-based central data lake (S3 + Glue + EMR), reducing query latency by 45% and saving $200K annually.
- Automated daily ETL pipelines processing 50M+ records using Python and AWS Glue, achieving 99% data accuracy and reducing manual intervention by 90%.
- Implemented incremental loading strategies using AWS Glue and Spark (see the sketch below), enabling near-zero downtime during migration and ensuring continuous access for end users.
- Developed a monitoring system in Python to detect anomalies in data flow across AWS and Snowflake, cutting pipeline downtime by 30% and improving pipeline reliability.
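A minimal sketch of the incremental-load pattern above, assuming hypothetical Glue catalog and S3 names; Glue job bookmarks (enabled via `transformation_ctx`) are what let each run pick up only data that arrived since the previous run:

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate; job bookmarks must also be
# enabled in the job configuration for incremental tracking.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical catalog database/table names for illustration.
incremental = glue_context.create_dynamic_frame.from_catalog(
    database="payments_db",
    table_name="raw_transactions",
    transformation_ctx="incremental",  # enables bookmark tracking
)

# Land only the new records in the central data lake as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=incremental,
    connection_type="s3",
    connection_options={"path": "s3://central-data-lake/transactions/"},
    format="parquet",
)

job.commit()  # advances the bookmark only after a successful run
```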
- Implemented an AWS-based data pipeline processing 20M+ payment records/day, achieving a 63% reduction in processing time.
- Utilized PySpark DataFrames to process extensive payment data, achieving a 37% reduction in job execution time and a 56% decrease in resource consumption through optimized partitioning, caching, and broadcast joins (see the sketch below).
- Designed and optimized Snowflake schemas (star/snowflake), leveraging micro-partitioning, clustering keys, and secure data sharing for analytics.
- Increased data accessibility for data analysts by 30% by providing clear and consistent access to processed data in Redshift.
- Integrated data sources into AWS Glue, optimized DynamoDB, automated S3 backups with Boto3, and prototyped CI/CD with Jenkins.
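A minimal sketch of the partition/cache/broadcast pattern referenced above, with hypothetical S3 paths and column names; broadcasting the small dimension table avoids shuffling the large fact table across the cluster:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("payments-optimization").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
payments = spark.read.parquet("s3://bucket/payments/")
merchants = spark.read.parquet("s3://bucket/merchants/")

# Repartition on the join key to reduce shuffle skew, then cache
# because the frame feeds multiple downstream aggregations.
payments = payments.repartition(200, "merchant_id").cache()

# Broadcasting the small side turns the shuffle join into a map-side join.
enriched = payments.join(F.broadcast(merchants), "merchant_id")

daily_totals = (
    enriched.groupBy("merchant_id", "payment_date")
    .agg(F.sum("amount").alias("total_amount"))
)
daily_totals.write.mode("overwrite").parquet("s3://bucket/daily_totals/")
```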
- Designed ETL pipelines with FastAPI to streamline data flow between systems, reducing daily processing of incoming datasets by an average of two hours without compromising data integrity (see the sketch below).
- Leveraged Docker for rapid iteration and reproducible development environments.
- Analyzed application performance metrics using Splunk, diagnosing 15 critical bottlenecks in real-time data processing; implemented system enhancements that increased application uptime to 99.9% and improved user satisfaction ratings.
- Optimized processing of 6 million records per day, speeding up report generation by 1.5x. Validated code with pytest, maintaining 80% test coverage for robust data pipelines.
- Implemented string matching, a rule engine, and n-gram generation to classify Protection Groups against reference data, extracting data from various sources using Python ML libraries.
- Applied data preprocessing techniques to improve dataset accuracy and remove outliers.
- Developed a rule engine in Python to apply business rules to statements, and an n-gram matching algorithm using NLTK to compare sentences (see the sketch below).
- Saved 120 person-hours by developing a Python API to automate classification with a rule-based approach.
- Achieved 81% classification accuracy with the rule engine, exceeding SME results by more than 15%.
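A minimal sketch of n-gram sentence comparison with NLTK, using Jaccard similarity over word bigrams; the function and example strings are illustrative, not the exact production logic:

```python
from nltk.util import ngrams

def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity over word n-grams of two sentences."""
    grams_a = set(ngrams(a.lower().split(), n))
    grams_b = set(ngrams(b.lower().split(), n))
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

# Example: score a statement against a reference-data entry.
score = ngram_similarity(
    "payment failed due to insufficient funds",
    "transaction failed due to insufficient balance",
)
print(f"bigram similarity: {score:.2f}")
```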
- Upgraded SQL Server infrastructure, migrating 33 servers from Unix to Linux with a seamless transition and a 33% improvement in system performance.
- Developed replication views, stored procedures, triggers, and cron jobs for task scheduling, using SCP to transfer files and scripts between servers (see the sketch below), streamlining operations and reducing manual intervention.
- Optimized query performance by analyzing execution plans and implementing appropriate indexing strategies, leading to a 28% reduction in query execution time.
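A minimal sketch of the scripted server-to-server file transfer, using paramiko's SFTP client as an SCP-style copy over SSH; hostnames, usernames, and paths are hypothetical, and credentials would come from an SSH agent or vault in practice:

```python
import paramiko

def push_script(host: str, local_path: str, remote_path: str) -> None:
    """Copy a local file to a remote server over SSH (SCP-style)."""
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username="deploy")  # auth via SSH agent/keys
    try:
        sftp = ssh.open_sftp()
        sftp.put(local_path, remote_path)
        sftp.close()
    finally:
        ssh.close()

# Example: distribute a backup script to a migrated Linux host.
push_script("db-server-01", "backup.sh", "/opt/scripts/backup.sh")
```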