portfolio@vamsi-krishna-janjanam
visitor@portfolio:~$ whoami

Vamsi Krishna Janjanam · Senior Data Engineer

visitor@portfolio:~$ cat README.md

Senior Data Engineer with 6+ years of experience in cloud-native data engineering, specializing in AWS, PySpark, Snowflake, and Hadoop. Proven track record of building scalable ETL pipelines, reducing data processing time by 63%, and optimizing cloud data platforms for high-volume analytics. Strong expertise in data integration, automation, and distributed systems, with hands-on experience in Docker and modern data stacks.

visitor@portfolio:~$ experience --list
Data Engineer@Capital One
Sep 2024 - Present
  • Migrated enterprise data warehouse workloads from Snowflake to an internal AWS-based central data lake (S3 + Glue + EMR), reducing query latency by 45% and saving $200K annually.
  • Automated daily ETL pipelines processing 50M+ records using Python and AWS Glue, achieving 99% data accuracy and reducing manual intervention by 90%.
  • Implemented incremental loading strategies using AWS Glue and Spark, enabling near-zero downtime during migration and ensuring continuous access for end users.
  • Developed a monitoring system in Python to detect anomalies in data flow across AWS and Snowflake, leading to a 30% decrease in downtime and ensuring data pipeline reliability.
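The anomaly monitoring described above can be sketched, very roughly, in plain Python. This is an illustrative stand-in, not the production system: it assumes daily pipeline record counts are already available as numbers, and flags a day whose count deviates from the historical mean by more than a few standard deviations.

```python
# Hypothetical sketch of a record-count anomaly check; the metric
# (daily record counts) and the 3-sigma threshold are illustrative.
from statistics import mean, stdev

def is_anomalous(history: list[float], today: float, threshold: float = 3.0) -> bool:
    """Flag today's count if it deviates from the historical mean
    by more than `threshold` standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# Usage: 30 days of stable counts, then a sudden drop in volume.
history = [50_000_000 + i * 1000 for i in range(30)]
normal_day = is_anomalous(history, 50_010_000)  # within normal range
bad_day = is_anomalous(history, 5_000_000)      # sudden drop, flagged
```

In practice a check like this would run per pipeline and feed an alerting channel; the z-score approach is just one simple choice of detector.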
Cloud Data Engineer@IT Resources Inc
Jan 2023 - Sep 2024
  • Implemented an AWS-based data pipeline processing 20M+ payment records/day, achieving a 63% reduction in processing time through pipeline optimization.
  • Utilized PySpark DataFrames to process extensive payment data, achieving a 37% reduction in job execution time and a 56% decrease in resource consumption through optimized partitioning, caching, and broadcast variable utilization.
  • Designed and optimized Snowflake schemas (star/snowflake), leveraging micro-partitioning, clustering keys, and secure data sharing for analytics.
  • Increased data accessibility for data analysts by 30% by providing clear and consistent access to processed data in Redshift.
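The broadcast-variable idea above can be illustrated without a Spark cluster. The pure-Python analogy below (table and column names are made up) mirrors what `df_large.join(broadcast(df_small), ...)` does in PySpark: the small dimension table is materialized once as an in-memory lookup, so the large payment stream is scanned a single time with no shuffle of the big side.

```python
# Illustrative analogy of a Spark broadcast join in plain Python:
# the small side becomes a dict "broadcast" to every task, and the
# large side streams through exactly once.

def broadcast_join(payments, merchants):
    """payments: iterable of (merchant_id, amount) rows (the large side);
    merchants: small lookup table {merchant_id: name} (the broadcast side).
    Yields (merchant_id, name, amount) enriched rows."""
    lookup = dict(merchants)  # the "broadcast" copy, built once
    for merchant_id, amount in payments:
        yield merchant_id, lookup.get(merchant_id, "unknown"), amount

# Usage with toy data:
payments = [(1, 9.99), (2, 120.00), (1, 4.50)]
merchants = {1: "acme", 2: "globex"}
rows = list(broadcast_join(payments, merchants))
```

The same trade-off applies in Spark: broadcasting only pays off when the small side comfortably fits in executor memory.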
AWS Data Engineer@Capital One
Mar 2022 - Dec 2022
  • Integrated data sources into AWS Glue, optimized DynamoDB, automated S3 backups with Boto3, and prototyped CI/CD with Jenkins.
  • Designed comprehensive ETL pipelines using FastAPI to streamline data flow between systems, cutting processing time for incoming datasets by an average of two hours daily without compromising data integrity.
  • Leveraged Docker to expedite development processes, ensuring rapid iteration and seamless environment reproducibility.
  • Analyzed application performance metrics using Splunk, diagnosing 15 critical bottlenecks in real-time data processing; implemented system enhancements that increased application uptime to 99.9% and improved user satisfaction ratings.
  • Optimized processing of 6 million records per day, speeding report generation by 1.5x. Validated code with pytest, maintaining 80% test coverage to keep data pipelines robust.
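The pytest validation mentioned above might look something like this sketch. The field names and rules are hypothetical, but the shape is typical: a pure validation function that returns a list of errors, plus a test that pins down its behavior.

```python
# Hypothetical record-validation helper of the kind a pytest suite
# would exercise; field names ("id", "amount") are illustrative.
def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if not record.get("id"):
        errors.append("missing id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors

# pytest discovers and runs functions named test_*:
def test_validate_record():
    assert validate_record({"id": "a1", "amount": 10.0}) == []
    assert "missing id" in validate_record({"amount": 10.0})
    assert "amount must be a non-negative number" in validate_record({"id": "a1", "amount": -5})

test_validate_record()
```

Keeping validation as a pure function makes the 80%-coverage target cheap to hit, since no Spark or AWS fixtures are needed to test it.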
Data Analyst (Python / Data Engineering)@JPMC
Aug 2019 - Dec 2020
  • Implemented string matching, a rule engine, and n-gram generation to classify Protection Groups against reference data, extracting data from various sources with Python ML libraries.
  • Applied various data-preprocessing techniques to improve dataset accuracy and remove outliers.
  • Developed a Rule Engine in Python to apply business rules to various statements and an N-gram match algorithm using NLTK to compare sentences.
  • Saved 120 person-hours by developing an API that automates classification using Python and a conventional rule-based approach.
  • Achieved 81% classification accuracy with the rule engine, exceeding SME results by more than 15%.
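The n-gram matching above (built with NLTK in the original work) can be sketched with the standard library alone. This illustrative version tokenizes on whitespace, builds bigram sets, and scores sentence similarity with the Jaccard index; the real pipeline's tokenization and scoring may well differ.

```python
# Stdlib sketch of n-gram sentence matching; NLTK's nltk.ngrams would
# produce the same tuples, this version just avoids the dependency.
def ngrams(tokens: list[str], n: int = 2) -> set[tuple[str, ...]]:
    """All contiguous n-grams of a token list, as a set."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity between the n-gram sets of two sentences."""
    ga, gb = ngrams(a.lower().split(), n), ngrams(b.lower().split(), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

# Usage: two statements that differ in one word share 2 of 6 bigrams.
score = ngram_similarity("wire transfer to external account",
                         "wire transfer to internal account")
```

A rule engine can then threshold this score (or combine it with exact string matches) to decide the classification.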
SQL Developer@PepsiCo Client
Apr 2018 - Jul 2019
  • Upgraded SQL Server infrastructure, migrating 33 servers from Unix to Linux, ensuring a seamless transition and improving system performance by 33%.
  • Developed replication views, stored procedures, triggers, and cron-scheduled jobs, using SCP to transfer files and scripts between servers, streamlining operations and reducing manual intervention.
  • Optimized query performance by analyzing execution plans and implementing appropriate indexing strategies, leading to a 28% reduction in query execution time.
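The execution-plan-driven indexing described above can be demonstrated end to end with stdlib sqlite3 as a small stand-in for SQL Server (the `orders` table and its columns are made up for the example): the same query goes from a full table scan to an index search once the right index exists.

```python
# Illustrative index-vs-scan comparison using sqlite3's EXPLAIN QUERY PLAN;
# the table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

query = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index, the planner falls back to a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchone()[-1]

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, the same query becomes an index search.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchone()[-1]

print(plan_before)  # e.g. "SCAN orders"
print(plan_after)   # e.g. "SEARCH orders USING INDEX idx_orders_customer ..."
```

SQL Server's equivalent workflow reads the actual execution plan (`SET STATISTICS PROFILE` or the graphical plan) and adds covering or filtered indexes where scans dominate.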
visitor@portfolio:~$ npm install --save-dev
+ Programming: Python, SQL, Java, Shell
+ Big Data: Apache Spark (PySpark, Spark SQL), Hadoop, Databricks
+ Cloud (AWS): S3, Glue, EMR, Lambda, Redshift, DynamoDB, Kinesis, EC2, RDS, Step Functions
+ Data Warehousing: Snowflake, Redshift
+ Databases: MySQL, Oracle, SQL Server, MongoDB
+ Data Engineering: ETL/ELT, Data Integration, Data Modeling, Data Quality, Data Governance
+ Tools: Airflow, Docker, Git, Jenkins, Power BI, Tableau
visitor@portfolio:~$ cat /etc/education
Master of Science in Computer Science
Wichita State University
Jan 2021 - Jan 2022
Bachelor of Technology in Information Technology
Acharya Nagarjuna University
Jan 2013 - Jan 2017
visitor@portfolio:~$ cat .contact
visitor@portfolio:~$ echo "Thanks for visiting!"
Thanks for visiting!