portfolio@nikhil
visitor@portfolio:~$> whoami

nikhil

visitor@portfolio:~$> cat README.md

Results-driven Data Engineer with nearly 5 years of experience designing, building, and optimizing large-scale data pipelines, cloud architectures, and analytics platforms across the banking, fintech, e-commerce, logistics, and consulting domains. Proven expertise in ETL development, big data processing (Hadoop, Spark, Hive), cloud data engineering (AWS, GCP, Azure), and data warehousing (Snowflake, Amazon Redshift, Google BigQuery). Adept at SQL optimization, dimensional modeling, and data quality governance, delivering measurable gains in processing speed, cost efficiency, and SLA compliance. Skilled in automation with Apache Airflow, AWS Glue, and CI/CD pipelines, enabling near-real-time analytics on datasets exceeding billions of records daily. Strong track record of collaborating with cross-functional teams to deliver business-critical insights, reduce operational costs, and support enterprise-scale decision-making.

visitor@portfolio:~$> experience --list
Data Engineer@JP Morgan Chase & Co.
Feb 2025 - Current [USA]
  • Engineered a high-performance PySpark ETL workflow on Hadoop to process 2.5 billion+ daily transaction records, reducing data processing time by 43% and enabling near-real-time fraud detection for retail banking operations (a minimal sketch of this pattern follows this list).
  • Optimized complex SQL queries in Amazon Redshift for investment portfolio analytics, cutting query runtime from 12 minutes to under 3 minutes and improving reporting SLA compliance from 93% to 99.5%.
  • Architected scalable AWS Glue pipelines to integrate 12 disparate treasury data sources into a centralized data lake, increasing available curated datasets from 25 to 40 for downstream analytics teams.
  • Developed robust Snowflake dimensional models for wealth management KPIs, improving metric consistency across 5 business units and reducing manual reconciliation effort by 18 hours per week.
  • Deployed automated data quality checks in Apache Airflow for commercial banking loan datasets, preventing ingestion of over 4 million erroneous records annually before they reached BI dashboards.
  • Created interactive Power BI dashboards for cross-asset risk exposure, enabling traders and risk officers to monitor 20 million+ daily trade records and reduce risk assessment cycles from 48 hours to under 12 hours.
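  A minimal sketch of the partitioned PySpark batch pattern behind the first bullet above. All paths, column names, and schemas here are illustrative assumptions, not actual JPMC internals:

    # Daily transaction batch: cleanse, aggregate, write date-partitioned output.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("txn-etl-sketch").getOrCreate()

    # Read one day's raw transactions from HDFS (hypothetical path and schema).
    txns = spark.read.parquet("hdfs:///raw/transactions/dt=2025-02-01")

    # Drop malformed rows before any aggregation.
    clean = (
        txns.filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
            .withColumn("txn_date", F.to_date("txn_ts"))
    )

    # Per-account/merchant aggregates feeding downstream fraud features.
    features = (
        clean.groupBy("account_id", "merchant_id", "txn_date")
             .agg(F.count("*").alias("txn_count"),
                  F.sum("amount").alias("txn_total"))
    )

    # Partition output by date so fraud jobs prune to the current day
    # instead of scanning history.
    features.write.mode("overwrite").partitionBy("txn_date") \
        .parquet("hdfs:///curated/txn_features")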
Data Engineer@Uber
Mar 2024 - Feb 2025 [USA]
  • Engineered a high-throughput data ingestion framework using PySpark on Hadoop to process 3.1 TB/day of trip, payment, and driver telemetry data, enabling near-real-time analytics and cutting daily batch runtime by 6.5 hours.
  • Optimized complex SQL queries and partitioning strategies in Amazon Redshift for marketplace demand forecasting models, reducing average query execution time from 18 minutes to under 7 minutes and lowering compute costs by $79,000 annually.
  • Automated 215 daily ETL workflows with Apache Airflow for Uber Eats order, menu, and delivery datasets, achieving 99.8% on-time execution and freeing up 780+ engineering hours per year previously spent on manual job reruns (see the DAG sketch after this list).
  • Migrated 43 TB of historical ride-sharing data from on-premises PostgreSQL to Google BigQuery, enabling analysts to run ad-hoc revenue and demand queries in under 30 seconds versus 2–3 minutes previously.
  • Implemented robust data quality checks using AWS Glue with schema validation and anomaly detection, blocking ingestion of ~1.26 million corrupt or incomplete records each month and safeguarding compliance-critical datasets.
  • Developed dimensional models (Star/Snowflake schema) for Tableau dashboards tracking driver incentives across 150+ U.S. cities, unlocking insights that guided $12.4M in optimized incentive spending over a fiscal year.
  • Deployed infrastructure-as-code with Terraform for provisioning scalable AWS EMR clusters, cutting environment setup time from 3 days to 3.5 hours and supporting surge loads of up to 95,000 concurrent trip events without downtime.
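  A minimal Airflow sketch of the scheduled-ETL-with-retries pattern from the Airflow bullet above. The DAG id, schedule, and callables are illustrative assumptions; the retry defaults stand in for the manual reruns the bullet mentions:

    from datetime import datetime, timedelta

    # Airflow 2.x-style imports.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        ...  # placeholder: pull the day's order/menu/delivery extracts

    def load(**context):
        ...  # placeholder: load validated extracts into the warehouse

    with DAG(
        dag_id="eats_orders_daily_sketch",  # hypothetical name
        start_date=datetime(2024, 3, 1),
        schedule="@daily",  # spelled "schedule_interval" on Airflow < 2.4
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)
        extract_task >> load_task

  Automatic retries with a delay are what turn a transient failure into a self-healing run rather than a manual rerun.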
Data Engineer@Capgemini
Jun 2021 - Aug 2022 [India]
  • Engineered a scalable ETL framework using Apache Airflow, automating ingestion from 12+ disparate sources for a global retail client, cutting manual workload by 320 hours per month and enabling near real-time reporting for 45 business units.
  • Optimized complex analytical queries in Hive by applying partitioning and bucketing strategies, reducing average execution time from 18 minutes to under 4 minutes across 75 million records, which accelerated churn analysis cycles by 3 days (a layout sketch follows this list).
  • Deployed a secure, serverless data processing solution on AWS Glue for a financial services project, processing over 2 TB of daily transactions while maintaining a 99.99% job success rate and passing PCI-DSS compliance audits.
  • Modeled enterprise data in Snowflake using a star schema for a healthcare provider, supporting analytics on 28 million patient records and improving dashboard load times from 15 seconds to under 5 seconds.
  • Automated CI/CD pipelines with Jenkins, enabling continuous deployment of PySpark jobs to EMR clusters, reducing release cycles from 14 days to 1 day and saving an estimated ₹18 lakh annually in operational costs.
  • Integrated structured and semi-structured IoT data into Cassandra, enabling real-time analytics for 1.2 million sensor readings per hour and increasing predictive maintenance accuracy by 17% for manufacturing equipment.
  • Delivered executive dashboards in Power BI for a logistics company, consolidating KPIs from 32 warehouses and cutting monthly reporting effort from 5 days to 4 hours, directly improving decision turnaround for supply chain planning.
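  A sketch of the partition-plus-bucket table layout behind the Hive tuning bullet above, expressed through PySpark's Hive support. The database, table, and column names are made up for illustration:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-layout-sketch")
        .enableHiveSupport()  # back the catalog with the Hive metastore
        .getOrCreate()
    )

    events = spark.read.parquet("hdfs:///staging/retail_events")

    spark.sql("CREATE DATABASE IF NOT EXISTS retail")

    # Partition by date for coarse pruning; bucket by customer_id so joins
    # and sampled scans touch a fixed subset of files per key.
    (
        events.write
        .mode("overwrite")
        .partitionBy("event_date")
        .bucketBy(16, "customer_id")
        .sortBy("customer_id")
        .saveAsTable("retail.events_curated")  # bucketBy requires saveAsTable
    )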
Data Engineer@KPMG
Jul 2019 - May 2021 [India]
  • Engineered a scalable data ingestion framework in Python, consolidating structured and semi-structured feeds from 12+ client systems, reducing onboarding time for new datasets from 10 days to 6 days.
  • Optimized ETL workflows on Apache Spark (PySpark) to process 1.8 billion audit records monthly, cutting average runtime from 7 hours to 2.5 hours and saving 420 compute hours per month.
  • Orchestrated cloud-based data pipelines on AWS Glue for risk analytics, enabling near real-time updates of 2 TB/day of regulatory data across 3 global audit teams.
  • Deployed a centralized warehouse in Snowflake, storing and serving 15 TB of cross-domain financial data with query performance improved from 90 seconds to 18 seconds.
  • Designed optimized analytical queries in PostgreSQL, reducing report generation time from 12 minutes to under 5 minutes for compliance audits impacting 50+ government departments.
  • Automated workflow scheduling in Apache Airflow, integrating 54 dependent jobs, achieving 99.9% pipeline uptime over 12 consecutive months.
  • Implemented a CI/CD process with Jenkins, delivering 180+ production deployments annually with zero downtime and reducing release turnaround from 7 days to under 24 hours.
  • Modeled a dimensional data structure using a Star Schema, ensuring validated lineage for 20+ KPIs used in quarterly regulatory filings worth ₹8,500 crore in reported assets (a minimal star-schema sketch follows this list).
  • Delivered interactive risk dashboards in Tableau, consolidating 5 disparate data sources and helping audit managers cut decision-making time from 2 days to less than 8 hours.
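  A minimal PySpark sketch of the star-schema pattern from the dimensional-modeling bullet above. The input schema and warehouse paths are illustrative assumptions:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("star-schema-sketch").getOrCreate()

    raw = spark.read.parquet("hdfs:///staging/audit_records")

    # Dimension: one row per department, keyed by a surrogate id.
    dim_department = (
        raw.select("department_code", "department_name").dropDuplicates()
           .withColumn("department_key", F.monotonically_increasing_id())
    )

    # Fact: measures carry only the surrogate key, not descriptive columns,
    # which keeps lineage for each KPI traceable through one dimension table.
    fact_filings = (
        raw.join(dim_department, ["department_code", "department_name"])
           .select("department_key", "filing_date", "reported_amount")
    )

    dim_department.write.mode("overwrite").parquet("hdfs:///warehouse/dim_department")
    fact_filings.write.mode("overwrite").parquet("hdfs:///warehouse/fact_filings")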
visitor@portfolio:~$> npm install --save-dev
+ Programming & Scripting: Python, SQL, Scala, Shell Scripting
+ Big Data Technologies: Hadoop, Spark (PySpark/Scala), Hive, Pig, HDFS
+ Cloud Platforms: AWS (S3, EMR, Lambda, Redshift, Glue), Azure Data Factory, GCP (BigQuery, Cloud Storage)
+ ETL & Data Pipelines: Apache Airflow, Talend, Informatica, NiFi, AWS Glue
+ Data Warehousing: Amazon Redshift, Snowflake, Google BigQuery, Teradata
+ Databases: PostgreSQL, MySQL, Oracle, NoSQL (MongoDB, Cassandra)
+ DevOps & CI/CD: Git, Jenkins, Docker, Terraform
+ Data Modeling & Governance: Dimensional Modeling, Star/Snowflake Schema, Data Lineage, Data Quality
+ Analytics & BI Tools: Tableau, Power BI, Looker, Superset
+ Version Control & Collaboration: GitHub, Bitbucket, Jira, Confluence
visitor@portfolio:~$> cat /etc/education
Master of Science in Information Technology and Management
Webster University
Aug 2022 - May 2024 | Missouri, USA
Bachelor of Technology in Computer Science
Jawaharlal Nehru Technological University
Jul 2015 - Dec 2019 | Hyderabad, India
visitor@portfolio:~$> cat .contact
visitor@portfolio:~$> echo "Thanks for visiting!"
Thanks for visiting!