portfolio@anil-kumar-mandava
visitor@portfolio:~$> whoami

Versatile Data Engineer with over 8 years of experience building scalable data infrastructure and unified go-to-market (GTM) data foundations. Expert in developing robust ELT/ETL pipelines and identity resolution logic across CRM, marketing automation, and cloud data warehouses. Proven ability to implement Retrieval-Augmented Generation (RAG) and Large Language Model (LLM) workflows for production revenue intelligence applications. Proficient in Python, SQL, and distributed systems on AWS and Google Cloud Platform (GCP). Committed to delivering high-quality data assets and predictive models that drive revenue activation and business growth.

visitor@portfolio:~$> experience --list
Data Engineer@Walmart
Dec 2023 - Present [Sunnyvale, CA]
  • Developed Go-to-Market (GTM) data foundations on Google Cloud Platform (GCP) to unify CRM and product telemetry data.
  • Built Retrieval-Augmented Generation (RAG) pipelines utilizing vector databases to automate identity resolution across disparate datasets.
  • Implemented Machine Learning (ML) evaluation workflows on GCP to score pipeline likelihood and customer expansion propensity.
  • Constructed distributed ELT pipelines to migrate massive GTM datasets to BigQuery, establishing canonical schemas for revenue activation.
  • Optimized data processing through partitioning and caching strategies to improve the cost efficiency of advertising measurement workloads.
  • Collaborated with cross-functional teams to implement automated observability dashboards for monitoring GTM data quality and lineage.
Data Engineer@Nike
Jan 2023 - Dec 2023 [Portland, OR]
  • Designed AWS-based data ingestion services for Salesforce and marketing automation signals using Lambda and Kinesis.
  • Integrated Generative AI (GenAI) and Large Language Model (LLM) orchestration to automate metadata extraction for prospect intelligence.
  • Built event-driven architectures on Amazon Simple Notification Service (SNS) to trigger and activate real-time revenue signals.
  • Implemented Snowflake data models to support account-based marketing (ABM) prioritization and ideal customer profile (ICP) scoring on high-velocity telemetry data.
  • Developed dbt models with comprehensive data quality tests to ensure governance across unified AWS-based revenue data layers.
  • Optimized Spark performance using predicate pushdown and broadcast joins to reduce infrastructure costs for revenue modeling.
Data Engineer@Infosys
Apr 2020 - May 2022 [Bangalore, India]
  • Developed real-time ingestion pipelines on AWS for CRM and billing data using Apache NiFi with fault-tolerant error handling.
  • Improved ELT efficiency by migrating legacy workflows to optimized Apache Spark applications on Amazon EMR.
  • Used Databricks to productionize segmentation and scoring models for pipeline predictability and trial health.
  • Built Java-based automation tools to monitor job lineage and trigger workflows via RESTful APIs for GTM system integration.
  • Integrated Redshift and Snowflake for high-performance warehousing to support enterprise-wide revenue intelligence requirements.
  • Refactored distributed codebases to align with modern architectural standards and ensured high-quality delivery through unit testing.
Big Data Developer@Lince Soft Solutions
Feb 2018 - Mar 2020 [India]
  • Managed Hadoop cluster operations to support large-scale processing of prospect intelligence and ecosystem signals.
  • Developed custom MapReduce programs in Java to cleanse and deduplicate millions of CRM records for improved targeting.
  • Optimized Hive queries using vectorized execution and columnar storage formats such as ORC to accelerate GTM program reporting.
  • Built data quality frameworks to validate record counts and schema conformance across billing and enrichment datasets.
  • Implemented data aggregation agents to centralize telemetry logs from multiple sources into a unified system for observability.
  • Troubleshot issues in complex UNIX/Linux environments to ensure high availability of data processing for revenue workflows.
Software Engineer@MicroSpark Software Solutions
Feb 2016 - Feb 2018 [India]
  • Developed backend components for enterprise data management applications using Spring MVC and Hibernate ORM.
  • Built multithreaded batch processors in Java to support high-concurrency loading of large sales datasets into SQL databases.
  • Designed RESTful APIs following OpenAPI standards to support secure integration between GTM systems and external partners.
  • Implemented Data Access Object (DAO) layers and optimized connection pooling to improve the efficiency of database interactions.
  • Wrote complex SQL queries and PL/SQL packages for high-performance reporting and transformation of raw sales logs.
  • Enhanced core application performance by optimizing data structures and algorithms in the backend revenue processing layer.
visitor@portfolio:~$> npm install --save-dev
+Programming: Python, Java, Scala, SQL, Go, TypeScript, Shell Scripting
+Cloud & Platforms: Google Cloud Platform (GCP), Amazon Web Services (AWS), Snowflake, Amazon Redshift, BigQuery
+Data Engineering: ELT/ETL Pipelines, Apache Spark, Databricks, Apache Airflow, dbt, Apache Kafka, Apache NiFi
+AI & Machine Learning: Large Language Models (LLM), Retrieval-Augmented Generation (RAG), MLflow, Vector Databases, Statistical Modeling
+Databases & Storage: MongoDB, Cassandra, Postgres, pgvector, Amazon S3, Identity Resolution
+DevOps & Tools: Docker, Kubernetes, Terraform, Pulumi, Jenkins, Git, Data Governance
visitor@portfolio:~$> cat /etc/education
Master’s in Computer Science with AI and ML
University of Central Missouri | MO, US
Bachelor’s in Computer Science and Engineering (CSE)
UshaRama College of Engineering and Technology, JNTUK | AP, India
visitor@portfolio:~$> cat .contact
visitor@portfolio:~$> echo "Thanks for visiting!"
    Anil Kumar Mandava — Data Engineer | grad.jobs