Versatile Data Engineer with over 8 years of experience building scalable data infrastructure and unified GTM data foundations. Expert in developing robust ELT/ETL pipelines and identity resolution logic across CRM, marketing automation, and cloud warehouses. Proven ability to implement Retrieval-Augmented Generation (RAG) and Large Language Model (LLM) workflows for production-level revenue intelligence applications. Technically proficient in Python, SQL, and distributed systems within AWS and Google Cloud Platform (GCP) environments. Committed to delivering high-quality data assets and predictive models that drive revenue activation and business growth.
- Developed Go-to-Market (GTM) data foundations on Google Cloud Platform (GCP) to unify CRM and product telemetry data.
- Built Retrieval-Augmented Generation (RAG) pipelines utilizing vector databases to automate identity resolution across disparate datasets.
- Implemented Machine Learning (ML) evaluation workflows on GCP to score pipeline likelihood and customer expansion propensity.
- Constructed distributed ELT pipelines to migrate massive GTM datasets to BigQuery, establishing canonical schemas for revenue activation.
- Optimized data processing logic through partitioning and caching strategies to maximize cost efficiency for advertising measurement.
- Collaborated with cross-functional teams to implement automated observability dashboards for monitoring GTM data quality and lineage.
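The identity resolution work above can be illustrated with a minimal sketch. This is a hypothetical, self-contained example (not the production logic): records that share any matching key, such as email or phone, are clustered with a union-find structure, which is a common deterministic first pass before fuzzier matching.

```python
from collections import defaultdict

def resolve_identities(records, keys=("email", "phone")):
    """Cluster CRM records that share any matching key value (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link each record to the first record seen with the same key value.
    seen = {}
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value:
                if (key, value) in seen:
                    union(idx, seen[(key, value)])
                else:
                    seen[(key, value)] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(records[idx])
    return list(clusters.values())

# Records 1 and 2 share an email; 2 and 3 share a phone, so all three merge.
records = [
    {"id": 1, "email": "a@x.com", "phone": "111"},
    {"id": 2, "email": "a@x.com", "phone": "222"},
    {"id": 3, "email": "b@y.com", "phone": "222"},
    {"id": 4, "email": "c@z.com"},
]
clusters = resolve_identities(records)
```

Transitive linking is the point of the union-find: record 3 never shares a field with record 1 directly, yet both land in the same cluster via record 2.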
- Designed AWS-based data ingestion services for Salesforce and marketing automation signals using Lambda and Kinesis.
- Integrated Generative AI (GenAI) and Large Language Model (LLM) orchestration to automate metadata extraction for prospect intelligence.
- Built event-driven architectures using Amazon Simple Notification Service (SNS) to trigger real-time revenue signal activation.
- Implemented Snowflake data models to support ABM prioritization and ICP scoring for high-velocity telemetry data.
- Developed dbt models with comprehensive data quality tests to ensure governance across unified AWS-based revenue data layers.
- Optimized Spark performance using predicate pushdown and broadcast joins to reduce infrastructure costs for revenue modeling.
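The broadcast-join optimization above rests on a simple idea: when one side of a join is small, ship it whole to every worker and join locally with a hash lookup, so the large side is never shuffled. A plain-Python sketch of that hash-join pattern (illustrative table names, not the production job):

```python
def broadcast_join(facts, dim, key):
    """Hash-join a large fact list against a small dimension table.

    In Spark, broadcasting `dim` to every executor lets each partition
    of `facts` join locally, avoiding a shuffle of the large side.
    """
    lookup = {row[key]: row for row in dim}  # the "broadcast" side
    joined = []
    for fact in facts:
        match = lookup.get(fact[key])
        if match is not None:  # inner-join semantics
            joined.append({**fact, **match})
    return joined

facts = [{"account_id": i, "amount": 100 * i} for i in range(1, 4)]
dim = [
    {"account_id": 1, "segment": "ENT"},
    {"account_id": 3, "segment": "SMB"},
]
result = broadcast_join(facts, dim, "account_id")
```

In PySpark itself, the same intent is expressed by wrapping the small DataFrame in `broadcast(...)` from `pyspark.sql.functions`.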
- Developed real-time ingestion pipelines on AWS for CRM and billing data using Apache NiFi with fault-tolerant error handling.
- Improved ELT efficiency by migrating legacy workflows to optimized Apache Spark applications within AWS EMR environments.
- Utilized Databricks to productionize segmentation and scoring models for pipeline predictability and trial health.
- Built Java-based automation tools to monitor job lineage and trigger workflows via RESTful APIs for GTM system integration.
- Integrated Redshift and Snowflake for high-performance warehousing to support enterprise-wide revenue intelligence requirements.
- Refactored distributed codebases to align with modern architectural standards and ensured high-quality delivery through unit testing.
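Monitoring job lineage, as in the automation tooling above, reduces to walking a dependency graph: downstream jobs fire only after every upstream dependency succeeds. A minimal sketch with Python's standard-library `graphlib` and hypothetical job names:

```python
from graphlib import TopologicalSorter

# Hypothetical GTM job lineage: each job maps to its upstream dependencies.
lineage = {
    "stg_salesforce": set(),
    "stg_billing": set(),
    "unified_accounts": {"stg_salesforce", "stg_billing"},
    "revenue_scoring": {"unified_accounts"},
}

# A valid execution order: every job appears after all of its upstreams.
order = list(TopologicalSorter(lineage).static_order())
```

An orchestrator can walk `order` sequentially, or use the sorter's `get_ready()`/`done()` protocol to dispatch independent jobs (here the two staging loads) concurrently.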
- Maintained Hadoop cluster operations to support large-scale processing of prospect intelligence and ecosystem signals.
- Developed custom MapReduce programs in Java to cleanse and deduplicate millions of CRM records for improved targeting.
- Optimized Hive queries using vectorized execution and columnar storage formats like ORC to accelerate GTM program reporting.
- Built data quality frameworks to validate record counts and schema conformance across billing and enrichment datasets.
- Implemented data aggregation agents to centralize telemetry logs from multiple sources into a unified system for observability.
- Troubleshot complex UNIX/Linux environments to ensure high availability of data processing for revenue workflows.
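A data quality framework of the kind described above typically checks two things per batch: the record count matches what the source reported, and every row conforms to the expected schema. A minimal sketch with an invented `invoice` schema:

```python
def validate_batch(rows, required_fields, expected_count=None):
    """Return a list of human-readable data-quality violations for one batch."""
    violations = []
    if expected_count is not None and len(rows) != expected_count:
        violations.append(
            f"count mismatch: got {len(rows)}, expected {expected_count}"
        )
    for i, row in enumerate(rows):
        missing = required_fields - row.keys()
        if missing:
            violations.append(f"row {i}: missing fields {sorted(missing)}")
    return violations

# One short batch, one row missing a required field -> two violations.
rows = [
    {"invoice_id": "A1", "amount": 10.0},
    {"invoice_id": "A2"},
]
issues = validate_batch(rows, required_fields={"invoice_id", "amount"},
                        expected_count=3)
```

Returning violations as data, rather than raising on the first failure, lets a pipeline log every problem in a batch and decide separately whether to quarantine or fail the load.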
- Developed backend components for enterprise data management applications using Spring MVC and Hibernate ORM.
- Built multithreaded batch processors in Java to facilitate high-concurrency loading of large sales datasets into SQL databases.
- Designed RESTful APIs following OpenAPI standards to support secure integration between GTM systems and external partners.
- Implemented Data Access Object (DAO) layers and optimized connection pooling to increase efficiency of database interactions.
- Wrote complex SQL queries and PL/SQL packages for high-performance reporting and transformation of raw sales logs.
- Enhanced core application performance by optimizing data structures and algorithms in the backend revenue processing layer.
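The batch-loading pattern behind the Java processors above can be shown in a few lines of Python with `sqlite3` and an invented `sales` schema: rows are inserted in fixed-size chunks via `executemany`, with one commit per chunk so transactions stay small and a failure loses at most one batch.

```python
import sqlite3

def load_in_batches(conn, rows, batch_size=1000):
    """Insert (account, amount) rows in fixed-size batches."""
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(
            "INSERT INTO sales (account, amount) VALUES (?, ?)", batch
        )
        conn.commit()  # one transaction per batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (account TEXT, amount REAL)")
load_in_batches(conn, [("acme", 10.0)] * 2500, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

The same chunk-and-commit structure maps directly onto JDBC batch inserts (`addBatch`/`executeBatch`), with each chunk handed to a worker thread for concurrent loading.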