Versatile Data Engineer with over 8 years of experience building scalable data infrastructure and unified GTM data foundations. Expert in developing robust ELT/ETL pipelines and identity resolution logic across CRM, marketing automation, and cloud warehouses. Proven ability to implement Retrieval-Augmented Generation (RAG) and Large Language Model (LLM) workflows for production-level revenue intelligence applications. Technically proficient in Python, SQL, and distributed systems within AWS and Google Cloud Platform (GCP) environments. Committed to delivering high-quality data assets and predictive models that drive revenue activation and business growth.
- Developed Go-to-Market (GTM) data foundations on Google Cloud Platform (GCP) to unify CRM and product telemetry data.
- Built Retrieval-Augmented Generation (RAG) pipelines utilizing vector databases to automate identity resolution across disparate datasets.
- Implemented Machine Learning (ML) evaluation workflows on GCP to score pipeline likelihood and customer expansion propensity.
- Constructed distributed ELT pipelines to migrate massive GTM datasets to BigQuery, establishing canonical schemas for revenue activation.
- Optimized data processing logic through partitioning and caching strategies to maximize cost efficiency for advertising measurement.
- Collaborated with cross-functional teams to implement automated observability dashboards for monitoring GTM data quality and lineage.
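The identity resolution work above can be illustrated with a minimal sketch. This is a hypothetical, self-contained example (not the production logic): records that share any matching key, such as email or phone, are clustered with a union-find structure, which is a common deterministic first pass before fuzzier matching.

```python
from collections import defaultdict

def resolve_identities(records, keys=("email", "phone")):
    """Cluster CRM records that share any matching key value (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link each record to the first record seen with the same key value.
    seen = {}
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value:
                if (key, value) in seen:
                    union(idx, seen[(key, value)])
                else:
                    seen[(key, value)] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(records[idx])
    return list(clusters.values())

# Records 1 and 2 share an email; 2 and 3 share a phone, so all three merge.
records = [
    {"id": 1, "email": "a@x.com", "phone": "111"},
    {"id": 2, "email": "a@x.com", "phone": "222"},
    {"id": 3, "email": "b@y.com", "phone": "222"},
    {"id": 4, "email": "c@z.com"},
]
clusters = resolve_identities(records)
```

Transitive linking is the point of the union-find: record 3 never shares a field with record 1 directly, yet both land in the same cluster via record 2.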
- Designed AWS-based data ingestion services for Salesforce and marketing automation signals using Lambda and Kinesis.
- Integrated Generative AI (GenAI) and Large Language Model (LLM) orchestration to automate metadata extraction for prospect intelligence.
- Built event-driven architectures using Amazon Simple Notification Service (SNS) to trigger real-time revenue signal activation.
- Implemented Snowflake data models to support ABM prioritization and ICP scoring for high-velocity telemetry data.
- Developed dbt models with comprehensive data quality tests to ensure governance across unified AWS-based revenue data layers.
- Optimized Spark performance using predicate pushdown and broadcast joins to reduce infrastructure costs for revenue modeling.
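The broadcast-join optimization above rests on a simple idea: when one side of a join is small, ship it whole to every worker and join locally with a hash lookup, so the large side is never shuffled. A plain-Python sketch of that hash-join pattern (illustrative table names, not the production job):

```python
def broadcast_join(facts, dim, key):
    """Hash-join a large fact list against a small dimension table.

    In Spark, broadcasting `dim` to every executor lets each partition
    of `facts` join locally, avoiding a shuffle of the large side.
    """
    lookup = {row[key]: row for row in dim}  # the "broadcast" side
    joined = []
    for fact in facts:
        match = lookup.get(fact[key])
        if match is not None:  # inner-join semantics
            joined.append({**fact, **match})
    return joined

facts = [{"account_id": i, "amount": 100 * i} for i in range(1, 4)]
dim = [
    {"account_id": 1, "segment": "ENT"},
    {"account_id": 3, "segment": "SMB"},
]
result = broadcast_join(facts, dim, "account_id")
```

In PySpark itself, the same intent is expressed by wrapping the small DataFrame in `broadcast(...)` from `pyspark.sql.functions`.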
- Developed real-time ingestion pipelines on AWS for CRM and billing data using Apache NiFi with fault-tolerant error handling.
- Improved ELT efficiency by migrating legacy workflows to optimized Apache Spark applications within AWS EMR environments.
- Utilized Databricks to productionize segmentation and scoring models for pipeline predictability and trial health.
- Built Java-based automation tools to monitor job lineage and trigger workflows via RESTful APIs for GTM system integration.
- Integrated Redshift and Snowflake for high-performance warehousing to support enterprise-wide revenue intelligence requirements.
- Refactored distributed codebases to align with modern architectural standards and ensured high-quality delivery through unit testing.
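Monitoring job lineage, as in the automation tooling above, reduces to walking a dependency graph: downstream jobs fire only after every upstream dependency succeeds. A minimal sketch with Python's standard-library `graphlib` and hypothetical job names:

```python
from graphlib import TopologicalSorter

# Hypothetical GTM job lineage: each job maps to its upstream dependencies.
lineage = {
    "stg_salesforce": set(),
    "stg_billing": set(),
    "unified_accounts": {"stg_salesforce", "stg_billing"},
    "revenue_scoring": {"unified_accounts"},
}

# A valid execution order: every job appears after all of its upstreams.
order = list(TopologicalSorter(lineage).static_order())
```

An orchestrator can walk `order` sequentially, or use the sorter's `get_ready()`/`done()` protocol to dispatch independent jobs (here the two staging loads) concurrently.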
- Maintained Hadoop cluster operations to support large-scale processing of prospect intelligence and ecosystem signals.
- Developed custom MapReduce programs in Java to cleanse and deduplicate millions of CRM records for improved targeting.
- Optimized Hive queries using vectorized execution and columnar storage formats like ORC to accelerate GTM program reporting.
- Built data quality frameworks to validate record counts and schema conformance across billing and enrichment datasets.
- Implemented data aggregation agents to centralize telemetry logs from multiple sources into a unified system for observability.
- Troubleshot complex UNIX/Linux environments to ensure high availability of data processing for revenue workflows.
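A data quality framework of the kind described above typically checks two things per batch: the record count matches what the source reported, and every row conforms to the expected schema. A minimal sketch with an invented `invoice` schema:

```python
def validate_batch(rows, required_fields, expected_count=None):
    """Return a list of human-readable data-quality violations for one batch."""
    violations = []
    if expected_count is not None and len(rows) != expected_count:
        violations.append(
            f"count mismatch: got {len(rows)}, expected {expected_count}"
        )
    for i, row in enumerate(rows):
        missing = required_fields - row.keys()
        if missing:
            violations.append(f"row {i}: missing fields {sorted(missing)}")
    return violations

# One short batch, one row missing a required field -> two violations.
rows = [
    {"invoice_id": "A1", "amount": 10.0},
    {"invoice_id": "A2"},
]
issues = validate_batch(rows, required_fields={"invoice_id", "amount"},
                        expected_count=3)
```

Returning violations as data, rather than raising on the first failure, lets a pipeline log every problem in a batch and decide separately whether to quarantine or fail the load.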
- Developed backend components for enterprise data management applications using Spring MVC and Hibernate ORM.
- Built multithreaded batch processors in Java to facilitate high-concurrency loading of large sales datasets into SQL databases.
- Designed RESTful APIs following OpenAPI standards to support secure integration between GTM systems and external partners.
- Implemented Data Access Object (DAO) layers and optimized connection pooling to increase efficiency of database interactions.
- Wrote complex SQL queries and PL/SQL packages for high-performance reporting and transformation of raw sales logs.
- Enhanced core application performance by optimizing data structures and algorithms in the backend revenue processing layer.
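The batch-loading pattern behind the Java processors above can be shown in a few lines of Python with `sqlite3` and an invented `sales` schema: rows are inserted in fixed-size chunks via `executemany`, with one commit per chunk so transactions stay small and a failure loses at most one batch.

```python
import sqlite3

def load_in_batches(conn, rows, batch_size=1000):
    """Insert (account, amount) rows in fixed-size batches."""
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(
            "INSERT INTO sales (account, amount) VALUES (?, ?)", batch
        )
        conn.commit()  # one transaction per batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (account TEXT, amount REAL)")
load_in_batches(conn, [("acme", 10.0)] * 2500, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

The same chunk-and-commit structure maps directly onto JDBC batch inserts (`addBatch`/`executeBatch`), with each chunk handed to a worker thread for concurrent loading.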