Portfolio

Charitha Sree Sakhamuri

Data Engineer

Data Engineer with 3+ years of experience designing and operating scalable data pipelines, analytical data models, and cloud data platforms. Strong expertise in SQL, Python, and Spark for transforming raw, high-volume data into trusted datasets used for reporting, optimization, and AI/ML workflows. Proven ability to translate business and commercialization requirements into robust data models while enforcing data quality, privacy, and governance standards.

[email protected] (678)-739-1251

Experience

Data Engineer

Saayam For ALL — Remote, USA

July 2025 — present
  • Built a real-time data processing pipeline using Apache Kafka and Apache Flink to ingest and process high-volume event streams, enabling near real-time reporting and operational insights.
  • Designed and implemented scalable batch and streaming workflows on Databricks (Spark Structured Streaming) over an AWS S3 data lake, following Medallion (Bronze/Silver/Gold) architecture.
  • Developed and managed production-grade DAGs in Apache Airflow to orchestrate ingestion, transformation, and validation workflows across multiple data sources.
  • Engineered optimized data transformations using Python, PySpark, and advanced SQL (window functions, CTEs, performance tuning) to handle large-scale structured datasets efficiently.
  • Integrated AWS services including S3, Lambda, IAM, and CloudWatch to manage storage, automation, access control, and monitoring of data pipelines.
  • Improved pipeline performance and reliability by implementing Delta Lake schema enforcement, partitioning strategies, and streaming data validation checks, reducing latency and failure rates.

Data Engineer

RecVue — Hyderabad, Telangana

July 2022 — May 2023
  • Translated business and commercialization requirements into scalable conceptual, logical, and physical data models supporting revenue, usage, and churn analytics.
  • Designed and maintained batch data pipelines using Python, SQL, and Spark to transform raw source data into analytics-ready and ML-ready datasets.
  • Implemented a data quality framework covering schema enforcement, null checks, duplicate detection, and aggregation accuracy.
  • Ensured proper handling of sensitive data by enforcing consistent schemas, controlled access patterns, and governed data exposure.
  • Built structured warehouse tables optimized for complex analytical queries and downstream reporting.
  • Optimized Spark transformations and SQL queries, improving pipeline reliability and reporting consistency by ~20%.
  • Stored and managed large-scale structured datasets in cloud object storage (AWS S3).

Junior Data Engineer

Vijeta High School — Guntur, Andhra Pradesh

August 2021 — June 2022
  • Designed and maintained ETL pipelines integrating operational and reporting data from multiple source systems.
  • Modeled structured datasets and aggregated tables to support dashboards and recurring analytical reports.
  • Developed and optimized complex SQL queries for time-based analysis and performance reporting.

Data Engineer Intern

Skill Banc — Hyderabad, Telangana

May 2021 — July 2021
  • Assisted in building foundational data ingestion and transformation pipelines.
  • Performed data validation, preprocessing, and schema alignment to support analytics workflows.
  • Wrote optimized SQL queries for internal reporting and insights.
  • Improved batch pipeline performance through Spark job tuning and query optimization.

Expertise

Languages & Querying: SQL (advanced joins, window functions, CTEs, query optimization), Python, PySpark
Data Processing & Platforms: Apache Spark, Databricks, Delta Lake, Spark Structured Streaming
Orchestration: Apache Airflow (DAG design & scheduling), Dagster
Streaming & Real-Time Systems: Apache Kafka, AWS Kinesis, Apache Flink
Data Architecture & Modeling: Medallion Architecture (Bronze/Silver/Gold), Star & Snowflake schema, Conceptual/Logical/Physical modeling
Cloud & Storage: AWS (S3, IAM, Lambda), Cloud Data Warehousing (Snowflake/Redshift/BigQuery-style systems)
Data Lake & Lakehouse: S3 Data Lake, Delta Lake, Partitioning strategies, Schema enforcement
Containerization & DevOps: Docker, CI/CD pipelines (GitHub Actions/Jenkins), Git
Analytics & BI: Tableau, Power BI, Excel

Education

Master of Science

The University of Texas at Arlington

Aug 2023 — May 2025

Bachelor of Science

ICFAI University, Hyderabad

July 2019 — May 2023

Contact

Let's Talk

[email protected] (678)-739-1251
