Location
Dallas, TX
Salary
Not specified
Type
Full-time
Posted
Today
Job Description
Job Title: Data Engineer
Location: Dallas/Plano, TX or Middletown, NJ (onsite) – locals only, within 1 hr. of the NJ site; no NY candidates
Duration: 6+ month contract
Rate: $60-62/hr. W2
MOI: Phone/Video (MS Teams)
Visa: GC/USC
Client: Telecommunications Company
Active LinkedIn profile with local ID proof required. Last 4 of SSN & DOB required with submission.
Job Description:
Data Engineer (Streaming & Full Stack Databricks)
Telecom experience is a major requirement, especially with AT&T.
Role Summary
We are seeking a high-performing Data Engineer to design and implement a real-time data platform built on the Medallion Architecture. You will own the end-to-end development of data pipelines: ingesting real-time source data into Bronze, transforming it into a relational Silver layer, and delivering high-concurrency, consumption-ready JSON Gold tables. You will act as a "Full Stack" data professional, handling everything from infrastructure automation (DataOps) to complex nested data modeling.
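The Bronze → Silver → Gold flow described above can be sketched with plain Python dictionaries. This is a toy illustration of the layer semantics only, not Databricks code; the field names and the null-check "data contract" are made-up examples:

```python
import json

# Bronze: raw events exactly as ingested (duplicates and nulls preserved).
bronze = [
    {"account_id": "A1", "plan": "5G", "event_ts": "2024-01-01T10:00:00"},
    {"account_id": "A1", "plan": "5G", "event_ts": "2024-01-01T10:00:00"},  # duplicate
    {"account_id": "A2", "plan": None, "event_ts": "2024-01-01T11:00:00"},  # fails contract
]

# Silver: cleansed, deduplicated, relational rows.
seen = set()
silver = []
for row in bronze:
    key = (row["account_id"], row["event_ts"])
    if key in seen or row["plan"] is None:
        continue  # drop duplicates and rows violating the data contract
    seen.add(key)
    silver.append(row)

# Gold: consumption-ready nested JSON, one document per account.
gold = {
    row["account_id"]: json.dumps({"account": {"id": row["account_id"], "plan": row["plan"]}})
    for row in silver
}
print(gold["A1"])
```

In a real pipeline each layer is a Delta table and the hops are streaming jobs, but the contract at each boundary is the same: Bronze is append-only raw, Silver is deduplicated and typed, Gold is shaped for the consumer.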
Key Responsibilities
- Real-Time Ingestion:
Build scalable ingestion pipelines using Auto Loader and Spark Structured Streaming to capture data from Kafka, Event Hubs, or CDC sources into raw Delta tables.
- Relational Transformation:
Develop ELT logic to cleanse, deduplicate, and normalize data into a relational format. Ensure ACID compliance and "exactly-once" processing semantics.
- JSON API Optimization:
Design and build the Gold layer specifically for client consumption. This involves flattening/nesting data into optimized JSON structures within Delta tables to support low-latency API queries.
- Advanced Orchestration:
Implement and manage complex workflows using Delta Live Tables (DLT) or Standard Streaming Live tables and Databricks Workflows to ensure data freshness and lineage.
- Governance & Security:
Use Unity Catalog to enforce fine-grained access control (row/column level) and maintain a searchable data catalog for consuming clients.
- DataOps \& Automation:
Own the deployment lifecycle using Databricks Asset Bundles (DABs) and CI/CD pipelines (GitHub Actions/Azure DevOps) to ensure reproducible environments.
- Performance Tuning:
Optimize streaming triggers, watermarking, and stateful processing to minimize latency and manage cloud costs effectively.
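The watermarking and late-data handling called out above can be illustrated with a small pure-Python simulation of the semantics Spark Structured Streaming's `withWatermark` applies. The 10-minute delay and the event records are illustrative assumptions:

```python
from datetime import datetime, timedelta

WATERMARK_DELAY = timedelta(minutes=10)

def apply_watermark(events):
    """Keep events no older than (max event time seen so far - delay).

    Mirrors the effect of withWatermark(): records arriving after the
    watermark has already passed their event time are dropped.
    """
    max_seen = datetime.min
    kept, dropped = [], []
    for ev in events:  # events arrive in processing-time order
        max_seen = max(max_seen, ev["event_time"])
        if ev["event_time"] >= max_seen - WATERMARK_DELAY:
            kept.append(ev)
        else:
            dropped.append(ev)
    return kept, dropped

ts = lambda h, m: datetime(2024, 1, 1, h, m)
stream = [
    {"id": 1, "event_time": ts(10, 0)},
    {"id": 2, "event_time": ts(10, 20)},  # advances the watermark to 10:10
    {"id": 3, "event_time": ts(10, 5)},   # too late: before 10:10, dropped
    {"id": 4, "event_time": ts(10, 15)},  # late but within the delay, kept
]
kept, dropped = apply_watermark(stream)
print([e["id"] for e in kept], [e["id"] for e in dropped])  # → [1, 2, 4] [3]
```

The same trade-off drives trigger choice: a longer watermark delay tolerates more lateness but holds more state, which is exactly the cost-vs-latency decision referenced in the responsibilities above.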
Skills & Qualifications
1. Technical Core (Databricks & Spark)
- Expert PySpark/Scala:
Deep understanding of Spark internals, broadcast joins, and RDD/DataFrame partitioning.
- Delta Lake Mastery:
Proficiency in Delta features like Z-Ordering, Liquid Clustering, Change Data Feed (CDF), and Time Travel.
- Streaming Patterns:
Hands-on experience with Watermarking, Checkpoints, and handling late-arriving data in Structured Streaming.
2. Data Modeling & Languages
- SQL:
Expert-level SQL for complex transformations and window functions.
- JSON/Semi-Structured Data:
Mastery of parsing and generating complex nested JSON objects within Spark (e.g., struct, array, to_json, from_json).
- Medallion Design:
Proven experience moving data across Bronze, Silver, and Gold layers with clear "Data Contracts."
3. Full Stack & DevOps
- CI/CD:
Experience automating data pipeline deployments (Git-based workflows).
- Observability:
Ability to set up monitoring and alerts using Databricks SQL Alerts or Grafana to track pipeline lag.
4. Soft Skills
- Architectural Thinking:
Ability to decide when to use "Continuous" vs. "AvailableNow" streaming based on cost vs. latency requirements.
- Client Focus:
Understanding how an API client (e.g., a React app or a microservice) will consume the Gold layer JSON.
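The nested-JSON skill above (struct/array handling, `to_json`/`from_json`) comes down to round-tripping documents like the one below. This stdlib-Python sketch shows the shape a Gold-layer API consumer would receive; the schema and field names are hypothetical:

```python
import json

# A Silver-style flat row (hypothetical fields).
row = {"account_id": "A1", "plan": "5G", "line_numbers": ["555-0100", "555-0101"]}

# Nest it the way a Gold table would, mirroring Spark's
# struct(...) + to_json(...) pattern for API-ready documents.
gold_doc = json.dumps({
    "account": {"id": row["account_id"], "plan": row["plan"]},
    "lines": [{"number": n} for n in row["line_numbers"]],
})

# The consuming client (e.g., a React app or microservice) parses it
# back -- the from_json side of the round trip.
parsed = json.loads(gold_doc)
print(parsed["lines"][0]["number"])  # → 555-0100
```

In Spark the nesting step would be expressed with `struct`, `array`, and `to_json` over DataFrame columns, but the target document shape is decided the same way: by how the client will query it.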