
Data Engineer

Caterpillar Inc.

Location: Dallas, TX

Salary: Not specified

Type: Full-time

Posted: Today via LinkedIn

Job Description

Job Title: Data Engineer

Location: Dallas/Plano, TX or Middletown, NJ (onsite). NJ candidates must be local, within a 1-hour distance; no NY candidates.

Duration: 6+ months contract

Rate: $60-62/hr. W2

Mode of Interview (MOI): Phone/Video (MS Teams)

Visa: GC/USC

Client: Telecommunications Company

An active LinkedIn profile and local ID proof are required. Last 4 of SSN and DOB are required with submission.

Job Description:

Data Engineer (Streaming & Full Stack Databricks)

Telecom experience is a major requirement, especially with AT&T.

Role Summary

We are seeking a high-performing Data Engineer to design and implement a real-time data platform using the Medallion Architecture. You will be responsible for the end-to-end development of data pipelines: ingesting real-time source data into Bronze, transforming it into a relational Silver layer, and finally delivering high-concurrency, consumption-ready JSON Gold tables. You will act as a "Full Stack" data professional, handling everything from infrastructure automation (DataOps) to complex nested data modeling.

Key Responsibilities

  • Real-Time Ingestion:

Build scalable ingestion pipelines using Auto Loader and Spark Structured Streaming to capture data from Kafka, Event Hubs, or CDC sources into raw Delta tables.

  • Relational Transformation:

Develop ELT logic to cleanse, deduplicate, and normalize data into a relational format. Ensure ACID compliance and "exactly-once" processing semantics.

  • JSON API Optimization:

Design and build the layer specifically for client consumption. This involves flattening/nesting data into optimized JSON structures within Delta tables to support low-latency API queries.

  • Advanced Orchestration:

Implement and manage complex workflows using Delta Live Tables (DLT) or standard streaming tables and Databricks Workflows to ensure data freshness and lineage.

  • Governance & Security:

Use Unity Catalog to enforce fine-grained access control (row/column level) and maintain a searchable data catalog for consuming clients.

  • DataOps \& Automation:

Own the deployment lifecycle using Databricks Asset Bundles (DABs) and CI/CD pipelines (GitHub Actions/Azure DevOps) to ensure reproducible environments.

  • Performance Tuning:

Optimize streaming triggers, watermarking, and stateful processing to minimize latency and manage cloud costs effectively.
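Several of the responsibilities above hinge on watermarking: tolerating late-arriving events only up to a fixed delay behind the maximum event time seen so far. A minimal plain-Python sketch of that idea (illustrative only, with hypothetical names; in Spark Structured Streaming this is configured with `withWatermark` rather than hand-rolled):

```python
from datetime import datetime, timedelta

class Watermark:
    """Tracks the max event time seen and drops events older than
    (max_event_time - allowed_lateness), mimicking a streaming watermark."""

    def __init__(self, allowed_lateness: timedelta):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = datetime.min

    def accept(self, event_time: datetime) -> bool:
        # Advance the high-water mark as newer events arrive.
        if event_time > self.max_event_time:
            self.max_event_time = event_time
        # An event is "late" if it falls behind the watermark.
        return event_time >= self.max_event_time - self.allowed_lateness

wm = Watermark(allowed_lateness=timedelta(minutes=10))
t0 = datetime(2024, 1, 1, 12, 0)
print(wm.accept(t0))                          # on time
print(wm.accept(t0 + timedelta(minutes=30)))  # newer event, advances the watermark
print(wm.accept(t0 + timedelta(minutes=5)))   # now 25 min behind the watermark: dropped
```

The cost/latency trade-off mentioned under Performance Tuning shows up here as the size of `allowed_lateness`: a larger value keeps more state and delays results, a smaller one drops more late data.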

Skills & Qualifications

1. Technical Core (Databricks & Spark)

  • Expert PySpark/Scala:

Deep understanding of Spark internals, broadcast joins, and RDD/DataFrame partitioning.

  • Delta Lake Mastery:

Proficiency in Delta features like Z-Ordering, Liquid Clustering, Change Data Feed (CDF), and Time Travel.

  • Streaming Patterns:

Hands-on experience with Watermarking, Checkpoints, and handling late-arriving data in Structured Streaming.

2. Data Modeling & Languages

  • SQL:

Expert-level SQL for complex transformations and window functions.

  • JSON/Semi-Structured Data:

Mastery of parsing and generating complex nested JSON objects within Spark (e.g., struct, array, to_json, from_json).

  • Medallion Design:

Proven experience moving data across Bronze, Silver, and Gold layers with clear "Data Contracts."
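As a concrete illustration of the nested-JSON and Medallion skills above, the Gold layer typically serializes flat, relational Silver rows into one consumption-ready JSON document per entity. A plain-Python sketch of that shape (fields are hypothetical; in Spark the same structure would be built with `struct`, `collect_list`, and `to_json` over a grouped DataFrame):

```python
import json

# Hypothetical Silver-layer rows: one flat record per (customer, order).
silver_rows = [
    {"customer_id": "C1", "name": "Acme", "order_id": "O1", "amount": 40.0},
    {"customer_id": "C1", "name": "Acme", "order_id": "O2", "amount": 60.0},
]

def to_gold_json(rows):
    """Nest flat relational rows into a single JSON document per customer,
    the way a Gold table would serve a low-latency API client."""
    doc = {
        "customer_id": rows[0]["customer_id"],
        "name": rows[0]["name"],
        "orders": [{"order_id": r["order_id"], "amount": r["amount"]}
                   for r in rows],
        "total": sum(r["amount"] for r in rows),
    }
    return json.dumps(doc)

gold = to_gold_json(silver_rows)
print(gold)
```

The "Data Contract" here is the fixed document shape the API client can rely on, regardless of how the Bronze and Silver layers evolve upstream.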

3. Full Stack & DevOps

  • CI/CD:

Experience automating data pipeline deployments (Git-based workflows).

  • Observability:

Ability to set up monitoring and alerts using Databricks SQL Alerts or Grafana to track pipeline lag.

4. Soft Skills

  • Architectural Thinking:

Ability to decide when to use "Continuous" vs. "AvailableNow" streaming based on cost vs. latency requirements.

  • Client Focus:

Understanding how an API client (e.g., a React app or a microservice) will consume the Gold-layer JSON.
