Job Description
- Architect and implement enterprise-grade Lakehouse solutions using Databricks.
- Design and deliver end-to-end data engineering pipelines, including batch and real-time streaming solutions.
- Lead implementation of:
  - Cloud-based data lakehouse platforms integrating diverse data sources.
  - Real-time data processing pipelines for operational and analytical use cases.
- Develop scalable ETL/ELT pipelines using PySpark, Scala, and SQL.
- Implement advanced data modeling solutions including 3NF, dimensional modeling, and enterprise data warehousing strategies.
- Design and build incremental data loading frameworks and metadata-driven ingestion pipelines.
- Establish data quality frameworks and governance standards.
- Implement and manage Unity Catalog, including fine-grained security and access controls.
- Leverage Databricks components such as:
  - Delta Live Tables
  - Auto Loader
  - Structured Streaming
  - Databricks Workflows
  - Integration with orchestration tools (e.g., Apache Airflow)
- Drive CI/CD automation, deployment strategies, and DevOps best practices.
- Optimize performance of pipelines, Spark jobs, and compute resources.
- Provide architectural guidance and technical leadership across cross-functional teams.
- Engage with stakeholders and clients to translate business requirements into scalable technical solutions.
Deep expertise in:
- Databricks and cloud-native storage/compute platforms
- Apache Spark (batch & streaming)
- Delta Lake & Lakehouse architecture
- Distributed data processing systems
Strong hands-on programming skills in Python, PySpark, Scala, and SQL.