Job Description
🚀 Data Engineer (Spark Specialist)
Location: Remote
Experience Level: Senior (5+ years)
Type: Full-time, Permanent
About Retailogists
Retailogists is a fast-growing startup at the intersection of retail consulting and technology. We combine deep retail domain expertise with technical excellence in big data, full-stack engineering, and AI/ML. Our clients range from fast-scaling digital brands to large, multi-location retailers.
We're a nimble team of technologists, consultants, and builders — and we're looking for a Senior Spark Engineer who lives and breathes distributed data processing. If tuning a misbehaving Spark job is your idea of a good afternoon, we want to talk to you.
What You'll Do
As our Spark specialist, you'll play a leadership role in the heavy-lifting layer of our clients' data platform: the pipelines that move, transform, and reshape large volumes of retail data for both internal tools and client-facing products. Most of this runs on AWS Glue today, and you'll be the person we turn to for making it fast, reliable, and cost-efficient.
Responsibilities include:
- Designing, building, and maintaining large-scale Spark pipelines on AWS Glue (PySpark and/or Scala)
- Tuning Spark jobs for performance and cost — partitioning, shuffles, joins, caching, executor sizing, the works
- Debugging and stabilizing production Spark workloads, including spill, skew, and OOM issues
- Architecting batch and incremental ETL/ELT patterns across S3-based data lakes (Parquet, Iceberg, Delta, or Hudi)
- Integrating Glue with the broader AWS data stack (S3, Athena, Lake Formation, Step Functions, EMR where relevant)
- Establishing engineering standards for Spark code — testing, modularity, reusability, and CI/CD for Glue jobs
- Partnering with analysts, data scientists, and client teams to land production-ready data where it needs to go
What We're Looking For (must-haves)
- 5+ years of professional data engineering experience, with a heavy Spark focus
- Deep, hands-on Spark expertise: you understand the execution model, the Catalyst optimizer, and how to read a Spark UI to find the real bottleneck
- Strong production experience with AWS Glue: Glue jobs, Glue Catalog, crawlers, bookmarks, and the quirks that come with them
- Proficiency in PySpark (Scala is a plus)
- Comfort working with columnar formats and modern lakehouse table formats (Parquet, Iceberg, Delta, or Hudi)
- Solid SQL fundamentals
Nice to Have
- Experience with cloud data warehouses (Redshift, Snowflake, BigQuery)
- Familiarity with dbt and semantic-layer modelling
- Exposure to BI tooling (Metabase, Looker Studio, Power BI, etc.)
- Background in analytics engineering or BI workflows
- Orchestration experience (Airflow, Step Functions, Dagster)
- Retail or e-commerce data experience
Work Environment
- Fully remote with the option to use offices in Montreal / Toronto
- Flexible hours, collaborative culture, and high-impact work
- Direct exposure to clients and real business problems — your pipelines will power decisions, not sit in a backlog