Location
Sunnyvale, CA
Salary
Not specified
Type
Full-time
Posted
Today
via LinkedIn
Job Description
Responsibilities
- Data Pipeline & Infrastructure Development: Build, maintain, and scale data pipelines (ETL or ELT) using tools like Apache Spark, Airflow, and Kafka to support AI and ML workloads.
- AI-Ready Data Preparation: Transform messy, unstructured data (text, images, video) into structured datasets suitable for model training, including feature engineering and vector database ingestion.
- ML Model Productionization: Partner with data scientists to deploy ML models, create APIs for models, and implement MLOps practices, including monitoring for data drift.
- Analytics and Visualization: Create dashboards (Tableau, Power BI, Looker) and run SQL queries to provide actionable business insights, acting as an analytics engineer.
- Data Governance & Quality: Ensure data quality, reliability, and security (PII or PHI) within AI systems, in compliance with regulations like GDPR or HIPAA.
- Cloud and Data Management: Operate within cloud environments (AWS, Azure, Google Cloud) using services like S3, Redshift, Glue, or Databricks.
Key Skills and Qualifications
- Programming Languages: Expert-level Python and advanced SQL are mandatory. Java or Scala is preferred for large-scale distributed systems.
- ML Frameworks: Familiarity with libraries such as PyTorch, TensorFlow, or scikit-learn for data manipulation and model interaction.
- Data Engineering Tools: Experience with Apache Spark, Kafka, Airflow, dbt, and vector databases (Pinecone, Milvus).
- Cloud Platforms: Hands-on experience with AWS (Glue, SageMaker) or GCP.
- Analytical Skills: Strong ability to perform exploratory data analysis (EDA) and interpret complex datasets.
- Soft Skills: Strong communication skills are required to bridge technical data engineering work and business stakeholders.