Responsible for data cleaning (ETL) and data warehouse construction to support large-scale AI models
Responsible for training and fine-tuning large AI models to meet the requirements of specific business scenarios
Responsible for developing supporting tools, such as dashboards and general business logic, to ensure the practicality of AI model applications
Must have hands-on development experience and be able to lead a team or independently complete projects related to data collection and development

Requirements

A degree in computer science or a related field is preferred. Must be familiar with machine learning, deep learning, and natural language processing, have at least 1 year of experience in GPT or Gemini application development, and be proficient in a deep learning framework such as PyTorch or TensorFlow
Familiar with models such as Transformer, BERT, and GPT and with fine-tuning algorithms such as LoRA, with hands-on experience fine-tuning models
Must have Java programming experience
Experience in backend Java development for data engineering use cases, particularly real-time processing with Apache Flink
Must have experience in data warehouse development and construction, such as using Flink and building ETL data cleaning pipelines
Experience with large model pre-training and practical application in business scenarios is a plus
Must have hands-on experience building your own setup on top of open-source large models
Experience in conversational AI, marketing content generation, or machine translation is preferred
Priority will be given to candidates with hands-on experience in Google Cloud Platform (GCP), particularly those with experience in BigQuery

Benefits

Lead community-building for Southeast Asia's largest parenting ecosystem
Be at the forefront of connecting brands with real parents in authentic and impactful ways
Work with a passionate team driving innovation in the parenting space
Regional exposure across three of Southeast Asia's most dynamic markets

Data Engineer (ETL Engineer)

Job Description
