Job Description
Data Scientist
Primary skills:
- PDF extraction & OCR post-processing: Hands-on experience extracting structured data from PDF, HTML, and scanned sources. Proficiency with Python libraries (pdfplumber, PyMuPDF, Tesseract OCR) and the ability to build robust post-processing and validation pipelines.
- Python proficiency: Working experience in Python for data manipulation, analysis, and pipeline development. Comfortable with core OOP concepts (classes, encapsulation, inheritance) at a functional level.
- Generative AI: Knowledge of and experience with Generative AI (LLMs, prompting techniques, RAG / GraphRAG solutions).
- SQL: Strong working experience in SQL for data querying, validation, and transformation.
- Data analysis & transformation: Experience analysing, transforming, manipulating, and interpreting data.
- Collaborative code repositories: Experience with shared code repositories (Git/GitHub).
Good to have skills:
- Azure Databricks / PySpark: Experience with Databricks and PySpark for high-volume distributed data processing scenarios.
- XBRL / iXBRL: Familiarity with XBRL and iXBRL financial reporting formats is an advantage for financial data product work.
- Agile and Scrum tools: Experience working with Agile and Scrum tools (Azure DevOps).
- Knowledge graphs / graph databases: Experience with Neo4j / Cypher or similar graph technologies.