Data Scientist

Primary skills:

PDF extraction & OCR post-processing: Hands-on experience extracting structured data from PDF, HTML, and scanned sources. Proficiency with Python libraries and tools (pdfplumber, PyMuPDF, Tesseract OCR) and the ability to build robust post-processing and validation pipelines.
Python proficiency: Working experience in Python for data manipulation, analysis, and pipeline development. Comfortable with core OOP concepts (classes, encapsulation, inheritance) at a functional level.
Generative AI: Knowledge of and experience with Generative AI (LLMs, prompting techniques, RAG / GraphRAG solutions).
SQL: Strong working experience in SQL for data querying, validation, and transformation.
Data analysis & transformation: Experience analysing, transforming, manipulating and interpreting data.
Collaborative code repositories: Experience with shared code repositories (Git/GitHub).

Good to have skills:

Azure Databricks / PySpark: Experience with Databricks and PySpark for high-volume distributed data processing scenarios.
XBRL / iXBRL: Familiarity with XBRL and iXBRL financial reporting formats is an advantage for financial data product work.
Agile and Scrum tools: Experience working with Agile and Scrum tools (Azure DevOps).
Knowledge graphs / graph databases: Experience with Neo4j / Cypher or similar graph technologies.

Job Description