ROLE SUMMARY

We are hiring a hands-on NLP Engineer to build robust pipelines that convert policy, regulatory, fintech, and healthcare documents into structured, graph-ready data. You will own the full extraction lifecycle, from raw text to clean, schema-validated outputs, using classical NLP, deep learning, and LLM APIs.

KEY RESPONSIBILITIES

Pipeline Development:

Design and build end-to-end text extraction pipelines for policy, regulatory, fintech, and healthcare documents

Entity & Clause Extraction:

Extract key entities (countries, companies, minerals) and structure policy clauses and obligations

Deep Learning & Transformers:

Fine-tune BERT / RoBERTa for NER, text classification, and relation extraction tasks

LLM Integration:

Leverage LLM APIs with structured output extraction, prompt engineering, and tool/function calling

Data Engineering:

Build scalable Python pipelines for high-volume document processing with robust pre-processing for PDF, DOCX, and HTML

Schema & Graph Readiness:

Define and enforce JSON schemas; ensure outputs are clean and compatible with knowledge graph ingestion

Accuracy Improvement:

Evaluate model performance, track metrics, and implement feedback loops to improve extraction quality over time

REQUIRED SKILLS

3–5 years of hands-on NLP engineering on real production pipelines, not just model experiments
Strong Python skills: OOP, async programming, packaging, and testing
NLP frameworks: spaCy, Hugging Face Transformers, NLTK
Deep learning: fine-tuning transformer models for sequence labeling and classification
LLM API integration: prompt engineering, structured outputs, and function/tool calling
Data pipeline experience: ETL, batch processing, and text pre-processing at scale
JSON schema design and validation using Pydantic or JSON Schema

GOOD TO HAVE

Experience with legal, regulatory, or policy documents (contracts, compliance filings, government publications)
Familiarity with knowledge graphs or graph databases (Neo4j, RDF)
Document parsing tools: pdfplumber, Docling, Apache Tika
Domain knowledge in fintech or healthcare NLP
Exposure to information extraction benchmarks (CoNLL, DocRED, SciERC)
