Location: Remote
Salary: Not specified
Type: Full-time
Posted: Today
Job Description
ROLE SUMMARY
We are hiring a hands-on NLP Engineer to build robust pipelines that convert policy, regulatory, fintech, and healthcare documents into structured, graph-ready data. You will own the full extraction lifecycle from raw text to clean, schema-validated outputs using classical NLP, deep learning, and LLM APIs.
KEY RESPONSIBILITIES
- Pipeline Development: Design and build end-to-end text extraction pipelines for policy, regulatory, fintech, and healthcare documents
- Entity & Clause Extraction: Extract key entities (countries, companies, minerals) and structure policy clauses and obligations
- Deep Learning & Transformers: Fine-tune BERT / RoBERTa for NER, text classification, and relation extraction tasks
- LLM Integration: Leverage LLM APIs with structured output extraction, prompt engineering, and tool/function calling
- Data Engineering: Build scalable Python pipelines for high-volume document processing with robust pre-processing for PDF, DOCX, and HTML
- Schema & Graph Readiness: Define and enforce JSON schemas; ensure outputs are clean and compatible with knowledge graph ingestion
- Accuracy Improvement: Evaluate model performance, track metrics, and implement feedback loops to improve extraction quality over time
REQUIRED SKILLS
- 3–5 years of hands-on NLP engineering on real production pipelines, not just model experiments
- Strong Python skills: OOP, async programming, packaging, and testing
- NLP frameworks: spaCy, HuggingFace Transformers, NLTK
- Deep learning: fine-tuning transformer models for sequence labeling and classification
- LLM API integration: prompt engineering, structured outputs, and function/tool calling
- Data pipeline experience: ETL, batch processing, and text pre-processing at scale
- JSON schema design and validation using Pydantic or JSON Schema
GOOD TO HAVE
- Experience with legal, regulatory, or policy documents (contracts, compliance filings, government publications)
- Familiarity with knowledge graphs or graph databases (Neo4j, RDF)
- Document parsing tools: pdfplumber, Docling, Apache Tika
- Domain knowledge in fintech or healthcare NLP
- Exposure to information extraction benchmarks (CoNLL, DocRED, SciERC)