NLP Engineer

BigStep Technologies

Location

Remote

Salary

Not specified

Type

Full-time

Posted

Today

via LinkedIn

Job Description

ROLE SUMMARY

We are hiring a hands-on NLP Engineer to build robust pipelines that convert policy, regulatory, fintech, and healthcare documents into structured, graph-ready data. You will own the full extraction lifecycle, from raw text to clean, schema-validated outputs, using classical NLP, deep learning, and LLM APIs.

KEY RESPONSIBILITIES

  • Pipeline Development:

Design and build end-to-end text extraction pipelines for policy, regulatory, fintech, and healthcare documents

  • Entity & Clause Extraction:

Extract key entities (countries, companies, minerals) and structure policy clauses and obligations

  • Deep Learning & Transformers:

Fine-tune BERT / RoBERTa for NER, text classification, and relation extraction tasks

  • LLM Integration:

Leverage LLM APIs with structured output extraction, prompt engineering, and tool/function calling

  • Data Engineering:

Build scalable Python pipelines for high-volume document processing with robust pre-processing for PDF, DOCX, and HTML

  • Schema & Graph Readiness:

Define and enforce JSON schemas; ensure outputs are clean and compatible with knowledge graph ingestion

  • Accuracy Improvement:

Evaluate model performance, track metrics, and implement feedback loops to improve extraction quality over time

REQUIRED SKILLS

  • 3–5 years of hands-on NLP engineering on real production pipelines, not just model experiments
  • Strong Python skills: OOP, async programming, packaging, and testing
  • NLP frameworks: spaCy, HuggingFace Transformers, NLTK
  • Deep learning: fine-tuning transformer models for sequence labeling and classification
  • LLM API integration: prompt engineering, structured outputs, and function/tool calling
  • Data pipeline experience: ETL, batch processing, and text pre-processing at scale
  • JSON schema design and validation using Pydantic or JSON Schema

GOOD TO HAVE

  • Experience with legal, regulatory, or policy documents (contracts, compliance filings, government publications)
  • Familiarity with knowledge graphs or graph databases (Neo4j, RDF)
  • Document parsing tools: pdfplumber, Docling, Apache Tika
  • Domain knowledge in fintech or healthcare NLP
  • Exposure to information extraction benchmarks (CoNLL, DocRED, SciERC)
