Artificial Intelligence Engineer

HDFC securities

Location: Bengaluru, Karnataka, India

Salary: Not specified

Type: Full-time

Posted: Today

via LinkedIn

Job Description

We are building production-grade AI systems for capital markets, including an AI-powered investing assistant that runs on cloud-native infrastructure and integrates with regulated trading and research platforms. We are hiring a Senior AI Engineer to build, evaluate, and operate LLM-based products end to end.

This is a deeply hands-on role. You will write code, debug live systems, run evaluations, and ship to production. We are not looking for someone whose AI experience is limited to wiring up a hosted chat API — we expect you to have personally built, broken, and fixed LLM systems in production.

Experience

  • 5–8 years in software engineering, with 2+ years on LLM/AI products in production.
  • Strong track record of shipping AI features that are actually used by real users at scale.

Required Skills

1. LLM Hosting & Serving

  • Hands-on experience hosting LLMs for testing, evaluation, and production inference.
  • Working knowledge of inference servers and runtimes: vLLM, TGI (Text Generation Inference), TensorRT-LLM, Ollama, llama.cpp.
  • Experience deploying open-weight models (Llama, Mistral, Qwen, Nemotron, GPT-OSS, DeepSeek, etc.) on GPU instances, including knowledge of quantization (GPTQ, AWQ, GGUF, FP8), continuous batching, and KV-cache management (PagedAttention).
  • Experience with managed model hosting platforms: AWS Bedrock, SageMaker, Azure OpenAI, Vertex AI, or equivalent.
  • Ability to choose between hosted APIs and self-hosted inference based on cost, latency, throughput, and data-residency constraints — and to defend that choice with numbers.

2. LLM Evaluation & Testing

  • Designing and running case-specific test suites for LLM-based applications — not just generic benchmarks.
  • Building eval datasets from production traffic, edge cases, and adversarial prompts. Experience curating golden datasets and maintaining them as the product evolves.
  • Hands-on with evaluation frameworks: Langfuse, Promptfoo, DeepEval, RAGAS, OpenAI Evals, LM-Eval-Harness, or equivalents.
  • LLM-as-a-judge pipelines — including knowing the failure modes (judge bias, position bias, verbosity bias) and how to mitigate them.
  • Regression testing for prompts, models, and tool chains. Catching silent quality drift between model versions.
  • Quantitative metrics: faithfulness, groundedness, answer relevance, tool-selection accuracy, hallucination rates, latency percentiles, token cost per query.

3. LLM Frameworks & Orchestration

  • Working knowledge of LangChain, LangGraph, LlamaIndex, Haystack, or equivalent orchestration frameworks.
  • Experience with agentic patterns: ReAct, ReWOO, Reflexion, Plan-and-Execute, multi-agent workflows.
  • MCP (Model Context Protocol) and tool-calling: building tool schemas, handling tool-selection failures, recovering from malformed tool calls.
  • Comfortable working outside Python ecosystems — building LLM applications in Go, Java/Kotlin, TypeScript/Node, or custom in-house frameworks. We do not assume Python is the right answer for production services.
  • Streaming responses (SSE, WebSockets), session management, and handling long-running agentic loops gracefully.

4. Retrieval & Context Engineering

  • Hands-on with embedding models, vector databases (pgvector, OpenSearch, Pinecone, Weaviate, Milvus), and hybrid search (BM25 + dense).
  • Chunking strategies, re-ranking (Cohere Rerank, cross-encoders), and query rewriting.
  • Knowledge of when RAG is the wrong answer (and what to do instead).

5. Good-to-Have

  • Fine-tuning / instruction-tuning / LoRA / QLoRA / DPO on open-weight models.
  • RLHF or RLAIF exposure.
  • Prompt distillation, model routing, and cost optimization at scale.
  • Guardrails: PII redaction, jailbreak detection, output validation (Guardrails AI, NeMo Guardrails, Llama Guard).
  • Experience with multimodal models (vision, audio, ASR/TTS).
  • Contributions to open-source AI/ML projects.

Responsibilities

  • Build and ship LLM-powered features: agentic workflows, RAG pipelines, tool-using assistants, summarization and classification services.
  • Host, serve, and benchmark LLMs — both hosted (Bedrock, Azure OpenAI) and self-hosted (vLLM, TGI) — with measurable latency, throughput, and cost targets.
  • Write and maintain case-specific test suites; create eval datasets from real traffic; gate model and prompt changes on regression results.
  • Instrument production: traces, prompts, tool calls, token usage, error taxonomy. Build dashboards that tell you when quality is degrading.
  • Collaborate with product, design, and domain experts to translate fuzzy requirements into concrete prompts, tools, and evals.
  • Mentor junior engineers and review code with care.
  • Participate in on-call for AI services and contribute to runbooks and RCAs.
