Location
Remote
Salary
Not specified
Type
Full-time
Posted
Today
Job Description
Company Description
VOLTO Consulting empowers organizations by blending technology, talent, and strategy to drive transformation and innovation. Offering services such as SAP, AI, digital engineering, cloud, and infrastructure support, VOLTO enables clients to scale and adapt rapidly. With expertise across industries like banking, healthcare, energy, and manufacturing, the company delivers tailored, outcome-driven solutions. Its nearshore and offshore delivery models, combined with deep domain knowledge, help clients build and optimize systems and teams for sustainable growth. Trusted by enterprises and startups alike, VOLTO bridges strategy with execution to deliver measurable results.
Role Description
Key Responsibilities
- Own end-to-end optimization of ML models across ASR, TTS, NLP, and LLM pipelines for production
- Reduce model size, memory footprint, and inference latency for real-time applications
- Implement model compression techniques (quantization, pruning, distillation) and optimize inference using ONNX, TensorRT, TVM, or similar frameworks
- Design and deploy efficient inference pipelines for edge and cloud environments, enabling low-compute, on-device AI
- Develop performance-critical components in C/C++ and build high-concurrency backend services using Go (Golang)
- Architect scalable, low-latency microservices (gRPC/REST) and optimize throughput via batching, caching, and parallel processing
- Maximize CPU/GPU utilization and reduce cost per inference
- Integrate optimized inference engines with production systems and manage deployments via Docker and Kubernetes
- Define and track performance metrics (latency, memory, throughput, cost) and establish monitoring/observability frameworks
- Collaborate with ASR, TTS, and LLM teams to productionize models and drive trade-offs between accuracy, latency, and cost
- Mentor engineers and establish best practices in ML systems optimization and performance engineering
Education & Experience
- 5–10+ years in ML engineering / systems / performance optimization
- Strong experience deploying ML models at scale (edge + cloud)
Technical Skills
- Strong programming skills in Go (Golang) and C/C++ (both mandatory)
- Experience with low-level optimization, memory management, and high-performance computing
- Deep learning frameworks: PyTorch / TensorFlow
- Inference optimization tools: ONNX Runtime, TensorRT, TVM, OpenVINO
- Backend systems: gRPC, REST APIs, microservices architecture
- Strong understanding of hardware-aware optimization (CPU, GPU, edge accelerators)
- Infrastructure: Docker, Kubernetes, cloud platforms (AWS/GCP/Azure)
Preferred Qualifications
- Experience with speech models (ASR/TTS) and/or LLMs
- Edge AI deployment (Android, iOS, IoT, embedded systems)
- Familiarity with WebRTC / real-time streaming systems
- Contributions to performance-critical ML systems or infra