Engineering

AI Engineer / Researcher

Hybrid · Ho Chi Minh City, Vietnam
Full Time
5+ years of experience

Train, fine-tune, and optimize open-source LLMs and multimodal models for domain-specific retrieval, reasoning, and workflow automation in production environments.

About the Role

We are building domain-specialized AI agents and models for enterprise applications. In this role you will train, fine-tune, and optimize open-source LLMs and multimodal models for retrieval, reasoning, and workflow automation. You will adapt state-of-the-art models to real-world business problems, balance performance against cost, and deploy robust solutions that deliver measurable value. This is a hands-on role that bridges research and production engineering.

Key Responsibilities

Model Development & Fine-tuning

Research, train, and fine-tune open-source LLMs (Llama, Mistral, Qwen, Gemma, etc.) for domain-specific tasks.

Train and adapt BERT and other Transformer encoder models (RoBERTa, DeBERTa, MPNet) for classification, retrieval, and embedding workloads.

Implement supervised fine-tuning (SFT), instruction tuning, and preference-based alignment (RLHF, DPO, ORPO).

Run distributed training jobs on Ray / KubeRay clusters with DeepSpeed or Hugging Face Accelerate.

Develop efficient data pipelines for model training: data cleaning, tokenization, chunking, and labeling.

Optimize models for retrieval-augmented generation (RAG) pipelines, grounding responses in canonical data and metadata.

Evaluation, Serving & Deployment

Evaluate models with LangSmith, custom benchmarks, and human-in-the-loop feedback.

Deploy optimized models to production environments (cloud, on-prem, or air-gapped setups) using vLLM, SGLang, or TGI.

Route and govern model traffic with LiteLLM for multi-model, multi-provider serving patterns.

Collaborate with platform engineers on inference infrastructure: tensor parallelism, continuous batching, KV-cache tuning.

Maintain experiment tracking and ensure reproducibility across training runs.

Qualifications

Must-Have Technical Expertise

5+ years in applied ML/AI research or engineering, with at least 2 years focused on LLM or Transformer model work.

Strong background in PyTorch, Hugging Face Transformers, and tokenizers. Hands-on with both decoder (Llama-family) and encoder (BERT-family) architectures.

Proven ability to adapt open-source models to real-world, production-grade tasks.

Experience with distributed training using Ray / Ray Train, DeepSpeed, or Hugging Face Accelerate.

Working knowledge of at least one inference-serving stack (vLLM, SGLang, or TGI) and of LiteLLM for multi-provider routing.

Deep understanding of training-efficiency tradeoffs across memory, throughput, and cost.

Proficiency with AI-assisted development tools (Cursor, Claude Code, GitHub Copilot, or similar).

Research & Execution

Ability to balance rapid prototyping with rigorous benchmarking and reproducibility.

Strong analytical skills for experiment design and result interpretation.

Clear, thorough documentation and communication of research findings.

Preferred/Bonus

Parameter-efficient fine-tuning techniques: LoRA, QLoRA, adapters.

Quantization techniques: 4-bit/8-bit inference, AWQ/GPTQ, GGUF/GGML optimizations.

Experience training or distilling domain-specific embedding models on top of BERT / MPNet / E5 backbones.

Multimodal training experience (vision + text for document understanding).

Experience running Ray jobs on KubeRay with gang scheduling and GPU-aware autoscaling.

Familiarity with continuous batching, PagedAttention, and tensor-parallel inference in vLLM.

Experiment tracking with Weights & Biases, MLflow, or similar tools.

Breadth across serving stacks: vLLM, TGI, TensorRT-LLM, Ray Serve, SGLang.

Strong Vietnamese and English communication skills.

Benefits

Competitive salary and performance incentives

Work on cutting-edge AI research with real-world applications

Access to compute resources for model training and experimentation

Advanced training and conference attendance opportunities

Flexible work arrangements

A collaborative environment that values research rigor and practical impact

Ready to Join Our Team?

We're excited to meet passionate engineers who want to build the future of AI. Apply now and let's create something amazing together.