Engineering

Senior Data Platform Engineer

Hybrid · Ho Chi Minh City, Vietnam
Full Time
5+ years

Build and operate production data orchestration platforms. Design metadata-driven pipelines, implement data transformations at scale, and integrate enterprise data systems.

About the Role

We are seeking an experienced Data Platform Engineer to build and operate production-grade data orchestration infrastructure. You will design metadata-driven pipelines, implement large-scale data transformations, and integrate diverse enterprise data sources into a unified platform. In this role, you will own the data platform layer, from ingestion and transformation to serving APIs. You'll work at the intersection of data engineering, backend systems, and AI infrastructure, building the foundation that powers intelligent applications.

Key Responsibilities

Data Orchestration & Pipeline Development

Design and implement production data pipelines using Dagster for asset-first orchestration, with Airflow or Prefect where appropriate.

Build metadata-driven pipeline architectures with YAML-based configuration and dynamic asset generation.

Implement data transformation layers using dbt for SQL-based transformations and data modeling against Snowflake, Databricks, or BigQuery.

Create reusable pipeline components for ingestion, transformation, and data quality validation.

Integrate with enterprise data sources (databases, APIs, data lakes, streaming systems) using Airbyte connectors and custom adapters.

Cloud Data Platforms

Build and operate pipelines against modern cloud data platforms (Snowflake, Databricks lakehouse, BigQuery) with cost-aware design.

Implement Delta Lake / Iceberg table patterns (bronze / silver / gold) with schema evolution and partitioning discipline.

Design Airbyte ingestion topologies with normalization, incremental syncs, and failure-recovery strategies.

Model data warehouse layers for analytical and AI consumption, balancing compute and serving cost.

Data Infrastructure & Integration

Build and maintain data catalog integrations for metadata discovery and governance.

Design and implement automated REST API generation from data warehouse views (PostgREST, Hasura) with clean contracts.

Develop connectors and adapters for diverse data sources and destinations.

Implement Row-Level Security (RLS) and data access control policies.

Build document ingestion pipelines for RAG systems including parsing, chunking, and embedding generation.

Production Operations

Own production reliability for data pipelines with monitoring, alerting, and incident response.

Optimize pipeline performance for scale (50 TB+ data volumes, sub-second latency targets).

Implement data quality frameworks with automated testing and validation.

Design incremental processing strategies for efficient large-scale data updates.

Collaborate with DevOps on deployment automation and infrastructure provisioning.

Qualifications

Must-Have Technical Expertise

5+ years of data engineering experience, with 2+ years building production orchestration platforms.

Expert-level Python for data engineering workloads and pipeline development.

Deep experience with a production orchestrator: Dagster (preferred), Airflow, or Prefect, including asset-first orchestration patterns.

Hands-on experience with at least one cloud data platform: Snowflake, Databricks, or BigQuery.

Production experience with Airbyte (or an equivalent ingestion tool such as Fivetran / Stitch) for connector-based extraction.

Strong SQL skills and experience with dbt for data transformation and modeling.

Strong REST API design intuition: contracts, versioning, idempotency, pagination.

Production experience with PostgreSQL, data warehouses, and database optimization.

Proficiency with AI-assisted development tools (Cursor, Claude Code, GitHub Copilot, or similar).

Platform & Integration Skills

Experience with metadata management and data catalog systems.

Understanding of automated API generation patterns (PostgREST, Hasura).

Familiarity with vector databases and embedding pipelines for AI applications.

Experience with containerization (Docker) and container orchestration basics (Kubernetes).

Strong understanding of data governance, access control, and compliance requirements.

Preferred/Bonus

Experience with Dagster components, assets, sensors, resources, and asset checks.

In-depth experience with Databricks (Unity Catalog, Delta Lake, MLflow) or Snowflake (Snowpark, Dynamic Tables).

Experience building reusable Airbyte connectors or contributing to the Airbyte connector framework.

Experience with document processing pipelines (OCR, parsing, chunking strategies).

Familiarity with DSPy or LLM integration in data pipelines.

Strong Vietnamese and English communication skills.

Benefits

Competitive salary and performance incentives

Work on cutting-edge data infrastructure projects

Advanced training in modern data engineering technologies

Flexible work arrangements

A collaborative, innovative engineering team environment

Ready to Join Our Team?

We're excited to meet passionate engineers who want to build the future of AI. Apply now and let's create something amazing together.