Senior DevOps / Site Reliability Engineer (SRE) - Up to $4000
Job Description
Operational Excellence & Incident Management
• Global Incident Response (24/7): Orchestrate a "Follow-the-Sun" support model by leveraging the hybrid team structure.
• Production Stability: Ensure high availability and reliability of our AI-driven skin health products.
In-house: Handle critical, complex Level 3 incidents and root cause analysis.
External/Partners: Manage Level 1/Level 2 monitoring, alert triage, and off-hours coverage.
• Monitoring: Define and track SLIs, SLOs, and SLAs. Implement comprehensive observability dashboards (metrics, logs, traces).
Infrastructure & Platform Engineering (LLMOps/MLOps)
• Environment Strategy: Architect and maintain robust environments (Dev, Staging, Prod) tailored for distinct needs:
Product Teams: Seamless CI/CD pipelines for web/mobile apps.
AI Teams: Specialized AIOps/MLOps pipelines for model training, fine-tuning, and inference.
• DataOps: Build and maintain scalable data pipelines ensuring high throughput for image processing and health data analysis.
Cloud Resource & GPU Management
• Cloud Architecture: Manage cloud infrastructure (AWS/GCP/Azure) using Infrastructure as Code (Terraform/Pulumi).
• Cost & Resource Optimization: Lead capacity planning, with a particular focus on GPU allocation and optimization for cost-effective model training and inference.
Implement FinOps practices to provide visibility into cloud spend and resource utilization.
Security & Corporate IT (Global Scope)
• Data Security: Act as the primary owner of Data System Security. Ensure compliance with health data standards (e.g., HIPAA, GDPR).
• Office Network & IT: Oversee the design and security of office networks and IT infrastructure across our three global locations: USA, France, and Vietnam.
Technical Coordination & Vendor Standards
• Vendor Technical Oversight: Act as the technical expert to monitor MSPs and external contractors. Define technical SLAs, evaluate their delivery quality, and ensure they meet our system's reliability requirements.
• Operational Integration: Ensure external partners follow our security and infrastructure-as-code (IaC) practices, maintaining a seamless "One Team" workflow.
• Knowledge Sharing & Standards: Set high technical standards for the internal DevOps/SRE team; mentor junior engineers and drive a culture of "Automate Everything" across regions (Vietnam & France).
The Tech Stack
• IaC & CI/CD: Terraform, Ansible, GitHub Actions / GitLab CI, ArgoCD.
• Observability: Prometheus, Grafana, ELK Stack / Datadog.
• Core: Kubernetes (K8s), Docker, Linux.
• Security: IAM, VPNs, Firewalls, Secret Management (Vault).
• Cloud: AWS / GCP.
• AI/Data: LLMOps tools (e.g., Kubeflow, Ray, ClearML, ...), GPU orchestration (NVIDIA tools), Vector Databases.
Job Requirements
• Experience:
5+ years in DevOps/SRE.
• Technical Mastery:
Deep expertise in Kubernetes administration and troubleshooting.
Experience with LLMOps/MLOps workflows (managing GPU clusters is a huge plus).
Strong background in Python or Go scripting.
• Security Mindset:
Ability to handle sensitive data with maturity and discretion.
Proactive attitude toward managing risks.
Previous experience securing healthcare data or PII is highly desirable. Ability to design secure access workflows for external collaborators.
• Entrepreneurial Mindset:
Ability to work autonomously in a startup environment.
High ownership and self-starter mindset.
Ability to connect business needs and tech requirements.
• Communication:
Language: Fluent English is mandatory for daily communication with US/France teams and international vendors.
Excellent communication skills to explain infrastructure constraints to Product/AI teams.
• Hybrid Team Experience: Proven experience managing mixed teams (in-house and outsourced/offshore) is a strong plus.
Benefits
• Salary: Up to $4000
• Leadership Opportunity: Define the engineering culture and build your own team from the ground up in Vietnam.
• Competitive Package: Attractive salary, stock options, and benefits.
• International Exposure: Daily collaboration with top-tier talent in the US and France.
• Global Impact: Work on products that genuinely improve people's health and confidence.
• Cutting-edge Tech: Get your hands dirty with the latest in Large Language Models and Computer Vision infrastructure.
Last updated: 2026-02-03 18:25:03
CÔNG TY TNHH BELLE ASIA








