Job Description
Infrastructure Automation & CI/CD
Maintain GitOps practices and CI/CD infrastructure using Jenkins, GitLab, and related tooling.
Design, implement, and maintain CI/CD pipelines for scalable backend and data services.
Automate infrastructure provisioning using tools like Terraform, Terragrunt, or Ansible.
Integrate automated testing, deployment workflows, and rollback strategies to support agile development.
Kubernetes & EKS Orchestration
Build, configure, and maintain Helm charts for deployment automation and cluster application lifecycle management.
Manage and optimize Kubernetes clusters on AWS (EKS), including node autoscaling, namespace management, and resource allocation.
Work with developers to containerize services and deploy them reliably in production.
Implement best practices for Kubernetes security, multi-tenant isolation, and cluster upgrades.
DataOps (Nice to have)
Support Metabase as the core business intelligence tool, including integration with data sources, permission management, and dashboard reliability.
Integrate and manage data warehouse platforms, optimizing them for analytical and operational workloads.
Collaborate with data engineers to automate data pipeline deployment using tools like Apache Airflow, ensuring end-to-end scheduling, dependency management, and monitoring of data workflows.
Contribute to the deployment and tuning of data storage solutions (e.g., MinIO, S3) and metadata/catalog tools to enhance discoverability and governance.
Operationalize and maintain databases such as MySQL and PostgreSQL, as well as streaming and messaging systems such as Kafka, ActiveMQ, and Redis.
Automate and document backup, restore, failover, and disaster recovery strategies for critical infrastructure and data assets.
Implement and manage transformation pipelines, supporting versioned SQL models, testing, and documentation across environments.
Ensure data quality, lineage, and observability by collaborating on validation frameworks and metrics integration.
Cloud Infrastructure Management
Architect and manage cloud infrastructure primarily on AWS (including VPC, EC2, EKS, RDS, MSK, ElastiCache).
Drive cost optimization efforts in compute, storage, and networking.
Design high-availability (HA) and fault-tolerant infrastructure for critical backend and data workloads.
Support multi-region deployment patterns and network configuration (e.g., DNS, VPN, routing, load balancing).
Monitoring, Logging & Incident Management
Lead incident response and root cause analysis for system failures.
Define and enforce SLOs/SLAs for chatbot uptime and response time.
Set up monitoring tools (Prometheus, Grafana, ELK, or Datadog) for proactive alerting.
Set up centralized logging using the EFK stack (Elasticsearch, Fluent Bit, Kibana).
Security & Compliance
Perform ad hoc DevOps tasks as required, including emergency patches, incident support, or rapid deployment of security updates.
Support compliance efforts for data protection (GDPR, SOC2) in chatbot data pipelines.
Ensure best practices in infrastructure security (IAM, VPC, secrets management).