JOB OVERVIEW
We are looking for a DevOps Engineer who will play a core role in building and operating a modern, scalable DevOps & Data Platform infrastructure that supports hundreds of microservices and data workloads across hybrid cloud (AWS + on-prem) environments.
You will:
● Operate and optimize ETL/ELT data pipelines, ensuring performance, reliability, and automation.
● Develop and maintain CI/CD pipelines (GitLab CI, ArgoCD) across environments.
● Manage and scale Kubernetes clusters (EKS, RKE2) with data-oriented workloads (Airflow, NiFi, Kafka, Spark, Druid, OLTP/OLAP).
● Monitor infrastructure health using Prometheus, Grafana, Loki, ELK, and OpenTelemetry.
● Automate infrastructure provisioning using Terraform, Helm, Ansible, and manage secrets with Vault.
● Collaborate closely with Data Engineering and BI teams to deliver a high-availability, secure, and observable Data Platform.
KEY RESPONSIBILITIES
🛠️ DevOps & Automation:
● Design and maintain standardized CI/CD pipelines (build, scan, test, multi-env deploy).
● Automate operations using scripting (Bash, Python), cronjobs, webhooks, and CLI tools (see the sketch after this list).
● Implement GitOps with ArgoCD for staging, UAT, and production environments.
● Write and maintain Helm charts, YAML templating, and deployment automation.
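By way of illustration, here is a minimal sketch of the kind of scripted automation this role covers: a health check that could run as a cronjob and report failures to a Slack-style webhook. The endpoint URL and environment variable names are hypothetical placeholders, not part of any existing system.

```python
#!/usr/bin/env python3
"""Cronjob-style health check with webhook alerting (illustrative sketch).

SERVICE_URL and ALERT_WEBHOOK are hypothetical placeholders; a real
deployment would substitute its own endpoints and secrets handling.
"""
import json
import os
import sys
import urllib.request

SERVICE_URL = os.environ.get("SERVICE_URL", "https://example.internal/healthz")
ALERT_WEBHOOK = os.environ.get("ALERT_WEBHOOK")  # e.g. a Slack incoming webhook URL

def check_health(url: str, timeout: int = 5) -> bool:
    """Return True if the service answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, timeouts, connection resets
        return False

def alert(message: str) -> None:
    """POST a JSON payload to the webhook, if one is configured."""
    if not ALERT_WEBHOOK:
        return
    req = urllib.request.Request(
        ALERT_WEBHOOK,
        data=json.dumps({"text": message}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

if __name__ == "__main__":
    if not check_health(SERVICE_URL):
        alert(f"Health check failed for {SERVICE_URL}")
        sys.exit(1)  # non-zero exit lets cron/CI surface the failure
```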
☁️ Cloud & On-Prem Infrastructure:
● Operate across AWS and on-premises environments (bare-metal, VMware).
● Handle hybrid storage systems: Longhorn, EBS, Ceph, NFS, MinIO.
● Manage Kubernetes clusters (EKS, RKE2) and scale infrastructure using autoscalers (Karpenter, Cluster Autoscaler); see the sketch after this list.
● Implement HA/DR strategies and backups/restores with tools such as Velero, S3 versioning, and snapshots.
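As a small, hedged example of routine cluster operations, the sketch below uses the official kubernetes Python client to flag NotReady nodes, the kind of check that might feed HA monitoring or capacity decisions. It assumes a kubeconfig pointing at the target cluster (EKS or RKE2 alike).

```python
"""List NotReady Kubernetes nodes (illustrative sketch).

Requires the official client: pip install kubernetes.
"""
from kubernetes import client, config

def not_ready_nodes() -> list[str]:
    """Return names of nodes whose Ready condition is not 'True'."""
    config.load_kube_config()  # in-cluster code would use load_incluster_config()
    v1 = client.CoreV1Api()
    unhealthy = []
    for node in v1.list_node().items:
        ready = next(
            (c for c in node.status.conditions if c.type == "Ready"), None
        )
        if ready is None or ready.status != "True":
            unhealthy.append(node.metadata.name)
    return unhealthy

if __name__ == "__main__":
    for name in not_ready_nodes():
        print(f"NotReady: {name}")
```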
🧠 Big Data / Data Platform:
● Build and monitor ETL/ELT pipelines, syncing from OLTP systems (PostgreSQL, MongoDB) to OLAP systems (Redshift, ClickHouse, etc.).
● Ensure data integrity, manage pipeline latency, and troubleshoot job failures.
● Operate and tune Airflow, Apache NiFi, Kafka, Spark, and Druid in production.
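To make the pipeline work concrete, here is a minimal Airflow DAG sketch for a daily OLTP-to-OLAP sync. The DAG id, schedule, and task callables are hypothetical placeholders rather than a prescribed design; a real pipeline would use proper database hooks for the PostgreSQL source and the Redshift/ClickHouse target.

```python
"""Minimal daily ETL DAG sketch (Airflow 2.x style).

The extract/load callables and names below are placeholders for
illustration only.
"""
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_oltp(**context):
    """Placeholder: read changed rows from the OLTP source."""
    ...

def load_into_olap(**context):
    """Placeholder: upsert the extracted batch into the OLAP target."""
    ...

with DAG(
    dag_id="oltp_to_olap_sync",        # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow >= 2.4; older versions use schedule_interval
    catchup=False,
    default_args={
        "retries": 2,                  # surface job failures only after retries
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_oltp)
    load = PythonOperator(task_id="load", python_callable=load_into_olap)
    extract >> load  # load runs only after a successful extract
```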
🔐 Security & Observability:
● Collaborate with the Security team to ensure CI/CD and data pipelines are secure and compliant.
● Monitor systems with Prometheus, Grafana, Alertmanager, Loki, and integrate with Slack/webhooks.
● Implement IAM, RBAC, security groups, TLS, cert rotation, and secrets management (Vault).
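On the secrets side, a minimal sketch of reading a credential from Vault's KV v2 engine with the hvac client; the secret path, key name, and token-based auth are assumptions for illustration (production setups typically prefer AppRole or OIDC auth).

```python
"""Fetch a secret from HashiCorp Vault KV v2 (illustrative sketch).

Requires `pip install hvac`; VAULT_ADDR/VAULT_TOKEN and the path
`ci/deploy-key` are hypothetical stand-ins for real configuration.
"""
import os

import hvac

def read_deploy_key() -> str:
    client = hvac.Client(
        url=os.environ["VAULT_ADDR"],
        token=os.environ["VAULT_TOKEN"],
    )
    if not client.is_authenticated():
        raise RuntimeError("Vault authentication failed")
    secret = client.secrets.kv.v2.read_secret_version(path="ci/deploy-key")
    return secret["data"]["data"]["key"]  # KV v2 nests the payload under data.data

if __name__ == "__main__":
    print("fetched deploy key of length", len(read_deploy_key()))
```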
REQUIREMENTS
Must-Have:
● 3+ years of hands-on experience in DevOps, SRE, or Infrastructure Engineering.
● Proven experience with Airflow, Apache NiFi, Kafka, or equivalent in production.
● Strong understanding of ETL/ELT pipelines and OLTP/OLAP architecture.
● Automation-first mindset, with a focus on scalability, DR, and efficiency.
● Proficient with Kubernetes (EKS, RKE2), GitLab CI/CD, Helm, Terraform.
● Strong scripting and Linux skills (Bash, Python), with deep understanding of systems and networking.
● Solid knowledge of AWS services: EKS, EC2, S3, IAM, RDS.