DevOps Engineer (AI Infrastructure)

OOm Pte Ltd
Full Time: $4,000 - $4,498 / month

Job Description

Role Overview
You will be responsible for setting up and managing the CI/CD pipelines, infrastructure automation, and cloud environments that power our AI/ML workflows. This role is ideal for someone who thrives in fast-paced environments and is excited by the challenge of enabling scalable AI product delivery.

Key Responsibilities
- Design, implement, and maintain CI/CD pipelines for AI models, APIs, and supporting applications.
- Set up and manage cloud infrastructure (AWS, GCP, or equivalent) with a strong focus on scalability, cost optimization, and security.
- Support containerized environments using Docker and Kubernetes (EKS, GKE, etc.).
- Work closely with AI engineers and software developers to automate data pipelines, model training/deployment, and monitoring.
- Implement and maintain infrastructure as code (IaC) using tools like Terraform or Pulumi.
- Monitor system performance, troubleshoot production issues, and ensure system reliability and uptime.
- Enforce best practices in DevOps, security, versioning, and documentation.

Job Requirements

- 3+ years of DevOps, Site Reliability Engineering, or relevant infrastructure experience.
- Strong hands-on experience with cloud providers (AWS and GCP preferred).
- Solid understanding of CI/CD principles and experience with tools like GitHub Actions, GitLab CI, or Jenkins.
- Experience with Docker, Kubernetes, and container orchestration.
- Familiarity with IaC tools such as Terraform, CloudFormation, or Pulumi.
- Working knowledge of networking, security, and access control in cloud environments.
- Exposure to machine learning or AI deployment workflows is a strong plus.
- Comfortable collaborating with cross-functional teams including data scientists, backend engineers, and product managers.

Nice to Have
- Experience deploying AI/ML pipelines with tools like MLflow, Airflow, or Kubeflow.
- Understanding of GPU/TPU setup and auto-scaling strategies for training/inference workloads.
- Experience with monitoring and logging using Prometheus, Grafana, CloudWatch, or similar tools.

Why Join Us
- Work on real AI products with tangible impact.
- Autonomy to shape and optimize our AI infrastructure.
- A collaborative and ambitious team, with leadership open to innovation and experimentation.
- Opportunities for growth and cross-disciplinary exposure across AI, web, and product development.


Work Location

1 Grange Road, Orchard Building, 239693