SRE & DevOps Engineer

20 hours ago


India N-iX Full time ₹ 12,00,000 - ₹ 24,00,000 per year

N-iX is a global software development service company that helps businesses across the world develop successful software products. Founded in 2002, N-iX has come a long way, expanding its presence across Europe, the US, and Latin America. Today, we are a strong community of 2,000+ professionals and a reliable partner for global industry leaders and Fortune 500 companies. 

Our client is a global commerce leader where you can influence how the world buys, sells, and gives. You'll be part of a work culture that's been genuinely committed to diversity and inclusion since its founding over twenty-five years ago. Here, you can be yourself, do your best work along with a team of professionals, and have a meaningful impact on people across the globe. We seek people with drive, ideas, and a passion for helping small businesses succeed.

About the team:  You will join the AI Platform Team, providing highly available, scalable, and automated machine learning infrastructure for researchers and data scientists globally. We are looking for a motivated, self-reliant SRE / DevOps engineer with Python and C++ experience to drive operational excellence, automation, and platform reliability, with a focus on

About the role: This role focuses on maintaining, deploying, and improving AI/ML platform services using , with strong emphasis on DevOps, SRE practices, and automation. You will collaborate closely with developers, researchers, and infrastructure teams to ensure robust, scalable, and highly available distributed ML systems.

Responsibilities:

DevOps tasks (~60%):

  • Design, implement, and maintain CI/CD pipelines for AI/ML platform services. 
  • Manage and troubleshoot Kubernetes clusters, Docker containers, and cloud infrastructure.
  • Ensure high availability, system reliability, and security across platforms.
  • Automate operational tasks, monitoring, and deployment workflows.
  • Deploy and maintain clusters, ensuring workload scheduling and distributed job reliability.
  • Monitor production systems via the Ray Dashboard and CLI tools, and integrate alerting and metrics.
  • Analyze and resolve production issues, performance bottlenecks, and functional problems.
  • Define operational standards, versioning practices, and advise teams on DevOps best practices.
  • Prepare documentation and training materials, and provide technical support to platform users.

Development tasks (~40%):

  • Design, build, and refactor Python and C++ services for workflows.
  • Work with Ray ecosystem libraries such as Ray Train, Ray Tune, Ray Serve, Ray Data.
  • Integrate Ray with tools such as Airflow, MLflow, Dask, and DeepSpeed (a plus).
  • Work with ML frameworks such as PyTorch, TensorFlow, and Triton.
  • Collaborate with developers to integrate distributed ML pipelines into automated CI/CD workflows.

Requirements:

  • Strong Python and C++ development experience (2–4 years).
  • Hands-on experience with cluster deployment, workload management, and distributed task scheduling.
  • Familiarity with Ray ecosystem libraries (Train, Tune, Serve, Data) and integration with ML tooling.
  • Solid understanding of Kubernetes, Docker, Linux fundamentals, and DevOps practices.
  • Experience with CI/CD pipelines (Jenkins or similar), test automation, and monitoring.
  • Strong debugging and triaging skills for distributed systems.
  • Excellent communication and collaboration skills with cross-functional teams.
  • Strong organizational skills to manage multiple projects in a fast-paced environment.
  • Fluent in English (spoken and written).
  • 3–5 years of relevant DevOps / SRE experience overall.

We offer*:

  • Flexible working format: remote, office-based, or hybrid
  • A competitive salary and good compensation package
  • Personalized career growth
  • Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
  • Active tech communities with regular knowledge sharing
  • Education reimbursement
  • Memorable anniversary presents
  • Corporate events and team buildings
  • Other location-specific benefits

*not applicable for freelancers
