SRE - AI_ML Support Engineer

Pune, Maharashtra, India
CRUTZ LEELA ENTERPRISES
Full time
₹5,00,000 - ₹17,19,712 per year

SRE - AI_ML Support Engineer – JD

We are hiring an SRE (Site Reliability Engineer) AI/ML Support engineer for our enterprise-grade, high-performance supercomputing platform. We help enterprises and service providers build AI inference platforms for their end users, powered by our state-of-the-art RDU (Reconfigurable Dataflow Unit) hardware architecture. Our cloud-agnostic, enterprise-grade MLOps platform abstracts away infrastructure complexity and enables seamless deployment, management, and scaling of foundation model workloads at production scale. You'll contribute to the core of our enterprise-grade AI platform, collaborating across teams to ensure our systems are performant, secure, and built to last. This is a high-impact, high-visibility role at the intersection of AI infrastructure, enterprise software, and developer experience.

Minimum Requirements:

  • Foundational ML knowledge with hands-on experience working with machine learning models, especially large language models (LLMs) and LLM APIs
  • Strong programming skills in Python, including experience with ML frameworks (PyTorch, Hugging Face, LangChain, etc.) as well as building scripts and automation
  • Solid understanding of Generative AI concepts (such as retrieval-augmented generation, or RAG) and applied use cases
  • Exposure to Linux systems and familiarity with troubleshooting environment and setup issues
  • Ability to investigate, triage, and resolve customer or internal issues related to ML workflows, APIs, and AI-based applications
  • Experience with issue tracking, documentation, and collaboration platforms (e.g., ticketing systems, project tracking tools, knowledge bases)
  • Proficiency with Docker for containerization and shell scripting for system automation
  • Good communication and collaboration skills for working with cross-functional teams as well as external customers and stakeholders

Nice to have:

  • Familiarity with multi-modal models (e.g., Llama 4 Maverick)
  • Familiarity with MLOps practices such as monitoring and observability, including exposure to related tools like OpenSearch, Prometheus, and Grafana
  • Strong hands-on experience with Linux system administration and network administration, including troubleshooting, system monitoring, and performance optimization
  • Experience working with Kubernetes (on-prem deployments preferred) to manage containerized ML workloads
  • Exposure to one or more public cloud platforms (AWS, GCP, Azure, etc.)
  • Strong customer-facing communication skills to handle escalations, reliability concerns, and solution discussions with stakeholders and clients in a B2B environment

Ways to stand out from the crowd:

  • Prior experience working with the APIs and SDKs of major LLM providers (OpenAI, Anthropic, Hugging Face, etc.)
  • Demonstrated ability to resolve complex issues in production ML systems
  • Knowledge of fine-tuning, prompt engineering, and optimizing LLM usage in production

Job Type: Full-time

Pay: ₹500,000 - ₹1,719,712.72 per year

Benefits:

  • Provident Fund

Work Location: In person