SRE & DevOps Engineer (ML/AI Platform)
3 days ago
SRE & DevOps Engineer (ML/AI Platform) Contract Position | Global E-Commerce Leader | Hybrid

About the Opportunity
We're partnering with a leading global e-commerce company to find an exceptional SRE & DevOps Engineer to join their AI Platform Team. This is your chance to shape the future of machine learning infrastructure that powers innovation for millions of users worldwide. As part of this transformative role, you'll support cutting-edge AI platforms and services, working alongside researchers, data scientists, and engineering teams in a purpose-driven, inclusive environment.

What You'll Do

Platform Operations & Support
- Support next-generation AI architecture for research and engineering teams
- Partner with vendors and infrastructure teams to ensure security and 99.999% service availability
- Diagnose and resolve production issues, including performance and functional challenges
- Provide technical support to customers and document solutions

DevOps & Automation
- Design and implement zero-downtime monitoring for highly available services
- Build CI/CD pipelines for automated deployment and configuration
- Identify automation opportunities to streamline problem management
- Develop operational standards for tools, versioning, source control, and deployment practices

Continuous Improvement
- Drive customer service enhancements and recommend product improvements
- Define engineering excellence and operational maturity standards
- Conduct customer training and generate insights reports
- Accelerate team efficiency through automation and knowledge sharing

What You Bring

Required Expertise
- 5+ years of experience
- Strong Python development skills, including data structures and algorithms, with experience in designing, building, and releasing production software
- Hands-on experience with ML frameworks: PyTorch, TensorFlow, Triton
- Cloud-native technologies: Kubernetes, Docker, Linux
- DevOps proficiency: CI/CD pipelines, Jenkins, test automation
- Framework troubleshooting: version upgrades, compatibility management
- Excellent debugging and triaging capabilities

Preferred Skills
- Experience with AI/ML model training and inference platforms
- LLM fine-tuning systems knowledge
- Performance monitoring and application deployment automation

#SRE #DevOps #MLOps #AI #MachineLearning #Kubernetes #Python #PyTorch #TensorFlow #CloudEngineering #Hiring #TechJobs #ContractRole
-
SRE / Devops (ML Framework)
3 weeks ago
New Delhi, India ACL Digital Full time
ACL Digital (An Alten Group Company) hiring for SRE / Devops (ML Framework). Interested candidates can reach out at dineshkumar.s@acldigital.com
Experience: 5+ Years
Location: Bellandur, Bengaluru
Notice Period: Less than 2 Weeks
Key Responsibilities:
- Demonstrated ability in designing, building, refactoring and releasing software written in Python.
- Hands-on...
-
Site Reliability Engineer
1 week ago
New Delhi, India Stoopa AI Full time
Company Description
Stoopa.AI is building next-generation AI-driven platforms for ports and is focused on reliability, speed, and intelligent automation. As we scale our next-generation smart port product Turi, we are hiring our first dedicated SRE/DevOps Engineer to build, optimize, and own our reliability engineering function from the ground up. This is a...
-
Site Reliability Engineer
6 days ago
New Delhi, India Stoopa AI Full time
Company Description
Stoopa.AI is building next-generation AI-driven platforms for ports and is focused on reliability, speed, and intelligent automation. As we scale our next-generation smart port product Turi, we are hiring our first dedicated SRE/DevOps Engineer to build, optimize, and own our reliability engineering function from the ground up. This is a...
-
AI Platform Engineer
2 weeks ago
New Delhi, India BayOne Solutions Full time
Job Description
We are seeking a highly skilled AI Platform Engineer to design, build, and operate our next-generation AI application platform. In this role, you will work on advanced AI systems including Retrieval-Augmented Generation (RAG) pipelines, multi-model gateways, Model Context Protocol (MCP) tools, agentic workflow automations (e.g., n8n), and...
-
AI Platform Engineer
1 week ago
New Delhi, India BayOne Solutions Full time
Job Description
We are seeking a highly skilled AI Platform Engineer to design, build, and operate our next-generation AI application platform. In this role, you will work on advanced AI systems including Retrieval-Augmented Generation (RAG) pipelines, multi-model gateways, Model Context Protocol (MCP) tools, agentic workflow automations (e.g., n8n), and secure...
-
Senior DevOps Platform Engineer
3 days ago
New Delhi, India apna Full time
Job Title: Senior engineer (SDE-2) – Platform Engineering
Location: Bengaluru
Employment Type: Full-time
Team: Platform Engineering
About the Role: We are looking for a passionate and hands-on DevOps Engineer to join our Platform Engineering team and accelerate our platform modernization journey. This role is ideal for engineers who thrive in automation-heavy...
-
AI Data Platform Reliability
2 weeks ago
New Delhi, India Oracle Full time
Responsibilities
- Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.).
- Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation strategies.
-...
-
AI Data Platform Reliability
1 week ago
New Delhi, India Oracle Full time
Responsibilities
- Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.).
- Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation strategies.
- Develop...