SRE & MLOps Engineer (Platform Reliability & AI Operations)
3 days ago
About Blue Machines
Blue Machines powers large-scale, real-time Voice AI and Agentic Workflows across BFSI, Healthcare, HRTech, and Global Enterprises.

Role: SRE & MLOps Engineer (3–6 Years Experience)
Location: Bangalore (Hybrid)

What You Will Own
1. Platform Uptime & Reliability
- Maintain 99.9%+ uptime.
- Monitor and optimize latency for voice agents.
2. Observability, Monitoring & Incident Response
- Build and maintain monitoring dashboards.
- Configure alerts and act as first responder for incidents.
3. MLOps & Model Provider Reliability
- Monitor STT/TTS/LLM providers.
- Manage failovers and latency SLAs.
4. Kubernetes & Infrastructure
- Manage GKE clusters, autoscaling, and deployments.
5. Internal Platform Tooling
- Build automation around scaling, canaries, and logs.
6. Security & Compliance
- Enforce encryption and network policies, and support audits.

Requirements
You Are a Great Fit If You…
- Have 2–5 years of SRE/DevOps/MLOps experience.
- Are strong with Kubernetes, Prometheus, ELK, Redis, and Pub/Sub.
- Understand streaming, SIP, and WebSockets.
- Communicate well and take ownership of incidents.

Preferred Skills
- Experience with LLM pipelines, telephony, GPU, and GCP.

Why Blue Machines
- Build India's most advanced Voice AI platform.
- High-scale, low-latency engineering.
- Work with the CTO's office on reliability.
-
AI Data Platform Reliability
1 week ago
Bangalore, India Oracle Full time
Responsibilities: Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.). Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation strategies. Develop...
-
AI Data Platform Reliability
2 weeks ago
Bangalore, India Oracle Full time
Responsibilities: Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.). Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation strategies. Develop and...
-
Senior Site Reliability Engineer
3 weeks ago
Bangalore, India Jade Global Full time
Job Description
Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability
Experience Required: 8+ years overall in SRE and Infrastructure Operations, with a minimum of 3 years of hands-on experience in Datadog
Location: Hyderabad preferred, but open to Pune and remote
Job Summary: We are seeking an experienced Site Reliability...
-
AI Data Platform Reliability
1 week ago
Bangalore District, India Oracle Full time
Responsibilities: Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.). Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation strategies. Develop...
-
AI Data Platform Reliability
1 week ago
Bangalore Division, India Oracle Full time
Responsibilities: Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.). Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation strategies. Develop...
-
AI Platform Engineer
1 week ago
Bangalore, India BayOne Solutions Full time
Job Description: We are seeking a highly skilled AI Platform Engineer to design, build, and operate our next-generation AI application platform. In this role, you will work on advanced AI systems including Retrieval-Augmented Generation (RAG) pipelines, multi-model gateways, Model Context Protocol (MCP) tools, agentic workflow automations (e.g., n8n), and...
-
AI Platform Engineer
1 week ago
Bangalore, India BayOne Solutions Full time
Job Description: We are seeking a highly skilled AI Platform Engineer to design, build, and operate our next-generation AI application platform. In this role, you will work on advanced AI systems including Retrieval-Augmented Generation (RAG) pipelines, multi-model gateways, Model Context Protocol (MCP) tools, agentic workflow automations (e.g., n8n), and...
-
SRE DevOps Manager
3 days ago
Bangalore, India Infinite Computer Solutions Full time
We are looking for a Site Reliability Engineering (SRE) DevOps Manager.
Location: Bangalore / Hyderabad / Chennai / Noida / Pune / Visakhapatnam / Gurgaon
Shift timing: Regular
Can join: Immediate to 30 days
Interested candidates, please share your profiles and the details below to:
Email ID: Shanmukh.Varma@infinite.com
Total experience:
Relevant experience:
Current...
-
Senior AI Data Platform Reliability
1 week ago
Bangalore, India Oracle Full time
Key Responsibilities: Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.). Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test...
-
Senior AI Data Platform Reliability
2 weeks ago
Bangalore, India Oracle Full time
Key Responsibilities: Design, develop, and execute end-to-end (E2E) scenario validations that simulate real-world usage of complex AI data platform workflows (data ingestion, transformation, ML pipeline orchestration, etc.). Collaborate closely with product, engineering, and field teams to identify gaps in coverage and propose test automation...