
Site Reliability Engineer III
3 weeks ago
As an Observability Engineer on the Site Reliability Engineering team, you will be a crucial part of the team responsible for the availability, performance, and scalability of our cloud platform. You will blend software engineering and systems administration expertise to build and run large-scale, distributed, fault-tolerant systems. Your mission is to ensure our services are reliable and efficient through automation, robust monitoring, and proactive incident response. You will work closely with development teams to build resilient and scalable applications on our Google Cloud Platform (GCP) and Kubernetes-based infrastructure. Strong troubleshooting skills and a methodical approach to problem-solving are a MUST.
Key Responsibilities
Infrastructure as Code (IaC): Design, build, and maintain our core cloud infrastructure on GCP using tools like Terraform and Google Config Connector (KCC) within a GitOps framework.
Automation: Use Infrastructure as Code (IaC) with Kubernetes (GKE) and Google Config Connector (KCC), and develop automation scripts and tools (primarily in Python or Go) to reduce operational toil, streamline deployments, and improve system efficiency.
Observability: Implement and manage comprehensive monitoring, logging, and alerting solutions using tools like Prometheus, OpenTelemetry, Grafana, and Google Cloud's operations suite to gain deep insights into system health.
Reliability & SLOs: Define, measure, and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services. Drive initiatives to meet and exceed these objectives. Develop and promote dashboarding and actionable alerting across the organization.
Incident Management: Participate in an on-call rotation to respond to and resolve production incidents. Lead blameless post-mortems to identify root causes and implement lasting solutions.
Collaboration: Partner with software engineering teams throughout the development lifecycle to provide guidance on building reliable, scalable, and secure applications. Help them troubleshoot complex issues, improve service performance, and adopt observability best practices.
Enhance Reliability: Analyze observability data to identify trends, uncover potential issues, and drive initiatives to improve system reliability, performance, and cost-efficiency.
Secure and Scale: Manage secrets and system configurations securely using HashiCorp Vault, and ensure the observability platform scales to meet the demands of a growing engineering organization.
Qualifications Required
- Bachelor's degree in computer science, a related technical field, or equivalent practical experience.
- 3-8 years of experience in a Site Reliability, DevOps, or Software Engineering role.
- Strong proficiency in at least one high-level programming language (e.g., Python, Go, Java).
- Hands-on experience with cloud platforms, particularly Google Cloud Platform (GCP).
- Solid understanding and practical experience with containerization (Docker) and orchestration (Kubernetes).
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or Google Config Connector.
- Familiarity with CI/CD principles and tools (e.g., GitLab CI, Jenkins).
- Knowledge of GitOps principles and tools.
- Excellent communication skills and the ability to work effectively in a collaborative team environment.
CME Group: Where Futures are Made
CME Group is the world's leading derivatives marketplace. But who we are goes deeper than that. Here, you can impact markets worldwide. Transform industries. And build a career by shaping tomorrow. We invest in your success and you own it – all while working alongside a team of leading experts who inspire you in ways big and small. Problem solvers, difference makers, trailblazers. Those are our people. And we're looking for more.
At CME Group, we embrace our employees' unique experiences and skills to ensure that everyone's perspectives are acknowledged and valued. As an equal-opportunity employer, we consider all potential employees without regard to any protected characteristic.
Important Notice:
Recruitment fraud is on the rise, with scammers using misleading promises of job offers and interviews to solicit money and personal information from job seekers. CME Group adheres to established procedures designed to maintain trust, confidence and security throughout our recruitment process.
Bengaluru, Karnataka, India | CME Group | Full time | ₹15,00,000 - ₹28,00,000 per year