
Staff Site Reliability Engineer
2 weeks ago
Job Description Summary
The Site Reliability Engineer will be responsible for the performance and availability of Compute and Network infrastructure consumed by all business segments. The Site Reliability teams are composed of highly talented individuals obsessively focused on availability through operational excellence. The ideal individual is relentlessly technical, passionate about automating everything, and totally committed to delivering amazing customer experiences.
GE HealthCare is a leading global medical technology and digital solutions innovator. Our purpose is to create a world where healthcare has no limits. Unlock your ambition, turn ideas into world-changing realities, and join an organization where every voice makes a difference, and every difference builds a healthier world.
Job Description
Roles & Responsibilities:
In This Role, You Will
- Own, manage and adapt effective monitoring and alerting systems for GEHC
- Develop and manage a single pane of glass that provides a unified view of GEHC ecosystem monitoring, including top critical business applications, sites, and critical network devices.
- Own, develop, and manage a world-class monitoring data platform that ingests all monitoring telemetry data across applications and infrastructure within GEHC and integrates with the AIOps platform
- Develop and product-manage automated solutions and SaaS products that maintain and optimize the availability and performance of critical business processes and services, and that address potential problems in the infrastructure and application ecosystem before they cause a service interruption
- Ensure top critical business applications and their ecosystems are effectively monitored, with appropriate alerting mechanisms integrated with event management systems for an effective "single pane of glass"
- Deliver self-service tools that rely on the monitoring platform / SRE, for example logs and statistics visualization, monitoring dashboards, etc.
- Collaborate closely with product teams, both internal GE product teams and monitoring/AIOps tool vendors, to ensure that the designed solution meets non-functional requirements such as availability, performance, security, and maintainability. Contribute to SLI, SLO, and SLA definition, monitoring, alerting, and reporting efforts.
- Partner with and support other operations teams in investigating the root cause of major P1 and escalated P2 incidents through a monitoring lens
- Establish performance baselines and capacity thresholds, correlate events, and define monitoring/alerting criteria
- Continuously identify patterns that point to larger underlying problems, and solve those problems to avoid repeat issues.
- Stay abreast of the latest trends in application and infrastructure monitoring, provisioning, maintenance, and uptime. Learn, prototype, and apply the newest tools and best practices in real-world settings to meet the goals of the SRE practice
Education Qualification
- Bachelor's Degree in Computer Science or "STEM" Majors (Science, Technology, Engineering and Math) with advanced experience.
- 7+ years of relevant experience in the IT Operations/Site Reliability Engineering domain, with demonstrable expertise in architecting, designing, and implementing solutions for availability and/or performance
- Comprehensive understanding of application performance monitoring and cloud technologies, and the ability to design and implement Dynatrace solutions in complex enterprise environments.
- Solid expertise in designing and implementing Dynatrace and Dynatrace extensions, or in managing an APM / observability solution.
- Proficient in Dynatrace features and architecture design, with installation, fine-tuning, and implementation experience across various environments (Production, Test, Development, and Disaster Recovery)
- Expertise in Dynatrace platform configuration, including host grouping, auto tagging, naming rules, management zones, RUM (Real User Monitoring), Synthetics, session properties, request attributes, user tags, log monitoring, alerting profiles, problem notifications, threshold tuning, and setting up integrations with other monitoring tools and ServiceNow.
- Experience in implementing and configuring Dynatrace tools, setting up synthetic and transaction monitoring, and ensuring comprehensive infrastructure and application monitoring
- Experience creating custom extensions in Dynatrace using shell, Python, and batch scripts based on REST APIs and logs.
- Experience setting up Dynatrace extension configurations, dashboards (including business dashboards), infrastructure, analytics, observability logs, and metrics data collection, and interpreting the resulting data.
- Proficiency in Dynatrace Query Language (DQL) and in creating custom dashboards as required
- Establish and foster visible architectural principles and practices to build reusable designs and systems that promote reliability, velocity, scale, security, and efficiency
- Understand and improve applications, and plan for faster MTTD and MTTR and for auto-healing
- Understand reliability metrics and enhance automation solutions for auto-healing and incident resolution
- Full-stack troubleshooting experience across network, application, hardware, management fabric, or distributed services layers.
- Exposure to and familiarity with Agile and SRE principles, automated deployments, and build pipelines
Desired Characteristics
- Excellence in written and verbal communication and presentation, and the ability to partner for success across all levels of the organization and all levels of technical depth.
- Enterprise logging/alerting implementations using Splunk and the ELK stack
- Enterprise APM implementation using Dynatrace, AppDynamics, New Relic, etc.
- Excellent knowledge of common operating systems (Unix/Linux, Windows)
- Demonstrated experience scripting or developing software and services for the cloud (Ruby, Python, Go, Java, .NET, etc.)
- Extensive knowledge of network protocols (TCP/IP, SNMP, FTP, syslog, TFTP, etc.)
- Experience managing version control systems such as Git
- Experience deploying and managing infrastructure on public clouds such as AWS or Azure
- Experience using an automated configuration management system (Terraform, Chef, Puppet, Ansible, Salt, etc.)
- Strong organizational and project management skills
- Strong analytical and problem resolution skills
- Excellent knowledge of Network Management (SNMP, MIB)
- Experience with configuring, customizing, and extending monitoring tools (Datadog, Sensu, Grafana, Splunk, etc.)
- Excellent knowledge of TCP/IP networking, and inter-networking technologies (routing/switching, proxy, firewall, load balancing etc.)
- Knowledge of and experience using analytics software packages like Matlab, SAS, JMPro, etc. Programming experience with open source scripting and data analysis packages like Python or R is a plus.
- Proactively engages with cross-functional teams to resolve issues and design solutions using critical thinking and analytics skills and best practices by actively incorporating input from various sources
- Strong analytical and problem-solving skills: effectively evaluates information/data to make decisions; anticipates obstacles and develops plans to resolve them
- Continuous improvement oriented – actively generates process improvements; champions and drives change initiatives
- Ability to deliver results in a rapidly changing dynamic environment
- Emotional intelligence, the ability to influence up and out, and the ability to work independently
- Must be a team player with a strong desire to win
- Passionate about continuously learning and able to quickly adapt and pivot to win in a dynamic environment
- Highly organized and efficient; able to balance competing priorities and execute accordingly
- Strong oral and written communication skills
Inclusion & Diversity
GE HealthCare is an Equal Opportunity Employer where inclusion matters. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law.
We expect all employees to live and breathe our behaviors: to act with humility and build trust; lead with transparency; deliver with focus, and drive ownership – always with unyielding integrity.
Our total rewards are designed to unlock your ambition by giving you the boost and flexibility you need to turn your ideas into world-changing realities. Our salary and benefits are everything you'd expect from an organization with global strength and scale, and you'll be surrounded by career opportunities in a culture that fosters care, collaboration and support.
Disclaimer:
GE HealthCare will never ask for payment to process documents, refer you to a third party to process applications or visas, or ask you to pay costs. Never send money to anyone suggesting they can provide employment with GE HealthCare.
Additional Information
Relocation Assistance Provided: Yes