Site Reliability Engineer
3 weeks ago
● Work with stakeholders such as product owners and Engineering to define
service level objectives (SLOs) for system operations.
● Track performance against SLOs in partnership with monitoring teams or other
stakeholders, and ensure systems continue to meet SLOs over time.
● Create dashboards and reports to communicate key metrics.
● Create software to improve performance, scalability, and stability of systems.
● Collaborate with development teams to promote the concept of reliability
engineering during all phases of the software development lifecycle to detect and
correct performance issues and meet availability goals.
● Design, code, test, and deliver infrastructure software to automate manual
operational work (i.e., "toil").
● Participate in operational support and on-call rotation shifts for supported
systems and products.
● Conduct blameless postmortems to troubleshoot priority incidents.
● Perform analytics on previous incidents to understand root causes and better
predict and prevent future issues.
● Use automation to reduce the probability and/or impact of problem recurrence.
● Identify, evaluate, and recommend monitoring tools and diagnostic techniques to
improve system observability.
● Participate in system design consulting, platform management, capacity planning,
and launch reviews.
● Collaborate and share lessons learned regarding performance and reliability
issues with all stakeholders including developers, other SREs, operations teams,
and project management teams.
● Participate in communities of practice to share knowledge and foster continuous
improvement.
● Remain current on site reliability engineering methods and trends such as
observability-driven development and chaos engineering.
● Drive continuous improvement in software quality and infrastructure reliability and
resilience.
● Oversee, design, implement, and manage DevOps capabilities using continuous
integration/continuous delivery toolsets and automation.
● The SRE will focus on application performance monitoring (APM), including
design, solutioning, POC, and profiling and tuning of application compute and data
nodes and resources. Some key duties of this role are:
● Assist in defining SRE and observability architecture and design
● Analyze and implement new features of the SRE and observability platform
● Full-stack monitoring across all layers
(Infrastructure/Network/Database/Application/Services/Third Party)
● Provide hands-on technical leadership in the selection and implementation of
open-source and commercial monitoring tools
● Implement the SRE-driven incident workflow: automated incident detection ->
automated engagement -> triage/mitigation -> RCA/postmortems -> problem task remediation
● AI-driven correlation, de-duplication, noise reduction, and auto-remediation
● Provide weekly monitoring and alert analysis and continuous improvement
● Create a model of the run-time environment (discovery)
● Profile the performance and behavior of user-defined transactions
● Establish performance metrics for each of the application's or system's technical
components (web server, app server, database, etc.)
● Application performance management database
● APM tool administration and support
● Monitoring tool design and implementation
● APM setup/usage policies and guidelines
● Capacity planning and monitoring
● Monitor selected application performance
● Report vital statistics of application performance in production
● Make recommendations for improvements with Service Desk
● Make recommendations for adjustments to runtime resources to improve overall
performance profile
KEY QUALIFICATIONS & EXPERIENCE:
● Strong problem solving and analytical skills.
● Strong interpersonal, written, and verbal communication skills.
● Highly adaptable to changing circumstances. Interest in continuously learning
new skills and technologies.
● Experience with programming and scripting languages (e.g. Java, C#, C++,
Python, Bash, PowerShell).
● Experience with incident management and response.
● Experience with Agile and DevOps development methodologies.
● Experience with container technologies and supporting tools (e.g. Docker
Swarm, Podman, Kubernetes, Mesos).
● Experience working in cloud ecosystems (Microsoft Azure, AWS, Google
Cloud Platform).
● Experience with monitoring and observability tools (e.g. Splunk, CloudWatch,
AppDynamics, New Relic, ELK, Prometheus, OpenTelemetry).
● Experience with configuration management systems (e.g. Puppet, Ansible, Chef,
Salt, Terraform).
● Experience working with continuous integration/continuous deployment tools
(e.g. Git, TeamCity, Jenkins, Artifactory).
● Experience in GitOps-based automation is a plus.
● Bachelor's degree (or equivalent years of experience).
● 5+ years of relevant work experience. SRE experience preferred.
● Background in manufacturing or platform/tech companies is preferred.
● Must have public cloud provider certification (Azure, GCP, or AWS).
● CNCF certification is a plus.
OTHER INFORMATION
Travel: as required.
The job is primarily performed in a hybrid office environment.