Manager - Site Reliability (SRE)

3 days ago

Hyderabad, Telangana, India Apple Full time

Imagine what we could do together. At Apple, new ideas have a way of becoming excellent products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. The people here at Apple don't just build products - they craft the kind of wonder that's revolutionized entire industries. It's the diversity of those people and their ideas that encourages the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts.

Apple's ETS group is looking for a versatile Site Reliability Engineering (SRE) Manager with great technical acumen and a strong background in operations, automation, implementation and development. As a Site Reliability Engineering Manager, you will lead a team responsible for ensuring that high-volume, critical enterprise platforms and applications stay available and scale seamlessly. The applications range from security platforms, anomaly detection, malware and abuse detection and prevention, and edge security, to name a few, to integrations with Apple's supply chain partners such as manufacturers, logistics providers, banks, resellers and business customers.

Description

As a Site Reliability Engineering (SRE) Manager, you will be responsible for building, developing, and retaining a high-performing team of software engineers and for building an environment where they can thrive and succeed. While the primary role is leading and managing employees, you should have deep technical knowledge of distributed systems, cloud computing and security platforms, and be able to quickly understand and respond to peer teams' needs. Strong experience working with short release cycles is also encouraged. Do not hesitate to:

- Actively participate in architectural and functional design, implementation and troubleshooting sessions
- Review hardware, software infrastructure and application functionality to identify and address performance bottlenecks
- Drive major incident management to restore order
- Spearhead the design and implementation of comprehensive monitoring for applications, integrations and anomalies
- Innovate, find opportunities and drive automation efforts across various platform and security applications
- Work closely with the cross-functional IT organization, business group, Apple's production support team, application engineers, systems engineers, database administrators and QA team to ensure the implementation and reliability of platforms and applications

A proven track record of managing, motivating and providing technical guidance to a team of software engineers to draw out their best work will be key to success. Ensuring quality in every deliverable, creative thinking, strong problem solving, and the ability to collaborate with other global cross-functional teams in a fast-paced environment will be meaningful attributes to succeed in this role.

Minimum Qualifications

12+ years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role
3+ years of experience leading and managing high performance SRE teams
Proven track record in leading sophisticated SRE projects, enterprise services at a large scale
Strong analytical, troubleshooting and problem solving skills
Good knowledge in at least one object oriented programming language (preferably Java, Python)
Unix Performance Monitoring & Tuning
Good understanding of database concepts, PL/SQL and NoSQL technologies
Hands-on experience with monitoring and data analysis tools (e.g., Prometheus, Splunk, Grafana, CloudWatch)
Building and operating container orchestration systems like Kubernetes or EKS
Deep understanding of security concepts and protocols: authentication, authorization, signing, encryption, SSL/TLS, SSH/SFTP, PKI, X.509 certificates and PGP
Good fundamentals in Release Management and Continuous Integration
Familiarity with modern web services architectures, cloud platforms such as AWS, GCP, Azure and distributed storage systems (ScaleIO, Amazon S3)
Ability to communicate with large cross-functional teams about various engineering topics such as system architecture, detailed design, APIs, project schedules etc.
Ability to make right trade-off choices when dealing with functional complexity, conflicting priorities and aggressive schedules
Represent the team and remove hurdles to enable each team member to operate at the highest level of efficiency and productivity
Ability to hire, mentor and manage the performance of a large team
Ability to connect with senior executives and business stakeholders
A learning attitude to continuously improve self, team and the organisation
Ability to work under pressure and manage difficult situations in a fast-paced work environment
Bachelor's or Master's degree, or equivalent experience, in Computer Science or a related field

Preferred Qualifications

Experience with runtime configuration and troubleshooting of Java and JVM technologies
Good fundamentals in data modelling and machine learning algorithms
Strong knowledge of securing applications and a thorough understanding of the OWASP Top 10 risks and solutions

SRE (Site Reliability Engineer)

1 week ago

Hyderabad, Telangana, India Talent Worx Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Talent Worx is seeking a talented SRE (Site Reliability Engineer) to enhance our technology team. In this role, you will be pivotal in ensuring the reliability, performance, and availability of our applications and services. Your work will involve both software engineering and systems operations as you strive to improve...

SRE (Site Reliability Engineer)

6 days ago

Hyderabad, Telangana, India VXI Global Solutions Full time ₹ 15,00,000 - ₹ 30,00,000 per year

It's fun to work in a company where people truly BELIEVE in what they are doing. We're committed to bringing passion and customer focus to the business. Job Summary: We are seeking a skilled SRE Engineer to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on...

Devops SRE Site Reliability Engineer

2 weeks ago

Hyderabad, Telangana, India Infoshare Systems, Inc. Full time ₹ 20,00,000 - ₹ 25,00,000 per year

Position: DevOps SRE. Location: Hyderabad - hybrid. Duration: Fulltime. Optum. Required Skills & Experience: 5 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles. Strong experience with Linux/Unix systems administration. Proficiency with Python, Go, or Bash scripting. Solid understanding of networking, firewalls, and security...

Site Reliability Engineer/SRE

1 week ago

Hyderabad, Telangana, India NCR Corporation Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Job Title: Site Reliability Engineer II. Location: Hyderabad. Job Type: Full-Time, 24*7 Availability. Job Description: As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and observability of our transaction processing and settlement platforms, while also providing hands-on support for critical business applications....

Senior Site Reliability Engineer

2 weeks ago

Hyderabad, Telangana, India Jade Global Software Pvt Ltd Full time ₹ 12,00,000 - ₹ 24,00,000 per year

Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability. Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog. Location: Hyderabad...

Senior Site Reliability Engineer

2 weeks ago

Hyderabad, Telangana, India Jade Global Full time ₹ 12,00,000 - ₹ 24,00,000 per year

Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability. Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog. Location: Hyderabad preferable but open for Pune and remote. Job Summary: We are seeking an...

Senior Site Reliability Engineer

2 weeks ago

Hyderabad, Telangana, India Jade Global Full time ₹ 1,00,00,000 - ₹ 2,00,00,000 per year

Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability. Experience Required: 8+ years overall in SRE and Infrastructure Operations with minimum 3+ years hands-on experience in Datadog. Location: Hyderabad preferable but open for Pune and remote. Job Summary: We are seeking an experienced Site Reliability Engineer (SRE) to lead end-to-end SRE...

Site Reliability Engineer

1 week ago

Hyderabad, Telangana, India BYLD Group Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Job Title: Site Reliability Engineer (SRE) - DataDog / AWS Lambda / DynamoDB / Serverless. Location: Bangalore / Pune / Hyderabad. Experience: 5-10 Years. About The Role: We are seeking an experienced Site Reliability Engineer (SRE) with strong expertise in DataDog integration, AWS Lambda, DynamoDB, and Serverless architectures. The ideal candidate will...

Site Reliability Engineering Manager

18 hours ago

Hyderabad, Telangana, India Tata Consultancy Services Full time ₹ 9,00,000 - ₹ 12,00,000 per year

Role: Site Reliability Engineering (SRE). Experience: 5 to 12. Location: Chennai, Bangalore, Hyderabad. Keywords: Kubernetes (CLI), PostgreSQL, SQL, GKE, Google Cloud, Terraform, Ansible. Interview Mode: Weekend Walkin Drive. Must-Have: GKE (Preferable); Kubernetes (Any cloud) + PostgreSQL, SQL (Must). Linux (Optional), Java (Optional), Kubernetes (CLI), Prior...

Site Reliability Engineer

1 week ago

Hyderabad, Telangana, India Technology Next Full time ₹ 20,00,000 - ₹ 30,00,000 per year

Urgently hiring for Site Reliability Engineer (SRE) / Chaos Engineer. Location: Hyderabad. Job Type: Full-time, Permanent. Job Description: We are looking for an experienced Site Reliability Engineer (SRE) with strong Python automation skills (Boto3 required) and hands-on experience in chaos engineering to improve system reliability and resilience. The ideal...
