Senior SRE 1

2 weeks ago


Hyderabad, Telangana, India Electronic Arts Full time ₹ 2,50,00,000 - ₹ 5,00,00,000 per year
General Information

Locations: Hyderabad, Telangana, India

Role ID

211515

Worker Type

Regular Employee

Studio/Department

CT - IT

Work Model

Hybrid

Description & Requirements

Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen.

Senior SE I / Site Reliability Engineer (SRE)

Job Description

We are seeking an accomplished Senior Site Reliability Engineer (SRE) with 12–15 years of experience to lead the reliability, scalability, and performance engineering of our critical infrastructure and production systems. As a Senior SRE, you will play a strategic and technical leadership role — driving reliability practices, mentoring SRE teams, and influencing the adoption of automation, observability, and resilience engineering across the organization.

You will act as a technical thought leader and hands-on engineer, collaborating with infrastructure, application, and operations teams to build, automate, and scale reliable systems that support global business operations. This role requires deep expertise in cloud platforms, automation, monitoring, incident management, and system design for large-scale distributed environments.

Roles & Responsibilities
1. Reliability Engineering & Automation
  • Architect, implement, and manage resilient, scalable, and highly available infrastructure systems.
  • Lead initiatives to automate manual operations, deployment, and monitoring processes to improve reliability and reduce toil.
  • Drive the creation of observability solutions and dashboards to proactively detect and remediate potential issues.
2. Incident & Problem Management
  • Lead critical incident response, ensuring swift mitigation and clear communication to stakeholders.
  • Conduct detailed root cause analysis (RCA) and drive permanent corrective actions to prevent recurrence.
  • Implement and mature incident management frameworks, including runbooks, playbooks, and post-incident reviews.
3. Infrastructure Operations & Performance Optimization
  • Oversee system performance, capacity planning, and scalability of infrastructure across hybrid and cloud environments (AWS, Azure, GCP).
  • Optimize system resource utilization, latency, and reliability through performance tuning and automation.
  • Work closely with architecture and platform teams to accommodate growth, change, and modernization initiatives.
4. Leadership & Mentorship
  • Provide technical leadership and mentorship to SRE teams and cross-functional engineering groups.
  • Promote an SRE culture across teams — championing principles of reliability, automation, observability, and continuous improvement.
  • Drive collaboration between development, QA, DevOps, and release teams to embed reliability into the software development lifecycle (SDLC).
5. Service Level Management
  • Define, track, and continuously improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Apply the Four Golden Signals of SRE monitoring — Latency, Traffic, Errors, and Saturation — to guide system health and performance strategies.
6. Documentation & Knowledge Sharing
  • Establish and maintain comprehensive documentation of systems, operational procedures, and best practices.
  • Facilitate learning through technical sessions, blameless postmortems, and cross-team knowledge sharing.
7. Strategic Technology & Continuous Improvement
  • Contribute to defining the long-term SRE strategy, tooling roadmap, and automation frameworks.
  • Evaluate and adopt emerging technologies, tools, and methodologies to enhance system reliability and efficiency.
  • Partner with business and technical leaders to ensure alignment of SRE objectives with organizational goals.
8. Security & Compliance
  • Collaborate with security and compliance teams to ensure infrastructure, systems, and operations meet organizational and regulatory standards.
  • Implement secure configuration baselines, vulnerability remediation, and access control policies.
  • Integrate security practices into CI/CD pipelines to ensure DevSecOps alignment.
9. Strategic Leadership & Stakeholder Management
  • Partner with executive and business stakeholders to align SRE initiatives with enterprise objectives and risk frameworks.
  • Provide data-driven insights on reliability, capacity, and operational performance to influence strategic decision-making.
  • Represent SRE functions in technical governance forums, audits, and architecture reviews to drive reliability-focused outcomes.

Qualifications

  • Education: Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
  • Experience: 12–15 years of total IT experience, with at least 8 years in SRE, DevOps, or large-scale systems engineering.
  • Technical Expertise:
    • Strong proficiency in Linux/Unix system administration and internals.
    • Proven experience with cloud platforms (AWS, Azure, or GCP).
    • Advanced scripting and automation skills using Python, Go, PowerShell, or Bash.
    • Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes), and expertise in service meshes such as Istio.
    • Deep understanding of monitoring and observability stacks (Prometheus, Grafana, ELK, Datadog, Splunk, Zabbix, Nagios).
    • Expertise in configuration management and IaC tools (Ansible, Terraform, Chef, Puppet).
    • Strong knowledge of networking, load balancing, databases, and distributed systems.
  • Operational Excellence:
    • Hands-on experience in incident response, problem management, and capacity planning at enterprise scale.
    • Proven ability to design for reliability, redundancy, and disaster recovery.
  • Soft Skills:
    • Excellent analytical, communication, and leadership abilities.
    • Proven track record of mentoring and developing high-performing engineering teams.
    • Strong stakeholder management and cross-functional collaboration skills.
Nice to Have
  • Experience defining and implementing SRE frameworks or centers of excellence in global organizations.
  • Familiarity with REST API development, integration, and database query optimization.
  • Strong understanding of governance, risk, and compliance frameworks.
  • Experience with AIOps, self-healing systems, or machine learning-driven monitoring.
  • Demonstrated experience in driving organizational culture change toward reliability and automation.
  • Active participation in industry forums or open-source contributions related to DevOps or SRE practices.

About Electronic Arts

We're proud to have an extensive portfolio of games and experiences, locations around the world, and opportunities across EA. We value adaptability, resilience, creativity, and curiosity. From leadership that brings out your potential, to creating space for learning and experimenting, we empower you to do great work and pursue opportunities for growth.

We adopt a holistic approach to our benefits programs, emphasizing physical, emotional, financial, career, and community wellness to support a balanced life. Our packages are tailored to meet local needs and may include healthcare coverage, mental well-being support, retirement savings, paid time off, family leaves, complimentary games, and more. We nurture environments where our teams can always bring their best to what they do.

Electronic Arts is an equal opportunity employer. All employment decisions are made without regard to race, color, national origin, ancestry, sex, gender, gender identity or expression, sexual orientation, age, genetic information, religion, disability, medical condition, pregnancy, marital status, family status, veteran status, or any other characteristic protected by law. We will also consider qualified applicants with criminal records for employment, in accordance with applicable law. EA also makes workplace accommodations for qualified individuals with disabilities as required by applicable law.


