Staff Site Reliability Engineer
5 days ago
Expert-level proficiency operating large-scale, distributed, mission-critical systems: designing for high availability, multi-region resiliency, low latency, and predictable performance under extreme load.
SRE fundamentals at Staff level: defines and drives SLOs/SLIs, error budgets, availability targets, and capacity guardrails; codifies reliability requirements into design reviews and change-management gates.
Deep hands-on with Kubernetes and container platforms: multi-cluster operations, workload placement, HPA/VPA, pod disruption budgets, network policies, admission control, service mesh (Istio/Linkerd), and progressive delivery (blue/green, canary, feature flags).
Infra as Code and GitOps: Terraform (and/or Pulumi), Helm/Kustomize, Argo CD/Flux; builds reusable modules, policy-as-code (OPA/Conftest), environment drift detection, and automated remediation.
Observability at scale: OpenTelemetry instrumentation, metrics (Prometheus), logging (ELK/OpenSearch), distributed tracing (Jaeger/Tempo/Zipkin), dashboards and SLO burn-rate alerts (Grafana); designs actionable alerts with runbook automation.
Proven incident leadership: serves as Incident Commander for P0/P1 events, coordinates cross-functional response, stabilizes systems, restores service quickly, and drives blameless postmortems with measurable follow-through.
Performance engineering and capacity planning: load and resilience testing, GC/heap and thread tuning (for JVM services), profiling (CPU, memory, IO), caching strategies, queue backpressure, and cost-aware capacity models.
Strong systems and networking: Linux internals, filesystems, TCP/UDP, TLS/mTLS, HTTP/2/3, DNS, BGP/Anycast concepts, L4-L7 load balancing (Envoy/HAProxy/NGINX), CDN/edge (Cloudflare/Fastly/Akamai), WAF, and DDoS mitigation.
Data store reliability: operational experience with relational (PostgreSQL/MySQL/Oracle) and NoSQL (Cassandra/DynamoDB/MongoDB) databases, streaming platforms (Kafka/Pulsar/Kinesis), and distributed caches (Redis/Hazelcast); backup/restore, consistency models, compaction/retention tuning, and multi-AZ/region failover.
Cloud and platform engineering: AWS/Azure/GCP core services, VPC design, IAM/RBAC, KMS, secrets management (Vault), service catalog, golden images/base containers, and paved-road platforms for developers.
Release engineering and CI/CD: Jenkins/GitHub Actions/GitLab CI, artifact/signing/SBOM, canary analysis, automated rollbacks, deployment safety checks, and change failure rate/MTTR improvements.
Reliability-by-design partnership: participates in and leads architecture/design reviews, threat modeling, and resilience patterns (bulkheads, circuit breakers, idempotency, retry/backoff, dead-letter handling).
Disaster recovery and business continuity: RTO/RPO objectives, runbooks, game days/chaos experiments (Litmus/Gremlin), regional evacuation, and active-active/active-passive strategies.
Security in depth for production systems: least privilege, workload identity, image and dependency scanning, supply-chain hardening (SLSA), SBOM, network segmentation/zero trust, and PCI-DSS-aligned operational controls.
Strong programming and automation: production-grade Go and/or Python (plus Bash), contributing SRE tooling, controllers/operators, and APIs; code reviews, testing, and docs-as-code.
Effective communicator and influencer: aligns reliability strategy with business outcomes, mentors engineers, challenges assumptions with data, and proposes pragmatic, incremental improvements.
Experience leveraging GenAI/LLMs as copilots: accelerating runbook authoring, alert triage, knowledge retrieval, and post-incident synthesis with appropriate guardrails and data security.
Nice to have: JVM and runtime tuning experience, traffic engineering at Internet scale, mobile edge/network reliability considerations.
This is a hybrid position. Expectation of days in office will be confirmed by your hiring manager.
Job Location: Bangalore, India
-
Staff Site Reliability Engineer
5 days ago
Bengaluru, Karnataka, India Okta Full time ₹ 8,00,000 - ₹ 24,00,000 per year Join our team. We're building a world where Identity belongs to you. Okta's Workforce Identity Cloud Security Engineering group is looking for a Staff Site Reliability Engineer with a passion for DevSecOps, Infrastructure Security, and SRE. Join a team that is not just building solutions but redefining the standards for cloud security. If you have a proven...
-
Staff Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Procore Technologies Full time ₹ 15,00,000 - ₹ 20,00,000 per year Job Description: We're looking for a Staff Site Reliability Engineer to join Procore's Infrastructure Platform division to work on our commercial initiatives. In this role, you'll help build Procore's next-generation construction compute platform for others to build upon, including Procore developers, analysts, partners, and customers. Procore software solutions...
-
Staff Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Anlage Infotech (I) Pvt. Ltd. Full time ₹ 12,00,000 - ₹ 36,00,000 per year About the Role: We are looking for a highly experienced Staff Site Reliability Engineer (SRE) to drive the reliability, performance, and operational excellence of our core production systems. This is a senior, hands-on role that requires deep expertise in large-scale distributed systems, complex incident management, and building world-class...
-
Senior Staff Site Reliability Engineer
2 days ago
Bengaluru, Karnataka, India Movius Full time ₹ 20,00,000 - ₹ 40,00,000 per year About the Role: We are looking for a highly experienced Senior Staff Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will bring deep technical expertise in DevOps, automation, and large-scale distributed systems, with a strong understanding of cloud operations and CI/CD frameworks. Experience in the telecom domain will be an...
-
Staff Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India Visa Full time ₹ 4,00,000 - ₹ 8,00,000 per year Company Description: Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure...
-
Senior Staff Site Reliability Engineer
4 days ago
Bengaluru, Karnataka, India Zscaler Full time ₹ 12,00,000 - ₹ 36,00,000 per year About Zscaler: Serving thousands of enterprise customers around the world including 45% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. As the operator of the world's largest security cloud, Zscaler accelerates digital...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India H&M Full time ₹ 15,00,000 - ₹ 25,00,000 per year Job Description: We are looking for a Site Reliability Engineer within eCommerce with experience in headless SaaS (e.g., a headless CMS), API-based commerce frameworks, and managed cloud services (e.g., managed Kubernetes). You will work within our SRE Capability supporting the next generation customer experience by blending fashion and tech. You...
-
Site Reliability Engineer
7 days ago
Bengaluru, Karnataka, India Programming Full time ₹ 10,00,000 - ₹ 25,00,000 per year Role - Site Reliability Engineering. Location - Bengaluru. Years of Experience - 4+ Years. Professional & Technical Skills: Must-Have Skills: Proficiency in Site Reliability Engineering. Good-to-Have Skills: Experience with cloud service providers such as AWS, Azure, or Google Cloud. Strong understanding of CI/CD tools and practices. Experience with container...
-
Site Reliability Engineer
2 weeks ago
Bengaluru, Karnataka, India FIS Full time ₹ 12,00,000 - ₹ 36,00,000 per year About the Role: Site Reliability Engineer (SRE) with deep expertise in Mainframe technologies like COBOL, JCL, etc. to support and enhance our Card Management & Payment processing functions. This role will be responsible for ensuring reliability, high availability, scalability, stability and performance of mission-critical mainframe software applications and...
-
Site Reliability Engineer
1 week ago
Bengaluru, Karnataka, India FOSS United Full time ₹ 12,00,000 - ₹ 36,00,000 per year Site Reliability Engineer at ZEISS India. Posted on September 11, 2025. ZEISS India, Kadubeesanahalli, Bengaluru. Full Time. Job Description: ZEISS in India is headquartered in Bengaluru and present in the fields of Industrial Quality Solutions, Research Microscopy Solutions, Medical Technology, Vision Care and Sports...