
Reliability Advocate
4 days ago
We are seeking a Senior Site Reliability Engineer (SRE II) to own the availability, latency, performance, and efficiency of our SaaS platform on Azure.
The ideal candidate will define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. They will report directly to the Director of SRE.
Key Responsibilities:
- Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services; publish them, review them quarterly, and align teams to them.
- Run the error-budget policy with multi-window, multi-burn-rate alerts, clear runbooks, and explicit paging thresholds.
- Gate changes by budget status (freeze/relax rules) wired into CI/CD.
- Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.
- Drive roadmap tradeoffs when budgets are at risk; land reliability epics.
- Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick.
- Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.
- AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx); mesh optional.
- Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.
- IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
- CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.
- Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.
- DR you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.
- Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.
- Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.
- Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
- Document to scale: Architectures, runbooks, postmortems, and SLIs/SLOs, all kept current and discoverable.
- (If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.
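To make the error-budget items above concrete, here is a minimal Python sketch of multi-window, multi-burn-rate paging and budget-based change gating. It is an illustration under stated assumptions, not our actual policy: the names (`WindowPair`, `should_page`, `deploy_allowed`), the window pairs, and the thresholds (14.4 and 6 are commonly cited example values for a 30-day SLO) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WindowPair:
    """Error ratios over a long and a short lookback window (hypothetical shape)."""
    long_error_ratio: float     # e.g. failed/total requests over the last 1h
    short_error_ratio: float    # e.g. failed/total requests over the last 5m
    burn_rate_threshold: float  # assumed example value, e.g. 14.4 for a 1h/5m pair


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    budget = 1.0 - slo_target
    return error_ratio / budget


def should_page(windows: Iterable[WindowPair], slo_target: float) -> bool:
    """Page only if BOTH windows of some pair exceed that pair's threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, which lets the alert reset soon after recovery.
    """
    return any(
        burn_rate(w.long_error_ratio, slo_target) >= w.burn_rate_threshold
        and burn_rate(w.short_error_ratio, slo_target) >= w.burn_rate_threshold
        for w in windows
    )


def budget_remaining(total: int, failed: int, slo_target: float) -> float:
    """Fraction of the period's error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        return 0.0
    return 1.0 - failed / allowed_failures


def deploy_allowed(remaining: float, freeze_below: float = 0.0) -> bool:
    """Change gate: freeze non-essential deploys once the budget is exhausted."""
    return remaining > freeze_below
```

A CI/CD pipeline step could call `deploy_allowed(budget_remaining(...))` and fail the stage when it returns `False`. Where the counts come from (Prometheus, Azure Monitor, or elsewhere) is left open here.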
Requirements
- Bachelor's in CS/Engineering (or equivalent experience).
- 12+ years in production ops/platform/SRE, including 5+ years on Azure.
- PostgreSQL (must-have): Deep operational expertise, including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Experience with Azure Database for PostgreSQL – Flexible Server preferred.
- Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
- Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
- IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.
- Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
- Mentorship and crisp written/verbal communication.
Nice to Have
- Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.
- Azure Solutions Architect Expert, CKA/CKAD.
- ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).
- Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.
- OpenTelemetry, eBPF tooling, or service mesh.
- Multi-tenant SaaS and cost optimization at scale.