Senior Site Reliability Engineer

17 hours ago

Chennai, Tamil Nadu, India Full time ₹ 2,00,00,000 - ₹ 4,00,00,000 per year

Who we are
Arcadia is the technology company empowering energy innovators and consumers to fight the climate crisis. Our software and APIs are revolutionizing an industry held back by outdated systems and institutions by creating unprecedented access to the data and clean energy needed to make a decarbonized energy grid possible.

In 2014, Arcadia set out on its mission to break the fossil fuel monopoly, and since then we have been knocking down the institutional barriers to unlock decarbonization. To date, we have connected hundreds of thousands of consumers and small businesses with high-quality clean energy options. Fast forward to today, and we're thinking even bigger. We have launched Arcadia Platform, an industry-defining SaaS platform that empowers developers and energy innovators to deliver their own custom, personalized energy experiences, accelerating the transformation of the industry from an analog energy system into a digitized information network.

Tackling one of the world's biggest challenges requires out-of-the-box thinking and diverse perspectives. We're building a team of individuals from different backgrounds, industries, and educational experiences. If you share our passion for ushering in the era of the clean electron, we look forward to learning what you would uniquely bring to Arcadia.
What we're looking for:

We are seeking an experienced Senior Site Reliability Engineer (L3) to join our SRE/Platform Engineering team in India. This role will focus on building, scaling, and maintaining our AWS- and Kubernetes-based platform, ensuring high reliability, cost efficiency, and secure operations across multiple environments. The successful candidate will work closely with Engineering, Security, DevOps, and Product teams to drive automation, improve infrastructure resilience, and elevate observability across mission-critical systems.

The ideal candidate is a self-starter and hands-on engineer who can dive deep into complex distributed systems, automate away manual processes, and proactively identify reliability gaps. They should have a proven track record of managing production-grade AWS infrastructure, Kubernetes clusters, CI/CD pipelines, and cloud security. They will collaborate daily with US-based engineering teams and cross-functional partners to ensure our platform remains scalable, secure, and cost-optimized as we continue to grow.

What you'll do:

Design, build, and maintain AWS infrastructure (EKS, VPC, RDS, IAM, CloudWatch, CloudTrail, GuardDuty, Load Balancers, S3, CloudFront) using Terraform and CloudFormation
Lead all aspects of Kubernetes operations, including cluster upgrades, performance tuning, CNI troubleshooting, workload scaling, Helm chart packaging, and GitOps deployments
Own and evolve our CI/CD ecosystem across Jenkins (Groovy scripting), GitHub Actions, AWS CodePipeline, ArgoCD, and FluxCD
Improve platform reliability by reducing operational toil through automation, scripting (Python/Bash), and proactive system hardening
Implement and enhance observability across Prometheus, Grafana, Loki, Tempo, Datadog, and CloudWatch, ensuring actionable alerting, dashboards, and metrics aligned with SLOs/SLIs
Drive FinOps initiatives, identifying cost inefficiencies and working with engineering teams to implement best practices, tagging standards, budgeting, and resource right-sizing
Manage database operations across MySQL and PostgreSQL, including backups, performance tuning, replication, and operational runbooks
Maintain and improve secret management using Vault, AWS Secrets Manager, and Parameter Store
Strengthen cloud security posture with IAM least privilege, CSPM reviews, audit readiness, GuardDuty/CloudTrail monitoring, and environment hardening
Troubleshoot complex production issues across networking, Kubernetes, compute, databases, and CI/CD systems
Collaborate daily with US-based teams for incident reviews, migrations, roadmap work, and platform enhancements
Contribute to the development and adoption of AI-enabled tooling (e.g., automation, debugging assistants, MCP, RAG pipelines; good to have, not mandatory)
Document runbooks, architecture diagrams, SOPs, troubleshooting guides, and operational best practices
Participate in on-call rotations (if required) and drive post-incident analysis and long-term fixes

What will help you succeed:

Must-haves:

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
8–10+ years of experience in SRE/DevOps/Cloud Engineering, with deep hands-on exposure to AWS and Kubernetes
Strong hands-on experience with:
Terraform & Infrastructure as Code
AWS core services (EKS, IAM, RDS, EC2, VPC, CloudWatch, CloudTrail, GuardDuty)
Jenkins + Groovy, GitHub Actions, ArgoCD, FluxCD
Kubernetes troubleshooting and operations
Prometheus/Grafana/Datadog observability stacks
Proven ability to operate in high-scale, high-uptime, multi-environment production systems
Experience building automation via Python/Bash and reducing operational toil
Strong understanding of incident management, root cause analysis, and reliability engineering principles
Experience working with globally distributed teams across multiple time zones
Excellent communication skills (must interact with US teams daily)
Ability to work independently with minimal supervision, take ownership, and drive initiatives end-to-end
A growth mindset, strong troubleshooting ability, and comfort with complex cloud-native environments

Nice-to-haves:

Experience with n8n (self-hosted) and workflow automation platforms
Exposure to LLMs, RAG, vector DBs, and MCP concepts
Experience with cloud security/DevSecOps tools (Trivy, Inspector, OPA, Kyverno)
Hands-on experience with FinOps platforms and cloud cost governance
Certifications in a related field (AWS, Kubernetes, Terraform, etc.)

Benefits

Competitive compensation and employee stock options
Hybrid/remote-first working model (India-based role, with global collaboration)
Flexible leave policy
Comprehensive medical insurance (self + family members)
Annual performance cycle + quarterly recognition awards
A supportive, diverse engineering culture grounded in empathy, teamwork, and innovation

Eliminating carbon footprints, eliminating carbon copies.

Here at Arcadia, we cultivate diversity, celebrate individuality, and believe unique perspectives are key to our collective success in creating a clean energy future. Arcadia is committed to equal employment opportunities regardless of race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, protected veteran status, or any status protected by applicable federal, state, or local law. While we are currently unable to consider candidates who will require visa sponsorship, we welcome applications from all qualified candidates eligible to work in India.

Thank you

Senior Site Reliability Engineer I

11 hours ago

Chennai, Tamil Nadu, India LexisNexis Legal & Professional Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Senior Site Reliability Engineer I. Join Our Diverse and Inclusive Team Delivering High-Quality Software Worldwide. Are you someone who enjoys working with others, solving problems creatively, and making a meaningful difference? About the Business: At ICIS, our purpose is to optimize the world's resources and empower strategic, sustainable decisions by bringing...
Site Reliability Engineer

2 weeks ago

Chennai, Tamil Nadu, India Grootan Technologies Full time ₹ 12,00,000 - ₹ 36,00,000 per year

About the Role: We are seeking a skilled Site Reliability Engineer (SRE) with 4–5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and monitoring...
Site Reliability Engineer

3 days ago

Chennai, Tamil Nadu, India GSR Business Services Full time ₹ 6,00,000 - ₹ 12,00,000 per year

Dear Aspirants, Urgent Hiring: Site Reliability Engineer, 3-5 Years, Chennai. Role Summary: Supports the reliability and performance of systems and infrastructure. Assists in monitoring, troubleshooting, and automating tasks to maintain high-availability environments. Key Responsibilities: Assist in managing VMware and Linux servers. Monitor system health and respond to...
Site Reliability Engineer

2 weeks ago

Chennai, Tamil Nadu, India HICS Technologies Pte Ltd Full time ₹ 8,00,000 - ₹ 16,00,000 per year

Job Title: Site Reliability Engineer (SRE) – Capital Markets / Trading. Location: [Chennai / Onsite / Full Time]. Experience: 7 to 15 years. Domain: IT Operations / Capital Markets / Trading. About the Role: We are seeking a seasoned Site Reliability Engineer (SRE) to join our dynamic IT team supporting trading and capital markets applications. The ideal...
Devops + Site Reliability Engineer

4 days ago

Chennai, Tamil Nadu, India Flex Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Flex is the diversified manufacturing partner of choice that helps market-leading brands design, build and deliver innovative products that improve the world. A career at Flex offers the opportunity to make a difference and invest in your growth in a respectful, inclusive, and collaborative environment. If you are excited about a role but don't meet every...
Site Reliability Engineer

1 week ago

Chennai, Tamil Nadu, India Ford Motor Company Full time ₹ 8,00,000 - ₹ 24,00,000 per year

Job Description: Ford is seeking an experienced Site Reliability Engineer (SRE) to join our team and lead the development, enhancement, and extension of our global monitoring and observability platform. Enterprise Technology plays a critical part in shaping the future of mobility. If you're looking for the chance to leverage advanced technology...
Site Reliability Engineer

4 days ago

Chennai, Tamil Nadu, India Flex Full time ₹ 8,00,000 - ₹ 24,00,000 per year

Experience: 3.5 to 7 years. Location: Chennai. Work mode: Hybrid. Role Overview: As a Site Reliability Engineer (SRE) on the Factory Applications team, you will help maintain and scale "Brix", a cloud-native, containerized, microservices-based platform used to build global shop floor systems. Your focus will be on automation, reliability, and performance. Key...
Site Reliability Engineer

2 days ago

Chennai, Tamil Nadu, India Trimble Inc. Full time ₹ 5,00,000 - ₹ 15,00,000 per year

Job Summary: We are seeking a motivated Site Reliability Engineer (SRE) Level 1 / Level 2 to enhance the infrastructure and operational reliability of our ERP product, specifically within Azure and Windows environments. The ideal candidate will utilize SRE principles to ensure high system availability, stability, and performance while collaborating closely...
Site Reliability Engineer, AVP

19 hours ago

Chennai, Tamil Nadu, India NatWest Group Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Join us as a Site Reliability Engineer. You'll manage the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ). We'll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of CCJ across applications. This is a...
