Site Reliability Engineer

2 weeks ago


Bengaluru, Karnataka, India · PhonePe · Full time

About the Role:
This role is responsible for managing and maintaining complex, distributed big data ecosystems. It ensures the reliability, scalability, and security of large-scale production infrastructure. Key responsibilities include automating processes, optimizing workflows, troubleshooting production issues, and driving system improvements across multiple business verticals.

Roles and Responsibilities:
● Manage, maintain, and support incremental changes to Linux/Unix environments.
● Lead on-call rotations and incident response, conducting root cause analysis and driving the postmortem process.
● Design and implement automation systems for managing big data infrastructure, including provisioning, scaling, upgrading, and patching clusters.
● Troubleshoot and resolve complex production issues, identifying root causes and implementing mitigation strategies.
● Design and review scalable and reliable system architectures.
● Collaborate with teams to optimize overall system and cluster performance.
● Enforce security standards across systems and infrastructure.
● Set technical direction, drive standardization, and operate independently.
● Ensure the availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
● Respond to, analyze, and resolve system outages and disruptions, and implement measures to prevent similar incidents from recurring.
● Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
● Monitor and optimize system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning.
● Collaborate with development teams to build reliability, scalability, and performance best practices into the software development lifecycle.
● Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities.
● Develop and enforce SRE best practices and principles.
● Align with cross-functional teams on priorities and deliverables.
● Drive automation to enhance operational efficiency.
● Adopt new technologies as the need arises and define architectural recommendations for new tech stacks.

Skills Required:
● Over 4 years of experience managing and maintaining distributed big data ecosystems.
● Strong Linux expertise, including IP networking, iptables, and IPsec.
● Proficiency in scripting/programming languages such as Perl, Golang, or Python.
● Hands-on experience with the Hadoop stack (HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot).
● Familiarity with open-source configuration management and deployment tools such as Puppet, Salt, Chef, or Ansible.
● Solid understanding of networking, open-source technologies, and related tools.
● Excellent communication and collaboration skills.
● DevOps tools: SaltStack, Ansible, Docker, Git.
● SRE logging and monitoring tools: ELK stack, Grafana, Prometheus, OpenTSDB, OpenTelemetry.

Good to Have:
● Experience managing infrastructure on public cloud platforms (AWS, Azure, GCP).
● Experience designing and reviewing system architectures for scalability and reliability.
● Experience with observability tools to visualize and alert on system performance.
● Experience with large, petabyte-scale data migrations and major upgrades.



  • Bengaluru, Karnataka, India · Karix · Full time

    Role: Site Reliability Engineer. Location: Bangalore (WFO). About the role: We are seeking an experienced Site Reliability Engineer to act as a bridge between development and IT operations, taking on operational tasks to ensure the efficient functioning of service platforms. They are responsible for monitoring, automating, and improving the...


  • Bengaluru, Karnataka, India · Glocomms · Full time

    We are currently looking for an SRE Lead to join our client, an IT consultancy with urgent projects on board. This will initially be a 6-month contract, with an option to extend. Must have 10+ years of experience. Responsibilities: Assess application architecture and implement patterns for reliability and performance. Automate workflows and reduce manual toil...


  • Bengaluru, Karnataka, India · Landmark Group · Full time

    What You’ll Do: • Ensure reliability and high availability of Java and microservices-based applications through proactive monitoring and automation. • Define and track SLIs/SLOs to maintain service performance and stability. • Troubleshoot and resolve production issues, performing detailed root cause analysis to prevent recurrence. • Build and enhance...


  • Bengaluru, Karnataka, India · HireAlpha · Full time

    Role: Site Reliability Engineer. 6+ years. Permanent, Bangalore (Hybrid). Job Description: We are looking for an engineer who can focus on Developer Experience and help us design, build, and maintain high-performance, scalable, and reliable services. As the company provides a contact center service, we play a critical role in our customers’ business...


  • Bengaluru, Karnataka, India · Kaplan · Full time

    Job Title: Site Reliability Engineer (Hybrid). Job Description: For more than 80 years, Kaplan has been a trailblazer in education and professional advancement. We are a global company at the intersection of education and technology, focused on collaboration, innovation, and creativity to deliver a best-in-class educational experience and make Kaplan a great...


  • Bengaluru, Karnataka, India · Integrated Personnel Services Limited · Full time

    We are seeking an experienced Site Reliability Engineer to join our team at, a leader in blockchain technology and solutions. The ideal candidate will have a strong background in infrastructure management and a deep understanding of blockchain ecosystems. You will be responsible for designing, implementing, and maintaining the foundational infrastructure...


  • Bengaluru, Karnataka, India · Thakral One · Full time · US$ 60,000 to US$ 120,000 per year

    Company Description: Thakral One, headquartered in Singapore, is a technology consulting and services company with a strong presence across Asia. The company specializes in technology-driven consulting, custom solution development, data analytics, and leveraging cloud capabilities to deliver enhanced decision support and practical outcomes. Collaborating...


  • Bengaluru, Karnataka, India · Viraaj HR Solutions Private Limited · Full time

    Site Reliability Engineer (SRE). About the Opportunity: A fast-growing organization in the Enterprise Cloud Infrastructure & SaaS sector, delivering highly available, mission-critical services to enterprise customers. We are hiring an on-site Site Reliability Engineer in India to own reliability, automation, and operational excellence across cloud-native...


  • Bengaluru, Karnataka, India · super · Full time

    Site Reliability Engineer (SRE) Level 3. Overview: A Site Reliability Engineer (SRE) Level 3 is a senior technical leadership role focused on designing, implementing, and maintaining large-scale, complex, and highly reliable systems. This role emphasizes a blend of software and systems engineering to ensure the availability, latency, performance, and capacity...


  • Bengaluru, Karnataka, India · Integers · Full time · ₹ 4,80,000 to ₹ 14,40,000 per year

    Job Title: Site Reliability Engineer (SRE). Location: Bengaluru, Karnataka (Hybrid, 1–2 days in office). Experience Level: 8+ years. Job Description: We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our engineering team. The ideal candidate will have a strong background in software development, DevOps practices, infrastructure...