Site Reliability Engineer 2

1 week ago


Gandhinagar, India PhonePe Full time

About PhonePe Limited: Headquartered in India, PhonePe launched its flagship product, the PhonePe digital payments app, in Aug 2016. As of April 2025, PhonePe has over 60 Crore (600 Million) registered users and a digital payments acceptance network spread across over 4 Crore (40+ million) merchants. PhonePe also processes over 33 Crore (330+ Million) transactions daily with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore. PhonePe’s portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses (Pincode - hyperlocal e-commerce and Indus AppStore - a localized app store for the Android ecosystem) in India, which are aligned with the company’s vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture: At PhonePe, we go the extra mile to make sure you can bring your best self to work, every day. And that starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you’re excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country and executing on your dreams with purpose and speed, join us.

Minimum Experience: 3 Years

About the Role: This role is responsible for managing and maintaining complex, distributed big data ecosystems. It ensures the reliability, scalability, and security of large-scale production infrastructure. Key responsibilities include automating processes, optimizing workflows, troubleshooting production issues, and driving system improvements across multiple business verticals.

Roles and Responsibilities:
- Manage, maintain, and support incremental changes to Linux/Unix environments.
- Lead on-call rotations and incident responses, conducting root cause analysis and driving postmortem processes.
- Design and implement automation systems for managing infrastructure, including provisioning, scaling, upgrades, and patching clusters.
- Troubleshoot and resolve complex production issues while identifying root causes and implementing mitigating strategies.
- Design and review scalable and reliable system architectures.
- Collaborate with teams to optimize overall system/cluster performance.
- Enforce security standards across systems and infrastructure.
- Set technical direction, drive standardization, and operate independently.
- Ensure availability, performance, and scalability of systems and services through proactive monitoring, maintenance, and capacity planning.
- Resolve, analyze, and respond to system outages and disruptions and implement measures to prevent similar incidents from recurring.
- Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency and improving system resilience.
- Monitor and optimize system performance and resource usage, identify and address bottlenecks, and implement best practices for performance tuning.
- Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle.
- Stay informed of industry technology trends and innovations, and actively contribute to the organization's technology communities.
- Develop and enforce SRE best practices and principles.
- Align across functional teams on priorities and deliverables.
- Drive automation to enhance operational efficiency.
- Adopt new technologies as and when the need arises and define architectural recommendations for new tech stacks.

Skills Required:
- 3 to 7 years of experience managing and maintaining distributed big data ecosystems.
- Strong expertise in Linux, MySQL, Networking, System Setup, Azure.
- Proficiency in scripting/programming in any backend language.
- Familiarity with open-source configuration management and deployment tools.
- Solid understanding of networking, open-source technologies, and related tools.
- Excellent communication and collaboration skills.
- On-Prem experience mandatory.
- DevOps tools: SaltStack, Ansible, Docker, Git.
- SRE Logging and monitoring tools: ELK stack, Grafana, Prometheus, OpenTSDB, OpenTelemetry.

Good to Have:
- Experience managing infrastructure on public cloud platforms.
- Experience in designing and reviewing system architectures for scalability and reliability.
- Experience with observability tools to visualize and alert on system performance.
- Experience in massive petabyte-scale data migrations and massive upgrades.

PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles):
- Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
- Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
- Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
- Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
- Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
- Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Our inclusive culture promotes individual expression, creativity, innovation, and achievement and in turn helps us better understand and serve our customers. We see ourselves as a place for intellectual curiosity, ideas and debates, where diverse perspectives lead to deeper understanding and better quality results.

PhonePe is an equal opportunity employer and is committed to treating all its employees and job applicants equally, regardless of gender, sexual preference, religion, race, color or disability. If you have a disability or special need that requires assistance or reasonable accommodation during the application and hiring process, including support for the interview or onboarding process, please fill out this form.



  • Gandhinagar, India Insight Global Full time

    Company: Insight Global. Duration: Approved for 1 year. Location: Remote (India). Type: Contract with Insight Global Client. Compensation: 14 LPA – 20 LPA. Working Hours: Normal IST hours. Start Date: Immediate (No notice period). About the Role: Join our Site Reliability Engineering (SRE) team as a Fullstack Developer, focused on building and maintaining highly...


  • Gandhinagar, India Core Minds Tech Solutions Full time

    Job Description: - Engage with our product teams to understand requirements, design, and implement resilient and scalable infrastructure solutions. - Operate, monitor, and triage all aspects of our production and non-production environments. - Collaborate with other engineers on code, infrastructure, design reviews, and process enhancements. - Evaluate and...


  • Gandhinagar, India PhonePe Full time

    SRE We are looking for engineers who are passionate about reliability, performance, and efficiency, and with experience in building tools, services, and automation to manage and improve production services. Systems internals/security, Linux, Network, and Monitoring work to improve the reliability and performance of the next generation of distributed systems...


  • Site Engineer

    2 weeks ago


    Gandhinagar, Gujarat, India RR Manpower Management Services Full time ₹ 3,60,000 - ₹ 4,20,000 per year

    Job Title: Site Engineer. Location: Rajpur Near Nandasan. Experience: 1–2 Years. Employment Type: Full-Time. Job Summary: We are looking for a motivated and detail-oriented Site Engineer with 1–2 years of hands-on experience in construction site operations. The ideal candidate will assist in managing on-site activities, ensuring timely execution and coordination...

  • Site Administrator

    1 week ago


    Gandhinagar, India Collated ventures LLP Full time

    **Job Summary**: **Key Responsibilities**: - Maintain and organize all site-related documentation including attendance registers, material inward/outward records, and contractor agreements. - Coordinate with HO for approvals, documentation, and communication flow. - Manage daily site office operations - housekeeping, supplies, utilities, courier, and...


  • Gandhinagar, Gujarat, India Prodigy Placement LLP Full time ₹ 28,00,000 - ₹ 72,00,000 per year

    Job description: Job Title: Senior Civil Site Engineer. Location: Gandhinagar. Department: Civil / Construction Projects. Reports To: Project Manager / Construction Head. Salary: Up to 60k. Job Summary: We are seeking a highly skilled and experienced Senior Civil Site Engineer to oversee, monitor, and execute civil construction activities on-site. The role involves...


  • Gandhinagar, India beBeeReliability Full time

    Site Reliability Specialist. We are seeking a highly skilled and experienced Senior Site Reliability Professional to drive the development and maintenance of scalable systems. The ideal candidate will play a pivotal role in shaping the future direction of our system architecture, ensuring high performance and reliability. Key responsibilities include: Designing...

  • Site Engineer

    5 days ago


    Gandhinagar, Gujarat, India Klas Products Pvt Ltd Full time ₹ 12,00,000 - ₹ 24,00,000 per year

    Role and Responsibilities: You will be responsible for ensuring smooth and efficient running of the manufacturing processes. You will also be responsible for: · Planning and organizing production schedules. · Reading layout drawings and placing equipment. · Assessing material requirements. · Arranging and handling labour at site. · Organising and...


  • Gandhinagar, India beBeeReliability Full time

    Reliability Engineering Lead. The role of Reliability Engineering Lead is crucial in ensuring the delivery of high-quality, reliable products. Key responsibilities include defining and implementing reliability requirements, purchasing test equipment, developing lab procedures, and collaborating with R&D teams on reliability analysis. Essential skills for this...