Site Reliability Engineer

2 days ago


Hyderabad, Telangana, India Intraedge Technologies Ltd. Full time

L2 Observability/AIOps :

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.

SRE ensures internally critical and externally visible systems have reliability and uptime appropriate to users' needs and a fast rate of improvement while keeping an ever-watchful eye on capacity and performance.

SRE is a mindset and a set of engineering approaches focused on optimizing existing systems, building infrastructure, and eliminating manual work through automation.

As a Site Reliability Engineer with a focus on observability, you will build and operate next-generation observability platforms.

As an SRE with an observability focus, you will :

- Explore the complex IT estates of our clients to understand their observability/AIOps opportunities and identify areas to improve.

- Collaborate to architect unified observability and AIOps strategies which employ leading AI technology.

- Implement enterprise observability/AIOps technology and processes.

- Amplify observability/AIOps outcomes by accelerating adoption across technology and business.

The role will also include :

- Architecting observability solutions that address gaps and reduce organizational MTTD and MTTR.

- Developing API-driven micro-services that combine into large and complex platforms.

- Planning and executing highly parallel distributed object storage transformations and migrations.

- Maintaining automated test suites using CI/CD tools.

- Participating in collaborative projects with small software engineering teams.

- Develop automation, processes, and tools designed to make our services simpler and more robust.

- Participate in troubleshooting, capacity planning and analysis, and performance analysis activities.

- Advise management on service onboarding strategies and execution.

What we are looking for :

- Entrepreneurs who seek challenging problems to solve.

- Creativity, initiative and acute attention to detail.

- Thirst for innovation and solving problems at lightning speed.

- Passion for automating everything repetitive.

- Obsession with software scalability and performance under high loads.

- Love for using and contributing to open-source software.

Please bring to the table :

- Experience in architecting complex IT solutions.

- Understanding of observability dimensions (metrics, logs, traces).

- Excellent communication and stakeholder management skills.

- Development experience and comfort working in multiple languages (Python, Java, Go, and Ruby a plus).

- Experience working in collaborative coding environments (peer review, continuous integration, etc.).

- 7+ years of application development.

- Experience working in distributed remote teams across multiple time zones.

- Experience in large scale operations environments.

- 7+ years of experience with Linux/Unix development or systems administration.

- 3+ years of experience with networking systems and technologies.

- Deep understanding of network performance and security.

- Ability to identify tasks that require automation and to implement that automation.

- Experience with configuration management tools such as Puppet, Chef, or SaltStack.

- Hands-on operational experience in a high-volume or critical production service environment: distributed systems, capacity planning, continuous deployment.

- BA/BS in Computer Science preferred, or equivalent experience (advanced degrees preferred).

We have opportunities to work with and learn :

- Object Storage: MinIO/S3/etc.

- Data Collection: OpenTelemetry/Grafana Alloy/etc.

- Message Bus: Kafka/NSQ/etc.

- Scaling Databases: Relational database technologies at large scale.

- Scheduling & Orchestration.

- Cloud Platforms: AWS/Azure.

(ref:hirist.tech)
