Reliability Architect

2 weeks ago


INDIA HYDERABAD BIRLASOFT OFFICE IN Birlasoft Limited Full time ₹ 12,00,000 - ₹ 36,00,000 per year

Architect

Area(s) of responsibility

Job Description: Reliability Architect – 6A

We are looking for a Reliability Architect with over 10 years of experience in proactive monitoring, automation, and observability. The ideal candidate is skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices, and is adept at leading incident response, mentoring support teams, and driving cross-functional collaboration to ensure system reliability and scalability.

Key Responsibilities:

  • Monitoring and Automation
    Proactively monitor software systems to prevent incidents and automate routine operational tasks.
  • Effective Monitoring
    Design monitoring systems that trigger alerts based on symptoms rather than outages, ensuring early detection and resolution.
  • Application Performance Monitoring (APM)
    Implement and manage APM tools like New Relic or Dynatrace to track application performance, identify bottlenecks, and optimize resource usage.
  • Log Analysis with Splunk
    Use Splunk to analyze logs for troubleshooting, anomaly detection, and improving system reliability.
  • Dashboards Preparation
    Build intuitive dashboards to visualize system health, performance metrics, and operational KPIs.
  • Alerts Setup
    Configure intelligent alerts based on thresholds and anomalies to ensure timely incident response.
  • Reports Scheduling
    Automate regular reporting to provide insights into system performance, reliability, and trends.
  • Reliability Metrics
    Define and track metrics such as SLOs, SLIs, and error budgets to measure and maintain system reliability.
  • Observability Skills
    Apply observability practices including distributed tracing, logging, and metrics collection to gain deep insights into system behavior.
  • AI-Driven Monitoring & Automation
    Utilize AIOps techniques to proactively detect anomalies, automate incident response, and enable self-healing systems through intelligent alerting and predictive analytics.
  • Observability & ML Integration
    Integrate machine learning models with observability tools to enhance system insights, optimize performance, and ensure reliability of AI-powered services in production.
  • Cross-Team Collaboration
    Work closely with development and support teams to enhance service reliability through rigorous testing and release procedures.
  • Capacity Planning
    Participate in system design reviews and capacity planning to ensure scalability and performance.
  • Debugging and Incident Response
    Lead incident response efforts, analyze debugging information, and manage rollbacks of faulty software deployments.
  • Mentoring Support Teams
    Guide and mentor L1/L2 support teams to establish best practices in monitoring and observability.
  • Infrastructure Management
    Manage infrastructure using tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
  • Documentation
    Maintain comprehensive documentation of processes and procedures to ensure operational consistency and reduce redundancy.
  • Proactive Mindset
    Approach challenges with enthusiasm, ownership, and a continuous improvement mindset.
Experience Level: Senior Level
  • Architect with B.Arch

    3 weeks ago


    Panchkula, HR, IN Architect Suri and Associates Full time

    Architect Suri and Associates (www instagram com architectsuri), based out of Panchkula, Haryana, are looking for young individuals with a Bachelor's in Architecture (B.Arch) and relevant experience in CAD drafting and 3D modelling to work full time at their office in Panchkula, Haryana. Job Types: Full-time, Permanent, Fresher, Internship. Contract length: 12 months. Pay: 20 000...


  • Hyderabad, Bengaluru, Pune, India Growel Softech Private Limited Full time

    Job Description: We are seeking a Reliability Architect to join our team in India. The ideal candidate will have extensive experience in designing and implementing reliable systems that can scale effectively. This role involves collaborating with various teams to ensure system performance and resilience. Responsibilities: Design and implement...


  • Hyderabad, India Cyient Full time

    Cyient is a global engineering and technology solutions company. As a Design, Build, and Maintain partner for leading organizations worldwide, we take solution ownership across the value chain to help clients focus on their core, innovate, and stay ahead of the curve. We leverage digital technologies, advanced analytics capabilities, and our domain knowledge...


  • Hyderabad, India WS Audiology APAC Full time

    We are looking for a Cloud Reliability Architect with outstanding domain expertise in at least one of the following fields: containers, public clouds, and cloud-native workloads. As an SRE, you will be responsible for ensuring the reliability, performance, and security of the operational backbone of a partly medical cloud-based product suite. What you will...

  • Architect

    1 week ago


    Hyderabad, India Birlasoft Full time

    Area(s) of responsibility. Job Description: Reliability Architect 6A. Reliability Architect with over 10 years of experience in proactive monitoring, automation, and observability. Skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices. Adept at leading incident response, mentoring...


  • IN - TDC (IN) UPS Full time ₹ 12,00,000 - ₹ 24,00,000 per year

    Before you apply to a job, select your language preference from the options available at the top right of this page. Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into...


  • IN - TDC (IN) UPS Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Before you apply to a job, select your language preference from the options available at the top right of this page. Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into...


  • Hyderabad, Telangana, India Assurant Full time ₹ 6,00,000 - ₹ 12,00,000 per year

    Site Reliability Engineer, GCC-Assurant. The Site Reliability Engineer (SRE) will be part of the Assurant Reliability Team, specifically within the Site Reliability Engineering area. This remote position, based in India, focuses on building and maintaining reliable, scalable systems through a combination of software development and network diagnostics. The...


  • India Grootan Technologies Full time

    About the Role We are seeking a skilled Site Reliability Engineer (SRE) with 4–5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and...

  • AWS Architect

    2 weeks ago


    India Zensar Technologies Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Description. Job Title: AWS Solutions Architect – CloudFormation, Cloud WAN & Well-Architected Design. About the Role: We are seeking a highly skilled AWS Solutions Architect with deep expertise in Infrastructure as Code (IaC) using CloudFormation, global networking with AWS Cloud WAN, and designing cloud solutions that meet the highest standards of...