Reliability Architect
2 weeks ago
Architect
Area(s) of responsibility
Job Description: Reliability Architect – 6A
Reliability Architect with over 10 years of experience in proactive monitoring, automation, and observability. Skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices. Adept at leading incident response, mentoring support teams, and driving cross-functional collaboration to ensure system reliability and scalability.
Key Responsibilities:
- Monitoring and Automation
Proactively monitor software systems to prevent incidents and automate routine operational tasks.
- Effective Monitoring
Design monitoring systems that trigger alerts based on symptoms rather than outages, ensuring early detection and resolution.
- Application Performance Monitoring (APM)
Implement and manage APM tools like New Relic or Dynatrace to track application performance, identify bottlenecks, and optimize resource usage.
- Log Analysis with Splunk
Use Splunk to analyze logs for troubleshooting, anomaly detection, and improving system reliability.
- Dashboards Preparation
Build intuitive dashboards to visualize system health, performance metrics, and operational KPIs.
- Alerts Setup
Configure intelligent alerts based on thresholds and anomalies to ensure timely incident response.
- Reports Scheduling
Automate regular reporting to provide insights into system performance, reliability, and trends.
- Reliability Metrics
Define and track metrics such as SLOs, SLIs, and error budgets to measure and maintain system reliability.
- Observability Skills
Apply observability practices including distributed tracing, logging, and metrics collection to gain deep insights into system behavior.
- AI-Driven Monitoring & Automation
Utilize AIOps techniques to proactively detect anomalies, automate incident response, and enable self-healing systems through intelligent alerting and predictive analytics.
- Observability & ML Integration
Integrate machine learning models with observability tools to enhance system insights, optimize performance, and ensure reliability of AI-powered services in production.
- Cross-Team Collaboration
Work closely with development and support teams to enhance service reliability through rigorous testing and release procedures.
- Capacity Planning
Participate in system design reviews and capacity planning to ensure scalability and performance.
- Debugging and Incident Response
Lead incident response efforts, analyze debugging information, and manage rollbacks of faulty software deployments.
- Mentoring Support Teams
Guide and mentor L1/L2 support teams to establish best practices in monitoring and observability.
- Infrastructure Management
Manage infrastructure using tools like Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.
- Documentation
Maintain comprehensive documentation of processes and procedures to ensure operational consistency and reduce redundancy.
- Proactive Mindset
Approach challenges with enthusiasm, ownership, and a continuous improvement mindset.
-
Architect with B.Arch
3 weeks ago
Panchkula, HR, IN Architect Suri and Associates Full time
Architect Suri and Associates (www.instagram.com/architectsuri), based in Panchkula, Haryana, is looking for young individuals with a Bachelor's in Architecture (B.Arch) and relevant experience in CAD drafting and 3D modelling to work full time at their office in Panchkula, Haryana. Job Types: Full-time, Permanent, Fresher, Internship. Contract length: 12 months. Pay: 20 000...
-
Reliability Architect
3 days ago
Hyderabad, Bengaluru, Pune, India Growel Softech Private Limited Full time
Job Description: We are seeking a Reliability Architect to join our team in India. The ideal candidate will have extensive experience in designing and implementing reliable systems that can scale effectively. This role involves collaborating with various teams to ensure system performance and resilience. Responsibilities: - Design and implement...
-
Reliability Architect
2 days ago
Hyderabad, India Cyient Full time
Cyient is a global engineering and technology solutions company. As a Design, Build, and Maintain partner for leading organizations worldwide, we take solution ownership across the value chain to help clients focus on their core, innovate, and stay ahead of the curve. We leverage digital technologies, advanced analytics capabilities, and our domain knowledge...
-
Cloud Reliability Architect
2 weeks ago
Hyderabad, India WS Audiology APAC Full time
We are looking for a Cloud Reliability Architect with outstanding domain expertise in at least one of the following fields: containers, public clouds, and cloud-native workloads. As an SRE, you will be responsible for ensuring the reliability, performance, and security of the operational backbone of a partly medical cloud-based product suite. **What you will...
-
Architect
1 week ago
Hyderabad, India Birlasoft Full time
Area(s) of responsibility. Job Description: Reliability Architect 6A. Reliability Architect with over 10 years of experience in proactive monitoring, automation, and observability. Skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices. Adept at leading incident response, mentoring...
-
Site Reliability Engineers
6 days ago
IN - TDC (IN) UPS Full time ₹ 12,00,000 - ₹ 24,00,000 per year
Before you apply to a job, select your language preference from the options available at the top right of this page. Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into...
-
Site Reliability Engineers
6 days ago
IN - TDC (IN) UPS Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Before you apply to a job, select your language preference from the options available at the top right of this page. Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into...
-
Site Reliability Engineer
2 days ago
Hyderabad, Telangana, India Assurant Full time ₹ 6,00,000 - ₹ 12,00,000 per year
Site Reliability Engineer, GCC-Assurant. The Site Reliability Engineer (SRE) will be part of the Assurant Reliability Team, specifically within the Site Reliability Engineering area. This remote position, based in India, focuses on building and maintaining reliable, scalable systems through a combination of software development and network diagnostics. The...
-
Site Reliability Engineer
6 days ago
India Grootan Technologies Full time
About the Role: We are seeking a skilled Site Reliability Engineer (SRE) with 4–5 years of hands-on experience to join our engineering team. In this role, you will be responsible for building and maintaining reliable, scalable, and secure infrastructure to support our applications. You will leverage your expertise in automation, cloud platforms, and...
-
AWS Architect
2 weeks ago
India Zensar Technologies Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Description: Job Title: AWS Solutions Architect – CloudFormation, Cloud WAN & Well-Architected Design. About the Role: We are seeking a highly skilled AWS Solutions Architect with deep expertise in Infrastructure as Code (IaC) using CloudFormation, global networking with AWS Cloud WAN, and designing cloud solutions that meet the highest standards of...