
Lead Site Reliability Engineer
3 weeks ago
What will you be responsible for?
- Build, monitor and maintain highly scalable, large-scale deployments.
- Install and deploy new releases and application environments.
- Proactively monitor systems and applications, develop and maintain monitoring tools and dashboards, and ensure high availability of production environments by identifying performance issues and implementing corrective actions.
- Incident Management: Lead incident response efforts, diagnose root causes, and implement long-term solutions to prevent recurrence. Ensure effective communication during outages.
- Collaboration & Coordination: Work closely with cross-functional teams to ensure efficient platform integration, API management, and campaign execution, while providing technical guidance and support as needed.
- Troubleshooting and Root Cause Analysis: Utilize your expertise to investigate and resolve incidents quickly during crisis situations, performing root cause analysis to prevent recurrence.
- Ensure high availability of production environments by monitoring performance metrics and implementing corrective actions when necessary.
- Platform Integration: Manage and oversee the integration of various APIs, ensuring seamless interoperability between systems and third-party services.
- Support the compliance and security integrity of the environments.
- Ensure adherence to process compliance and platform reliability.
- Experience with monitoring and automation using Prometheus, Grafana, ELK, Datadog, Dynatrace, or other observability tools.
- Experience with container management (e.g., Docker) and microservices architectures in cloud or on-premises infrastructure.
What should you have?
- Kubernetes: Expertise in creating, maintaining, scaling, and upgrading production clusters.
- Docker: Must have experience writing Dockerfiles that follow industry-standard best practices.
- CI/CD: Must have hands-on experience with Azure DevOps/Jenkins, creating and executing pipelines in a multi-target environment.
- Troubleshooting skills: Expertise in analyzing application logs to drill down and identify issues, using logging stacks such as ELK, Dynatrace, and Splunk.
- Monitoring Stacks: Expertise in Grafana, including building and managing dashboards on various data sources.
- Programming Skills: Experience creating and managing Bash scripts and Ansible playbooks, with some exposure to Terraform.
- Environment: Excellent hands-on skills in Linux environments and the ability to troubleshoot OS-level issues.
- Experience using project management tools such as JIRA.
- Experience deploying and managing distributed queuing systems such as Redis, Kafka, RabbitMQ, IBM MQ, and MSMQ.
- Experience deploying and managing databases in standalone and cluster modes, with basic DB skills in Postgres, MySQL, and ClickHouse.
- Prior experience in working on high traffic & highly scalable platforms is an added advantage.
- Good command of Linux and networking concepts (TLS/SSL, DNS, load balancers, etc.) and troubleshooting skills in large-scale environments.
- Deep understanding of basic security concepts and protocols: authentication, authorization, signing, encryption, SSL/TLS, SSH/SFTP, and X.509 certificates.
- Good knowledge of ITIL terminology for incident and problem management
- Track record of excellent interpersonal, analytical, and communication skills.
- Bachelor of Science in Computer Science or a related discipline.
Why join us?
- Impactful Work: Play a pivotal role in safeguarding Tanla's assets, data, and reputation in the industry.
- Tremendous Growth Opportunities: Be part of a rapidly growing company in the telecom and CPaaS space, with opportunities for professional development.
- Innovative Environment: Work alongside a world-class team in a challenging and fun environment, where innovation is celebrated.
Tanla is an equal opportunity employer. We champion diversity and are committed to creating an inclusive environment for all employees.
www.tanla.com
-
Lead Site Reliability Engineer
4 weeks ago
Hyderabad, Telangana, India | JP Morgan Chase & Co. | Full time
Job Description: Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking Team, you will take the lead in conducting resiliency design reviews, break...
-
Junior Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India | Cubic Corporation | Full time
Job Description: Business Unit: Cubic Transportation Systems. Company Details: When you join Cubic, you become part of a company that creates and delivers technology solutions in transportation to make people's lives easier by simplifying their daily journeys, and defense capabilities to help promote mission success and safety for those who serve their nation. Led...
-
Lead - Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India | VXI Global Solutions | Full time
We are looking for a Lead - Site Reliability Engineer with 8+ years of experience to design, implement, and manage robust observability solutions across our cloud infrastructure and applications. The ideal candidate will have hands-on experience with Prometheus, Grafana, Google Cloud Monitoring, and OpenTelemetry, along with exposure to SolarWinds. You...
-
Site Reliability Engineer
2 days ago
Hyderabad, Telangana, India | Talent Worx | Full time | ₹ 15,00,000 - ₹ 25,00,000 per year
Site Reliability Engineer (SRE): At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...
-
Site Reliability Engineer
2 days ago
Hyderabad, Telangana, India | Apple | Full time | ₹ 15,00,000 - ₹ 25,00,000 per year
Imagine what you could do here. Apple is a place where extraordinary people gather to do their best work. Together we craft products and experiences people once couldn't have imagined — and now can't imagine living without. If you're motivated by the idea of making a real impact, and joining a team where we pride ourselves in being one of the most diverse...
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India | IntraEdge | Full time
Site Reliability Engineer. Experience: 7+ years. Location: Hyderabad (hybrid: 4 days in office, 1 day remote). Skills for Principal: strong leadership and people management skills; exceptional technical proficiency in Pearson's technology stack; advanced project management capabilities; excellent communication and collaboration skills; adept at risk assessment and...
-
Senior Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India | Options Executive Search Private Limited | Full time
Job Title: SRE Lead Engineer. Location: Hyderabad, India. We are seeking a DevOps / SRE Lead Engineer to architect and scale our client's multi-tenant SaaS platform with AI/ML at the core. Our client, a fast-growing AI-powered SaaS company in the FinTech space, is looking for a Site Reliability Engineering (SRE) Lead Engineer to join their dynamic team....
-
Site Reliability Engineer
3 weeks ago
Hyderabad, Telangana, India | ServiceNow | Full time
Site Reliability Engineer (SRE). Experience: 6+ years. About the Role: We are seeking a seasoned SRE to ensure the reliability, availability, and performance of our critical services. You will combine software engineering with systems administration to create scalable and highly reliable software systems. Responsibilities: - Design, build, and maintain...
-
Site Reliability Engineer
2 weeks ago
Hyderabad, Telangana, India | IntraEdge | Full time
Site Reliability Engineer. Experience: 7+ years. Location: Hyderabad (hybrid: 4 days in office, 1 day remote). Skills for Principal: strong leadership and people management skills; exceptional technical proficiency in Pearson's technology stack; advanced project management capabilities; excellent communication and collaboration skills; adept at risk assessment and crisis...