
MindCraft Software
SRE (Site Reliability Engineer)
Exp: 5-7 years
Location: Thane
Requirements:
- 5+ years in SRE or DevOps roles supporting high-scale platforms (fintech, OTT, e-commerce, net banking).
- Expertise in maintaining uptime and troubleshooting distributed systems built on Redis, Golang services, and DocDB.
- Strong networking skills, including network and DNS troubleshooting.
- Experience with monitoring/APM tools (Kibana, Grafana, Instana, Dynatrace).
- Hands-on with container orchestration on AWS EKS and Red Hat OpenShift.
- Proficiency in CI/CD, cloud infrastructure (AWS/Azure), and infrastructure automation.
Preferred:
- Experience operating highly available, scalable platforms.
- Relevant AWS/Azure or SRE certifications.
About the Role:
- We are looking for a proactive Site Reliability Engineer (SRE) to ensure 99.99% uptime for our scalable, multi-tier microservices platform.
- You will troubleshoot both networking and application uptime issues, supporting seamless service delivery.
Key Responsibilities:
- Maintain strict SLOs (99.99% uptime) across distributed systems including Redis, Golang services, and DocDB.
- Diagnose and resolve complex application and network issues, including DNS troubleshooting and network latency.
- Use monitoring and observability tools such as Kibana, Grafana, Instana, and Dynatrace for proactive incident detection.
- Automate infrastructure and workflows with Python, Bash, Terraform, and Ansible.
- Manage container orchestration on AWS Elastic Kubernetes Service (EKS) and Red Hat OpenShift, ensuring high availability and scalability.
- Collaborate with development and QA teams to embed reliability best practices and improve system observability.
- Participate in on-call rotations, incident response, and blameless postmortems.
- Document runbooks and mentor junior engineers on SRE and networking fundamentals.
(ref:hirist.tech)