MindCraft Software

Thāne, Maharashtra, India | MindCraft Software Pvt. Ltd. | Full time

SRE (Site Reliability Engineer)

Exp: 5-7 years

Location: Thane

- 5+ years in SRE or DevOps roles supporting high-scale platforms (fintech, OTT, ecommerce, net banking).

- Expertise in maintaining uptime and troubleshooting distributed systems (Redis, Golang services, DocDB).

- Strong networking skills, including network and DNS troubleshooting.

- Experience with monitoring/APM tools (Kibana, Grafana, Instana, Dynatrace).

- Hands-on experience with container orchestration on AWS EKS and Red Hat OpenShift.

- Proficiency in CI/CD, cloud infrastructure (AWS/Azure), and infrastructure automation.

Preferred:

- Experience operating highly available, scalable platforms.

- Relevant AWS/Azure or SRE certifications.

We are looking for a proactive Site Reliability Engineer (SRE) to ensure 99.99% uptime for our scalable, multi-tier microservices platform. You will troubleshoot both networking and application uptime issues to support seamless service delivery.

Key Responsibilities:

- Maintain strict SLOs (99.99% uptime) across distributed systems including Redis, Golang services, and DocDB.

- Diagnose and resolve complex application and network issues, including DNS troubleshooting and network latency.

- Use monitoring and observability tools such as Kibana, Grafana, Instana, and Dynatrace for proactive incident detection.

- Automate infrastructure and workflows with Python, Bash, Terraform, and Ansible.

- Manage container orchestration on AWS Elastic Kubernetes Service (EKS) and Red Hat OpenShift, ensuring high availability and scalability.

- Collaborate with development and QA teams to embed reliability best practices and improve system observability.

- Participate in on-call rotations, incident response, and blameless postmortems.

- Document runbooks and mentor junior engineers on SRE and networking fundamentals.

(ref:hirist.tech)