Site Reliability Engineer - Platform Engineering
2 weeks ago
This role is for one of Weekday's clients.
Min Experience: 4 years
Job Type: Full-time
We are looking for an experienced and motivated Site Reliability Engineer (SRE) – Platform Engineering to join our growing technology team. In this role, you will be responsible for designing, building, and maintaining scalable, resilient, and secure infrastructure platforms that support business-critical applications and services. The SRE will work at the intersection of software development and systems engineering to ensure the availability, performance, and reliability of our platforms.
This role requires deep expertise in automation, cloud-native technologies, monitoring, and platform operations. The ideal candidate is passionate about solving complex infrastructure challenges, streamlining deployment pipelines, and building highly reliable systems.
Key Responsibilities
- Platform Engineering: Design, implement, and optimize platform services and infrastructure to ensure high availability, scalability, and performance.
- Reliability & Resilience: Build self-healing and fault-tolerant systems while proactively identifying and eliminating reliability risks.
- Automation: Develop Infrastructure as Code (IaC) solutions using tools like Terraform, Ansible, or CloudFormation to automate infrastructure provisioning and configuration.
- Monitoring & Observability: Implement monitoring, logging, and alerting systems using tools such as Prometheus, Grafana, ELK, or Datadog to track platform health and performance.
- Incident Management: Troubleshoot incidents, perform root cause analysis, and ensure timely resolution while minimizing downtime and customer impact.
- DevOps & CI/CD: Collaborate with development teams to enhance CI/CD pipelines for seamless deployment and integration, ensuring reliability in production environments.
- Cloud Infrastructure: Manage cloud environments (AWS, Azure, or GCP) and optimize for cost, security, and performance.
- Security & Compliance: Implement security best practices, monitor vulnerabilities, and ensure compliance with industry standards across infrastructure and platforms.
- Collaboration: Partner with software engineers, product teams, and IT operations to align infrastructure capabilities with business requirements.
- Continuous Improvement: Analyze existing infrastructure and processes, identify areas for improvement, and implement best practices for operational efficiency.
- Capacity Planning: Forecast infrastructure requirements, ensuring the platform is always prepared to handle current and future workloads.
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field. Equivalent practical experience may be considered.
- 4+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering.
- Strong proficiency with cloud platforms (AWS, Azure, or GCP).
- Hands-on experience with Infrastructure as Code (Terraform, Ansible, or CloudFormation).
- Solid understanding of Linux systems administration, networking, and container orchestration (Docker, Kubernetes).
- Experience with CI/CD pipelines (Jenkins, GitLab CI, or similar tools).
- Proficiency in scripting/programming languages such as Python, Go, Bash, or Java.
- Strong knowledge of monitoring and observability tools (Prometheus, Grafana, ELK, Datadog, Splunk).
- Familiarity with incident response and on-call support practices.
- Knowledge of security best practices and compliance frameworks.
- Excellent problem-solving, debugging, and analytical skills.
- Strong communication and collaboration abilities to work effectively across cross-functional teams.
-
Site Reliability Engineer
7 days ago
Bengaluru, India Relanto Full time
Job Description Job Title: Site Reliability Engineer Summary: We are looking for a Site Reliability Engineer to join our Digital & Transformation department. The ideal candidate will have 2-3 years of experience in this field and will be responsible for ensuring the reliability, availability, and performance of our systems and applications. Roles And...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India ViewSonic Full time
Job Requirements: Bachelor's degree in Computer Science, Engineering, or a related field. 3+ years of experience in a relevant role, such as Site Reliability Engineer, DevOps Engineer, or similar, is preferred but not mandatory. Basic understanding of AWS solutions including EC2, S3, CloudWatch, Lambda, and RDS. Interest and understanding of Platform Engineering...
-
Site Reliability Engineer
3 days ago
IND - Karnataka - BANGALORE, India Globalfoundries Engineering Private Limited Full time ₹ 12,00,000 - ₹ 24,00,000 per year
Site Reliability Engineer. About GlobalFoundries: GlobalFoundries is a leading full-service semiconductor foundry providing a unique combination of design, development, and fabrication services to some of the world's most inspired technology companies. With a global manufacturing footprint spanning three continents, GlobalFoundries makes possible the...
-
Site Reliability Engineer
3 weeks ago
India, IN Sonata Software Full time
We're Hiring: Senior Site Reliability Engineer. Location: Onsite (Office: Hyderabad – Mandatory from Day 1). Employment Type: Full-time. Notice Period: Immediate to 15 Days Only. Experience: 8+ Years. About the Role: We’re looking for a Senior Site Reliability Engineer (SRE) to lead reliability initiatives across our production systems. This is a high-impact...
-
Site Reliability Engineer
2 weeks ago
India Akamai Full time ₹ 5,00,000 - ₹ 15,00,000 per year
Do you want to grow your career in Linux and Site Reliability Engineering? Would you like to contribute to the foundation of a new public cloud platform? Join our IaaS Site Reliability Engineering (SRE) team. We design, develop, and operate infrastructure and services that power the backbone of our cloud platform. This is a rare opportunity to help build a...
-
Site Reliability Engineer
4 weeks ago
Bengaluru, Karnataka, India HDFC Limited Full time
Hiring for Lead / Sr Site Reliability Engineer for Mumbai & Bangalore Location. Experience: 8 - 14 Years. Job Purpose: Analysing, troubleshooting, and designing vital services, platforms, and infrastructure on GCP while always thinking about reliability, scalability, resilience, security, and performance. Job Responsibilities: Help build a Site Reliability...
-
Site Reliability Engineer
4 weeks ago
Noida, Uttar Pradesh, India, Ghaziabad CorroHealth Full time
We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and systems administration, with a focus on creating scalable and reliable systems. You will work closely with development and operations teams to ensure the reliability, availability, and...
-
Site Reliability Engineer
4 days ago
Chennai, India Ford Motor Company Full time
Job Description: Ford is seeking an experienced Site Reliability Engineer (SRE) to join our team and lead the development, enhancement, and extension of our global monitoring and observability platform. Enterprise Technology plays a critical part in shaping the future of mobility. If you're looking for the chance to leverage...
-
Site Reliability Engineer
3 weeks ago
Pune, India TechVerito Full time
Job Description About the Role: 3-5 years of proven and progressive experience as an SRE or DevOps Engineer. As an SRE Engineer, you will have a strong background in cloud infrastructure management, migration and deployment, with expertise in Google Cloud Platform (GCP), DevOps tools, and the Kubernetes ecosystem. The primary focus of this role will be to migrate...
-
Site Reliability Engineer
4 days ago
India CareerUS Solutions Full time
Job Description Position Overview: The Site Reliability Engineer (SRE) is responsible for ensuring the stability, scalability, performance, and reliability of production systems and services. This role bridges software development and operations, using automation, monitoring, and performance optimization to build resilient systems that can scale efficiently...