Site Reliability Engineer

1 month ago


Bengaluru, India emagine Consulting Full time

Site Reliability Engineer - Kubernetes

· Responsible for the reliability and efficiency of infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil that operations must perform

· Member of the L3 Engineering team, providing subject matter expertise and acting as the final point of escalation

Primary:

· Develop software that makes infrastructure services self-managing, and build self-service dashboards

· Deliver continuous service improvement by developing Infrastructure as Code

· Eliminate manual, repetitive, automatable, tactical tasks that are devoid of value

· Improve system performance, make effective use of resources, distribute load and reduce latency

· Identify SLOs (Service Level Objectives) to meet availability and latency objectives

· Develop proactive monitoring solutions that alert on symptoms and not just on outages

· Perform detailed root cause analyses (RCAs) on incidents and outages to prevent future recurrence

· Partner with development teams to improve services via rigorous testing and release procedures

· Identify technical debt and partner with application teams to build remediation plans

· Develop standard operational procedures and produce effective documentation

· Analyse workloads and devise suitable cloud migration strategies where appropriate

· Ensure all project / investment workloads are delivered according to the defined plans and budget

· Liaise with Infrastructure Control and IT Risk teams to satisfy internal and external audit requests

· Deputise for the team lead when required and act up accordingly

· Identify cost saving and optimisation opportunities across the group

· Build strong working relationships across the organisation

· Adhere to the core values of the bank

Secondary:

· Perform daily health and compliance checks for all systems as required

· Ensure all systems are backed up successfully and any issues are promptly resolved

· Validate that monitoring alerts and batch job failures are detected promptly and satisfactorily resolved

· Ensure sufficient capacity is available to accommodate drive growth

· Respond to emails sent to the team distribution list / mailboxes in a timely manner

· Handle incidents and requests with efficiency and a “customer first” mindset

· Maintain infrastructure in a highly available, reliable, secure and performant manner

· General Server / Database / Virtualisation Administration maintenance activities

· Provide technical support to application support and development teams

· Provide consultancy to application support and development teams

· Take part in the on-call and weekend work rotation, triaging and addressing production issues as they arise

SKILLS AND EXPERIENCE

Essential:

· Exceptional skills in Docker/Kubernetes deployment and configuration, scaling and management of containerized applications.

· Excellent skills in managing and performance-optimising a complex Prometheus, InfluxDB and Grafana monitoring stack.

· Excellent skills in writing and maintaining Grafana dashboards using PromQL, InfluxQL/Flux.

· Experience in distributed technologies like Rook, Ceph, NooBaa, Trino, MariaDB Xpand, Dremio, Kibana, KX platform

· Experience in CI/CD/CT platforms like Git, Ansible, Terraform and TeamCity

· Serena Deployment Automation (SDA) and Jenkins

· “Infrastructure as Code” Principles and practices.

· “Continuous Integration (CI) and Continuous Delivery (CD)” Principles and practices

· Agile, Site Reliability Engineering (SRE) and DevOps Principles and practices

· Scripting and programming languages such as PowerShell, Python, Bash and C#

· Fluent in Backup and Recovery processes and procedures

· Advanced knowledge of Clustering, High-Availability, Replication and Disaster Recovery techniques

· Ability to tune Network, Storage, Server and Virtualisation layers for optimal performance and reliability

· Excellent Performance Tuning skills, in-depth knowledge of system internals

· Ability to interpret and implement CIS security hardening recommendations in a controlled manner

· Acute awareness of Security and Auditing requirements in a regulated environment

Highly Desirable:

· RHEL, Oracle Linux, Oracle Solaris and related technologies

· Microsoft Windows Server and related technologies

· Microsoft SQL Server, Oracle, Sybase ASE, MongoDB and Snowflake

· Active Directory, LDAP and Kerberos

· IBM Tivoli / Netcool

· Nutanix HCI and VMware ESX

· Networking Protocols (TCP/IP, DNS, DHCP, VLANs)

· Cloud computing - IaaS, PaaS and SaaS offerings across Azure, AWS, GCP and Oracle

· Knowledge of data security governance and regulations such as GDPR and SOX

Desirable:

· Dell EMC PowerStore (SAN) and Isilon (NAS)

· Rubrik, EMC NetWorker, Data Domain and IBM Tivoli Storage Manager

· CyberArk

· Splunk

· Qualys

· Cisco Tetration

· ServiceNow

· JIRA and Confluence


