Dev Ops Engineering III-SUPPORT SERVICES-Applications-CTB

2 weeks ago


Bengaluru, Karnataka, India Kotak Mahindra Bank Full time US$ 1,25,000 - US$ 1,75,000 per year

Title : Observability Platforms and SRE Engg.

The Company : The World of Kotak product suite encompasses a powerful set of cross-banking assets, one-stop banking services, securities, and investment banking insights across a wide spectrum of the major financial and banking markets.

The Team : You will be working with a team of highly seasoned Observability Platform and Site Reliability Engineers, part of the Run-The-Bank initiative, to deliver Engineering and Technology Operations Excellence for the Kotak Banking Product Suite and its associated delivery platform.

The Observability Platforms and SRE team is a group of experts who develop, maintain, and scale Observability Platform solutions, driving engineering and automation within the Banking Solutions platform and its operation both on-premises and in the cloud.

We are looking for a highly motivated individual to take on the role of Observability Platform Engineer and SRE, helping implement our platforms using open-source and enterprise solutions through IaC, automated operations, and configuration management, bringing together observability and engineering for architectural and operational excellence.

The role involves developing, testing, and validating the software and hardware systems that enable our Observability Platform, and coordinating the processes and tools that support the stability, resilience, and performance of a banking system capable of supporting multiple business requirements across an array of technologies. The Engineer will work across architecture, development, infrastructure, and vendor teams to deliver and support the Observability Platform and the SRE-guided processes and tools supporting the banking systems.

Impactfulness : The team has an opportunity to advocate for and participate in building engineering services that are resilient, optimally monitored and alerted, and capable of self-healing through reliability engineering practices, using software and runbook automation tools to deliver world-class banking and related content globally.

  • Observability Platform engineers will implement site-wide Observability solutions for metrics, logs, traces, alerting, and monitoring, to be used by development and business teams across the org to monitor their systems and applications. Site Reliability Engineers (SREs) are responsible for keeping all user-facing services, user journeys, and other Kotak production systems running smoothly.

  • These engineers should be a blend of software engineer and pragmatic systems engineer, embedding operational discipline with engineering principles and bringing mature automation and documentation to our operating environments and the associated Kotak code base.

  • These engineers should have expertise in systems (networking, operating systems, storage, etc.) and implement best-practice guidelines for stability, availability, reliability, and scalability while keeping compute and cost optimal.

  • Kotak Platforms are critical applications with unique use cases and associated challenges that will need to be optimized over time through re-engineering and revised tools and practices.

What's in it for you / Role : An Observability Platform Engineer and SRE is ultimately accountable for building, maintaining, and scaling an Observability Platform that can be used by various systems across the org. They are also accountable for system reliability, resiliency, and scalability, and for reducing time to market by striving to improve end-to-end service and reduce technical debt. We seek leaders who are passionate about observability and system reliability to influence and drive the strategic platform mission and maturity.

Your mission will be to ensure our services are fast, highly available, and efficient, scaling optimally during peak business traffic and load. Your focus will be to solve production problems across the stack, all the way to the edge. You will gain critical domain knowledge to effectively troubleshoot symptoms that impair system health and lead to performance degradation or service outages. The position requires the flexibility to take a holistic approach to troubleshooting and the ability to deep dive into core technical details, working with various development, infrastructure, and vendor teams. You will build automation tools and processes for system health and acceptance tests to validate changes in lower environments before they reach production. The Site Reliability Engineer will ensure the system is well instrumented and highly fault tolerant, with proper metrics to report upon.

Key Leadership Responsibilities:

  • Influence and drive engagement on Observability and SRE practices with development, engineering and product groups to align solution delivery with technology services.

  • Build quality engineering practices around automation through well-defined processes and monitoring metrics that exhibit process quality.

  • Conduct transparent and effective blameless postmortems, and ensure Post Incident Reviews have a clear root cause and actions, with Problem tickets tracked to closure.

  • Deliver on the availability, latency, performance, and scalability of Kotak applications by evangelizing engineering principles in the development lifecycle, with a template for fault tolerance at each level.

  • Drive non-functional requirement reviews, including capacity planning, cost analysis, and instrumentation integration, to cover the complete delivery cycle.

  • Define Observability and SRE initiatives and tasks, report on them to all stakeholders and the business, and build an onboarding template for new and future applications.

  • Implement a metrics-driven approach towards service quality targets.

Basic Qualifications : 7+ years of system and solutions engineering, software development, or technology operations background, with 3+ years of work experience in Systems Engineer, DevOps, and/or SRE roles.

  • Experience automating infrastructure, testing, and deployments using tools like Terraform, CFT with Jenkins, Ansible, Chef & other industry recognized tools to deliver Infrastructure as Code.

  • Relevant work experience with, or familiarity with, languages and web technologies (Python, Java, C, C++, ASP.NET, JavaScript, Go, etc.).

  • Experience with 2 or more scripting languages such as Python, Perl, Unix shell, PowerShell, Groovy, etc.

  • Experience with AWS technologies: VPC, EC2, EKS, ELB, RDS, Lambda, SES, SNS, Containers, etc.

  • Experience with identity management systems and standards such as SAML, OAuth, MFA, etc.

  • CI/CD delivery using code and configuration management automation tools such as GitHub, VSTS, Ansible, DSC, Puppet, Ambari, Chef, Salt, Jenkins, Maven, etc.

  • Delivery using modern methodologies, especially SAFe Agile, Lean, etc.

  • Experience with networking protocols, CDN, App acceleration, Load Balancers, DNS, VPN, PaaS, IaaS, etc.

  • Experience troubleshooting networking protocols such as TCP/IP, HTTPS, TLS, and WebSockets, as well as multicast and broadcast messaging.

  • Experience with cloud infrastructure, storage, platforms, data and with containers (Kubernetes, Container, Docker, virtualization).

  • Experience with monitoring and observability tools such as Grafana, Prometheus, Datadog, Splunk, AppDynamics, New Relic, Nagios, etc.

Preferred Qualifications :

  • Bachelor's/Master's Degree in Computer Science, Information Systems, or equivalent

  • AWS Certified Solution Architect – Professional/Associate

  • Good leadership skills, with the ability to lead a team.

  • Good communication skills and a sense of ownership and drive.

  • Have a software-centric mindset and can understand the full software stack – and beyond.

  • Embrace automation over manual effort, enjoy debugging complex problems, and view problems as an opportunity to improve.

  • Experience designing, building, and operating large-scale production systems

  • Experience working in enterprise-scale internal or customer-centric projects.

  • Experience working closely with development & engineering teams.

  • Good understanding of software development lifecycle (SDLC) and Software Testing in an Agile/Scrum framework.

  • Strong analytical thinking, problem solving, oral and written communication skills.

  • Experience working with multiple stakeholders and vendors at various levels.

  • Understanding of SQL and databases, and comfort writing SQL queries.

  • Hands-on experience with operational automation using any automation framework.

  • Good knowledge of working with SOAP and REST services and service-oriented architecture (SOA).

  • Knowledge of testing in continuous integration/DevOps models is a plus.

  • Understanding of Cloud technologies like AWS/Azure and micro-services, containers.

  • Experience in DevOps, Big Data testing, IoT, and Cloud is an added advantage.

  • Experience automating infrastructure, testing, and deployments using Terraform, CFT with Ansible, Rundeck, Autosys, Jenkins to deliver Infrastructure as Code.

  • Experience working with the Rundeck tool (Design, Setup, Deployment, Automation & Integration)

  • Terraform / Kubernetes / Ansible expertise a plus

Responsibilities :

  • Maintain a 99.99% SLA for the Banking Platform and Applications.

  • Troubleshoot and resolve incidents, and use problem management to bring about service improvement, using automation to drive resiliency and stability.

  • Restore service through standard automated tools and engineering processes to reduce our downtime and improve our SLA/SLI/SLO metrics.

  • Create production and migration schedules for large projects, with timelines and milestones.

  • Develop and leverage AWS tools and services to manage and automate key operations capabilities.

  • Proactively ensure the highest levels of systems and infrastructure availability

  • Monitor and test application performance for potential bottlenecks, identify possible solutions and work with developers to implement those fixes.

  • Write and maintain custom scripts to increase system efficiency and reduce human intervention time on tasks.

  • Increase alerting and monitoring quality, reduce alarm noise, and close observability gaps.

  • Optimize cloud costs and analyse capacity planning.

  • Reduce operations exposure, increase the pace of incident recovery, and implement resiliency and remediation plans.

  • Identify and correct problems stemming from audit and compliance.

  • Liaise with vendors and other IT personnel for problem resolution

Performance Indicators : Observability Platform and Site Reliability Engineers have the following performance indicators:

  • Platform adoptability, availability, scalability and performance

  • Tech Dashboard

  • Site Availability, Performance

  • Mean Time to Detection

  • Mean Time to Resolution

  • Mean Time Between Failures

  • Mean Time to Production

  • Disaster Recovery Time to Recovery

  • Change Success / Failure Metrics

Soft Skills : Communication is core to the success of this role

Evangelize adoption and use of tools, processes and technologies

Lead engagements to encourage collaboration within and across teams

Showcase the roadmap and engagement model to relevant stakeholders through write-ups, Teams groups, and webinars

Documentation is core to maintaining up-to-date information on the use of tools, processes, and methodologies (e.g. wiki posts, Confluence write-ups)

Create internal training programs for new staff and upskilling of existing team

Demonstrate humility, trust and transparency in the way we interact with individuals


