
Dev Ops Engineering III-SUPPORT SERVICES-Applications-CTB
2 days ago
Title: Observability Platforms and SRE Engineering
The Company: The World of Kotak product suite encompasses cross-banking assets, one-stop banking services, securities, and investment banking insights across a wide spectrum of the major financial and banking markets.
The Team: You will work with a team of highly seasoned Observability Platform and Site Reliability Engineers who are part of the Run-The-Bank initiative to deliver Engineering and Technology Operations Excellence for the Kotak Banking Product Suite and its associated delivery platform.
The Observability Platforms and SRE team is a group of experts who develop, maintain and scale Observability Platform solutions, and who drive engineering and automation within the Banking Solutions platform and its operations on-premises and in the cloud.
We are looking for a highly motivated individual to take on the role of Observability Platform Engineer and SRE and help implement our platforms using open-source and enterprise solutions, through IaC, automated operations and configuration management, bringing together observability and engineering for architectural and operational excellence.
The role involves developing, testing and validating the software and hardware systems that enable our Observability Platform, and coordinating the processes and tools that support the stability, resilience and performance of a banking system capable of supporting multiple business requirements across an array of technologies. The Engineer will work across architecture, development, infrastructure and vendor teams to deliver and support the Observability Platform and the SRE-guided processes and tools supporting the banking systems.
Impact: The team has an opportunity to advocate for and participate in building engineering services that are resilient, optimally monitored and alerted, and capable of self-healing through reliability engineering practices, using software and runbook automation tools to deliver world-class banking and related content globally.
. Observability Platform engineers will implement site-wide Observability solutions for metrics, logs, traces, alerting and monitoring, to be used by development and business teams across the org to monitor their systems and applications. Site Reliability Engineers (SREs) are responsible for keeping all user-facing services, user journeys and other Kotak production systems running smoothly.
. These engineers should be a blend of software engineer and pragmatic systems engineer, embedding operational discipline with engineering principles and bringing mature automation and documentation to our operating environments and the associated Kotak code base.
. These engineers should have expertise in systems (networking, operating systems, storage, etc.) and implement best-practice guidelines for stability, availability, reliability and scalability while keeping compute usage and cost optimal.
. Kotak Platforms are critical applications with unique use cases and associated challenges that need to be optimized over time through re-engineering and revised tools and practices.
What's in it for you / Role: An Observability Platform Engineer and SRE is ultimately accountable for building, maintaining and scaling an Observability Platform that can be used by various systems across the org. They are also accountable for system reliability, resiliency and scalability, and for reducing time to market by striving to improve end-to-end service and reduce technical debt. We seek leaders who are passionate about observability and system reliability to influence and drive the strategic platform mission and maturity.
Your mission will be to ensure our services are fast, highly available, and run efficiently by scaling optimally during peak business traffic and load. Your focus will be to solve production problems across the stack, up to the edge. You will gain critical domain knowledge to effectively troubleshoot symptoms that impair system health and lead to performance degradation or service outages. The position requires the flexibility to take a holistic approach to troubleshooting and the ability to deep dive into core technical details, working with various development, infra and vendor teams. You will build automation tools and processes for system health and acceptance tests to validate changes in lower environments ahead of production changes. The Systems Reliability Engineer will ensure the system is well instrumented and highly fault tolerant, with proper metrics to report on.
Key Leadership Responsibilities:
. Influence and drive engagement on Observability and SRE practices with development, engineering and product groups to align solution delivery with technology services.
. Build quality engineering practices around automation through well-defined processes and monitoring metrics that exhibit process quality.
. Conduct transparent and effective blameless post-mortems and ensure Post Incident Reviews have a clear Root Cause and Actions, with Problem tickets tracked to closure.
. Deliver on availability, latency, performance and scalability of Kotak applications by evangelizing engineering principles into the development lifecycle, with a template for fault tolerance at each level.
. Drive non-functional requirement reviews, including capacity planning, cost analysis and instrumentation integration, to cover the complete delivery cycle.
. Define Observability and SRE initiatives and tasks, report to all stakeholders and the business, and build an onboarding template for new and future applications.
. Implement a metrics-driven approach to service quality targets.
Basic Qualifications: 7+ years in a system and solutions engineering, software development, or technology operations background, with 3+ years of experience working in Systems Engineer, DevOps and/or SRE roles.
. Experience automating infrastructure, testing, and deployments using tools like Terraform, CFT with Jenkins, Ansible, Chef & other industry recognized tools to deliver Infrastructure as Code.
. Relevant work experience with, or familiarity with, languages and web technologies (Python, Java, C, C++, ASP.NET, JavaScript, Go, etc.).
. Experience with 2 or more scripting languages such as Python, Perl, Unix shell, PowerShell, Groovy, etc.
. Experience with AWS technologies: VPC, EC2, EKS, ELB, RDS, Lambda, SES, SNS, Containers, etc.
. Experience with identity management systems and standards such as SAML, OAuth, MFA, etc.
. CI/CD delivery using code and configuration management automation tools such as GitHub, VSTS, Ansible, DSC, Puppet, Ambari, Chef, Salt, Jenkins, Maven, etc.
. Delivery using modern methodologies, especially SAFe Agile, Lean, etc.
. Experience with networking protocols, CDN, App acceleration, Load Balancers, DNS, VPN, PaaS, IaaS, etc.
. Experience troubleshooting networking protocols such as TCP/IP, HTTPS, TLS and WebSockets, as well as multicast and broadcast messaging.
. Experience with cloud infrastructure, storage, platforms, data and with containers (Kubernetes, Container, Docker, virtualization).
. Experience with monitoring and observability tools such as Grafana, Prometheus, Datadog, Splunk, AppDynamics, New Relic, Nagios, etc.
Preferred Qualifications:
. Bachelor's/Master's Degree in Computer Science, Information Systems, or equivalent
. AWS Certified Solution Architect - Professional/Associate
. Good leadership skills, with the ability to lead a team.
. Good communication skills and a sense of ownership and drive.
. A software-centric mindset and the ability to understand the full software stack - and beyond.
. Embrace automation over manual effort, enjoy debugging complex problems, and view problems as opportunities to improve.
. Experience designing, building, and operating large-scale production systems
. Experience working in enterprise-scale internal or customer-centric projects.
. Experience working closely with development & engineering teams.
. Good understanding of software development lifecycle (SDLC) and Software Testing in an Agile/Scrum framework.
. Strong analytical thinking, problem solving, oral and written communication skills.
. Experience working with multiple stakeholders and vendors at various levels.
. Understanding of SQL and databases, and comfort writing SQL queries.
. Hands-on experience with operational automation using any automation framework.
. Good knowledge of working with SOAP and REST services and SOA.
. Knowledge of testing in continuous integration/DevOps models is a plus.
. Understanding of cloud technologies like AWS/Azure, microservices and containers.
. Experience in DevOps, Big Data testing, IoT and Cloud will be an added advantage.
. Experience automating infrastructure, testing, and deployments using Terraform, CFT with Ansible, Rundeck, Autosys, Jenkins to deliver Infrastructure as Code.
. Experience working with the Rundeck tool (Design, Setup, Deployment, Automation & Integration)
. Terraform / Kubernetes / Ansible expertise a plus
Responsibilities:
. Experience maintaining a 99.99% SLA for the Banking Platform and Applications.
. Experience in troubleshooting and resolving incidents, and in using problem management and automation to bring about service improvement and drive resiliency and stability.
. Experience in service restoration through standard automated tools and engineering processes to reduce our downtime and improve our SLA/SLI/SLO metrics.
. Create production and migration schedules for large projects, with timelines and milestones.
. Develop and leverage AWS tools and services to manage and automate key operations capabilities.
. Proactively ensure the highest levels of systems and infrastructure availability
. Monitor and test application performance for potential bottlenecks, identify possible solutions and work with developers to implement those fixes.
. Write and maintain custom scripts to increase system efficiency and reduce human intervention time on tasks.
. Increase alerting and monitoring quality, reduce alarm noise, and close observability gaps.
. Optimize cloud costs and carry out capacity planning analysis.
. Reduce operations exposure, speed up incident recovery, and implement resiliency and remediation plans.
. Identify and correct problems raised by audit and compliance.
. Liaise with vendors and other IT personnel for problem resolution.
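As a rough illustration of what the 99.99% SLA figure in the responsibilities above implies, the sketch below converts an availability target into an allowed-downtime budget. The function name `downtime_budget_minutes` and the 30-day and 365-day windows are assumptions made for this example; the posting does not say how the SLA window is defined.

```python
def downtime_budget_minutes(availability_pct: float, window_days: float) -> float:
    """Return the maximum downtime (in minutes) permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    # The unavailable fraction of the window is the downtime budget.
    return window_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Hypothetical measurement windows: one 30-day month and one 365-day year.
    for days in (30, 365):
        budget = downtime_budget_minutes(99.99, days)
        print(f"{days}-day window: {budget:.2f} minutes of downtime allowed")
```

Under these assumptions, a 99.99% target leaves roughly 4.32 minutes of downtime in a 30-day window and about 52.56 minutes over a year.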
Performance Indicators : Observability Platform and Site Reliability Engineers have the following performance indicators:
. Platform adoptability, availability, scalability and performance
. Tech Dashboard
. Site Availability, Performance
. Mean Time to Detection
. Mean Time to Resolution
. Mean Time Between Failures
. Mean Time to Production
. Disaster Recovery Time to Recovery
. Change Success / Failure Metrics
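For readers less familiar with the time-based indicators in this list, here is a minimal sketch of how Mean Time to Detection, Mean Time to Resolution and Mean Time Between Failures might be derived from incident records. The `Incident` record and its field names are hypothetical, and the definitions used (resolution measured from fault start, time between failures measured from one resolution to the next fault start) are one common convention rather than anything stated in the posting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record used only for this illustration."""
    started: datetime   # when the fault began
    detected: datetime  # when monitoring or a person noticed it
    resolved: datetime  # when service was restored


def _mean(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time from fault start to detection."""
    return _mean([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from fault start to resolution (assumed definition)."""
    return _mean([i.resolved - i.started for i in incidents])


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean uptime between one resolution and the next fault start."""
    ordered = sorted(incidents, key=lambda i: i.started)
    if len(ordered) < 2:
        raise ValueError("need at least two incidents to measure time between failures")
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return _mean(gaps)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    sample = [
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=30)),
        Incident(t0 + timedelta(hours=10, minutes=30),
                 t0 + timedelta(hours=10, minutes=45),
                 t0 + timedelta(hours=11, minutes=30)),
    ]
    print("MTTD:", mttd(sample))
    print("MTTR:", mttr(sample))
    print("MTBF:", mtbf(sample))
```

Teams differ on whether resolution time is measured from fault start or from detection, so any real dashboard would need to state which definition it uses.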
Soft Skills : Communication is core to the success of this role
Evangelize adoption and use of tools, processes and technologies
Lead engagements to encourage collaboration within and across teams
Showcase the roadmap and engagement model to relevant stakeholders through write-ups, Teams groups and webinars
Documentation is core to maintaining up-to-date information on the use of tools, processes and methodologies (e.g. wiki posts, Confluence write-ups)
Create internal training programs for new staff and for upskilling the existing team
Demonstrate humility, trust and transparency in the way we interact with individuals