Cloud Operations L2 Support Engineer

1 day ago


Bangalore, India Rakuten Symphony Full time

Job Summary:

We are seeking a highly skilled and experienced Cloud Engineer with a strong Site Reliability Engineering (SRE) mindset to join our team. This role will be critical in ensuring the availability, reliability, and performance of our platform services and applications, particularly those supporting our Radio Access Network (RAN) and Core Network functions deployed on cloud infrastructure. The ideal candidate will have deep expertise in Kubernetes and cloud operations, along with a passion for optimizing complex distributed systems. You will be instrumental in running our production environment, responding to critical incidents, and driving continuous improvement in system reliability and efficiency across both RAN and Core cloud deployments.


Key Responsibilities:

  • Platform Reliability & Availability (SRE Focus):
      • Run the production environment by proactively monitoring availability and taking a holistic view of system health for our cloud-based RAN and Core Network platforms.
      • Improve the reliability and quality of the system through automation, process refinement, and best practices for both RAN and Core cloud components.
      • Measure and optimize system performance to ensure efficient resource utilization and optimal user experience for network services.
      • Ensure that services are available and the underlying infrastructure is functioning properly, and monitor critical applications and related services to guarantee system availability for RAN and Core functions.
  • Cloud Operations & Kubernetes Management:
      • Design, deploy, and manage Kubernetes clusters and related cloud infrastructure for both RAN and Core Network application deployments.
      • Implement and maintain containerization strategies and orchestration best practices for telecom workloads.
      • Manage and troubleshoot Robin storage solutions within the Kubernetes environment, supporting the unique storage needs of RAN and Core applications.
      • Implement and manage CI/CD pipelines for cloud-native RAN and Core applications.
      • Own cloud resource provisioning, scaling, and cost optimization for all deployed network functions.
  • Incident & Problem Management:
      • Collaborate on high-priority incident tickets (e.g., MIC Reported Incident, Serious/Medium/Small Network Incidents, RIUD Faults), ensuring rapid system recovery for both RAN and Core impacted services.
      • Be on standby to work with developers when issues arise and are escalated, providing immediate technical insight and support for cloud-native network functions.
      • Lead Problem Management efforts, including Root Cause Analysis (RCA), for complex incidents affecting RAN and Core cloud deployments.
      • Identify bugs and work with development teams to prioritize and implement fixes for cloud-native network elements.
  • Monitoring & Alerting:
      • Implement and maintain robust monitoring, logging, and alerting solutions for cloud infrastructure and applications supporting RAN and Core services.
      • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical RAN and Core services running in the cloud.
  • Automation & Tooling:
      • Develop and implement automation scripts and tools to streamline operational tasks, deployments, and incident response for cloud-native RAN and Core components.
      • Evaluate and integrate new tools and technologies to enhance operational efficiency.
  • Collaboration & Knowledge Sharing:
      • Support Governance Reports by providing technical data and insights on cloud platform performance for RAN and Core.
      • Handle customer queries with technical expertise and provide timely resolutions related to cloud-deployed network services.
      • Provide training and mentorship to junior team members on cloud technologies and SRE practices, specifically in the context of telecom networks.
      • Work closely with development, network, and security teams to ensure seamless service delivery across the entire network architecture.


Technical Requirements (Most Visible):

  • Deep expertise in Kubernetes:
      • Cluster deployment, management, and troubleshooting for high-performance telecom workloads.
      • Container orchestration, Pod lifecycle, Deployments, Services, Ingress.
      • Helm charts, Kustomize.
      • Advanced networking within Kubernetes (CNI, CoreDNS, service mesh concepts).
      • Security best practices in Kubernetes, especially for critical network functions.
  • Proficiency in Cloud Platforms: Experience with at least one major cloud provider (e.g., AWS, Azure, GCP), with a focus on enterprise-grade infrastructure.
  • Containerization Technologies: Docker, containerd.
  • Robin Storage: Hands-on experience with Robin.io or similar distributed persistent storage solutions for Kubernetes, particularly for stateful RAN and Core applications.
  • Infrastructure as Code (IaC): Terraform, Ansible, or similar tools for automating cloud and Kubernetes deployments.
  • Scripting & Automation: Strong proficiency in Python, Go, Bash, or similar for developing automation and tooling.
  • Monitoring & Logging Tools: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, or similar, with experience in large-scale data ingestion and analysis.
  • CI/CD Tools: Jenkins, GitLab CI/CD, Argo CD, or similar, for continuous deployment of network functions.
  • Operating Systems: Expert-level knowledge of Linux (e.g., CentOS, Ubuntu, RHEL).
  • Networking Fundamentals: Deep understanding of TCP/IP, DNS, Load Balancing, Firewalls, VPNs, and advanced network concepts relevant to telecom (e.g., SRv6, Segment Routing, GTP-U/C).
  • Telecommunications Network Knowledge:
      • Strong understanding of Radio Access Network (RAN) architecture, components, and interfaces (e.g., O-RAN, vRAN concepts).
      • Strong understanding of Core Network (EPC/5GC) architecture, functions (e.g., AMF, SMF, UPF, MME, SGW, PGW), and protocols.
      • Familiarity with network function virtualization (NFV) and software-defined networking (SDN) principles.


Qualifications:

  • Education: Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Experience: Minimum of 5-6 years of experience in a Cloud Engineering, DevOps, or SRE role, with a significant focus on Kubernetes and cloud operations, ideally within a telecommunications or high-availability environment.
  • Problem-Solving: Exceptional analytical and problem-solving skills, with a methodical approach to debugging complex distributed systems.
  • Communication: Excellent verbal and written communication skills, capable of effectively collaborating with technical and non-technical stakeholders.
  • Proactive Mindset: Ability to anticipate issues, identify risks, and propose preventative solutions.
  • Incident Response: Proven experience in responding to and resolving critical production incidents in a fast-paced environment.
  • Continuous Improvement: A strong desire to learn, adapt, and drive continuous improvement in processes and systems.



  • Bangalore, India Covenant HR Full time

    Company – Our client is a global technology services and consulting leader, recognized for driving innovation in enterprise IT and cybersecurity. Known for its collaborative culture and digital transformation expertise, this Fortune 500 organization partners with top enterprises worldwide to elevate their security posture and resilience. Job Title –...


  • Bangalore, India TRDFIN Support Services Pvt Ltd Full time

    Roles & Task Responsibilities • Perform health checks, monitor alerts, and work on Incident & Service Request tickets across platforms. Manage disk space utilization and backup alerts. • Improve KB usage and contribute to SOP enhancements. • Identify opportunities for automation and left-shift enablement to reduce L2 dependency. • Familiarity with AWS or...


  • Bangalore, India DDN Full time

    This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial...


  • Bangalore Rural, Bengaluru, India Genxhire Services Full time ₹ 9,00,000 - ₹ 12,00,000 per year

    Role & responsibilities: Job Title: L2 Production Support Lead, Oracle RID79511. Job Type (Full-Time / Contract): Full Time. Experience Range: 8+ Years. Job Location: Bangalore. Work Mode (WFO / Hybrid / Remote): Remote. Notice Period Requirement: 0-30 days. Recruiter: Aarti Shah. Job Summary: We are looking for an experienced L2 Production Support Engineer with strong expertise in...


  • Bangalore, India Cisco Full time

    🚀 We’re Hiring: Software Engineer – Enterprise Switching (C/Linux, L2/L3 Protocols | 4–8 years). At Cisco, the Enterprise Switching organization is building the backbone of the modern network with our industry-leading Catalyst Cat9000 series. These switches power the world’s most critical networks, supporting hybrid work, AI/ML-driven security,...


  • Bangalore, India Abacus.AI Full time

    Cloud Cost Optimization Engineer (AWS/GCP): We are looking for a Cloud Operations Engineer / Cloud Cost Optimization Engineer to join our growing team. The ideal candidate will have hands-on expertise in AWS, GCP and Azure cloud infrastructure, scripting/programming, monitoring, and cost optimization, with a passion for improving reliability and...


  • Bangalore, India Black Box Full time

    About Black Box : Black Box is a trusted IT solutions provider delivering cutting-edge technology solutions and world-class consulting services in Unified Communications, Enterprise Networking, Data Center, Digital Applications and Cyber Security. We deliver solutions, services and products to more than 8,000 clients worldwide. These clients trust our...