Staff SRE, Application SRE

1 day ago


Bengaluru, Karnataka, India Netskope Full time

**About Netskope**:
Today, there's more data and users outside the enterprise than inside, causing the network perimeter as we know it to dissolve. We realized a new perimeter was needed, one that is built in the cloud and follows and protects data wherever it goes, so we started Netskope to redefine Cloud, Network and Data Security.

**About the role**

The Application SRE Team supports several critical components of our foundational technologies for real-time protection, as well as data services. We are a team of software engineers focused on improving availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of the engineering stacks. If you are passionate about solving complex problems and developing cloud services at scale, we would like to speak with you.

As an MLOps SRE, you will play a critical role in deploying and managing cutting-edge infrastructure for AI/ML operations, and you will collaborate with AI/ML engineers and researchers to develop a robust CI/CD pipeline that supports safe and reproducible experiments. Your expertise will also extend to setting up and maintaining monitoring, logging, and alerting systems to oversee extensive training runs and client-facing APIs. You will ensure that training environments are optimally available and efficiently managed across multiple clusters, enhancing our containerization and orchestration systems with advanced tools like Docker and Kubernetes.
- Work closely with AI/ML engineers and researchers on the design and architecture of AI/ML applications for scale and reliability. Design and deploy a CI/CD pipeline that ensures safe and reproducible experiments.
- Take part in production troubleshooting of AI/ML application code as well as infrastructure configurations.
- Set up and manage monitoring, logging, and alerting systems for extensive training runs and client-facing APIs.
- Ensure training environments are consistently available and prepared across multiple clusters.
- Develop and manage containerization and orchestration systems utilizing tools such as Docker and Kubernetes.
- Operate and oversee large Kubernetes clusters with GPU workloads.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.

**Is this you?**
- You have professional experience with:
  - Model training
  - Hugging Face Transformers
  - PyTorch
  - LLMs
  - TensorRT
  - Infrastructure as code tools like Terraform
  - Scripting languages such as Python or Bash
  - Cloud platforms such as Google Cloud, AWS, or Azure
  - Git and GitHub workflows
  - Tracing and monitoring
- Are familiar with high-performance, large-scale ML systems
- Have a knack for troubleshooting complex systems and enjoy solving challenging problems
- Are proactive in identifying problems, performance bottlenecks, and areas for improvement
- Take pride in building and operating scalable, reliable, secure systems
- Are familiar with monitoring tools such as Prometheus, Grafana, or similar
- Are comfortable with ambiguity and rapid change

**Preferred skills and experience**:

- 8+ years of experience building core infrastructure
- Experience running inference clusters at scale
- Experience operating orchestration systems such as Kubernetes at scale


Netskope is committed to implementing equal employment opportunities for all employees and applicants for employment. Netskope does not discriminate in employment opportunities or practices based on religion, race, color, sex, marital or veteran status, age, national origin, ancestry, physical or mental disability, medical condition, sexual orientation, gender identity/expression, genetic information, pregnancy (including childbirth, lactation and related medical conditions), or any other characteristic protected by the laws or regulations of any jurisdiction in which we operate.

Netskope respects your privacy and is committed to protecting the personal information you share with us; please refer to Netskope's Privacy Policy for more details.


  • SRE

    5 days ago


    Bengaluru, Karnataka, India Virtusa Full time

    Role: SRE. Experience: 6 to 10 years. Work mode: Hybrid. Work timings: 2 pm to 11 pm. Location: Chennai & Hyderabad. Primary skills: SRE. You are passionate about driving an SRE / DevSecOps mindset and culture in a fast-paced, challenging environment where you get the opportunity to work with a spectrum of the latest tools and technologies to drive forward Automation,...

  • SRE Intern

    12 hours ago


    Bengaluru, Karnataka, India InfraCloud Technologies Full time

    Have you always dreamt of having a career as a DevOps/SRE Engineer? Here is your chance. InfraCloud is training a batch of engineers in the SRE domain. If you can code a little, like exploring how systems work, and have close to 2 years of experience in any IT domain, here is your chance: solve this assignment. Reading the documentation is the key to solving...


  • Bengaluru, Karnataka, India Netskope Full time ₹ 5,00,000 - ₹ 8,00,000 per year

    About Netskope: Today, there's more data and users outside the enterprise than inside, causing the network perimeter as we know it to dissolve. We realized a new perimeter was needed, one that is built in the cloud and follows and protects data wherever it goes, so we started Netskope to redefine Cloud, Network and Data Security. Since 2012, we have built...

  • SRE Architect

    5 days ago


    Bengaluru, India CIEL HR Services Full time

    Strong understanding and knowledge of SRE setup in a GCP development environment. Understanding of how to monitor performance, resource utilization, and error logs when products move into production. Experience with SRE tool implementation (incident and configuration management tools) from scratch. Good grip on the foundational concepts of SRE (observability and...


  • Bengaluru, Karnataka, India Bahwan Cybertek Group Full time

    We are looking for a talented DevOps / SRE Engineer with strong Python skills to join our team at Bahwan Cybertek Group. As a DevOps / SRE Engineer, you will be responsible for maintaining and improving our software development and deployment processes, as well as ensuring the reliability and scalability of our infrastructure. **Responsibilities**: -...

  • Application SRE

    2 weeks ago


    Bengaluru, Karnataka, India Infosys Limited Full time

    Job Description. Responsibilities: A day in the life of an Infoscion. As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective. Improve reliability, quality, and time-to-market of our suite of...

  • OpenShift SRE

    1 week ago


    Bengaluru, Karnataka, India Nexturn Full time US$ 90,000 - US$ 1,20,000 per year

    Red Hat OpenShift SRE. Location: Bangalore. Mode: WFO. Key Responsibilities: Design, implement, and manage container-based platforms using Docker, Kubernetes, and Red Hat OpenShift SRE. Administer and optimize RHEL-based systems across cloud and on-premise environments. Implement infrastructure as code (IaC) using Ansible, Terraform, or Helm. Manage CI/CD pipelines and...


  • Bengaluru, Chennai, Mumbai, India Hexaware Technologies Full time US$ 1,50,000 - US$ 2,00,000 per year

    Title: DevOps & SRE Architect / DevOps & SRE Presales Lead. Key Responsibilities: He / She will be responsible for working with the DevOps & SRE practice team on different practice-related activities along with working on consulting opportunities. Job Description: A minimum of 15 years of experience in IT (preferably in software development, testing or...

  • Application SRE

    7 days ago


    Bengaluru, Karnataka, India Infosys Full time ₹ 1,04,000 - ₹ 1,30,878 per year

    Educational Requirements: Bachelor of Engineering, BTech, Bachelor of Science, Master of Engineering, Master of Technology. Service Line: Infosys Cobalt Unit. Responsibilities: A day in the life of an Infoscion. As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on Application and...