PCAI And AI Factory
15 hours ago
This role has been designed as 'Hybrid' with an expectation that you will work on average 2 days per week from an HPE office.
Who We Are:
Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today's complex world. Our culture thrives on finding new and better ways to accelerate what's next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.
Job Description:
HPE Operations is our innovative IT services organization. It provides the expertise to advise, integrate, and accelerate our customers' outcomes from their digital transformation. Our teams collaborate to transform insight into innovation. In today's fast-paced, hybrid IT world, being at business speed means overcoming IT complexity to match the speed of actions to the speed of opportunities. Deploy the right technology to respond quickly to market possibilities. Join us and redefine what's next for you.
What you'll do:
We are seeking a Subject Matter Expert (SME) – Admin, Operate & Manage (HPE PCAI & AI Factory Solutions) to manage and optimize HPE's next-generation AI infrastructure platforms. The ideal candidate will have deep hands-on expertise in AI, HPC, and GPU-accelerated environments, with strong knowledge of HPE Ezmeral, NVIDIA AI Enterprise, containerized workloads, and automation frameworks. This role focuses on the operational stability, lifecycle management, and continuous improvement of large-scale Private Cloud for AI (PCAI) and AI Factory deployments.
Key Responsibilities:
1. Platform Administration
- Administer and maintain HPE PCAI and AI Factory environments, ensuring optimal uptime and performance.
- Manage compute nodes (HPE DL380a, DL325, Cray XD670), GPU clusters (NVIDIA L40S/H100/H200), and InfiniBand NDR networks.
- Administer virtualization and container platforms such as vSphere, RHEL/RHOS, Ezmeral Runtime Enterprise, Kubernetes, and Rancher Harvester.
- Perform configuration, patching, version upgrades, and firmware updates across hardware and software layers.
2. Operational Monitoring & Incident Management
- Proactively monitor system health using DCGM, NetQ, Grafana, and Exivity dashboards.
- Handle alerts, performance anomalies, and incidents across GPU, network, and storage layers.
- Lead root cause analysis (RCA) and corrective action plans to prevent recurring issues.
- Maintain operational documentation, runbooks, and incident logs.
3. Lifecycle & Configuration Management
- Manage cluster lifecycle through Ansible, AWX, HPE Performance Cluster Manager (HPCM), and SLURM.
- Oversee automation for provisioning, scaling, and patch management of compute and containerized workloads.
- Manage configuration changes, infrastructure templates, and version baselines in production and staging environments.
4. AI Platform & Software Operations
- Operate HPE Ezmeral Unified Analytics, Data Fabric, and AI Essentials platforms.
- Support NVIDIA AI Enterprise (NVAIE) components including NIMs, NeMo frameworks, and RAPIDS runtime.
- Manage and monitor AI/ML workloads (LLM, NLP, Computer Vision, Chatbots) on containerized clusters.
- Ensure smooth operation of development tools like Jupyter, Spark, Airflow, MLflow, Kubeflow, and Ray.
5. Storage & Data Operations
- Administer VAST, WEKA, and Alletra MP storage solutions for file, object, and distributed storage.
- Monitor storage performance, replication, and capacity utilization.
- Coordinate with storage engineering teams for performance optimization and capacity planning.
6. Security, IAM & Compliance
- Implement and maintain Keycloak for authentication and role-based access control.
- Ensure adherence to compliance, audit, and governance standards for AI workloads.
- Support user and service account provisioning, credential management, and access reviews.
7. Continuous Improvement & Knowledge Enablement
- Optimize automation workflows to reduce manual intervention and improve service response time.
- Drive service health reviews, operational dashboards, and SLA compliance reporting.
- Conduct enablement sessions for L1/L2 teams and act as the final escalation point for operational issues.
- Collaborate with HPE Engineering for patch validation, release readiness, and operational feedback.
Required Skills & Technical Expertise:
Core Infrastructure Skills
- Administration of HPE DL380a, DL325, Cray XD670, and GPU-based compute environments.
- Strong knowledge of the NVIDIA GPU stack, InfiniBand NDR, and Spectrum-X switches.
- Experience in managing VAST, WEKA, or Alletra MP storage systems.
Software & Platform Operations
- Virtualization: vSphere, RHEL, Ezmeral Runtime Enterprise
- Containers: Kubernetes, Rancher Harvester, KubeSphere, Morpheus
- Automation: Ansible, AWX, NetBox, HPCM, SLURM
- Observability: Grafana, NetQ, Exivity, DCGM
- Security: Keycloak, IAM integrations
AI/ML Platform Administration
- Experience in HPE Ezmeral Unified Analytics and Data Fabric operations
- Familiarity with NVIDIA AI Enterprise, NIMs, NeMo, and Triton Inference Server
- Working knowledge of TensorFlow, PyTorch, Spark, Kubeflow, MLflow, and Jupyter
Preferred Certifications:
- HPE ASE / Master ASE (Compute, Storage, or Ezmeral)
- NVIDIA Certified Professional / NVAIE Certification
- RHCE / Kubernetes Administrator (CKA) / VMware VCP
Soft Skills:
- Strong analytical and troubleshooting capabilities.
- Excellent communication and collaboration skills across global teams.
- Ability to lead operations improvement initiatives and mentor support engineers.
- Focused on reliability, scalability, and service excellence.
For Internal Job Movement:
- Approval of the employee's current manager is required.
- Employees are expected to notify their manager prior to an interview.
- Employees on a Performance Improvement Plan are not eligible to apply.
- Minimum level should be EXP if applying as part of Internal Job Posting.
Why Join Us:
- Work on next-generation AI infrastructure operations and automation.
- Be part of a global team managing HPE's AI Factory and PCAI platforms supporting large-scale AI workloads.
- Opportunity to contribute to service innovation and continuous improvement initiatives in AI infrastructure management.
What you need to bring:
- Bachelor's / Master's Degree in Computer Science, IT, or equivalent field.
- 8+ years of IT infrastructure administration experience, including 3+ years in AI/HPC or GPU-based environments.
- Proven experience in platform operations, monitoring, and lifecycle management of enterprise-grade AI and HPC environments.
- Hands-on experience in automation and orchestration across bare metal and containerized infrastructure.
Additional Skills:
Accountability, Action Planning, Active Learning, Active Listening, Bias, Business Growth, Business Planning, Coaching, Commercial Acumen, Creativity, Critical Thinking, Cross-Functional Teamwork, Customer Experience Strategy, Customer Solutions, Data Analysis Management, Data Collection Management, Data Controls, Design Thinking, Empathy, Follow-Through, Growth Mindset, Intellectual Curiosity, Long Term Planning, Managing Ambiguity (+ 5 more)
What We Can Offer You:
Health & Wellbeing
We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
Personal & Professional Development
We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.
Unconditional Inclusion
We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.
Let's Stay Connected:
Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.
Job:
Services
Job Level:
Expert
HPE is an Equal Employment Opportunity/Veterans/Disabled/LGBT employer. We do not discriminate on the basis of race, gender, or any other protected category, and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together.
Hewlett Packard Enterprise is EEO Protected Veteran/Individual with Disabilities.
HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.
No Fees Notice & Recruitment Fraud Disclaimer
It has come to HPE's attention that there has been an increase in recruitment fraud whereby scammers impersonate HPE or HPE-authorized recruiting agencies and offer fake employment opportunities to candidates. These scammers often seek to obtain personal information or money from candidates.
Please note that Hewlett Packard Enterprise (HPE), its direct and indirect subsidiaries and affiliated companies, and its authorized recruitment agencies/vendors will never charge any candidate a registration fee, hiring fee, or any other fee in connection with its recruitment and hiring process. The credentials of any hiring agency that claims to be working with HPE for recruitment of talent should be verified by candidates and candidates shall be solely responsible to conduct such verification. Any candidate/individual who relies on the erroneous representations made by fraudulent employment agencies does so at their own risk, and HPE disclaims liability for any damages or claims that may result from any such communication.
Location:
Bengaluru, Karnataka, India
Employment Type:
Full time
Salary:
US$ 1,20,000 - US$ 1,80,000 per year