
AI Infrastructure Engineers – Network Design, Deployment
4 weeks ago
We're hiring NCP-certified engineers. Join us as a Network (AIN), Deployment (AII), or Operations (AIO) Engineer and help power next-gen AI infrastructure with NVIDIA H100 racks.
Apply now to be part of cutting-edge AI deployments and scalable data center innovation.
1. Network Design & Installation Engineer (NCP-AIN Certified)
Location: India REMOTE
Duration: Long Term Contract
Overview:
We are seeking a certified Network Design & Installation Engineer with deep expertise in InfiniBand and Ethernet-based networking solutions. This role is pivotal in architecting and deploying robust, high-performance network fabrics for NVIDIA H100 GPU-powered AI racks.
Key Responsibilities:
- Design and implement scalable InfiniBand/Ethernet networks to support large-scale H100 GPU clusters.
- Configure Spectrum-X switches, BlueField DPUs, and Cumulus Linux-based environments.
- Integrate networking architecture with existing data center infrastructure.
- Perform on-site installations, including racking, cable management, and connectivity validation.
- Use tools such as UFM and ibdiagnet to run diagnostics and optimize network performance.
- Collaborate with infrastructure and operations teams to ensure seamless deployment and expansion.
Qualifications:
- NCP-AIN certification (required) or strong equivalent hands-on experience.
- In-depth knowledge of InfiniBand, RoCE v2, Spectrum switches, BlueField DPUs, and Cumulus Linux.
- Proven experience in designing and deploying high-performance or HPC network environments.
- Willingness to travel for on-site deployments and hands-on hardware installation.
- Experience with telemetry, diagnostics, and fabric tuning tools.
2. AI Infrastructure Deployment Engineer (NCP-AII Certified)
Location: India REMOTE
Duration: Long Term Contract
Overview:
We are hiring an experienced AI Infrastructure Deployment Engineer to lead the deployment of full-stack AI infrastructure powered by NVIDIA H100 GPUs. This role focuses on validating and configuring the entire stack — from bare-metal systems to orchestration platforms — ensuring production-ready AI environments.
Key Responsibilities:
- Lead end-to-end deployment of AI racks, including servers, GPUs, switches, and interconnects.
- Validate bare-metal hardware, Spectrum-X switches, routers, and storage systems.
- Configure multi-tenant GPU environments using MIG, MPS, and virtualization tools.
- Deploy NVIDIA Base Command, DGX OS, and associated AI/ML software stacks.
- Integrate systems with Kubernetes, Helm, and other orchestration platforms.
- Implement monitoring and telemetry using DCGM, UFM, and performance benchmarking tools.
Qualifications:
- NCP-AII certification (required) or equivalent hands-on infrastructure experience.
- Expertise in GPU server configurations, MIG/MPS, Base Command, and virtualization and orchestration platforms (vSphere, Kubernetes).
- Experience with BIOS/firmware updates, system burn-in, and power/cooling validation.
- Strong understanding of data center infrastructure and AI workload requirements.
- Experience integrating AI infrastructure with cloud-native tools and container environments.
3. AI Infrastructure Operations Engineer (NCP-AIO Certified)
Location: India REMOTE
Duration: Long Term Contract
Overview:
We are looking for a proactive and skilled AI Infrastructure Operations Engineer to manage and optimize large-scale AI clusters built with NVIDIA H100 GPUs. This role focuses on post-deployment operations — ensuring performance, reliability, and maintainability of AI infrastructure environments.
Key Responsibilities:
- Manage day-to-day operations of GPU clusters, networking fabric, and server infrastructure.
- Monitor and maintain the health of InfiniBand/Ethernet networks and DGX/H100 nodes.
- Apply firmware upgrades and OS patches, and handle infrastructure lifecycle management.
- Troubleshoot hardware, network, and container-level failures using telemetry tools like UFM and DCGM.
- Create and maintain operational runbooks, automate workflows, and improve incident response.
- Support infrastructure scaling and upgrades, and collaborate with deployment teams.
Qualifications:
- NCP-AIO certification (required) or comparable operational experience in large-scale AI environments.
- Strong troubleshooting skills across compute, network, and storage domains.
- Experience with monitoring and telemetry tools (Prometheus, Grafana, DCGM, UFM).
- Familiarity with log aggregation and alerting systems.
- Background in data center operations, capacity planning, and support automation.
How These Roles Collaborate
- NCP-AIN (Design & Install): Builds and installs the high-speed network fabric that powers AI workloads.
- NCP-AII (Deploy): Deploys and validates the full AI infrastructure stack, including hardware and software integration.
- NCP-AIO (Operate): Ensures continuous, reliable, and optimized operations of deployed AI environments.