Sr Systems Engineer Linux – AI Infrastructure

3 weeks ago


India DC Tech Consulting Full time
We are seeking a highly skilled Senior Linux Administrator to join our team, focusing on the implementation and management of on-premises Linux servers optimized for AI/ML workloads. The ideal candidate will have deep expertise in core Linux system administration, with a strong foundation in configuring and optimizing servers for high-performance computing tasks. Responsibilities include deploying and maintaining robust Linux environments, automating system processes, and ensuring security and stability for AI/ML pipelines. While training on NVIDIA technologies will be provided, the candidate must demonstrate proficiency in Linux ecosystem tools, scripting, and troubleshooting complex on-premises systems. This role demands a proactive problem-solver capable of delivering reliable, high-performance infrastructure to support cutting-edge AI/ML initiatives.

Key Responsibilities

- Support deployment and maintenance of NVIDIA GPU-accelerated systems.

- Deploy and support Kubernetes clusters across various environments and distros (e.g., RKE, OpenShift, AKS, EKS, GKE).

- Perform day-to-day system administration across compute, storage, and networking layers.

- Automate infrastructure tasks using Shell scripts, Ansible, or similar tools.

- Collaborate with DevOps, data science, and engineering teams to ensure scalable, resilient infrastructure for AI/ML workloads.

- Monitor infrastructure health and performance; participate in troubleshooting and root cause analysis.

- Extensive experience in managing, configuring, and troubleshooting Linux-based systems (e.g., RHEL, Ubuntu, CentOS, Debian) in enterprise environments, including kernel tuning, system monitoring, and performance optimization.

- Hands-on experience in deploying and configuring Linux servers for AI/ML applications, including setup of GPU-accelerated environments, storage optimization for large datasets (e.g., using RAID, LVM), and ensuring system stability under intensive computational loads (training on NVIDIA technologies will be provided).

- Expertise in tuning Linux systems for performance, including CPU/GPU resource allocation, memory management, and I/O optimization, tailored to on-premises setups handling AI/ML training and inference workloads.

- Proven ability to diagnose and resolve intricate problems in Linux environments, such as hardware failures, network bottlenecks, or software conflicts, with an emphasis on minimizing downtime in mission-critical on-premises AI/ML systems.

Qualifications

• Minimum 7 years of experience in systems engineering or enterprise infrastructure roles.

• Understanding of enterprise storage, networking, and system monitoring tools.

• Scripting and automation experience (e.g., Bash, Python, Ansible).

• Strong communication, documentation, and troubleshooting skills.

• Comfortable working independently in a remote environment.
