Senior Engineer I
1 week ago
Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you'll find your place here. We value winning together—while learning, having fun, and making a profound difference for the dreamers and builders in the world.
We are seeking a skilled DevOps and AI Cloud Infrastructure Engineer to provision, deploy, manage, and optimize our GPU-based compute environment, ensuring high availability, performance, and security for compute-intensive workloads. The ideal candidate will have expertise in Linux system administration, cloud platforms, containerization, GPU hardware management, and cluster computing, with a focus on supporting AI/ML and high-performance computing (HPC) workloads. In this role, you will also provide technical support to investigate and resolve customer-reported issues related to the GPU-based compute environment. You will work closely with architects, AI engineers, and software developers to ensure seamless deployment, scalability, and reliability of our cloud-based AI/ML pipelines and GPU-based compute environments.
What You'll Be Doing:
- Infrastructure Management: Provision, deploy, and maintain scalable, secure, and highly available cloud infrastructure on platforms such as DigitalOcean Cloud to support AI workloads.
- Documentation: Maintain clear documentation for infrastructure setups and processes.
- System Management: Administer and maintain Linux-based servers and clusters optimized for GPU compute workloads, ensuring high availability and performance.
- GPU Infrastructure: Configure, monitor, and troubleshoot GPU hardware (e.g., NVIDIA GPUs) and related software stacks (e.g., CUDA, cuDNN) for optimal performance in AI/ML and HPC applications.
- Troubleshooting: Diagnose and resolve hardware and software issues on GPU compute nodes, as well as performance issues in GPU clusters.
- High-Speed Interconnects: Implement and manage high-speed networking technologies like RDMA over Converged Ethernet (RoCE) to support low-latency, high-bandwidth communication for GPU workloads.
- Automation: Develop and maintain Infrastructure as Code (IaC) using tools like Terraform and Ansible to automate the provisioning and management of resources.
- CI/CD Pipelines: Build and optimize continuous integration and deployment (CI/CD) pipelines for testing GPU-based servers and managing deployments using tools like GitHub Actions.
- Containerization & Orchestration: Build and manage LXC-based containerized environments to support cloud infrastructure and provisioning toolchains.
- Monitoring & Performance: Set up and maintain monitoring, logging, and alerting systems (e.g., Prometheus, VictoriaMetrics, Grafana) to track system performance, GPU utilization, resource bottlenecks, and the uptime of GPU resources.
- Security and Compliance: Implement network security measures, including firewalls, VLANs, VPNs, and intrusion detection systems, to protect the GPU compute environment and comply with standards like SOC 2 or ISO 27001.
- Cluster Support: Collaborate with other engineers to ensure seamless integration of networking with cluster management tools like Slurm or PBS Pro.
- Scalability: Optimize infrastructure for high-throughput AI workloads, including GPU and auto-scaling configurations.
- Collaboration: Work closely with architects and software engineers to streamline model deployment, optimize resource utilization, and troubleshoot infrastructure issues.
What You'll Bring:
- Experience: 3+ years of experience in DevOps, Site Reliability Engineering (SRE), or cloud infrastructure management, with at least 1 year working on GPU-based compute environments in the cloud.
- Linux Administration: Strong knowledge of Linux system administration for managing network services and tools in a GPU compute environment.
- High-Speed Interconnects: Experience with high-performance networking technologies like RoCE or 100 GbE in compute-intensive environments.
- GPU-Specific Networking: Proficiency with NVIDIA GPU networking technologies such as Mellanox ConnectX adapters, including managing their drivers and firmware and configuring their interfaces with Netplan.
- Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, GCP).
- Networking & Security: Knowledge of networking concepts (VPC, subnets) and security best practices (IAM, encryption, firewall configurations).
- Container Technologies: Proficiency in LXC and Docker for container orchestration and management.
- IaC Tools: Expertise in Infrastructure as Code tools such as Terraform and Ansible.
- CI/CD Tools: Experience with CI/CD pipelines using Jenkins, GitHub Actions, or similar tools.
- Scripting & Programming: Strong scripting skills in Python, Bash, or similar languages; familiarity with Go or other programming languages is a plus.
- Monitoring Tools: Experience with monitoring and logging tools like Prometheus, VictoriaMetrics, and Grafana.
- Problem-Solving: Strong analytical and troubleshooting skills to resolve complex infrastructure and performance issues.
- Communication: Excellent collaboration and communication skills to work with cross-functional teams.
Nice to Have:
- Experience with GPU-based workloads and familiarity with AI/ML frameworks like TensorFlow or PyTorch.
- Knowledge of configuring Netplan to work with cloud-specific networking features like VPCs or virtual network interfaces.
*This role is located in Hyderabad, India.
#LI-Hybrid
-
Senior Engineer
1 week ago
Hyderabad, Telangana, India Exyte Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Job Title: Senior Engineer - I&C
Location: Hyderabad
Work Mode: Work from office
Job Summary: The instrumentation and Control (I&C) engineer is responsible for engineering design at various phases of the project to meet project requirements in the field of instrumentation and process automation. The engineer shall be involved in proposal preparation along with...
-
Automation Engineer I
23 hours ago
Hyderabad, Telangana, India TechnipFMC Full time ₹ 9,00,000 - ₹ 12,00,000 per year
Job Purpose: TechnipFMC is a global leader in the energy industry, specialized in subsea and surface technologies. With our proprietary technologies and production systems, integrated expertise, and comprehensive solutions, we are transforming our clients' project economics. To learn more about how we are enhancing the performance of the world's energy...
-
Software Engineer I
5 days ago
Hyderabad, Telangana, India Spacelabs Healthcare Full time US$ 80,000 - US$ 120,000 per year
Overview: Software Engineer-I will be involved in the development of software technologies for medical devices. The right candidate will be proactive, with great communication skills, demonstrate attention to details, have a passion for technology, and an excitement to produce great products. Software Engineer-I shall be responsible for the development of...
-
Senior Engineer
3 days ago
Hyderabad, Telangana, India Cyient Full time ₹ 9,00,000 - ₹ 12,00,000 per year
Job Title: Senior Engineer - Instrumentation & Controls
Job Description
Your responsibilities:
Should have experience in researching and selecting components within a scope, including IO components, motor controllers, eFuses, etc.
In PLC hardware engineering, you'll design, install, and maintain the physical components like controllers and I/O modules....
-
Senior Civil Engineering
5 days ago
Hyderabad, Telangana, India VB® Engineering (I) Pvt Ltd Full time ₹ 15,00,000 - ₹ 25,00,000 per year
Role Description: This is a full-time on-site role for a Senior Civil Engineer, located in Telangana, India. The Senior Civil Engineer will be responsible for designing and managing civil engineering projects, overseeing water resource management, and ensuring effective project management. Day-to-day tasks will include planning and executing engineering...
-
Senior Engineer I
7 days ago
Hyderabad, Telangana, India DigitalOcean Full time ₹ 10,00,000 - ₹ 25,00,000 per year
-
Senior Test Automation Engineer
23 hours ago
Hyderabad, Telangana, India Luxoft Full time ₹ 6,00,000 - ₹ 18,00,000 per year
Project description: Information and Document Systems is a global technology change and delivery organization comprising nearly 200 individuals located mostly in Switzerland, Poland, and Singapore. Providing global capturing and document processing, archiving, and retrieval solutions to all business divisions focusing on supporting Legal, Regulatory, and...
-
Junior Mechanical Engineer
1 week ago
Hyderabad, Telangana, India Sri Avantika Contractors (I) limited Full time ₹ 1,20,000 - ₹ 2,40,000 per year
Job Title: Jr Mechanical Engineer – Fresher
Location: Orissa
Qualification: Diploma / ITI in Mechanical
Experience: 0–1 Year (Freshers can apply)
Job Type: Full Time
Job Description: We are looking for a highly motivated and enthusiastic Mechanical Engineer (Fresher) to join our team. The candidate will be responsible for supporting production, quality, and...
-
Software Engineer I
5 days ago
Hyderabad, Telangana, India Electronic Arts (EA) Full time ₹ 6,00,000 - ₹ 18,00,000 per year
Description & Requirements
Electronic Arts creates next-level entertainment experiences that inspire players and fans around the world. Here, everyone is part of the story. Part of a community that connects across the globe. A place where creativity thrives, new perspectives are invited, and ideas matter. A team where everyone makes play happen. Platform...
-
Engineer I
5 days ago
Hyderabad, Telangana, India Silicon Labs Full time ₹ 5,00,000 - ₹ 12,00,000 per year
Silicon Labs (NASDAQ: SLAB) is the leading innovator in low-power wireless connectivity, building embedded technology that connects devices and improves lives. Merging cutting-edge technology into the world's most highly integrated SoCs, Silicon Labs provides device makers the solutions, support, and ecosystems needed to create advanced edge connectivity...