Freelance Senior Kubernetes Reliability

3 weeks ago


India ThreatXIntel Full time
Company Description

ThreatXIntel is a cybersecurity startup dedicated to protecting businesses and organizations from cyber threats. Its services include cloud security, web and mobile security testing, cloud security assessment, DevSecOps, and more. ThreatXIntel takes a proactive approach to security, continuously monitoring and testing clients' digital environments to identify vulnerabilities before they can be exploited.

Role Description

We are looking for a senior-level freelance engineer to ensure platform reliability and design a policy-driven traffic enforcement layer for our Kubernetes-hosted Secure Access Service Edge (SASE) platform. This role will focus on achieving high availability, deep observability, and fair multi-tenant performance across distributed ingress points and cloud-native workloads.

Responsibilities

- Own platform stability, uptime, and SLA attainment across multi-region Kubernetes environments.
- Design fault-tolerant Kubernetes architectures with automated scaling, disaster recovery, and change management.
- Build and operate a complete observability stack (Prometheus, Grafana, OpenTelemetry, Jaeger, Loki) including golden signal monitoring, distributed tracing, and automated remediation.
- Develop a Kubernetes-native traffic control plane with dynamic per-tenant and per-session bandwidth enforcement using CRDs, custom controllers, and Cilium.
- Optimize network policy enforcement, integrate telemetry pipelines, and operate a service mesh (Istio or Linkerd) for secure traffic routing.
- Automate cluster operations with GitOps workflows (Helm, ArgoCD, Flux, Terraform) and implement FinOps practices for cost optimization.

Required Skills & Experience

- 5+ years managing production Kubernetes environments with 99.9%+ availability.
- Expertise in observability tools, Linux networking (tc, nftables, conntrack, iptables, WireGuard), and service meshes.
- Strong Go programming skills for building Kubernetes controllers, plus scripting skills in Python and Bash.
- Experience with OpenStack integration, CNI plugins (Cilium preferred), and dynamic policy enforcement.


