Site Reliability Engineering

3 days ago


Navi Mumbai, Maharashtra, India Koantek Full time

About the Role:

We are seeking a highly skilled and experienced SRE / Databricks Platform Administrator to join our Data Operations team. In this critical role, you will be responsible for the availability, performance, reliability, and scalability of our enterprise Databricks platform. You will combine deep expertise in Databricks administration with SRE principles to automate operations, proactively identify and resolve issues, and ensure a seamless experience for our data engineering, data science, and analytics teams. You will champion best practices for platform governance, security, and cost optimization, playing a pivotal role in our data ecosystem.

Key Responsibilities:

Platform Operations & Reliability:

Design, implement, and maintain the Databricks platform infrastructure across multiple cloud environments (AWS, Azure, or GCP).

Ensure high availability, disaster recovery, and business continuity of Databricks workspaces, clusters, and associated services.

Develop and implement robust monitoring, alerting, and logging solutions for the Databricks platform using tools like Prometheus, Grafana, ELK stack, or cloud-native monitoring services (CloudWatch, Azure Monitor, GCP Operations Suite).

Proactively identify and address performance bottlenecks, resource constraints, and potential issues within the Databricks environment.

Participate in on-call rotations to respond to and resolve critical incidents swiftly, performing root cause analysis (RCA) and implementing preventative measures.

Manage and optimize Databricks clusters, including autoscaling, instance types, and cluster policies, for both interactive and job compute workloads to ensure cost-effectiveness and performance.
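To make the cluster-policy responsibility concrete, here is a minimal Python sketch that pre-checks a proposed cluster spec against a few policy rules. The rule format loosely mirrors Databricks cluster policy definitions (dotted attribute paths with `fixed`, `range`, and `allowlist` rule types), but the `violations` helper, the limits, and the node types are illustrative assumptions; Databricks itself enforces real policies server-side.

```python
from typing import Any

# Illustrative policy: attribute path -> rule. Limits and node types are
# assumptions for this sketch, not recommended production values.
POLICY: dict[str, dict[str, Any]] = {
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 120},
    "autoscale.max_workers": {"type": "range", "maxValue": 20},
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]},
    "aws_attributes.availability": {"type": "fixed", "value": "SPOT_WITH_FALLBACK"},
}


def get_path(spec: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'autoscale.max_workers'; None if absent."""
    node: Any = spec
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def violations(spec: dict[str, Any], policy: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable rule violations (hypothetical local pre-check)."""
    problems: list[str] = []
    for path, rule in policy.items():
        value = get_path(spec, path)
        if value is None:
            # Absent attributes are skipped here; Databricks itself applies
            # fixed values (and any defaults) from the policy.
            continue
        kind = rule["type"]
        if kind == "fixed" and value != rule["value"]:
            problems.append(f"{path}: must be {rule['value']!r}, got {value!r}")
        elif kind == "range":
            if "minValue" in rule and value < rule["minValue"]:
                problems.append(f"{path}: {value} is below {rule['minValue']}")
            if "maxValue" in rule and value > rule["maxValue"]:
                problems.append(f"{path}: {value} is above {rule['maxValue']}")
        elif kind == "allowlist" and value not in rule["values"]:
            problems.append(f"{path}: {value!r} is not in the allowlist")
    return problems


if __name__ == "__main__":
    # A spec that disables auto-termination, autoscales too far,
    # and requests an oversized node type.
    proposed = {
        "node_type_id": "r5.24xlarge",
        "autotermination_minutes": 0,
        "autoscale": {"min_workers": 2, "max_workers": 50},
    }
    for problem in violations(proposed, POLICY):
        print(problem)
```

Run as a script, this prints three violations (auto-termination, worker ceiling, node type). A check like this could sit in a CI pipeline so that oversized or never-terminating clusters are caught before they reach a workspace.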

Automation & Tooling:

Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform, Bicep, or CloudFormation to automate the provisioning, configuration, and management of Databricks resources.

Automate repetitive operational tasks, deployments, and environment provisioning using scripting languages (Python, Bash) and CI/CD pipelines (Jenkins, Azure DevOps, GitLab CI).

Build and maintain custom tools and scripts to enhance Databricks platform capabilities, improve observability, and streamline workflows.

Security & Governance:

Implement and enforce Databricks security best practices, including identity and access management (IAM) with Unity Catalog, SSO integration (Azure AD, Okta), service principals, and granular access controls (RBAC, row-level/column-level security).

Ensure compliance with organizational security policies, data governance standards, and regulatory requirements (e.g., GDPR, HIPAA, industry-specific compliance).

Conduct security audits and vulnerability assessments of the Databricks environment.

Manage secrets using Databricks secret scopes or a cloud provider's secret manager (e.g., AWS Secrets Manager, Azure Key Vault, or Google Secret Manager).

Performance Optimization & Cost Management:

Analyze Databricks usage patterns, DBU consumption, and cloud resource costs to identify opportunities for optimization and efficiency gains.
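As a rough illustration of this kind of analysis, the Python sketch below rolls up DBU consumption into an estimated cost per cluster and ranks the most expensive ones. The `UsageRecord` fields, the SKU labels, and the per-DBU rates are all assumptions made for the example (real usage data and prices depend on the cloud, pricing tier, and contract); the only pattern taken as given is that all-purpose (interactive) compute is billed at a higher per-DBU rate than jobs compute.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One usage row; field names are assumptions, not a Databricks schema."""
    cluster: str  # cluster or job name
    sku: str      # compute type label, e.g. "JOBS" or "ALL_PURPOSE"
    dbus: float   # DBUs consumed during the period


# Illustrative USD-per-DBU rates; real rates vary by cloud, tier, and contract.
RATE_PER_DBU = {"JOBS": 0.15, "ALL_PURPOSE": 0.55}


def cost_by_cluster(records: list[UsageRecord]) -> list[tuple[str, float]]:
    """Sum estimated DBU cost per cluster, most expensive first."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.cluster] += record.dbus * RATE_PER_DBU[record.sku]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    usage = [
        UsageRecord("etl-nightly", "JOBS", 100.0),
        UsageRecord("adhoc-analytics", "ALL_PURPOSE", 40.0),
        UsageRecord("etl-nightly", "JOBS", 20.0),
    ]
    for cluster, cost in cost_by_cluster(usage):
        print(f"{cluster}: ${cost:.2f}")
```

With these made-up rates, the interactive cluster ranks first ($22.00 versus $18.00) despite consuming a third as many DBUs, the kind of pattern that might prompt moving scheduled work from all-purpose clusters to jobs compute.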

Implement strategies for cost control, including spot instance utilization, intelligent cluster resizing, and effective use of instance pools.
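Spot-instance decisions are easier to reason about with a quick estimate. The Python sketch below assumes the common pattern of keeping the driver on-demand while some or all workers run on spot capacity; the function name and every price are hypothetical, and the estimate covers only VM cost, ignoring DBU charges and the risk of spot capacity being reclaimed.

```python
def hourly_vm_cost(
    workers: int,
    on_demand_price: float,
    spot_price: float,
    spot_workers: int,
) -> float:
    """Estimated hourly VM cost: on-demand driver plus a spot/on-demand worker mix.

    Prices are hypothetical per-node hourly rates; DBU charges are excluded.
    """
    if not 0 <= spot_workers <= workers:
        raise ValueError("spot_workers must be between 0 and workers")
    driver = on_demand_price  # driver stays on-demand so a spot reclaim cannot kill it
    on_demand_workers = workers - spot_workers
    return driver + on_demand_workers * on_demand_price + spot_workers * spot_price


if __name__ == "__main__":
    all_on_demand = hourly_vm_cost(8, on_demand_price=1.00, spot_price=0.30, spot_workers=0)
    all_spot_workers = hourly_vm_cost(8, on_demand_price=1.00, spot_price=0.30, spot_workers=8)
    savings = 1 - all_spot_workers / all_on_demand
    print(f"on-demand: ${all_on_demand:.2f}/h, spot workers: ${all_spot_workers:.2f}/h, "
          f"savings: {savings:.0%}")
```

Under these assumed prices, moving all eight workers to spot cuts the hourly VM cost from $9.00 to $3.40, about 62%. Keeping the driver on-demand trades a little of that saving for resilience, since losing the driver ends the whole job.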

Work with data teams to optimize Spark jobs, notebooks, and SQL queries for performance and cost.

Collaboration & Mentorship:

Collaborate closely with data engineers, data scientists, architects, and other SREs to understand their requirements and provide expert guidance on Databricks best practices.

Provide technical leadership and mentorship to junior administrators and engineers, fostering a culture of reliability and operational excellence.

Stay up to date with the latest Databricks features, cloud services, and SRE methodologies, evaluating and recommending new technologies.



  • Mumbai, Maharashtra, India Insight Global Full time

    Site Reliability Engineer. Location: Mumbai, India, working onsite 1x a week. Salary: 22-25 LPA. Target Start Date: January 2026. Join our dynamic and highly collaborative agile team, where you'll play a pivotal role in ensuring the reliability, scalability, and efficiency of our premier InsurTech solution. Our platform enables clients to obtain quotes and issue...


  • Mumbai, Maharashtra, India Hirexa Solutions Full time ₹ 4,00,000 - ₹ 12,00,000 per year

    Hi all, we are hiring a Site Reliability Engineer for one of our product-based clients (permanent hiring). Skills: at least 7 years of experience with AWS, and good hands-on experience with the following: Observability/Monitoring, Python, Bash/Shell Script, Terraform, Automation, Account Pipeline, ServiceNow, GitLab, Jira. Exp: 7 to 14 Yrs. CTC: Exp*2.5...


  • Navi Mumbai, Maharashtra, India Sovos Full time US$ 80,000 - US$ 16,00,000 per year

    Build your future with Sovos. If you're seeking a career where innovation meets impact, you've come to the right place. As a global leader, Sovos is transforming tax compliance from a business requirement to a force for growth while revolutionizing how businesses navigate the ever-changing regulatory landscape. At Sovos, we're dedicated to more than just...


  • Navi Mumbai, Maharashtra, India Sovos Compliance Full time US$ 2,00,000 - US$ 6,00,000 per year

    Build your future with Sovos. If you're seeking a career where innovation meets impact, you've come to the right place. As a global leader, Sovos is transforming tax compliance from a business requirement to a force for growth while revolutionizing how businesses navigate the ever-changing regulatory landscape. At Sovos, we're dedicated to more than just...


  • Mumbai, Maharashtra, India APTO SOLUTIONS - EXECUTIVE SEARCH & CONSULTANTS Full time ₹ 6,00,000 - ₹ 18,00,000 per year

    #Hiring Alert: Site Reliability Engineer L2 (SRE). Location: Mumbai (contractual). Experience: 5+ years. Notice: immediate joiners. Apply now! Skills & Experience: 5+ years of proven tech experience. Hands-on in Data Center Operations (DCOps): Linux installation, configuration & troubleshooting. Strong experience in Java, container technologies...


  • Mumbai, Maharashtra, India equentis Full time

    Job Title: Site Reliability Engineer (SRE). Company: Equentis Wealth Advisory Limited. Location: Lower Parel, Mumbai. Job Summary: We are seeking a talented Site Reliability Engineer (SRE) to join our team and play a critical role in ensuring the reliability, scalability, and performance of our systems and applications. The ideal candidate will have a strong...


  • Mumbai, Maharashtra, India Fynd Full time ₹ 8,00,000 - ₹ 24,00,000 per year

    Fynd is India's largest omnichannel platform and a multi-platform tech company specializing in retail technology and products in AI, ML, big data, image editing, and the learning space. It provides a unified platform for businesses to seamlessly manage online and offline sales, store operations, inventory, and customer engagement. Serving over 2,300 brands,...

  • Site Engineer

    1 week ago


    Navi Mumbai, Maharashtra, India Kaavi Recruitment Consultancy Full time ₹ 4,00,000 - ₹ 8,00,000 per year

    We have new openings: Site Engineer / Operation Engineer. Requirements: experience in the Oil & Gas sector; client & vendor coordination; supporting the site engineer with all their needs; handling workers and technicians at site; preparing daily progress reports.

  • Site Engineer

    2 weeks ago


    Navi Mumbai, Maharashtra, India SYSTRA Full time ₹ 10,00,000 - ₹ 15,00,000 per year

    For more than 60 years, SYSTRA has built expertise spanning the entire spectrum of Mass Rapid Transit Systems. SYSTRA's presence in India dates back to 1957, when SYSTRA worked on the electrification of Indian Railways. Our technical excellence, holistic approach, and tremendous talent provide a career that puts people...

  • Site Engineer

    5 days ago


    Navi Mumbai, Maharashtra, India ELEVATE PARKING SYSTEMS Full time

    Site Supervisor Job Description - Elevate Parking Systems LLP. Location: Bhandarli Village, Shil-Kalyan Phata. Company: Elevate Parking Systems LLP. Role Summary: The Site Supervisor will be responsible for overseeing and managing all on-site activities related to the installation, commissioning, and maintenance of automated/multi-level parking systems. This role...