Database Site Reliability Engineer

4 days ago


Hyderabad, Telangana, India Lilly Full time
Job Description

Lilly's Purpose:

Come help us unlock the power of SQL Operations through AI & Automation.

The Enterprise Data organization is looking for an SQL Database Site Reliability Engineer to join our team. This is an opportunity to solve complex challenges, drive large-scale impact, and shape the future of SQL database operations across on-premises SQL Server, Cloud SQL PaaS, and multi-cloud environments (AWS, Azure, GCP) through automation and AI-driven solutions.

Job Summary:

The SQL Database Site Reliability Engineer will support mission-critical SQL Server environments across on-premises and Cloud SQL PaaS platforms.

The role is also responsible for automating day-to-day DBA tasks, enabling multi-cloud database operations and migrations, and ensuring cost-effective, secure environments.

The role provides opportunities to work with Oracle PaaS/IaaS services, qualify and adopt new database technologies, and implement CI/CD pipelines to automate deployments.

As part of this role, you will collaborate with SMEs, Cloud Infrastructure, and Self-Service Enabling teams to bring enhancements and automation to database operations.

You will improve service qualification and database lifecycle management through automation at enterprise scale.

You will ensure compliance with Lilly security standards by performing vulnerability scanning, timely patching, and upgrades. You will also work on AI-driven insights to improve observability, reliability, and proactive incident prevention.

Roles & Responsibilities:

Operational Support: Provide day-to-day administration, monitoring, and support for all editions of SQL Server (2014, 2016, and later). Manage database performance tuning, backup/recovery, patching, and upgrades. Ensure database availability, resiliency, and security compliance. Handle database incidents, requests, and changes. Provide 24x7 production support, including an on-call rotation on weekends.

Database Administration: Install, configure, upgrade, and patch Microsoft SQL Server (all editions, SQL Server 2014 and later). Monitor database performance, tune queries and indexes, and optimize execution plans. Troubleshoot database issues, errors, and performance bottlenecks. Work closely with application teams to support new server and database creation, maintenance job creation, and CDC configuration.

Multi-cloud Database Modernization: Support and administer SQL Server migrations to various cloud platforms, covering both homogeneous and heterogeneous database migrations. Assist with database replication, DR solutions, and cross-cloud failover planning.

Automation of Operational Tasks: Develop and maintain scripts, pipelines, and Infrastructure-as-Code (IaC) using tools and languages such as Bicep, ARM, Terraform, Ansible, Shell, and Python to automate routine database tasks. Implement AIOps practices for proactive issue detection, anomaly detection, and predictive alerting.

Collaboration & Continuous Improvement: Work closely with operations, application, and cloud engineering teams to deliver reliable database services. Drive innovation through automation, monitoring, and AI-driven solutions to reduce manual effort. Document best practices, runbooks, and operational procedures.

Capacity Planning & Scalability: Forecast database growth, manage resource utilization, and ensure scalability to handle increasing workloads while maintaining performance.

Database Security & Compliance: Ensure compliance with enterprise security policies (e.g., encryption, access control, auditing, vulnerability scans). Maintain adherence to Lilly security standards as required. Grant and manage permissions for customer accounts on a least-privilege basis.

Backup & Recovery Management: Design, automate, and test robust backup and restore processes (full, differential, and transaction log backups), storing backup copies in PPDM (Dell PowerProtect Data Manager). Regularly validate recovery time objectives (RTO) and recovery point objectives (RPO). Ensure high availability of PPDM storage and backups.

Cost Optimization & Resource Efficiency: Optimize SQL resource allocation in both on-premises and cloud environments (Azure SQL Database, Managed Instance, IaaS VMs). Ensure cost efficiency by monitoring and right-sizing workloads.

SLA & SRE Principles Enforcement: Define and enforce SLIs (Service Level Indicators), SLOs (Service Level Objectives), and SLAs (Service Level Agreements) for SQL services, aligning with SRE principles.

Disaster Recovery (DR) Drills & Testing: Work with the DR team to perform regular DR drills that validate business continuity, ensuring systems can fail over and recover as expected.

Documentation & Knowledge Sharing: Maintain detailed documentation of database configurations, operational runbooks, and troubleshooting guides. Share knowledge across teams to reduce operational silos.

Soft Skills:

- Strong problem-solving, troubleshooting, and analytical mindset with a focus on reliability and continuous improvement.

- Excellent communication and collaboration abilities to work effectively with cross-functional and global teams.

- Adaptability to dynamic environments and ability to manage multiple priorities in fast-paced operations.

- Commitment to accountability, ownership, and driving results through innovation and automation.

- Willingness to participate in 24x7 operational support with an on-call rotation.

Your Qualification:

- Bachelor's degree in Computer Science, Information Technology, or a related technical field, or an MCA.

- 3 to 12+ years of experience as an SQL Platform Engineer in an enterprise environment, including SQL database administration and support of mission-critical production environments.

Additional Skills/Requirements:

- SQL Server Administration: Strong expertise in SQL Server (on-premises, Azure SQL Database, and Managed Instance) administration.

- Incident & On-Call Management: Experience in on-call rotations, handling high-severity incidents, and performing RCA (Root Cause Analysis).

- Backup & Recovery: Proficiency in native SQL Server backups, restore strategies, and validating RPO/RTO targets.

- Database Internals & Query Optimization: Deep understanding of query execution plans, indexing, partitioning, statistics, and troubleshooting T-Log file growth issues.

- Performance Tuning: Hands-on expertise in query optimization, indexing strategies, and performance troubleshooting.

- Monitoring & Observability: Experience with Splunk or equivalent monitoring/logging tools for proactive issue detection.

- Incident/Change Management Tools: Hands-on experience with ServiceNow or equivalent ITSM tools.

- Documentation & Compliance: Expertise in documentation to maintain Security/CSQA standards and ensure audit readiness.

- High Availability & Disaster Recovery (HA/DR): Strong knowledge of Always On Availability Groups, Failover Cluster Instances, Log Shipping, Replication, and Geo-Replication.

- Disaster Recovery Planning: Designing and executing DR plans, conducting DR drills, and validating business continuity.

- Database DevOps (DBOps): Strong understanding of DevOps practices, Git-based workflows, schema versioning, and collaboration with developers for schema changes/migrations.

- Change Data Capture (CDC): Experience configuring and managing CDC for data replication and auditing.

- Security & Compliance: Deep knowledge of RBAC, least privilege, encryption (TDE, Always Encrypted), auditing, vulnerability assessments, and penetration testing.

- Cloud Platforms: Solid understanding of Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure VMs. AWS experience is an additional advantage.

- Containers & Orchestration: Exposure to Kubernetes and Docker for database containerization.

- Automation & Infrastructure as Code (IaC): Experience with Ansible, Terraform, and Bicep/ARM templates for database provisioning, patching, and configuration.

- Scripting & Programming: Strong skills in PowerShell, Python, and T-SQL (PL/SQL experience is a plus for cross-platform work).

- Troubleshooting & Root Cause Analysis: Skilled at analyzing logs (SQL error logs, Windows event logs, monitoring tool logs) to identify and resolve underlying issues.

- Cloud Cost Optimization: Knowledge of scaling strategies, hybrid architectures, and cost management in cloud environments.

Desirable Skills:

- Exposure to AI/ML-driven monitoring or AIOps tools for predictive insights and automated remediation.

- Familiarity with multi-cloud environments (Oracle Cloud, AWS, Azure, GCP) for database deployment and operations.

- Experience with project management methodologies (such as Agile or Scrum) is an added advantage.

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form for further assistance. Please note this form is for individuals to request an accommodation as part of the application process; any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly

  • Hyderabad, Telangana, India Product-based Pharmaceutical Company Full time

    Allegis is partnering with one of the leading pharmaceutical clients, headquartered in the USA, which is currently expanding its development center in Hyderabad, India. We are seeking to fill the position of SQL Database Site Reliability Engineer. Position Details: Role: SQL Database Site Reliability Engineer; Experience: 3 to 12 years; Location: Hyderabad, India...


  • Hyderabad, Telangana, India Eli Lilly and Company Full time ₹ 15,00,000 - ₹ 20,00,000 per year

    At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities...


  • Hyderabad, Telangana, India beBeeReliability Full time ₹ 2,00,00,000 - ₹ 2,50,00,000

    Database Reliability Engineer: We are seeking a skilled Database Reliability Engineer to join our team. The ideal candidate will have experience in providing day-to-day administration, monitoring, and support for Oracle databases (19c/21c and above). They should also be proficient in developing scripts/tools in Python, Shell, Ansible, Terraform, or similar to...


  • Hyderabad, Telangana, India Talent Worx Full time ₹ 9,00,000 - ₹ 12,00,000 per year

    Site Reliability Engineer (SRE): At Talent Worx, we are looking for a dedicated Site Reliability Engineer (SRE) to join our team. This role involves maintaining high availability and reliability of our services through the application of software engineering practices and systems administration skills. The ideal candidate will bridge the gap between...


  • Hyderabad, Telangana, India Talent Worx Full time

    Talent Worx is seeking a talented SRE (Site Reliability Engineer) to enhance our technology team. In this role, you will be pivotal in ensuring the reliability, performance, and availability of our applications and services. Your work will involve both software engineering and systems operations as you strive to improve customer experiences and operational...


  • Hyderabad, Telangana, India IntraEdge Full time

    Site Reliability Engineer. Experience: 7+ years. Location: Hyderabad (hybrid, 4 days in office and 1 day remote). Skills for Principal: Strong leadership and people management skills. Exceptional technical proficiency in Pearson's technology stack. Advanced project management capabilities. Excellent communication and collaboration skills. Adept at risk assessment and crisis...


  • Hyderabad, Telangana, India Intraedge Technologies Ltd. Full time

    L2 Observability/AIOps: Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures internally critical and externally visible systems have reliability and uptime appropriate to users' needs and a fast rate of improvement...


  • Hyderabad, Telangana, India Kshema General Insurance Limited Full time

    About Us: Kshema General Insurance is a leading innovator in Crop Insurance. We are building scalable, reliable, and high-performance cloud-native applications on Microsoft Azure. We are seeking a talented and passionate Site Reliability Engineer (SRE) to join our team, focusing on establishing robust observability with OpenTelemetry and driving operational...

