Principal Service Reliability Engineer
1 week ago
Job Description Summary

Own and scale mission-critical ERP/SaaS services while building intelligent, cloud-native capabilities. This role requires an SRE mindset combined with AI/ML expertise and strong application engineering skills across public and private cloud environments.

Key Responsibilities
- End-to-end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.
- Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data-driven prevention and continuous improvement.
- AI/ML and GenAI delivery: design and integrate solutions with LLMs, RAG, agentic workflows, and conversational AI; build low-latency model serving and retraining pipelines.
- Application engineering: develop performant microservices for distributed, containerized, cloud-native systems.
- Automation: eliminate toil by automating operational workflows, recovery procedures, code delivery, and configuration management; build internal tools and reusable scripts/services to accelerate delivery and reduce errors.
- Observability: define and implement monitoring, logging, alerting, and tracing strategies; establish SLOs/SLIs/error budgets; improve diagnostics and performance visibility for rapid triage.
- Cross-functional collaboration: partner with product, operations, and data teams to translate requirements into secure, scalable solutions; communicate effectively with technical and non-technical stakeholders.

Minimum Qualifications
- BS/MS in Computer Science or related field; 10+ years of software engineering in cloud environments.
- Strong in distributed systems/microservices using Java/Python; SQL/data modeling; Python for AI/automation.
- SRE/DevOps expertise: systems and networking fundamentals, application security, observability, performance analysis, and incident response.
- Proven SDLC excellence: code quality, reviews, version control, CI/CD, testing, and release engineering.
- Excellent written and verbal communication; English fluency.

Preferred/Technical Skills
- AI/ML/GenAI: experience with foundational models, RAG, agentic architectures; model deployment, optimization, monitoring, and retraining.
- Cloud and containers: experience with containerization, orchestration, and resilient, fault-tolerant microservices.
- Observability: hands-on experience designing dashboards, alerts, traces, logs, and metrics; defining SLOs/SLIs and error budgets; on-call readiness and runbook quality.
- Operations: performance tuning across Java/Python and SQL for large-scale enterprise applications; strong Linux/Unix expertise; capacity planning and reliability reviews.
- Automation and scripting: proficiency in scripting to automate operational workflows, build tooling, and CI/CD tasks (e.g., shell scripting, Python, configuration-as-code, task runners).
- Familiarity with enterprise ERP applications and standard DevOps tooling and practices.

Qualifications
Career Level - IC4

About Us
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.

We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.

Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [Confidential Information] or by calling +1 888 404 2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
-
Principal Site Reliability Engineer
4 hours ago
Hyderabad, Telangana, India Oracle Full time ₹ 12,00,000 - ₹ 36,00,000 per year
Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Mainframe zLinux, DB2, zVM, and AIX. The Site Reliability Engineer is expected to work with multiple service and product development teams, identifying cross-team issues that...
-
Principal Service Reliability Engineer
3 weeks ago
Pune, India Amadeus Full time
Job Title: Principal Service Reliability Engineer
Common Accountabilities
- Proficient in technical knowledge to ensure team performs at a high level. Is recognized as a leader in own area and may formally train Specialists/Senior Specialists.
- Understands how main business drivers may impact on own area. Can assess complex problems with...
-
Principal Software Engineer
4 weeks ago
Karnataka, Karnataka, India NIKE Full time
PRINCIPAL SITE RELIABILITY ENGINEER
India Technology Center
WHO YOU WILL WORK WITH
The Principal Site Reliability Engineer will work alongside a talented team of Site Reliability Engineers focused on delivering reliable and observable software used by millions of athletes* around the world. You will be a part of the Resilience Engineering organization which...
-
Principal Site Reliability Engineer
2 weeks ago
Hyderabad, India Oracle Full time
Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Linux administration, AI technologies, software development, cloud computing, networking, cloud security, performance analysis and monitoring to provide the stability,...
-
Principal Engineer, Site Reliability
4 days ago
Hyderabad, India TMUS Global Solutions Full time
About the Role
The Principal Engineer, Site Reliability (SRE) will play a critical role in ensuring the stability, scalability, and operational excellence of Accounting and Finance platforms. This role is focused on leading the operational health of these platforms, ensuring the delivery of highly reliable financial applications and data services that meet...
-
Principal Service Reliability Engineer
3 weeks ago
Hyderabad, India Oracle Full time
Job Description
Key Responsibilities
- End-to-end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.
- Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data-driven prevention...
-
Principal Site Reliability Engineer
4 days ago
Hyderabad, Telangana, India Oracle Full time ₹ 20,00,000 - ₹ 60,00,000 per year
Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This position requires broad knowledge of Linux administration, AI technologies, software development, cloud computing, networking, cloud security, performance analysis and monitoring to provide the stability,...
-
Principal Site Reliability Engineer
2 weeks ago
Hyderabad, India JPMorgan Chase Full time
Join a globally recognized financial organization and advance your profession to new heights by contributing to revolutionary projects. You've discovered the perfect environment to have a major impact. As a Principal Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking division, you will leverage your advanced expertise to...
-
Principal Network Reliability Engineer
1 week ago
India Oracle Full time
Job Description
The Oracle Cloud Infrastructure (OCI) delivers mission-critical applications for top-tier enterprises around the world. Our cloud offers unmatched hyper-scale, multi-tenant services deployed in more than 40 regions worldwide. The mission of our Network Reliability Engineering team is to provide exceptional network reliability and...
-
Principal Site Reliability Engineer
6 days ago
Hyderabad, India JPMorgan Chase & Co. Full time
Join a globally recognized financial organization and advance your profession to new heights by contributing to revolutionary projects. You've discovered the perfect environment to have a major impact. As a Principal Site Reliability Engineer at JPMorgan Chase within the Consumer & Community Banking division, you will leverage your advanced expertise to...