Site Reliability Engineer II

1 day ago


Bengaluru, Karnataka, India American Express Full time ₹ 15,00,000 - ₹ 20,00,000 per year

You Lead the Way. We've Got Your Back.

With the right backing, people and businesses have the power to progress in incredible ways. When you join Team Amex, you become part of a global and diverse community of colleagues with an unwavering commitment to back our customers, communities and each other. Here, you'll learn and grow as we help you create a career journey that's unique and meaningful to you with benefits, programs, and flexibility that support you personally and professionally.

At American Express, you'll be recognized for your contributions, leadership, and impact—every colleague has the opportunity to share in the company's success. Together, we'll win as a team, striving to uphold our company values and powerful backing promise to provide the world's best customer experience every day. And we'll do it with the utmost integrity, and in an environment where everyone is seen, heard and feels like they belong.

Join Team Amex and let's lead the way together.

How will you make an impact in this role?

We are seeking an experienced Site Reliability Engineer to join our Big Data infrastructure team. This role focuses on ensuring the reliability, scalability, and performance of our Apache Spark-based data processing systems and broader big data ecosystem. The ideal candidate will have 5+ years of hands-on experience with distributed systems, data platforms, and SRE practices.

Key Responsibilities:

Infrastructure Management & Reliability

  • Design, implement, and maintain highly available Apache Spark clusters and big data infrastructure across cloud and on-premises environments
  • Monitor and optimize performance of distributed data processing workloads, ensuring SLA compliance and minimal downtime
  • Implement comprehensive monitoring, alerting, and observability solutions for big data pipelines and infrastructure components
  • Lead incident response and post-mortem analysis for data platform outages, implementing preventive measures to avoid recurrence

Automation & Operations

  • Develop and maintain Infrastructure as Code (IaC) solutions using tools like Terraform, Ansible, or CloudFormation for big data infrastructure provisioning
  • Build automated deployment pipelines and CI/CD workflows for Spark applications and data platform components
  • Create and maintain runbooks, operational procedures, and disaster recovery plans for critical data systems
  • Implement capacity planning and auto-scaling solutions to handle varying data processing workloads efficiently

Platform Engineering & Optimization

  • Collaborate with data engineering teams to optimize Spark job configurations, cluster sizing, and resource allocation
  • Design and implement data platform governance, security, and compliance measures
  • Evaluate and integrate new big data technologies and tools to improve platform capabilities and performance
  • Establish best practices for code deployment, configuration management, and system maintenance

Required Skills and Experience:

Technical Expertise

  • 5+ years of experience in Site Reliability Engineering, DevOps, or similar roles with a focus on distributed systems
  • Deep hands-on experience with Apache Spark (Scala, Python/PySpark) and Spark cluster management (YARN, Kubernetes, or standalone)
  • Proficiency with big data ecosystem technologies including Hadoop, HDFS, Hive, Kafka, Airflow, and data lakes/warehouses
  • Strong experience with cloud platforms (AWS, GCP, or Azure) and their big data services (EMR, Dataproc, HDInsight, etc.)
  • Advanced knowledge of containerization technologies (Docker, Kubernetes) and orchestration in data processing contexts

Infrastructure & Monitoring

  • Experience with infrastructure monitoring and observability tools (Prometheus, Grafana, ELK stack, Datadog, or similar)
  • Proficiency in Infrastructure as Code tools (Terraform, CloudFormation, Ansible) for managing big data infrastructure
  • Strong Linux/Unix system administration skills and experience with configuration management tools
  • Knowledge of networking, security, and performance tuning in distributed computing environments

Programming & Automation

  • Proficient in at least one programming language (Python, Scala, Java, or Go) for automation and tooling development
  • Experience with CI/CD pipelines and version control systems (Git, Jenkins, GitLab CI, or similar)
  • Strong scripting skills (Bash, Python) for automation and operational tasks
  • Understanding of software engineering best practices including testing, code review, and documentation

Preferred Qualifications

  • Experience with stream processing frameworks (Kafka Streams, Apache Flink, or Spark Streaming)
  • Knowledge of data governance, data quality, and data lineage tools
  • Familiarity with machine learning operations (MLOps) and model deployment at scale
  • Experience with database technologies (SQL, NoSQL) and data warehouse solutions
  • Relevant certifications in cloud platforms or big data technologies

We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally:

  • Competitive base salaries
  • Bonus incentives
  • Support for financial well-being and retirement
  • Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)
  • Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need
  • Generous paid parental leave policies (depending on your location)
  • Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
  • Free and confidential counseling support through our Healthy Minds program
  • Career development and training opportunities

American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law.

Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.



  • Bengaluru, Karnataka, India JPMorgan Chase Full time

    Job Category: Software Engineering. Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the Chief Administrative Office - Global Real Estate Technology, you will use technology to solve business problems and leverage software...


  • Bengaluru, Karnataka, India Chase- Candidate Experience page Full time ₹ 15,00,000 - ₹ 20,00,000 per year

    Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the Chief Administrative Office - Global Real Estate Technology, you will use technology to solve business problems and leverage software engineering best practices as we strive...


  • Bengaluru, Karnataka, India JPMorganChase Full time ₹ 15,00,000 - ₹ 25,00,000 per year

    Play a key role in ensuring system reliability at one of the world's most iconic and largest financial institutions. As a Site Reliability Engineer II at JPMorgan Chase within the Chief Administrative Office - Global Real Estate Technology, you will use technology to solve business problems and leverage software engineering best practices as we strive towards...


  • Bengaluru, Karnataka, India Microsoft Full time

    Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and "the sky is the limit" thinking in a cloud-enabled world. Microsoft's Azure Data engineering team is leading the transformation of analytics in the world of data...


  • Bengaluru, Karnataka, India Chase- Candidate Experience page Full time ₹ 15,00,000 - ₹ 20,00,000 per year

    You're ready to gain the skills and experience needed to grow within your role and advance your career, and we have the perfect software engineering opportunity for you. As a Site Reliability Engineer at JPMorgan Chase within the Employee Platform, you will be part of an agile team dedicated to enhancing, designing, and delivering the software components...


  • Bengaluru, Karnataka, India Trintech Full time US$ 1,25,000 - US$ 1,75,000 per year

    THE ROLE: The SRE NOC Specialist role supports 24x7 delivery of Hosted and SaaS applications to global Fortune 500 clients at cloud scale. This role will focus on day-to-day tasks of analyzing and monitoring applications within our environment. WHO YOU ARE: Bachelor's Degree in Computer Science, Information Systems, Engineering, or equivalent experience. Excellent...


  • Bengaluru, Karnataka, India NIKE Full time ₹ 1,04,000 - ₹ 1,30,878 per year

    Site Reliability Engineer II. India Technology Center. WHO YOU'LL WORK WITH: You will be a part of a team of talented Site Reliability Engineers focused on delivering reliable and observable software used by millions of athletes* around the world. You will be a part of the Resilience Engineering organization which includes Reliability Engineering, Live Site...


  • Bengaluru, Karnataka, India CES Full time

    We're looking for a highly skilled Site Reliability Engineer to help us build, manage, and scale modern infrastructure systems for high-availability applications. If you're passionate about automation, cloud platforms, and solving tough operational challenges, we would love to hear from you. Key Skills and Competencies: 3+ years of extensive experience with...


  • Bengaluru, Karnataka, India Enterprise Minds, Inc Full time

    We're Hiring | Site Reliability Engineer | 8-10 years


  • Bengaluru, Karnataka, India Randstad Full time

    Role: Site Reliability Engineer. Summary: The Network Engineer 2 provides technical design, planning, operation, maintenance, and advanced troubleshooting of the Bread Financials' network infrastructure. This position ensures continuity and alignment of the network administration/engineering direction. This position supports Bread Financials' strategies and...