Site Reliability Engineer

2 months ago


Bangalore, India PhonePe Full time

Job Overview:


As a Site Reliability Engineer (SRE) specializing in our on-premises data platform, you will play a critical role in deploying our Cloudera Data Platform (CDP) infrastructure and ensuring its reliability, scalability, and performance. You will collaborate closely with cross-functional teams to design, implement, and maintain robust systems that support our data-driven initiatives. The ideal candidate will have a deep understanding of Cloudera Data Platform, strong troubleshooting skills, and a proactive mindset towards automation and optimization. You will be central to the smooth operation, performance, and security of large, high-density Cloudera-based infrastructure.

Key Responsibilities:

  • Implementation of Cloudera Data Platform: Lead the implementation process of Cloudera Data Platform on-premises, including planning, installation, configuration, and integration with existing systems.
  • Infrastructure Management: Manage and maintain the Cloudera-based infrastructure, ensuring optimal performance, high availability, and scalability. This includes monitoring system health and performing routine maintenance tasks.
  • Strong troubleshooting skills and operational expertise in areas such as system capacity, bottlenecks, memory, CPU, OS, storage, and networking.
  • Data Security and Compliance: Implement and enforce security best practices to safeguard data integrity and confidentiality within the Cloudera environment. Ensure compliance with relevant regulations and standards (e.g., GDPR, HIPAA, DPR).
  • Performance Optimization: Continuously optimize the Cloudera infrastructure to enhance performance, efficiency, and cost-effectiveness. Identify and resolve bottlenecks, tune configurations, and implement best practices for resource utilization.
  • Capacity Planning: Plan and tune the performance of Hadoop clusters. Monitor resource utilization trends and plan for future capacity needs. Proactively identify potential capacity constraints and propose solutions to address them.
  • Collaborate effectively with infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
  • Work closely with teams to optimize the overall performance of the PhonePe Hadoop ecosystem.
  • Backup and Disaster Recovery: Implement robust backup and disaster recovery strategies to ensure data protection and business continuity. Test and maintain backup and recovery procedures regularly.
  • Develop tools and services to enhance debuggability and supportability.
  • Patches & Upgrades: Routinely apply recommended patches and perform rolling upgrades of the platform in accordance with the advisory from Cloudera, InfoSec and Compliance.
  • Documentation and Knowledge Sharing: Create comprehensive documentation for configurations, processes, and procedures related to the Cloudera Data Platform. Share knowledge and best practices with team members to foster continuous learning and improvement.
  • Collaboration and Communication: Collaborate effectively with cross-functional teams including data engineers, developers, and IT operations personnel. Communicate project status, issues, and resolutions clearly and promptly.


Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field.
  • Proficiency in Linux system administration, shell scripting, and networking concepts including iptables and IPsec.
  • Strong understanding of networking, open-source technologies, and tools.
  • 5-10 years of experience in the design, setup, and management of large-scale Hadoop clusters, ensuring high availability, fault tolerance, and performance optimization.
  • Strong understanding of distributed computing principles and experience with Hadoop ecosystem technologies (HDFS, MapReduce, YARN, Hive, Spark, etc.).
  • Experience in administering Kerberos and LDAP.
  • Strong knowledge of databases such as MySQL, NoSQL stores, and SQL Server.
  • Hands-on experience with configuration management tools (e.g., Salt, Ansible, Puppet, Chef).
  • Strong scripting skills (e.g., Perl, Python, Bash) for automation and troubleshooting.
  • Experience with monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
  • Knowledge of networking principles and protocols (TCP/IP, UDP, DNS, DHCP, etc.).
  • Experience managing *nix-based machines (e.g., Ubuntu, Fedora, Red Hat) and strong working knowledge of essential Unix programs and tools.
  • Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
  • Excellent analytical, problem-solving, and troubleshooting skills.
  • Proven ability to work well under pressure and manage multiple priorities simultaneously.


Good To Have:

  • Cloudera Certified Administrator (CCA) or Cloudera Certified Professional (CCP) certification.
  • Minimum 5 years of experience managing and administering medium to large Hadoop-based environments (>100 machines); Cloudera Data Platform (CDP) experience is highly desirable.
  • Familiarity with Open Data Lake components such as Ozone, Iceberg, Spark, Flink, etc.
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes, OpenShift).
  • Experience designing, developing, and maintaining Airflow DAGs and tasks to automate BAU processes, ensuring they are robust, scalable, and efficient.


