Site Reliability Engineer
1 month ago
Position: Site Reliability Engineer (SRE) - DataPlatform OnPremise
Job Overview:
As a Site Reliability Engineer (SRE) specializing in DataPlatform OnPremise, you will play a critical role in deploying our Cloudera Data Platform (CDP) infrastructure and ensuring its reliability, scalability, and performance. You will collaborate closely with cross-functional teams to design, implement, and maintain robust systems that support our data-driven initiatives. The ideal candidate will have a deep understanding of the data platform, strong troubleshooting skills, and a proactive mindset towards automation and optimization. You will play a pivotal role in ensuring the smooth functioning, operation, performance, and security of large, high-density Cloudera-based infrastructure.
Key Responsibilities:
- Work on tasks related to the implementation of Cloudera Data Platform on-premises, and take part in planning, installation, configuration, and integration with existing systems.
- Infrastructure Management: Manage and maintain the Cloudera-based infrastructure, ensuring optimal performance, high availability, and scalability. This includes monitoring system health and performing routine maintenance tasks.
- Strong troubleshooting skills and operational expertise in areas such as system capacity, bottlenecks, memory, CPU, OS, storage, and networking.
- Create runbooks and automate them using scripting tools such as shell scripts and Python.
- Working knowledge of configuration management tools such as Terraform, Ansible, or Salt.
- Data Security and Compliance: Implement and enforce security best practices to safeguard data integrity and confidentiality within the Cloudera environment. Ensure compliance with relevant regulations and standards (e.g., GDPR, HIPAA, DPR).
- Performance Optimization: Continuously optimize the Cloudera infrastructure to enhance performance, efficiency, and cost-effectiveness. Identify and resolve bottlenecks, tune configurations, and implement best practices for resource utilization.
- Capacity Planning: Plan capacity and tune the performance of Hadoop clusters. Monitor resource utilization trends and plan for future capacity needs. Proactively identify potential capacity constraints and propose solutions to address them.
- Collaborate effectively with infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
- Work closely with teams to optimize the overall performance of the PhonePe Hadoop ecosystem.
- Backup and Disaster Recovery: Implement robust backup and disaster recovery strategies to ensure data protection and business continuity. Test and maintain backup and recovery procedures regularly.
- Develop tools and services to enhance debuggability and supportability.
- Patches & Upgrades: Routinely apply recommended patches and perform rolling upgrades of the platform in accordance with the advisory from Cloudera, InfoSec and Compliance.
- Documentation and Knowledge Sharing: Create comprehensive documentation for configurations, processes, and procedures related to the Cloudera Data Platform. Share knowledge and best practices with team members to foster continuous learning and improvement.
- Collaboration and Communication: Collaborate effectively with cross-functional teams including data engineers, developers, and IT operations personnel. Communicate project status, issues, and resolutions clearly and promptly.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or related field.
- Proficiency in Linux system administration, shell scripting, and networking concepts, including iptables and IPsec.
- Strong understanding of networking, open-source technologies, and tools.
- 3-5 years of experience in the design, set up, and management of large-scale Hadoop clusters, ensuring high availability, fault tolerance, and performance optimization.
- Strong understanding of distributed computing principles and experience with Hadoop ecosystem technologies (HDFS, MapReduce, YARN, Hive, Spark, etc.).
- Experience with Kerberos and LDAP.
- Strong knowledge of databases such as MySQL, NoSQL stores, and SQL Server.
- Hands-on experience with configuration management tools (e.g., Salt, Ansible, Puppet, Chef).
- Strong scripting skills (e.g., Perl, Python, Bash) for automation and troubleshooting.
- Experience with monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack).
- Knowledge of networking principles and protocols (TCP/IP, UDP, DNS, DHCP, etc.).
- Experience managing *nix-based machines (e.g., Ubuntu, Fedora, Red Hat) and strong working knowledge of essential Unix programs and tools.
- Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
- Excellent analytical, problem-solving, and troubleshooting skills.
- Proven ability to work well under pressure and manage multiple priorities simultaneously.
Good To Have:
- Cloudera Certified Administrator (CCA) or Cloudera Certified Professional (CCP) certification preferred.
- Minimum 2 years of experience managing and administering medium/large Hadoop-based environments (>100 machines); Cloudera Data Platform (CDP) experience is highly desirable.
- Familiarity with Open Data Lake components such as Ozone, Iceberg, Spark, Flink, etc.
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes, OpenShift) is a plus.
- Design, develop, and maintain Airflow DAGs and tasks to automate BAU processes, ensuring they are robust, scalable, and efficient.