Dataproc Lead, Spark, OSS Technologies, Google Cloud

1 day ago


Bengaluru, Karnataka, India Google Inc Full time
Job Description

Minimum qualifications:

- Bachelor's degree or equivalent practical experience.
- 5 years of experience with software development in one or more programming languages, and with data structures/algorithms.
- Experience in software development and engineering, incorporating design methodologies, leveraging open source technologies, and working with distributed computing systems, including Apache Spark, Apache Hadoop, and Apache Hive.
- Experience in Open Source technologies, Big Data, Data Analytics, Artificial Intelligence, Machine Learning, and Database Internals.

Preferred qualifications:

- Experience with database optimizations such as query and executor optimizations.
- Experience with data lakes like Apache Iceberg, Apache Hudi, Delta Lake, etc.
- Experience with OpenTelemetry, JMX, and other monitoring solutions.
- Experience with OSS projects like Spark, Hive, Trino, Ray, Flink, etc.
- Experience working with data science tools such as Jupyter notebooks.
- Experience developing Cloud or SaaS products.

About the job

Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design, and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google Cloud's needs, with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. You will anticipate our customers' needs and be empowered to act like an owner, take action, and innovate. We need our engineers to be versatile, display leadership qualities, and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

Cloud Dataproc enables open source data analytics users (Apache Hadoop, Spark, Trino, Flink, etc.) to lift and modernize their workloads into the cloud. Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark, Apache Hadoop, and dozens of other OSS tools in a simpler, more performant, and cost-efficient way. Dataproc also integrates easily with other Google Cloud Platform (GCP) services such as BigQuery, Dataplex (governance, lineage), and catalog stores to provide a powerful and complete platform for data processing, analytics, and machine learning.

Google Cloud accelerates every organization's ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google's cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities

- Build high-impact customer-facing features which make Cloud Dataproc the best place to run Spark, Ray, Trino, Flink and newer technologies in the cloud.
- Define the roadmap for Open Source technologies like Spark, Ray, Trino, Flink, etc.
- Define and implement the next generation Data Lakes and Lake Houses focusing on technologies like Iceberg, Hudi and Delta.
- Optimize the open source technologies for performance and efficiency.
- Design and build a software stack that takes advantage of Google technologies for faster cluster setup, efficient cluster operations, and comprehensive monitoring and observability.
