Big Data with PySpark

5 days ago


Remote, India · Achutha Associates · Full time

**Job Title**: Big Data Engineer (PySpark)
**Location**: Bengaluru, India
**Experience**: 5+ years
**Employment Type**: Full-time

**Key Responsibilities**:

- Design, develop, and maintain scalable **big data pipelines** using **PySpark** and other big data technologies.
- Work with **Hadoop, Spark, Kafka, Hive, and other distributed data processing frameworks**.
- Optimize **ETL workflows** and ensure efficient data processing; a minimal pipeline sketch follows this list.
- Implement **data quality checks, monitoring, and validation** to ensure high data integrity.
- Collaborate with **data scientists, analysts, and business teams** to understand requirements and deliver solutions.
- Optimize **Spark performance** by tuning jobs and implementing best practices for distributed computing.
- Manage and process **structured and unstructured data** from multiple sources.
- Work with **cloud platforms** like AWS, Azure, or GCP for big data storage and processing.
- Troubleshoot and debug **performance issues** related to big data systems.
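
For illustration only, here is a minimal sketch of the kind of PySpark ETL pipeline with a data quality gate that these responsibilities describe. The bucket paths, column names, and the 95% retention threshold are all invented for the example and are not part of this posting.

```python
# Minimal PySpark ETL sketch with a simple data quality gate.
# Paths, columns, and the retention threshold are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw events from a (hypothetical) data lake path.
raw = spark.read.parquet("s3://example-bucket/raw/events/")

# Transform: deduplicate, enforce a valid timestamp, derive a partition key.
clean = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Data quality check: fail fast if too many rows were dropped.
total, kept = raw.count(), clean.count()
if total > 0 and kept / total < 0.95:
    raise ValueError(f"Data quality gate failed: kept {kept} of {total} rows")

# Load: write partitioned output for downstream consumers.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```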

**Required Skills**:

- Strong experience with **PySpark and Spark (RDD, DataFrame, Spark SQL)**; a short sketch follows this list.
- Proficiency in **Hadoop ecosystem** (HDFS, Hive, HBase, Oozie, etc.).
- Experience with **Kafka, Airflow, or other data orchestration tools**.
- Strong **SQL** skills for querying and optimizing data processing.
- Experience with **cloud platforms** (AWS Glue, EMR, Azure Databricks, GCP BigQuery, etc.).
- Proficiency in **Python and Scala** for big data processing.
- Knowledge of **data lake and data warehouse concepts**.
- Experience in **CI/CD pipelines for data engineering** is a plus.
- Strong problem-solving skills and the ability to work in an **agile environment**.
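
To make the first skill bullet concrete, the sketch below shows one aggregation written both with the DataFrame API and with Spark SQL. The sample data, table name, and columns are invented for the example.

```python
# One aggregation, two idioms: DataFrame API vs. Spark SQL.
# The sample data and all names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skills-demo").getOrCreate()

orders = spark.createDataFrame(
    [("IN", 120.0), ("IN", 80.0), ("US", 200.0)],
    ["country", "amount"],
)

# DataFrame API version.
df_result = orders.groupBy("country").agg(F.sum("amount").alias("revenue"))

# Spark SQL version over the same data, via a temporary view.
orders.createOrReplaceTempView("orders")
sql_result = spark.sql(
    "SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country"
)

df_result.show()
sql_result.show()
```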

Pay: ₹50,000.00 - ₹100,000.00 per month

Schedule:

- Day shift

**Experience**:

- Big data with PySpark: 6 years (required)

Work Location: Remote


  • Big Data Engineer

    1 week ago


    Remote, India · beBee Careers · Full time

    Senior Big Data Engineer - PySpark. About the Role: We are looking for a seasoned Senior Big Data Engineer with expertise in PySpark to lead the development of our big data pipelines. You will be responsible for architecting and implementing scalable solutions to process large volumes of XML data. Responsibilities: Design and implement optimized PySpark pipelines...


  • PySpark Developer Lead

    Remote, India · beBee Careers · Full time

    PySpark Developer Lead - Big Data Solutions. About the Position: We are seeking an experienced PySpark Developer Lead to spearhead the development of our big data solutions. You will be responsible for leading a team of engineers to design and implement scalable PySpark pipelines to process large volumes of XML data. Key Responsibilities: Lead a team of engineers...


  • PySpark Developer

    Remote, India · beBee Careers · Full time

    PySpark Developer. Job Summary: We're looking for an experienced PySpark developer to join our team. The ideal candidate will have extensive knowledge of PySpark and Spark XML libraries, as well as experience with distributed computing frameworks and cloud platforms. As a PySpark developer, you will be responsible for designing and developing scalable pipelines...

  • Big Data Trainer

    6 days ago


    Remote, India · REGex Software Services · Full time

    Required: a Big Data Trainer who is an expert in the following topics: Python; introduction to the Linux operating system and basic Linux commands; Hadoop (HDFS); Hadoop 2.0 & YARN; Sqoop; Hive programming; PySpark; ETL. **Job Types**: Part-time, Contractual / Temporary, Freelance. Contract length: 6-8 weeks. Part-time hours: 10-12 per week. **Salary**: ₹500.00 -...


  • Data Engineer

    4 weeks ago


    Remote, India · Avensys Consulting PVT LTD · Full time

    Location: India (Remote). Experience: 8+ years. Type: Contract (12 months, extendable). Shift: India (9 am to 6 pm IST). Key Responsibilities: - Design, develop, and maintain end-to-end data pipelines using Azure Databricks, Azure Data Factory, and Azure Synapse. - Implement big data processing frameworks using PySpark for scalable data transformation. - Manage...


  • Senior PySpark Developer

    Remote, India · beBee Careers · Full time

    Senior PySpark Developer. Job Description: We are seeking an experienced Senior PySpark Developer to join our team. As a Senior PySpark Developer, you will be responsible for designing and developing scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity. You will work closely with data architects and analysts...

  • Big Data Specialist

    1 week ago


    Remote, India · beBee Careers · Full time

    Key Responsibilities: Design, develop, and maintain end-to-end data pipelines using Azure Databricks, Azure Data Factory, and Azure Synapse. Implement big data processing frameworks using PySpark for scalable data transformation. Manage and optimize Azure Data Lake storage, ensuring efficient data ingestion, retrieval, and governance. Develop ETL/ELT solutions...
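
    For illustration only: a minimal sketch of the Databricks/Data Lake pattern this snippet describes. The storage account, paths, columns, and thresholds are hypothetical, not this employer's actual setup.

    ```python
    # Hypothetical Azure Data Lake -> Delta sketch of the pattern above.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks, `spark` is predefined; getOrCreate() reuses it there.
    spark = SparkSession.builder.appName("adls-elt-sketch").getOrCreate()

    # Ingest raw files from a (hypothetical) Azure Data Lake container.
    raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/sales/")

    # Transform: type the timestamp, derive a partition column, deduplicate.
    curated = (
        raw.withColumn("sale_ts", F.to_timestamp("sale_ts"))
           .withColumn("sale_date", F.to_date("sale_ts"))
           .dropDuplicates(["sale_id"])
    )

    # Load into a partitioned Delta table for downstream ELT.
    (
        curated.write.format("delta")
        .mode("append")
        .partitionBy("sale_date")
        .save("abfss://curated@examplelake.dfs.core.windows.net/sales/")
    )
    ```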


  • Senior PySpark Developer

    Remote, India · beBee Careers · Full time

    Senior PySpark Developer. Job Summary: We are seeking a highly skilled Senior PySpark Developer to join our team. As a senior member of our big data engineering team, you will be responsible for designing and developing scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity. Key Responsibilities: Design and...


  • Remote, India · Victrix Systems & Labs · Full time

    Key Responsibilities: - Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity. - Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures. - Optimize Spark jobs through partitioning, caching, and...
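
    For illustration only: a hedged sketch of the nested-XML ingestion that several of these listings describe, using the open-source spark-xml package. The file path, rowTag, attribute, and field names are hypothetical.

    ```python
    # Hypothetical nested-XML ingestion with the spark-xml package.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.appName("xml-sketch")
        # Assumes spark-xml is on the classpath, e.g. started with
        # spark-submit --packages com.databricks:spark-xml_2.12:0.17.0
        .getOrCreate()
    )

    # Each <record> element becomes one row; nested elements become
    # structs and arrays (an "_"-prefixed column is an XML attribute).
    records = (
        spark.read.format("xml")
        .option("rowTag", "record")
        .load("s3://example-bucket/raw/records.xml")
    )

    # Flatten one level of (hypothetical) nesting: explode a repeated child.
    flat = records.select(
        F.col("_id").alias("record_id"),
        F.explode_outer("items.item").alias("item"),
    ).select("record_id", "item.name", "item.value")

    # Repartition and cache before repeated downstream use, as the
    # listings' tuning bullets suggest.
    flat = flat.repartition("record_id").cache()
    flat.show(truncate=False)
    ```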