PySpark Developer

1 week ago


India Victrix Inc. Full time
Job Description

Senior PySpark Developer - Complex XML Data Processing

Key Responsibilities

- Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity.
- Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures.
- Optimize Spark jobs through partitioning, caching, and parallel processing to handle terabytes of XML data efficiently.
- Transform raw hierarchical XML data into structured DataFrames for analytics, machine learning, and reporting use cases.
- Collaborate with data architects and analysts to define data models for nested XML schemas.
- Troubleshoot performance bottlenecks and ensure reliability in distributed environments (e.g., AWS, Databricks, Hadoop).
- Document parsing logic, data lineage, and optimization strategies for maintainability.
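
As a rough illustration of the recursive parsing and XPath-style extraction named in the responsibilities above, here is a minimal sketch that uses only the Python standard library (`xml.etree.ElementTree`). The sample document, the `flatten` helper, and the column names are hypothetical and are not taken from the employer's codebase. In a real pipeline, rows like these would typically be produced at scale by `spark-xml` or handed to Spark to build a DataFrame.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; real inputs would be far larger and more deeply nested.
SAMPLE_XML = """
<orders>
  <order id="1">
    <customer><name>Asha</name></customer>
    <items>
      <item sku="A1"><qty>2</qty></item>
      <item sku="B7"><qty>1</qty></item>
    </items>
  </order>
</orders>
""".strip()


def flatten(element, prefix=""):
    """Recursively walk an element, yielding (path, value) pairs
    for every attribute and every non-empty leaf text node."""
    path = f"{prefix}/{element.tag}"
    for name, value in element.attrib.items():
        yield f"{path}/@{name}", value
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        if text:
            yield path, text
    for child in children:
        yield from flatten(child, path)


root = ET.fromstring(SAMPLE_XML)

# Generic view: one (path, value) pair per attribute or leaf, at any nesting depth.
pairs = list(flatten(root))

# Targeted view: ElementTree's limited XPath subset selects the repeated
# <item> elements, giving one flat row per item (similar in spirit to
# exploding an array column in a DataFrame).
rows = [
    {
        "order_id": order.get("id"),
        "customer": order.findtext("customer/name"),
        "sku": item.get("sku"),
        "qty": int(item.findtext("qty")),
    }
    for order in root.findall("./order")
    for item in order.findall("./items/item")
]
```

The recursive walk handles arbitrary depth without a predefined schema, while the XPath lookups are the better fit when the target columns are known in advance.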

Qualifications

- 5+ years of hands-on experience with PySpark and Spark XML libraries (e.g., `spark-xml`) in production environments.
- Proven track record of parsing XML data with 20+ levels of nesting using recursive methods and schema inference.
- Expertise in XPath, XQuery, and DataFrame transformations (e.g., `explode`, `struct`, `selectExpr`) for hierarchical data.
- Strong understanding of Spark optimization techniques: partitioning strategies, broadcast variables, and memory management.
- Experience with distributed computing frameworks (e.g., Hadoop, YARN) and cloud platforms (AWS, Azure, GCP).
- Familiarity with big data file formats (Parquet, Avro) and orchestration tools (Airflow, Luigi).
- Bachelor's degree in Computer Science, Data Engineering, or a related field.

Preferred Skills

- Experience with schema evolution and versioning for nested XML/JSON datasets.
- Knowledge of Scala or Java for extending Spark XML libraries.
- Exposure to Databricks, Delta Lake, or similar platforms.
- Certifications in AWS/Azure big data technologies.

Skills: python, gcp, big data, dataframe transformations, hadoop, xpath, spark xml libraries, broadcast variables, aws, pyspark, apache spark, xml, partitioning strategies, azure, big data file formats (parquet, avro), schema, spark, orchestration tools (airflow, luigi), xquery, parsing, memory management
  • PySpark Developer

    4 days ago


    India Risk Resources LLP Full time

    Job Description: Must-Have: - Mandatory skill: PySpark. - 3-6 years of experience in the design and implementation of Big Data pipelines using PySpark, database migration, transformation, and integration solutions for any data warehousing project. - Must have excellent knowledge of Apache Spark and Python programming experience. - Experience in developing...


  • India AJ Consulting Full time

    Position Name : Lead PySpark Developer Experience : 10 - 16 yrs Location : Pan India Budget : 46 LPA max including variables Notice Period : Immediate to Serving NP (30 Days max remaining) Job Description : Primary Skill : Lead PySpark Skill : SAS tools - Lead a team of data engineers in the design and implementation of big data solutions using PySpark. -...


  • Anywhere in India/Multiple Locations AJ Consulting Full time

    Position Name : Lead PySpark Developer Experience : 10 - 16 yrs Location : Pan India Budget : 46 LPA max including variables Notice Period : Immediate to Serving NP (30 Days max remaining) Job Description : Primary Skill : Lead PySpark Skill : SAS tools - Lead a team of data engineers in the design and implementation of big data solutions using PySpark....

  • PySpark Developer

    4 days ago


    India DATA ECONOMY PRIVATE LIMITED Full time

    Job Overview: DATA ECONOMY PRIVATE LIMITED seeks a skilled PySpark Developer - Data Engineer to lead the development of advanced models using AWS services such as EMR, Glue, and Glue Notebooks. As a key member of our team, you will design, build, and optimize scalable cloud infrastructure solutions with a minimum of 5 years of experience. About the Job - Develop...

  • PySpark Developer

    3 hours ago


    India Victrix Systems & Labs Full time

    Job Overview: We are looking for an experienced PySpark Developer to join our team. As a key member of our data engineering department, you will be responsible for designing and developing high-performance data pipelines to process complex XML datasets. Main Responsibilities: • Develop and maintain efficient data pipelines using PySpark and Spark XML...


  • Anywhere in India/Multiple Locations MWIDM Staffing Services Full time

    About Our Company: MWIDM Staffing Services is a leading provider of staffing solutions, specializing in placing top talent in data engineering roles. Job Overview: We are seeking an experienced Senior PySpark Developer to join our team. The ideal candidate will have a strong background in designing, developing, and maintaining efficient and reliable data...

  • PySpark Expert

    4 days ago


    India Risk Resources LLP Full time

    About Us: Risk Resources LLP is a leading provider of risk management services. We are committed to delivering high-quality solutions that meet the evolving needs of our clients. Job Overview: We are seeking a talented PySpark Expert to join our team. The successful candidate will be responsible for designing and implementing Big Data pipelines using PySpark,...


  • India Victrix Systems & Labs Full time

    Key Responsibilities : - Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity. - Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures. - Optimize Spark jobs through partitioning, caching, and...


  • India Victrix Systems & Labs Full time

    About the Role: Victrix Systems & Labs is looking for a seasoned PySpark Developer to lead the development of our XML pipeline architecture. In this role, you will be responsible for implementing data processing pipelines that can handle terabytes of XML data efficiently. Your expertise in Spark XML libraries and XPath expressions will be crucial in...

  • Data Engineer

    4 weeks ago


    India Synechron Full time

    Greetings, We have an urgent opening for a Data Engineer specializing in PySpark at Synechron in Chennai. We are looking for candidates with 5+ years of relevant experience. Position: Data Engineer - PySpark Location: Chennai About Company: At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm...


  • Hyderabad, India Chase Bank Full time

    Job Description You're ready to gain the skills and experience needed to grow within your role and advance your career - and we have the perfect software engineering opportunity for you. As a PySpark Developer Software Engineer II at JPMorgan Chase within the Commercial & Investment Bank Payments Technology team, you are part of an agile team that works to...

  • PySpark Developer

    4 days ago


    India DATA ECONOMY PRIVATE LIMITED Full time

    Key Responsibilities : - Model Development : Lead the development of advanced models using AWS services such as EMR, Glue, and Glue Notebooks. - Cloud Infrastructure : Design, build, and optimize scalable cloud infrastructure solutions with a minimum of 5 years of experience. - ETL Pipeline Development : Create, manage, and optimize ETL pipelines using...


  • India 9NEXUS Full time

    We are seeking an AWS PySpark Data Engineer to design, develop, and optimize data pipelines in a cloud-based environment. The ideal candidate should have expertise in PySpark, AWS services, and big data processing. Key Responsibilities: Develop and optimize ETL/ELT data pipelines using PySpark on AWS. Work with AWS services like Glue, Lambda, S3,...


  • India CIEL HR Full time

    Exp - 5 years Skills - ETL development, AWS Glue, Python, PySpark, SQL, PostgreSQL Location - Mumbai, Pune, Chennai, Bangalore Notice - 15 days to Immediate 1. Experience in analysis, design, and ETL development 2. Strong hands-on experience with AWS Glue, Python, PySpark, SQL, PostgreSQL 3. Good knowledge in creating ETL workflows using Python and PySpark with...


  • India Deltaclass Technology Solutions Pvt. Ltd. Full time

    Required Skills: Job Summary: Looking for an offshore Lead Databricks/PySpark Developer who is willing to learn new technologies if needed and able to work with a team. This position is long term and will likely be renewed annually. Essential Job Functions - Design and development of data ingestion pipelines (Databricks background preferred). - Performance tune...

  • Data Engineer

    11 hours ago


    India Victrix Systems & Labs Full time

    Job Description: At Victrix Systems & Labs, we are seeking a highly skilled Data Engineer to join our team. As a key member of our data engineering department, you will be responsible for designing and developing scalable PySpark pipelines to ingest, parse, and process complex XML datasets with extreme hierarchical complexity. Main Responsibilities: • Design...


  • Anywhere in India/Multiple Locations Deltaclass Technology Solutions Pvt. Ltd. Full time

    Required Skills: Job Summary: Looking for an offshore Lead Databricks/PySpark Developer who is willing to learn new technologies if needed and able to work with a team. This position is long term and will likely be renewed annually. Essential Job Functions - Design and development of data ingestion pipelines (Databricks background preferred). - Performance tune...


  • India Aexonic Technologies Private Limited Full time

    We are seeking a highly skilled Senior Big Data Engineer with extensive experience in Big Data technologies, including Hadoop, Hive, PySpark, and expertise in Google Cloud Platform (GCP) and BigQuery. The successful candidate will be responsible for designing, building, and maintaining robust data pipelines that support large-scale data processing and...

  • Python Developer

    3 days ago


    India Data Unveil Full time

    Job Description: Position Title: Python Developer Experience: 2 Years to 3 Years Location: Hyderabad, Telangana Hire Type: Full Time, On-site Start Date: Immediate Job Summary: We are seeking a Python Developer with 2-3 years of experience, skilled in Python, PySpark, Oracle, MySQL, and basic AWS services. The candidate will work on data processing, ETL pipelines,...

  • Python Developer

    1 week ago


    India Truelancer Full time

    Job Description: Job Title: Python Developer Years of Experience: 5+ Yrs Primary Skills: Python, AWS Development, PySpark, Lambdas, CloudWatch (Alerts), SNS, SQS, CloudFormation. Roles & Requirements: - More than 5 years of experience in Python. - More than 3 years of hands-on experience with AWS Development, PySpark, Lambdas, CloudWatch (Alerts), SNS, SQS, and...