PySpark Developer
1 week ago
Senior PySpark Developer - Complex XML Data Processing
Key Responsibilities
- Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity.
- Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures.
- Optimize Spark jobs through partitioning, caching, and parallel processing to handle terabytes of XML data efficiently.
- Transform raw hierarchical XML data into structured DataFrames for analytics, machine learning, and reporting use cases.
- Collaborate with data architects and analysts to define data models for nested XML schemas.
- Troubleshoot performance bottlenecks and ensure reliability in distributed environments (e.g., AWS, Databricks, Hadoop).
- Document parsing logic, data lineage, and optimization strategies for maintainability.
Qualifications
- 5+ years of hands-on experience with PySpark and Spark XML libraries (e.g., `spark-xml`) in production environments.
- Proven track record of parsing XML data with 20+ levels of nesting using recursive methods and schema inference.
- Expertise in XPath, XQuery, and DataFrame transformations (e.g., `explode`, `struct`, `selectExpr`) for hierarchical data.
- Strong understanding of Spark optimization techniques: partitioning strategies, broadcast variables, and memory management.
- Experience with distributed computing frameworks (e.g., Hadoop, YARN) and cloud platforms (AWS, Azure, GCP).
- Familiarity with big data file formats (Parquet, Avro) and orchestration tools (Airflow, Luigi).
- Bachelor's degree in Computer Science, Data Engineering, or a related field.
Preferred Skills
- Experience with schema evolution and versioning for nested XML/JSON datasets.
- Knowledge of Scala or Java for extending Spark XML libraries.
- Exposure to Databricks, Delta Lake, or similar platforms.
- Certifications in AWS/Azure big data technologies.
Skills: Python, GCP, big data, DataFrame transformations, Hadoop, XPath, Spark XML libraries, broadcast variables, AWS, PySpark, Apache Spark, XML, partitioning strategies, Azure, big data file formats (Parquet, Avro), schema, Spark, orchestration tools (Airflow, Luigi), XQuery, parsing, memory management
-
PySpark Developer
4 days ago
India Risk Resources LLP Full time
Job Description: Must-Have: - Mandatory skill: PySpark. - 3-6 years of experience in the design and implementation of Big Data pipelines using PySpark, database migration, transformation, and integration solutions for any data warehousing project. - Must have excellent knowledge of Apache Spark and Python programming experience. - Experience in developing...
-
Lead PySpark Developer
4 weeks ago
India AJ Consulting Full time
Position Name: Lead PySpark Developer; Experience: 10 - 16 yrs; Location: Pan India; Budget: 46 LPA max including variables; Notice Period: Immediate to Serving NP (30 Days max remaining). Job Description: Primary Skill: Lead PySpark; Skill: SAS tools. - Lead a team of data engineers in the design and implementation of big data solutions using PySpark. -...
-
Lead PySpark Developer
1 week ago
Anywhere in India/Multiple Locations AJ Consulting Full time
Position Name: Lead PySpark Developer; Experience: 10 - 16 yrs; Location: Pan India; Budget: 46 LPA max including variables; Notice Period: Immediate to Serving NP (30 Days max remaining). Job Description: Primary Skill: Lead PySpark; Skill: SAS tools. - Lead a team of data engineers in the design and implementation of big data solutions using PySpark....
-
PySpark Developer
4 days ago
India DATA ECONOMY PRIVATE LIMITED Full time
Job Overview: DATA ECONOMY PRIVATE LIMITED seeks a skilled PySpark Developer - Data Engineer to lead the development of advanced models using AWS services such as EMR, Glue, and Glue Notebooks. As a key member of our team, you will design, build, and optimize scalable cloud infrastructure solutions with a minimum of 5 years of experience. About the Job: - Develop...
-
PySpark Developer
3 hours ago
India Victrix Systems & Labs Full time
Job Overview: We are looking for an experienced PySpark Developer to join our team. As a key member of our data engineering department, you will be responsible for designing and developing high-performance data pipelines to process complex XML datasets. Main Responsibilities: • Develop and maintain efficient data pipelines using PySpark and Spark XML...
-
Senior PySpark Developer
7 days ago
Anywhere in India/Multiple Locations MWIDM Staffing Services Full time
About Our Company: MWIDM Staffing Services is a leading provider of staffing solutions, specializing in placing top talent in data engineering roles. Job Overview: We are seeking an experienced Senior PySpark Developer to join our team. The ideal candidate will have a strong background in designing, developing, and maintaining efficient and reliable data...
-
PySpark Expert
4 days ago
India Risk Resources LLP Full time
About Us: Risk Resources LLP is a leading provider of risk management services. We are committed to delivering high-quality solutions that meet the evolving needs of our clients. Job Overview: We are seeking a talented PySpark Expert to join our team. The successful candidate will be responsible for designing and implementing Big Data pipelines using PySpark,...
-
Senior PySpark Developer
4 days ago
India Victrix Systems & Labs Full time
Key Responsibilities: - Design and develop scalable PySpark pipelines to ingest, parse, and process XML datasets with extreme hierarchical complexity. - Implement efficient XPath expressions, recursive parsing techniques, and custom schema definitions to extract data from nested XML structures. - Optimize Spark jobs through partitioning, caching, and...
-
PySpark XML Pipeline Developer
4 days ago
India Victrix Systems & Labs Full time
About the Role: Victrix Systems & Labs is looking for a seasoned PySpark Developer to lead the development of our XML pipeline architecture. In this role, you will be responsible for implementing efficient data processing pipelines that can handle terabytes of XML data efficiently. Your expertise in Spark XML libraries and XPath expressions will be crucial in...
-
Data Engineer
4 weeks ago
India Synechron Full time
Greetings, We have an urgent opening for a Data Engineer specializing in PySpark at Synechron in Chennai. We are looking for candidates with 5+ years of relevant experience. Position: Data Engineer - PySpark. Location: Chennai. About Company: At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm...
-
Software Engineer II-PySpark Developer
4 weeks ago
Hyderabad, India Chase Bank Full time
Job Description: You're ready to gain the skills and experience needed to grow within your role and advance your career - and we have the perfect software engineering opportunity for you. As a PySpark Developer Software Engineer II at JPMorgan Chase within the Commercial & Investment Bank Payments Technology team, you are part of an agile team that works to...
-
PySpark Developer
4 days ago
India DATA ECONOMY PRIVATE LIMITED Full time
Key Responsibilities: - Model Development: Lead the development of advanced models using AWS services such as EMR, Glue, and Glue Notebooks. - Cloud Infrastructure: Design, build, and optimize scalable cloud infrastructure solutions with a minimum of 5 years of experience. - ETL Pipeline Development: Create, manage, and optimize ETL pipelines using...
-
AWS PySpark Data Engineer
2 weeks ago
India 9NEXUS Full time
We are seeking an AWS PySpark Data Engineer to design, develop, and optimize data pipelines in a cloud-based environment. The ideal candidate should have expertise in PySpark, AWS services, and big data processing. Key Responsibilities: Develop and optimize ETL/ELT data pipelines using PySpark on AWS. Work with AWS services like Glue, Lambda, S3,...
-
AWS Glue – Python, PySpark, AWS
4 weeks ago
India CIEL HR Full time
Exp: 5 years. Skills: ETL development, AWS Glue, Python, PySpark, SQL, PostgreSQL. Location: Mumbai, Pune, Chennai, Bangalore. Notice: 15 days to Immediate. 1. Experience in analysis, design, and ETL development. 2. Strong hands-on experience with AWS Glue, Python, PySpark, SQL, PostgreSQL. 3. Good knowledge in creating ETL workflows using Python and PySpark with...
-
Lead Databricks/PySpark Developer
4 weeks ago
India Deltaclass Technology Solutions Pvt. Ltd. Full time
Required Skills: Job Summary: Looking for an offshore Lead Databricks/PySpark Developer who is willing to learn new technologies if needed and able to work with a team. This position is long term and will likely be renewed annually. Essential Job Functions: - Design and development of data ingestion pipelines (Databricks background preferred). - Performance tune...
-
Data Engineer
11 hours ago
India Victrix Systems & Labs Full time
Job Description: At Victrix Systems & Labs, we are seeking a highly skilled Data Engineer to join our team. As a key member of our data engineering department, you will be responsible for designing and developing scalable PySpark pipelines to ingest, parse, and process complex XML datasets with extreme hierarchical complexity. Main Responsibilities: • Design...
-
Lead Databricks/PySpark Developer
2 weeks ago
Anywhere in India/Multiple Locations Deltaclass Technology Solutions Pvt. Ltd. Full time
Required Skills: Job Summary: Looking for an offshore Lead Databricks/PySpark Developer who is willing to learn new technologies if needed and able to work with a team. This position is long term and will likely be renewed annually. Essential Job Functions: - Design and development of data ingestion pipelines (Databricks background preferred). - Performance tune...
-
Aexonic - Big Data Engineer - Hadoop/PySpark
5 hours ago
India Aexonic Technologies Private Limited Full time
We are seeking a highly skilled Senior Big Data Engineer with extensive experience in Big Data technologies, including Hadoop, Hive, PySpark, and expertise in Google Cloud Platform (GCP) and BigQuery. The successful candidate will be responsible for designing, building, and maintaining robust data pipelines that support large-scale data processing and...
-
Python Developer
3 days ago
India Data Unveil Full time
Job Description: Position Title: Python Developer. Experience: 2 Years to 3 Years. Location: Hyderabad, Telangana. Hire Type: Full Time, On-site. Start Date: Immediate. Job Summary: We are seeking a Python Developer with 2-3 years of experience, skilled in Python, PySpark, Oracle, MySQL, and basic AWS services. The candidate will work on data processing, ETL pipelines,...
-
Python Developer
1 week ago
India Truelancer Full time
Job Description: Job Title: Python Developer. Years of Experience: 5+ Yrs. Primary Skills: Python, AWS Development, PySpark, Lambdas, CloudWatch (Alerts), SNS, SQS, CloudFormation. Roles & Requirements: - More than 5 years of experience in Python. - More than 3 years of hands-on experience with AWS Development, PySpark, Lambdas, CloudWatch (Alerts), SNS, SQS, and...