Synthetic Data Engineer for Scalable Enterprise Solutions
7 days ago
At Betterdata, we are seeking a highly skilled Synthetic Data Engineer to join our team. This is an exciting opportunity to work on scalable enterprise solutions and transform academic research into production-ready systems.
Company Overview:
Betterdata is a fast-paced startup dedicated to developing innovative synthetic data solutions for enterprises. We are passionate about empowering organizations to make data-driven decisions with confidence.
Estimated Salary: $140,000 - $180,000 per year
Job Description:
We are looking for a Senior Data & Machine Learning Engineer with hands-on experience in transforming academic research into scalable, production-ready solutions for synthetic tabular data generation. The ideal candidate has extensive experience scaling systems to handle datasets with hundreds of millions to billions of records and can build and optimize complex data pipelines for enterprise applications.
This role requires someone familiar with the dynamic nature of a startup, capable of rapidly designing and implementing scalable solutions. You'll work closely with research teams to optimize performance and ensure seamless integration of systems, handling data from financial institutions, government agencies, consumer brands, and internet companies.
Key Responsibilities:
- Design scalable data pipelines for batch processing, deciding between distributed computing tools like Spark, Dask, or Ray when handling extremely large datasets across multiple nodes, and single-node tools like Polars and DuckDB for more lightweight, efficient operations.
- Leverage Polars for high-speed, in-memory data manipulation when a large dataset can be processed efficiently on a single node.
- Utilize DuckDB for on-disk query execution, offering SQL operations with minimal overhead, suitable for environments that need a balance between memory use and query performance.
- Seamlessly transform Pandas-based research code into production-ready pipelines, ensuring efficient memory usage and fast data access without adding unnecessary complexity.
- Work with internal data representations such as Parquet, Arrow, and CSV to support the needs of our generative models, choosing the appropriate format based on data processing and performance needs.
- Ensure that the system can scale efficiently from a single node to multiple nodes, providing graceful scaling for users with varying compute capacities.
- Optimize SQL-based queries for performance and scalability in enterprise SQL environments, ensuring efficient querying across large datasets.
- Utilize GPU acceleration and parallel processing to improve performance in large-scale model training and data processing.
- Implement basic data lineage for auditability, ensuring traceability in data transformations when required.
- Manage metadata as needed to document pipelines and workflows.
- Design robust error handling mechanisms, with automatic retries and data recovery in case of pipeline failures.
- Track performance metrics such as data throughput, latency, and processing times to ensure efficient pipeline operations at scale.
- Create clear documentation of data pipelines, workflows, and system architectures to enable smooth handovers and collaboration across teams.
Required Skills and Qualifications:
- Hands-on experience scaling data pipelines and machine learning systems to handle hundreds of millions to billions of rows in enterprise environments.
- 4+ years of experience building scalable data solutions with Python and libraries such as Pandas, NumPy, and Scikit-learn.
- Ability to choose the right framework (e.g., Dask, Ray, Polars, DuckDB) depending on the workload and environment, with a focus on balancing simplicity and scalability.
- Experience in data validation and ensuring data quality with tools like Pandera or Pydantic.
- Proficiency in building ETL/ELT pipelines and managing data across relational databases, data warehouses, and cloud storage.
- Strong knowledge of GPU parallelization for deep learning models using PyTorch.
Benefits:
- Competitive compensation package.
- Equity options.
- Opportunity to directly impact the future of synthetic data for enterprises.
Why Join Us:
This is a unique opportunity for someone looking to actively build and scale systems in a fast-moving startup. If you've successfully scaled machine learning and data systems to billions of rows and thrive in a dynamic, hands-on environment, this role is for you.
-
Anywhere in India/Multiple Locations/Bangladesh/Pakistan Betterdata Full time
Who Are We Looking For: We are seeking a Senior Data & Machine Learning Engineer with hands-on experience to transform academic research into scalable, production-ready solutions for synthetic tabular data generation. This is an individual contributor (IC) role suited for someone who thrives in a fast-paced, early-stage startup environment. The ideal...
-
Senior Enterprise Data Engineer
17 hours ago
Anywhere in India/Multiple Locations Kloud9 Full time
Job Title: Senior Enterprise Data Engineer
About Us: Kloud9 is a cutting-edge technology firm that empowers businesses to unlock their full potential through innovative data solutions. We are seeking an experienced Senior Enterprise Data Engineer to join our team and contribute to the design, development, and implementation of large-scale enterprise data...
-
Senior Data Engineering Expert
1 month ago
Anywhere in India/Multiple Locations Crescendo Global Full time
Crescendo Global is Hiring a Senior Data Engineer
We are seeking an experienced Senior Data Engineer to join our team at Crescendo Global, a leading technology firm in India. As a key member of our data engineering team, you will be responsible for designing and developing scalable data pipelines using Python, PySpark, and AWS services.
About the Role: This is a...
-
Anywhere in India/Multiple Locations Paralleldots Full time
About the Role: As a Senior Data Engineer at ParallelDots, you will play a pivotal role in designing and developing cutting-edge data solutions. Collaborating closely with data scientists, analysts, and stakeholders, you will ensure our data infrastructure meets the highest standards of scalability, reliability, and performance.
Key Responsibilities: Lead the...
-
Senior Cloud Data Engineer
3 weeks ago
Anywhere in India/Multiple Locations AppSierra Solutions Pvt Ltd Full time
About AppSierra Solutions: AppSierra Solutions Pvt Ltd is a cutting-edge technology firm that specializes in delivering innovative data engineering solutions. We are seeking an experienced AWS Data Engineer to join our team, utilizing their expertise in designing, building, and maintaining scalable data pipelines.
-
Engineering Director for Scalable Solutions
7 days ago
Bangalore/Anywhere in India/Multiple Locations Squareroot Consulting Pvt Ltd. Full time
We are a B2B SaaS-based product startup company serving customers in North America, Europe, and India. We are expanding our engineering team in Bangalore, India.
Position Overview: The Director Engineering will lead the engineering team and oversee the development of software products, focusing on scalability, flexibility, and robustness.
Key...
-
Anywhere in India/Multiple Locations Kloud9 Full time
We are seeking an experienced Cloud Data Architect to join Kloud9's team. This role will involve designing and implementing scalable, high-performance data lake solutions using Google Cloud Platform (GCP) technologies.
About the Role: This is a highly specialized position that requires deep expertise in GCP, Apache Spark, and SAP S/4HANA data ingestion. The...
-
Enterprise Data Solutions Architect
1 month ago
Bangalore/Anywhere in India/Multiple Locations/Chennai Mazenet solution Full time
About Mazenet Solutions: Mazenet Solutions is a workforce development leader with 23+ years of experience delivering impactful Staffing & Corporate Training solutions. Our strategic approach combines innovative thinking, cutting-edge technologies, and expert capabilities to cultivate a future-ready workforce.
We empower over 300 clients, including Fortune 500...
-
Enterprise Documentum Architect
3 weeks ago
Anywhere in India/Multiple Locations iXceed Solutions Full time
Job Description: As an Enterprise Documentum Architect at iXceed Solutions, you will play a critical role in leading the maintenance and enhancement of all current and future components owned by the POD to extend the Documentum platform within the bank. With a focus on delivering high-quality solutions, you will work closely with cross-functional teams to...
-
Senior Java Developer
3 weeks ago
Anywhere in India/Multiple Locations Appzlogic Mobility solutions Pvt. Ltd., Noida Full time
Company Overview: Appzlogic Mobility Solutions is a leading enterprise mobile app development company. We deliver highly efficient, secure, and scalable enterprise apps to global audiences.
-
Senior Web Developer
7 days ago
Anywhere in India/Multiple Locations MNR Solutions Full time
Company Overview: MNR Solutions is a dynamic and innovative company that delivers cutting-edge solutions to clients worldwide.
Salary: $120,000 - $150,000 per annum
Job Description: We are seeking a highly skilled PHP/Laravel developer to join our team of experts in building scalable web applications. The successful candidate will work closely with cross-functional...
-
Enterprise Solutions Architect
7 days ago
Anywhere in India/Multiple Locations MNR Solutions Full time
About MNR Solutions: We are a dynamic organization seeking an experienced Enterprise Solutions Architect to join our team.
Salary: $120,000 - $180,000 per annum, depending on experience and qualifications.
Job Description: The successful candidate will have 6 to 8 years of experience in SharePoint Online and Power Platform. In this role, you will be...
-
AppSierra Solutions
1 month ago
Anywhere in India/Multiple Locations AppSierra Solutions Pvt Ltd Full time
Description: We are seeking an experienced AWS Data Engineer proficient in AWS technologies, PySpark, Glue, S3, and Terraform to join our innovative team. As an integral part of our data engineering group, you will be responsible for designing, building, and maintaining scalable data pipelines that facilitate seamless data extraction, transformation, and...
-
Cloud Expert for Scalable Infrastructure
3 weeks ago
Anywhere in India/Multiple Locations Pace Wisdom Solutions Ltd Full time
About the Role: We are seeking an experienced Cloud Expert to lead our team in designing and delivering scalable infrastructure solutions using AWS. This role requires a strong background in developing and deploying enterprise software applications using public cloud providers like AWS.
-
Data Engineering Lead
4 weeks ago
Anywhere in India/Multiple Locations RAPINNO TECH SOLUTIONS PRIVATE LIMITED Full time
Unlock Your Potential as a Data Engineering Lead at RAPINNO TECH SOLUTIONS PRIVATE LIMITED
RAPINNO TECH SOLUTIONS PRIVATE LIMITED is seeking an exceptional Data Engineering Lead to spearhead the development of our data-driven initiatives. As a key member of our team, you will be responsible for designing and implementing robust data architectures that drive...
-
Anywhere in India/Multiple Locations Kloud9 Full time
We are seeking a highly skilled Data Engineering Technical Architect with deep expertise in Google Cloud Platform (GCP), Apache Spark, and SAP S/4HANA (data ingestion) to architect, design, and implement scalable, high-performance data lake solutions. The ideal candidate will have extensive experience in building data ingestion pipelines, managing big data...
-
Artificial Intelligence Engineer
3 days ago
India Data-Hat AI Full time
We are seeking a highly skilled Artificial Intelligence Engineer to join our team at Data-Hat AI. As an AI Engineer, you will play a crucial role in designing and developing scalable AI solutions that drive business growth and improve decision-making.
About the Role: This is a fantastic opportunity to work with cutting-edge technology and contribute to the...
-
Data Engineer
3 weeks ago
Anywhere in India/Multiple Locations Risk Resources LLP Full time
Join Risk Resources LLP as a highly skilled Data Engineer specializing in designing, developing, and maintaining scalable data solutions on cloud platforms.
About the Role: We are seeking a seasoned professional with expertise in Snowflake, Azure, and AWS technologies to lead our data engineering efforts. As a key member of our team, you will be responsible for...
-
Cloud Data Solutions Developer
3 weeks ago
Anywhere in India/Multiple Locations IT Source Global Full time
We are seeking a talented Cloud Data Solutions Developer to join our team at IT Source Global.
Job Description: Design, develop, and maintain scalable data processing solutions using Azure Databricks and Azure Data Factory. This includes building and optimizing end-to-end data pipelines for batch and real-time data ingestion, transformation, and...
-
Data Science Innovator for iXceed Solutions
1 month ago
Anywhere in India/Multiple Locations iXceed Solutions Full time
Company Overview: iXceed Solutions is a pioneering company at the forefront of data science and machine learning. We strive to deliver cutting-edge solutions that drive business growth and innovation.
Salary: $150,000 - $200,000 per annum, depending on experience.
Job Description: We are seeking a highly skilled Data Science Specialist to lead our team in...