Data Scientist

6 days ago


Gandhinagar, India MyRemoteTeam Inc Full time

About Us

MyRemoteTeam, Inc is a fast-growing distributed workforce enabler, helping companies scale with top global talent. We empower businesses by providing world-class software engineers, operations support, and infrastructure to help them grow faster and better.

Position: Senior Data Engineer (Python Coder)
Location: India (Remote)
Work Commitment: 40 Hrs / Week (full-time)
Contract Duration: 3 - 6 Months
Client: Wipro (Google)
BGV: YES
Role: Senior Data Engineer (Python Coder)
Exp: Min. 8 Years

Role Summary

We are looking for a seasoned Senior Data Engineer to architect, build, and own the data pipelines that power our large language model (LLM) development. As a senior Individual Contributor (IC), you will be the team's expert on data ingestion, processing, and quality for all AI training. Your primary mission is to build scalable, automated systems that transform massive, raw datasets into pristine, model-ready formats. While your focus will be on data engineering, your expertise will be valued in collaborating on model training runs and experiments. You're the perfect fit if you are a Python expert who thrives on solving large-scale data challenges and enjoys working at the intersection of data engineering and machine learning.

Key Responsibilities

- Architect & Build: Design, develop, and own robust, scalable, and automated ETL/ELT pipelines in Python for ingesting and processing terabyte-scale text datasets.
- Data Quality: Implement rigorous data cleaning, deduplication, filtering, and normalization strategies. Define and enforce data quality standards to ensure the highest integrity for model training.
- Data Transformation: Efficiently structure and format diverse datasets (JSON, Parquet, etc.) for consumption by LLM training frameworks.
- Collaboration: Work closely with our team of AI researchers and ML engineers to understand data requirements, define metrics, and support the model training lifecycle.
- Optimization: Continuously optimize data processing workflows for speed, cost, and reliability.
- ML Support (Secondary): Occasionally assist in launching, monitoring, and debugging data-related issues during model training runs.

Required Qualifications

- 8+ years of professional experience in data engineering, data processing, or backend software engineering.
- Expert-level proficiency in Python and its data ecosystem (e.g., Pandas, NumPy, Dask, Polars).
- Proven experience building and maintaining large-scale data pipelines.
- Deep understanding of data structures, data modeling, and software engineering best practices (Git, CI/CD, testing).
- Experience handling and parsing diverse data formats (JSON, CSV, XML, Parquet) at scale.
- Excellent problem-solving skills and a meticulous attention to detail.
- Strong communication and collaboration skills, with experience working in a team environment.

Preferred Qualifications (Nice-to-Haves)

- Hands-on experience with the data preprocessing pipeline for an LLM (e.g., LLaMA, BERT, GPT-family).
- Strong experience with big data frameworks like Apache Spark or Ray.
- Experience with Hugging Face libraries (Transformers, Datasets, Tokenizers).
- Familiarity with ML frameworks like PyTorch or TensorFlow.
- Proficiency with cloud platforms (AWS, GCP, Azure) and their data/storage services.


  • Data Scientist

    21 hours ago


    Gandhinagar, India Xtranet Technologies Private Limited Full time

    **Location: Gandhinagar, Gujarat** **Qualification: BE/BTech (CS, IT, Electronics and Telecommunication), MCA** **Total Exp: 5-8 years** - At least 1 year experience as ML/DS Model Developer Expertise - This role leads the technical aspects of the project, including data preparation, model design, training, evaluation, and...

  • Data Scientist

    21 hours ago


    Gandhinagar, India Etech Global Services Full time

    Remote **What We Offer**: - Internet Allowance - Health Insurance - Tuition Reimbursement - Work-Life Balance Initiatives - Rewards & Recognition - Internal movement through IJP **What You’ll Be Doing**: - Design, develop, and optimize ETL processes for large-scale data integration, transformation, and cleaning using SQL, PySpark, and Databricks. -...


  • Gandhinagar, Gujarat, India CEREBULB Full time ₹ 6,00,000 - ₹ 18,00,000 per year

    Position Offered: Sr. Data Scientist Location: Gandhinagar, Gujarat About CereBulb: CereBulb transforms organizations by empowering people with data and streamlining processes with technology in their digital transformation journey. As an industry-leading solution integrator, we specialize in Industrial Internet of Things (IIoT), Industry 4.0, and Smart...

  • Sr. Data Scientist(TL)

    11 hours ago


    Gandhinagar, Gujarat, India CEREBULB Full time ₹ 12,00,000 - ₹ 36,00,000 per year

    Company Description CereBulb is a global industrial digital solutions company dedicated to transforming operations across mining, energy, utilities, and critical infrastructure. By integrating Operational Technology (OT), Information Technology (IT), and Engineering Technology (ET), CereBulb delivers impactful solutions for asset-intensive industries. With...


  • Gandhinagar, India CEREBULB Full time

    Job Description Company Description CereBulb is a global industrial digital solutions company dedicated to transforming operations across mining, energy, utilities, and critical infrastructure. By integrating Operational Technology (OT), Information Technology (IT), and Engineering Technology (ET), CereBulb delivers impactful solutions for asset-intensive...

  • Data Engineer

    2 weeks ago


    Gandhinagar, India LTIMindtree Full time

    Job Description We are seeking a skilled Data Engineer with hands-on experience in Google Cloud Platform (GCP), specifically BigQuery, Dataflow, Airflow, and Python. The ideal candidate will be responsible for developing scalable data pipelines, transforming and ingesting large-scale data, and ensuring data quality and security across workflows. Roles and...

  • Sr Data Engineer

    4 weeks ago


    Gandhinagar, India Mitchell Martin Inc. Full time

    Job Title: Senior Data Engineer Location: Remote in India Employment Type: Full-time About the Role We are looking for a Senior Data Engineer to design, build, and optimize data pipelines and systems that power our analytics, reporting, and data-driven decision-making. The ideal candidate will have hands-on experience with Python, Dagster (or equivalent...


  • Gandhinagar, India Whatjobs IN C2 Full time

    Job Title: Lead Data Engineer – Python & GCP Location: Hyderabad Exp: 10+ Years Overview: We are seeking an experienced Lead Data Engineer with strong expertise in Python and Google Cloud Platform (GCP). You will design, develop, and manage scalable ETL/ELT data pipelines, work closely with clients to understand requirements, and build data solutions...

  • Data Science Intern

    4 weeks ago


    Gandhinagar, India Whatjobs IN C2 Full time

    NLP Data Science Intern Did you notice a shortage of food at supermarkets during COVID? Have you heard about the recent issues in the global shipping industry? Or perhaps you’ve heard about the shortages of microchips? These problems are called supply chain disruptions. They have been increasing in frequency and severity. Supply chain disruptions are...


  • Gandhinagar, India Hireginie Full time

    About Our Client: A leading IT and data solutions provider offering services in consulting, systems integration, data science, IoT, and business process outsourcing. The company enables organizations to enhance efficiency, solve complex challenges, and drive digital transformation. Job Description: Data Science & AI Lead Location: Bengaluru Experience: 7+...