Senior Data Science Specialist

3 days ago


Surat, Gujarat, India beBeeSpecialist Full time US$ 1,20,000 - US$ 1,50,000

About the Role

We are seeking a senior expert to lead the development of our Entity Resolution and Network Generation services. This is a hands-on technical leadership role in which you will architect and implement distributed graph computing solutions that process billions of entities and relationships.

The Platform

Our cloud-native platform is built on a microservices architecture orchestrated with Kubernetes. It uses Apache Spark for distributed processing and Elasticsearch for real-time search and fuzzy matching, with Scala as the primary development language, and follows data mesh principles with an API-first design.

  • Entity resolution algorithms
  • Blocking strategies optimized for Spark at scale
  • Fuzzy matching algorithms leveraging Elasticsearch's capabilities
  • ML-enhanced matching with explainable AI for match decisions
  • Incremental resolution supporting real-time and batch modes

Core Responsibilities

Entity Resolution Service

  • Design and implement distributed entity resolution algorithms capable of processing billions of records
  • Build blocking strategies (e.g., LSH, canopy clustering) optimized for Spark at scale
  • Develop fuzzy matching algorithms leveraging Elasticsearch's capabilities
  • Create ML-enhanced matching with explainable AI for match decisions
  • Implement incremental resolution supporting real-time and batch modes
  • Design APIs for entity lookup with sub-100ms latency requirements

Network Generation Service

  • Architect distributed graph generation pipelines using GraphX/GraphFrames
  • Implement graph analytics algorithms (PageRank, community detection, centrality measures)
  • Design storage strategies for multi-billion edge graphs in Parquet/distributed file systems
  • Build temporal graph support for time-evolving networks
  • Create high-performance graph serving APIs with complex query capabilities
  • Optimize graph partitioning to minimize shuffle and maximize locality

AI Model Development

Graph Neural Networks (GNNs)

  • Develop GNN models using PyTorch Geometric or DGL to analyze corporate and transaction networks, detecting fraud rings and risk patterns

Entity Resolution

  • Design algorithms for fuzzy matching, semantic matching (Sentence-BERT), and clustering to unify entities across heterogeneous data sources

Risk Scoring Models

  • Combine rule-based, supervised (XGBoost), and unsupervised (Isolation Forest) methods to generate composite risk scores, optimized for both real-time and large-scale data processing

Explainable AI (XAI)

  • Champion transparency by integrating SHAP, LIME, and GNNExplainer to provide clear, interpretable explanations for model predictions

Required Skills and Qualifications

  • 7+ years of experience in distributed computing and big data systems
  • 5+ years specifically in entity resolution and graph analytics at scale
  • Expert-level Scala programming skills
  • Deep experience with Apache Spark, including custom optimizations
  • Production experience with Elasticsearch for search and matching
  • Proven track record building systems processing billions of entities/edges

Domain Knowledge

  • Strong understanding of blocking algorithms and their trade-offs
  • Experience with probabilistic record linkage and similarity measures
  • Expertise in graph algorithms and their distributed implementations
  • Knowledge of graph storage formats and query optimization
  • Understanding of ML applications in entity resolution

Systems Design

  • Experience designing microservices architectures
  • Track record of building fault-tolerant, scalable systems
  • API design experience with GraphQL or REST
  • Performance optimization and capacity planning expertise

  • Data Science Engineer

    4 weeks ago


    Surat, Gujarat, India SatLeo Labs Full time

    We are seeking a talented Data Science Engineer with strong expertise in satellite data analytics, thermal imaging, and coding to drive innovation in the geospatial domain. This role involves analyzing thermal and optical imagery from both satellites and drones to uncover impactful applications across agriculture, environment, and urban sectors. Key...


  • Surat, Gujarat, India beBeeBioimaging Full time ₹ 12,00,000 - ₹ 20,10,000

    Job Title: Full Remote: Sales Application Specialist in Life Sciences Field. We are seeking an experienced sales application specialist to join our team. As a sales application specialist, you will be responsible for providing sales support for bio and pathology image analysis. You will work closely with customers, collaborators, and internal teams to understand...

  • MLOPS Data Science

    4 days ago


    Surat, Gujarat, India Infogain Full time

    Key Responsibilities: Develop, train, and validate predictive and analytical models using machine learning techniques. Collaborate with data engineers and business teams to define data requirements and success metrics. Deploy machine learning models into production using ML Ops best practices. Build automated pipelines for model training, testing, monitoring,...

  • Senior Data Analyst

    4 weeks ago


    Surat, Gujarat, India Neurones IT Asia Full time

    We are looking for a Senior Data Analyst for one of our clients. Your job scope is as follows: Responsibilities: Leading the data analytics team to collect, clean, and validate data from multiple sources. Managing the development and implementation of data analysis frameworks and methodologies. Overseeing the creation of complex reports and dashboards to present...


  • Surat, Gujarat, India beBeeDataEngineer Full time ₹ 1,20,00,000 - ₹ 2,10,00,000

    Senior Data Engineer - Marketing Specialist. Job Description: Extract attribution data, auction signals, and algorithmic behavior from marketing platforms using APIs and data extraction techniques. Build real-time monitoring datasets to detect anomalies, pacing issues, and creative decay. Develop granular spend and performance datasets with dayparting, marginal...


  • Surat, Gujarat, India beBeeData Full time ₹ 15,00,000 - ₹ 25,00,000

    Key Role: Data Insights Specialist. We are looking for a highly skilled professional to take on the challenge of driving business growth through data-driven insights. The ideal candidate will have: strong expertise in Python programming; familiarity with Generative AI and Rare Item Association Graph concepts; knowledge of Large Language Models. In addition to these...

  • Data Science Expert

    1 hour ago


    Surat, Gujarat, India beBeeDataScientist Full time ₹ 1,00,00,000 - ₹ 2,00,00,000

    Data Scientist Position. As a seasoned data scientist, you will be responsible for developing and implementing data pipelines using Python. Your work will involve creating reports using PowerBI and Microsoft Excel, as well as collaborating with the team on Confluence. This role offers the opportunity to work with cutting-edge technologies such as GenAI and GCP,...


  • Surat, Gujarat, India beBeeData Full time ₹ 80,00,000 - ₹ 1,50,00,000

    Senior Data Architect. We are seeking a highly skilled Senior Data Architect to lead our data architecture and pipeline initiatives. The ideal candidate will be a collaborative team player with excellent communication skills, passionate about making a meaningful impact through data-driven solutions. Key Responsibilities: Design and implement scalable and robust...


  • Surat, Gujarat, India beBeeCoding Full time ₹ 9,26,069 - ₹ 14,36,817

    Job Title: Senior Healthcare Coding Specialist. This is a challenging opportunity for an experienced healthcare coder to join our team in a leadership role. We are seeking a highly skilled and certified coder with expertise in inpatient DRG coding, auditing, and mentoring. The successful candidate will have excellent communication skills, a strong work ethic,...


  • Surat, Gujarat, India beBeeIdentityAccess Full time ₹ 80,00,000 - ₹ 1,60,00,000

    Identity Access Professional. The Identity Access Data Specialist is responsible for managing day-to-day operations for the identity access management function. This includes ensuring native user data is aggregated to Onecert, performing monitoring activities to ensure data quality, and meeting monthly metrics. Key Responsibilities: Manage user data refreshes to...