Senior Data Engineer

16 hours ago


Kollam, Kerala, India beBeeDataEngineer Full time US$ 100,000 - US$ 150,000
About the Role

We're building a high-performance data intelligence platform using a scalable architecture. We need a senior expert to lead the development of our Entity Resolution and Network Generation services.

This is a technical leadership role where you'll design and implement distributed graph computing solutions for billions of entities and relationships.

Our cloud-native platform leverages:

  • Microservices architecture with orchestration
  • Apache Spark for distributed processing
  • Elasticsearch for real-time search and fuzzy matching
  • Scala as the primary development language

Data mesh principles with API-first design ensure seamless integration between entity resolution and network generation.

Job Responsibilities
Entity Resolution Service
  • Design and implement distributed entity resolution algorithms capable of processing billions of records
  • Build blocking strategies (e.g. LSH, canopy clustering) optimized for Spark at scale
  • Develop fuzzy matching algorithms leveraging Elasticsearch's capabilities
  • Create ML-enhanced matching with explainable AI for match decisions
  • Implement incremental resolution supporting real-time and batch modes
  • Design APIs for entity lookup with sub-100ms latency requirements
Network Generation Service
  • Architect distributed graph generation pipelines using GraphX/GraphFrames
  • Implement graph analytics algorithms (PageRank, community detection, centrality measures)
  • Design storage strategies for multi-billion edge graphs in Parquet/distributed file systems
  • Build temporal graph support for time-evolving networks
  • Create high-performance graph serving APIs with complex query capabilities
  • Optimize graph partitioning to minimize shuffle and maximize locality
AI Model Development
  • Build Graph Neural Networks (GNNs): Develop GNN models (e.g., GraphSAGE, GATv2) using PyTorch Geometric or DGL to analyze corporate and transaction networks
  • Implement Entity Resolution: Design algorithms for fuzzy matching, semantic matching (Sentence-BERT), and clustering to unify entities across heterogeneous data sources
  • Create Risk Scoring Models: Combine rule-based, supervised (XGBoost), and unsupervised (Isolation Forest) methods to generate composite risk scores
  • Advance Composite AI: Leverage ContexQ's proprietary approach, integrating symbolic AI, vector embeddings, and graph AI for robust entity resolution and network analytics
Explainable AI (XAI)
  • Champion Transparency: Integrate SHAP, LIME, and GNNExplainer to provide clear, interpretable explanations for model predictions
  • Ensure Fairness: Audit models for bias and fairness, embedding ethical principles into every stage of development
Required Qualifications
  • Technical Expertise:
    • 7+ years of experience in distributed computing and big data systems
    • 5+ years specifically in entity resolution and graph analytics at scale
    • Expert-level Scala programming skills
    • Deep experience with Apache Spark, including custom optimizations
  • Domain Knowledge:
    • Strong understanding of blocking algorithms and their trade-offs
    • Experience with probabilistic record linkage and similarity measures
    • Expertise in graph algorithms and their distributed implementations
  • Systems Design:
    • Experience designing microservices architectures
    • Track record of building fault-tolerant, scalable systems
Benefits

  • Competitive compensation package
  • Flexible remote work arrangements
  • Latest hardware and cloud resources for development
  • Long-term incentive plan (LTIP): 75% of base as a bonus payment at the end of the fourth year of service
  • Equity potential of up to USD 150K per year

Interview Process

  • Technical screen focusing on distributed systems and graph algorithms
  • System design session on entity resolution at scale
  • Coding session implementing a graph algorithm in Scala
  • Architecture discussion with the team
  • Final round with leadership


