
Data Architect for Entity Resolution and Graph Analytics
1 week ago
We are seeking a technical leader to head up our Entity Resolution and Network Generation services.
The Platform
Our cloud-native platform leverages:
- Microservices architecture with Kubernetes orchestration
- Apache Spark for distributed processing
- Elasticsearch for real-time search and fuzzy matching
- Scala as the primary development language
- Data mesh principles with API-first design
Core Responsibilities
Entity Resolution Service
- Design and implement distributed entity resolution algorithms capable of processing billions of records
- Build blocking strategies (e.g. LSH, canopy clustering) optimized for Spark at scale
- Develop fuzzy matching algorithms leveraging Elasticsearch's capabilities
- Create ML-enhanced matching with explainable AI for match decisions
- Implement incremental resolution supporting real-time and batch modes
- Design APIs for entity lookup with sub-100ms latency requirements
Network Generation Service
- Architect distributed graph generation pipelines using GraphX/GraphFrames
- Implement graph analytics algorithms (PageRank, community detection, centrality measures)
- Design storage strategies for multi-billion edge graphs in Parquet/distributed file systems
- Build temporal graph support for time-evolving networks
- Create high-performance graph serving APIs with complex query capabilities
- Optimize graph partitioning to minimize shuffle and maximize locality
- Build Graph Neural Networks (GNNs): Develop GNN models (e.g. GraphSAGE, GATv2) using PyTorch Geometric or DGL to analyze corporate and transaction networks, detecting fraud rings and risk patterns.
- Implement Entity Resolution: Design algorithms for fuzzy matching, semantic matching (Sentence-BERT), and clustering to unify entities across heterogeneous data sources (e.g. CSVs, APIs, PDFs).
- Create Risk Scoring Models: Combine rule-based, supervised (XGBoost), and unsupervised (Isolation Forest) methods to generate composite risk scores, optimized for both real-time scoring and large-scale processing of data volumes in the trillions.
- Advance Composite AI: Leverage ContexQ's proprietary approach, integrating symbolic AI, vector embeddings, and graph AI for robust entity resolution and network analytics.
- Champion Transparency: Integrate SHAP, LIME, and GNNExplainer to provide clear, interpretable explanations for model predictions, meeting regulatory and ethical standards.
- Ensure Fairness: Audit models for bias and fairness, embedding ethical principles into every stage of development.
- Ensure seamless integration between entity resolution and network generation
- Design data lineage tracking across both services
- Implement comprehensive monitoring and observability
- Contribute to API design and service contracts
- Optimize for 10x scale growth
Required Skills and Qualifications
Technical Expertise
- 7+ years of experience in distributed computing and big data systems
- 5+ years specifically in entity resolution and graph analytics at scale
- Expert-level Scala programming skills
- Deep experience with Apache Spark, including custom optimizations
- Production experience with Elasticsearch for search and matching
- Proven track record building systems processing billions of entities/edges
- Strong understanding of blocking algorithms and their trade-offs
- Experience with probabilistic record linkage and similarity measures
- Expertise in graph algorithms and their distributed implementations
- Knowledge of graph storage formats and query optimization
- Understanding of ML applications in entity resolution
- Basic experience with banking compliance (FinCrime, fraud)
- Experience designing microservices architectures
- Track record of building fault-tolerant, scalable systems
- API design experience with GraphQL or REST
- Performance optimization and capacity planning expertise
Preferred Qualifications
- PhD in Computer Science or related field with focus on graphs/entity resolution
- Contributions to open-source projects (especially Spark, GraphX, Elasticsearch)
- Experience with graph databases (Neo4j, Neptune, JanusGraph) or equivalent
- Publications or conference talks on entity resolution or graph analytics
- Experience with real-time stream processing (Kafka, Spark Streaming)
- Knowledge of graph neural networks and embedding techniques
Technical Environment
- Languages: Scala (primary), Python, Java
- Big Data: Apache Spark 3.x, Hadoop ecosystem
- Search: Elasticsearch 8.x
- Orchestration: Kubernetes, Docker
- Storage: HDFS/S3/GCS, Parquet
- Monitoring: Prometheus, Grafana, Jaeger
- CI/CD: Modern DevOps practices
We're looking for:
- Someone who thinks in distributed systems and can optimize for both latency and throughput
- A technical leader who can make architectural decisions and implement them
- A strong communicator who can explain complex graph concepts to stakeholders
- A self-directed engineer who can own large technical initiatives end-to-end
- A performance-obsessed developer who benchmarks everything
Impact You'll Make
- Define the architecture for entity resolution serving multiple business domains
- Build the graph intelligence layer powering advanced analytics and ML
- Create systems that will process billions of entities with millisecond latencies
- Establish best practices for graph computing in our organization
- Mentor other engineers on distributed graph algorithms