
Data Engineering Architect
2 days ago
We're seeking a senior expert in entity resolution and network generation to join our team. As a key member of our data intelligence platform, you'll be responsible for architecting and implementing distributed graph computing solutions that process billions of entities and relationships.
The ideal candidate will have 7+ years of experience in distributed computing and big data systems, with a strong focus on entity resolution and graph analytics at scale. They will also possess expert-level Scala programming skills and deep experience with Apache Spark, including custom optimizations.
About the Role
- Design and implement distributed entity resolution algorithms capable of processing billions of records.
- Build blocking strategies (e.g. LSH, canopy clustering) optimized for Spark at scale.
- Develop fuzzy matching algorithms leveraging Elasticsearch's capabilities.
- Create ML-enhanced matching with explainable AI for match decisions.
- Implement incremental resolution supporting real-time and batch modes.
- Design APIs for entity lookup with sub-100ms latency requirements.
Network Generation Service
- Architect distributed graph generation pipelines using GraphX/GraphFrames.
- Implement graph analytics algorithms (PageRank, community detection, centrality measures).
- Design storage strategies for multi-billion edge graphs in Parquet/distributed file systems.
- Build temporal graph support for time-evolving networks.
- Create high-performance graph serving APIs with complex query capabilities.
- Optimize graph partitioning to minimize shuffle and maximize locality.
AI Model Development
- Build Graph Neural Networks (GNNs): Develop GNN models (e.g., GraphSAGE, GATv2) using PyTorch Geometric or DGL to analyze corporate and transaction networks, detecting fraud rings and risk patterns.
- Implement Entity Resolution: Design algorithms for fuzzy matching, semantic matching (Sentence-BERT), and clustering to unify entities across heterogeneous data sources (e.g., CSVs, APIs, PDFs).
- Create Risk Scoring Models: Combine rule-based, supervised (XGBoost), and unsupervised (Isolation Forest) methods to generate composite risk scores, optimized for both real-time processing and large-scale processing at trillion-scale data volumes.
- Advance Composite AI: Leverage ContexQ's proprietary approach, integrating symbolic AI, vector embeddings, and graph AI for robust entity resolution and network analytics.
Explainable AI (XAI)
- Champion Transparency: Integrate SHAP, LIME, and GNNExplainer to provide clear, interpretable explanations for model predictions, meeting regulatory and ethical standards.
- Ensure Fairness: Audit models for bias and fairness, embedding ethical principles into every stage of development.
Cross-Service Responsibilities
- Ensure seamless integration between entity resolution and network generation.
- Design data lineage tracking across both services.
- Implement comprehensive monitoring and observability.
- Contribute to API design and service contracts.
- Optimize for 10x scale growth.
Required Skills and Qualifications
Technical Expertise
- 7+ years of experience in distributed computing and big data systems.
- 5+ years specifically in entity resolution and graph analytics at scale.
- Expert-level Scala programming skills.
- Deep experience with Apache Spark, including custom optimizations.
- Production experience with Elasticsearch for search and matching.
- Proven track record building systems processing billions of entities/edges.
Domain Knowledge
- Strong understanding of blocking algorithms and their trade-offs.
- Experience with probabilistic record linkage and similarity measures.
- Expertise in graph algorithms and their distributed implementations.
- Knowledge of graph storage formats and query optimization.
- Understanding of ML applications in entity resolution.
- Basic experience with banking compliance (financial crime, fraud).
Systems Design
- Experience designing microservices architectures.
- Track record of building fault-tolerant, scalable systems.
- API design experience with GraphQL or REST.
- Performance optimization and capacity planning expertise.
Preferred Qualifications
- PhD in Computer Science or related field with focus on graphs/entity resolution.
- Contributions to open-source projects (especially Spark, GraphX, Elasticsearch).
- Experience with graph databases (Neo4j, Neptune, JanusGraph) or equivalent.
- Publications or conference talks on entity resolution or graph analytics.
- Experience with real-time stream processing (Kafka, Spark Streaming).
- Knowledge of graph neural networks and embedding techniques.
Technical Environment
Languages: Scala (primary), Python, Java
Big Data: Apache Spark 3.x, Hadoop ecosystem
Search: Elasticsearch 8.x
Orchestration: Kubernetes, Docker
Storage: HDFS/S3/GCS, Parquet
Monitoring: Prometheus, Grafana, Jaeger
CI/CD: Modern DevOps practices
What We're Looking For
- Someone who thinks in distributed systems and can optimize for both latency and throughput.
- A technical leader who can make architectural decisions and implement them.
- Strong communicator who can explain complex graph concepts to stakeholders.
- Self-directed engineer who can own large technical initiatives end-to-end.
- Performance-obsessed developer who benchmarks everything.
Impact You'll Make
- Define the architecture for entity resolution serving multiple business domains.
- Build the graph intelligence layer powering advanced analytics and ML.
- Create systems that will process billions of entities with millisecond latencies.
- Establish best practices for graph computing in our organization.
- Mentor other engineers on distributed graph algorithms.