Data Engineer
3 weeks ago
What You'll Build

Core Responsibilities

Data Architecture & Infrastructure (40%)
● Design and implement a multi-database architecture (MongoDB, Redis, Milvus, Neo4j, BigQuery)
● Build scalable data pipelines for real-time conversation processing and personalization
● Architect ETL/ELT workflows for data migration from legacy systems
● Implement data partitioning, sharding, and optimization strategies for high-throughput systems
● Create data governance frameworks ensuring quality, security, and compliance

Vector & Graph Database Systems (25%)
● Design and optimize Milvus vector collections for semantic search (1024-dim embeddings)
● Build graph schemas in Neo4j for customer journey mapping and persona relationships
● Implement HNSW indexing strategies and similarity search optimization
● Create hybrid search systems combining vector, full-text, and graph queries
● Monitor and tune database performance (query latency, throughput, resource utilization)

ML Data Infrastructure (20%)
● Build data collection pipelines for LLM fine-tuning (conversation logs, tool executions)
● Create feature stores for GNN training (customer interactions, engagement signals)
● Implement data versioning and lineage tracking for ML experiments
● Design A/B testing data infrastructure with CUPED variance reduction
● Build real-time feature computation pipelines for contextual bandits

Analytics & Monitoring (15%)
● Design BigQuery schemas for marketing analytics and performance tracking
● Create materialized views and aggregation pipelines for real-time dashboards
● Implement data quality monitoring and anomaly detection
● Build observability infrastructure (Prometheus metrics, Grafana dashboards)
● Develop cost optimization strategies for cloud data warehousing

Technical Stack You'll Work With

Databases & Storage
● MongoDB (conversation state, active sessions)
● Redis (caching, rate limiting, real-time data)
● Milvus (vector embeddings, semantic search)
● Neo4j (customer journey graphs, persona networks)
● BigQuery (analytics warehouse, historical data)

Data Processing & Orchestration
● Apache Airflow or Prefect (workflow orchestration)
● Pandas, Polars (data transformation)
● Apache Spark (optional - for large-scale processing)
● dbt (data transformation and modeling)

ML/AI Data Pipeline
● vLLM (LLM inference serving)
● MLflow (model registry, experiment tracking)
● Sentence Transformers (embedding generation)
● PyTorch, TensorFlow (ML model training)

Cloud & Infrastructure
● Google Cloud Platform (BigQuery, Cloud Storage, Compute)
● Docker & Kubernetes (containerization, orchestration)
● Terraform (infrastructure as code)
● GitHub Actions or GitLab CI (CI/CD pipelines)

Programming & Tools
● Python 3.10+ (primary language)
● SQL (complex queries, query optimization)
● Shell scripting (Bash/Zsh)
● Git (version control)

Requirements

Must-Have Skills
● 5+ years of data engineering experience with production systems
● Expert-level SQL and database design skills
● Strong Python programming (async/await, type hints, testing)
● Experience with at least 3 different database technologies (SQL, NoSQL, Vector, Graph)
● Proven track record building high-scale data pipelines (>1M records/day)
● Deep understanding of data modeling (dimensional, normalized, denormalized)
● Experience with cloud data warehouses (BigQuery, Redshift, or Snowflake)
● Strong knowledge of data quality, validation, and governance
● Excellent debugging and optimization skills

Highly Desirable
● Experience with vector databases (Milvus, Pinecone, Weaviate, Qdrant)
● Experience with graph databases (Neo4j, ArangoDB, Neptune)
● Knowledge of embedding models and semantic search
● Experience with ML data pipelines (feature stores, model training data)
● Understanding of A/B testing and experimental design
● Experience with real-time streaming (Kafka, Pub/Sub, Kinesis)
● Knowledge of LLMs and conversational AI systems
● Experience with data migration projects (especially large-scale)
● Background in marketing technology or customer data platforms

Nice-to-Have
● Experience with PyTorch Geometric or graph neural networks
● Knowledge of marketing analytics (attribution, segmentation, personalization)
● Familiarity with LangChain, LangGraph, or agent frameworks
● Experience with cost optimization in cloud environments
● Contributions to open-source data engineering projects
● Experience with data compliance (GDPR, CCPA)

Key Projects You'll Own

Phase 1: Foundation
● Migrate 10M+ conversation vectors from Pinecone to Milvus
● Design and implement MongoDB schemas for real-time agent state
● Set up Neo4j graph database with customer journey models
● Create BigQuery data warehouse with partitioned tables

Phase 2: Optimization
● Build automated data quality monitoring system
● Implement caching strategies (Redis) for 10x latency reduction
● Optimize vector search queries (target:
-
Data engineer
3 weeks ago
Ajmer, India Forage AI Full time Experience Level: Data Engineer, 3-7 years of relevant experience in data engineering. About Forage AI: Forage AI is a pioneering AI-powered data extraction and automation company that transforms complex, unstructured web and document data into clean, structured intelligence. Our platform combines web crawling, NLP, LLMs, and agentic AI to deliver highly...
-
Data Engineer
2 days ago
Ajmer, India Canopus Infosystems - A CMMI Level 3 Company Full time Position: Data Engineer Experience: 6 Months to 3 Years Location: Remote Joining: Immediate Joiners Preferred About the Role: We are looking for a skilled Data Engineer with strong Python expertise and hands-on experience in handling large datasets, data cleaning, analysis, and visualization. The ideal candidate should be capable of building data pipelines,...
-
Azure Data Engineer
2 days ago
Ajmer, India LTIMindtree Full time Role: Senior Data Engineer, 8 years of experience. Key responsibilities: Build reusable utilities, templates, and automation pipelines. Design scalable data engineering frameworks, standards, and best practices. Provide architectural guidance, cost optimization, and performance tuning support for Data Engineering solutions, mainly on Azure. Evaluate and onboard new...
-
Azure data engineering
3 weeks ago
Ajmer, India LTIMindtree Full time Let's Connect! We are hiring for ADB + ADF + PySpark + SQL + Synapse. Roles: Specialist - Data Engineering (5 to 8 yrs); Senior Specialist - Data Engineering (8 to 12 yrs). Location: Coimbatore & Indore. Please apply at the link below.
-
PySpark Data Engineer
3 weeks ago
Ajmer, India EXTRAGIG Full time 🚀 Contract Assistant – Data Engineer Support (Remote, EST Hours) 🚀 📅 Start Date: Sept 10, 2025 ⏳ Duration: 6 months (extendable) 💰 Pay: $1,000/month 🕗 Work Hours: 8:00 AM – 5:30 PM EST We’re looking for a Contract Assistant to support a PySpark Data Engineer with daily activities. This is a remote contract role (not formal employment). What...
-
Azure Data Engineer
2 weeks ago
Ajmer, India 9NEXUS Full time Job Title: Azure Data Engineer Experience: 6-10 Years Job Description: We are seeking an experienced Azure Data Engineer to design, develop, and optimize scalable data solutions in a cloud environment. The role involves building high-performance data pipelines, managing large-scale ETL/ELT processes, and collaborating with stakeholders to deliver secure and...
-
Data visualization with gcp
3 weeks ago
Ajmer, India People Prime Worldwide Full time About Company: Our Client Corporation provides digital engineering and technology services to Forbes Global 2000 companies worldwide. Our Engineering First approach ensures we can execute all ideas and creatively solve pressing business challenges. With industry expertise and empowered agile teams, we prioritize execution early in the process for impactful...
-
Ai Data Architect
3 weeks ago
Ajmer, India Whatjobs IN C2 Full time Role: AI Data Architect Experience: 8 to 15 years Location: PAN India Job Description: We are seeking an inventive Data Architect for AI with 8–15 years of experience to lead the strategic design and implementation of enterprise-scale AI solutions. 8+ years of experience in ETL/ELT, Big Data cloud data solutioning, with at least 3 years in architectural...
-
Cloud Pak For Data
4 days ago
Ajmer, India Whatjobs IN C2 Full time We’re Hiring: Cloud Pak for Data (CP4D) Platform Engineer | Remote Are you passionate about enterprise data platforms, automation and high availability for AI-driven platforms? Our client is looking for a CP4D Platform Engineer / Admin to support deployments across IBM Cloud Pak for Data, Watson Knowledge Catalog and Manta. What You’ll Do - Set up...