Data Engineer
4 weeks ago
What You'll Build

Core Responsibilities

Data Architecture & Infrastructure (40%)
● Design and implement a multi-database architecture (MongoDB, Redis, Milvus, Neo4j, BigQuery)
● Build scalable data pipelines for real-time conversation processing and personalization
● Architect ETL/ELT workflows for data migration from legacy systems
● Implement data partitioning, sharding, and optimization strategies for high-throughput systems
● Create data governance frameworks ensuring quality, security, and compliance

Vector & Graph Database Systems (25%)
● Design and optimize Milvus vector collections for semantic search (1024-dim embeddings)
● Build graph schemas in Neo4j for customer journey mapping and persona relationships
● Implement HNSW indexing strategies and similarity search optimization
● Create hybrid search systems combining vector, full-text, and graph queries
● Monitor and tune database performance (query latency, throughput, resource utilization)

ML Data Infrastructure (20%)
● Build data collection pipelines for LLM fine-tuning (conversation logs, tool executions)
● Create feature stores for GNN training (customer interactions, engagement signals)
● Implement data versioning and lineage tracking for ML experiments
● Design A/B testing data infrastructure with CUPED variance reduction
● Build real-time feature computation pipelines for contextual bandits

Analytics & Monitoring (15%)
● Design BigQuery schemas for marketing analytics and performance tracking
● Create materialized views and aggregation pipelines for real-time dashboards
● Implement data quality monitoring and anomaly detection
● Build observability infrastructure (Prometheus metrics, Grafana dashboards)
● Develop cost optimization strategies for cloud data warehousing

Technical Stack You'll Work With

Databases & Storage
● MongoDB (conversation state, active sessions)
● Redis (caching, rate limiting, real-time data)
● Milvus (vector embeddings, semantic search)
● Neo4j (customer journey graphs, persona networks)
● BigQuery (analytics warehouse, historical data)

Data Processing & Orchestration
● Apache Airflow or Prefect (workflow orchestration)
● Pandas, Polars (data transformation)
● Apache Spark (optional, for large-scale processing)
● dbt (data transformation and modeling)

ML/AI Data Pipeline
● vLLM (LLM inference serving)
● MLflow (model registry, experiment tracking)
● Sentence Transformers (embedding generation)
● PyTorch, TensorFlow (ML model training)

Cloud & Infrastructure
● Google Cloud Platform (BigQuery, Cloud Storage, Compute)
● Docker & Kubernetes (containerization, orchestration)
● Terraform (infrastructure as code)
● GitHub Actions or GitLab CI (CI/CD pipelines)

Programming & Tools
● Python 3.10+ (primary language)
● SQL (complex queries, query optimization)
● Shell scripting (Bash/Zsh)
● Git (version control)

Requirements

Must-Have Skills
● 5+ years of data engineering experience with production systems
● Expert-level SQL and database design skills
● Strong Python programming (async/await, type hints, testing)
● Experience with at least 3 different database technologies (SQL, NoSQL, Vector, Graph)
● Proven track record building high-scale data pipelines (>1M records/day)
● Deep understanding of data modeling (dimensional, normalized, denormalized)
● Experience with cloud data warehouses (BigQuery, Redshift, or Snowflake)
● Strong knowledge of data quality, validation, and governance
● Excellent debugging and optimization skills

Highly Desirable
● Experience with vector databases (Milvus, Pinecone, Weaviate, Qdrant)
● Experience with graph databases (Neo4j, ArangoDB, Neptune)
● Knowledge of embedding models and semantic search
● Experience with ML data pipelines (feature stores, model training data)
● Understanding of A/B testing and experimental design
● Experience with real-time streaming (Kafka, Pub/Sub, Kinesis)
● Knowledge of LLMs and conversational AI systems
● Experience with data migration projects (especially large-scale)
● Background in marketing technology or customer data platforms

Nice-to-Have
● Experience with PyTorch Geometric or graph neural networks
● Knowledge of marketing analytics (attribution, segmentation, personalization)
● Familiarity with LangChain, LangGraph, or agent frameworks
● Experience with cost optimization in cloud environments
● Contributions to open-source data engineering projects
● Experience with data compliance (GDPR, CCPA)

Key Projects You'll Own

Phase 1: Foundation
● Migrate 10M+ conversation vectors from Pinecone to Milvus
● Design and implement MongoDB schemas for real-time agent state
● Set up Neo4j graph database with customer journey models
● Create BigQuery data warehouse with partitioned tables

Phase 2: Optimization
● Build automated data quality monitoring system
● Implement caching strategies (Redis) for 10x latency reduction
● Optimize vector search queries (target:
-
Data Engineer
4 weeks ago
India, IN
KPG99 INC
Full time
Role: Databricks Engineer
Location: Remote
Duration: 12+ months with extensions
REQUIRED SKILLS AND EXPERIENCE
- 3–5 years of experience in data engineering roles.
- Strong hands-on experience with Databricks for data processing and pipeline development.
- Proficiency in SQL for data querying, transformation, and troubleshooting.
- Solid programming skills in...
-
Data Engineer
4 weeks ago
India, IN
Insight Global
Full time
Position: GCP Data Engineer
Location: 100% remote in India
Duration: 12-month contract + extensions + conversions
Package: 10–26 LPA
Interview Process: 2 rounds
REQUIRED SKILLS AND EXPERIENCE
- 6+ years of experience as a Data Engineer
- Experience with GCP data services, i.e., BigQuery, Cloud Storage, Bigtable, Airflow, Dataproc, Dataflow
- Strong SQL experience (NoSQL,...
-
Data Engineer
4 weeks ago
India, IN
Digivance Solutions
Full time
Position: Data Engineer
Experience: 5–10 years
Location: Chennai, Bengaluru, Pune, Hyderabad, Mumbai, Delhi NCR (candidates will be required to work from one of these locations in hybrid mode)
Key Responsibilities
- Collaborate with business and technology stakeholders to understand current and future data requirements.
- Design, build, and maintain reliable,...
-
Data Engineer
4 weeks ago
India, IN
Jaipur Rugs
Full time
Organization Description: Jaipur Rugs is a social enterprise that connects rural craftsmanship with global markets through its luxurious handmade carpets. It is a family-run business that offers an exclusive range of hand-knotted and hand-woven rugs made using 2,500-year-old traditional art forms. The founder, Mr. Nand Kishore Chaudhary, created a unique...
-
Databricks Data Engineer
4 weeks ago
India, IN
Insight Global
Full time
** Immediate Joiner **
Insight Global is seeking a Databricks Data Engineer in India with 3–5 years of experience to support data engineering initiatives in the pharmaceutical domain. The ideal candidate will have hands-on expertise in Databricks, SQL, and Python, and a strong understanding of pharma/life sciences data. This role involves building and...
-
Senior Data Engineer
4 weeks ago
India, IN
Mitra AI
Full time
We are seeking a candidate who is comfortable working the B shift (3 PM to 11 PM IST).
JOB-SPECIFIC DUTIES AND RESPONSIBILITIES
- Develop and maintain ETL pipelines using Fivetran, dbt Cloud, and custom frameworks
- Implement client-specific data transformations for Hightouch integration
- Support data ingestion and quality checks during client onboarding
- Optimize...
-
SAP Data Engineer
4 weeks ago
India, IN
KPG99 INC
Full time
Position: SAP Data Engineer
Location: Remote in India
Duration: 12-month contract with extensions
Required core skill set: SAP Datasphere, S/4HANA, SAC, BW, Data Modeling
Pluses: SAP ABAP, HANA/AMDP/CDS, SOAP/OData/REST APIs, etc.
-
Sr. Data Engineer
4 weeks ago
Bangalore Urban, Karnataka, India, IN
Tata Consultancy Services
Full time
Greetings from TCS!
We are looking for a Senior Data Engineer.
Experience: 8+ years
Location: Bangalore
Must-Have
The Senior Data Engineer designs and builds data foundations and end-to-end solutions for the Shell business to maximize value from data. The role helps foster data-driven thinking within the organization, not just within IT teams,...
-
Data Steward
4 weeks ago
India, IN
Insight Global
Full time
Required Skills & Experience:
- 7–10 years of experience in data stewardship, data governance, or data analytics roles.
- Hands-on experience with Alation or similar enterprise data catalog platforms (e.g., Collibra, Informatica EDC, Atlan).
- Strong SQL skills with the ability to query, analyze, and validate large datasets.
- Familiarity with ETL/ELT pipelines and data...
-
SAP Data Engineer
4 weeks ago
India, IN
KPG99 INC
Full time
Role: SAP Data Engineer (Datasphere/BW/HANA/Data Modeling)
Client: NBCUniversal
Location: Remote in India
Duration: 12-month contract with extensions
Job Description: Provide state-of-the-art technical support for SAP Datasphere and BW/4HANA system maintenance and enhancements, including integration with other systems. This includes participating in project...