
Senior Data Pipeline Architect
1 week ago
This role leads the design and implementation of a shared component library/SDK for pipelines covering ingestion, parsing/OCR, extraction, validation, enrichment, and publishing. The ideal candidate has strong expertise in performance and concurrency and experience building large-scale data pipelines with Apache Beam and/or Spark.
Key Responsibilities:
- Pipeline Development: Design and build a shared component library/SDK for pipelines, including ingestion, parsing/OCR, extraction, validation, enrichment, and publishing.
- Standardization: Define patterns/templates for Apache Beam pipelines and Databricks jobs, standardize configuration, packaging, versioning, CI/CD, and documentation.
- Interface Creation: Create pluggable interfaces so multiple teams can swap extractors (Regex/LLM), OCR providers, and EMR publishers without code rewrites (see the interface sketch after this list).
- Repository Strategy: Define the repository strategy: a shared core repo plus child repos for each use case.
- Profiling and Tuning: Own end-to-end profiling and tuning: cProfile/py-spy/line_profiler, memory analysis (tracemalloc), and CPU vs I/O analysis (see the profiling sketch after this list).
- Observability: Instrument services with Elastic APM and correlate traces/metrics with Splunk logs; build dashboards and runbooks (see the APM sketch after this list).
- Concurrency Best Practices: Implement concurrency best practices: asyncio for I/O-bound work, ThreadPool/ProcessPool for CPU-bound work, batching, rate limiting, and retries (see the concurrency sketch after this list).
- LLM Rate Limiting: Implement robust LLM API rate limiting/governance: enforce provider TPM and concurrency caps, request queueing/token budgeting, and emit APM/Splunk metrics (throttle rate, queue depth, cost per job) with alerts (see the token-budget sketch after this list).
- SLOs and Alerts: Establish SLOs and alerts for throughput, latency, and error rates; set up dead-letter queues (DLQs) and recovery patterns (see the dead-letter sketch after this list).
- Mentorship: Mentor developers, lead design reviews, codify best practices, and write clear docs and examples.
- Partnership: Partner with ML engineers on the future LLM/SLM path (evaluation harness, safety/PII, cost/perf).
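
A minimal sketch of the pluggable-interface idea from the Interface Creation bullet, assuming a typing.Protocol-based design; the Extractor protocol, RegexExtractor class, and run_extraction helper are illustrative names, not part of an existing SDK:

```python
# Hypothetical sketch of a pluggable extractor interface; names are illustrative.
from typing import Protocol
import re


class Extractor(Protocol):
    """Anything that turns raw document text into a dict of extracted fields."""

    def extract(self, text: str) -> dict[str, str]:
        ...


class RegexExtractor:
    """Rule-based implementation; an LLM-backed class exposing the same
    extract() signature could be swapped in without pipeline changes."""

    def __init__(self, patterns: dict[str, str]) -> None:
        self.patterns = {name: re.compile(p) for name, p in patterns.items()}

    def extract(self, text: str) -> dict[str, str]:
        results = {}
        for name, pattern in self.patterns.items():
            match = pattern.search(text)
            if match:
                results[name] = match.group(0)
        return results


def run_extraction(extractor: Extractor, text: str) -> dict[str, str]:
    # The pipeline depends only on the Protocol, not on a concrete class.
    return extractor.extract(text)


if __name__ == "__main__":
    ex = RegexExtractor({"invoice_id": r"INV-\d{6}"})
    print(run_extraction(ex, "Payment for INV-123456 received."))
```

The same shape would apply to OCR providers and EMR publishers: one Protocol per plug point, with concrete implementations selected by configuration.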
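
A rough illustration of the profiling pass behind the Profiling and Tuning bullet, combining cProfile for CPU hot spots with tracemalloc for memory; parse_documents is a hypothetical stand-in for a real pipeline stage:

```python
# Minimal profiling harness: CPU profile plus peak-memory measurement.
import cProfile
import pstats
import tracemalloc


def parse_documents(n: int) -> list[str]:
    # Placeholder workload; in practice this is a real pipeline stage.
    return [f"doc-{i}".upper() for i in range(n)]


def profile_stage() -> None:
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()

    parse_documents(100_000)

    profiler.disable()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # Top 5 functions by cumulative time, then the memory high-water mark.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
    print(f"memory: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")


if __name__ == "__main__":
    profile_stage()
```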
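
A hedged sketch of Elastic APM instrumentation for a pipeline step, per the Observability bullet; the service name, server URL, and the ocr_step/process_batch functions are assumptions for illustration, and real settings would come from the team's APM deployment:

```python
# Sketch only: instruments a batch run as an APM transaction with spans.
import elasticapm

client = elasticapm.Client(
    service_name="doc-pipeline",          # assumed service name
    server_url="http://localhost:8200",   # assumed APM server
)


@elasticapm.capture_span("ocr-step")
def ocr_step(doc: bytes) -> str:
    # Placeholder for the real OCR call.
    return doc.decode("utf-8", errors="ignore")


def process_batch(docs: list[bytes]) -> None:
    client.begin_transaction("pipeline")
    try:
        for doc in docs:
            ocr_step(doc)
        client.end_transaction("process_batch", "success")
    except Exception:
        client.capture_exception()
        client.end_transaction("process_batch", "failure")
        raise
```

Correlation with Splunk would typically rely on logging the trace/transaction IDs alongside application logs.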
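
A compact sketch of the split named in the Concurrency Best Practices bullet: asyncio plus a semaphore for I/O-bound calls, a ProcessPoolExecutor for CPU-bound parsing, and bounded retries with backoff; fetch_page and parse_page are placeholders, not real services:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def parse_page(payload: str) -> int:
    # CPU-bound work runs in worker processes so it never blocks the event loop.
    return len(payload.split())


async def fetch_page(url: str) -> str:
    # Placeholder for an HTTP call (aiohttp/httpx in real code).
    await asyncio.sleep(0.01)
    return f"contents of {url}"


async def fetch_with_retry(url: str, sem: asyncio.Semaphore, retries: int = 3) -> str:
    for attempt in range(1, retries + 1):
        try:
            async with sem:                    # cap concurrent requests
                return await fetch_page(url)
        except Exception:
            if attempt == retries:
                raise
            await asyncio.sleep(2 ** attempt)  # exponential backoff
    raise RuntimeError("unreachable")


async def main(urls: list[str]) -> list[int]:
    sem = asyncio.Semaphore(10)
    pages = await asyncio.gather(*(fetch_with_retry(u, sem) for u in urls))
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        counts = await asyncio.gather(
            *(loop.run_in_executor(pool, parse_page, p) for p in pages)
        )
    return list(counts)


if __name__ == "__main__":
    print(asyncio.run(main([f"https://example.com/{i}" for i in range(20)])))
```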
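
One possible shape for the governance described in the LLM Rate Limiting bullet: a sliding-window token budget plus a concurrency cap; the TPM limit, the chars-per-token estimator, and the call_llm stub are assumptions rather than any provider's API, and metric emission to APM/Splunk is omitted:

```python
import asyncio
import time


class TokenBudget:
    """Sliding-window budget: callers wait once the provider TPM cap is hit."""

    def __init__(self, tokens_per_minute: int) -> None:
        self.tpm = tokens_per_minute
        self.window: list[tuple[float, int]] = []  # (timestamp, tokens)
        self.lock = asyncio.Lock()

    async def acquire(self, tokens: int) -> None:
        while True:
            async with self.lock:
                now = time.monotonic()
                self.window = [(t, n) for t, n in self.window if now - t < 60]
                if sum(n for _, n in self.window) + tokens <= self.tpm:
                    self.window.append((now, tokens))
                    return
            await asyncio.sleep(1)  # request stays queued until budget frees up


def estimate_tokens(prompt: str) -> int:
    return max(1, len(prompt) // 4)  # rough chars-per-token heuristic


async def call_llm(prompt: str, budget: TokenBudget, sem: asyncio.Semaphore) -> str:
    await budget.acquire(estimate_tokens(prompt))
    async with sem:                    # provider concurrency cap
        await asyncio.sleep(0.05)      # stand-in for the real API call
        return f"response to: {prompt[:20]}"


async def main() -> None:
    budget = TokenBudget(tokens_per_minute=10_000)
    sem = asyncio.Semaphore(5)
    results = await asyncio.gather(*(call_llm(f"doc {i}", budget, sem) for i in range(8)))
    print(len(results), "calls completed")


if __name__ == "__main__":
    asyncio.run(main())
```

In practice the queue depth, throttle rate, and token spend tracked here are what would be exported as the APM/Splunk metrics the bullet calls for.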
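
A hedged Apache Beam sketch of the dead-letter pattern from the SLOs and Alerts bullet: records that fail parsing are routed to a tagged side output instead of failing the job; the parse logic and sinks are placeholders:

```python
import json
import apache_beam as beam


class ParseRecord(beam.DoFn):
    def process(self, element: str):
        try:
            yield json.loads(element)
        except Exception:
            # Unparseable records go to the dead-letter output for later replay.
            yield beam.pvalue.TaggedOutput("dead_letter", element)


def run() -> None:
    with beam.Pipeline() as p:
        results = (
            p
            | "Read" >> beam.Create(['{"id": 1}', "not-json"])
            | "Parse" >> beam.ParDo(ParseRecord()).with_outputs("dead_letter", main="parsed")
        )
        results.parsed | "WriteGood" >> beam.Map(print)
        results.dead_letter | "WriteDLQ" >> beam.Map(lambda r: print("DLQ:", r))


if __name__ == "__main__":
    run()
```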
Requirements:
- Expertise: 7+ years of Python with strong depth in performance and concurrency (asyncio, concurrent.futures, multiprocessing), profiling, and memory tuning.
- Observability Expertise: Elastic APM instrumentation and dashboarding; Splunk for logs and correlation; familiarity with OpenTelemetry.
- LLM Experience: Must have implemented LLM-based solutions and supported them in production.
- API Engineering: API engineering for high-throughput integrations (REST, OAuth2), resilience patterns, and secure handling of sensitive data.
- Architecture Skills: Strong architecture/design skills: clean interfaces, packaging shared libs, versioning, CI/CD (GitHub Actions/Azure DevOps), testing.
- Data Pipelines: 3+ years building large-scale data pipelines with Apache Beam and/or Spark, including hands-on Databricks experience (Jobs, Delta Lake, cluster tuning).
- Document Processing: OCR (Tesseract, AWS Textract, Azure Form Recognizer), PDF parsing, and text normalization.
- LLM/SLM Integration: LLM/SLM integration experience (e.g., OpenAI/Azure AI, local SLMs), prompt/eval frameworks, PII redaction/guardrails.
- Cloud and Tooling: AWS/Azure/GCP, Dataflow/Flink, Terraform, Docker; cost/performance tuning on Databricks.
- Security Mindset: Security/compliance mindset (HIPAA), secrets management, least-privilege access.
-
Chief Data Pipeline Architect
5 days ago
Malappuram, Kerala, India | beBeeDataEngineering | Full time | ₹ 15,00,000 - ₹ 20,00,000
Job Title: Chief Data Pipeline Architect
About the Role: Craft and maintain efficient data pipelines to support business operations; collaborate with cross-functional teams to identify data needs and deliver solutions; design, develop, and optimize large-scale data systems using PySpark and Python; work closely with data scientists and analysts to ensure...
-
Data Pipeline Architect
2 weeks ago
Malappuram, Kerala, India | beBeeDataEngineer | Full time | ₹ 15,00,000 - ₹ 20,10,000
Job Description: As a data engineer, you will play a crucial role in designing and implementing scalable data pipelines to support our business growth.
-
Cloud Solutions Architect
4 days ago
Malappuram, Kerala, India | beBeeCloud Architect | Full time | ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Overview: We are seeking an experienced .NET architect to lead our cloud-native transformation efforts. This role requires a strong background in modernizing legacy applications into scalable solutions using .NET Core, Docker, Kubernetes, and Azure.
Key Responsibilities: Drive containerization efforts with Docker & Kubernetes; define and implement robust CI/CD...
-
Data Pipeline Architect
4 days ago
Malappuram, Kerala, India | beBeeDataEngineer | Full time | ₹ 15,00,000 - ₹ 20,25,000
Data Engineer Position: We are seeking a skilled professional to design and implement scalable data pipelines using modern cloud and big data platforms.
Key Responsibilities: Design, develop, and maintain large-scale data processing systems; implement efficient data ingestion, storage, and retrieval solutions; collaborate with cross-functional teams to ensure...
-
Building Scalable Data Pipelines
2 weeks ago
Malappuram, Kerala, India | beBeeData | Full time | ₹ 15,00,000 - ₹ 25,00,000
Job Title: Business Intelligence Architect
We are seeking a skilled professional to design, build, and optimize scalable data pipelines using Snowflake, AWS (Lambda, Glue), DBT, and SQL. The ideal candidate will be responsible for enabling seamless data integration, transformation, and analytics to support business intelligence and advanced analytics...
-
Senior Data Architect
2 weeks ago
Malappuram, Kerala, India | beBeeDataEngineer | Full time | ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Title: Data Platform Engineer
About the Opportunity: We are seeking a skilled data engineer to design, build, and maintain our modern data platform. This role is ideal for someone with deep technical expertise in ETL pipeline design, data modeling, and data infrastructure who thrives in a fast-paced engineering environment.
Key Responsibilities: Design and...
-
Data Pipeline Expert
2 weeks ago
Malappuram, Kerala, India | beBeeDataIntegration | Full time | ₹ 18,00,000 - ₹ 25,00,000
Job Title: Data Integration Specialist
The Data Integration Specialist will play a crucial role in designing and implementing data pipelines, developing data processing scripts using SQL, and collaborating with cross-functional teams to drive business outcomes. The ideal candidate will have at least 5-6 years of experience with strong hands-on experience in ETL processes, proficiency in...
-
Senior Data Pipeline Specialist
2 weeks ago
Malappuram, Kerala, India | beBeeDataEngineer | Full time | ₹ 25,00,000 - ₹ 35,00,000
Are you a data expert looking to leverage your skills in designing and building robust data pipelines? We are seeking an experienced Azure Data Engineer to join our team. The ideal candidate will have strong SQL skills, experience with Fabric, and proficiency in Python (PySpark). Additionally, they will have hands-on experience with Azure Data Services, CI/CD...
-
Senior Cloud Data Specialist
4 days ago
Malappuram, Kerala, India | beBeeData | Full time | ₹ 13,43,238 - ₹ 25,16,920
AWS Data Architect Position
Job Title: AWS Data Architect
We are seeking a highly skilled Senior AWS Data Engineer to join our data engineering team. The ideal candidate will have deep expertise in building scalable data pipelines using Apache Spark, PySpark, SQL, and Python, along with hands-on experience in the AWS ecosystem.
Key Responsibilities: Design,...
-
Data Architect
2 weeks ago
Malappuram, Kerala, India | beBeeData | Full time | ₹ 8,00,000 - ₹ 10,00,000
Job Title: Data Architect
Data Engineer Role: We are seeking a skilled data engineer to lead the design and development of our data pipelines using Databricks PySpark.
Key Responsibilities:
Data Pipeline Development: Design and develop efficient data pipelines using Databricks PySpark for extracting, transforming, and loading data from diverse sources into our...