High-Level Software Architectures for Scalable Data Pipelines



Kanpur, Uttar Pradesh, India | Full time | ₹ 1,50,00,000 - ₹ 2,50,00,000
Job Title: Python Platform Engineer

About the Position:

  • We are seeking a skilled Python engineer with hands-on experience in Apache Beam, Databricks, and related technologies to lead our platform engineering efforts.

Key Responsibilities:

  • Design and build a shared component library/SDK covering the pipeline stages: ingestion, parsing/OCR, extraction, validation, enrichment, and publishing.
  • Define patterns/templates for Apache Beam pipelines and Databricks jobs; standardize configuration, packaging, versioning, CI/CD, and documentation.
  • Create pluggable interfaces so multiple teams can swap extractors (Regex/LLM), OCR providers, and EMR publishers without code rewrites.
  • Develop a robust repository strategy, with a shared repository plus child repositories for each use case.
Performance and Reliability:
  • Own end-to-end profiling and tuning: CPU profiling (cProfile, py-spy, line_profiler), memory profiling (tracemalloc), and CPU-bound vs. I/O-bound analysis.
  • Instrument services with Elastic APM and correlate traces/metrics with Splunk logs; build dashboards and runbooks.
  • Implement concurrency best practices: asyncio for I/O-bound work, ThreadPoolExecutor for blocking I/O, ProcessPoolExecutor for CPU-bound work, plus batching, rate limiting, and retries.
  • Implement robust LLM API rate limiting and governance: enforce provider TPM (tokens-per-minute) and concurrency caps, implement request queueing and token budgeting, and emit APM/Splunk metrics (throttle rate, queue depth, cost per job) with alerts.
  • Establish SLOs and alerts for throughput, latency, and error rates; set up dead-letter queues (DLQs) and recovery patterns.
Team Enablement:
  • Mentor developers, lead design reviews, codify best practices, write clear documentation and examples.
  • Partner with ML engineers on the future LLM/SLM path (evaluation harness, safety/PII, cost/performance).
Requirements:
  • 7+ years of Python experience with strong depth in performance and concurrency (asyncio, concurrent.futures, multiprocessing), profiling and memory tuning.
  • Observability expertise: Elastic APM instrumentation and dashboarding; Splunk for logs and correlation; OpenTelemetry familiarity.
  • Must have implemented LLM-based solutions and supported them in production.
  • API engineering for high-throughput integrations (REST, OAuth2), resilience patterns, and secure handling of sensitive data.
  • Strong architecture/design skills: clean interfaces, packaging shared libraries, versioning, CI/CD (GitHub Actions/Azure DevOps), testing.
  • 3+ years of building large-scale data pipelines with Apache Beam and/or Spark, including hands-on Databricks experience (Jobs, Delta Lake, cluster tuning).
  • Document processing: OCR (Tesseract, AWS Textract, Azure Form Recognizer), PDF parsing, text normalization.
  • LLM/SLM integration experience (e.g., OpenAI/Azure AI, local SLMs), prompt/eval frameworks, PII redaction/guardrails.
  • Cloud and tooling: AWS/Azure/GCP, Dataflow/Flink, Terraform, Docker; cost/performance tuning on Databricks.
  • Security/compliance mindset (HIPAA), secrets management, least-privilege access.
Benefits:
  • Competitive salary and benefits package
  • Culture focused on talent development with quarterly promotion cycles and company-sponsored higher education and certifications
  • Opportunity to work with cutting-edge technologies
  • Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards
  • Annual health check-ups
  • Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents
Work Environment:
  • Persistent Ltd. is dedicated to fostering diversity and inclusion in the workplace.
  • We welcome diverse candidates from all backgrounds.
  • We support hybrid work and flexible hours to fit diverse lifestyles.
  • Our office is accessibility-friendly, with ergonomic setups and assistive technologies to support employees with physical disabilities.
Contact Information:

Please visit our careers page for more information.



  • Kanpur, Uttar Pradesh, India | Full time | ₹ 1,50,00,000 - ₹ 2,00,00,000

    About Us: We're looking for a talented Data Engineer to help us scale and optimize our data processing capabilities. Role Overview: You will design, build, and maintain high-performance data pipelines that process terabytes of data. Key Responsibilities: Design and implement scalable batch processing systems using Python and big data technologies. Optimize...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 18,00,000 - ₹ 25,00,000

    Are you a data engineering expert looking for a new challenge? This is an excellent opportunity to leverage your skills and experience in designing, developing, and maintaining scalable and high-performance data pipelines using Databricks and PySpark. About the Role: This is a Senior Data Engineer position where you will be working on the design, development,...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 10,00,000 - ₹ 20,00,000

    Big Data Engineer: We are seeking a highly skilled professional to join our team as a Big Data Engineer. In this role, you will be responsible for designing, building, and maintaining robust data pipelines that handle high-volume financial data. Key Responsibilities: Design, develop, and manage end-to-end data pipelines for stocks, crypto, and other financial...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 1,20,00,000 - ₹ 1,80,00,000

    About the Role: We are seeking an experienced Data Architecture Specialist to join our team. In this role, you will design and develop scalable, robust, and secure data pipelines using Azure Data Factory, Databricks, Synapse Analytics, Azure Data Lake Gen2, Event Hub, and Azure Functions. Your key responsibilities will include developing and maintaining ETL/ELT...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 7,50,000 - ₹ 15,00,000

    Job Title: Data Architecture Expert. We are seeking a seasoned professional with expertise in designing and implementing data pipelines using cloud-based technologies such as Azure Data Factory. Key Responsibilities: Design, implement, and manage data pipelines using Azure Data Factory. Implement scalable data processing solutions within Azure Synapse...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 1,68,00,000 - ₹ 2,24,00,000

    We are seeking a skilled Backend Developer to join our team and design the core architecture of our FinTech platform. As a key member of our founding team, you will be responsible for building high-performance systems that handle real-time financial data and execute trades efficiently. The Role: Backend Architecture & Development: Design and implement...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 1,50,00,000 - ₹ 2,50,00,000

    About the Role: The ideal candidate will have a solid foundation in data engineering with a minimum of 7 years of experience in development on data-centric projects. Mandatory Skills: Expertise in working with real-time data and Kafka framework, including kSQL/Mirror Maker, etc. Proficiency in at least one programming language: Groovy or Java. Thorough...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 20,00,000 - ₹ 25,00,000

    About the Role: We are seeking an experienced Engineering Manager to lead a team of data engineers in designing and delivering scalable data pipelines and analytics platforms using Big Data platforms, Microsoft Azure, and Databricks. Main Responsibilities: Lead, mentor, and grow a team of data engineers working on large-scale distributed data systems. Architect...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 10,00,000 - ₹ 15,00,000

    Senior ETL Developer: We are seeking an accomplished Senior ETL Developer to join our Data Engineering team. The ideal candidate will have a proven track record of designing, developing, and maintaining scalable and efficient data pipelines using IBM DataStage, AWS Glue, and Snowflake. The role involves collaborating with architects, business analysts, and data...


  • Kanpur, Uttar Pradesh, India | Full time | ₹ 15,00,000 - ₹ 17,50,000

    About the Role: We're seeking a highly skilled Data Engineer to build robust pipelines that transform large, messy technical datasets into high-quality structured data.