Data Engineering Manager – Web Crawling
24 hours ago
Data Engineering Manager – Web Crawling & Pipeline Architecture
Experience: 7 to 12 Years
Location: Remote / Bangalore
Engagement: Full-time
Positions: 2
Qualification: B.E / B.Tech / M.Tech / MCA / Computer Science / IT
Industry: IT / Data / AI / E-commerce / FinTech / Healthcare
Notice Period: Immediate

What We Are Looking For
- Proven experience leading data engineering teams with strong ownership of web crawling systems and pipeline architecture.
- Expertise in designing, building, and optimizing scalable data pipelines, preferably using workflow orchestration tools such as Airflow or Celery.
- Hands-on proficiency in Python and SQL for data extraction, transformation, processing, and storage.
- Experience working with cloud platforms such as AWS, GCP, or Azure for data infrastructure, deployments, and pipeline operations.
- Deep understanding of web crawling frameworks, proxy rotation, anti-bot strategies, session handling, and compliance with global data collection standards (GDPR/CCPA-safe crawling).
- Strong expertise in AI-driven automation, including integrating AI agents or frameworks like Crawl4ai into scraping, validation, and pipeline workflows.

Responsibilities
- Lead and mentor data engineering and web crawling teams, ensuring high-quality delivery and adherence to best practices.
- Architect, implement, and optimize scalable data pipelines that support high-volume data ingestion, transformation, and storage.
- Build and maintain robust crawling systems using modern frameworks, handling IP rotation, throttling, and dynamic content extraction.
- Establish pipeline orchestration using Airflow, Celery, or similar distributed processing technologies.
- Define and enforce data quality, validation, and security measures across all data flows and pipelines.
- Collaborate with product, engineering, and analytics teams to translate data requirements into scalable technical solutions.
- Develop monitoring, logging, and performance metrics to ensure high availability and reliability of data systems.
- Oversee cloud-based deployments, cost optimization, and infrastructure improvements on AWS/GCP/Azure.
- Integrate AI agents or LLM-based automation for tasks such as error resolution, data validation, enrichment, and adaptive crawling.

Qualifications
- Bachelor's or master's degree in Engineering, Computer Science, or a related field.
- 7–12 years of relevant experience in data engineering, pipeline design, or large-scale web crawling systems.
- Strong expertise in Python, SQL, and modern data processing practices.
- Experience working with Airflow, Celery, or similar workflow automation tools.
- Solid understanding of proxy systems, anti-bot techniques, and scalable crawler architecture.
- Hands-on experience with cloud data platforms (AWS/GCP/Azure).
- Experience with AI/LLM frameworks (Crawl4ai, LangChain, LlamaIndex, AutoGen, OpenAI, or similar).
- Strong analytical, architectural, and leadership skills.
-
Data Engineer
2 days ago
Tumkur, India Tech Mahindra Full time
Hi All,
We are hiring Data Engineers with 4-9 years of experience and the mandatory skills below.
Job Location: Pan India (Tech M Location)
Azure Databricks, ETL, PySpark & SQL, Medallion architecture
- Informatica knowledge
- Fact & dimension tables
- Medallion architecture
- Databricks certification preferable
- ADF, Databricks, PySpark, SQL mandatory
- ETL concepts
-
Data Governance Collibra Data Lineage Engineer
9 hours ago
Tumkur, India Nityo Infotech Full time
Immediate Hiring – Data Governance Collibra Data Lineage Engineer (Singapore | Hybrid |
Note: Open to candidates from anywhere in the world who are interested in relocating to Singapore.
Start Date: Immediate – Within 2 to 4 Weeks
We are looking for an experienced Collibra Data Lineage Engineer to support a leading Japanese bank in strengthening its...
-
Senior Data Solutions Developer
13 hours ago
Tumkur, India beBeeData Full time
Job Summary
We are seeking a skilled software engineer to join our global technology team. The successful candidate will work as part of a global team to develop and deliver industry-leading technology solutions that drive business growth.
Key Responsibilities:
Analyze, design, develop, implement, document, and maintain application systems on moderately...
-
Kinaxis Supply Chain Integration Analyst
2 days ago
Tumkur, India BD Full time
Role: Kinaxis Supply Chain Integration Analyst
Location: Bengaluru
Work Model: 4 days in office (Monday to Thursday)
Experience: 3+ years
Additional Screening Pointers:
- Strong integration experience between SAP and Kinaxis.
- Working knowledge of major SAP transactional data tables/interfaces (Sales Orders, Purchase Orders, Production Orders) and data flow to...
-
Enterprise Data Governance and Management Lead
8 hours ago
Tumkur, India beBeeDataGovernance Full time
Job Title: Data Governance Specialist
We are seeking a Data Governance specialist to support a leading bank in strengthening its Enterprise Data Governance and Management initiatives.
Key Responsibilities:
- Configure & deploy data lineage workflows (technical + business)
- Implement data catalog, business glossary & metadata models
- Build Collibra workflows using BPMN...
-
Principal Engineer
3 hours ago
Tumkur, India Quantalent AI Full time
Principal Engineer
This opportunity is with one of our clients, a mission-driven global EdTech innovator building large-scale learning platforms used by millions of students and educators worldwide.
Location: Bengaluru – Hybrid (3 days WFO)
Experience: 12–18 years
Role: Principal Engineer (Technical Leadership & Architecture)
About the Role
We are seeking a...
-
Principal Quality Assurance Engineer
2 days ago
Tumkur, India Skyhigh Security Full time
About Skyhigh Security:
Skyhigh Security is a dynamic, fast-paced, cloud company that is a leader in the security industry. Our mission is to protect the world's data, and because of this, we live and breathe security. We value learning at our core, underpinned by openness and transparency.
Since 2011, organizations have trusted us to provide them with a...
-
Associate Engineer III T500-21686
8 hours ago
Tumkur, India lululemon Full time
About lululemon:
lululemon is an innovative performance apparel company for yoga, running, training, and other athletic pursuits. Setting the bar in technical fabrics and functional design, we create transformational products and experiences that support people in moving, growing, connecting, and being well. We owe our success to our innovative products,...
-
Chief Data Strategist
11 hours ago
Tumkur, India beBeeData Full time
Quantitative Data Manager
VOYA INDIA is a technology-driven company driving the evolution of the financial services customer experience through technology, innovation, and human creativity.
This role involves overseeing critical quantitative data for asset management firms and investment banks across multiple consultant databases.
Evaluate key deliverables...
-
Azure Cloud Engineer
2 days ago
Tumkur, India Deloitte Full time
Your potential, unleashed.
India's impact on the global economy has increased at an exponential rate, and Deloitte presents an opportunity to unleash and realize your potential amongst cutting-edge leaders and organizations shaping the future of the region, and indeed, the world beyond.
At Deloitte, bring your whole self to work, every day. Combine that with our...