Web Data Extraction Engineer

1 week ago


Mysore, India TripleChoice Corp Full time

Job Description

We are building a high-speed, AI-enhanced web data extraction system to collect and structure public product data from hundreds (scaling to thousands) of manufacturer websites in the industrial and engineering space, including pumps, valves, compressors, and related equipment. Our focus is on accuracy, automation, speed, and reliability. We are not scraping selection tools or private portals, only publicly available catalogs, product pages, datasheets, and specifications.

We are looking for a scraping engineer who can build a robust, intelligent, and self-healing system, with a strong emphasis on AI-powered parsing, dynamic adaptation, and automated failure recovery.

What You'll Do

- Build and scale an automated, intelligent scraping pipeline for 1000+ public manufacturer websites and product catalogs.
- Use AI (LLMs, NLP, or OCR) to extract, clean, and structure unstructured product data (from PDFs, tables, or descriptive text).
- Design resilient scraping logic with retry mechanisms, selector auto-repair, and failure detection.
- Implement tools to monitor scrape quality, detect partial/incomplete records, and track performance metrics.
- Store output in both structured (e.g., PostgreSQL, JSON, CSV) and unstructured formats (PDF, blob) for ELT and indexing.
- Continuously optimize for speed, quality, and minimal maintenance overhead.

Success Means:

- 90% success rate on automated runs across dynamic websites
- Near-zero manual intervention through AI-assisted extraction and validation
- Delivery of high-accuracy, structured product data that's ready for ELT pipelines
- Scraper architecture that's scalable, modular, and fast

Ideal Experience

- 3+ years in web scraping / data extraction, especially across a variety of website structures
- Strong with tools like Playwright, Puppeteer, or Selenium for dynamic rendering
- Solid experience parsing and extracting data from PDF datasheets (via PyMuPDF, pdfplumber, or Textract)
- Used AI or NLP (e.g., OpenAI, spaCy, LangChain) to extract or normalize product specs
- Built monitoring and QA systems for scraping success, accuracy, and uptime
- Familiar with proxy management, stealth browsers, and bot-detection evasion
- Bonus: Industrial product or part catalog experience

Tools & Stack (Or Bring Your Own)

- Scraping: Playwright, Puppeteer, Requests, BeautifulSoup
- AI/NLP: OpenAI, spaCy, LangChain, regex transformers
- PDF/OCR: PyMuPDF, pdfplumber, Tesseract, AWS Textract
- Storage & ELT: PostgreSQL, S3, JSON, CSV, Elasticsearch
- Automation: Python, Git, Docker, CI/CD (basic)

Deliverables

- Automated, modular scraping flows across hundreds of vendor websites
- Structured product data: model numbers, specs, dimensions, materials, categories
- Failover logic and self-repairing selector strategies
- Quality scoring, logging, and alerts for broken or incomplete extractions


  • Web Designer

    3 days ago


    Mysore, India Digital Spike Technologies Full time

    **Web Designer** Minimum 3 years of experience as a web designer. Knowledge of HTML, CSS, jQuery, and JavaScript. WordPress development experience is a must. Register web domain names and organize the hosting of the website. Design responsive landing pages. Optimize sites for maximum speed and scalability. Work with different content management...


  • Mysore, India Amphenol Full time

    Overview We are seeking a highly motivated AI & Data Analytics Engineer to join our Internal Audit department. This role will play a critical part in modernizing and streamlining our testing processes through automation, artificial intelligence (AI), advanced analytics, and innovative coding solutions. The Analyst will collaborate with auditors and...

  • Data Engineer

    2 weeks ago


    Mysore, India Fiery Full time

    Fiery LLC is the leading provider of Digital Front Ends (DFEs) and workflow solutions for the growing industrial and graphic arts print industries. Fiery is leading the transformation from analog to digital imaging with scalable, digital, award-winning products for the printing industry. Based in Silicon Valley, California with offices around the world and a...

  • Data Engineer

    2 weeks ago


    Mysore, India Whatjobs IN C2 Full time

    Role: Data Engineer. Location: Remote. Shift timing: 2:00 PM - 11:00 PM. Experience: 2-4 years of relevant experience only (this is a junior position with us). Must-have skills: GCP (2 years minimum working experience), Python and PySpark (2 years), SQL (2 years), excellent communication, experience working with global stakeholders. Who we are: Randstad Sourceright’s...


  • India, Mysore Optimum Data Analytics Full time

    Job Description Optimum Data Analytics is a strategic technology partner delivering reliable turn-key AI solutions. Our streamlined approach to development ensures high-quality results and client satisfaction. We bring experience and clarity to organizations, powering every human decision with analytics & AI. Our team consists of statisticians, computer...

  • Software Engineer

    3 weeks ago


    Mysore, India Grantify Full time

    Company Description Grantify is an innovative education platform that streamlines the university admissions process through a transparent and data-driven ecosystem. By aligning student budgets and academic goals with tailored offers from universities, Grantify makes higher education more accessible and affordable. The platform enhances student...


  • Mysore, India Intuit Full time

    Overview: At Intuit, we are a mission-driven, global financial technology platform dedicated to powering prosperity around the world. We serve approximately 100 million consumers, small businesses, and self-employed individuals through our ecosystem of iconic products: TurboTax, QuickBooks, Credit Karma, and Mailchimp. As a Senior Staff Software Engineer, you...


  • India, Mysore Uplers Full time

    Job Description Experience: 3+ years. Salary: INR 2500000-4500000 / year (based on experience). Shift: (GMT+05:30) Asia/Kolkata (IST). Opportunity Type: Remote. Placement Type: Full-time permanent position. (*Note: This is a requirement for one of Uplers' clients, Nuaav.) What do you need for this opportunity? Must-have skills: Snowflake, Snowflake SQL,...


  • Mysore, Karnataka, India LTIMindtree Full time

    Company Description: LTIMindtree is a global technology consulting and digital solutions company that helps enterprises across various industries to reimagine business models, accelerate innovation, and maximize growth using digital technologies. With over 700 clients worldwide, LTIMindtree offers extensive domain and technology expertise to drive superior...

  • Senior Data Engineer

    3 weeks ago


    Mysore, India Related Account Propertyfinder FZ LLC Full time

    DUBAI-BASED ROLE. Relocation is required, with a highly competitive, tax-free salary package. Company Profile: - Property Finder is the leading digital real estate platform in the Middle East and North Africa region. - A UAE-born startup, Property Finder expanded its operations to Qatar, Bahrain, Saudi Arabia, Egypt and Turkey over the years - The...