Epic DOMAINCLAW-1: Mail-Hound Prototype — Domain Probe, Redirect Tracking, Contact Extraction

Feature DOMAINCLAW-5: Deep Crawl Selected Domains and Extract Emails

Added by Redmine Admin about 1 month ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Start date: 05/08/2026
Due date:
% Done: 100%
Estimated time:

Description

Implement the deep crawl flow for selected domains.

The crawler should collect and display:

  • Crawled pages
  • Redirect events
  • Contact information
  • Extracted email addresses
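As an illustration of the email extraction step, the sketch below pulls addresses out of page text. The regular expression and the function name `extract_emails` are assumptions made for this example, not the project's actual rule; the pattern is deliberately simple and does not cover every address that is technically valid.

```python
import re

# Assumed pattern: a simple local part, "@", and a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased addresses in first-seen order."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


sample = "Contact sales@example.com or Sales@Example.com, and info@mail.example.org."
emails = extract_emails(sample)
```

Lowercasing before deduplication treats `Sales@Example.com` and `sales@example.com` as one contact, which may or may not match how the Contacts tab is expected to group records.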

The existing redirect rules should be respected, including:

  • Recording cross-host redirects.
  • Handling origin pages correctly when a soft redirect is detected.
  • Continuing to scrape emails from the origin page if the origin still serves valid content.
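One possible reading of the redirect rules above is sketched below. All names (`RedirectEvent`, `is_cross_host`, `classify_redirect`) and the "http"/"soft" labels are hypothetical. The sketch assumes that "cross-host" means the two hostnames differ exactly, and that deciding whether the origin "still serves valid content" is done by the caller and passed in as a flag.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class RedirectEvent:
    origin_url: str
    target_url: str
    kind: str          # "http" or "soft"; labels are illustrative
    cross_host: bool


def is_cross_host(origin_url: str, target_url: str) -> bool:
    """True when both URLs have a hostname and the hostnames differ."""
    origin_host = urlsplit(origin_url).hostname
    target_host = urlsplit(target_url).hostname
    return bool(origin_host and target_host and origin_host != target_host)


def classify_redirect(origin_url: str, target_url: str,
                      soft: bool, origin_has_content: bool):
    """Build the event to record and decide whether to scrape the origin."""
    event = RedirectEvent(
        origin_url=origin_url,
        target_url=target_url,
        kind="soft" if soft else "http",
        cross_host=is_cross_host(origin_url, target_url),
    )
    # Assumption: the origin is scraped only for soft redirects whose
    # origin page still serves valid content.
    scrape_origin = soft and origin_has_content
    return event, scrape_origin


event, scrape_origin = classify_redirect(
    "https://example.com/contact", "https://other.example.net/",
    soft=True, origin_has_content=True,
)
```

Whether `www.example.com` and `example.com` should count as the same host is not stated in this ticket; the exact comparison here treats them as different.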

Acceptance Criteria

  • The user can start a deep crawl using the selected domains.
  • The crawl result is displayed in the related UI tabs.
  • The system shows data for:
    • Per-domain summary
    • Pages
    • Redirects
    • Contacts
  • Extracted emails or contact records are shown when available.
  • Redirects are recorded according to the existing redirect rules.
  • The result reflects the domains selected by the user.
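The four result views named in the criteria above could be backed by a per-domain record along these lines. This is only a sketch: the class `DomainResult`, its field names, and the shape of the summary are assumptions, not the prototype's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainResult:
    domain: str
    pages: list[str] = field(default_factory=list)                   # crawled page URLs
    redirects: list[tuple[str, str]] = field(default_factory=list)   # (origin, target)
    contacts: list[str] = field(default_factory=list)                # extracted emails
    error: Optional[str] = None                                      # set when the crawl failed

    def summary(self) -> dict:
        """Counts for the per-domain summary view."""
        return {
            "domain": self.domain,
            "pages": len(self.pages),
            "redirects": len(self.redirects),
            "contacts": len(self.contacts),
            "status": "failed" if self.error else "ok",
        }


result = DomainResult(
    domain="example.com",
    pages=["https://example.com/", "https://example.com/contact"],
    redirects=[("https://example.com/old", "https://example.com/contact")],
    contacts=["info@example.com"],
)
row = result.summary()
```

Keeping one record per selected domain makes it straightforward to check that the displayed result covers exactly the domains the user selected.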

Definition of Done

  • The deep crawl flow runs end-to-end without blocking errors.
  • Crawl results are visible in the UI.
  • Contact/email extraction works for valid pages.
  • Redirect data is captured consistently.
  • A crawl failure on one domain does not stop the entire run.
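The last item, isolating failures per domain, can be sketched as a loop that catches errors for each domain separately. The names `crawl_selected` and `fake_crawl` are hypothetical, and the per-domain crawl function is passed in as a parameter because its real signature is not given in this ticket.

```python
from typing import Callable


def crawl_selected(domains: list[str], crawl_one: Callable[[str], dict]) -> dict:
    """Crawl each domain independently; record failures instead of aborting."""
    results = {}
    for domain in domains:
        try:
            results[domain] = {"ok": True, "data": crawl_one(domain)}
        except Exception as exc:
            # One failing domain is recorded and the run continues.
            results[domain] = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return results


def fake_crawl(domain: str) -> dict:
    """Stand-in crawler for the example: fails for one domain."""
    if domain == "broken.example":
        raise TimeoutError("no response")
    return {"pages": 1}


results = crawl_selected(["example.com", "broken.example", "example.org"], fake_crawl)
```

Catching `Exception` broadly is a deliberate choice for this sketch so that any crawl error is reported per domain; a real implementation might narrow this to network and parsing errors.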