Feature DOMAINCLAW-5
closed
Epic DOMAINCLAW-1: Mail-Hound Prototype — Domain Probe, Redirect Tracking, Contact Extraction
Deep Crawl Selected Domains and Extract Emails
Start date: 05/08/2026
Due date:
% Done: 100%
Estimated time:
Description
Implement the deep crawl flow for selected domains.
The crawler should collect and display:
- Crawled pages
- Redirect events
- Contact information
- Extracted email addresses
The existing redirect rules should be respected, including:
- Recording cross-host redirects.
- Handling origin pages correctly when a soft redirect is detected.
- Continuing to scrape emails from the origin page if the origin still serves valid content.
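The redirect rules above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual rules: a "soft redirect" is taken here to mean an HTML meta refresh, "valid content" is taken to mean a 200 response with a non-empty body, and the names RedirectEvent, find_soft_redirect and process_origin_page are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set, Tuple
from urllib.parse import urljoin, urlsplit

# Assumption: a soft redirect is an HTML meta refresh pointing at another URL.
META_REFRESH_RE = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)
# Deliberately simple email pattern; real-world extraction may need more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class RedirectEvent:
    source_url: str
    target_url: str
    kind: str         # "soft" in this sketch; HTTP 3xx events would use another value
    cross_host: bool  # True when source and target hostnames differ


def is_cross_host(source_url: str, target_url: str) -> bool:
    """Compare hostnames only; scheme, port and path are ignored."""
    return urlsplit(source_url).hostname != urlsplit(target_url).hostname


def find_soft_redirect(page_url: str, html: str) -> Optional[RedirectEvent]:
    """Return a redirect event if the page contains a meta refresh, else None."""
    match = META_REFRESH_RE.search(html)
    if match is None:
        return None
    target = urljoin(page_url, match.group(1))  # resolves relative targets
    return RedirectEvent(page_url, target, "soft", is_cross_host(page_url, target))


def process_origin_page(
    page_url: str, status: int, html: str
) -> Tuple[Optional[RedirectEvent], Set[str]]:
    """Record a soft redirect if present, and still scrape the origin page."""
    event = find_soft_redirect(page_url, html)
    emails: Set[str] = set()
    # Assumption: "valid content" means HTTP 200 with a non-empty body.
    if status == 200 and html.strip():
        emails = set(EMAIL_RE.findall(html))
    return event, emails


sample_html = (
    '<html><head><meta http-equiv="refresh" '
    'content="0; url=https://other.example/contact"></head>'
    "<body>Mail info@origin.example</body></html>"
)
event, emails = process_origin_page("https://origin.example/", 200, sample_html)
```

In this sketch the origin page yields both a recorded cross-host redirect and its own email addresses, so detecting the redirect does not discard content the origin still serves.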
Acceptance Criteria
- The user can start a deep crawl using the selected domains.
- The crawl result is displayed in the related UI tabs.
- The system shows data for:
  - Per-domain summary
  - Pages
  - Redirects
  - Contacts
- Extracted emails or contact records are shown when available.
- Redirects are recorded according to the existing redirect rules.
- The result reflects the domains selected by the user.
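One possible shape for the data behind the UI tabs is sketched below. The DomainResult record and the build_summary helper are hypothetical names used for illustration only; the ticket does not specify the actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class DomainResult:
    """Hypothetical per-domain crawl result backing the Pages, Redirects and Contacts tabs."""
    domain: str
    pages: List[str] = field(default_factory=list)
    redirects: List[Tuple[str, str]] = field(default_factory=list)  # (source, target)
    contacts: List[str] = field(default_factory=list)               # e.g. email addresses


def build_summary(
    results: Iterable[DomainResult], selected_domains: Iterable[str]
) -> List[Dict[str, object]]:
    """Build one summary row per selected domain; unselected domains are skipped."""
    selected = set(selected_domains)
    rows: List[Dict[str, object]] = []
    for result in results:
        if result.domain not in selected:
            continue
        rows.append(
            {
                "domain": result.domain,
                "pages": len(result.pages),
                "redirects": len(result.redirects),
                # Count distinct contacts so repeated emails are not double counted.
                "contacts": len(set(result.contacts)),
            }
        )
    return rows


results = [
    DomainResult(
        "a.example",
        pages=["https://a.example/", "https://a.example/about"],
        redirects=[("https://a.example/old", "https://b.example/new")],
        contacts=["info@a.example", "info@a.example"],
    ),
    DomainResult("c.example", pages=["https://c.example/"]),
]
summary = build_summary(results, selected_domains=["a.example"])
```

Filtering by the selected domains inside the summary step is one way to keep the displayed result aligned with the user's selection.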
Definition of Done
- The deep crawl flow runs end-to-end without blocking errors.
- Crawl results are visible in the UI.
- Contact/email extraction works for valid pages.
- Redirect data is captured consistently.
- A crawl failure on one domain does not stop the entire run.
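The failure-isolation item could be approached along these lines. This is a minimal sketch, assuming each domain is crawled by a callable that raises on failure; run_deep_crawl and the crawl_domain parameter are hypothetical names, not the project's API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_deep_crawl(
    domains: Iterable[str],
    crawl_domain: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Crawl each domain independently and collect failures instead of aborting."""
    results: Dict[str, List[str]] = {}
    failures: Dict[str, str] = {}
    for domain in domains:
        try:
            results[domain] = crawl_domain(domain)
        except Exception as exc:
            # Record the error for this domain and continue with the next one.
            failures[domain] = f"{type(exc).__name__}: {exc}"
    return results, failures


def fake_crawl(domain: str) -> List[str]:
    """Stand-in crawler for the example: one domain fails, the others succeed."""
    if domain == "broken.example":
        raise TimeoutError("connection timed out")
    return [f"https://{domain}/"]


results, failures = run_deep_crawl(
    ["a.example", "broken.example", "b.example"], fake_crawl
)
```

Keeping failures in a separate mapping lets the UI report which domains failed while still showing results for the rest.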
Updated by Redmine Admin about 1 month ago
- Assignee set to Nguyen tuan kiet
Updated by Redmine Admin about 1 month ago
- % Done changed from 0 to 100
Updated by Nguyen tuan kiet about 1 month ago
- Status changed from New to Closed