Epic DOMAINCLAW-1: Mail-Hound Prototype — Domain Probe, Redirect Tracking, Contact Extraction

Feature DOMAINCLAW-3: Export CSV/JSON and Per-Run Logging

Added by Redmine Admin about 1 month ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Start date: 05/08/2026
Due date:
% Done: 100%
Estimated time:

Description

Objective

Run a deeper crawl only on the domains selected from the fast precheck result and extract useful crawl/contact data.

Description

Implement the deep crawl flow for selected domains.

The crawler should collect and display:

  • Crawled pages
  • Redirect events
  • Contact information
  • Extracted email addresses

The existing redirect rules should be respected, including:

  • Recording cross-host redirects.
  • Handling origin pages correctly when a soft redirect is detected.
  • Continuing to scrape emails from the origin page if the origin still serves valid content.
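Note: the existing redirect rules are not spelled out in this ticket. As a rough illustration of the "cross-host" check only, a minimal sketch could compare the hostnames of the origin and final URLs. The function names, the record fields, and the decision to treat `www.example.com` and `example.com` as different hosts are all assumptions, not the project's actual rules.

```python
from urllib.parse import urlsplit


def is_cross_host(origin_url: str, final_url: str) -> bool:
    """Return True when the redirect target is on a different host than the origin.

    Assumption: hosts are compared literally (case-insensitive), so a move
    from example.com to www.example.com counts as cross-host here.
    """
    origin_host = (urlsplit(origin_url).hostname or "").lower()
    final_host = (urlsplit(final_url).hostname or "").lower()
    return origin_host != final_host


def redirect_event(origin_url: str, final_url: str, kind: str) -> dict:
    """Build one redirect record (hypothetical field names)."""
    return {
        "from_url": origin_url,
        "to_url": final_url,
        "kind": kind,  # e.g. "http" or "soft"; the labels are assumptions
        "cross_host": is_cross_host(origin_url, final_url),
    }
```
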

Acceptance Criteria

  • The user can start a deep crawl using the selected domains.
  • The crawl result is displayed in the related UI tabs.
  • The system shows data for:
    • Per-domain summary
    • Pages
    • Redirects
    • Contacts
  • Extracted emails or contact records are shown when available.
  • Redirects are recorded according to the existing redirect rules.
  • The result reflects the domains selected by the user.

Definition of Done

  • The deep crawl flow runs end-to-end without blocker errors.
  • Crawl results are visible in the UI.
  • Contact/email extraction works for valid pages.
  • Redirect data is captured consistently.
  • A crawl failure on one domain does not stop the entire run.
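Note: the last point (one failing domain must not stop the run) can be sketched as a per-domain try/except loop. This is only a sketch; `crawl_domain` stands in for whatever function performs the deep crawl, and the result fields are hypothetical.

```python
import logging

logger = logging.getLogger("deep_crawl")


def run_deep_crawl(domains, crawl_domain):
    """Crawl each selected domain, isolating failures per domain.

    `crawl_domain` is a hypothetical callable that takes a domain name and
    returns its crawl data, raising an exception on failure.
    """
    results = []
    for domain in domains:
        try:
            data = crawl_domain(domain)
            results.append({"domain": domain, "status": "ok", "data": data})
        except Exception as exc:  # broad on purpose: keep the run alive
            logger.exception("Crawl failed for %s", domain)
            results.append({"domain": domain, "status": "failed", "error": str(exc)})
    return results
```

Recording the failed domain in the results, instead of skipping it silently, keeps the per-domain summary aligned with the user's selection.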

Sub-task 3: Export CSV/JSON and Per-Run Logging

Type

Feature / Technical

Estimate

1 SP

Objective

Store every crawl run in a structured and traceable format so that users can review, debug, or share the results after the run is complete.

Description

Each run should generate a unique run ID.

For every run, the system should create an output folder using the following structure:

exports/<run_id>/
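
Note: the ticket does not fix a run ID format. One possible scheme, shown here as a sketch, combines a UTC timestamp with a short random suffix so that folders sort by time and still stay unique. The helper names and the ID format are assumptions.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def new_run_id() -> str:
    """Return a sortable run ID (assumed format: UTC timestamp + random suffix)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def create_run_dir(base_dir="exports") -> Path:
    """Create exports/<run_id>/ and return its path."""
    run_dir = Path(base_dir) / new_run_id()
    # exist_ok=False makes an ID collision fail loudly instead of
    # letting two runs write into the same folder.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```
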

The following export files must be generated:

  • summary.csv
  • pages.csv
  • redirects.csv
  • contacts.csv
  • results.json
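
Note: a minimal sketch of writing these five files with the standard library follows. It assumes the crawl result is a dict with the keys `summary`, `pages`, `redirects`, and `contacts`, each holding a list of flat dicts; that shape and the function name are assumptions, not the actual data model.

```python
import csv
import json
from pathlib import Path

# Assumed mapping: one CSV per table, named after the result key.
TABLES = ("summary", "pages", "redirects", "contacts")


def write_exports(run_dir, results: dict) -> None:
    """Write summary/pages/redirects/contacts CSVs plus results.json."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    for name in TABLES:
        rows = results.get(name, [])
        # Column order: union of keys in first-seen order across all rows.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        with open(run_dir / f"{name}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(rows)
    with open(run_dir / "results.json", "w", encoding="utf-8") as fh:
        json.dump(results, fh, ensure_ascii=False, indent=2)
```

Writing the CSVs and the JSON from the same in-memory result that feeds the UI is one way to keep the exports and the UI tabs consistent.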

A separate log file should also be created for each run under:

logs/
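
Note: a per-run log file can be sketched with Python's standard `logging` module. The file name `logs/<run_id>.log`, the logger name, and the message format are assumptions; the ticket only requires one log file per run under logs/.

```python
import logging
from pathlib import Path


def open_run_logger(run_id: str, log_dir="logs"):
    """Return (logger, handler) writing to <log_dir>/<run_id>.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"crawl.{run_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep this run's lines out of other handlers
    handler = logging.FileHandler(Path(log_dir) / f"{run_id}.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


def close_run_logger(logger, handler) -> None:
    """Detach and close the handler so the log file is flushed to disk."""
    logger.removeHandler(handler)
    handler.close()
```

Including the domain in each message (for example `logger.error("crawl failed: %s", domain)`) makes failed domains easy to find afterwards.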

The exported data should match what is shown in the UI.

Acceptance Criteria

  • Each run creates a unique output folder under exports/<run_id>/.
  • The following required files are created after each run:
    • summary.csv
    • pages.csv
    • redirects.csv
    • contacts.csv
    • results.json
  • A dedicated log file is created for each run.
  • Exported CSV/JSON data can be opened and read successfully.
  • Exported data matches the data displayed in the UI.
  • Logs contain enough information to trace errors or failed domains.
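
Note: the "created" and "can be opened and read" criteria could be checked with a small read-back script such as the sketch below. The function name is hypothetical, and the check does not cover the comparison against the UI.

```python
import csv
import json
from pathlib import Path

REQUIRED_FILES = (
    "summary.csv",
    "pages.csv",
    "redirects.csv",
    "contacts.csv",
    "results.json",
)


def check_run_exports(run_dir) -> list:
    """Return a list of problems found in one exports/<run_id>/ folder.

    An empty list means every required file exists and parses.
    """
    problems = []
    for name in REQUIRED_FILES:
        path = Path(run_dir) / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        try:
            with open(path, newline="", encoding="utf-8") as fh:
                if name.endswith(".json"):
                    json.load(fh)
                else:
                    list(csv.DictReader(fh))  # force a full read
        except (OSError, ValueError, csv.Error) as exc:
            problems.append(f"unreadable: {name} ({exc})")
    return problems
```
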

Definition of Done

  • Output folder structure is stable and easy to inspect.
  • Export files are not lost when the UI session ends.
  • Logs are persisted per run.
  • A third party can review the exported files without needing access to the running UI.