Case Study | Rakuten Group

Watchdog Resiliency

Mitigating "Silent Failures" in cloud-native pipelines through automated recovery and background state synchronization.

The Challenge

The video ingestion pipeline relied on Microsoft Defender for Storage to scan files for malware before processing. Seldom, random files would be uploaded but never triggered an event in the system.

This resulted in "Silent Failures" where transactions were stuck in a permanent "Scanning" state, requiring manual intervention to re-upload files and restart the flow.

Architecture

Abstracted view of the Self-Healing "Watchdog" Pattern.

Storage Layer

Azure Blob Storage + Defender Antivirus

?

The Gap

Missing Malware Scan Events

Watchdog Service

Background "Sweeper" triggered by CRON

Auto-Recovery

Metadata Patch & Flow Re-Trigger

The Solution

Watchdog Pattern

Self-Healing Implementation

  • State Monitoring: Engineered a background "Sweeper" service that queried the database for any transactions remaining in "Scanning" status beyond a 15-minute SLA.
  • Verification: The service performed a "Health Check" against Azure Blob Metadata to confirm if the Malware Scan had actually completed but failed to fire an event.
  • Automated Recovery: Implemented a patch mechanism to manually inject the missing state change, re-triggering the downstream Event Hubs logic without user re-uploads.
  • Outcome: Achieved 100% processing reliability and eliminated manual support tickets for "stuck" video analysis.