- State Monitoring: Engineered a background "Sweeper" service that queried the database for any transactions remaining in "Scanning" status beyond a 15-minute SLA.
- Verification: The service performed a "Health Check" against Azure Blob Metadata to confirm if the Malware Scan had actually completed but failed to fire an event.
- Automated Recovery: Implemented a patch mechanism to manually inject the missing state change, re-triggering the downstream Event Hubs logic without user re-uploads.
- Outcome: Achieved 100% processing reliability and eliminated manual support tickets for "stuck" video analysis.
Case Study | Rakuten Group
Watchdog Resiliency
Mitigating "Silent Failures" in cloud-native pipelines through automated recovery and background state synchronization.
The Challenge
The video ingestion pipeline relied on Microsoft Defender for Storage to scan files for malware before processing. Seldom, random files would be uploaded but never triggered an event in the system.
This resulted in "Silent Failures" where transactions were stuck in a permanent "Scanning" state, requiring manual intervention to re-upload files and restart the flow.
Architecture
Abstracted view of the Self-Healing "Watchdog" Pattern.
Storage Layer
Azure Blob Storage + Defender Antivirus
The Gap
Missing Malware Scan Events
Watchdog Service
Background "Sweeper" triggered by CRON
Auto-Recovery
Metadata Patch & Flow Re-Trigger