Add queue-based media download system with rate limiting

- Add download_status, retry_after, queued_at columns to mls_media table
- Add mls_media_log table for download attempt tracking
- Rewrite media handler to queue downloads instead of immediate download
- Add 700ms delay between downloads (200ms of headroom over the 500ms minimum implied by the 2/sec limit)
- Add 3-hour backoff for rate-limited (429) responses
- Add max 5 attempts before marking as permanently failed
- Add wp mls media command: status, process, reset, logs
- Deprecate wp mls sync media in favor of wp mls media process
- Update documentation with queue system details and cron examples

Media downloads are now separate from property sync:
1. wp mls sync full/incremental - syncs properties, queues media
2. wp mls media process - downloads queued media with rate limiting

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Hanson.xyz Dev
2025-12-14 22:52:58 -06:00
parent b62867d834
commit 6eadf3d266
5 changed files with 930 additions and 334 deletions
@@ -18,7 +18,8 @@ All tables use `{$wpdb->prefix}mls_` prefix:
| Table | Purpose |
|-------|---------|
| `mls_properties` | Listing data |
| `mls_media` | Media files |
| `mls_media` | Media files with download queue |
| `mls_media_log` | Media download attempt history |
| `mls_sync_state` | Sync progress tracking |
| `mls_rate_limits` | API usage tracking |
| `mls_sync_log` | Debug logging |
@@ -34,18 +35,20 @@ define('MLSGRID_ACCESS_TOKEN', 'your-token-here');
### MLS Grid API Rate Limits
MUST comply with these limits:
- 2 requests/second
- 2 requests/second (500ms minimum between requests)
- 7,200 requests/hour
- 40,000 requests/day
- 4GB data/hour
Media downloads use a 700ms delay between requests, 200ms above the 500ms minimum.
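The relationship between the delay and the per-second and per-hour limits can be checked with a short sketch. This is illustrative Python, not the plugin's PHP code; `Throttle` and `hourly_ceiling` are hypothetical names, and the sketch assumes each media download counts as one request against the limits listed above.

```python
import time

MIN_INTERVAL_MS = 500   # 2 requests/second, from the limits above
MEDIA_DELAY_MS = 700    # delay used between media downloads
HOURLY_LIMIT = 7200     # requests/hour, from the limits above


def hourly_ceiling(delay_ms: int) -> int:
    """Most requests that fit in one hour when spaced delay_ms apart."""
    return 3_600_000 // delay_ms


class Throttle:
    """Blocks so that consecutive wait() calls are at least delay_ms apart."""

    def __init__(self, delay_ms, clock=time.monotonic, sleep=time.sleep):
        self._delay = delay_ms / 1000
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        if self._last is not None:
            remaining = self._delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

At 700ms spacing, `hourly_ceiling(MEDIA_DELAY_MS)` is 5142, which stays under the 7,200 requests/hour limit even if downloads ran continuously for a full hour.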
### Key Files
| File | Purpose |
|------|---------|
| `includes/class-mls-api-client.php` | API communication, auth, gzip |
| `includes/class-mls-sync-engine.php` | Sync orchestration |
| `includes/class-mls-media-handler.php` | Media download/storage |
| `includes/class-mls-media-handler.php` | Media queue and download |
| `includes/class-mls-query.php` | Public query API |
| `includes/class-mls-rate-limiter.php` | Rate limit compliance |
| `cli/class-mls-cli.php` | WP-CLI commands |
@@ -61,12 +64,19 @@ wp mls test auth
wp mls status
wp mls status rate-limits
# Run sync (use --verbose for detailed output)
# Run property sync (queues media, does not download)
wp mls sync full [--dry-run] [--limit=N] [--verbose]
wp mls sync incremental [--dry-run] [--verbose]
wp mls sync media [--limit=N] [--verbose]
wp mls sync resume --id=<sync_id>
# Media download queue (separate from property sync)
wp mls media status # Show queue stats
wp mls media process # Download queued media (rate limited)
wp mls media process --limit=50 --verbose
wp mls media reset # Reset failed downloads for retry
wp mls media logs # View download history
wp mls media logs --clear --days=7
# Statistics
wp mls stats
@@ -83,31 +93,54 @@ wp mls recovery auto # Auto-resume most recent failed sync
wp mls recovery cleanup # Mark stale (>1hr) syncs as failed
```
### Media Queue System
Media downloads are now queue-based and separate from property sync:
1. **Property sync** (`wp mls sync full/incremental`) queues media records
2. **Media process** (`wp mls media process`) downloads queued media with rate limiting
3. Downloads are rate-limited to 700ms between requests (under 2/sec limit)
4. Rate-limited (429) downloads get a 3-hour backoff before retry
5. After 5 attempts, items are marked failed and logged
**Queue states:**
- `pending` - Ready for download
- `completed` - Successfully downloaded
- `failed` - Max attempts reached
**Media table columns:**
- `download_status` - pending/completed/failed
- `retry_after` - Next retry time (3hr backoff on rate limit)
- `queued_at` - When item was queued
- `download_attempts` - Attempt count (max 5)
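The queue states and columns above can be read as a small state machine. The following is a minimal Python sketch of that reading, not the handler's actual PHP logic. `MediaItem`, `is_due`, and `record_attempt` are hypothetical names. It assumes that every attempt (including a 429) increments `download_attempts`, and that non-429 errors become eligible again on the next run; neither detail is confirmed by the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_ATTEMPTS = 5                          # "max 5" from the column notes
RATE_LIMIT_BACKOFF = timedelta(hours=3)   # 3hr backoff on rate limit


@dataclass
class MediaItem:
    download_status: str = "pending"      # pending / completed / failed
    download_attempts: int = 0
    retry_after: Optional[datetime] = None


def is_due(item: MediaItem, now: datetime) -> bool:
    """A pending item is due once any retry_after time has passed."""
    if item.download_status != "pending":
        return False
    return item.retry_after is None or item.retry_after <= now


def record_attempt(item: MediaItem, http_status: int, now: datetime) -> MediaItem:
    """Apply the outcome of one download attempt to the queue row."""
    item.download_attempts += 1
    if http_status == 200:
        item.download_status = "completed"
        item.retry_after = None
    elif item.download_attempts >= MAX_ATTEMPTS:
        item.download_status = "failed"   # permanently failed
        item.retry_after = None
    elif http_status == 429:
        item.retry_after = now + RATE_LIMIT_BACKOFF
    else:
        item.retry_after = None           # assumption: retry on the next run
    return item
```

Keeping the transition in one function makes the two terminal states (`completed`, `failed`) easy to audit: an item leaves `pending` only through a 200 response or the fifth attempt.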
### Progress Output
Without --verbose (compact mode):
Property sync (compact mode):
- `.` = new property created
- `#` = property updated
- `x` = property deleted
- `-` = skipped (dry-run)
- `P` = photo downloaded
- `p` = photo skipped (already exists)
- `E` = photo error
- `q` = media queued
- `p` = media skipped (already downloaded)
- `|` = page complete
With --verbose: Full timestamped output showing API requests, responses, and individual item status.
Media process (compact mode):
- `P` = downloaded
- `B` = backoff (retry later)
- `E` = error
With --verbose: Full timestamped output.
### Missing Media Log
Failed media downloads are logged to: `wp-content/uploads/mls-missing-media.log`
Permanently failed media downloads are logged to: `wp-content/uploads/mls-missing-media.log`
Format: `[timestamp] listing_key | media_key | error | url`
Media downloads use exponential backoff (1s, 2s, 4s, 8s, 16s) for rate limit (429) and server errors (5xx).
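For anyone post-processing the missing media log, the `[timestamp] listing_key | media_key | error | url` format can be split as in the Python sketch below. `MissingMedia` and `parse_missing_media_line` are hypothetical names. The timestamp is kept as an opaque string because its exact format is not documented here, and the sketch assumes the URL is always the last field so that an error message containing ` | ` still parses.

```python
from typing import NamedTuple, Optional


class MissingMedia(NamedTuple):
    timestamp: str
    listing_key: str
    media_key: str
    error: str
    url: str


def parse_missing_media_line(line: str) -> Optional[MissingMedia]:
    """Parse one log line; return None if it does not match the format."""
    line = line.strip()
    if not line.startswith("["):
        return None
    end = line.find("]")
    if end == -1:
        return None
    timestamp = line[1:end]
    fields = [f.strip() for f in line[end + 1:].split(" | ")]
    if len(fields) < 4:
        return None
    # First two fields are keys, last is the URL, anything between is the error.
    error = " | ".join(fields[2:-1])
    return MissingMedia(timestamp, fields[0], fields[1], error, fields[-1])
```

Lines that do not match return `None`, so a caller can skip blank or truncated lines without raising.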
### Sync Recovery
The sync engine saves progress after each page, allowing interrupted syncs to resume:
The sync engine saves progress after each page:
1. **Automatic state tracking**: `last_next_link` saved after each API page
2. **Stale sync detection**: Syncs running >1 hour marked as failed
@@ -116,9 +149,17 @@ The sync engine saves progress after each page, allowing interrupted syncs to re
- `wp mls recovery auto` - Auto-resume most recent failed sync
- `wp mls recovery list` - View all resumable syncs
For cron jobs, consider adding recovery at the start:
### Recommended Cron Setup
```bash
wp mls recovery auto --quiet && wp mls sync incremental
# Property sync every 30 minutes
*/30 * * * * cd /var/www/html && wp mls recovery auto --quiet --allow-root && wp mls sync incremental --allow-root >> /var/log/mls-sync.log 2>&1
# Media downloads every 5 minutes (processes up to 50 items per run)
*/5 * * * * cd /var/www/html && wp mls media process --limit=50 --quiet --allow-root >> /var/log/mls-media.log 2>&1
# Full sync weekly (Sunday 3am)
0 3 * * 0 cd /var/www/html && wp mls sync full --allow-root >> /var/log/mls-sync.log 2>&1
```
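A quick back-of-the-envelope check of the media schedule above, written as a Python sketch. `cron_throughput` is a hypothetical helper. It counts only the 700ms delay between downloads and ignores transfer time, so real runs take longer than the figure shown.

```python
def cron_throughput(limit_per_run: int, runs_per_hour: int, delay_ms: int):
    """Return (minimum seconds per run, max items per hour, max items per day)."""
    run_seconds = limit_per_run * delay_ms / 1000
    per_hour = limit_per_run * runs_per_hour
    per_day = per_hour * 24
    return run_seconds, per_hour, per_day


# --limit=50 every 5 minutes (12 runs/hour) with a 700ms delay
run_seconds, per_hour, per_day = cron_throughput(50, 12, 700)
print(run_seconds, per_hour, per_day)  # 35.0 600 14400
```

With these settings a run spends at least 35 seconds on delays, well inside the 5-minute interval, and the schedule caps media downloads at 600 per hour and 14,400 per day. If media requests count toward the 7,200/hour and 40,000/day API limits, that leaves room for property sync traffic; raising `--limit` or the cron frequency changes these numbers proportionally.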
### Public API Functions
@@ -150,10 +191,10 @@ if (mls_is_available()) { ... }
### Sync Strategy
1. **Initial Import**: Full sync downloads all viewable properties
2. **Incremental**: Uses ModificationTimestamp to fetch only changes
1. **Property Sync**: Full/incremental sync downloads property data and queues media
2. **Media Queue**: Separate process downloads media with rate limiting
3. **Delete Handling**: MlgCanView=false triggers local deletion
4. **Media**: Downloads to wp-content/uploads/mls-listings/
4. **Media Storage**: Downloads to wp-content/uploads/mls-listings/
5. **Recovery**: Stores last_next_link for resume on failure
### Testing After Changes
@@ -162,6 +203,7 @@ if (mls_is_available()) { ... }
wp mls test connection
wp mls test auth
wp mls sync full --dry-run --limit=10
wp mls media status
wp mls stats
```