Refactor MLS sync to Active/Pending only with on-demand media

Major changes to sync strategy following MLS Grid best practices:

- Initial sync now fetches only Active/Pending properties (~30K vs 1.3M)
- Replication (incremental) fetches all changes, deletes non-Active/Pending
- On-demand media fetching replaces background queue (avoids rate limits)
- Media downloaded and cached when first viewed, not during sync
- Updated CLI commands: wp mls media status/fetch/clear
- Comprehensive documentation with troubleshooting guide

This fixes the "Value out of range" API error caused by high $skip values.

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -17,9 +17,8 @@ All tables use `{$wpdb->prefix}mls_` prefix:
 
 | Table | Purpose |
 |-------|---------|
-| `mls_properties` | Listing data |
-| `mls_media` | Media files with download queue |
-| `mls_media_log` | Media download attempt history |
+| `mls_properties` | Listing data (Active/Pending only) |
+| `mls_media` | Media metadata and cache status |
 | `mls_sync_state` | Sync progress tracking |
 | `mls_rate_limits` | API usage tracking |
 | `mls_sync_log` | Debug logging |
@@ -40,7 +39,7 @@ MUST comply with these limits:
 - 40,000 requests/day
 - 4GB data/hour
 
-Media downloads use 700ms delay (25% buffer) between requests.
+**Important**: The API rejects `$skip` values over ~80,000. Always use `@odata.nextLink` for pagination, never manual `$skip`.
 
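
The pagination rule above can be pictured as a loop that only ever requests the URL the server hands back. This is a minimal offline sketch, not the plugin's PHP client: `fetch_page`, `extract_next_link`, `count_pages`, and the canned `page1` to `page3` responses are hypothetical, and the `sed` extraction assumes single-line JSON without escaped quotes (a real client should use a JSON parser).

```bash
# Hypothetical stand-in for the real HTTP call (e.g. curl with an auth header).
# It returns canned responses so the loop can be exercised without network access.
fetch_page() {
  case "$1" in
    page1) printf '%s\n' '{"value":[{"ListingKey":"NST1"}],"@odata.nextLink":"page2"}' ;;
    page2) printf '%s\n' '{"value":[{"ListingKey":"NST2"}],"@odata.nextLink":"page3"}' ;;
    page3) printf '%s\n' '{"value":[{"ListingKey":"NST3"}]}' ;;
    *)     printf '%s\n' '{"value":[]}' ;;
  esac
}

# Print the @odata.nextLink value from a JSON body on stdin.
# Prints nothing when the key is absent, which marks the last page.
extract_next_link() {
  sed -n 's/.*"@odata\.nextLink"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Follow server-provided links until none is returned; no $skip URL is ever built.
count_pages() {
  url="$1"
  pages=0
  while [ -n "$url" ]; do
    body="$(fetch_page "$url")"
    pages=$((pages + 1))
    url="$(printf '%s\n' "$body" | extract_next_link)"
  done
  echo "$pages"
}
```

Because the next URL always comes from the previous response, the client never needs to know the current offset, which is what keeps it clear of the `$skip` ceiling.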
 ### Key Files
 
@@ -48,7 +47,7 @@ Media downloads use 700ms delay (25% buffer) between requests.
 |------|---------|
 | `includes/class-mls-api-client.php` | API communication, auth, gzip |
 | `includes/class-mls-sync-engine.php` | Sync orchestration |
-| `includes/class-mls-media-handler.php` | Media queue and download |
+| `includes/class-mls-media-handler.php` | On-demand media fetch and cache |
 | `includes/class-mls-query.php` | Public query API |
 | `includes/class-mls-rate-limiter.php` | Rate limit compliance |
 | `cli/class-mls-cli.php` | WP-CLI commands |
@@ -64,18 +63,16 @@ wp mls test auth
 wp mls status
 wp mls status rate-limits
 
-# Run property sync (queues media, does not download)
-wp mls sync full [--dry-run] [--limit=N] [--verbose]
-wp mls sync incremental [--dry-run] [--verbose]
+# Run property sync
+wp mls sync full [--dry-run] [--limit=N] [--verbose] # Initial: Active/Pending only
+wp mls sync incremental [--dry-run] [--verbose] # Replication: all changes
 wp mls sync resume --id=<sync_id>
 
-# Media download queue (separate from property sync)
-wp mls media status # Show queue stats
-wp mls media process # Download queued media (rate limited)
-wp mls media process --limit=50 --verbose
-wp mls media reset # Reset failed downloads for retry
-wp mls media logs # View download history
-wp mls media logs --clear --days=7
+# Media cache (images fetched on-demand when viewed)
+wp mls media status # Show cache statistics
+wp mls media fetch --listing=<key> # Pre-cache images for a listing
+wp mls media fetch --listing=<key> --limit=10 # Fetch up to 10 images
+wp mls media clear --listing=<key> # Clear cached images for re-fetch
 
 # Statistics
 wp mls stats
@@ -83,9 +80,6 @@ wp mls stats
 # Cache management
 wp mls cache clear --confirm
 wp mls cache cleanup
-wp mls cache missing # View failed media downloads
-wp mls cache missing --limit=20 # View first 20 entries
-wp mls cache missing --clear # Clear the log
 
 # Recovery commands
 wp mls recovery list # Show resumable syncs
@@ -93,26 +87,64 @@ wp mls recovery auto # Auto-resume most recent failed sync
 wp mls recovery cleanup # Mark stale (>1hr) syncs as failed
 ```
 
-### Media Queue System
+## Sync Strategy (IMPORTANT)
 
-Media downloads are now queue-based and separate from property sync:
+The sync follows MLS Grid best practices for replication:
 
-1. **Property sync** (`wp mls sync full/incremental`) queues media records
-2. **Media process** (`wp mls media process`) downloads queued media with rate limiting
-3. Downloads are rate-limited to 700ms between requests (under 2/sec limit)
-4. Failed downloads get 3-hour backoff before retry
-5. After 5 attempts, items are marked failed and logged
+### Initial Import (`wp mls sync full`)
 
-**Queue states:**
-- `pending` - Ready for download
-- `completed` - Successfully downloaded
-- `failed` - Max attempts reached
+- Fetches ONLY `Active` and `Pending` properties
+- Filter: `MlgCanView eq true and (StandardStatus eq 'Active' or StandardStatus eq 'Pending')`
+- Uses `@odata.nextLink` for pagination (NOT `$skip`)
+- Stores media metadata but does NOT download images
+- ~30,000 records for NorthStar MLS (vs 1.3M total including Closed)
 
-**Media table columns:**
-- `download_status` - pending/completed/failed
-- `retry_after` - Next retry time (3hr backoff on rate limit)
-- `queued_at` - When item was queued
-- `download_attempts` - Attempt count (max 5)
+### Replication (`wp mls sync incremental`)
+
+- Fetches ALL properties modified since last sync
+- NO filter on `MlgCanView` or `StandardStatus` - we need to see changes
+- For each record received:
+  - If `MlgCanView = false` -> DELETE from local DB
+  - If `StandardStatus` not in (Active, Pending) -> DELETE from local DB
+  - Otherwise -> INSERT or UPDATE
+- This handles: new listings, price changes, status changes (Active->Sold), removals
+
+### Why This Approach?
+
+1. **MLS Grid API limits `$skip` to ~80,000** - bulk scanning all 1.3M records fails
+2. **We only care about available properties** - no need to store Closed/Sold
+3. **Replication is efficient** - only fetches changed records
+4. **Proper deletion handling** - when a property sells, we remove it
+
+### Data Flow
+
+```
+Initial Import:
+  API (Active/Pending + MlgCanView=true) -> Local DB
+
+Replication (every 15 min):
+  API (ModificationTimestamp > last_sync) -> Check each record:
+    - MlgCanView=false OR Status!=Active/Pending -> DELETE locally
+    - Otherwise -> UPSERT locally
+```
+
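
The per-record replication rule in the data flow above reduces to a two-step check. A minimal sketch, assuming the two fields arrive as plain strings; `replication_action` is a hypothetical helper for illustration, not a function in the plugin.

```bash
# Decide what to do with one replicated record.
# $1 = MlgCanView ("true" or "false"), $2 = StandardStatus.
replication_action() {
  can_view="$1"
  status="$2"

  # Records the feed no longer allows us to show are removed first.
  if [ "$can_view" != "true" ]; then
    echo "DELETE"
    return
  fi

  # Only Active and Pending listings are kept locally; everything else is removed.
  case "$status" in
    Active|Pending) echo "UPSERT" ;;
    *)              echo "DELETE" ;;
  esac
}

replication_action true Active    # prints UPSERT
replication_action true Closed    # prints DELETE
replication_action false Active   # prints DELETE
```

Checking `MlgCanView` before the status means a hidden listing is deleted even if its status is still Active.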
+## Media System (On-Demand Fetching)
+
+Per MLS Grid rules, media URLs must NOT be used directly on websites. Images must be downloaded and served from our own server.
+
+**How it works:**
+1. **Property sync** stores media metadata (URLs, keys, order) but does NOT download images
+2. **On-demand fetch**: When `mls_get_property_image()` is called, the image is fetched and cached locally
+3. **Subsequent requests** serve from local cache
+4. **Pre-caching**: Use `wp mls media fetch --listing=<key>` to pre-cache specific listings
+
+**Benefits:**
+- No rate limit issues from bulk downloading
+- Images cached only when needed (saves bandwidth/storage)
+- Automatic re-fetch if cache is cleared
+- Works with MLS Grid's image URL expiration
+
+**Cache location:** `wp-content/uploads/mls-listings/{prefix}/{listing_key}/`
 
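
If a handful of listings should have an image ready before the first visitor arrives, the pre-caching command from step 4 can be looped over a list of keys. A sketch under assumptions: `precache_listings` and the key file are hypothetical, and only the `--listing` and `--limit` flags documented in this file are used.

```bash
# Pre-cache the first image for each listing key read from stdin (one key per line).
precache_listings() {
  while IFS= read -r key; do
    # Skip blank lines so an empty --listing value is never passed.
    if [ -n "$key" ]; then
      # Redirect stdin so the command cannot consume the remaining keys.
      wp mls media fetch --listing="$key" --limit=1 </dev/null
    fi
  done
}

# Usage (featured-keys.txt is a hypothetical file of listing keys):
# precache_listings < featured-keys.txt
```

Keeping `--limit=1` per listing fetches only a lead image, so a pre-cache run stays small.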
 ### Progress Output
 
@@ -121,23 +153,10 @@ Property sync (compact mode):
 - `#` = property updated
 - `x` = property deleted
 - `-` = skipped (dry-run)
-- `q` = media queued
-- `p` = media skipped (already downloaded)
 - `|` = page complete
 
-Media process (compact mode):
-- `P` = downloaded
-- `B` = backoff (retry later)
-- `E` = error
-
 With --verbose: Full timestamped output.
 
-### Missing Media Log
-
-Permanently failed media downloads logged to: `wp-content/uploads/mls-missing-media.log`
-
-Format: `[timestamp] listing_key | media_key | error | url`
-
 ### Sync Recovery
 
 The sync engine saves progress after each page:
@@ -152,16 +171,15 @@ The sync engine saves progress after each page:
 ### Recommended Cron Setup
 
 ```bash
-# Property sync every 30 minutes
-*/30 * * * * cd /var/www/html && wp mls recovery auto --quiet && wp mls sync incremental --allow-root >> /var/log/mls-sync.log 2>&1
+# Replication sync every 15 minutes (MLS Grid recommended)
+*/15 * * * * cd /var/www/html && wp mls sync incremental --allow-root >> /var/log/mls-sync.log 2>&1
 
-# Media downloads every 5 minutes (processes up to 50 items per run)
-*/5 * * * * cd /var/www/html && wp mls media process --limit=50 --quiet --allow-root >> /var/log/mls-media.log 2>&1
-
-# Full sync weekly (Sunday 3am)
-0 3 * * 0 cd /var/www/html && wp mls sync full --allow-root >> /var/log/mls-sync.log 2>&1
+# Full re-sync weekly (Sunday 3am) - rebuilds from scratch
+0 3 * * 0 cd /var/www/html && wp mls cache clear --confirm --allow-root && wp mls sync full --allow-root >> /var/log/mls-sync.log 2>&1
 ```
+
+Note: No separate media cron needed - images are fetched on-demand when properties are viewed.
 
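
As a rough check that the 15-minute cadence fits the 40,000 requests/day limit, the daily allowance can be divided across the scheduled runs. This is a back-of-envelope sketch that assumes replication is the only consumer of the quota (any other API traffic would reduce the figure); `requests_per_run` is a hypothetical helper.

```bash
# Integer request budget per cron run: daily limit divided by runs per day.
# $1 = daily request limit, $2 = minutes between runs.
requests_per_run() {
  daily_limit="$1"
  interval_minutes="$2"
  runs_per_day=$(( 24 * 60 / interval_minutes ))
  echo $(( daily_limit / runs_per_day ))
}

requests_per_run 40000 15   # 96 runs per day, prints 416
```

An incremental run that only fetches changed records should normally need far fewer pages than this, so the daily cap mainly matters for the weekly full re-sync.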
 ### Public API Functions
 
 Available for themes/plugins:
@@ -178,9 +196,19 @@ $properties = mls_get_properties([
 // Get single property
 $property = mls_get_property('NST123456');
 
-// Get media
+// Get media (on-demand fetching)
+$image_url = mls_get_property_image('NST123456'); // Fetches if not cached
+$image_url = mls_get_property_image('NST123456', false); // Return null if not cached
+
+// Get all images (fetches first N on demand)
+$images = mls_get_property_images('NST123456'); // Fetches first 1 if uncached
+$images = mls_get_property_images('NST123456', 5); // Fetches first 5 if uncached
+
+// Get media metadata (no fetch)
 $media = mls_get_property_media('NST123456');
-$image_url = mls_get_property_image('NST123456');
 
+// Get cache statistics
+$stats = mls_get_cache_stats(); // Returns total_media, cached, uncached counts
+
 // Get distinct values
 $cities = mls_get_cities('Active');
@@ -189,20 +217,12 @@ $cities = mls_get_cities('Active');
 if (mls_is_available()) { ... }
 ```
 
-### Sync Strategy
-
-1. **Property Sync**: Full/incremental sync downloads property data and queues media
-2. **Media Queue**: Separate process downloads media with rate limiting
-3. **Delete Handling**: MlgCanView=false triggers local deletion
-4. **Media Storage**: Downloads to wp-content/uploads/mls-listings/
-5. **Recovery**: Stores last_next_link for resume on failure
-
 ### Testing After Changes
 
 ```bash
 wp mls test connection
 wp mls test auth
-wp mls sync full --dry-run --limit=10
+wp mls sync full --dry-run --limit=10 --verbose
 wp mls media status
 wp mls stats
 ```
@@ -226,3 +246,28 @@ Key fields from API to database:
 | MlgCanView | mlg_can_view |
 
 Full API response stored in `raw_data` column as JSON.
+
+## Troubleshooting
+
+### "Value out of range" error
+The API is rejecting a high `$skip` value. This means pagination broke. Clear data and re-run initial sync:
+```bash
+wp mls cache clear --confirm --allow-root
+wp mls sync full --allow-root
+```
+
+### All properties showing as "Sold"
+The initial sync was run without the Active/Pending filter. Clear and re-sync:
+```bash
+wp mls cache clear --confirm --allow-root
+wp mls sync full --allow-root
+```
+
+### Media not loading
+Images are fetched on-demand. Check:
+1. `wp mls media status` - see cache stats
+2. `wp mls media fetch --listing=<key>` - manually fetch for a listing
+3. Check `wp-content/uploads/mls-listings/` directory permissions
+
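
The directory check in step 3 can be scripted. A minimal sketch, assuming it is run as the same user the web server runs as (otherwise the writability test says nothing about PHP's access); `check_cache_dir` is a hypothetical helper.

```bash
# Report whether a cache directory exists and is writable by the current user.
# Prints one of: missing, not-writable, ok.
check_cache_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing"
  elif [ ! -w "$dir" ]; then
    echo "not-writable"
  else
    echo "ok"
  fi
}

# Usage, from the WordPress root:
# check_cache_dir wp-content/uploads/mls-listings
```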
+### Sync taking too long
+Initial sync of ~30K Active/Pending properties takes about 30-45 minutes. Use `--verbose` to see progress.