# MLS by HansonXyz Plugin

WordPress plugin for syncing MLS Grid API data (NorthStar MLS) into a local database.

## Development Rules

1. **No emojis** - nowhere in code, commits, docs, or conversation
2. **PHP 7.4+** compatible code
3. **WordPress Coding Standards**
4. Follow patterns from the existing HomeProz theme

## Quick Reference

### Database Tables

All tables use the `{$wpdb->prefix}mls_` prefix (the names below omit `{$wpdb->prefix}`):

| Table | Purpose |
|-------|---------|
| `mls_properties` | Listing data (Active/Pending only) |
| `mls_media` | Media metadata and cache status |
| `mls_sync_state` | Sync progress tracking |
| `mls_rate_limits` | API usage tracking |
| `mls_sync_log` | Debug logging |

### API Configuration

Credentials in `wp-config.php`:

```php
define('MLSGRID_API_URL', 'https://api.mlsgrid.com/v2');
define('MLSGRID_ACCESS_TOKEN', 'your-token-here');
```

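A minimal sketch of reading these constants defensively, so a missing or empty token is caught before any request is made. `mls_example_get_api_config()` is a hypothetical name, not part of the plugin; the real API client may handle missing credentials differently.

```php
<?php
/**
 * Hypothetical helper (not part of the plugin): read MLS Grid credentials
 * from the wp-config.php constants. Returns null when either constant is
 * missing or empty so callers can bail out before making any request.
 *
 * @return array|null Array with 'url' and 'token' keys, or null if not configured.
 */
function mls_example_get_api_config(): ?array {
    if ( ! defined( 'MLSGRID_API_URL' ) || ! defined( 'MLSGRID_ACCESS_TOKEN' ) ) {
        return null;
    }

    // Normalize: no trailing slash on the base URL, no stray whitespace in the token.
    $url   = rtrim( (string) constant( 'MLSGRID_API_URL' ), '/' );
    $token = trim( (string) constant( 'MLSGRID_ACCESS_TOKEN' ) );

    if ( '' === $url || '' === $token ) {
        return null;
    }

    return array(
        'url'   => $url,
        'token' => $token,
    );
}
```

MLS Grid expects the token as a bearer token (`Authorization: Bearer <token>`) on each request.
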
### MLS Grid API Rate Limits

MUST comply with these limits:

- 2 requests/second (500ms minimum between requests)
- 7,200 requests/hour
- 40,000 requests/day
- 4GB data/hour

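The limits above can be enforced with a sliding-window check before each request. This is a hypothetical in-memory sketch (`Mls_Example_Rate_Limiter` is not the plugin's class); it assumes rolling windows and treats 4GB as 4 * 1024^3 bytes. The real `class-mls-rate-limiter.php` persists usage in `mls_rate_limits` and may count differently.

```php
<?php
/**
 * Hypothetical in-memory limiter illustrating the MLS Grid limits.
 * Assumptions: rolling (sliding) windows; 4GB = 4 * 1024^3 bytes.
 */
class Mls_Example_Rate_Limiter {
    const MIN_INTERVAL       = 0.5;   // Seconds between requests (2 requests/second).
    const MAX_PER_HOUR       = 7200;
    const MAX_PER_DAY        = 40000;
    const MAX_BYTES_PER_HOUR = 4 * 1024 * 1024 * 1024;

    /** @var array Request log, oldest first; each entry is [timestamp, bytes]. */
    private $log = array();

    /** Seconds to wait before the next request may be sent (0.0 means send now). */
    public function wait_time( float $now ): float {
        // Forget anything older than the 24-hour window.
        $this->log = $this->entries_since( $now - 86400 );
        $wait      = 0.0;

        // Minimum spacing between consecutive requests.
        if ( $this->log ) {
            $last = $this->log[ count( $this->log ) - 1 ][0];
            $wait = max( $wait, $last + self::MIN_INTERVAL - $now );
        }

        // Hourly request and byte caps: wait until the oldest entry in the hour expires.
        // For the byte cap this is an approximation; one expiry may not free enough.
        $hour = $this->entries_since( $now - 3600 );
        if ( count( $hour ) >= self::MAX_PER_HOUR
            || array_sum( array_column( $hour, 1 ) ) >= self::MAX_BYTES_PER_HOUR ) {
            $wait = max( $wait, $hour[0][0] + 3600 - $now );
        }

        // Daily request cap.
        if ( count( $this->log ) >= self::MAX_PER_DAY ) {
            $wait = max( $wait, $this->log[0][0] + 86400 - $now );
        }

        return $wait;
    }

    /** Record a request that was just sent, with its response size in bytes. */
    public function record( float $now, int $bytes = 0 ): void {
        $this->log[] = array( $now, $bytes );
    }

    /** Log entries newer than $cutoff, re-indexed from 0. */
    private function entries_since( float $cutoff ): array {
        return array_values( array_filter( $this->log, fn( $entry ) => $entry[0] > $cutoff ) );
    }
}
```

A caller would sleep for the returned time (for example `usleep( (int) ceil( $wait * 1000000 ) )`) before each request, then call `record()` with the response size.
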
**Important**: The API rejects `$skip` values over ~80,000. Always use `@odata.nextLink` for pagination, never manual `$skip`.

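Following `@odata.nextLink` means only the first URL is ever built by hand. A minimal sketch with the hypothetical name `mls_example_follow_next_links()`; the HTTP layer is injected as a callable so the paging logic stands alone, and it assumes the standard OData response shape (a `value` array plus an optional `@odata.nextLink`).

```php
<?php
/**
 * Hypothetical helper: process every page of an OData result by following
 * @odata.nextLink. HTTP, auth, gzip and rate limiting live in $fetch_page.
 *
 * @param string   $first_url  Initial request URL (the only one built by hand).
 * @param callable $fetch_page function ( string $url ): array, the decoded JSON response.
 * @param callable $on_page    function ( array $records, ?string $next_link ): void.
 * @return int Number of records handed to $on_page.
 */
function mls_example_follow_next_links( string $first_url, callable $fetch_page, callable $on_page ): int {
    $url   = $first_url;
    $total = 0;

    while ( null !== $url ) {
        $response = $fetch_page( $url );
        $records  = $response['value'] ?? array();

        // The server supplies the next URL; never compute one with $skip.
        $url = $response['@odata.nextLink'] ?? null;

        // Passing the next link lets the caller persist it as a resume point.
        $on_page( $records, $url );
        $total += count( $records );
    }

    return $total;
}
```

Because each page's next link is handed to `$on_page`, the caller can store it (as `last_next_link`) after every page, which is what makes resuming an interrupted sync possible.
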
### Key Files

| File | Purpose |
|------|---------|
| `includes/class-mls-api-client.php` | API communication, auth, gzip |
| `includes/class-mls-sync-engine.php` | Sync orchestration |
| `includes/class-mls-media-handler.php` | On-demand media fetch and cache |
| `includes/class-mls-query.php` | Public query API |
| `includes/class-mls-rate-limiter.php` | Rate limit compliance |
| `cli/class-mls-cli.php` | WP-CLI commands |

### WP-CLI Commands

```bash
# Test connectivity
wp mls test connection
wp mls test auth

# Show status
wp mls status
wp mls status rate-limits

# Run property sync
wp mls sync full [--dry-run] [--limit=N] [--verbose]   # Initial: Active/Pending only
wp mls sync incremental [--dry-run] [--verbose]        # Replication: all changes
wp mls sync resume --id=<sync_id>

# Media cache (images fetched on-demand when viewed)
wp mls media status                             # Show cache statistics
wp mls media fetch --listing=<key>              # Pre-cache images for a listing
wp mls media fetch --listing=<key> --limit=10   # Fetch up to 10 images
wp mls media clear --listing=<key>              # Clear cached images for re-fetch

# Statistics
wp mls stats

# Cache management
wp mls cache clear --confirm
wp mls cache cleanup

# Recovery commands
wp mls recovery list      # Show resumable syncs
wp mls recovery auto      # Auto-resume most recent failed sync
wp mls recovery cleanup   # Mark stale (>1hr) syncs as failed
```

## Sync Strategy (IMPORTANT)

The sync follows MLS Grid best practices for replication:

### Initial Import (`wp mls sync full`)

- Fetches ONLY `Active` and `Pending` properties
- Filter: `MlgCanView eq true and (StandardStatus eq 'Active' or StandardStatus eq 'Pending')`
- Uses `@odata.nextLink` for pagination (NOT `$skip`)
- Stores media metadata but does NOT download images
- ~30,000 records for NorthStar MLS (vs 1.3M total including Closed)

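The first request of the initial import could be built as below. A sketch only: `mls_example_initial_import_url()` is hypothetical, the `/Property` resource path is assumed from the MLS Grid v2 API, and any other query options the real client sends (such as `$expand` or `$top`) are not shown.

```php
<?php
/**
 * Hypothetical sketch: build the first request URL for the initial import.
 * Every later page comes from @odata.nextLink, never from this function.
 */
function mls_example_initial_import_url( string $api_url ): string {
    $filter = "MlgCanView eq true and (StandardStatus eq 'Active' or StandardStatus eq 'Pending')";

    // Keep the literal "$filter" key (single-quoted, so PHP does not interpolate it);
    // percent-encode only the filter expression itself.
    return rtrim( $api_url, '/' ) . '/Property?$filter=' . rawurlencode( $filter );
}
```
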
### Replication (`wp mls sync incremental`)

- Fetches ALL properties modified since last sync
- NO filter on `MlgCanView` or `StandardStatus` - records that became hidden or left Active/Pending must still come through so they can be deleted
- For each record received:
  - If `MlgCanView = false` -> DELETE from local DB
  - If `StandardStatus` not in (Active, Pending) -> DELETE from local DB
  - Otherwise -> INSERT or UPDATE
- This handles: new listings, price changes, status changes (Active->Sold), removals

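The per-record rule above reduces to a small decision function. This hypothetical `mls_example_replication_action()` assumes records arrive as decoded JSON arrays; treating a missing `MlgCanView` as false is an assumption of the sketch, not documented sync-engine behavior.

```php
<?php
/**
 * Hypothetical sketch of the replication rule for one incremental record.
 * Returns 'delete' or 'upsert'.
 */
function mls_example_replication_action( array $record ): string {
    $keep_statuses = array( 'Active', 'Pending' );

    // MlgCanView is a JSON boolean; anything but true (including missing) means "hide".
    if ( true !== ( $record['MlgCanView'] ?? false ) ) {
        return 'delete';
    }

    // Closed, Canceled, Expired, etc. are removed from the local DB.
    if ( ! in_array( $record['StandardStatus'] ?? '', $keep_statuses, true ) ) {
        return 'delete';
    }

    return 'upsert';
}
```

In a sync loop, `delete` would remove the row by `ListingKey` (deleting a row that was never stored is harmless) and `upsert` would insert or update it.
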
### Why This Approach?

1. **MLS Grid API limits `$skip` to ~80,000** - bulk scanning all 1.3M records fails
2. **We only care about available properties** - no need to store Closed/Sold
3. **Replication is efficient** - only fetches changed records
4. **Proper deletion handling** - when a property sells, we remove it

### Data Flow

```
Initial Import:
  API (Active/Pending + MlgCanView=true) -> Local DB

Replication (every 15 min):
  API (ModificationTimestamp > last_sync) -> Check each record:
    - MlgCanView=false OR Status!=Active/Pending -> DELETE locally
    - Otherwise -> UPSERT locally
```

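The replication request filters on `ModificationTimestamp` only. A sketch assuming `last_sync` holds the newest `ModificationTimestamp` already stored (any ISO 8601 string); the function name is hypothetical, and the exact literal format the real client sends is an assumption.

```php
<?php
/**
 * Hypothetical sketch: build the replication $filter from the stored
 * high-water mark. Deliberately no MlgCanView/StandardStatus clause.
 */
function mls_example_replication_filter( string $last_sync ): string {
    // Normalize whatever offset was stored to UTC.
    $since = ( new DateTimeImmutable( $last_sync ) )->setTimezone( new DateTimeZone( 'UTC' ) );

    // OData DateTimeOffset literals are unquoted ISO 8601 values; 'v' = milliseconds.
    return 'ModificationTimestamp gt ' . $since->format( 'Y-m-d\TH:i:s.v\Z' );
}
```

Using the newest stored timestamp, rather than the wall-clock time of the last run, is one common way to avoid missing records when clocks or run times drift.
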
## Media System (On-Demand Fetching)

Per MLS Grid rules, media URLs must NOT be used directly on websites. Images must be downloaded and served from our own server.

**How it works:**

1. **Property sync** stores media metadata (URLs, keys, order) but does NOT download images
2. **On-demand fetch**: When `mls_get_property_image()` is called, the image is fetched and cached locally
3. **Subsequent requests** are served from the local cache
4. **Pre-caching**: Use `wp mls media fetch --listing=<key>` to pre-cache specific listings

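The fetch-once, serve-from-cache flow can be sketched as follows. `mls_example_cached_image()` is hypothetical: the downloader is injected so the sketch runs outside WordPress, and it returns a local file path rather than the public URL that `mls_get_property_image()` returns.

```php
<?php
/**
 * Hypothetical sketch of the on-demand media cache.
 * $download takes the remote media URL and returns the image bytes, or null
 * on failure. File naming, HTTP details and mls_media bookkeeping are omitted.
 */
function mls_example_cached_image( string $cache_file, string $remote_url, callable $download, bool $fetch = true ): ?string {
    if ( is_file( $cache_file ) ) {
        return $cache_file; // Already cached: serve locally, no API call.
    }
    if ( ! $fetch ) {
        return null; // Mirrors the `false` argument of mls_get_property_image().
    }

    $bytes = $download( $remote_url );
    if ( null === $bytes || '' === $bytes ) {
        return null;
    }

    $dir = dirname( $cache_file );
    if ( ! is_dir( $dir ) && ! mkdir( $dir, 0755, true ) && ! is_dir( $dir ) ) {
        return null;
    }

    // Write to a temporary name, then rename, so a half-written file is never served.
    $tmp = $cache_file . '.tmp';
    if ( false === file_put_contents( $tmp, $bytes ) || ! rename( $tmp, $cache_file ) ) {
        return null;
    }

    return $cache_file;
}
```

Inside WordPress the same shape would typically use `wp_remote_get()` for the download and `wp_mkdir_p()` for the directory. The sketch does not lock, so two concurrent first views of the same image may both download it.
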
**Benefits:**

- No rate limit issues from bulk downloading
- Images cached only when needed (saves bandwidth/storage)
- Automatic re-fetch if cache is cleared
- Works with MLS Grid's image URL expiration

**Cache location:** `wp-content/uploads/mls-listings/{prefix}/{listing_key}/`

### Progress Output

Property sync (compact mode):

- `.` = new property created
- `#` = property updated
- `x` = property deleted
- `-` = skipped (dry-run)
- `|` = page complete

With `--verbose`: full timestamped output.

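The compact symbols map directly from the action taken on each record. A hypothetical sketch; `mls_example_progress_line()` and the action names are illustrative, not the sync engine's internals.

```php
<?php
/**
 * Hypothetical sketch: compact progress output for one API page.
 * One character per record, then '|' when the page is complete.
 */
function mls_example_progress_line( array $actions ): string {
    $symbols = array(
        'created' => '.',
        'updated' => '#',
        'deleted' => 'x',
        'skipped' => '-', // dry-run
    );

    $line = '';
    foreach ( $actions as $action ) {
        // Unknown actions print '?' so they stand out instead of vanishing.
        $line .= $symbols[ $action ] ?? '?';
    }

    return $line . '|'; // Page complete.
}
```
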
### Sync Recovery

The sync engine saves progress after each page:

1. **Automatic state tracking**: `last_next_link` is saved after each API page
2. **Stale sync detection**: Syncs running >1 hour are marked as failed (also available manually via `wp mls recovery cleanup`)
3. **Resume commands**:
   - `wp mls sync resume --id=<ID>` - Resume specific sync
   - `wp mls recovery auto` - Auto-resume most recent failed sync
   - `wp mls recovery list` - View all resumable syncs

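Stale detection and auto-resume come down to two small functions over the sync state rows. A sketch under assumptions: rows are plain arrays, and apart from `last_next_link` the field names `status` and `started_at` (Unix seconds) are hypothetical, as are the function names.

```php
<?php
/**
 * Hypothetical sketch: mark syncs that have been "running" too long as failed.
 */
function mls_example_mark_stale( array $syncs, int $now, int $stale_after = 3600 ): array {
    foreach ( $syncs as &$sync ) {
        // A sync still "running" after an hour is assumed to have died.
        if ( 'running' === $sync['status'] && $now - $sync['started_at'] > $stale_after ) {
            $sync['status'] = 'failed';
        }
    }
    unset( $sync ); // Break the reference left by foreach.

    return $syncs;
}

/** The URL to resume from for the most recent failed sync, or null if none can resume. */
function mls_example_resume_url( array $syncs ): ?string {
    $resumable = array_filter(
        $syncs,
        fn( $sync ) => 'failed' === $sync['status'] && ! empty( $sync['last_next_link'] )
    );
    if ( ! $resumable ) {
        return null;
    }

    // Most recent first.
    usort( $resumable, fn( $a, $b ) => $b['started_at'] <=> $a['started_at'] );

    return $resumable[0]['last_next_link'];
}
```
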
### Recommended Cron Setup

```bash
# Replication sync every 15 minutes (MLS Grid recommended)
*/15 * * * * cd /var/www/html && wp mls sync incremental --allow-root >> /var/log/mls-sync.log 2>&1

# Full re-sync weekly (Sunday 3am) - rebuilds from scratch
# Parentheses send the output of both commands to the log, not just the sync's
0 3 * * 0 cd /var/www/html && (wp mls cache clear --confirm --allow-root && wp mls sync full --allow-root) >> /var/log/mls-sync.log 2>&1
```

Note: No separate media cron is needed - images are fetched on demand when properties are viewed.

### Public API Functions

Available for themes/plugins:

```php
// Get properties with filters
$properties = mls_get_properties([
    'status'    => 'Active',
    'city'      => 'Albert Lea',
    'min_price' => 100000,
    'limit'     => 20,
]);

// Get single property
$property = mls_get_property('NST123456');

// Get media (on-demand fetching)
$image_url = mls_get_property_image('NST123456');        // Fetches if not cached
$image_url = mls_get_property_image('NST123456', false); // Returns null if not cached

// Get all images (fetches the first N on demand)
$images = mls_get_property_images('NST123456');    // Fetches only the first image if uncached
$images = mls_get_property_images('NST123456', 5); // Fetches the first 5 if uncached

// Get media metadata (no fetch)
$media = mls_get_property_media('NST123456');

// Get cache statistics
$stats = mls_get_cache_stats(); // Returns total_media, cached, uncached counts

// Get distinct values
$cities = mls_get_cities('Active');

// Check data availability
if (mls_is_available()) {
    // Safe to query and render listings here.
}
```

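A theme might turn a row from `mls_get_properties()` into a one-line summary like this. Hypothetical helper, assuming each row is an associative array keyed by the DB columns in the Property Data Mapping table and that `living_area` is in square feet; if rows come back as objects, cast them with `(array)` first.

```php
<?php
/**
 * Hypothetical display helper: one-line summary of a property row.
 * Assumes keys match the DB columns (list_price, bedrooms_total, ...).
 */
function mls_example_property_summary( array $row ): string {
    $parts = array();

    if ( isset( $row['list_price'] ) ) {
        $parts[] = '$' . number_format( (float) $row['list_price'] );
    }
    if ( isset( $row['bedrooms_total'] ) ) {
        $parts[] = (int) $row['bedrooms_total'] . ' bd';
    }
    if ( isset( $row['bathrooms_total'] ) ) {
        $parts[] = (int) $row['bathrooms_total'] . ' ba';
    }
    if ( isset( $row['living_area'] ) ) {
        $parts[] = number_format( (float) $row['living_area'] ) . ' sq ft'; // Unit is an assumption.
    }
    if ( ! empty( $row['city'] ) ) {
        $parts[] = $row['city'];
    }

    // Missing fields are simply skipped.
    return implode( ' | ', $parts );
}
```

When echoing the result into a template, escape it, e.g. `echo esc_html( mls_example_property_summary( $row ) );`.
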
### Testing After Changes

```bash
wp mls test connection
wp mls test auth
wp mls sync full --dry-run --limit=10 --verbose
wp mls media status
wp mls stats
```

### Property Data Mapping

Key fields from API to database:

| API Field | DB Column |
|-----------|-----------|
| ListingKey | listing_key |
| ListingId | listing_id |
| ListPrice | list_price |
| StandardStatus | standard_status |
| BedroomsTotal | bedrooms_total |
| BathroomsTotalInteger | bathrooms_total |
| LivingArea | living_area |
| City | city |
| ModificationTimestamp | modification_timestamp |
| PhotosChangeTimestamp | photos_change_timestamp |
| MlgCanView | mlg_can_view |

The full API response is stored in the `raw_data` column as JSON.

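Fields without a dedicated column can be read back out of `raw_data`. A hypothetical sketch (`mls_example_raw_field()` is not a plugin function), assuming rows are associative arrays with `raw_data` holding the JSON string.

```php
<?php
/**
 * Hypothetical helper: read an API field from the JSON in raw_data.
 * Returns $default when raw_data is empty, invalid JSON, or lacks the field.
 */
function mls_example_raw_field( array $row, string $field, $default = null ) {
    if ( empty( $row['raw_data'] ) ) {
        return $default;
    }

    // Decode to an associative array; invalid JSON yields null.
    $data = json_decode( $row['raw_data'], true );
    if ( ! is_array( $data ) || ! array_key_exists( $field, $data ) ) {
        return $default;
    }

    return $data[ $field ];
}
```

Decoding `raw_data` per row is cheap for a single listing page; a field that is used for filtering or sorting is usually better promoted to its own column.
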
## Troubleshooting

### "Value out of range" error

The API is rejecting a high `$skip` value, which means pagination fell back to manual `$skip` instead of following `@odata.nextLink`. Clear the data and re-run the initial sync:

```bash
wp mls cache clear --confirm --allow-root
wp mls sync full --allow-root
```

### All properties showing as "Sold"

The initial sync was run without the Active/Pending filter. Clear and re-sync:

```bash
wp mls cache clear --confirm --allow-root
wp mls sync full --allow-root
```

### Media not loading

Images are fetched on-demand. Check:

1. `wp mls media status` - see cache stats
2. `wp mls media fetch --listing=<key>` - manually fetch for a listing
3. Check `wp-content/uploads/mls-listings/` directory permissions

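The permissions check in step 3 can be scripted. A hypothetical sketch; inside WordPress the directory would come from `wp_upload_dir()['basedir'] . '/mls-listings'`. One thing worth checking: files created by `wp ... --allow-root` (as in the cron examples) are owned by root and may not be writable by the web server user.

```php
<?php
/**
 * Hypothetical diagnostic: report problems with the media cache directory
 * as seen by the current PHP process. An empty array means no problems found.
 */
function mls_example_check_cache_dir( string $dir ): array {
    $problems = array();

    if ( ! is_dir( $dir ) ) {
        $problems[] = "Missing directory: {$dir}";
    } elseif ( ! is_writable( $dir ) ) {
        // Show the octal permission bits to help spot ownership/mode issues.
        $problems[] = sprintf( 'Not writable: %s (mode %o)', $dir, fileperms( $dir ) & 0777 );
    }

    return $problems;
}
```
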
### Sync taking too long

The initial sync of ~30K Active/Pending properties takes about 30-45 minutes. Use `--verbose` to see progress.