root b6df4dbb92 Snapshot: MLS sync fixes, image refresh, plugin/theme updates
MLS plugin fixes from this session:
- Fix silent insert failures: location column NOT NULL was rejecting wpdb->insert calls,
  causing ~18k new properties since Dec 2025 to be lost. Inserts now build raw SQL
  with ST_PointFromText so the spatial column is populated atomically.
- Auto-refresh expired media URLs in MLS_Media_Handler::fetch_and_cache(), guarded by
  a property-level GET_LOCK so concurrent fetches share one API refresh.
- Normalize WP_Error to null in mls_get_property_image() so callers can rely on the
  documented string|null contract.
- Support comma-separated property_type filters in MLS_Query and MLS_Cluster so the
  homepage "View All Commercial" link (?property_type=Commercial+Sale,Land,Farm)
  actually filters correctly.
- Incremental sync now looks back 10 minutes past the latest modification timestamp
  as a safety margin against missed records.
- Smart sync exits silently (info-level, not warning) when a full sync is in progress.

Operational:
- New cron: weekly full sync Sundays at 3 AM (/usr/local/bin/mls-full-sync).
- New cron: hourly 2GB cap on mls-thumbnails/ and cache/transformed-images/
  (/usr/local/bin/mls-image-cache-cap).
- Logrotate config for wp-content/debug.log (2-day retention, daily rotation,
  delaycompress).

Repo policy:
- CLAUDE.md updated with explicit "commit everything except build artifacts" policy.
- .gitignore: untrack runtime image caches and debug.log rotations.

Other modifications in this snapshot are pre-existing in-flight theme/plugin/db_content_updates work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:32:23 +00:00


# MLS by HansonXyz
WordPress plugin for syncing MLS Grid API data (NorthStar MLS) into a local database with WP-CLI tools and a public API for themes and plugins.
## Table of Contents
- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Configuration](#configuration)
- [Running Sync](#running-sync)
- [WP-CLI Commands](#wp-cli-commands)
- [Cron Setup](#cron-setup)
- [Public API](#public-api)
- [Database Schema](#database-schema)
- [Media Handling](#media-handling)
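- [Garbage Collection](#garbage-collection)
- [HomeProz Listing Persistence](#homeproz-listing-persistence)
- [Sync Strategy](#sync-strategy)
- [Error Recovery](#error-recovery)
- [Troubleshooting](#troubleshooting)
- [Rate Limits](#rate-limits)
- [File Structure](#file-structure)
- [Support](#support)
- [License](#license)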
## Features
- Syncs Active and Pending property listings from MLS Grid API
- **HomeProz Listing Persistence**: Sold HomeProz listings are retained for historical viewing
- Automatic incremental updates via replication
- On-demand image fetching and local caching
- **Persistent Image Cache**: HomeProz listing images are permanently cached
- Automatic WebP conversion for cached images
- Disk space garbage collection for image cache (excludes HomeProz images)
- Self-healing sync with automatic error recovery
- Rate limit compliance (MLS Grid limits enforced)
- Resume capability for interrupted syncs
- WP-CLI commands for all operations
- Public PHP API for theme/plugin integration
- Optimized database indexes for search queries
## Requirements
- WordPress 5.0+
- PHP 7.4+
- MySQL 5.7+ or MariaDB 10.2+
- WP-CLI (for command-line operations)
- MLS Grid API access token
## Installation
1. Upload the `mls-by-hansonxyz` folder to `/wp-content/plugins/`
2. Activate the plugin through WordPress admin
3. Configure API credentials (see Configuration)
4. Run initial sync: `wp mls run`
## Configuration
### API Credentials
Add to your `wp-config.php`:
```php
define('MLSGRID_API_URL', 'https://api.mlsgrid.com/v2');
define('MLSGRID_ACCESS_TOKEN', 'your-access-token-here');
```
### Image Garbage Collection (Optional)
To enable automatic cleanup of old cached images when disk space is low, add to `wp-config.php`:
```php
// Enable garbage collection when free space drops below 5GB
define('MLS_GC_DISK_THRESHOLD', 5 * 1024 * 1024 * 1024); // 5GB in bytes
```
See [Garbage Collection](#garbage-collection) for details.
### WordPress Admin Settings
Navigate to **Settings > MLS Settings** to configure:
| Setting | Description | Default |
|---------|-------------|---------|
| Originating System | MLS identifier | `northstar` |
| Auto Sync | Enable WP-Cron sync | Disabled |
| Sync Interval | WP-Cron frequency | Hourly |
## Running Sync
### Smart Sync (Recommended)
The `wp mls run` command handles all scenarios automatically:
```bash
wp mls run # Smart sync with progress
wp mls run --quiet # Status messages only
wp mls run --verbose # Full API details
wp mls run --silent # For cron (exit code only)
```
**Automatic behavior:**
- If no data exists: runs full sync
- If data exists: runs incremental sync
- If previous sync failed: resumes from checkpoint
- If sync already running: safely aborts
### Manual Sync Commands
For more control over sync operations:
```bash
# Full sync (Active/Pending properties only)
wp mls sync full
# Incremental sync (changes since last sync)
wp mls sync incremental
# Resume a specific failed sync
wp mls sync resume --id=<sync_id>
# Dry run (no changes)
wp mls sync full --dry-run --limit=100
```
### Progress Indicators
During sync, progress characters indicate activity:
| Symbol | Meaning |
|--------|---------|
| `.` | Property created |
| `#` | Property updated |
| `x` | Property deleted |
| `!` | Error occurred |
| `\|` | Page complete |
Use `--verbose` for detailed timestamped output.
## WP-CLI Commands
### Testing
```bash
wp mls test connection # Test API connectivity
wp mls test auth # Verify authentication
```
### Status and Statistics
```bash
wp mls status # Full status overview
wp mls status rate-limits # Rate limit usage only
wp mls stats # Database statistics
```
### Sync Operations
```bash
# Smart sync (recommended)
wp mls run [--quiet] [--verbose] [--silent]
# Manual sync
wp mls sync full [--dry-run] [--limit=N] [--verbose]
wp mls sync incremental [--dry-run] [--verbose]
wp mls sync resume --id=<sync_id>
```
### Media Management
Images are fetched on-demand when properties are viewed. These commands manage the cache:
```bash
wp mls media status # Cache statistics
wp mls media fetch --listing=<key> # Pre-cache a listing's images
wp mls media fetch --listing=<key> --limit=10
wp mls media clear --listing=<key> # Clear cached images
```
### Cache Management
```bash
wp mls cache clear --confirm # Delete ALL synced data
wp mls cache cleanup # Remove orphaned media files
wp mls cache missing # View failed media downloads
wp mls cache missing --clear # Clear the missing media log
```
### Recovery
```bash
wp mls recovery list # Show resumable syncs
wp mls recovery auto # Auto-resume most recent failed sync
wp mls recovery cleanup # Mark stale syncs as failed
```
## Cron Setup
### Recommended Setup
Add to system crontab (`crontab -e`):
```bash
# Smart sync every 15 minutes (handles everything automatically)
*/15 * * * * cd /var/www/html && wp mls run --silent --allow-root >> /var/log/mls-sync.log 2>&1
```
This single entry handles:
- Initial full sync on first run
- Incremental updates on subsequent runs
- Automatic recovery from failures
- Safe concurrent execution (aborts if already running)
### Alternative: Manual Control
```bash
# Incremental sync every 15 minutes
*/15 * * * * cd /var/www/html && wp mls sync incremental --allow-root >> /var/log/mls-sync.log 2>&1
# Full rebuild weekly (Sunday 3am); the parentheses send output from both commands to the log
0 3 * * 0 cd /var/www/html && (wp mls cache clear --confirm --allow-root && wp mls sync full --allow-root) >> /var/log/mls-sync.log 2>&1
```
### Important Notes
- Use `--allow-root` when running as root
- MLS Grid IDX rules require the data to be refreshed at least every 12 hours
- Rate limits are handled automatically (plugin waits when approaching limits)
- No separate media cron needed - images are fetched on-demand
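As a rough sketch of how the recommended entry could be wrapped so that each run leaves a timestamped result line, the function below calls `wp mls run --silent` and records the exit code. The names `WP_CMD`, `WP_PATH`, `LOG_FILE`, and `run_mls_sync` are illustrative and not part of the plugin, and the sketch assumes the usual convention that a non-zero exit code means the run did not succeed.

```bash
#!/usr/bin/env bash
# Hypothetical cron wrapper; WP_CMD, WP_PATH and LOG_FILE are illustrative names.
WP_CMD="${WP_CMD:-wp}"
WP_PATH="${WP_PATH:-/var/www/html}"
LOG_FILE="${LOG_FILE:-/var/log/mls-sync.log}"

run_mls_sync() {
    local started status=0
    started="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # --silent leaves the exit code as the only signal, so capture it explicitly.
    "$WP_CMD" mls run --silent --allow-root --path="$WP_PATH" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "$started mls run ok" >> "$LOG_FILE"
    else
        echo "$started mls run failed (exit $status)" >> "$LOG_FILE"
    fi
    return "$status"
}
```

Saved as a script that ends with a call to `run_mls_sync`, this could replace the inline command in the crontab entry.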
## Public API
### Available Functions
```php
// Get properties with filters
$properties = mls_get_properties([
'status' => 'Active',
'city' => 'Albert Lea',
'min_price' => 100000,
'max_price' => 500000,
'min_beds' => 3,
'property_type' => 'Residential',
'limit' => 20,
'offset' => 0,
'orderby' => 'list_price',
'order' => 'DESC',
]);
// Get single property by listing key or MLS ID
$property = mls_get_property('NST123456');
// Get primary image (fetches on-demand if not cached)
$image_url = mls_get_property_image('NST123456');
$image_url = mls_get_property_image('NST123456', false); // Don't fetch, return null if uncached
// Get all images for a listing
$images = mls_get_property_images('NST123456'); // Fetch first 1 if uncached
$images = mls_get_property_images('NST123456', 10); // Fetch first 10 if uncached
$images = mls_get_property_images('NST123456', 0); // Don't fetch any
// Get media metadata (no fetching)
$media = mls_get_property_media('NST123456');
// Get distinct cities with listings
$cities = mls_get_cities(); // All cities
$cities = mls_get_cities('Active'); // Cities with active listings only
// Get property count
$count = mls_get_property_count(['status' => 'Active']);
// Check if data is available
if (mls_is_available()) {
// Show property search
}
// Get cache statistics
$stats = mls_get_cache_stats();
// Returns: ['total_media' => 50000, 'cached' => 1200, 'uncached' => 48800]
```
### Query Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `status` | string | Active, Pending, Closed |
| `property_type` | string | Residential, Land, Commercial, etc.; a comma-separated list matches any of the listed types |
| `city` | string | City name |
| `county` | string | County name |
| `postal_code` | string | ZIP code |
| `min_price` | int | Minimum list price |
| `max_price` | int | Maximum list price |
| `min_beds` | int | Minimum bedrooms |
| `max_beds` | int | Maximum bedrooms |
| `min_baths` | int | Minimum bathrooms |
| `min_sqft` | int | Minimum living area |
| `max_sqft` | int | Maximum living area |
| `year_built_min` | int | Minimum year built |
| `year_built_max` | int | Maximum year built |
| `listing_key` | string | Specific listing key |
| `listing_id` | string | Specific MLS ID |
| `search` | string | Search address/remarks |
| `limit` | int | Results per page (default: 20) |
| `offset` | int | Pagination offset |
| `orderby` | string | Sort field |
| `order` | string | ASC or DESC |
| `include_media` | bool | Include media array |
| `fields` | array | Specific fields to return |
### Property Object Fields
```php
$property->listing_key // Unique identifier
$property->listing_id // MLS number
$property->standard_status // Active, Pending, Closed
$property->list_price // Current price
$property->original_list_price
$property->close_price
// Address
$property->street_number
$property->street_name
$property->street_suffix
$property->unit_number
$property->city
$property->state_or_province
$property->postal_code
$property->county
$property->latitude
$property->longitude
// Property details
$property->property_type
$property->property_sub_type
$property->bedrooms_total
$property->bathrooms_total
$property->bathrooms_full
$property->bathrooms_half
$property->living_area // Square feet
$property->lot_size_area
$property->lot_size_units
$property->year_built
$property->garage_spaces
// Description
$property->public_remarks
$property->directions
// Listing info
$property->list_agent_key
$property->list_agent_mls_id
$property->list_agent_name
$property->list_office_key
$property->list_office_mls_id
$property->list_office_name
// Photo count, dates, and timestamps
$property->photos_count
$property->modification_timestamp
$property->photos_change_timestamp
$property->listing_contract_date
$property->close_date
$property->days_on_market
$property->created_at
$property->updated_at
```
## Database Schema
### Tables
All tables use the WordPress prefix (e.g., `wp_mls_properties`).
#### mls_properties
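Main property listing data. Active and Pending properties are stored; Sold listings are kept only for the HomeProz office (see [HomeProz Listing Persistence](#homeproz-listing-persistence)).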
| Column | Type | Description |
|--------|------|-------------|
| id | BIGINT | Auto-increment primary key |
| listing_key | VARCHAR(50) | Unique MLS Grid key |
| listing_id | VARCHAR(50) | MLS number |
| standard_status | VARCHAR(30) | Active, Pending (retained HomeProz listings may also be Sold) |
| list_price | DECIMAL(15,2) | Current price |
| city | VARCHAR(100) | City name |
| latitude | DECIMAL(10,8) | GPS latitude |
| longitude | DECIMAL(11,8) | GPS longitude |
| ... | ... | See property fields above |
| raw_data | LONGTEXT | Full API response (JSON) |
| modification_timestamp | DATETIME | Last modified in MLS |
| created_at | DATETIME | Record creation |
| updated_at | DATETIME | Record update |
**Indexes:**
- `listing_key` (UNIQUE)
- `listing_id`
- `standard_status`
- `city`
- `property_type`
- `list_price`
- `modification_timestamp`
- `bedrooms_total`
- `county`
- `idx_latitude` - for geo queries
- `idx_longitude` - for geo queries
- `idx_status_city_price` - composite for search
- `idx_status_type` - composite for filtering
#### mls_media
Media metadata and cache status. Images are downloaded on-demand.
| Column | Type | Description |
|--------|------|-------------|
| id | BIGINT | Auto-increment primary key |
| listing_key | VARCHAR(50) | Property reference |
| media_key | VARCHAR(100) | Unique media identifier |
| media_type | VARCHAR(30) | Photo, Document, etc. |
| media_order | INT | Display order |
| media_url | VARCHAR(1000) | Original MLS Grid URL |
| local_path | VARCHAR(500) | Cached file path |
| local_url | VARCHAR(500) | Cached file URL |
| downloaded_at | DATETIME | When cached |
#### mls_sync_state
Sync progress tracking for resume capability.
| Column | Type | Description |
|--------|------|-------------|
| id | BIGINT | Sync operation ID |
| sync_type | VARCHAR(30) | full, incremental |
| status | VARCHAR(20) | pending, running, completed, failed |
| last_next_link | VARCHAR(2000) | Resume checkpoint |
| records_processed | INT | Total processed |
| records_created | INT | New records |
| records_updated | INT | Updated records |
| records_deleted | INT | Deleted records |
#### mls_rate_limits
API rate limit tracking.
#### mls_sync_log
Debug logging for sync operations.
#### mls_media_log
Media download audit trail.
## Media Handling
### On-Demand Fetching
Per MLS Grid rules, media URLs cannot be used directly on websites. Images must be downloaded and served from your own server.
**How it works:**
1. Property sync stores media metadata (URLs, keys, order) but does NOT download images
2. When `mls_get_property_image()` is called, the image is fetched and cached locally
3. Subsequent requests serve from local cache
4. Cache location: `wp-content/uploads/mls-listings/{prefix}/{listing_key}/`
**Benefits:**
- No rate limit issues from bulk downloading
- Images cached only when needed
- Automatic re-fetch if cache cleared
- Works with MLS Grid's URL expiration
### HomeProz Persistent Cache
HomeProz listings receive special treatment to preserve images even after properties are sold:
**How it works:**
1. HomeProz listings are identified by `ListOfficeMlsId` matching the configured HomeProz office ID
2. During sync, ALL images for HomeProz listings are automatically downloaded
3. Images are stored in the persistent cache: `wp-content/uploads/mls-listings-persistent/`
4. The persistent cache is NEVER subject to garbage collection
5. Images remain available indefinitely, even after the listing is sold and removed from MLS
**Cache Directories:**
| Directory | Purpose | Garbage Collected |
|-----------|---------|-------------------|
| `mls-listings/` | Standard cache (non-HomeProz) | Yes |
| `mls-listings-persistent/` | HomeProz listings | No |
**Benefits:**
- Sold HomeProz listings can be displayed on a "Sold Homes" page
- No loss of images when listings are removed from MLS feed
- Historical record of HomeProz sales preserved
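To compare how much disk each cache directory currently uses, a small helper like the one below can be run from the WordPress root. It is only a sketch: `mls_cache_sizes` is a hypothetical name, and the default `wp-content/uploads` location is taken from the paths above and may differ on installs with a custom uploads directory.

```bash
# Print the size in KB of the standard and persistent cache directories.
# Argument: path to the uploads directory (defaults to wp-content/uploads).
mls_cache_sizes() {
    local uploads="${1:-wp-content/uploads}" dir
    for dir in mls-listings mls-listings-persistent; do
        if [ -d "$uploads/$dir" ]; then
            # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
            printf '%s\t%s KB\n' "$dir" "$(du -sk "$uploads/$dir" | cut -f1)"
        else
            printf '%s\t(missing)\n' "$dir"
        fi
    done
}
```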
### Pre-caching Images
To pre-cache images for specific listings:
```bash
wp mls media fetch --listing=NST123456 --limit=10
```
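When several listings need warming at once (for example a featured-listings page), the same command can be looped over a list of keys. The sketch below assumes a plain text file with one listing key per line; `mls_precache_from_file` and `WP_CMD` are hypothetical names, and it assumes `wp mls media fetch` exits non-zero on failure.

```bash
WP_CMD="${WP_CMD:-wp}"

# Pre-cache images for every listing key in a file (one key per line).
# Arguments: key file, images per listing (defaults to 10).
mls_precache_from_file() {
    local file="$1" limit="${2:-10}" key failed=0
    while IFS= read -r key || [ -n "$key" ]; do
        [ -z "$key" ] && continue   # skip blank lines
        # Redirect stdin so the command cannot consume the rest of the key file.
        "$WP_CMD" mls media fetch --listing="$key" --limit="$limit" < /dev/null \
            || failed=$((failed + 1))
    done < "$file"
    # Succeed only if every fetch succeeded.
    [ "$failed" -eq 0 ]
}
```

Keep such lists short: on-demand fetching exists precisely to avoid bulk downloads against the MLS Grid rate limits.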
### Cache Statistics
```bash
wp mls media status
```
Shows total media records, cached count, and uncached count.
## Garbage Collection
The plugin includes automatic garbage collection to prevent disk space from filling up with cached MLS images.
**Important:** Garbage collection ONLY affects the standard cache (`mls-listings/`). The persistent cache (`mls-listings-persistent/`) containing HomeProz listing images is NEVER touched.
### Enabling Garbage Collection
Add to `wp-config.php`:
```php
// Enable garbage collection when free space drops below 5GB
define('MLS_GC_DISK_THRESHOLD', 5 * 1024 * 1024 * 1024); // 5GB in bytes
```
If `MLS_GC_DISK_THRESHOLD` is not defined, garbage collection is disabled.
### How It Works
1. After each sync (`wp mls run`), the plugin checks free disk space on the volume hosting MLS images
2. If free space is below the threshold, cleanup begins
3. Directories older than 24 hours are deleted from the standard cache, oldest first
4. Cleanup stops when:
- Free space reaches 5GB, OR
- 2GB has been deleted in this run
5. Directories modified within the last 24 hours are never deleted (protects recently accessed images)
6. HomeProz images in the persistent cache are NEVER deleted
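The selection rules above can be sketched as a small planning function. This is not the plugin's PHP implementation, only an illustration of the documented behavior: `gc_plan` is a hypothetical name, and the input format (one `mtime_epoch size_bytes dirname` line per standard-cache directory, with integer values) is an assumption made for the example.

```bash
# Reads "mtime_epoch size_bytes dirname" lines on stdin and prints the
# directories a cleanup run would delete, oldest first.
# Arguments: now_epoch free_bytes threshold_bytes
gc_plan() {
    local now="$1" free="$2" threshold="$3"
    local target=$((5 * 1024 * 1024 * 1024))   # stop once 5GB is free
    local cap=$((2 * 1024 * 1024 * 1024))      # or once 2GB has been deleted
    local min_age=$((24 * 60 * 60))            # never touch dirs newer than 24h
    local deleted=0 mtime size name
    # Cleanup only starts when free space is below the threshold.
    [ "$free" -ge "$threshold" ] && return 0
    sort -n | while read -r mtime size name; do
        [ "$free" -ge "$target" ] && break
        [ "$deleted" -ge "$cap" ] && break
        [ $((now - mtime)) -lt "$min_age" ] && continue
        echo "$name"
        free=$((free + size))
        deleted=$((deleted + size))
    done
}
```

Only directories from `mls-listings/` should ever be fed to such a function, since the persistent cache is excluded from cleanup.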
### Behavior Summary
| Setting | Value |
|---------|-------|
| Threshold trigger | Configurable via `MLS_GC_DISK_THRESHOLD` |
| Target free space | 5GB |
| Max delete per run | 2GB |
| Minimum directory age | 24 hours |
| Runs automatically | After every sync |
### CLI Output
During sync, garbage collection status is shown:
```
Garbage Collection:
Disk space OK: 12.45 GB free (threshold: 5.00 GB)
```
Or if cleanup occurs:
```
Garbage Collection:
Disk space low: 3.21 GB free (threshold: 5.00 GB). Starting cleanup...
Deleted: NST123456 (45.23 MB)
Deleted: NST789012 (38.91 MB)
...
Cleanup complete: Deleted 42 directories (1.89 GB). Free space now: 5.10 GB
```
### Recommended Threshold
For most installations, 5GB is a good threshold:
```php
define('MLS_GC_DISK_THRESHOLD', 5 * 1024 * 1024 * 1024);
```
For servers with limited disk space, you may want a higher threshold to trigger cleanup earlier:
```php
// Trigger cleanup when below 10GB
define('MLS_GC_DISK_THRESHOLD', 10 * 1024 * 1024 * 1024);
```
### Image Regeneration
When a deleted image is requested again, it is automatically re-fetched from MLS Grid and cached. This is the normal on-demand fetching behavior - garbage collection simply clears old cached files to free disk space.
## HomeProz Listing Persistence
HomeProz listings (properties listed by the configured HomeProz office) receive special handling to preserve them even after they are sold.
### Retention Rules
| Listing Type | Active | Pending | Sold | Other Statuses |
|--------------|--------|---------|------|----------------|
| HomeProz | Retained | Retained | Retained | Deleted |
| Non-HomeProz | Retained | Retained | Deleted | Deleted |
### How It Works
1. **Identification**: Listings are identified as HomeProz by matching `ListOfficeMlsId` to `MLS_HOMEPROZ_OFFICE_ID` (configured in plugin constants)
2. **Sync Behavior**: HomeProz listings with Active, Pending, or Sold status are kept in the database
3. **Image Caching**: All images for HomeProz listings are automatically downloaded during sync
4. **Persistent Storage**: Images are stored in `mls-listings-persistent/` which is never garbage collected
5. **Future Use**: Sold HomeProz listings can be displayed on a "Sold Homes" page
### Configuration
The HomeProz office ID is defined in the main plugin file:
```php
define('MLS_HOMEPROZ_OFFICE_ID', 'NST253235');
```
## Sync Strategy
### Initial Import (Full Sync)
- Fetches ONLY Active and Pending properties
- Filter: `MlgCanView eq true and (StandardStatus eq 'Active' or StandardStatus eq 'Pending')` (OData logical operators are lowercase)
- Uses `@odata.nextLink` for pagination (NOT `$skip`)
- Approximately 30,000 records for NorthStar MLS
- Takes 30-45 minutes on first run
### Replication (Incremental Sync)
- Fetches ALL properties modified since last sync
- No filter on status (need to detect changes)
- For each record:
- If `MlgCanView = false`: DELETE from local DB
- If HomeProz listing: DELETE only if status not Active/Pending/Sold
- If non-HomeProz listing: DELETE if status not Active/Pending
- Otherwise: INSERT or UPDATE
- HomeProz images are auto-downloaded during sync
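The per-record decision above can be written out as a small function. This is an illustrative sketch rather than the plugin's code: `mls_record_action` is a hypothetical name, and the status string `Sold` is taken from the retention table in this README, so the value the API actually returns may differ.

```bash
# Decide what incremental sync does with one record, per the rules above.
# Arguments: can_view (true|false), is_homeproz (yes|no), status
# Prints "delete" or "upsert".
mls_record_action() {
    local can_view="$1" is_homeproz="$2" status="$3"
    # MlgCanView = false always removes the record.
    if [ "$can_view" != "true" ]; then
        echo "delete"
        return
    fi
    case "$status" in
        Active|Pending)
            echo "upsert" ;;
        Sold)
            # Only HomeProz listings survive being sold.
            if [ "$is_homeproz" = "yes" ]; then echo "upsert"; else echo "delete"; fi ;;
        *)
            echo "delete" ;;
    esac
}
```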
### Why This Approach?
1. MLS Grid API limits `$skip` to ~80,000 - bulk scanning fails
2. Only Active/Pending properties needed for display
3. Replication is efficient - only fetches changes
4. Proper deletion handling when properties sell
## Error Recovery
### Automatic Recovery
The plugin saves progress after each API page. If a sync fails:
1. Progress is preserved in `mls_sync_state` table
2. Next `wp mls run` automatically resumes from checkpoint
3. Failed syncs older than 1 hour are marked for resume
### Manual Recovery
```bash
# View resumable syncs
wp mls recovery list
# Auto-resume most recent
wp mls recovery auto
# Resume specific sync
wp mls sync resume --id=<sync_id>
# Mark stale syncs as failed
wp mls recovery cleanup
```
## Troubleshooting
### Connection Failed
```bash
wp mls test connection
wp mls test auth
```
Check:
- API token in wp-config.php
- Network connectivity
- MLS Grid API status
### No Data After Sync
```bash
wp mls status
wp mls stats
```
Check:
- Rate limits (may need to wait)
- WordPress debug log for API errors
- Sync state for failures
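For monitoring, the same check can be scripted through the public API. The sketch below uses `wp eval` to call `mls_get_property_count()`; `mls_active_count`, `mls_check_data`, and `WP_CMD` are hypothetical names, and it assumes the function returns an integer count.

```bash
WP_CMD="${WP_CMD:-wp}"

# Print the number of Active listings via the plugin's public API.
mls_active_count() {
    "$WP_CMD" eval 'echo (int) mls_get_property_count(["status" => "Active"]);'
}

# Exit 0 if listings exist, 1 if the table is empty, 2 if the count is unusable.
mls_check_data() {
    local count
    count="$(mls_active_count)" || return 2
    case "$count" in
        ''|*[!0-9]*) return 2 ;;   # not a plain non-negative integer
    esac
    if [ "$count" -gt 0 ]; then
        echo "ok: $count active listings"
    else
        echo "empty: no active listings"
        return 1
    fi
}
```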
### Media Not Loading
```bash
wp mls media status
```
Check:
- Upload directory permissions
- Disk space
- MLS Grid media URL validity
### Sync Taking Too Long
Initial sync of ~30K properties takes 30-45 minutes. Use `--verbose` to monitor progress.
### Rate Limit Exceeded
The plugin automatically waits when approaching limits. If persistent:
- Reduce sync frequency
- Check for other API consumers
- Contact MLS Grid support
### Clearing Data
To start fresh:
```bash
wp mls cache clear --confirm
wp mls run
```
### Database Issues
If indexes are missing, trigger recreation:
```bash
wp eval "MLS_DB::create_tables();"
```
## Rate Limits
MLS Grid enforces these limits:
| Limit | Value |
|-------|-------|
| Per second | 2 requests |
| Per hour | 7,200 requests |
| Per day | 40,000 requests |
| Data per hour | 4 GB |
The plugin automatically:
- Waits 500ms between requests
- Tracks hourly/daily usage
- Pauses when approaching limits
- Retries with exponential backoff on 429 errors
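For scripts outside the plugin that share the same API token, a generic exponential backoff loop looks roughly like the sketch below. The plugin's own delays and retry counts are not documented here, so `retry_with_backoff`, `BACKOFF_BASE`, `BACKOFF_MAX_TRIES`, and `SLEEP_CMD` are all illustrative assumptions, and the function treats any non-zero exit code as retryable.

```bash
# Run a command, retrying with a doubling delay until it succeeds
# or the attempt limit is reached.
# Usage: retry_with_backoff <command> [args...]
retry_with_backoff() {
    local tries="${BACKOFF_MAX_TRIES:-5}" delay="${BACKOFF_BASE:-1}" attempt=1
    while true; do
        "$@" && return 0
        # Give up once the final attempt has failed.
        [ "$attempt" -ge "$tries" ] && return 1
        "${SLEEP_CMD:-sleep}" "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

With the defaults this waits 1, 2, 4, and 8 seconds between five attempts.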
## File Structure
```
mls-by-hansonxyz/
├── mls-by-hansonxyz.php # Main plugin file, public API
├── uninstall.php # Cleanup on uninstall
├── README.md # This file
├── admin/
│ └── class-mls-admin.php # WordPress admin interface
├── cli/
│ └── class-mls-cli.php # WP-CLI commands
├── includes/
│ ├── class-mls-activator.php # Plugin activation
│ ├── class-mls-api-client.php # MLS Grid API communication
│ ├── class-mls-db.php # Database operations
│ ├── class-mls-deactivator.php # Plugin deactivation
│ ├── class-mls-garbage-collector.php # Disk space management
│ ├── class-mls-logger.php # Event logging
│ ├── class-mls-media-handler.php # On-demand image caching
│ ├── class-mls-options.php # Configuration management
│ ├── class-mls-query.php # Public query API
│ ├── class-mls-rate-limiter.php # Rate limit compliance
│ └── class-mls-sync-engine.php # Sync orchestration
└── docs/
├── API.md # MLS Grid API reference
├── CLAUDE.md # AI assistant context
└── USAGE.md # User documentation
```
## Support
- Plugin logs: Settings > MLS Settings in WordPress admin
- Debug log: `wp-content/debug.log` (if WP_DEBUG enabled)
- MLS Grid API: support@mlsgrid.com
## License
GPL-2.0+