# Maintenance Task System

This package contains the maintenance task system for running background maintenance operations in the OpenHands deployment wrapper.

## Overview

The maintenance task system provides a framework for running background tasks that perform maintenance operations such as upgrading user settings, cleaning up data, or other periodic maintenance work. Tasks are designed to be short-running (typically under a minute) and handle background state upgrades. The runner is triggered as part of every deploy, though does not block it.

## Architecture

The system consists of several key components:

### 1. Database Model (`MaintenanceTask`)

Located in `storage/maintenance_task.py`, this model stores maintenance tasks with the following fields:

- `id`: Primary key
- `status`: Task status (INACTIVE, PENDING, WORKING, COMPLETED, ERROR)
- `processor_type`: Fully qualified class name of the processor
- `processor_json`: JSON serialized processor configuration
- `delay`: Delay before starting task
- `info`: JSON field containing structured information about the task outcome
- `created_at`: When the task was created
- `updated_at`: When the task was last updated

### 2. Processor Base Class (`MaintenanceTaskProcessor`)

Abstract base class for all maintenance task processors. Processors must implement the `__call__` method to perform the actual work.

```python
from storage.maintenance_task import MaintenanceTaskProcessor, MaintenanceTask

class MyProcessor(MaintenanceTaskProcessor):
    # Define your processor fields here
    some_config: str

    async def __call__(self, task: MaintenanceTask) -> dict:
        # Implement your maintenance logic here
        return {"status": "completed", "processed_items": 42}
```

## Available Processors

### UserVersionUpgradeProcessor

Located in `user_version_upgrade_processor.py`, this processor:

- Handles up to 100 user IDs per task
- Upgrades users with `user_version < ORG_SETTINGS_VERSION`
- Uses `SaasSettingsStore.create_default_settings()` for upgrades

**Usage:**

```python
from server.maintenance_task_processor.user_version_upgrade_processor import UserVersionUpgradeProcessor

processor = UserVersionUpgradeProcessor(user_ids=["user1", "user2", "user3"])
```

## Creating New Processors

To create a new maintenance task processor:

1. **Create a new processor class** inheriting from `MaintenanceTaskProcessor`:

```python
from storage.maintenance_task import MaintenanceTaskProcessor, MaintenanceTask
from typing import List

class MyMaintenanceProcessor(MaintenanceTaskProcessor):
    """Description of what this processor does."""

    # Define configuration fields
    target_ids: List[str]
    batch_size: int = 50

    async def __call__(self, task: MaintenanceTask) -> dict:
        """
        Implement your maintenance logic here.

        Args:
            task: The maintenance task being processed

        Returns:
            dict: Information about the task execution
        """
        try:
            # Your maintenance logic here
            processed_count = 0

            for target_id in self.target_ids:
                # Process each target
                processed_count += 1

            return {
                "status": "completed",
                "processed_count": processed_count,
                "message": f"Successfully processed {processed_count} items"
            }
        except Exception as e:
            return {
                "status": "error",
                "error": str(e),
                "processed_count": processed_count
            }
```

2. **Add the processor to the package** by importing it in `__init__.py` if needed.

3. **Create tasks using the utility functions** in `server/utils/maintenance_task_utils.py`:

```python
from server.utils.maintenance_task_utils import create_maintenance_task
from server.maintenance_task_processor.my_processor import MyMaintenanceProcessor

# Create a task
processor = MyMaintenanceProcessor(target_ids=["id1", "id2"], batch_size=25)
task = create_maintenance_task(processor, start_at=datetime.utcnow())
```

## Task Management

### Creating Tasks Programmatically

```python
from datetime import datetime, timedelta
from server.utils.maintenance_task_utils import create_maintenance_task
from server.maintenance_task_processor.user_version_upgrade_processor import UserVersionUpgradeProcessor

# Create a user upgrade task
processor = UserVersionUpgradeProcessor(user_ids=["user1", "user2"])
task = create_maintenance_task(
    processor=processor,
    start_at=datetime.utcnow() + timedelta(minutes=5)  # Start in 5 minutes
)
```

## Task Lifecycle

1. **INACTIVE**: Task is created but not yet scheduled
2. **PENDING**: Task is scheduled and waiting to be picked up by the runner
3. **WORKING**: Task is currently being processed
4. **COMPLETED**: Task finished successfully
5. **ERROR**: Task encountered an error during processing

## Best Practices

### Processor Design

- Keep tasks short-running (under 1 minute)
- Handle errors gracefully and return meaningful error information
- Use batch processing for large datasets
- Include progress information in the return dict

### Error Handling

- Always wrap your processor logic in try-catch blocks
- Return structured error information
- Log important events for debugging

### Performance

- Limit batch sizes to avoid long-running tasks
- Use database sessions efficiently
- Consider memory usage for large datasets

### Testing

- Create unit tests for your processors
- Test error conditions
- Verify the processor serialization/deserialization works correctly

## Database Patterns

The maintenance task system follows the repository's established patterns:

- Uses `session_maker()` for database operations
- Wraps sync database operations in `call_sync_from_async` for async routes
- Follows proper SQLAlchemy query patterns

## Integration with Existing Systems

### User Management

- Integrates with the existing `UserSettings` model
- Uses the current user versioning system (`ORG_SETTINGS_VERSION`)
- Maintains compatibility with existing user management workflows

### Authentication

- Admin endpoints use the existing SaaS authentication system
- Requires users to have `admin = True` in their UserSettings

### Monitoring

- Tasks are logged with structured information
- Status updates are tracked in the database
- Error information is preserved for debugging

## Troubleshooting

### Common Issues

1. **Tasks stuck in WORKING state**: Usually indicates the runner crashed while processing. These can be manually reset to PENDING.

2. **Serialization errors**: Ensure all processor fields are JSON serializable.

3. **Database connection issues**: Check that the processor properly handles database sessions.

### Debugging

- Check the server logs for task execution details
- Use the admin API to inspect task status and info
- Verify processor configuration is correct

## Future Enhancements

Potential improvements that could be added:

- Task dependencies and scheduling
- Retry mechanisms for failed tasks
- Real-time progress updates
- Task cancellation
- Cron-like scheduling expressions
- Audit logging for admin actions
- Role-based permissions beyond simple admin flag