docs(enterprise): Update M3 implementation checklist to reflect two-phase scheduling

Enhance section 5.3 (Embedded Telemetry Service) implementation plan to explicitly
document the two-phase adaptive scheduling requirements that were added to the
technical design in the previous commit.

Changes to Implementation Checklist:
-------------------------------------
1. Updated Key Features section (5.3.1):
   - Added two-phase adaptive scheduling description
   - Documented bootstrap phase (3-minute checks)
   - Documented normal phase (1-hour checks, 7-day collection, 24-hour upload)
   - Added identity establishment detection requirement
   - Noted hardcoded publishable key (not environment variables)

2. Enhanced service.py checklist items:
   - Implement __init__() with hardcoded Replicated publishable key
   - Add two-phase interval constants (180s bootstrap, 3600s normal)
   - Implement _is_identity_established() method for phase detection
   - Implement _collection_loop() with adaptive intervals
   - Implement _upload_loop() with adaptive intervals and transition detection
   - Implement _get_admin_email() supporting bootstrap phase
   - Implement _get_or_create_identity() for Replicated integration

3. Added two-phase scheduling test requirements (5.3.3):
   - Test bootstrap phase: 3-minute check intervals before first user
   - Test phase transition: Immediate upload when first user authenticates
   - Test normal phase: 1-hour check intervals after identity established
   - Test identity detection: _is_identity_established() logic
   - Test error handling: Falls back to bootstrap interval on errors

4. Enhanced unit test checklist:
   - Test _is_identity_established() with no/partial/complete identity
   - Test interval selection logic (bootstrap vs normal)
   - Test phase transition detection in upload loop

5. Updated demo description:
   - Added: "New installations become visible within 3 minutes of first user login"
   - Clarified ongoing behavior after identity establishment

Rationale:
----------
The previous commit added comprehensive two-phase scheduling to the technical
design (section 4.3), but the implementation checklist (section 5.3) still
described the original fixed-interval approach. This update ensures developers
implementing M3 have clear guidance on all the two-phase scheduling requirements.

The checklist now explicitly calls out:
- New methods to implement (_is_identity_established)
- New constants to define (bootstrap vs normal intervals)
- New logic to add (adaptive interval selection)
- New tests to write (phase detection and transition)

This aligns the implementation requirements with the technical design.

Co-authored-by: openhands <openhands@all-hands.dev>
This commit is contained in:
openhands 2025-12-19 14:11:39 +00:00
parent d629699326
commit b3bf509634

View File

@ -1153,13 +1153,27 @@ Implement the embedded telemetry service that runs within the main enterprise se
- [ ] `enterprise/server/telemetry/__init__.py` - Package initialization
- [ ] `enterprise/server/telemetry/service.py` - Core TelemetryService singleton class
- [ ] Implement `TelemetryService.__init__()` with hardcoded Replicated publishable key
- [ ] Add two-phase interval constants (`bootstrap_check_interval_seconds=180`, `normal_check_interval_seconds=3600`)
- [ ] Implement `_is_identity_established()` method for phase detection
- [ ] Implement `_collection_loop()` with adaptive intervals (3 min bootstrap, 1 hour normal)
- [ ] Implement `_upload_loop()` with adaptive intervals and identity creation detection
- [ ] Implement `_get_admin_email()` to support bootstrap phase (env var or first user)
- [ ] Implement `_get_or_create_identity()` for Replicated customer/instance creation
- [ ] `enterprise/server/telemetry/lifecycle.py` - FastAPI lifespan integration
- [ ] `enterprise/tests/unit/telemetry/test_service.py` - Service unit tests
- [ ] Test `_is_identity_established()` with no identity, partial identity, complete identity
- [ ] Test interval selection logic (bootstrap vs normal)
- [ ] Test phase transition detection in upload loop
- [ ] `enterprise/tests/unit/telemetry/test_lifecycle.py` - Lifespan integration tests
**Key Features**:
- Singleton service pattern with thread-safe initialization
- Independent AsyncIO background tasks for collection (7 days) and upload (24 hours)
- Two-phase adaptive scheduling:
- **Bootstrap phase**: Checks every 3 minutes until first user authenticates (rapid initial visibility)
- **Normal phase**: Checks every 1 hour, collects every 7 days, uploads every 24 hours (low overhead)
- Automatic identity establishment detection and phase transition
- Replicated publishable key hardcoded in source (not environment variables)
- Graceful startup and shutdown via FastAPI lifespan events
- Automatic recovery from errors without crashing main server
@ -1178,11 +1192,16 @@ Implement the embedded telemetry service that runs within the main enterprise se
- [ ] `enterprise/tests/integration/test_telemetry_flow.py` - Full collection and upload cycle
- [ ] Test startup/shutdown behavior
- [ ] Test two-phase scheduling:
- [ ] Bootstrap phase: 3-minute check intervals before first user
- [ ] Phase transition: Immediate upload when first user authenticates
- [ ] Normal phase: 1-hour check intervals after identity established
- [ ] Identity detection: `_is_identity_established()` logic
- [ ] Test interval timing and database state
- [ ] Test Replicated API integration (mocked)
- [ ] Test error handling and recovery
- [ ] Test error handling and recovery (falls back to bootstrap interval)
**Demo**: Telemetry service starts automatically with the enterprise server, collects metrics weekly, uploads daily to Replicated, and cannot be disabled without code modification.
**Demo**: Telemetry service starts automatically with the enterprise server. New installations become visible within 3 minutes of first user login. Established installations collect metrics weekly and upload daily to Replicated. The service cannot be disabled without code modification.
### 5.4 License Warning API (M4)