* Add integration test framework with mock llm
* Fix MonologueAgent and PlannerAgent tests
* Remove adhoc logging
* Use existing logs
* Fix SWEAgent and PlannerAgent
* Check-in test log files
* conftest: look up under test name folder only
* Add docstring to conftest
* Finish dev doc
* Avoid non-determinism
* Remove dependency on llm embedding model
* Init embedding model only for MonologueAgent
* Add adhoc fix for sandbox discrepancy
* Test ssh and exec sandboxes
* CI: fix missing sandbox type
* conftest: Remove hack
* Reword comment for TODO