52 Commits

Author SHA1 Message Date
Han Xiao
727767e93b refactor: enhance logging details and context 2025-06-10 13:00:23 -07:00
Han Xiao
4b37ec8d04 fix: logger 2025-06-10 12:34:37 -07:00
Han Xiao
9edf122a8c refactor: logger 2025-06-10 11:48:19 -07:00
Sha Zhou
0ee295b83e improve image tools 2025-06-10 21:15:57 +08:00
Sha Zhou
a768755783 feat: gather images to response (#98) 2025-06-10 11:55:46 +08:00
* feat: add image tools
* rank images
* add image dedup
* wip
* wip
* remove rank functions
* fix
* add embeddings to image
* move image object to agent
* build image references
* update
* add with_images param
* update dimensions for image tools
* dedup images
* save images to cloud storage
* remove extra log
* fix
* remove test data
* fix
Han Xiao
2affd41c79 fix: add spacing for consistency in agent and url-tools 2025-06-09 15:48:41 -07:00
Sha Zhou
e8f9e79a20 replace fetch with axios client in url-tools 2025-05-12 18:12:41 +08:00
Sha Zhou
5674402bb1 fix: optimize webContent for references, add axios client 2025-05-09 14:01:27 +08:00
Sha Zhou
d0165d419e update hostname boost value 2025-04-28 15:38:42 +08:00
Sha Zhou
dd6ee81baa normalize host name 2025-04-28 12:20:22 +08:00
Han Xiao
c7b42fb150 refactor: v2 (#95) 2025-04-13 23:32:50 +08:00
* refactor: optimize read and search
* refactor: v2
* refactor: v2
* refactor: v2
* refactor: v2
Han Xiao
153479abb6 refactor: optimize read and search 2025-04-11 22:09:42 +08:00
Han Xiao
7bd4f51f42 fix: emit url idx in visit action 2025-03-27 15:29:51 +08:00
Han Xiao
347beda0c2 feat: only hostnames 2025-03-24 11:27:27 +08:00
Han Xiao
7d07078ec5 feat: only hostnames 2025-03-24 10:41:35 +08:00
Han Xiao
efa79274c1 fix: md table render 2025-03-20 14:24:39 +08:00
Han Xiao
835d5b9a0f fix: up err model 2025-03-19 15:47:51 +08:00
Han Xiao
d5cb62f7ea revert: no spam filter 2025-03-19 15:31:42 +08:00
Han Xiao
d824957d29 revert: no spam filter 2025-03-19 15:25:49 +08:00
Han Xiao
92f1a15f8c feat: filter out blocked content 2025-03-19 15:20:27 +08:00
Han Xiao
ef5820729d feat: filter out blocked content 2025-03-19 15:07:11 +08:00
Han Xiao
96d856c848 feat: filter out blocked content 2025-03-19 14:53:20 +08:00
Han Xiao
023bf0ef9c fix: unnecessary eval 2025-03-19 08:37:35 +08:00
Han Xiao
f8decb037e fix: unnecessary eval 2025-03-18 21:07:33 +08:00
Han Xiao
492f879ef1 perf: opt reranker 2025-03-18 14:06:58 +08:00
Han Xiao
aac0db67e4 feat: add hostnames bw filter 2025-03-18 11:24:53 +08:00
Han Xiao
4ca7804e58 feat: add hostnames bw filter 2025-03-18 10:43:46 +08:00
Han Xiao
1ac80e4d20 fix: url sanitization 2025-03-17 18:09:01 +08:00
Han Xiao
3930f8b863 fix: url sanitization 2025-03-17 15:41:54 +08:00
Han Xiao
01705291c4 fix: normalize url 2025-03-17 14:44:28 +08:00
Han Xiao
5c36410b54 fix: normalize url 2025-03-17 14:23:02 +08:00
Han Xiao
90b1d39cc6 fix: normalize url 2025-03-17 12:07:19 +08:00
Han Xiao
f9cb542dd0 fix: normalize url 2025-03-15 13:55:59 +08:00
Han Xiao
f5d6bf75f5 feat: add num urls 2025-03-14 15:18:50 +08:00
Han Xiao
c9a51bb403 fix: fallback genobj 2025-03-14 13:22:17 +08:00
Han Xiao
59b2daf66b fix: updated time 2025-03-13 10:30:35 +08:00
Han Xiao
4d76f146d0 feat: late chunking 2025-03-12 15:13:32 +08:00
Han Xiao
013056f218 feat: late chunking 2025-03-12 14:07:11 +08:00
Han Xiao
c8fc259dff refactor: pull url out 2025-03-11 21:30:59 +08:00
Han Xiao
c30043e119 fix: eval 2025-03-11 17:56:39 +08:00
Han Xiao
ea42af3101 fix: eval 2025-03-11 17:09:45 +08:00
Han Xiao
05ddb30d80 refactor: query rewriter 2025-03-11 15:34:00 +08:00
Han Xiao
d947973a68 refactor: query rewriter 2025-03-11 15:10:08 +08:00
Han Xiao
1e097a9ecc fix: url datetime guessing 2025-03-07 14:32:47 +08:00
Han Xiao
8b836431af fix: url datetime guessing 2025-03-07 13:43:14 +08:00
Han Xiao
1604013788 fix: url datetime guessing 2025-03-06 17:24:25 +08:00
Han Xiao
dbeee0c8f5 fix: url datetime guessing 2025-03-06 17:15:46 +08:00
Han Xiao
d9bfc2fd1f feat: improve url ranking, fix eval bugs 2025-03-06 14:17:56 +08:00
Han Xiao
5df8d8a9c6 fix: weighted urls and hostnames 2025-03-05 10:58:52 +08:00
Han Xiao
51ad77d302 feat: add url ranking 2025-03-04 16:29:22 +08:00