wang.yuqi
|
4e2ab1861d
|
[CI Failure] pin nomic-embed-text-v1 revision (#39292)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-08 11:43:06 +00:00 |
|
Ilya Boytsov
|
6e1100889e
|
fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader (#39176)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-04-07 22:40:55 +08:00 |
|
wang.yuqi
|
d9d21eb8e3
|
[Frontend][3/n] Improve pooling entrypoints | scoring. (#28631)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-31 07:52:00 +00:00 |
|
haosdent
|
a08b7733fd
|
[CI] Fix SPLADE pooler test broken by #38139 (#38495)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-30 07:48:33 +00:00 |
|
wang.yuqi
|
dcdc145893
|
[CI] Reorganize scoring tests (#38207)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-26 12:07:01 +00:00 |
|
Ilya Boytsov
|
8b6c6b9505
|
[Model] Add LFM2-ColBERT-350M support (#37528)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-03-20 14:57:57 +00:00 |
|
whyiug
|
1ce13cf992
|
[Model] Add support for BERT-like Chinese ERNIE pooling models (#36385)
Signed-off-by: whyiug <whyiug@hotmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-13 03:23:53 +00:00 |
|
wang.yuqi
|
6ecabe4936
|
[CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure (#36761)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-12 12:22:05 +08:00 |
|
Angela Yi
|
13e79fc811
|
[ci] Update rtol for test_classification (#36556)
Signed-off-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
|
2026-03-11 03:08:16 -07:00 |
|
Andreas Karatzas
|
a1ffa56a1e
|
[CI] Fix bge-m3 similarity reference values after *Defination* typo fix (#36208)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 05:07:29 +00:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Andreas Karatzas
|
89358f0d35
|
[CI] Fix ColBERT HF comparison tests on AMD CI + refactor (#34567)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-20 20:12:05 -08:00 |
|
Ilya Boytsov
|
071d863e20
|
Extend ColBERT support to non-standard BERT backbones (#34170)
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
|
2026-02-13 09:53:09 +00:00 |
|
Cyrus Leung
|
b96f7314b4
|
[Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 19:38:11 -08:00 |
|
wang.yuqi
|
1c3a221d3b
|
[Bugfix] Fix corner case of sparse embedding (#33886)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 02:51:22 -08:00 |
|
Ilya Boytsov
|
439afa4eea
|
feat: Add ColBERT late interaction model support (#33686)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
Signed-off-by: Ilya Boytsov <boytsovpanamera@mail.ru>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-05 08:05:13 +08:00 |
|
Cyrus Leung
|
80f921ba4b
|
[Bugfix] Fix normalize still being passed to PoolerConfig (#33794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-04 23:56:02 +08:00 |
|
Cyrus Leung
|
f0a1c8453a
|
[Frontend] Use new Renderer for Completions and Tokenize API (#32863)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-31 04:51:15 -08:00 |
|
Andreas Karatzas
|
6c00645712
|
[CI][Pooling] Stabilize ModernBERT test (#32909)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-27 05:26:48 +00:00 |
|
Maximilien de Bayser
|
ff365eea94
|
Support bge-m3 sparse embeddings and colbert embeddings (#14526)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
|
2026-01-22 23:52:57 +08:00 |
|
Isotr0py
|
8cc26acd8b
|
[Performance] Improve Triton prefill attention kernel's performance (#32403)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-17 20:19:59 -08:00 |
|
Cyrus Leung
|
232214b2ae
|
[Bugfix] Replace PoolingParams.normalize with use_activation (#32243)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-13 10:45:42 +00:00 |
|
Cyrus Leung
|
583a90e005
|
[Refactor] Separate sequence and token pooling types (#32026)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-10 04:53:24 +00:00 |
|
Divakar Verma
|
a1648c4045
|
[ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-01-09 04:04:33 +00:00 |
|
Andreas Karatzas
|
2a42ae790d
|
[ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-06 23:21:15 +00:00 |
|
Isotr0py
|
ee2e69d6cd
|
[Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff (#31776)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 00:44:22 -08:00 |
|
Isotr0py
|
367856de14
|
[CI/Build] Revive skipped reward models e2e test (#31665)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-05 02:33:46 +00:00 |
|
Andreas Karatzas
|
013b54088c
|
[ROCm][CI] Fix ModernBERT token classification test (#31612)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-02 04:19:08 +00:00 |
|
wang.yuqi
|
4429d934de
|
[Model] Automatic conversion of TokenClassification model (#30666)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-15 08:13:00 +00:00 |
|
Cyrus Leung
|
d917747c95
|
[Bugfix] Fix task still being passed in tests/benchmarks (#30476)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-11 10:33:55 +00:00 |
|
Cyrus Leung
|
e83b7e379c
|
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
|
2025-12-07 00:00:22 -08:00 |
|
Cyrus Leung
|
27f4c2fd46
|
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-06 23:15:42 -08:00 |
|
wang.yuqi
|
74c4d80c6c
|
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-04 13:44:15 +00:00 |
|
wang.yuqi
|
f4b76056ee
|
Improve enable chunked_prefill & prefix_caching logic. (#26623)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-27 22:05:48 -08:00 |
|
Harry Mellor
|
a8b70304d6
|
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-19 09:06:36 -08:00 |
|
Kevin H. Luu
|
c64c0b78de
|
[chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-18 09:44:18 -08:00 |
|
wang.yuqi
|
a55b64635c
|
[Model] Allow users to control skip reading cache per request. (#28194)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-11-16 00:04:50 -08:00 |
|
Andreas Karatzas
|
9f0247cfa4
|
VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>
|
2025-11-11 18:34:36 -08:00 |
|
Li, Jiang
|
7f829be7d3
|
[CPU] Refactor CPU attention backend (#27954)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-11-12 09:43:06 +08:00 |
|
wang.yuqi
|
4464723f22
|
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-30 12:13:05 +00:00 |
|
wang.yuqi
|
3729ed00ba
|
[Model] Add num_cached_tokens for PoolingRequestOutput (#27378)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-23 14:03:42 +08:00 |
|
wang.yuqi
|
f54f85129e
|
[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-15 11:14:41 +00:00 |
|
wang.yuqi
|
767c3ab869
|
[Model][0/N] Improve all pooling task | clean up (#25817)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-13 16:44:50 +08:00 |
|
gjgjos
|
18ed7746ea
|
[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) (#26339)
Signed-off-by: gjgjos <gjgjos@naver.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-12 17:00:52 +00:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Cyrus Leung
|
0f29dca988
|
[CI/Build] Fix model nightly tests (#26466)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-08 23:44:16 -07:00 |
|
antrec
|
6f59beaf0b
|
[Model] Add support for ModernBertForTokenClassification (#26340)
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Signed-off-by: antrec <antoine.recanati@gmail.com>
Co-authored-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-07 14:29:19 +00:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Woosuk Kwon
|
52c2a8d4ad
|
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 17:56:30 -07:00 |
|
Harry Mellor
|
058525b997
|
Move PoolerConfig from config/__init__.py to config/pooler.py (#25181)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 11:02:55 +00:00 |
|