Commit Graph

1670 Commits

Author SHA1 Message Date
Angela Yi
d5dbdbfcb2 [docs] Fix cudagraph mode config (#29170)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-11-21 17:10:27 -08:00
Julien Denize
57430fc95c Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 13:58:59 -08:00
wangxiyuan
4050bae417 [Doc] Update plugin doc (#28532)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-21 14:57:26 +00:00
Cyrus Leung
9452863088 Revert "Revert #28875 (#29159)" (#29179)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 04:27:43 -08:00
Cyrus Leung
aab0102a26 [V0 deprecation] Remove more V0 references (#29088)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:56:59 +00:00
Cyrus Leung
4d7231e774 Revert #28875 (#29159) 2025-11-21 01:40:17 -08:00
Cyrus Leung
56e96b37e4 [V0 Deprecation] Remove best_of (#29090)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:40:40 +08:00
Qidong Su
698024ecce [Doc] update installation guide regarding aarch64+cuda pytorch build (#28875)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-20 19:40:25 -08:00
jeremyteboul
0730414999 [Core] Add audio_embeds support to chat completions (#29059)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-11-21 11:39:47 +08:00
Michael Goin
87cbbdff63 Update model references for OLMo3 (#29099)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-21 09:16:52 +08:00
Rob Mulla
dd39f91edb [Doc] cleanup TPU documentation and remove outdated examples (#29048)
Signed-off-by: Rob Mulla <rob.mulla@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-21 00:05:59 +00:00
Shinichi Hemmi
c9e093116c [MODEL] Implement plamo3 (#28834)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
2025-11-20 03:00:19 -08:00
Shanshan Shen
d44e9df7d4 [Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-19 16:24:55 +00:00
Harry Mellor
4f5299f717 Relax Transformers modeling backend MoE experts check (#28952)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 21:50:30 +08:00
Didier Durand
09540cd918 [Doc]: fix typos in various files (#29010)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-19 04:56:21 -08:00
Harry Mellor
97cfa99d59 [Docs] Take env var definition out of folded admonition (#29005)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 03:32:04 -08:00
Michael Yao
fdf93486d6 [Docs] Clean up moe_kernel_features.md (#28530)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-11-19 02:35:29 -08:00
Louie Tsai
ae4821a108 Add CPU support model (#28697)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-11-18 23:47:57 -08:00
Didier Durand
7ed27f3cb5 [Doc]: fix typos in various files (#28945)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-18 22:52:30 -08:00
Uranus
6a25ea5f0e [Docs] Update oneshot imports (#28188)
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
2025-11-19 05:30:08 +00:00
Li, Jiang
20852c8f4c [CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
Kevin H. Luu
c64c0b78de [chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
Didier Durand
083cf326dc [Doc]: fix typos in various files (#28863)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-17 20:32:14 -08:00
Pranav
f77bce001a [Model] Add Afmoe architecture implementation (#28332)
Signed-off-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
Signed-off-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Maziyar Panahi <maziyar.panahi@iscpif.fr>
2025-11-17 15:11:20 -08:00
Jee Jee Li
3380ed5e11 [Doc] Add llama4 LoRA tag (#28825)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-17 14:08:48 +08:00
Didier Durand
63fed55506 [Doc]: fix typos in various files (#28811)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-16 14:30:06 +00:00
Didier Durand
2bb4435cb7 [Doc]: fix typos in various files (#28567)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-15 19:27:50 +00:00
Cyrus Leung
89d3679221 [Doc] Fix failing doc build (#28772)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-15 05:33:27 -08:00
tingtinggithub
cb15ee28db Allow Gemma3 to take image embeddings (#28483)
Signed-off-by: tingtinggithub <streamttt@gmail.com>
2025-11-15 04:18:08 -08:00
Harry Mellor
67187554dd [Docs] Enable some more markdown lint rules for the docs (#28731)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 18:39:19 +00:00
Chen Wang
9261eb3dc1 docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153)
Signed-off-by: Chen Wang <Chen.Wang1@ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 18:08:30 +00:00
Julien Denize
085424808e Remove audio optional dependency for mistral-common (#28722)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 09:54:38 -08:00
Harry Mellor
5f3cd7f7f2 [Docs] Update the name of Transformers backend -> Transformers modeling backend (#28725)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-14 16:34:14 +00:00
Fasal Shah
8d3748d3c7 [Doc] Fix macOS installation dependency resolution issue (#26721)
Signed-off-by: faisal shah <fashah@redhat.com>
2025-11-14 12:43:56 +00:00
Chauncey
5c9ad138d5 [Frontend] supports interleaved thinking (#28531)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-13 16:14:13 +08:00
Harry Mellor
97d1c99302 Rename clashing method names for vLLM model protocol (#27583)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 19:14:33 -08:00
Harry Mellor
3226283461 [Docs] Add some details about what the MoE block needs for the Transformers backend (#28588)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-13 03:12:14 +00:00
Michael Goin
52eadcec9e [Docs] Update meetups.md description (#28583)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-13 00:00:23 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
4ca5cd5740 [Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
2025-11-12 15:24:12 -08:00
Benjamin Chislett
304419576a [Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-13 01:56:40 +09:00
Harry Mellor
a742134cc5 Remove deprecated fields from CompilationConfig (#27593)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 16:10:28 +00:00
Chenguang Zheng
4ccffe561f [Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#25233)
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
2025-11-11 18:58:33 -08:00
Li, Jiang
7f829be7d3 [CPU] Refactor CPU attention backend (#27954)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-12 09:43:06 +08:00
Michael Goin
28534b92b9 Add Zurich vLLM Meetup (#28488)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-11 14:53:59 -08:00
xuebwang-amd
05576df85c [ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-11 12:05:22 -05:00
the-codeboy
287bbbeb06 [Doc] Fix typo in serving docs (#28474)
Signed-off-by: the-codeboy <71213855+the-codeboy@users.noreply.github.com>
2025-11-11 16:45:49 +00:00
Maryam Tahhan
fa1970201d [Docs] Fix grammar in CPU installation guide (#28461)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
2025-11-11 14:01:11 +00:00
iAmir97
a7adbc6c6b [Doc] Sleep mode documentation (#28357)
Signed-off-by: Amir Balwel <amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: Amir Balwel <amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 22:44:35 -08:00
vllmellm
f080a83511 [RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops (#24490)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-10 08:20:53 -08:00
Kevin H. Luu
05f8d69077 [chore] Move some wikimedia images to S3 (#28351)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-11-09 01:58:26 +00:00