TJian
|
916836bbfb
|
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-03-12 09:31:19 -07:00 |
|
Woosuk Kwon
|
c0c25e25fa
|
[Model] Add support for Gemma 3 (#14660)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-12 08:36:33 -07:00 |
|
Li, Jiang
|
ff47aab056
|
[CPU] Upgrade CPU backend to torch-2.6 (#13381)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-03-12 10:41:13 +00:00 |
|
Pavani Majety
|
debd6bbf09
|
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-03-12 05:13:11 +00:00 |
|
Benjamin Chislett
|
5c538c37b2
|
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-03-11 22:12:41 -07:00 |
|
Szymon Ożóg
|
e22ee1e7a2
|
[Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
|
2025-03-12 03:33:27 +00:00 |
|
Isotr0py
|
e392d85831
|
[Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-11 20:12:52 -07:00 |
|
Aaron Pham
|
77a318bd01
|
[V1][Core] Support MistralTokenizer for Structured Output (#14625)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-12 10:40:09 +08:00 |
|
Farzad Abdolhosseini
|
80e78d02ac
|
[Model] Extend Ultravox to accept audio longer than 30s (#13631)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
|
2025-03-12 10:27:10 +08:00 |
|
Joe Runde
|
47532cd9f4
|
[core][V1] pluggable scheduler (#14466)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-03-12 01:15:15 +00:00 |
|
Russell Bryant
|
4bf82d4b90
|
[V1] Add regex structured output support with xgrammar (#14590)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 23:03:44 +08:00 |
|
Cyrus Leung
|
af295e9b01
|
[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-11 07:59:43 -07:00 |
|
Jeff Daily
|
a1c8f3796c
|
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
Roger Wang
|
1fc973c0b5
|
[V1][Core] Fix memory issue with logits & sampling (#14508)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>
|
2025-03-11 04:03:41 +00:00 |
|
Liangfu Chen
|
c91b64f749
|
[neuron] add reshape_and_cache (#14391)
|
2025-03-10 18:37:29 -07:00 |
|
gnovack
|
d6123170d5
|
[Neuron] Add Neuron device communicator for vLLM v1 (#14085)
|
2025-03-10 18:37:04 -07:00 |
|
Varun Sundar Rabindranath
|
5ff0d32580
|
[V1] LoRA - Add triton kernels for V1 (#13096)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-10 17:27:53 -04:00 |
|
Harry Mellor
|
3b352a2f92
|
Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 16:36:21 +00:00 |
|
Szymon Ożóg
|
89cdaa83e7
|
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-10 07:30:04 -07:00 |
|
Robert Shaw
|
5f0b53c6ea
|
Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-08 17:43:37 -08:00 |
|
22quinn
|
eb8b5eb183
|
[V1] Support bad_words in sampler (#13376)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-08 14:50:26 -08:00 |
|
Isotr0py
|
609ef61fea
|
[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling (#14361)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-08 16:52:34 +00:00 |
|
Roger Wang
|
8d5aa466fb
|
[V1][Core] Fix memory issue with logits & sampling (#13776)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-08 06:11:04 -08:00 |
|
Alexander Matveev
|
cb8bdfade2
|
[V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-08 08:19:38 -05:00 |
|
Cyrus Leung
|
33f227e16b
|
[CI/Build] Use a fixed seed to avoid flaky tests (#14480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 11:30:09 +00:00 |
|
Harry Mellor
|
47512b3200
|
Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 14:46:15 +08:00 |
|
afeldman-nm
|
ef64044079
|
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)
|
2025-03-08 01:48:12 +00:00 |
|
Nick Hill
|
8ed5421aaa
|
[V1] Eagerly remove finished requests from the batch (#14388)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 10:56:00 -08:00 |
|
Jinzhen Lin
|
d0feea31c7
|
[Kernel] optimize performance of gptq marlin kernel when n is small (#14138)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-07 11:53:38 -05:00 |
|
Aaron Pham
|
80e9afb5bc
|
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 07:19:11 -08:00 |
|
மனோஜ்குமார் பழனிச்சாமி
|
cc10281498
|
[Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
|
2025-03-07 10:40:01 +00:00 |
|
Jee Jee Li
|
12c29a881f
|
[Bugfix] Further clean up LoRA test (#14422)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-07 10:30:55 +00:00 |
|
Ilya Lavrenov
|
8ca7a71df7
|
OpenVINO: added CPU-like conditions (#14338)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
|
2025-03-06 22:24:49 -08:00 |
|
Jee Jee Li
|
ddd1ef66ec
|
[Bugfix] Fix JambaForCausalLM LoRA (#14370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-06 22:05:47 -08:00 |
|
Luka Govedič
|
e1744502c2
|
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-07 05:20:16 +00:00 |
|
Himanshu Jaju
|
cd579352bf
|
[V1] Do not detokenize if sampling param detokenize is False (#14224)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-06 10:40:24 -08:00 |
|
Harry Mellor
|
bf0560bda9
|
Reinstate best_of for V0 (#14356)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-06 08:34:22 -08:00 |
|
Thomas Parnell
|
6bd1dd9d26
|
[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152)
|
2025-03-06 07:39:16 -08:00 |
|
Nicolò Lucchesi
|
fa82b93853
|
[Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-06 10:39:35 +00:00 |
|
kYLe
|
1769928079
|
[Model] Update Paligemma multimodal processing with PromptUpdate (#14015)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-03-06 08:31:38 +00:00 |
|
Nicolò Lucchesi
|
5ee10e990d
|
[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention (#11301)
|
2025-03-05 20:00:53 -08:00 |
|
Varun Sundar Rabindranath
|
3dbd2d813a
|
[V1] LoRA - Enable more V1 tests (#14315)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-06 11:55:42 +08:00 |
|
Lucas Wilkinson
|
f6bb18fd9a
|
[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-05 17:10:13 -08:00 |
|
Lu Fang
|
53ea6ad830
|
[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-05 21:41:18 +00:00 |
|
Vincent
|
a4f1ee35d6
|
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 (#13997)
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-05 20:22:43 +00:00 |
|
Robert Shaw
|
257e200a25
|
[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-03-05 14:18:55 +00:00 |
|
Benjamin Chislett
|
32985bed7c
|
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-03-05 06:30:40 +00:00 |
|
Michael Goin
|
dae9ec464c
|
Temporarily disable test_awq_gemm_opcheck (#14251)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-05 06:10:35 +00:00 |
|
Tyler Michael Smith
|
72c62eae5f
|
[V1] EP/TP MoE + DP Attention (#13931)
|
2025-03-04 21:27:26 -08:00 |
|
Congcong Chen
|
0a995d5434
|
[Model] New model support for Phi-4-multimodal-instruct (#14119)
|
2025-03-04 20:57:01 -08:00 |
|