Chauncey
|
d77f7fb871
|
[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer (#19283)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-08 08:16:31 +08:00 |
|
Luka Govedič
|
2d8476e465
|
[BugFix][V1] Fix memory profiling bug (#18974)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-07 10:34:51 -07:00 |
|
pramenku
|
88be823d57
|
[AMD] Update compatible packaging version (#19309)
Signed-off-by: pramkuma <Pramendra.Kumar@amd.com>
|
2025-06-07 20:55:09 +08:00 |
|
Lifans
|
4e4f63ad45
|
[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py (#19311)
Signed-off-by: Lifan Shen <lifans@meta.com>
|
2025-06-07 18:25:38 +08:00 |
|
Isotr0py
|
d2f0e7e615
|
[CI/Build] Improve Llama GGUF test robustness (#19287)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-07 17:23:28 +08:00 |
|
Reid
|
122cdca5f6
|
[Misc] refactor context extension (#19246)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-07 05:13:21 +00:00 |
|
Driss Guessous
|
cf02f9b283
|
Add FlexAttention to V1 (#16078)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-06-06 21:58:55 -07:00 |
|
Aaruni Aggarwal
|
c4296b1a27
|
[CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py (#19253)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
|
2025-06-07 11:52:52 +08:00 |
|
QiliangCui
|
66c508b137
|
[TPU][Test] Add script to run benchmark on TPU for buildkite (#19039)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-06 20:10:24 -07:00 |
|
ElizaWszola
|
84166fee97
|
[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-06 18:26:11 -07:00 |
|
Lu Fang
|
6e0cd10f72
|
[Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-07 09:19:09 +08:00 |
|
Alexei-V-Ivanov-AMD
|
e010688f50
|
[Build][ROCm] Update Dockerfile.rocm (#19296)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-06-06 19:35:16 -04:00 |
|
Chenyaaang
|
441b65d8c7
|
[Misc][Tools][Benchmark] Fix and improve auto tune script (#19163)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-06 23:31:19 +00:00 |
|
Nick Hill
|
46ecc57973
|
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 16:28:17 -07:00 |
|
Nicolò Lucchesi
|
b6a3a9f76d
|
[Core] Fix abrupt request abort (#18485)
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 16:27:59 -07:00 |
|
Adolfo Victoria
|
ca27f0f9c1
|
[Bugfix][Core] Update cancellation logic in generate() to handle Generator exits (#19225)
Co-authored-by: Adolfo Victoria <adovi@meta.com>
|
2025-06-06 20:17:54 +00:00 |
|
Nick Hill
|
aad30bd306
|
[BugFix] Fix MultiConnector test after HMA changes (#19291)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 20:16:24 +00:00 |
|
Nishidha
|
94ecee6282
|
Fixed ppc build when it runs on non-RHEL based linux distros (#18422)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
|
2025-06-06 11:54:26 -07:00 |
|
Yu Guo
|
8267f9916f
|
improve logits bias (#19041)
|
2025-06-06 19:59:25 +08:00 |
|
jmswen
|
7353492a47
|
[Core] Raise when non-multi-instance DP clients target a DP rank (#19227)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-06 19:03:01 +08:00 |
|
Jee Jee Li
|
7661e92ef8
|
[Model] Optimize nemotron_h implementation (#19249)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-06 10:05:14 +00:00 |
|
Siqi Yan
|
f168b85725
|
Unit Test for run_dp_sharded_vision_model (#19103)
Signed-off-by: Siqi Yan <siqi@meta.com>
Co-authored-by: Siqi Yan <siqi@meta.com>
|
2025-06-06 16:24:02 +08:00 |
|
Richard Zou
|
da511d54d8
|
Fix CompilationConfig repr (#19091)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-06 16:23:35 +08:00 |
|
Nick Hill
|
65c69444b1
|
[Docs] Improve V1 KVConnector interface documentation (#19172)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 16:22:45 +08:00 |
|
Dipika Sikka
|
94870359cd
|
[Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
|
2025-06-06 01:21:54 -07:00 |
|
Chengji Yao
|
0d49483ea9
|
[TPU] fix kv cache dtype in model runner (#19244)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-06 16:20:16 +08:00 |
|
Jinghui Zhang
|
90b78ec5f9
|
[v1][P/D] Fix a edge case in kv cache schedule (#19182)
Co-authored-by: jinghui <jinghui@fb.com>
|
2025-06-05 23:32:55 -07:00 |
|
Aaron Pham
|
91a2ef98ea
|
[Chore] update CODEOWNERS (#19247)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-06-06 06:09:43 +00:00 |
|
Xu Song
|
3da2313d78
|
Support allowed_token_ids in ChatCompletionRequest (#19143)
Signed-off-by: Xu Song <xusong.vip@gmail.com>
|
2025-06-06 05:06:48 +00:00 |
|
Chengji Yao
|
b61dc5f972
|
[TPU] update torch_xla pin (#19231)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-06 04:27:38 +00:00 |
|
Chen Zhang
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
|
Benjamin Chislett
|
3465b87ef8
|
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-06-05 19:10:08 -07:00 |
|
Jerry Zhang
|
c8134bea15
|
Fix AOPerModuleConfig name changes (#18869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-06-05 18:51:32 -07:00 |
|
Luis Vega
|
cb6d572e85
|
[Model] NemotronH support (#18863)
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
|
2025-06-05 21:29:28 +00:00 |
|
Michael Goin
|
87360308b7
|
[V1] Use FlashInfer by default on Blackwell GPUs (#19118)
|
2025-06-05 15:40:39 -04:00 |
|
Dipika Sikka
|
aa49f14832
|
[Quantization] Skip Fp4 Test for compressed-tensors (#19217)
|
2025-06-05 18:21:53 +00:00 |
|
Nicolò Lucchesi
|
9ef9173cfa
|
[P/D][NixlConnector] Enable FlashInfer backend (#19090)
|
2025-06-05 17:10:15 +00:00 |
|
Povilas Kanapickas
|
85e2b7bb13
|
[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226)
Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
|
2025-06-05 16:53:08 +00:00 |
|
Chiyue Wei
|
61059bee40
|
[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
|
2025-06-05 09:48:26 -07:00 |
|
Xu Wenqing
|
ec89524f50
|
Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205)
|
2025-06-05 16:38:54 +00:00 |
|
Patrick von Platen
|
f20f9f063b
|
[mistral_common] Add v11 tokenizer (#19193)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-06-05 08:27:41 -07:00 |
|
Guillaume Calmettes
|
9bc8bb07cf
|
[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-06-05 12:59:28 +00:00 |
|
Reid
|
1aeb925f34
|
[Frontend] improve vllm run-batch --help display (#19187)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-05 11:16:25 +00:00 |
|
22quinn
|
188a4590d8
|
[Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-05 11:14:32 +00:00 |
|
vllmellm
|
18093084be
|
[Misc] Remove unnecessary fallback to prefill-decode attention (#19138)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-06-05 16:08:26 +08:00 |
|
Simon Mo
|
da40380214
|
[Build] Annotate wheel and container path for release workflow (#19162)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-04 23:24:56 -07:00 |
|
Chauncey
|
8fc57501d3
|
[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-05 06:24:24 +00:00 |
|
Woosuk Kwon
|
af7fc84fd2
|
[BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-05 13:41:25 +08:00 |
|
Huy Do
|
0678b52251
|
Handle non-serializable objects when dumping benchmark results (#19114)
|
2025-06-04 22:40:04 -07:00 |
|
Yang Wang
|
25b918eee6
|
[Torch Nightly]add missing dependency (#18770)
Signed-off-by: Yang Wang <elainewy@meta.com>
|
2025-06-04 21:56:12 -07:00 |
|