quanliu
|
92183b41f3
|
[Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (#18957)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
|
2025-06-15 21:56:37 -07:00 |
|
22quinn
|
0b73736a0d
|
[Kernel] Raise verbose error and consolidate num_heads/num_kv_heads divisibility check (#19339)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-15 13:43:48 +08:00 |
|
Lu Fang
|
ee1531bc38
|
[Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness (#19644)
|
2025-06-14 21:15:41 -07:00 |
|
Isotr0py
|
2db9044ab6
|
[Bugfix] Fix auto dtype casting for BatchFeature (#19316)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-06-14 15:13:08 +00:00 |
|
Lu Fang
|
06be858828
|
[Bugfix] Fix the speculative decoding test by setting the target dtype (#19633)
|
2025-06-13 20:57:32 -07:00 |
|
Concurrensee
|
d65668b4e8
|
Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-06-13 17:08:51 -07:00 |
|
Luka Govedič
|
3597b06a4f
|
[CUDA] Enable full cudagraph for FlashMLA (#18581)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-13 18:12:26 +00:00 |
|
Ekagra Ranjan
|
017ef648e9
|
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)
|
2025-06-12 10:30:56 -07:00 |
|
Luka Govedič
|
f98548b9da
|
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-12 08:31:04 -07:00 |
|
mobicham
|
96846bb360
|
Fix TorchAOConfig skip layers (#19265)
Signed-off-by: mobicham <hicham@mobiuslabs.com>
|
2025-06-12 22:22:53 +08:00 |
|
Wentao Ye
|
b6efafd9e4
|
[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-12 06:51:41 -07:00 |
|
jmswen
|
c9280e6346
|
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-12 11:00:23 +00:00 |
|
Nick Hill
|
d5bdf899e4
|
[BugFix] Work-around incremental detokenization edge case error (#19449)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-12 06:43:20 +00:00 |
|
Ning Xie
|
2f1c19b245
|
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-11 19:57:10 -07:00 |
|
bnellnm
|
29fa5cac1c
|
[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-11 12:53:10 -04:00 |
|
Cyrus Leung
|
a2142f0196
|
Support non-string values in JSON keys from CLI (#19471)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-11 09:34:04 +00:00 |
|
leopardracer
|
7c644ab6d5
|
Fix Typo in Documentation and Function Name (#19442)
|
2025-06-10 22:44:11 -07:00 |
|
Michael Goin
|
1e473b3010
|
[CI] Disable failing GGUF model test (#19454)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-11 05:12:38 +00:00 |
|
Lu Fang
|
2b1e2111b0
|
Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-11 12:54:59 +08:00 |
|
wang.yuqi
|
3952731e8f
|
[New Model]: Support Qwen3 Embedding & Reranker (#19260)
|
2025-06-10 20:07:30 -07:00 |
|
Richard Zou
|
77f0d465d0
|
[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-11 07:54:41 +08:00 |
|
Isotr0py
|
5f1ac1e1d1
|
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
|
2025-06-10 01:30:20 -07:00 |
|
Nick Hill
|
646d62f636
|
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-10 07:01:17 +02:00 |
|
Siyuan Liu
|
7d44c469fe
|
[TPU]Fix KV cache sharing tests (#19371)
|
2025-06-09 18:38:15 -04:00 |
|
liusiqian-tal
|
31f58be96a
|
[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472)
Signed-off-by: liusiqian <liusiqian@tal.com>
|
2025-06-09 21:41:21 +00:00 |
|
22quinn
|
c1c7dbbeeb
|
[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 (#19348)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-09 23:01:29 +08:00 |
|
Varun Sundar Rabindranath
|
5cf2daea9a
|
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-09 10:50:39 -04:00 |
|
Isotr0py
|
b8089195b4
|
[v1] Add fp32 support to v1 engine through flex attn (#19319)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-06-09 22:10:44 +08:00 |
|
Jee Jee Li
|
95a6568b5c
|
[CI/Build] Fix LoRA test (#19350)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-09 09:52:10 +00:00 |
|
Richard Zou
|
3a4d417707
|
[Misc] Cleanup compilation tests (#19343)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-09 15:05:44 +08:00 |
|
Dipika Sikka
|
c123bc33f9
|
[Quantization] Add compressed-tensors NVFP4 support (#18312)
|
2025-06-08 09:05:55 -04:00 |
|
Richard Zou
|
3d64d366e0
|
[Misc] Change tests/compile to use VLLM_V1 by default (#19302)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-08 16:06:48 +08:00 |
|
Richard Zou
|
eaa2e51088
|
[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-08 08:56:12 +08:00 |
|
Luka Govedič
|
2d8476e465
|
[BugFix][V1] Fix memory profiling bug (#18974)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-07 10:34:51 -07:00 |
|
Isotr0py
|
d2f0e7e615
|
[CI/Build] Improve Llama GGUF test robustness (#19287)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-07 17:23:28 +08:00 |
|
Driss Guessous
|
cf02f9b283
|
Add FlexAttention to V1 (#16078)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-06-06 21:58:55 -07:00 |
|
ElizaWszola
|
84166fee97
|
[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-06 18:26:11 -07:00 |
|
Lu Fang
|
6e0cd10f72
|
[Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-07 09:19:09 +08:00 |
|
Nick Hill
|
46ecc57973
|
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 16:28:17 -07:00 |
|
Adolfo Victoria
|
ca27f0f9c1
|
[Bugfix][Core] Update cancellation logic in generate() to handle Generator exits (#19225)
Co-authored-by: Adolfo Victoria <adovi@meta.com>
|
2025-06-06 20:17:54 +00:00 |
|
Nick Hill
|
aad30bd306
|
[BugFix] Fix MultiConnector test after HMA changes (#19291)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-06 20:16:24 +00:00 |
|
jmswen
|
7353492a47
|
[Core] Raise when non-multi-instance DP clients target a DP rank (#19227)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-06 19:03:01 +08:00 |
|
Siqi Yan
|
f168b85725
|
Unit Test for run_dp_sharded_vision_model (#19103)
Signed-off-by: Siqi Yan <siqi@meta.com>
Co-authored-by: Siqi Yan <siqi@meta.com>
|
2025-06-06 16:24:02 +08:00 |
|
Richard Zou
|
da511d54d8
|
Fix CompilationConfig repr (#19091)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-06 16:23:35 +08:00 |
|
Dipika Sikka
|
94870359cd
|
[Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
|
2025-06-06 01:21:54 -07:00 |
|
Chengji Yao
|
b61dc5f972
|
[TPU] update torch_xla pin (#19231)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-06 04:27:38 +00:00 |
|
Chen Zhang
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
|
Benjamin Chislett
|
3465b87ef8
|
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-06-05 19:10:08 -07:00 |
|
Jerry Zhang
|
c8134bea15
|
Fix AOPerModuleConfig name changes (#18869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-06-05 18:51:32 -07:00 |
|
Luis Vega
|
cb6d572e85
|
[Model] NemotronH support (#18863)
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
|
2025-06-05 21:29:28 +00:00 |
|