Woosuk Kwon
|
0ff8ebb2d7
|
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-21 08:52:32 -07:00 |
|
Woosuk Kwon
|
26e673fe93
|
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-21 08:52:15 -07:00 |
|
Isotr0py
|
cf56cf78b4
|
[V1] Add sliding window support to Flex Attention backend (#24089)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-21 05:08:07 +00:00 |
|
Woosuk Kwon
|
72dd1595b4
|
[CI] Skip tests failing on main (#25326)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 19:57:46 -07:00 |
|
Woosuk Kwon
|
572ddf83ce
|
[Chore] Remove unused sampler in models (#25324)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 19:53:20 -07:00 |
|
Woosuk Kwon
|
86647d1cd0
|
[V0 Deprecation] Remove V0 Output Processor (#25320)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 17:57:20 -07:00 |
|
Woosuk Kwon
|
52c2a8d4ad
|
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-20 17:56:30 -07:00 |
|
Cyrus Leung
|
bef180f009
|
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 17:50:58 +00:00 |
|
lirong
|
d88918e4c2
|
[Core] Enable sharded state loader for V1 engine and enhance test coverage (#25308)
Signed-off-by: pengdrumli <pengdrumli@tencent.com>
|
2025-09-20 21:15:22 +08:00 |
|
Cyrus Leung
|
3d9a1d2de5
|
[V1] Support LLM.apply_model (#18465)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 07:14:35 +00:00 |
|
Chen Zhang
|
9607d5eb44
|
[Hybrid Allocator] Support full attention with different hidden size (#25101)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-19 23:43:59 -07:00 |
|
Chauncey
|
f91480b2d4
|
[Bugfix] fix tool call arguments is empty (#25223)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: xin.li <xin.li@daocloud.io>
|
2025-09-20 13:29:54 +08:00 |
|
Nick Hill
|
535d80056b
|
[Misc] Support more collective_rpc return types (#25294)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-20 02:02:38 +00:00 |
|
Boyuan Feng
|
8945b001db
|
[torch.compile] CUDAGraph Inductor partition integration (#24281)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Boyuan Feng <fby.1994@gmail.com>
Signed-off-by: boyuanfeng <boyuan@meta.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-20 01:02:15 +00:00 |
|
Andrew Sansom
|
c7e713616a
|
test: Remove vestigial skip for prompt embeds tests after landing v1 Prompt Embeds support (#25291)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-19 17:33:40 -07:00 |
|
Lucas Kabela
|
3da17c2cc2
|
[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-09-19 20:27:21 -04:00 |
|
Zhiyu
|
431535b522
|
Enable modelopt gemma3 nvfp4/fp8, make workflow more robust (#22771)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-19 22:40:33 +00:00 |
|
Alec S
|
e69e0b8b5f
|
[Frontend] Responses API messages out, just harmony for now (#24985)
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-19 21:40:16 +00:00 |
|
qizixi
|
a2a5f79e09
|
Optimize triton unified attention performance for sliding window attention (#24390)
Signed-off-by: zixi-qi <qizixi@meta.com>
|
2025-09-19 13:07:26 -06:00 |
|
Or Ozeri
|
c59a0eca42
|
[KV offload][4/N] Offloading KV connector (#22595)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 19:07:17 +00:00 |
|
Jialin Ouyang
|
2506ce5189
|
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintenance (#24990)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-19 12:22:53 -06:00 |
|
Chauncey
|
47fd08aaf9
|
[CI/Build] fix test function_calling (#25072)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-19 12:16:32 -06:00 |
|
Harry Mellor
|
12aed7e453
|
Encoder model support for the Transformers backend (#25174)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 19:15:22 +01:00 |
|
Jee Jee Li
|
2821986450
|
[Core] Modify the initialization parameters of the lora manager (#25249)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-19 18:01:28 +00:00 |
|
Cyrus Leung
|
6c117cff7d
|
[Frontend] Pass API server count to each process (#23717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 01:15:19 +08:00 |
|
Or Ozeri
|
7ac67ea525
|
[KV offload][3/N] Add worker-side CPU support (#21448)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 09:53:45 -07:00 |
|
Harry Mellor
|
aed16879a9
|
Move ModelConfig from config/__init__.py to config/model.py (#25252)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 16:22:33 +00:00 |
|
Nicolò Lucchesi
|
a3d087adec
|
[P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy (#22188)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-19 11:09:14 +00:00 |
|
Harry Mellor
|
058525b997
|
Move PoolerConfig from config/__init__.py to config/pooler.py (#25181)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 11:02:55 +00:00 |
|
Isotr0py
|
cea91a32f2
|
[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (#25055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-19 10:27:49 +00:00 |
|
Isotr0py
|
f2718d2948
|
[Misc] Cleanup test conftest for deprecated encoder-decoder models (#25231)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-19 07:44:56 +00:00 |
|
Andrew Xia
|
6d8246aaff
|
[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming (#24938)
Signed-off-by: Andrew Xia <axia@meta.com>
|
2025-09-18 19:11:59 -07:00 |
|
Or Ozeri
|
9d1c50a5ac
|
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 00:20:51 +00:00 |
|
Andrew Sansom
|
9a4600e4dc
|
[CORE] Prompt Embeddings Support for v1 Engine (#24278)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-19 08:03:09 +08:00 |
|
Or Ozeri
|
a53ad626d6
|
[KV offload][1b/N] rename offloading to kv_offload (#25191)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-18 20:53:52 +00:00 |
|
Woosuk Kwon
|
e19bce40a1
|
[V0 Deprecation] Remove AsyncLLMEngine (#25025)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-18 11:07:42 -07:00 |
|
Or Ozeri
|
505805b645
|
[KV offload][1/N] Introduce an offloading component (#19848)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-18 10:57:07 -07:00 |
|
wang.yuqi
|
5f696c33b1
|
[New Model] Support BertForTokenClassification / Named Entity Recognition (NER) task (#24872)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-18 23:22:01 +08:00 |
|
jvlunteren
|
01a583fea4
|
[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel (#21197)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-09-18 14:27:01 +00:00 |
|
Roger Wang
|
21da73343a
|
[Misc] Clean up flags in vllm bench serve (#25138)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-18 12:43:33 +00:00 |
|
Asaf Joseph Gardin
|
66072b36db
|
[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
|
2025-09-18 12:21:17 +00:00 |
|
Chauncey
|
cc935fdd7e
|
[Frontend] Support setting logprobs to -1 (#25031)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-18 10:34:42 +00:00 |
|
Aaron Pham
|
29283e8976
|
[Chore] Cleanup guided namespace, move to structured outputs config (#22772)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 09:20:27 +00:00 |
|
Gerard Finol
|
aa3f105c59
|
Add 'path' option to ImagePrompt data_format (#25081)
Signed-off-by: Gerard Finol <gerard.finol@urv.cat>
|
2025-09-18 02:02:14 -07:00 |
|
Benjamin Chislett
|
b7433ca1a4
|
[Spec Decode] Efficient padded speculation (#24539)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-09-18 01:07:24 -04:00 |
|
Woosuk Kwon
|
5c65a72bb1
|
[V0 Deprecation] Remove more V0 tests (#25117)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 22:05:25 -07:00 |
|
Andrew Sansom
|
bec060fd99
|
Mark prompt logprobs as incompatible with prompt embeds at API level (#25077)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-17 21:25:07 -07:00 |
|
Woosuk Kwon
|
7fb2a5be28
|
[V0 Deprecation] Skip PP test (#25128)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 20:18:36 -07:00 |
|
Woosuk Kwon
|
6c036615dc
|
[V0 Deprecation] Remove misc V0 tests (#25118)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 19:41:55 -07:00 |
|
Woosuk Kwon
|
2fc24e94f9
|
[V0 Deprecation] Remove V0 Tracing & Metrics tests (#25115)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-17 19:40:44 -07:00 |
|