Commit Graph

175 Commits

Author SHA1 Message Date
Woosuk Kwon
134a8ee8fd [gpt-oss] Add openai-harmony as default dependency (#22332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
2025-08-06 00:10:14 -07:00
Harry Mellor
796bae07c5 Update transformers to v4.55 (#21931)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 22:56:14 -07:00
Woosuk Kwon
dd16bdc798 Increase openai-python version (#22316)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-05 21:43:21 -07:00
Rui Qiao
2ff46b8826 [Misc] Bump ray to 2.48.0 (#22123)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-08-02 19:42:00 -07:00
Woosuk Kwon
0f81b310db [Misc] Remove upper bound in openai package version (#22060)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-01 02:11:40 -07:00
Simon Mo
da31f6ad3d Revert precompile wheel changes (#22055)
2025-08-01 08:26:24 +00:00
Michael Goin
0bd409cf01 Move flashinfer-python to optional extra vllm[flashinfer] (#21959)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-31 18:02:11 -07:00
XiongfeiWei
53c21e492e Update torch_xla pin to 20250730 (#21956)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-31 17:26:43 +00:00
Doug Smith
58bb902186 fix(setup): improve precompiled wheel setup for Docker builds (#22025)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-07-31 09:52:48 -07:00
Harry Mellor
88edf5994c [Docs] Reduce the size of the built docs (#21920)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-30 07:35:08 -07:00
Michael Goin
a33ea28b1b Add flashinfer_python to CUDA wheel requirements (#21389)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-29 12:51:58 -07:00
Isotr0py
31084b3b1f [Bugfix][CI/Build] Update peft version in test requirement (#21729)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-28 06:17:43 -07:00
Harry Mellor
1395dd9c28 [Docs] Add revision date to rendered docs (#21752)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-28 06:12:46 -07:00
Chengji Yao
f1b286b2fb [TPU] Update ptxla nightly version to 20250724 (#21555)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-07-25 17:09:00 -07:00
Kebe
396ee94180 [CI] Unifying Dockerfiles for ARM and X86 Builds (#21343)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-07-25 07:33:56 -07:00
Juncheng Gu
6066284914 [P/D] Support CPU Transfer in NixlConnector (#18293)
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
2025-07-24 17:58:42 +01:00
elvischenv
5a19a6c670 [Fix] Update mamba_ssm to 2.2.5 (#21421)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-24 03:25:41 -07:00
Chauncey
6da0078523 [Feat] Allow custom naming of vLLM processes (#21445)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-24 03:15:23 -07:00
Julien Denize
6d8d0a24c0 Add think chunk (#21333)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-07-23 21:51:32 -07:00
Christian Pinto
8560a5b258 [Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2025-07-23 11:00:23 -07:00
Li, Jiang
e3a0e43d7f [bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-19 05:13:55 -07:00
Woosuk Kwon
4de7146351 [V0 deprecation] Remove V0 HPU backend (#21131)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-17 16:37:36 -07:00
kYLe
4ef00b5cac [VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-17 03:07:55 -07:00
XiongfeiWei
58760e12b1 [TPU] Start using python 3.12 (#21000)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-16 19:37:44 -07:00
Michael Goin
4e7dfbe7b4 Update PyTorch to torch==2.7.1 for CUDA (#21011)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-17 02:30:44 +00:00
Chauncey
b5c3b68359 [Misc] bump xgrammar version to v0.1.21 (#20992)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 19:42:16 -07:00
Harry Mellor
b637e9dcb8 Add full serve CLI reference back to docs (#20978)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:42:30 +00:00
Patrick von Platen
e7e3e6d263 Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-15 07:35:30 -07:00
Reid
37e2ecace2 feat: add image zoom to improve image viewing experience (#20763)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-14 20:14:23 -07:00
XiongfeiWei
d4170fad39 Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00
22quinn
f326ab9c88 [Bugfix] Bump up mistral_common to support v13 tokenizer (#20905)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 10:45:03 +00:00
Maroon Ayoub
66f6fbd393 [Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
2025-07-14 02:45:31 +00:00
Woosuk Kwon
f45a332886 [Sched] Enhance the logic to remove stopped requests from queues (#20739)
2025-07-12 15:33:13 -07:00
Isotr0py
01cae37713 [CI/Build] Ensure compatability with Transformers v4.53 (#20541)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-11 20:53:07 -07:00
Luka Govedič
762be26a8e [Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
2025-07-11 00:15:22 -07:00
Nathan Hoos
d6902ce79f [V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
2025-07-10 15:30:26 -04:00
Harry Mellor
3482fd7e4e [Doc] Add engine args back in to the docs (#20674)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-10 08:02:40 -07:00
Jacob Manning
bf03ff3575 [Kernel] Add Conch backend for mixed-precision linear layer (#19818)
Signed-off-by: Jacob Manning <jmanning+oss@stackav.com>
2025-07-09 13:17:55 -07:00
XiongfeiWei
849590a2a7 Update torch/xla pin to 20250703 (#20589)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-08 07:44:02 -07:00
Sanger Steel
72d14d0eed [Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
2025-07-07 22:47:43 -07:00
Jee Jee Li
4ff79a136e [Misc] Set the minimum openai version (#20539)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-07 09:15:26 +00:00
Peter Pan
5561681d04 [CI] add kvcache-connector dependency definition and add into CI build (#18193)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-04 06:49:18 -07:00
Nicolò Lucchesi
d1b689c445 [Bugfix] Fix flaky test_streaming_response test (#20363)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-03 14:46:24 +00:00
Jee Jee Li
1819fbda63 [Quantization] Bump to use latest bitsandbytes (#20424)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-03 21:58:46 +08:00
Prashant Gupta
22e9d42040 [Misc] add xgrammar for arm64 (#18359)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2025-07-01 07:02:20 +00:00
Yang Wang
8b64c895c0 [CI] Sync test dependency with test.in for torch nightly (#19632)
Signed-off-by: Yang Wang <elainewy@meta.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-26 20:55:25 -07:00
Dipika Sikka
a57d57fa72 [Quantization] Bump to use latest compressed-tensors (#20033)
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-26 20:50:06 -07:00
Kunshang Ji
b69781f107 [Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. (#19560)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-06-26 09:27:18 -07:00
Li, Jiang
0567c8249f [CPU] Fix torch version in x86 CPU backend (#19258)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-26 03:34:47 -07:00
h-avsha
3443aaf8dd Move to a faster base64 implementation (#19984)
Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>
2025-06-24 20:33:51 -07:00