b4ac449a83
[Misc] Merge the logs of pp layers partitions (#16225 )
Kebe
2025-04-08 15:18:15 +08:00
8e5314a468
[V1] Add disable_chunked_mm_input arg to disable partial mm input prefill (#15837 )
Michael Goin
2025-04-08 00:24:07 -06:00
87918e40c4
[torch.compile][TPU] Make @support_torch_compile work for XLA backend (#15782 )
Siyuan Liu
2025-04-07 23:23:53 -07:00
f6b32efb7f
[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194 )
Isotr0py
2025-04-08 13:38:13 +08:00
b99733d092
[Bugfix] Do not skip "empty" parts of chats that are parsable (#16219 )
Michael Goin
2025-04-07 23:14:15 -06:00
05a015d6a5
Add warning for Attention backends that do not support irope yet (#16212 )
Yong Hoon Shin
2025-04-07 20:59:26 -07:00
ad971af8c7
[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (#16161 )
zxfan-cpu
2025-04-08 11:48:47 +08:00
f2ebb6f541
[V1] Scatter and gather placeholders in the model runner (#16076 )
Roger Wang
2025-04-07 19:43:41 -07:00
1d01211264
Update BASE_IMAGE to 2.22 release of Neuron (#16218 )
Satyajith Chilappagari
2025-04-07 19:11:18 -07:00
f94ab12f79
[Misc] Update compressed-tensors to version 0.9.3 (#16196 )
Miles Williams
2025-04-08 03:09:06 +01:00
a865bc1ca6
[core] do not send error across process (#16174 )
youkaichao
2025-04-08 10:09:03 +08:00
21802c4b6d
[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031 )
Michael Goin
2025-04-07 19:28:14 -06:00
652907b354
Torchao (#14231 )
Driss Guessous
2025-04-07 16:39:28 -07:00
24f1c01e0f
[Bugfix][V0] XGrammar structured output supports Enum (#15878 )
leon-seidel
2025-04-08 00:38:25 +02:00
fad6e2538e
[Misc] add description attribute in CLI (#15921 )
Reid
2025-04-08 06:30:35 +08:00
7f6d47c1a2
[V1][BugFix] Exit properly if engine core fails during startup (#16137 )
Nick Hill
2025-04-07 15:30:15 -07:00
3147586ebd
[Bugfix] Fix guidance backend for Qwen models (#16210 )
Benjamin Chislett
2025-04-07 18:15:43 -04:00
ed636d99ca
[Misc] Move Llama 4 projector call into encoder execution (#16201 )
Roger Wang
2025-04-07 14:02:05 -07:00
090c856d76
[Misc] Human-readable max-model-len cli arg (#16181 )
Nicolò Lucchesi
2025-04-07 20:40:58 +02:00
ad434d4cfe
Print the warning only once (#16193 )
Gregory Shtrasberg
2025-04-07 14:30:06 -04:00
66d433b94f
[V1] Revert the default max_num_seqs to V0 values for most hardware (#16158 )
Cyrus Leung
2025-04-08 01:54:36 +08:00
027b204ff1
[Bugfix] Re-enable support for ChatGLMForConditionalGeneration (#16187 )
Cyrus Leung
2025-04-07 23:15:58 +08:00
55dcce91df
Upstream Llama4 Support to Main (#16113 )
Lu Fang
2025-04-07 08:06:27 -07:00
8017c8db7f
[Doc]Update image to latest version (#16186 )
Robin
2025-04-07 22:17:39 +08:00
dc3529dbf6
[Misc] improve example mlpspeculator and llm_engine_example (#16175 )
Reid
2025-04-07 19:53:52 +08:00
7699258ef0
[Model] Add Qwen3 and Qwen3MoE (#15289 )
YamPengLi
2025-04-07 19:06:41 +08:00
e9ba99f296
[V1][Structured Output] Add supports_structured_output() method to Platform (#16148 )
Shanshan Shen
2025-04-07 19:06:24 +08:00
7c80368710
[VLM] Florence-2 supports online serving (#16164 )
Isotr0py
2025-04-07 19:04:02 +08:00
95d63f38c0
doc: fix some typos in doc (#16154 )
yihong
2025-04-07 13:32:06 +08:00
bb8dab821e
[CI] Set max transformers version for Ultravox model test (#16149 )
Roger Wang
2025-04-06 21:37:58 -07:00
fc0f87768a
[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (#16129 )
Isotr0py
2025-04-07 12:07:15 +08:00
0a57386721
[Misc] Update Mistral-3.1 example (#16147 )
Cyrus Leung
2025-04-07 11:57:37 +08:00
3749e28774
[V1][Minor] Minor simplification for get_computed_blocks (#16139 )
Woosuk Kwon
2025-04-06 20:38:12 -07:00
86fc2321ff
[Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token (#15202 )
Kay Yan
2025-04-07 11:34:51 +08:00
2549c0dfef
Fix requires-python (#16132 )
Martin Hoyer
2025-04-07 04:22:25 +02:00
b10e519895
[V1][Minor] Optimize get_cached_block (#16135 )
Woosuk Kwon
2025-04-06 13:48:14 -07:00
9bde5ba127
[TPU] Update PyTorch/XLA (#16130 )
Chengji Yao
2025-04-06 11:25:55 -07:00
72c8f1ad04
[Misc] update requires-python in pyproject.toml (#16116 )
Reid
2025-04-06 22:56:34 +08:00
da224daaa9
[Bugfix] add hf_token to EngineArgs (#16093 )
paolovic
2025-04-06 16:47:33 +02:00
3a100b9278
[Bugfix] LoRA : Fix the order in which the kernels process LoRAs (#16040 )
Varun Sundar Rabindranath
2025-04-06 10:04:50 -04:00
242a637aea
[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103 )
rongfu.leng
2025-04-06 20:52:01 +08:00
c2a9671510
[Misc] Improve model redirect to accept json dictionary (#16119 )
Isotr0py
2025-04-06 20:51:45 +08:00
d5ae4f7f42
[Doc][Bugfix] Add missing EOF in k8s deploy doc (#16025 )
Paul Schweigert
2025-04-06 08:10:57 -04:00
b6c502a150
[Misc] refactor example eagle (#16100 )
Reid
2025-04-06 17:42:48 +08:00
9ca710e525
[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar (#16117 )
Roger Wang
2025-04-06 01:18:00 -07:00
eb07c8cb5b
[Frontend] Fix typo in tool chat templates for llama3.2 and toolace (#14501 )
Ben Jackson
2025-04-06 00:44:36 -07:00
ba10801961
[Benchmark] Add sampling parameters to benchmark_serving. (#16022 )
Hyesoo Yang
2025-04-05 21:30:35 -07:00
620fc2d09e
[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (#16112 )
Lucia Fang
2025-04-05 21:23:40 -07:00
296c6572dd
Revert "[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906 )"
v0.8.3
simon-mo
2025-04-05 21:10:57 -07:00
c575232395
[Model] Support Llama4 in vLLM (#16104 )
Lu Fang
2025-04-05 21:01:00 -07:00
29283eaa7e
[Model] use AutoWeightsLoader for phi, gemma, deepseek (#16088 )
Jonghyun Choe
2025-04-06 12:34:38 +09:00
2fa66ef713
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946 )
Jinzhen Lin
2025-04-06 11:04:22 +08:00
13affc432d
[Misc] Remove redundant code (#16098 )
Chauncey
2025-04-06 11:03:50 +08:00
d8f094a92a
[Misc] format output for encoder_decoder.py (#16095 )
Reid
2025-04-06 10:57:18 +08:00
97ae6d777f
Fix some capitalisations in generated examples doc titles (#16094 )
Harry Mellor
2025-04-05 14:44:03 +01:00
6baeee70d1
Revert "doc: add info for macos clang errors (#16049 )" (#16091 )
yihong
2025-04-05 19:51:51 +08:00
d2517a4939
[doc] fix 404 (#16082 )
Reid
2025-04-05 19:39:18 +08:00
6342adc438
fix: support clang17 for macos and fix the real libomp (#16086 )
yihong
2025-04-05 19:00:12 +08:00
0adba91547
[CI] Fix benchmark script level (#16089 )
Kevin H. Luu
2025-04-05 03:36:01 -07:00
4285e423a6
[Misc] Auto detect bitsandbytes pre-quantized models (#16027 )
Tristan Leclercq
2025-04-05 08:30:45 +02:00
63375f0cdb
[V1][Spec Decode] Update N-gram Proposer Interface (#15750 )
v0.8.3rc1
Woosuk Kwon
2025-04-04 16:32:54 -07:00
70ad3f9e98
[Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059 )
Michael Goin
2025-04-04 17:31:19 -06:00
d6fc629f4d
[Kernel][Minor] Re-fuse triton moe weight application (#16071 )
bnellnm
2025-04-04 19:27:34 -04:00
af51d80fa1
Revert "[V1] Scatter and gather placeholders in the model runner" (#16075 )
Roger Wang
2025-04-04 14:50:57 -07:00
f5722a5052
[V1] Scatter and gather placeholders in the model runner (#15712 )
Cyrus Leung
2025-04-05 05:26:44 +08:00
651cf0fec1
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906 )
Nick Hill
2025-04-04 12:56:43 -07:00
4dc52e1c53
[CI] Reorganize .buildkite directory (#16001 )
Kevin H. Luu
2025-04-04 12:16:20 -07:00
4708f13a9c
[Bugfix] Fix default behavior/fallback for pp in v1 (#16057 )
Michael Goin
2025-04-04 11:58:08 -06:00
a6d042df0a
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917 , but for ROCm only (#15413 )
Gregory Shtrasberg
2025-04-04 12:40:37 -04:00
40a36ccfeb
[ROCm][Bugfix] Use platform specific FP8 dtype (#15717 )
Gregory Shtrasberg
2025-04-04 12:40:20 -04:00
ef608c37a7
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010 )
Ilya Markov
2025-04-04 18:39:08 +02:00
2386803f2a
[CPU] Change default block_size for CPU backend (#16002 )
Li, Jiang
2025-04-05 00:39:05 +08:00
95862f7b4d
[Benchmark][Doc] Update throughput benchmark and README (#15998 )
Ziji Shi (Steven)
2025-04-04 09:39:02 -07:00
230b131b54
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995 )
Isotr0py
2025-04-05 00:38:58 +08:00
0812d8dd41
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945 )
liuzhenwei
2025-04-05 00:38:55 +08:00
bf7e3c51ae
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939 )
Jonghyun Choe
2025-04-05 01:38:52 +09:00
a35a8a8392
[V1][Spec Decode] Avoid logging useless nan metrics (#16023 )
Mark McLoughlin
2025-04-04 16:52:41 +01:00
4ef0bb1fcf
doc: add info for macos clang errors (#16049 )
yihong
2025-04-04 22:58:16 +08:00
fadc59c0e6
[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041 )
Chengji Yao
2025-04-04 04:48:50 -07:00
86cbd2eee9
[Misc] improve gguf check (#15974 )
Reid
2025-04-04 09:33:36 +08:00
092475f738
[ROCm] Tweak the benchmark script to run on ROCm (#14252 )
Huy Do
2025-04-03 17:12:48 -07:00
dcc56d62da
[Bugfix] Fix function names in test_block_fp8.py (#16033 )
bnellnm
2025-04-03 19:01:34 -04:00
f15e70d906
[TPU] Switch Test to Non-Sliding Window (#15981 )
Robert Shaw
2025-04-03 14:28:45 -07:00
b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732 )
iefgnoix
2025-04-03 14:23:28 -07:00
03a70eacaf
Re-enable the AMD Testing for the passing tests. (#15586 )
Alexei-V-Ivanov-AMD
2025-04-03 13:05:17 -05:00
45b1ff7a25
[Misc][Performance] Advance tpu.txt to the most recent nightly torch … (#16024 )
yarongmu-google
2025-04-03 10:32:54 -07:00
15ba07ef25
[Minor] Fused experts refactor (#15914 )
bnellnm
2025-04-03 13:19:38 -04:00
d2b58ca203
[Neuron][kernel] Fuse kv cache into a single tensor (#15911 )
Liangfu Chen
2025-04-03 09:51:32 -07:00
82e7e19a6e
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952 )
Kyle Sayers
2025-04-03 11:25:22 -04:00
421c462948
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573 )
Kyle Sayers
2025-04-03 11:23:19 -04:00
84884cd9ac
fix: tiny fix make format.sh excutable (#16015 )
yihong
2025-04-03 23:18:05 +08:00
a43aa183dc
[doc] update contribution link (#15922 )
Reid
2025-04-03 18:47:31 +08:00
463bbb1835
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367 )
wwl2755
2025-04-03 02:32:10 -05:00
5e125e74d1
[misc] improve error message for "Failed to infer device type" (#15994 )
youkaichao
2025-04-03 14:45:03 +08:00
06f21ce7a5
[Benchmark] Add AIMO Dataset to Benchmark (#15955 )
Ziji Shi (Steven)
2025-04-02 23:09:18 -07:00
57a810db9c
[ROCM][V0] PA kennel selection when no sliding window provided (#15982 )
Aleksandr Malyshev
2025-04-02 22:28:44 -07:00
8b664706aa
[bugfix] add seed in torchrun_example.py (#15980 )
youkaichao
2025-04-03 12:25:01 +08:00
37bfee92bf
fix: better error message for get_config close #13889 (#15943 )
yihong
2025-04-03 11:53:19 +08:00
e73ff24e31
[ROCM][KERNEL] Paged attention for V1 (#15720 )
Aleksandr Malyshev
2025-04-02 19:48:00 -07:00
bd7599d34a
[V1][TPU] Do not compile sampling more than needed (#15883 )
Nicolò Lucchesi
2025-04-03 03:36:01 +02:00