756848e79e
[Bugfix] Fix Lora Name Parsing (#17196)
Alex Brooks
2025-04-27 06:33:09 -06:00
18445edd0f
[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033)
Flex Wang
2025-04-27 05:30:53 -07:00
30215ca61f
[MISC] Use string annotation types for class definitions (#17244)
Jade Zheng
2025-04-27 16:39:57 +08:00
838cedade7
[Bugfix] Get a specific type of layer from forward context (#17222)
Chen Zhang
2025-04-27 15:58:05 +08:00
fd11a325b8
[MISC] rename interval to max_recent_requests (#14285)
Ning Xie
2025-04-27 00:59:18 +08:00
4d17e20310
Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 (#16573)
Lu Fang
2025-04-26 09:17:58 -07:00
10fd1d7380
[Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps (#9276)
changjun.lee
2025-04-27 00:51:17 +09:00
52b4f4a8d7
[Docs] Update structured output doc for V1 (#17135)
Russell Bryant
2025-04-26 11:12:18 -04:00
e782e0a170
[Chore] added stubs for vllm_flash_attn during development mode (#17228)
Aaron Pham
2025-04-26 10:45:26 -04:00
dc2ceca5c5
[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set (#17088)
Ning Xie
2025-04-26 22:34:24 +08:00
f8acd01ff7
[V1] Add structural_tag support using xgrammar (#17085)
Russell Bryant
2025-04-26 10:06:37 -04:00
c48334d405
[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186)
Agata Dobrzyniewicz
2025-04-26 14:55:14 +02:00
909fdaf152
[Bugfix] Fix standard models tests (#17217)
Cyrus Leung
2025-04-26 17:26:41 +08:00
8c1c926d00
[Bugfix] Fix missing int type for -n in multi-image example (#17223)
Isotr0py
2025-04-26 16:49:52 +08:00
df6f3ce883
[Core] Remove prompt string from engine core data structures (#17214)
Nick Hill
2025-04-25 23:41:05 -07:00
423e9f1cbe
Use Transformers helper get_text_config() instead of checking for text_config (#17105)
Harry Mellor
2025-04-25 16:47:35 +01:00
0bd7f8fca5
Bump Transformers to 4.51.3 (#17116)
Harry Mellor
2025-04-25 16:34:34 +01:00
d5615af9ae
[Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769)
Jasmond L
2025-04-25 22:26:30 +08:00
19dcc02a72
[Bugfix] Fix mistral model tests (#17181)
Cyrus Leung
2025-04-25 21:03:34 +08:00
7feae92c1f
[Doc] Move todo out of beam search docstring (#17183)
Alex Brooks
2025-04-25 05:44:58 -06:00
f851b84266
[Doc] Add two links to disagg_prefill.md (#17168)
Michael Yao
2025-04-25 18:23:57 +08:00
fc966e9cc6
Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158)
Lu Fang
2025-04-25 02:10:32 -07:00
ef19e67d2c
[Doc] Add headings to improve gptqmodel.md (#17164)
Michael Yao
2025-04-25 16:13:13 +08:00
a41351f363
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
rasmith
2025-04-25 02:45:02 -05:00
6aae216b4e
[Bugfix] remove fallback in guided_json (int range, patterns) (#16725)
Sangyeon Cho
2025-04-25 15:54:43 +09:00
b22980a1dc
[Perf] Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
yexin(叶鑫)
2025-04-25 14:52:28 +08:00
881f735827
[Misc] Benchmark Serving Script Support Appending Results (#17028)
Lucas Wilkinson
2025-04-25 01:53:55 -04:00
2f54045508
[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099)
Mengqing Cao
2025-04-25 13:51:02 +08:00
5aa6efb9a5
[Misc] Clean up redundant code in uniproc_executor.py (#16762)
Lifu Huang
2025-04-24 22:49:30 -07:00
6ca0234478
Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131)
Harry Mellor
2025-04-25 06:48:53 +01:00
649818995f
[Docs] Fix True->true in supported_models.md (#17141)
Michael Goin
2025-04-24 22:20:04 -06:00
69bff9bc89
fix float16 support for kimi-vl (#17156)
Zaida Zhou
2025-04-25 11:16:32 +08:00
41ca7eb491
[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864)
Lucas Wilkinson
2025-04-24 23:12:21 -04:00