biondizzle/vllm

Commit Graph

756848e79e [Bugfix] Fix Lora Name Parsing (#17196) Alex Brooks 2025-04-27 06:33:09 -06:00
18445edd0f [Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033) Flex Wang 2025-04-27 05:30:53 -07:00
30215ca61f [MISC] Use string annotation types for class definitions (#17244) Jade Zheng 2025-04-27 16:39:57 +08:00
838cedade7 [Bugfix] Get a specific type of layer from forward context (#17222) Chen Zhang 2025-04-27 15:58:05 +08:00
4283a28c2f [Bugfix] Fix QWen2 VL multimodal mapping (#17240) Jee Jee Li 2025-04-27 13:53:23 +08:00
93a126fbc7 [Misc] Make cached tokenizer pickle-compatible (#17048) Cyrus Leung 2025-04-27 13:05:00 +08:00
8e4b351a0c [Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591) rasmith 2025-04-26 19:35:08 -05:00
9869453c42 Update test_flash_attn.py (#17102) Happy 2025-04-27 06:17:35 +08:00
3642c59aa8 [CI/Build] remove -t for run-lm-eval-gsm-hf-baseline.sh (#16271) Reid 2025-04-27 02:25:05 +08:00
43eea2953b [Minor] Fix lint error in main branch (#17233) Woosuk Kwon 2025-04-26 11:10:14 -07:00
de7eb10ce4 [Bugfix] Fix Qwen2.5-Omni M-RoPE position ids generation (#16878) Kero Liang 2025-04-27 01:41:35 +08:00
fd11a325b8 [MISC] rename interval to max_recent_requests (#14285) Ning Xie 2025-04-27 00:59:18 +08:00
4d17e20310 Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 (#16573) Lu Fang 2025-04-26 09:17:58 -07:00
10fd1d7380 [Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps (#9276) changjun.lee 2025-04-27 00:51:17 +09:00
52b4f4a8d7 [Docs] Update structured output doc for V1 (#17135) Russell Bryant 2025-04-26 11:12:18 -04:00
e782e0a170 [Chore] added stubs for vllm_flash_attn during development mode (#17228) Aaron Pham 2025-04-26 10:45:26 -04:00
dc2ceca5c5 [BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set (#17088) Ning Xie 2025-04-26 22:34:24 +08:00
f8acd01ff7 [V1] Add structural_tag support using xgrammar (#17085) Russell Bryant 2025-04-26 10:06:37 -04:00
c48334d405 [Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186) Agata Dobrzyniewicz 2025-04-26 14:55:14 +02:00
909fdaf152 [Bugfix] Fix standard models tests (#17217) Cyrus Leung 2025-04-26 17:26:41 +08:00
8c1c926d00 [Bugfix] Fix missing int type for -n in multi-image example (#17223) Isotr0py 2025-04-26 16:49:52 +08:00
df6f3ce883 [Core] Remove prompt string from engine core data structures (#17214) Nick Hill 2025-04-25 23:41:05 -07:00
513f074766 [CI/test] Fix Eagle Correctness Test (#17209) Woosuk Kwon 2025-04-25 23:40:36 -07:00
b07bf83c7d [BugFix] Avoid race conditions in zero-copy tensor transmission (#17203) Nick Hill 2025-04-25 23:00:07 -07:00
53e8cf53a4 [V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661) Zijing Liu 2025-04-25 22:05:40 -07:00
54271bb766 [ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. (#17011) Charlie Fu 2025-04-26 00:05:10 -05:00
9e96f56efb Allocate kv_cache with stride order (#16605) Shu Wang 2025-04-26 00:03:31 -05:00
b278911229 [Minor][Models] Fix Return Types of Llama & Eagle (#17220) Woosuk Kwon 2025-04-25 21:54:47 -07:00
7bd0c7745c [Doc] Minor fix for the vLLM TPU setup page (#17206) yarongmu-google 2025-04-25 21:39:56 -07:00
1cf0719ebd [Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213) Woosuk Kwon 2025-04-25 21:08:15 -07:00
537d5ee025 [doc] add Anything LLM integration (#17216) Reid 2025-04-26 12:03:23 +08:00
c8e5be35f7 [MISC][AMD] Add unused annotation to rocm kernel file (#17097) Lu Fang 2025-04-25 20:33:35 -07:00
a6e72e1e4f [Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env (#17142) James Wu 2025-04-25 23:28:20 -04:00
5e83a7277f [v1] [P/D] Adding LMCache KV connector for v1 (#16625) Yihua Cheng 2025-04-25 22:03:38 -05:00
68af5f6c5c [AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215) rasmith 2025-04-25 21:55:05 -05:00
8de2901fea [Bugfix] gemma[2,3] interleaved attention when sliding window is disabled (#17180) Chen Zhang 2025-04-26 10:53:51 +08:00
c53e0730cb [Misc] Refine ray_serve_deepseek example (#17204) Rui Qiao 2025-04-25 16:06:59 -07:00
a0e619e62a [V1][Spec Decode] EAGLE-3 Support (#16937) Benjamin Chislett 2025-04-25 18:43:07 -04:00
70116459c3 [BugFix][Frontend] Fix LLM.chat() tokenization (#16081) Nick Hill 2025-04-25 15:20:05 -07:00
65e262b93b Fix Python packaging edge cases (#17159) Christian Heimes 2025-04-26 00:15:07 +02:00
43faa0461a [Bugfix] Fix hybrid model tests (#17182) Cyrus Leung 2025-04-26 06:14:37 +08:00
48cb2109b6 [V1] Move usage stats to worker and start logging TPU hardware (#16211) Daniel Li 2025-04-25 13:06:01 -07:00
a5450f11c9 [Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192) Russell Bryant 2025-04-25 12:53:23 -04:00
9d98ab5ec6 [Misc] Inline Molmo requirements (#17190) Cyrus Leung 2025-04-26 00:41:44 +08:00
df5c879527 [doc] update wrong hf model links (#17184) Reid 2025-04-26 00:40:54 +08:00
423e9f1cbe Use Transformers helper get_text_config() instead of checking for text_config (#17105) Harry Mellor 2025-04-25 16:47:35 +01:00
0bd7f8fca5 Bump Transformers to 4.51.3 (#17116) Harry Mellor 2025-04-25 16:34:34 +01:00
d5615af9ae [Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769) Jasmond L 2025-04-25 22:26:30 +08:00
19dcc02a72 [Bugfix] Fix mistral model tests (#17181) Cyrus Leung 2025-04-25 21:03:34 +08:00
7feae92c1f [Doc] Move todo out of beam search docstring (#17183) Alex Brooks 2025-04-25 05:44:58 -06:00
f851b84266 [Doc] Add two links to disagg_prefill.md (#17168) Michael Yao 2025-04-25 18:23:57 +08:00
fc966e9cc6 Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158) Lu Fang 2025-04-25 02:10:32 -07:00
ef19e67d2c [Doc] Add headings to improve gptqmodel.md (#17164) Michael Yao 2025-04-25 16:13:13 +08:00
a41351f363 [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734) rasmith 2025-04-25 02:45:02 -05:00
6aae216b4e [Bugfix] remove fallback in guided_json (int range, patterns) (#16725) Sangyeon Cho 2025-04-25 15:54:43 +09:00
b22980a1dc [Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457) yexin(叶鑫) 2025-04-25 14:52:28 +08:00
881f735827 [Misc] Benchmark Serving Script Support Appending Results (#17028) Lucas Wilkinson 2025-04-25 01:53:55 -04:00
2f54045508 [Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099) Mengqing Cao 2025-04-25 13:51:02 +08:00
5aa6efb9a5 [Misc] Clean up redundant code in uniproc_executor.py (#16762) Lifu Huang 2025-04-24 22:49:30 -07:00
6ca0234478 Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131) Harry Mellor 2025-04-25 06:48:53 +01:00
649818995f [Docs] Fix True->true in supported_models.md (#17141) Michael Goin 2025-04-24 22:20:04 -06:00
7a0a9da72b [Doc] V1 : Update LoRA status (#17133) Varun Sundar Rabindranath 2025-04-24 23:17:22 -04:00
69bff9bc89 fix float16 support for kimi-vl (#17156) Zaida Zhou 2025-04-25 11:16:32 +08:00
41ca7eb491 [Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864) Lucas Wilkinson 2025-04-24 23:12:21 -04:00
eef364723c [FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752) vllmellm 2025-04-25 11:06:50 +08:00
0d6e187e88 Use custom address for listening socket (#15988) jglaser 2025-04-24 21:57:16 -04:00
9420a1fc30 Better error message for missing mistral params.json (#17132) Michael Goin 2025-04-24 17:43:08 -06:00
583e900996 [Misc] Add example to run DeepSeek with Ray Serve LLM (#17134) Rui Qiao 2025-04-24 15:25:21 -07:00
05e1fbfc52 Add chat template for Llama 4 models (#16428) Maximilien de Bayser 2025-04-24 17:19:36 -03:00
fe92176321 Add collective_rpc to llm engine (#16999) Yinghai Lu 2025-04-24 13:16:52 -07:00
6d0df0ebeb [Docs] Generate correct github links for decorated functions (#17125) Russell Bryant 2025-04-24 13:39:43 -04:00
0fa939e2d1 Improve configs - LoRAConfig + PromptAdapterConfig (#16980) Harry Mellor 2025-04-24 18:29:34 +01:00
0422ce109f Add :markdownhelp: to EngineArgs docs so markdown docstrings render properly (#17124) Harry Mellor 2025-04-24 18:28:45 +01:00
47bdee409c Molmo Requirements (#17026) Eyshika Agarwal 2025-04-24 12:08:37 -05:00
49f189439d existing torch installation pip command fix for docs (#17059) Atilla 2025-04-24 20:07:21 +03:00
5adf6f6b7f Updating builkite job for IBM Power (#17111) Aaruni Aggarwal 2025-04-24 22:36:17 +05:30
4115f19958 [CI] Add automation for the tool-calling github label (#17118) Russell Bryant 2025-04-24 12:22:00 -04:00
340d7b1b21 [V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665) Mark McLoughlin 2025-04-24 16:57:40 +01:00
1bcbcbf574 [Misc] refactor example series - structured outputs (#17040) Reid 2025-04-24 22:49:48 +08:00
82e43b2d7e Add missing rocm_skinny_gemms kernel test to CI (#17060) Michael Goin 2025-04-24 08:49:37 -06:00
67309a1cb5 [Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970) wang.yuqi 2025-04-24 22:06:28 +08:00
b724afe343 [V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954) Shanshan Shen 2025-04-24 21:15:03 +08:00
21f4f1c9a4 Improve static type checking in LoRAModelRunnerMixin (#17104) Harry Mellor 2025-04-24 14:14:47 +01:00
b0c1f6202d [Misc] Remove OLMo2 config copy (#17066) Isotr0py 2025-04-24 21:14:32 +08:00
c0dfd97519 [V1][PP] Optimization: continue scheduling prefill chunks (#17080) Rui Qiao 2025-04-24 05:27:08 -07:00
a9138e85b1 Fix OOT registration test (#17099) Harry Mellor 2025-04-24 12:44:12 +01:00
0a05ed57e6 Simplify TokenizerGroup (#16790) Harry Mellor 2025-04-24 12:43:56 +01:00
14288d1332 Disable enforce_eager for V1 TPU sampler and structured output tests (#17016) Michael Goin 2025-04-24 03:50:09 -06:00
b411418ff0 [Chore] Remove Sampler from Model Code (#17084) Woosuk Kwon 2025-04-24 02:49:33 -07:00
2bc0f72ae5 Add docs for runai_streamer_sharded (#17093) omer-dayan 2025-04-24 11:03:21 +03:00
9c1244de57 [doc] update to hyperlink (#17096) Reid 2025-04-24 15:58:08 +08:00
db2f8d915c [V1] Update structured output (#16812) Reid 2025-04-24 14:57:17 +08:00
6167c0e5d2 [Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… (#16472) 张宇 2025-04-24 11:25:37 +08:00
ed2e464653 Addendum Fix to support FIPS enabled machines with MD5 hashing (#17043) Areeb Syed 2025-04-24 08:25:00 +05:30
2c8ed8ee48 More informative error when using Transformers backend (#16988) Harry Mellor 2025-04-24 03:54:03 +01:00
ed50f46641 [Bugfix] Enable V1 usage stats (#16986) Michael Goin 2025-04-23 20:54:00 -06:00
46e678bcff [Minor] Use larger batch sizes for A100/B100/B200/MI300x (#17073) Woosuk Kwon 2025-04-23 19:18:59 -07:00
6b2427f995 [Quantization]add prefix for commandA quantized model (#17017) Chen Xia 2025-04-23 17:32:40 -07:00
b07d741661 [CI/Build] workaround for CI build failure (#17070) Sangyeon Cho 2025-04-24 08:14:18 +09:00
41fb013d29 [V1][Spec Decode] Always use argmax for sampling draft tokens (#16899) Woosuk Kwon 2025-04-23 14:57:43 -07:00
