Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm


Commit Graph

Branches and tags:

cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

da63274d9f [Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808) Nicolò Lucchesi 2025-09-27 21:17:35 +02:00
c216119d64 [Core] GC Debug callback (#24829) Jialin Ouyang 2025-09-27 10:53:31 -07:00
5546acb463 [Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766) Clayton Coleman 2025-09-27 13:36:28 -04:00
c0ec81836f [torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable (#25651) Jiangyun Zhu 2025-09-28 00:09:00 +08:00
b65e56babe [Core] Refactor self.model() to call a helper for subclassing. (#25084) Patrick C. Toulme 2025-09-27 11:40:59 -04:00
49996cd597 [env] default nixl side port conflicts with kv-event zmq port (#25056) Peter Pan 2025-09-27 23:02:40 +08:00
ecb37e276a [docs] transcriptions API audio upload (#25446) yyzxw 2025-09-27 23:00:35 +08:00
a5354b3ed2 [Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982) Tyler Michael Smith 2025-09-27 10:22:28 -04:00
f9df8b4ad7 [Bugfix] Fix triton import precommit failure (#25803) Tyler Michael Smith 2025-09-27 10:13:11 -04:00
ec152c8748 Fix GPTQ model loading in Transformers backend (#25770) Harry Mellor 2025-09-27 13:18:20 +01:00
7977e5027c Add filtering for chat template kwargs (#25794) Russell Bryant 2025-09-27 06:46:49 -04:00
3f5d902d2a Validate API tokens in constant time (#25781) Russell Bryant 2025-09-27 06:09:26 -04:00
27d7638b94 [Bugfix] Merge MM embeddings by index instead of token IDs (#16229) Cyrus Leung 2025-09-27 16:15:12 +08:00
176173989a [Bugfix] Add missing image_size for phi4_multimodal (#25796) Xiaohan Zou 2025-09-27 03:59:22 -04:00
23b8ee672d [Misc] Update openai client example file for multimodal (#25795) Roger Wang 2025-09-27 00:57:07 -07:00
3939152069 [Misc] Fix codeowners override for v1 sample and attention (#25037) 22quinn 2025-09-27 00:47:29 -07:00
cd87bfbf37 [CI/Build] Reorganize root-level V1 tests (#25767) Cyrus Leung 2025-09-27 13:51:15 +08:00
b3613e3ace [CI/Build] Add timing to Model Executor Test (#25799) 22quinn 2025-09-26 21:57:27 -07:00
d346ec695e [CI/Build] Consolidate model loader tests and requirements (#25765) Cyrus Leung 2025-09-27 12:45:20 +08:00
c242c98031 [Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL (#25788) Wentao Ye 2025-09-26 23:44:52 -04:00
f1d53d150c [Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872) WeiQing Chen 2025-09-27 11:35:47 +08:00
92da847cf5 Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile (#25782) Michael Goin 2025-09-26 21:54:09 -04:00
3958b96bf5 Add option to restrict media domains (#25783) Russell Bryant 2025-09-26 21:23:52 -04:00
8bf8f45822 [Core] Don't count preempted tokens in prefix cache hit rate (#25787) Zhuohan Li 2025-09-26 17:16:40 -07:00
6f5c0931c1 [Spec decode] automatically disable mm for text-only draft models (#25667) Jonas M. Kübler 2025-09-27 02:10:21 +02:00
4e33a7ea85 [Bugfix] Optimize CpuGpuBuffer initialization (#25447) Naman Lalit 2025-09-26 17:07:36 -07:00
dc48ba0c75 Kernel-override Determinism [1/n] (#25603) Bram Wasti 2025-09-26 19:59:09 -04:00
4778b42660 Reduce the Cuda Graph memory footprint when running with DBO (#25779) Sage Moore 2025-09-26 15:29:56 -07:00
c70ac4b8ff [spec decode] Consolidate speculative decode method name for MTP (#25232) qizixi 2025-09-26 15:27:05 -07:00
cf89202855 [CI] Fix FlashInfer AOT in release docker image (#25730) Michael Goin 2025-09-26 17:11:40 -04:00
f075693da7 [V1] address post issues related to #20059 (part 1) (#23046) fhl2000 2025-09-27 03:58:19 +08:00
f708bd4904 [CI] Add E2E Blackwell Quantized MoE Test (#25723) Michael Goin 2025-09-26 15:23:00 -04:00
0002b7f0d1 [Docs] Add Toronto Meetup (#25773) Michael Goin 2025-09-26 15:00:46 -04:00
11aafd9886 [Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition (#25355) Frank Wang 2025-09-26 11:54:00 -07:00
b761df963c [Doc]: improve CPU(x86) build-wheel-from-source section (#25617) (tags: v0.11.1rc0, v0.11.0rc1) Clouddude 2025-09-26 13:26:33 -04:00
33f6aaf972 Eagle3 that supports the Minicpm3 model (#24243) 阿丹(adan) 2025-09-27 01:04:57 +08:00
56aafa8c0b [Misc] fix unique_filepath (#25732) Jiangyun Zhu 2025-09-27 00:56:15 +08:00
8d52f2b3a7 [ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray (#25439) Seiji Eicher 2025-09-26 09:43:30 -07:00
984d18498a [BugFix] Fix using dbo_decode_token_threshold always (and ignoring dbo_prefill_token_threshold) (#25622) Lucas Wilkinson 2025-09-26 12:22:49 -04:00
d4d9899860 [Quantization] Add field to skip unquantized modules for GPTQ config (#25455) Isotr0py 2025-09-26 23:47:41 +08:00
db1e42f627 [CI/Build] Fix some V1 tests not being run (#25569) Cyrus Leung 2025-09-26 20:52:36 +08:00
bc9d7b5595 [CI/Build] Split up Distributed Tests (#25572) Cyrus Leung 2025-09-26 20:49:33 +08:00
fe6b19c314 [Bugfix] Properly abort pooling request. (#25734) wang.yuqi 2025-09-26 20:47:34 +08:00
2827b3f4a3 [CI] Fix test_shared_storage_connector_hashes (#25748) Chauncey 2025-09-26 20:46:17 +08:00
2b6b1d7809 [Model] Mamba2 varlen refactor (#21467) Chih-Chieh Yang 2025-09-26 07:31:14 -04:00
633f943e30 [Doc] Update Batch-level DP docs (#25757) Cyrus Leung 2025-09-26 17:37:40 +08:00
b03b1b97f6 Support LongCat-Flash-Chat tool call (#24083) Xu Wenqing 2025-09-26 17:25:39 +08:00
dfb9af2014 [Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk (#25698) Sage Moore 2025-09-26 01:25:28 -07:00
19f76ee68e [misc] refactor speculative config (#25657) yyzxw 2025-09-26 16:22:06 +08:00
dd70437a4f Remove cuda hard-code in compute_causal_conv1d_metadata (#25555) Icey 2025-09-26 16:19:20 +08:00
99b3a504c5 [Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743) Tao He 2025-09-26 16:18:58 +08:00
6e30010d2f fix: print output in offline_inference/base/chat.py example (#25744) Iceber Gu 2025-09-26 16:18:24 +08:00
52621c8f5c [Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X (#25703) xaguilar-amd 2025-09-26 10:18:20 +02:00
d48f4d6daf perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled (#25739) Andrew Sansom 2025-09-26 03:18:09 -05:00
e84e0735c7 fix: revert cast to cpu in MsgpackEncoder._encode_tensor to avoid hidden performance regressions (#25738) Andrew Sansom 2025-09-26 03:18:05 -05:00
3edf87d25f [CI/Build] fix doc build warning: Failed to get 'name: description' pair (#25733) yitingdc 2025-09-26 16:18:02 +08:00
392edee34a EVS Support (Video tokens pruning) (#22980) Eugene Khvedchenya 2025-09-26 06:54:54 +03:00
983056e456 [Misc] Remove unnecessary memoryviews in shm_broadcast.py (#25721) Nick Hill 2025-09-25 20:11:44 -07:00
13dd93c667 [Core] Force PIECEWISE CUDAGraph mode for encoder-decoder (#25701) Russell Bryant 2025-09-25 21:21:56 -04:00
53a30845be Llama 3.1 405B fp4 changes upstreaming from 355_wip (#25135) Aleksandr Malyshev 2025-09-25 18:16:53 -07:00
8b77328ffe [Misc] Don't log shm dequeue delay warning on worker side (#25720) Nick Hill 2025-09-25 18:08:30 -07:00
9fe4c2bdb9 [Refactor] Remove DeepGEMM OP Register (#25710) Wentao Ye 2025-09-25 20:13:41 -04:00
081b5594a2 Fix routing_bias dtype (#25711) Shu Wang 2025-09-25 18:35:14 -05:00
57329a8c01 [Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 (#25708) tomeras91 2025-09-26 02:10:29 +03:00
8c435c9bce [Core] Enable command line logging for LLMEngine (#25610) Zhuohan Li 2025-09-25 15:31:17 -07:00
e71b8e210d [Spec Decode] Add Batch Parallel Ngram. Up to 8x lower overhead. (#24986) Ekagra Ranjan 2025-09-25 18:22:03 -04:00
89fa54e6f7 [Optimization] Use a cheaper cache key in get_model_architecture (#25682) Cyrus Leung 2025-09-26 05:54:20 +08:00
3d54bdcb73 [Optimization] Streamline InputPreprocessor (#25702) Cyrus Leung 2025-09-26 05:06:49 +08:00
6b0fcbbf43 [Misc] Simplify test_argsort_mm_positions (#25690) Cyrus Leung 2025-09-26 02:23:01 +08:00
0fa673af4c [V0 deprecation] Clean up LoRA (#25686) Jee Jee Li 2025-09-26 02:12:33 +08:00
3468f17ebe [V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489) Matthew Bonanni 2025-09-25 13:37:50 -04:00
71b25b0d48 [V0 deprecation] Clean up V0 fallback in compilation config (#25675) Isotr0py 2025-09-26 01:29:51 +08:00
0ea80c87d9 [Model] Define merge_by_field_config MM interface (#25676) Cyrus Leung 2025-09-26 01:13:07 +08:00
b8d9e4a326 [Model] Add optional parameter to reasoning parser constructor (#25554) Tao Hui 2025-09-26 01:12:50 +08:00
13cc7f5370 [BugFix] Fix DBO hang (#25625) Lucas Wilkinson 2025-09-25 13:04:48 -04:00
916bd9204d Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" (#25681) Michael Goin 2025-09-25 12:45:06 -04:00
e04a1b6b21 [BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… (#24662) AlonKejzman 2025-09-25 18:40:14 +03:00
2e5df88c92 [Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning (#25532) Tyler Michael Smith 2025-09-25 11:16:06 -04:00
0754ac4c49 [Misc] Remove cruft file in repo (#25678) Nicolò Lucchesi 2025-09-25 17:05:12 +02:00
03858e6d1c [Bugfix] Fix InternS1 video processing after Transformers v4.56 (#25644) Isotr0py 2025-09-25 22:46:04 +08:00
532a6cfccb [ux] Switch a warning to debug about a pytorch fallback (#23750) Russell Bryant 2025-09-25 10:38:16 -04:00
eb32335e35 [CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652) Li, Jiang 2025-09-25 21:29:11 +08:00
69a8c8e99a [torch.compile] Make Query Quantization Fusable (#24914) Jonas M. Kübler 2025-09-25 15:25:12 +02:00
6c340da4df [misc] log info messages by default for hanging / busy / idle (#25627) youkaichao 2025-09-25 21:14:57 +08:00
2f17117606 [mypy] Fix wrong type annotations related to tuple (#25660) Cyrus Leung 2025-09-25 21:00:45 +08:00
1e9a77e037 [Hardware][RISC-V] Add riscv64 support for vLLM with scalar (#22112) chenlang 2025-09-25 20:46:11 +08:00
d2af67441d [XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643) Kunshang Ji 2025-09-25 20:38:11 +08:00
0bcc3a160d [CI/Build] Fix flaky entrypoints test (#25663) Cyrus Leung 2025-09-25 20:19:40 +08:00
70fbdb26e9 Add backward compatibility for guided_... API (#25615) Harry Mellor 2025-09-25 12:45:25 +01:00
7f570f1caa [V0 deprecation] Remove unreachable model_config.supported_tasks (#25642) wang.yuqi 2025-09-25 19:26:31 +08:00
eaeca3cd7f [Bugfix] Parse SpeculativeConfig Error (#25142) yyzxw 2025-09-25 19:09:39 +08:00
12c1287d64 [mypy] Further improve MM type annotations (#25654) Cyrus Leung 2025-09-25 18:57:36 +08:00
17b4c6685c [Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling (#25648) Isotr0py 2025-09-25 18:36:01 +08:00
3c2b2ccece [Bugfix] Add triton.language.tensor placeholder (#25649) Agata Dobrzyniewicz 2025-09-25 12:31:14 +02:00
7be9ffcd9f [Misc] Fix Qwen3-VL video_grid_thw typing (#25646) Roger Wang 2025-09-25 03:16:45 -07:00
393de22d2e [fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin (#25579) Fadi Arafeh 2025-09-25 10:39:18 +01:00
1260180c67 Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607) Tyler Michael Smith 2025-09-25 04:05:21 -04:00
af4ee63e0e typo: remove duplicate is (#25641) Nicole LiHui 🥜 2025-09-25 15:46:22 +08:00
bc092ea873 Map CwmForCausalLM to llama and LlamaForCausalLM (#25611) Jacob Kahn 2025-09-25 09:37:03 +02:00
755ed7b05b [Misc] Simplify PoolerOutput and move to v1/outputs (#25629) Cyrus Leung 2025-09-25 14:47:03 +08:00
