Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

6fad29b11b Remove graph_pool as member of VllmBackend and argument to CUDAGraphWrapper (#23385) Copilot 2025-08-25 19:34:15 -07:00
6fd45e7b8a [CI/Build] Use vLLM client's user agent to fetch images (#23561) Cyrus Leung 2025-08-26 10:34:12 +08:00
56dcf4e7e9 [Bug] Fix DeepGEMM Env Control (#23591) Wentao Ye 2025-08-25 21:41:21 -04:00
ae067888d6 Update Flashinfer to 0.2.14.post1 (#23537) weiliang 2025-08-26 09:30:44 +08:00
906e461ed6 [CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568) Michael Goin 2025-08-25 21:29:00 -04:00
2a97ffc33d [Misc] Add release note draft to PR template (#23598) Simon Mo 2025-08-25 16:44:51 -07:00
efc88cf64a [Misc] Simplify FlashInfer attention metadata (#23585) Woosuk Kwon 2025-08-25 15:42:29 -07:00
7b6a837275 [Docs] Update Documentation of Cohere Command-A Models (#23584) Terrence Zhao 2025-08-25 17:53:52 -04:00
c34c82b7fe [TPU][Bugfix] Fixes prompt_token_ids error in tpu tests. (#23574) Pate Motter 2025-08-25 14:29:16 -07:00
8a044754bd [XPU] Delay BF16 check to worker init for spawn compatibility (#22979) Chaojun Zhang 2025-08-26 04:09:26 +08:00
9188ae7cb5 [Bugfix][V1][P/D]Fix the issue where repeated requests for the same input produce abnormal outputs for P2pNcclConnector (#23403) Zhonghua Deng 2025-08-26 03:57:08 +08:00
8a3cd90af5 [Kernel] Add fused grouped_topk kernel for MoE (#23274) Xin Yang 2025-08-25 11:47:52 -07:00
2a167b2eeb [test][RL] Add sleep level 2 test and fix reload with sleep mode (#23521) 22quinn 2025-08-25 09:25:52 -07:00
0ff902f3b4 [Refactor] Refactor persistent buffers with CpuGpuBuffer (#23515) Woosuk Kwon 2025-08-25 08:44:48 -07:00
a9082a4d14 [Bugfix] Fix Qwen3 MoE GPTQ inference (#23490) Isotr0py 2025-08-25 21:40:20 +08:00
e0329ed4b4 Updates to Flex + VLLm integration (#21416) Driss Guessous 2025-08-25 06:32:42 -07:00
6879cd80ae [Refactor] Pass tokenizer explicitly instead of binding to prompt update (#23542) Cyrus Leung 2025-08-25 21:31:57 +08:00
e269be2ba2 [Doc] Add caution for API server scale-out (#23550) Cyrus Leung 2025-08-25 21:14:15 +08:00
5c4b6e66fe [Attention] Unify mamba and attention backend selection (#23171) Ayush Satyam 2025-08-25 14:39:36 +05:30
d0a4a3f645 [misc] add shanghai meetup (#23535) youkaichao 2025-08-25 17:00:03 +08:00
ebafb0936d [Bugfix] Allow dynamic number of patches for llava_onevision (#23525) Cyrus Leung 2025-08-25 16:34:54 +08:00
0cb7b065c3 Feature/benchmark/random mm data/images (#23119) Breno Baldas Skuk 2025-08-25 10:28:35 +02:00
2da02dd0d8 [Fix] DeepSeek V3.1 tool parser error message (#23492) ZiTian Zhao 2025-08-25 15:56:39 +08:00
d765cf01fe [Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711) Chenguang Zheng 2025-08-25 15:41:17 +08:00
712d0f88d8 [Refactor] Dynamic target and content for prompt updates (#23411) Cyrus Leung 2025-08-25 14:39:58 +08:00
49ab23b3cc [gpt-oss] use reasoning channel for reasoning text in serving_chat (#22920) Yu Guo 2025-08-24 23:29:34 -07:00
c9abb10489 [Bugfix] Fix Dense module loading for sentence-transformers embedding models (simplified V2) (#23408) LIYIFAN_liyifan 2025-08-24 22:39:24 -07:00
787cdb3829 Migrate DonutImagePixelInputs to TensorSchema (#23509) Benji Beck 2025-08-24 22:02:15 -07:00
a5203d04df Migrate skyworkr1v inputs to TensorSchema (#23499) Benji Beck 2025-08-24 21:43:21 -07:00
99f8094400 Migrate tarsier inputs to TensorSchema (#23500) Benji Beck 2025-08-24 21:42:36 -07:00
170e8ea9ea [Misc] Unified linear print info (#23516) Jee Jee Li 2025-08-25 11:13:51 +08:00
a71e4765cc [Bugfix] Fix Qwen2.5-VL quantized model weights loading (#23512) zifeitong 2025-08-24 19:40:22 -07:00
39971db3aa Frontend: Adding LM Format Enforcer support to V1 engine (#22564) Noam Gat 2025-08-25 05:31:22 +03:00
504d914314 [Perf] Add Triton config for DeepSeek V3 FP8 EP32 H200 (#23504) Ming Yang 2025-08-24 18:06:35 -07:00
47455c424f [Doc: ]fix various typos in multiple files (#23487) Didier Durand 2025-08-25 02:04:04 +02:00
c7fc6b1354 fix incompatibililty with non cuda platform for nvfp4 (#23478) Lucia Fang 2025-08-24 15:35:41 -07:00
ad78868450 [Misc] Remove unused slot_mapping buffer (#23502) Woosuk Kwon 2025-08-24 14:03:36 -07:00
e2db1164a1 [Model] Enable BLOOM on V1 (#23488) Cyrus Leung 2025-08-24 21:30:47 +08:00
416f05929a [New Model]Donut model (#23229) 汪志鹏 2025-08-24 20:52:24 +08:00
5e021b4981 (Misc): add missing test for zero truncation size. (#23457) TeeKen Lau 2025-08-24 20:12:47 +10:00
1b9b16649c [Misc] update dict parse to EPLBConfig from json dumps to dict unpacking (#23305) rongfu.leng 2025-08-24 16:06:34 +08:00
e76e233540 [kernel] Support W4A8 on Hopper (#23198) czhu-cohere 2025-08-24 02:18:04 -04:00
a75277285b Migrate Paligemma inputs to TensorSchema (#23470) Benji Beck 2025-08-23 21:56:56 -07:00
9dc30b7068 [Bugfix] Add strong reference to CUDA pluggable allocator callbacks (#23477) 22quinn 2025-08-23 21:56:17 -07:00
053278a5dc Migrate Pixtral inputs to TensorSchema (#23472) Benji Beck 2025-08-23 21:55:53 -07:00
c55c028998 [gpt-oss] Streaming Output for Python Tool (#23409) Jiangyun Zhu 2025-08-24 12:42:38 +08:00
65197a5fb3 [Misc] Modify CacheConfig import (#23459) Jee Jee Li 2025-08-23 14:05:27 +08:00
b8f17f5d98 Support DeepSeek-V3.1 tool call (#23454) Xu Wenqing 2025-08-23 13:50:16 +08:00
d9a55204ba fix(tests): Correct unreachable assertion in truncation test (#23425) Aziz 2025-08-23 07:23:54 +02:00
b4e9fd811f Revert "[PERF] Use faster way of decode in tokenizer: avoid useless list-to-list conversion (#20000)" (#23396) Cyrus Leung 2025-08-23 12:16:48 +08:00
308fa287a8 Add glm4.5v tp2,4 fp8 config on H100_80GB (#23443) Chenxi Yang 2025-08-22 19:54:19 -07:00
fa78de9dc3 Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527) Daifeng Li 2025-08-23 10:53:21 +08:00
f6818a92cb [UX] Move Dockerfile DeepGEMM install to tools/install_deepgemm.sh (#23360) Michael Goin 2025-08-22 22:52:50 -04:00
23c939fd30 [Model] Support DP for ViT on MiniCPM-V-4 (#23327) WeiQing Chen 2025-08-23 10:14:41 +08:00
add1adfec7 [BugFix] Fix MinPLogitsProcessor.update_states() (#23401) Nick Hill 2025-08-22 17:22:11 -07:00
c80c53a30f [BugFix] Fix batch updates for pooling models (#23398) Nick Hill 2025-08-22 17:20:41 -07:00
24d0c9e6ed [NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703) elvischenv 2025-08-23 06:09:05 +08:00
cc7ae5e7ca [BugFix][AMD][Quantization] Fix torch.compile issue where wvSplitKQ not being called when it should when using quantized FP8 model (#22281) rasmith 2025-08-22 16:47:57 -05:00
0313cf854d [PERF] PyTorch Symmetric Memory All-Reduce (#20759) Ilya Markov 2025-08-22 23:39:08 +02:00
0483fabc74 [CI/Build] add EP dependencies to docker (#21976) Zhewen Li 2025-08-22 13:34:40 -07:00
da65bec309 add an env var for path to pre-downloaded flashinfer cubin files (#22675) Shiyan Deng 2025-08-22 12:25:45 -07:00
4645024d3a [Quantization] Allow GGUF quantization to skip unquantized layer (#23188) Isotr0py 2025-08-23 03:04:22 +08:00
cd7a3df26f [Bugfix] Fix broken Florence-2 model (#23426) Isotr0py 2025-08-23 01:50:52 +08:00
32d2b4064f [Model] Add Ovis2.5 PP support (#23405) Isotr0py 2025-08-23 01:46:34 +08:00
22cf679aad [Doc]: fix various typos in multiple files (#23179) Didier Durand 2025-08-22 19:38:46 +02:00
b6d7d34fc6 Add unit tests for batched guided and non-guided requests (#23389) Yong Hoon Shin 2025-08-22 10:31:24 -07:00
341923b982 fix(tests): Ensure reliable CUDA cache clearing in MoE test (#23416) Aziz 2025-08-22 19:20:59 +02:00
424fb7a5d2 [BugFix] Fix the issue where image embeddings were incorrectly split.… (#23366) bppps 2025-08-23 00:56:46 +08:00
88491c1b6b [Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support (#23337) PapaGoose 2025-08-22 19:39:19 +03:00
613a23b57f [Bugfix]: Installing dev environment due to pydantic incompatible version (#23353) Martin Hickey 2025-08-22 17:22:29 +01:00
51a215300b [Fix] Bump triton version in rocm-build requirements (#21630) Burkhard Ringlein 2025-08-22 17:13:39 +02:00
ebe14621e3 [Bug fix] Dynamically setting the backend variable for genai_perf_tests in the run-nightly-benchmark script (#23375) Naman Lalit 2025-08-22 08:12:28 -07:00
325aa3dee9 [Misc] local import code clean (#23420) Ning Xie 2025-08-22 22:01:35 +08:00
a073be6d87 [Doc] Update the doc for log probs + prefix caching (#23399) Chen Zhang 2025-08-22 06:20:39 -07:00
695e7adcd2 [misc] Remove outdate comment about runai_model_streamer (#23421) 杨朱 · Kiki 2025-08-22 21:08:53 +08:00
281710ef9a [Attention] Allow V1 flash_attn to support cross-attention (#23297) Russell Bryant 2025-08-22 08:10:16 -04:00
808d2e9aa0 [Misc] Move M-RoPE init logic to _init_mrope_positions (#23422) Woosuk Kwon 2025-08-22 03:07:22 -07:00
285178b3b8 [V0 Deprecation] Remove V0 LoRA test (#23418) Jee Jee Li 2025-08-22 17:56:51 +08:00
88016c372a [Bugfix] Fix pooling models on CPU backend (#23392) Li, Jiang 2025-08-22 17:47:17 +08:00
998720859c Migrate MiniCPMOAudioInputs to TensorSchema (#21847) Benji Beck 2025-08-22 01:43:29 -07:00
0ba1b54ac6 [gpt-oss] add input/output usage in responses api when harmony context is leveraged (#22667) Guillaume Calmettes 2025-08-22 10:32:24 +02:00
53415653ff [P/D][Nixl] Make kv cache register compatible with hybrid memory allocator (#23079) Flora Feng 2025-08-21 22:30:48 -07:00
17373dcd93 [Attention] Refactor AttentionMetadata Preparation for Encoder-only Models (#23154) Chen Zhang 2025-08-21 22:05:59 -07:00
5964069367 [New Model] Add Seed-Oss model (#23241) Bin Jia 2025-08-22 12:58:10 +08:00
de9c085e17 [Misc] Add gemma3 chat template with pythonic-style function calling (#17149) Philip Chung 2025-08-21 21:06:50 -07:00
111692bb8c [CI] Add end-to-end V1 min_tokens test coverage (#22495) Arjun Reddy 2025-08-21 23:04:07 -05:00
394591e343 [Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement (#23351) Wentao Ye 2025-08-22 00:01:08 -04:00
3ac849665d [CI/Build] Skip Idefics3 and SmolVLM generation test again (#23356) Isotr0py 2025-08-22 11:39:46 +08:00
0b9cc56fac Migrate MllamaImagePixelInputs to TensorSchema (#22020) Benji Beck 2025-08-21 20:28:49 -07:00
8896eb72eb [Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800) Cyrus Leung 2025-08-22 10:56:57 +08:00
19fe1a0510 [Kernel] Add FP8 support with FlashMLA backend (#22668) Matthew Bonanni 2025-08-21 22:26:32 -04:00
480bdf5a7b [Core] Support custom executor qualname (#23314) 22quinn 2025-08-21 18:40:54 -07:00
5368f76855 [Feature][Responses API] Support logprobs(non-stream) (#23319) Kebe 2025-08-22 07:09:16 +08:00
8ef6b8a38c Always use cache mounts when installing vllm to avoid populating pip cache in the image. Also remove apt cache. (#23270) tvalentyn 2025-08-22 00:01:03 +02:00
3bbe11cc13 [Perf] Small optimizations for silu_mul_fp8_quant_deep_gemm (#23265) Michael Goin 2025-08-21 17:56:15 -04:00
c5041f899f [CI] improve pr comments bot (#23380) Simon Mo 2025-08-21 14:49:03 -07:00
8b5fe6eb51 [CI] Clean up actions: remove helm, publish workflows and improve pr … (#23377) Simon Mo 2025-08-21 14:29:04 -07:00
800349c2a5 [Structured Outputs] Refactor bitmask construction into get_grammar_bitmask (#23361) Woosuk Kwon 2025-08-21 13:53:33 -07:00
044931f97b Make sure that vectorize_with_alignment produced vectorized global loads (#23182) Elvir Crnčević 2025-08-21 22:06:54 +02:00
1d353b6352 [Core] Always use tensor cores for Flashinfer Decode Wrapper (#23214) Pavani Majety 2025-08-21 13:02:11 -07:00
