Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm


Commit Graph

Branches and tags:


cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

cea96a0156 [Bugfix] Fix sync_and_slice_intermediate_tensors (#21537) Rui Qiao 2025-07-25 17:07:58 -07:00
2eddd437ba Add interleaved RoPE test for Llama4 (Maverick) (#21478) Yong Hoon Shin 2025-07-25 17:07:26 -07:00
75d29cf4e1 [Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476) Wentao Ye 2025-07-25 20:07:07 -04:00
41d3082c41 Add Unsloth to RLHF.md (#21636) Daniel Han 2025-07-25 17:06:48 -07:00
7cfea0df39 [TPU][Test] Rollback PR-21550. (#21619) QiliangCui 2025-07-25 13:22:01 -07:00
5ac3168ee3 [Docs] add auto-round quantization readme (#21600) Wenhua Cheng 2025-07-25 23:52:42 +08:00
396ee94180 [CI] Unifying Dockerfiles for ARM and X86 Builds (#21343) Kebe 2025-07-25 22:33:56 +08:00
e189b50f53 Add support for Prithvi in Online serving mode (#21518) mgazz 2025-07-25 15:01:27 +01:00
136d750f5f [Kernel] Improve machete memory bound perf (#21556) czhu-cohere 2025-07-25 06:53:21 -07:00
b3caeb82e7 [ROCm][AITER] Enable fp8 kv cache on rocm aiter backend. (#20295) who who who 2025-07-25 21:50:21 +08:00
eab2f3980c [Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel (#20839) Chih-Chieh Yang 2025-07-25 09:49:36 -04:00
9fe98d4250 [Frontend] Add request_id to the Request object so they can be controlled better via external load balancers (#21009) kourosh hakhamaneshi 2025-07-25 06:49:11 -07:00
29c6fbe58c [MODEL] New model support for naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B (#20931) bigshanedogg 2025-07-25 22:05:42 +09:00
c72f049cb4 [Model] Fix Ernie4.5MoE e_score_correction_bias parameter (#21586) xyxinyang 2025-07-25 21:02:53 +08:00
f3a683b7c9 [Bugfix][Logprobs] Fix logprobs op to support more backend (#21591) Mengqing Cao 2025-07-25 20:53:07 +08:00
46d81d6951 [V1] Get supported tasks from model runner instead of model config (#21585) Cyrus Leung 2025-07-25 20:36:45 +08:00
5c3f2628d5 [Quantization] Enable BNB support for more MoE models (#21370) Jee Jee Li 2025-07-25 18:57:34 +08:00
7311f74468 [Bugfix] GGUF: fix AttributeError: 'PosixPath' object has no attribute 'startswith' (#21579) Kebe 2025-07-25 18:42:23 +08:00
8ed01e32f7 Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct (#21598) Xu Wenqing 2025-07-25 17:36:55 +08:00
e38e96a3c0 [Tests] Harden DP tests (#21508) Nick Hill 2025-07-25 10:27:24 +01:00
40d86ee412 [TPU][Bugfix] fix OOM issue in CI test (#21550) Chengji Yao 2025-07-24 23:01:53 -07:00
85d051f026 [Misc] Removed undefined cmake variables MOE_PERMUTE_ARCHS (#21262) Yang Chen 2025-07-24 22:54:23 -07:00
5140f54b89 [CI/Build] fix cpu_extension for apple silicon (#21195) Ignacio Sica 2025-07-25 02:53:59 -03:00
947edd099e [Misc][Tools] make max-model-len a parameter in auto_tune script (#21321) Chengji Yao 2025-07-24 22:46:43 -07:00
fde60ee775 [Model] Fix a check for None but the return value was empty list in Gemma3 MM vision_embeddings (#21479) hfan 2025-07-25 01:46:06 -04:00
b38bc652ac [Model] Support tensor parallel for timm ViT in Deepseek_vl2 (#21494) Jason Gu 2025-07-25 13:45:16 +08:00
adaf2c6d4f [Bugfix] fix modelscope snapshot_download serialization (#21536) Ning Xie 2025-07-25 13:44:38 +08:00
42343f1f89 [CI] Update CODEOWNERS for CPU and Intel GPU (#21582) Li, Jiang 2025-07-25 12:58:03 +08:00
965bc71b04 Integrate TensorSchema with shape validation for Phi3VImagePixelInputs (#21232) Benji Beck 2025-07-24 21:43:52 -07:00
807a328bb6 [Docs] Add requirements/common.txt to run unit tests (#21572) Zhou Fang 2025-07-24 20:51:15 -07:00
e0be2c4d09 [TPU][Test] Temporarily suspend this MoE model in test_basic.py. (#21560) QiliangCui 2025-07-24 20:44:50 -07:00
9c8b2c2a8a [DP] Support api-server-count > 0 in hybrid DP LB mode (#21510) Nick Hill 2025-07-25 04:18:16 +01:00
2212cd6cfb [Bugfix] DeepGemm utils : Fix hardcoded type-cast (#21517) Varun Sundar Rabindranath 2025-07-25 08:47:29 +05:30
ce3a9b1378 [Kernel] adding fused_moe configs for upcoming granite4 (#21332) Burkhard Ringlein 2025-07-25 05:16:59 +02:00
2ce90e5b01 Fix GLM-4 PP Missing Layer When using with PP. (#21531) Yuxuan Zhang 2025-07-25 11:07:38 +08:00
633f6e804b [Bug] Fix DeepGemm Init Error (#21554) Wentao Ye 2025-07-24 23:07:22 -04:00
b57296bb9a [Docs] Fix site_url for RunLLM (#21564) Harry Mellor 2025-07-25 04:05:58 +01:00
34ddcf9ff4 [Frontend] run-batch supports V1 (#21541) Cyrus Leung 2025-07-25 11:05:55 +08:00
fe56180c7f [MoE] More balanced expert sharding (#21497) Woosuk Kwon 2025-07-24 15:56:08 -07:00
07d80d7b0e [TPU][TEST] HF_HUB_DISABLE_XET=1 the test 3. (#21539) QiliangCui 2025-07-24 15:33:04 -07:00
2dd72d23d9 update flashinfer to v0.2.9rc1 (#21485) weiliang 2025-07-25 05:06:11 +08:00
a6c7fb8cff [Docs] Add Expert Parallelism Initial Documentation (#21373) Simon Mo 2025-07-24 12:36:06 -07:00
a7272c23d0 [Docs][minor] Fix broken gh-file link in distributed serving docs (#21543) Ricardo Decal 2025-07-24 10:36:56 -07:00
6066284914 [P/D] Support CPU Transfer in NixlConnector (#18293) Juncheng Gu 2025-07-24 09:58:42 -07:00
1e9ea8e69d [P/D] Move FakeNixlWrapper to test dir (#21328) Rui Qiao 2025-07-24 08:53:45 -07:00
d9f9a3fd96 [XPU] Conditionally import CUDA-specific passes to avoid import errors on xpu platform (#21036) Chaojun Zhang 2025-07-24 23:23:36 +08:00
1b25f1fe75 Update flashinfer CUTLASS MoE Kernel (#21408) Shu Wang 2025-07-24 10:13:31 -05:00
e8cb0d0495 [Bug] Fix Compressed Tensor NVFP4 cutlass_fp4_group_mm illegal memory access (#21465) Wentao Ye 2025-07-24 11:13:24 -04:00
684174115d [Docs] Rewrite Distributed Inference and Serving guide (#20593) Ricardo Decal 2025-07-24 08:13:05 -07:00
cdb79ee63d [Docs] Update Tensorizer usage documentation (#21190) Sanger Steel 2025-07-24 09:56:18 -04:00
5a19a6c670 [Fix] Update mamba_ssm to 2.2.5 (#21421) elvischenv 2025-07-24 18:25:41 +08:00
2ded067fd2 [Bugfix] Fix CUDA arch flags for MoE permute (#21426) Ming Yang 2025-07-24 03:23:59 -07:00
13abd0eaf9 [Model] Officially support Emu3 with Transformers backend (#21319) Harry Mellor 2025-07-24 11:22:12 +01:00
61b8cea3b4 [Attention] Optimize FlashInfer MetadataBuilder Build call (#21137) Lucas Wilkinson 2025-07-24 06:21:46 -04:00
526078a96c bump flashinfer to v0.2.8 (#21385) cjackal 2025-07-24 19:20:38 +09:00
6da0078523 [Feat] Allow custom naming of vLLM processes (#21445) Chauncey 2025-07-24 18:15:23 +08:00
73e3949d07 [Misc] Improve comment for DPEngineCoreActor._set_cuda_visible_devices() (#21501) Rui Qiao 2025-07-24 03:13:40 -07:00
6eca337ce0 Replace --expand-tools-even-if-tool-choice-none with --exclude-tools-when-tool-choice-none for v0.10.0 (#20544) Shintarou Okada 2025-07-24 18:56:36 +09:00
85bda9e7d0 remove GLM-4.5 quantization wrong Code (#21435) Yuxuan Zhang 2025-07-24 16:52:43 +08:00
610852a423 [Core] Support model loader plugins (#21067) 22quinn 2025-07-24 01:49:44 -07:00
f0f4de8f26 [Misc] Fix duplicate FusedMoEConfig debug messages (#21455) Nick Hill 2025-07-24 09:27:30 +01:00
fc5f756db4 [v1][Core] Clean up usages of SpecializedManager (#21407) Zhou Fang 2025-07-24 00:40:11 -07:00
e74bfc70e4 [TPU][Bugfix] fix moe layer (#21340) Chengji Yao 2025-07-24 00:38:39 -07:00
90eeea8f85 [Bugfix][ROCm] Fix for warp_size uses on host (#21205) Gregory Shtrasberg 2025-07-24 03:37:19 -04:00
dde295a934 Deduplicate Transformers backend code using inheritance (#21461) Harry Mellor 2025-07-24 08:16:23 +01:00
6d8d0a24c0 Add think chunk (#21333) (tags: v0.10.0rc2, v0.10.0) Julien Denize 2025-07-24 06:51:32 +02:00
11ef7a611e [BugFix] Set CUDA_VISIBLE_DEVICES before spawning the subprocesses (#21211) Yinghai Lu 2025-07-23 21:44:04 -07:00
dc2f159f8a Dump input metadata on crash for async scheduling (#21258) Woosuk Kwon 2025-07-23 21:10:30 -07:00
d5b981f8b1 [DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238) Robert Shaw 2025-07-23 23:57:32 -04:00
eec6942014 [BugFix] Fix KVConnector TP worker aggregation (#21473) Nick Hill 2025-07-24 04:56:49 +01:00
fd48d99ffd [BugFix]: Batch generation from prompt_embeds fails for long prompts (#21390) KazusatoOoko 2025-07-23 20:43:17 -07:00
f8c15c4efb [Bugfix] Fix example disagg_example_p2p_nccl_xpyd.sh zombie process (#21437) WeiQing Chen 2025-07-24 11:42:11 +08:00
aa08a954f9 [Bugfix] Fix casing warning (#21468) Matthew Bonanni 2025-07-23 23:41:23 -04:00
13e4ee1dc3 [XPU][UT] increase intel xpu CI test scope (#21492) Liangliang Ma 2025-07-24 11:24:04 +08:00
772ce5af97 [Misc] Add dummy maverick test to CI (#21324) Ming Yang 2025-07-23 20:22:42 -07:00
63d92abb7c [Frontend] Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding (#21374) deven-labovitch 2025-07-23 23:22:19 -04:00
11599b0e1f feat(gguf_loader): accept HF repo paths & URLs for GGUF (#20793) Hardik Gupta 2025-07-23 20:21:02 -07:00
f3137cdd81 [Core] Freeze gc during cuda graph capture to speed up init (#21146) Michael Goin 2025-07-23 20:20:14 -04:00
82ec66f514 [V0 Deprecation] Remove Prompt Adapters (#20588) Michael Goin 2025-07-23 19:36:48 -04:00
78c13e30e1 [V1] Fix local chunked attention always disabled (#21419) Yong Hoon Shin 2025-07-23 15:59:30 -07:00
5c9b807b34 [Core] Add reload_weights RPC method (#20096) 22quinn 2025-07-23 14:24:52 -07:00
14bf19e39f [TPU][TEST] Fix the downloading issue in TPU v1 test 11. (#21418) QiliangCui 2025-07-23 11:29:36 -07:00
4ac7713e32 Add test case for compiling multiple graphs (#21044) Yong Hoon Shin 2025-07-23 11:00:47 -07:00
8560a5b258 [Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577) Christian Pinto 2025-07-23 19:00:23 +01:00
316b1bf706 [Tests] Add tests for headless internal DP LB (#21450) Nick Hill 2025-07-23 15:49:25 +01:00
7c734ee09b [Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models. (#21364) Tao He 2025-07-23 21:34:37 +08:00
f59ec35b7f [V1] Check all pooling tasks during profiling (#21299) Cyrus Leung 2025-07-23 20:53:26 +08:00
2671334d45 [Model] add Hunyuan V1 Dense Model support. (#21368) Asher 2025-07-23 18:54:08 +08:00
2cc5016a19 [Docs] Clean up v1/metrics.md (#21449) Michael Yao 2025-07-23 18:37:25 +08:00
6929f8b437 [Misc] fixed nvfp4_moe test failures due to invalid kwargs (#21246) Yang Chen 2025-07-23 01:41:43 -07:00
32ec9e2f2a Mamba V2 Test not Asserting Failures. (#21379) Yu Chin Fabian Lim 2025-07-23 04:40:27 -04:00
accac82928 [Sampler] Introduce logprobs mode for logging (#21398) Lu Fang 2025-07-23 01:39:25 -07:00
23637dcdef [Docs] Fix bullets and grammars in tool_calling.md (#21440) Michael Yao 2025-07-23 16:23:20 +08:00
6364af92f8 Fixed typo in profiling logs (#21441) Sergio Paniego Blanco 2025-07-23 10:18:54 +02:00
7aaa2bd5a8 [Bugfix] ensure tool_choice is popped when tool_choice:null is passed in json payload (#19679) Guillaume Calmettes 2025-07-23 09:30:05 +02:00
2f5c14de6a add clear messages for deprecated models (#21424) youkaichao 2025-07-23 15:03:16 +08:00
f002e9a870 [Cleanup] Only log MoE DP setup warning if DP is enabled (#21315) Michael Goin 2025-07-23 03:02:48 -04:00
a1f3610fc6 [Core] Add basic unit test for maybe_evict_cached_block (#21400) Jialin Ouyang 2025-07-23 00:02:02 -07:00
4ecedd1806 [Bugfix] Fix nightly transformers CI failure (#21427) Isotr0py 2025-07-23 15:01:01 +08:00
107111a859 Changing "amdproduction" allocation. (#21409) Alexei-V-Ivanov-AMD 2025-07-22 22:48:31 -05:00
