Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

c59a0eca42 [KV offload][4/N] Offloading KV connector (#22595) Or Ozeri 2025-09-19 22:07:17 +03:00
b716ab93a7 [bugfix] fix structured outputs key missing issue from #24929 (#25195) Lucia Fang 2025-09-19 11:37:57 -07:00
138f0d1e75 [Docs] add __init__.py to vllm/model_executor/layers/quantization/compressed_tensors/transform (#24974) samzong 2025-09-20 02:32:27 +08:00
2506ce5189 [Core][Prefix Hash] Fix prefix hash metrics sliding window maintainance (#24990) Jialin Ouyang 2025-09-19 11:22:53 -07:00
47fd08aaf9 [CI/Build] fix test function_calling (#25072) Chauncey 2025-09-20 02:16:32 +08:00
12aed7e453 Encoder model support for the Transformers backend (#25174) Harry Mellor 2025-09-19 19:15:22 +01:00
d90e212a3a Remove Redundant Assignment in Qwen3_VisionPatchMerger (#25224) LJH-LBJ 2025-09-20 02:15:13 +08:00
2821986450 [Core] Modify the initialization parameters of the lora manager (#25249) Jee Jee Li 2025-09-20 02:01:28 +08:00
6c117cff7d [Frontend] Pass API server count to each process (#23717) Cyrus Leung 2025-09-20 01:15:19 +08:00
7ac67ea525 [KV offload][3/N] Add worker-side CPU support (#21448) Or Ozeri 2025-09-19 19:53:45 +03:00
ce75e15373 refactor(benchmarks): add type annotations to wait_for_endpoint parameters (#25218) samzong 2025-09-20 00:36:52 +08:00
aed16879a9 Move ModelConfig from config/__init__.py to config/model.py (#25252) Harry Mellor 2025-09-19 17:22:33 +01:00
cf278ff3b2 Update CODEOWNERS (#25269) Harry Mellor 2025-09-19 17:12:55 +01:00
838d7116ba [Qwen] Remove cuda hard-code in qwen3 next (#25243) Icey 2025-09-19 20:25:12 +08:00
5089fd749c [V0 Deprecation] Remove V0 logic from get_input_embeddings interface (#25242) Cyrus Leung 2025-09-19 19:10:52 +08:00
a3d087adec [P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy (#22188) Nicolò Lucchesi 2025-09-19 13:09:14 +02:00
058525b997 Move PoolerConfig from config/__init__.py to config/pooler.py (#25181) Harry Mellor 2025-09-19 12:02:55 +01:00
1dfea5f4a9 [Bugfix][Perf] Misc fixes for Qwen3 VL (#25238) Roger Wang 2025-09-19 03:46:16 -07:00
cea91a32f2 [Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE (#25055) Isotr0py 2025-09-19 18:27:49 +08:00
a684c0124c [bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146) Yan Ma 2025-09-19 16:45:06 +08:00
f2718d2948 [Misc] Cleanup test conftest for deprecated encoder-decoder models (#25231) Isotr0py 2025-09-19 15:44:56 +08:00
825fdb11ad [Bugfix][CPU] Add placeholder to avoid import errors when using fused_moe ops on platforms without triton (#25137) Li, Jiang 2025-09-19 15:41:12 +08:00
8c1d4acbfe [CPU] Disable oneDNN linear on non-x86 platforms (#25166) Li, Jiang 2025-09-19 15:27:22 +08:00
486c5599e3 [Build] Update Xgrammar to 0.1.24 to get a CVE fix (#25188) Russell Bryant 2025-09-19 02:27:17 -04:00
a6149aa587 [OOT] Support sync_model_loading for OOT (#25126) Chendi.Xue 2025-09-19 00:41:53 -05:00
6c8a3c099b [Docs] Fix griffe warnings in vllm/multimodal (#25216) Michael Yao 2025-09-19 13:10:44 +08:00
31a8a2a7bc [Misc] Clean up MM profiling warnings (#25222) Roger Wang 2025-09-18 21:46:57 -07:00
1a0a04dae9 [Perf] Optimize memory peak during EAGLE model loading. (#24585) Chen Ding 2025-09-19 11:31:16 +08:00
6d8246aaff [gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming (#24938) Andrew Xia 2025-09-18 19:11:59 -07:00
9d1c50a5ac [KV offload][2/N] Introduce LRU-based CPU offloading management (#20075) Or Ozeri 2025-09-19 03:20:51 +03:00
9a4600e4dc [CORE] Prompt Embeddings Support for v1 Engine (#24278) Andrew Sansom 2025-09-18 19:03:09 -05:00
9fac6aa30b [BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv (#25206) Lucas Wilkinson 2025-09-18 17:26:28 -04:00
a53ad626d6 [KV offload][1b/N] rename offloading to kv_offload (#25191) Or Ozeri 2025-09-18 23:53:52 +03:00
1c3dad22ff [V0 Deprecation] Remove unused async_timeout.py (#25190) Woosuk Kwon 2025-09-18 13:35:21 -07:00
d2a30a2d93 [Bug] Fix torch Compilation Cache Hit Error (#25093) Wentao Ye 2025-09-18 15:38:37 -04:00
75fb112d80 [Bug] Fix returned_lse not Defined issue (#25106) Wentao Ye 2025-09-18 15:32:24 -04:00
38db529f66 [feat]: Create interface for model-specific M-RoPE (#24194) Aziz 2025-09-18 21:18:56 +02:00
064cac7bb7 [fix]: remove data type hardcoding from gptoss model implementation (#23807) Nikhil Gupta 2025-09-18 19:15:23 +01:00
e19bce40a1 [V0 Deprecation] Remove AsyncLLMEngine (#25025) Woosuk Kwon 2025-09-18 11:07:42 -07:00
505805b645 [KV offload][1/N] Introduce an offloading component (#19848) Or Ozeri 2025-09-18 20:57:07 +03:00
bbdc0f2366 [ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilation (#25104) Rohan Potdar 2025-09-18 12:46:47 -05:00
dc34059360 [ROCm][CI/Build] Use ROCm7.0 as the base (#25178) Gregory Shtrasberg 2025-09-18 12:36:55 -04:00
c4cb0af98a [spec decode] Fix MTP inference path for MiMo-7B model (#25136) qizixi 2025-09-18 09:12:19 -07:00
1c3b1634aa [Misc] Add codeowner for Transformers backend (#25180) Harry Mellor 2025-09-18 17:01:50 +01:00
2ea50e977a Enable Allgather/ReduceScatter backend for NaiveAllToAll (#23964) Shu Wang 2025-09-18 10:52:58 -05:00
b419937c78 [Docs] Fix warnings in mkdocs build (continued) (#25163) Hyogeun Oh (오효근) 2025-09-19 00:23:26 +09:00
5f696c33b1 [New Model] Support BertForTokenClassification / Named Entity Recognition (NER) task (#24872) wang.yuqi 2025-09-18 23:22:01 +08:00
67244c86f0 feat(api): Return 503 on /health when engine is dead (#24897) dongbo910220 2025-09-18 22:29:40 +08:00
072d7e53e5 [PERF] Add conv1d metadata to GDN attn (#25105) Vadim Gimpelson 2025-09-18 18:27:49 +04:00
01a583fea4 [Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel (#21197) jvlunteren 2025-09-18 16:27:01 +02:00
bc19d75985 [Misc] Add kv-connector label (#25156) Nicolò Lucchesi 2025-09-18 15:56:07 +02:00
fbd6523ac0 Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404) Michael Goin 2025-09-18 08:53:45 -04:00
470484a4f5 [Structured Output][Refactor] Move apply_grammar_bitmask() method from ModelRunner to structured output utils (#21999) Shanshan Shen 2025-09-18 20:44:31 +08:00
21da73343a [Misc] Clean up flags in vllm bench serve (#25138) Roger Wang 2025-09-18 05:43:33 -07:00
66072b36db [Bugfix][Mamba] - Fix Conv State Kernel FP32 Support (#24883) Asaf Joseph Gardin 2025-09-18 15:21:17 +03:00
3ed1ec4af2 Fix validate-config pre-commit check (#25157) Harry Mellor 2025-09-18 13:06:28 +01:00
5a33ae9a3f Fix forward reference warning in documentation (#25150) Harry Mellor 2025-09-18 12:41:41 +01:00
c9ff9e6f0c [Docs] add the parallel sampling usage in LLMEngine and AsyncLLM (#24222) William Song 2025-09-18 20:37:08 +09:00
eaffe4486c [Docs] Fix pooling-params doc references in openai_compatible_server.md (#24939) Kay Yan 2025-09-18 19:36:47 +08:00
8ed039d527 Move StructuredOutputsConfig from config/__init__.py to config/structured_outputs.py (#25153) Harry Mellor 2025-09-18 12:24:27 +01:00
37970105fe [Model] Improve Pooling Model (#25149) Jee Jee Li 2025-09-18 19:04:21 +08:00
cc935fdd7e [Frontend] Support setting logprobs to -1 (#25031) Chauncey 2025-09-18 18:34:42 +08:00
abdfcd4f3d silu-v1: Fix EPS not being used during max-reduction (#25069) Elvir Crnčević 2025-09-18 12:25:12 +02:00
4f02b77de4 Fix: Add explicit #include <omp.h> for OpenMP compatibility on certain toolchains (#24951) ihb2032 2025-09-18 17:43:23 +08:00
29283e8976 [Chore] Cleanup guided namespace, move to structured outputs config (#22772) Aaron Pham 2025-09-18 05:20:27 -04:00
05b044e698 [Doc] Fix cross-reference warnings (#25058) Punitvara 2025-09-18 14:35:16 +05:30
aa3f105c59 Add 'path' option to ImagePrompt data_format (#25081) Gerard Finol 2025-09-18 11:02:14 +02:00
ef7eefe17a [Qwen] Add fp8 checkpoint support for qwen3-next. (#25079) Tao He 2025-09-18 16:16:04 +08:00
350c94deb3 [Bugfix] when use s3 model cannot use default load_format (#24435) rongfu.leng 2025-09-18 15:47:43 +08:00
f4cd80f944 Retrieve sliding_window from text config in Gemma3 MM (#25085) Harry Mellor 2025-09-18 07:29:05 +01:00
349e0e3462 [Docs] Fix API Reference (#25140) Harry Mellor 2025-09-18 07:23:29 +01:00
81b16a2bc9 [Kernel] Better inf handling for grouped topk cu (#24886) Lumina 2025-09-18 13:53:55 +08:00
e111d5b0ae [CLI] Use streaming in CLI chat and completion commands (#23769) Simon Mo 2025-09-17 22:30:26 -07:00
a904ea78ea [benchmark] add peak throughput metrics and plot (#23867) Simon Mo 2025-09-17 22:30:02 -07:00
b7433ca1a4 [Spec Decode] Efficient padded speculation (#24539) Benjamin Chislett 2025-09-18 01:07:24 -04:00
5c65a72bb1 [V0 Deprecation] Remove more V0 tests (#25117) Woosuk Kwon 2025-09-17 22:05:25 -07:00
9d8a2d86d2 [EPLB] Add EPLB support for hunyuan_v1 (#23078) YiwenC 2025-09-17 21:51:35 -07:00
3bc18127ff [XPU] Whisper model support on XPU Platform (#25123) Chaojun Zhang 2025-09-18 12:30:10 +08:00
bec060fd99 Mark prompt logprobs as incompatible with prompt embeds at API level (#25077) Andrew Sansom 2025-09-17 23:25:07 -05:00
52bc9d5b3e [Model] enable data parallel for InternVL vision encoder (#23909) YiwenC 2025-09-17 21:11:46 -07:00
dc2979c585 [Kernels] Overlap shared experts with combine instead of dispatch (#24254) bnellnm 2025-09-18 00:10:21 -04:00
027d37df38 [Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and mlp in qwen2moe to successfully load ignored params in quantized models (#24960) toncao 2025-09-18 11:08:50 +07:00
b98219670f [Core][MM] Cleanup MultiModalCache (#25006) Lukas Geiger 2025-09-18 05:08:41 +01:00
32baf1d036 [Docs] Clean up the contributing README (#25099) Harry Mellor 2025-09-18 05:05:18 +01:00
3127274d02 [MM Encoder] Apply DP ViT for Qwen3-VL model series (#24955) Roger Wang 2025-09-17 21:04:21 -07:00
4ac510f484 [Kernels] Enable DeepGEMM by default (#24462) bnellnm 2025-09-17 23:19:52 -04:00
7fb2a5be28 [V0 Deprecation] Skip PP test (#25128) Woosuk Kwon 2025-09-17 20:18:36 -07:00
6c036615dc [V0 Deprecation] Remove misc V0 tests (#25118) Woosuk Kwon 2025-09-17 19:41:55 -07:00
2fc24e94f9 [V0 Deprecation] Remove V0 Tracing & Metrics tests (#25115) Woosuk Kwon 2025-09-17 19:40:44 -07:00
2c3c1bd07a [V0 Deprecation] Remove V0 Engine tests (#25114) Woosuk Kwon 2025-09-17 19:38:09 -07:00
5963b98b46 [Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537) bnellnm 2025-09-17 19:43:31 -04:00
e6585ddb45 [Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833) elvischenv 2025-09-18 07:37:23 +08:00
2a4d6412e6 Add a batched auto tune script (#25076) Karan Goel 2025-09-17 15:41:18 -07:00
e67a79db03 [Bugfix] Refactor Flashinfer TRTLLM attention kernel selection logic (#24600) elvischenv 2025-09-18 06:36:29 +08:00
9f882d8791 Disable failing GPT-OSS Eval (Blackwell) for now (#25107) Michael Goin 2025-09-17 18:36:00 -04:00
1a456c7c90 Aiter mha fp8 fix (#24991) Douglas Lehr 2025-09-17 17:29:14 -05:00
fedb75fa27 [Bugfix][B200] Fix cutlass_mla hang (#24966) Alexander Matveev 2025-09-17 18:06:38 -04:00
bff2e5f1d6 [gpt-oss][2] fix types for streaming (#24556) Andrew Xia 2025-09-17 15:04:28 -07:00
3c068c637b [Kernel] Faster pre-processing time for W4A8 (#23972) czhu-cohere 2025-09-17 17:35:32 -04:00
f20c3b0951 [BUG] Exclude .pth files when pulling remote files (#25092) ahao-anyscale 2025-09-17 13:42:09 -07:00
