Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm


Commit Graph

ac201a0eaf [Feature] Support Decode Context Parallel (DCP) for MLA (#23734) yzds 2025-09-06 13:24:05 +08:00
3c529fc994 [KV Sharing] Raise error if using eagle with fast prefill (#24350) Yong Hoon Shin 2025-09-05 20:22:40 -07:00
35bf193864 [Doc]: fix typos in Python comments (#24294) Didier Durand 2025-09-06 04:41:12 +02:00
35efa70297 Add @22quinn as code reviewer for RL related components (#24346) 22quinn 2025-09-05 18:56:15 -07:00
cee182b297 [Perf][V1] Fully overlap model execution (#23569) Benjamin Chislett 2025-09-05 21:20:17 -04:00
c954c6629c [CI] Add timeouts to tests (#24260) Rafael Vasquez 2025-09-05 20:26:22 -04:00
9dfbeb41e5 [RFC] allow cancelation after shutdown in blocking collective_rpc (#23390) Shiyan Deng 2025-09-05 14:14:18 -07:00
eedb2a2a10 [Bugfix] Fix silu_mul+quant fusion test (#24341) elvischenv 2025-09-06 04:13:42 +08:00
23a6c5280e [gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306) Chauncey 2025-09-06 01:26:00 +08:00
7812bcf278 [docs] add shenzhen meetup (#24326) youkaichao 2025-09-05 22:48:42 +08:00
006e7a34ae Adding int4 and int8 models for CPU benchmarking (#23709) Louie Tsai 2025-09-05 05:08:50 -07:00
e599e2c65e [XPU][P/D] Add XPU support in NixlConnector (#22436) liuzhenwei 2025-09-05 12:03:12 +08:00
c29fb540ff [gpt-oss] tool parser supports for /chat/completions [1/n] (#22386) Aaron Pham 2025-09-04 23:39:12 -04:00
65e038931d [Frontend] Skip unnecessary detokenization when token_id is requested (#24236) Nicolò Lucchesi 2025-09-05 01:04:12 +02:00
886ccbe5ba [CI/Build] Reduce the number of redundant cases to test for LoRA (#24276) Zhuohan Li 2025-09-04 14:58:44 -07:00
adc3ddb430 [Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727) elvischenv 2025-09-05 05:25:45 +08:00
60b755cbcb [Misc] Have AsyncLLM custom_stat_loggers extend default logger list (#20952) Seiji Eicher 2025-09-04 14:25:30 -07:00
482e52f56c QWEN3 Coder Fused MoE kernels Optimization configs (#24266) Saman A. Pour 2025-09-04 13:33:43 -07:00
78336a0c3e Upgrade FlashInfer to v0.3.0 (#24086) Po-Han Huang (NVIDIA) 2025-09-05 00:49:20 +08:00
94866d7c93 [Misc] Slight improve deepgemm print (#24085) Jee Jee Li 2025-09-05 00:06:51 +08:00
83609ca91d [Doc]: fix typos in Python comments (#24173) Didier Durand 2025-09-04 17:52:17 +02:00
e41a0fa377 [Perf] Freeze core engine proc heap after init (#24008) Nick Hill 2025-09-04 07:55:23 -07:00
37241077d5 [Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725) nvjullin 2025-09-04 21:25:40 +08:00
c9f7081f9c [LoRA]: Add lora support to qwen-2.5-omni (#24231) Yash Pratap Singh 2025-09-04 18:20:50 +05:30
16ded21eeb [XPU] support Triton Attention backend on Intel GPU (#24149) Kunshang Ji 2025-09-04 20:41:08 +08:00
2b30afa442 Use hidden_size_per_head as head_size fallback (#24221) nopperl 2025-09-04 20:59:16 +09:00
eafa8dcde6 [Model] Add pp support for hunyuan (#24212) Jiangyun Zhu 2025-09-04 18:58:26 +08:00
6c7af8110a [Doc] Update vLLM Singapore Meetup info (#24234) TJian 2025-09-04 02:58:18 -07:00
8f423e5f43 [Feature][Response API] Add streaming support for non-harmony (#23741) Kebe 2025-09-04 18:49:06 +09:00
369a079568 [Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon (#24200) Ignacio Sica 2025-09-04 06:48:25 -03:00
402759d472 [Attention] FlashAttn MLA (#14258) Lucas Wilkinson 2025-09-04 05:47:59 -04:00
2c301ee2eb [Bugfix] Fix Incremental Detokenization with tokenizers == 0.22.0 (#24159) Fanli Lin 2025-09-04 17:47:08 +08:00
3efb9f4d95 [Attention][Platform] Refactor MLA to support Custom Op (#23332) whx 2025-09-04 17:46:37 +08:00
04f3c35cff Improve flexibility of auto_tune.sh execution. (#23766) anthonsu 2025-09-04 02:41:41 -07:00
51d5e9be7d [Core][Model] Terratorch backend integration (#23513) mgazz 2025-09-04 08:22:41 +01:00
e7fc70016f [Model] Add MiDashengLM model support (#23652) bingchen-mi 2025-09-04 15:08:09 +08:00
12e1e63cc5 [Misc] Enhance output readability of helper script (#24214) Weida Hong 2025-09-04 14:38:26 +08:00
57b1ce94f7 [CPU] Refactor CPU unquantized linear (#24150) Li, Jiang 2025-09-04 14:28:45 +08:00
cb55ad86fe Migrate ultravox inputs to TensorSchema (#23503) Benji Beck 2025-09-03 23:09:11 -07:00
712b273f65 [Refactor] Introduce basic Renderer for completion-style request (#24010) Flora Feng 2025-09-03 22:21:12 -07:00
e919d6f549 [Kernel][Bugfix] Fix grouped topk cu (#24146) Qiming Zhang 2025-09-03 21:37:37 -07:00
a38f8bd54c [Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927) wuhang 2025-09-04 12:05:10 +08:00
b5ee1e3261 Remove deprecated PyNcclConnector (#24151) Peter Pan 2025-09-04 06:49:16 +08:00
36c260dad6 [Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking (#23460) George Nagy II 2025-09-03 15:08:47 -06:00
a43a3f1770 [Bugfix][DP] DP distribution does not require ray[default] (#23822) Kebe 2025-09-04 05:21:36 +09:00
6adaed42f4 [Feature][P/D]: Optimize NIXL Connector xfer Launch (#23887) WeiQing Chen 2025-09-04 03:14:30 +08:00
a742322092 [Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289) Matthew Bonanni 2025-09-03 14:05:24 -04:00
731a6940e3 Migrate whisper inputs to TensorSchema (#23505) Benji Beck 2025-09-03 11:04:00 -07:00
e9b92dcd89 [Kernels] Overlap shared experts with send/recv (#23273) bnellnm 2025-09-03 12:35:18 -04:00
fa4311d85f [V1] v1 engine + full CUDA graph support for PLaMo2 (#23998) nopperl 2025-09-04 00:24:02 +09:00
6d80ae83e1 [Bugfix] Fixing division by zero in triton_attn if query_heads/kv_heads > 16 (#23424) Burkhard Ringlein 2025-09-03 17:01:09 +02:00
4ba0c587ba FIX: Add libnuma-dev to Dockerfile for dev stage (#20388) dongbo910220 2025-09-03 22:17:20 +08:00
6997a25ac6 [Model] Remove useless code from MiniMax implementation (#23982) qscqesze 2025-09-03 19:27:04 +08:00
28f350e147 Support add_generation_prompt in embeddings endpoint with chat request (#23931) Jakub Smid 2025-09-03 12:47:55 +02:00
51383bd472 [CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088) wang.yuqi 2025-09-03 17:23:56 +08:00
9c99e4871f [Misc] Clean up deadcode for legacy processing pipeline (#24153) Isotr0py 2025-09-03 16:34:29 +08:00
70549c1245 [CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907) dsinghvi 2025-09-03 13:43:11 +05:30
f0c503f66e [Nixl] Heterogeneous TP support FlashInfer (#20189) Nicolò Lucchesi 2025-09-03 09:19:54 +02:00
f38035c123 [distributed][rl] remove nccl cumem env var override (#24141) youkaichao 2025-09-03 14:45:25 +08:00
426cc8629f [BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models (#24132) Yong Hoon Shin 2025-09-02 21:57:59 -07:00
e81d4e69c1 [Misc] Add check for dual_chunk_attention (#24070) Jiangyun Zhu 2025-09-03 12:19:14 +08:00
02d411fdb2 [Doc]: fix typos in Python comments (#24115) Didier Durand 2025-09-03 06:14:07 +02:00
d7e1e59972 [Doc]: fix typos in Python comments (#24093) Didier Durand 2025-09-03 06:05:45 +02:00
c4ed78b14f [Compile] Fix Compile Warning for w4a8_mm_entry.cu (#23660) Wentao Ye 2025-09-02 23:45:52 -04:00
1bd007f234 fix some typos (#24071) co63oc 2025-09-03 11:44:50 +08:00
136d853e65 [V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing (#23656) afeldman-nm 2025-09-02 22:52:51 -04:00
e32a0e8678 Upgrade xgrammar to 0.1.23 (#22988) Russell Bryant 2025-09-02 22:32:59 -04:00
42dc59dbac Update release pipeline post PyTorch 2.8.0 update (#24073) youkaichao 2025-09-03 10:09:19 +08:00
862f2ef893 [XPU] Fix the bug of LoRA logits on the XPU platform (#24081) Chaojun Zhang 2025-09-03 08:21:18 +08:00
2fd1a40a54 [CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121) Matthew Bonanni 2025-09-02 19:50:28 -04:00
930a24144c [Bug] R1 Accuracy: Fix routed_scaling_factor Double Mul Issue (#24119) Wentao Ye 2025-09-02 18:22:30 -04:00
457e471971 [AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault (#23692) rasmith 2025-09-02 17:13:57 -05:00
d328f7894f [CI] Enable all hf transformers baselines in test_hybrid (#23936) Thomas Parnell 2025-09-02 22:15:06 +02:00
98aee612aa [Log] Only Print Profiler Results on Rank 0 (#23370) Wentao Ye 2025-09-02 14:53:34 -04:00
598bd74cf8 Fix weights loading for Apertus (#24100) nathan 2025-09-02 20:34:28 +02:00
2417798471 [Metrics] Deprecate TPOT in favor of ITL (#24110) Mark McLoughlin 2025-09-02 19:10:10 +01:00
9480ae24e3 [Bugfix] Fix packed_factor missing attribute error (#23902) Kyuyeun Kim 2025-09-02 10:56:31 -07:00
f399182e8c Run ruff format on a few files. (#24075) Chenheli Hua 2025-09-02 10:55:32 -07:00
1c41310584 [Bugfix] Fix transform_config parsing in Compressed Tensors (#23945) Kyle Sayers 2025-09-02 13:54:10 -04:00
c83c4ff815 [Benchmark] Add support for local hf dataset path in benchmark (#23999) Jiangyun Zhu 2025-09-03 01:49:16 +08:00
0e1759cd54 [docs] add SYS_NICE cap & security-opt for docker/k8s (#24017) Peter Pan 2025-09-03 01:27:20 +08:00
e66ed3e675 [CI Failure] Skip failing nvfp4 silu test (#23959) Michael Goin 2025-09-02 13:18:15 -04:00
e0653f6c0b [Model] Classification models support logit_bias / sigmoid_normalize (#24031) wang.yuqi 2025-09-03 00:48:57 +08:00
38ba061f6f [BugFix] Fix EXAONE4 rotary embeddings (#23918) Kyungmin Lee 2025-09-02 23:40:55 +09:00
0a74e9d0f2 [Gemma3n] Fix audio batching (#24052) Nicolò Lucchesi 2025-09-02 16:23:35 +02:00
8bd5844989 correct LWS deployment yaml (#23104) Christian Berge 2025-09-02 14:04:59 +02:00
ce30dca5c4 [CI]: reduce HTTP calls inside entrypoints openai tests (#23646) Aziz 2025-09-02 12:49:32 +02:00
2f0bab3f26 [Model] Support dp on ViT on GLM-4.5V (#23168) WeiQing Chen 2025-09-02 18:48:18 +08:00
fad73be1a5 [Doc]: fix typos in Python comments (#24077) Didier Durand 2025-09-02 11:38:55 +02:00
56d04089ef Migrate Interns1 inputs to TensorSchema (#23510) Benji Beck 2025-09-01 21:35:45 -07:00
7be0cb8e9e [XPU][Feature] fp8 online quantization support for XPU (#23148) Yan Ma 2025-09-02 12:06:53 +08:00
1fa1d6a9a0 Migrate OvisImagePatchInputs to TensorSchema (#22024) Benji Beck 2025-09-01 21:01:36 -07:00
d59c986444 Remove runtime checks based on pooling params (#24051) Maximilien de Bayser 2025-09-02 00:54:37 -03:00
04d0c60770 [Bugfix] Fix the issue that Blip2ForConditionalGeneration' object has… (#24028) damon 2025-09-02 11:54:20 +08:00
2b41cbbf03 [V1][Mamba1] - FP32 SSM Kernel Support (#23506) Asaf Joseph Gardin 2025-09-02 06:53:00 +03:00
0235103cbb [Doc]: fix typos in Python comments (#24042) Didier Durand 2025-09-02 04:07:45 +02:00
a344a5aa0a [bugfix]fix MTP hidden states (#24056) Lucia Fang 2025-09-01 14:09:37 -07:00
5685370271 [Chore][V0 Deprecation] Move LogProb to a separate file (#24055) Woosuk Kwon 2025-09-01 12:07:53 -07:00
a0e0efd6bd [Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 (#23817) WeiQing Chen 2025-09-02 00:56:56 +08:00
cf91a89dd2 [docs][misc] IOProcessor plugins fixes (#24046) Christian Pinto 2025-09-01 17:17:41 +01:00
