biondizzle/vllm

Commit Graph

ba464e6ae2 Add ORCA endpoint load metrics support (#24905) Misha Efimov 2025-11-03 03:21:31 -05:00
7f4bdadb92 [XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue (#27964) Kunshang Ji 2025-11-03 15:36:59 +08:00
cec7c28833 [Bugfix] Padded Eagle Specdec with Chunked Prefill (#26263) Rémi Delacourt 2025-11-03 08:22:46 +01:00
18961c5ea6 [Hybrid] Pass kernel block size to builders (#27753) Thomas Parnell 2025-11-03 06:48:03 +01:00
470ad118b6 [Frontend] Align finish_reason when tool is called with OpenAI (#25054) Sungyoon Jeong 2025-11-03 13:21:18 +09:00
1bf43ae35d [BugFix][LoRA] use adapter_id instead of id field of lora_request (#27728) Biswa Panda 2025-11-02 18:08:08 -08:00
0ce743f4e1 Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 (#27420) Vensen 2025-11-03 00:24:01 +08:00
6c317a656e [Misc] Provide Siglip2 chat template (#27939) Cyrus Leung 2025-11-02 21:42:38 +08:00
00b31a36a2 [V1] [Hybrid] Mamba1 Automatic Prefix Caching (#26377) Asaf Joseph Gardin 2025-11-02 14:16:23 +02:00
73444b7b56 Performance fix MistralTokenizer: cache special ids and tokens (#27925) Julien Denize 2025-11-02 09:48:33 +01:00
853a8eb53b [Bugfix] Fix Qwen Omni audio inference (#27920) Cyrus Leung 2025-11-02 13:06:05 +08:00
758ea2e980 [CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma (#27924) Ben Browning 2025-11-01 23:45:02 -04:00
685c99ee77 [KV offload] Offloading connector async scheduling support (#27648) Yue Zhang 2025-11-02 05:08:56 +08:00
1e88fb751b Adds anthropic /v1/messages endpoint to openai api_server (#27882) Benjamin Bartels 2025-11-01 19:45:42 +00:00
c2ed069b32 [BugFix] Fix mixed penalties batch with async scheduling (#27910) Nick Hill 2025-11-01 10:51:24 -07:00
af6e19f50f [Core][TPU] Support TPU Data Parallalism (#27365) wenxindongwork 2025-11-01 11:14:44 -06:00
99d69af9ec [Bugfix] Python 3.10 compatibility for Self (#27918) Cyrus Leung 2025-11-01 23:28:54 +08:00
d811b442d3 [Bugfix] DeepSeek V3.2 MTP metadata & CUDA graph issues (#26779) Haco 2025-11-01 22:52:43 +08:00
30a14b034f [V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module (#27798) wangxiyuan 2025-11-01 18:17:45 +08:00
799ce45cc1 [Docs] Mock all imports for docs (#27873) Harry Mellor 2025-11-01 10:02:23 +00:00
2c0c7c39bd feat(benchmarks): support HF model names in multi-turn benchmark (#27850) ai-jz 2025-11-01 01:04:52 -07:00
e675118849 [Add] cmdline argument parsing for KV cache offloading modules (#27621) Yihua Cheng 2025-11-01 00:17:07 -07:00
e2347dbf58 [Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration (#27895) TJian 2025-10-31 22:45:23 -07:00
879a06579e [CI/Build] Bump transformers version (#27528) Cyrus Leung 2025-11-01 13:11:07 +08:00
29de3cdee4 Adding SplitK in fused_moe_lora kernel (#27818) yugong333 2025-10-31 21:55:46 -07:00
7e2729b57e [Multimodal][XPU]Enable vision attn backend for xpu platform (#27525) Yan Ma 2025-11-01 12:45:02 +08:00
3a5de7d2d6 [Bugfix] Fix KDA output (#27905) Jee Jee Li 2025-11-01 11:54:36 +08:00
bc4486d609 [Kernel] Enable FusedMoEModularKernel support bias (#27754) Jee Jee Li 2025-11-01 10:05:12 +08:00
0cdbe7b744 [Core] Async scheduling + structured outputs compatibility (#26866) Nick Hill 2025-10-31 17:35:04 -07:00
df334868ca [Hybrid] A simpler algorithm to find kernel_block_size (#26476) Chen Zhang 2025-10-31 14:30:28 -07:00
0e0a638c3b Batch invariance doc (#27839) Bram Wasti 2025-10-31 17:22:19 -04:00
f29aeb5a25 Add FLASHINFER_MLA to test_mla_backends and add B200 CI run (#27663) Matthew Bonanni 2025-10-31 14:12:19 -04:00
5e8862e9e0 [Feature] Pydantic validation for scheduler.py and structured_outputs.py (#26519) Vinay R Damodaran 2025-10-31 11:05:50 -07:00
9e5bd3076e [Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill (#27826) Nick Hill 2025-10-31 10:57:45 -07:00
fc16f1c477 Flashinfer_CUTLASS_MOE fuses quantization for TP (#27223) Shu Wang 2025-10-31 10:54:29 -07:00
bc306fe5e9 fix incorrect type annotation in KimiMLP (#27885) ZiTian Zhao 2025-11-01 01:38:02 +08:00
103a468bbf [bugfix] Missing cached item in beam search (#27874) Chenguang Zheng 2025-11-01 01:34:27 +08:00
70bfbd7b16 Docs update tpu install instructions (#27824) Rob Mulla 2025-10-31 13:29:55 -04:00
d6517be3cd [Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node (#26338) GuanLuo 2025-11-01 01:16:00 +08:00
7e06c40e63 [Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V (#27860) Isotr0py 2025-11-01 01:04:51 +08:00
675704ac01 [Bugfix] Allow 64-bit integer values for LoRA IDs to avoid overflow/truncation (#27876) Madeesh Kannan 2025-10-31 17:58:42 +01:00
0384aa7150 [CI/Build] Add gpt-oss LoRA test (#27870) Jee Jee Li 2025-10-31 22:17:21 +08:00
3857eb8725 [Perf] Decouple torch op from GDA to leverage torch.compile (#27871) Jiangyun Zhu 2025-10-31 21:35:52 +08:00
933cdea440 [BugFix] Don’t compute reorder threshold when there are no attention groups (#27861) Huamin Li 2025-10-31 04:36:18 -07:00
3933f18a5e [Bugfix] Avoid too small block m/n for FlexAttention kernel option (#27853) Isotr0py 2025-10-31 19:33:12 +08:00
e5ef4dfc11 [Kimi-Linear] Correct prefixes and add compatibility to AWQ quants (#27834) toncao 2025-10-31 16:36:37 +07:00
36960501d3 [Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power (#27734) Akash kaothalkar 2025-10-31 13:15:26 +05:30
b2e65cb4a7 [benchmark] Make request IDs unique across clients by default (#27723) Seiji Eicher 2025-10-30 19:40:35 -05:00
2bf0bcc1fc [CI Test] Add Scheduled Integration Test (#27765) Wentao Ye 2025-10-30 20:29:26 -04:00
697f507a8e [CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 (#26919) Jakub Sochacki 2025-10-31 00:57:22 +01:00
d5d2a0fe74 [Misc] Make all tool scripts executable (#27831) Matthew Bonanni 2025-10-30 19:46:02 -04:00
c9791f1813 [BugFix] Fix broken import in initialize_ray_cluster() (#27838) Nick Hill 2025-10-30 16:26:13 -07:00
e7acb20076 [Feature] Batch invariant torch.compile (#27660) Paul Zhang 2025-10-30 16:11:29 -04:00
4b68c4a55b [Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty (#27799) Jialin Ouyang 2025-10-30 12:47:30 -07:00
a8141fa649 [Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK (#27750) Wentao Ye 2025-10-30 15:32:39 -04:00
4917002523 [Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode (#27789) Sumanth R Hegde 2025-10-30 12:26:27 -07:00
a2981c4272 [EP/DP][API Server] Enable DP-aware routing in OpenAI API requests (#24945) cong-meta 2025-10-30 12:10:16 -07:00
4574d48bab [Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index (#27629) Jialin Ouyang 2025-10-30 11:52:36 -07:00
ab98f6556f [Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) (#27811) Tyler Michael Smith 2025-10-30 14:52:18 -04:00
2918c1b49c [Model] Use the same fused_moe configs for all H200 devices (#23642) v0.11.1rc5 Roger Meier 2025-10-31 01:36:56 +08:00
1004205795 [MTP] Refactor mtp predictor to avoid d2h operation (#27643) Mengqing Cao 2025-10-31 01:27:39 +08:00
ba33e8830d Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27768) Huy Do 2025-10-30 10:22:30 -07:00
33a0ea5f32 [Docs] add Shanghai Meetup - 2025/10 (#27545) Kebe 2025-10-31 01:33:13 +09:00
60f76baa66 [Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices (#27564) Ilya Markov 2025-10-30 16:41:44 +01:00
e5e076cad7 [BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP (#27762) Varun Sundar Rabindranath 2025-10-30 11:24:31 -04:00
eebf00cb0c [Bugfix][CPU] Fix MRoPE dispatch on the CPU backend (#27800) Li, Jiang 2025-10-30 23:12:05 +08:00
9956aae4ea [Model][Ouro] Support Ouro Model (#27794) Fan Yin 2025-10-30 22:34:41 +08:00
0fe0140408 [KV offload] Enable CPU KV offload on CUDA alike Platforms (#27770) Zhewen Li 2025-10-30 07:10:29 -07:00
4e68cc9b6a [Model] Introduce Kimi Linear to vLLM (#27809) Zhiyuan Li 2025-10-30 21:02:27 +08:00
1994de99ea [CI Failure] Fix test_kv_cache_model_load_and_run (#27717) Huamin Li 2025-10-30 05:27:53 -07:00
4464723f22 [Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524) wang.yuqi 2025-10-30 20:13:05 +08:00
74374386e2 [Bugfix] Improve GPU validation logging in Ray fallback scenarios (#25775) Sairam Pillai 2025-10-30 17:27:59 +05:30
c01f6e525f [CI] Fix mypy for vllm/v1/core and vllm/v1/engine (#27108) Wentao Ye 2025-10-30 07:32:17 -04:00
c7d2a554ba [CI Failure] fix test_default_mm_loras (#27795) Huamin Li 2025-10-30 03:13:03 -07:00
af826e0820 [V0 deprecation] Remove VLLM_USE_V1 usage in config module (#27784) wangxiyuan 2025-10-30 17:42:49 +08:00
e806178d2a [BugFix][VL] Fix FA selection on Qwen2.5-VL (#27790) Zhewen Li 2025-10-30 00:54:44 -07:00
5be1bed790 [CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 (#27113) Huamin Li 2025-10-30 00:50:56 -07:00
31b55ffc62 use stringData in secret yaml to store huggingface token (#25685) yitingdc 2025-10-30 15:47:36 +08:00
ded8ada86a Add more dims for batch invariant shims (#27489) Bram Wasti 2025-10-30 01:28:45 -04:00
8bff831f0a [Benchmark] Cleanup deprecated nightly benchmark and adjust the docstring for performance benchmark (#25786) Kuntai Du 2025-10-29 21:43:37 -07:00
b5d70751d8 [BugFix] Reordering extend logic fix (#27739) Lucas Wilkinson 2025-10-30 12:39:34 +08:00
b8c48c5d72 kernels/moe test pruning (#27053) Fardin Hoque 2025-10-29 21:10:34 -07:00
17d055f527 [Feat] Adds runai distributed streamer (#27230) Benjamin Bartels 2025-10-30 04:09:10 +00:00
2ce5c5d3d6 [BugFix] Handle unscheduled requests properly when async scheduling (#27756) Nick Hill 2025-10-29 21:04:25 -07:00
b5bae42f91 [XPU] Update latest IPEX 2.8 release (#27735) Kunshang Ji 2025-10-30 11:17:13 +08:00
d7fb10c574 [Bugfix] mamba-block-size is set for vision language model (#27773) Chen Zhang 2025-10-29 19:39:57 -07:00
b798e39f93 [XPU][bugfix] fix rope for llama4 and deepseek (#25145) Yan Ma 2025-10-30 09:43:13 +08:00
48eb8eba58 [Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. (#27760) Chenheli Hua 2025-10-29 16:17:48 -07:00
b5d90f7400 [Bug] Fix DBO IMA issue for DeepEPHT (#27666) Wentao Ye 2025-10-29 16:28:27 -04:00
d4aa144343 [BugFix] Fix handling of resumed reqs in SharedStorageConnector (#27719) Nick Hill 2025-10-29 13:16:52 -07:00
fcb1d570bb [Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug (#27682) Wentao Ye 2025-10-29 14:50:39 -04:00
accb8fab07 [KVConnector] Add metrics to Prometheus-Grafana dashboard (#26811) Nicolò Lucchesi 2025-10-29 19:44:49 +01:00
5b0448104f [Bug] Raise error explicitly if using incompatible backend (#27424) Wentao Ye 2025-10-29 13:29:20 -04:00
f7a6682872 [CI/Build] Test torchrun with 8 cards (#27548) 22quinn 2025-10-29 10:26:06 -07:00
a9fe0793f2 use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#27698) Boyuan Feng 2025-10-29 10:08:54 -07:00
7568a282b9 [FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA (#27744) JartX 2025-10-29 17:55:35 +01:00
1da3309ace [Core] Exposing engine sleep & wake_up state as prometheus metrics (#24176) Braulio Dumba 2025-10-29 12:32:01 -04:00
5522fb274b [Chore] Optimize P2PNCCLEngine http_address (#27488) Wentao Ye 2025-10-29 12:05:09 -04:00
0f95a1c3f2 [CI] Fix flaky test_two_responses_with_same_prev_id test (#27745) Nicolò Lucchesi 2025-10-29 16:10:35 +01:00
ded24e3e54 [ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP (#27623) Xiake Sun 2025-10-29 22:44:03 +08:00
