Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

Select branches

cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

32d4b669d0 [BugFix][V1] Fix int32 token index overflow when preparing input ids (#16806) Yong Hoon Shin 2025-04-23 12:12:35 -07:00
3cde34a4a4 [Frontend] Support guidance:no-additional-properties for compatibility with xgrammar (#15949) Travis Johnson 2025-04-23 12:34:41 -06:00
bdb3660312 Use @property and private field for data_parallel_rank_local (#17053) Harry Mellor 2025-04-23 16:50:08 +01:00
f3a21e9c68 CacheConfig.block_size should always be int when used (#17052) Harry Mellor 2025-04-23 16:50:05 +01:00
8e630d680e Improve Transformers backend model loading QoL (#17039) Harry Mellor 2025-04-23 15:33:51 +01:00
af869f6dff [CI] Update structured-output label automation (#17055) Russell Bryant 2025-04-23 10:33:14 -04:00
53c0fa1e25 Ensure that pid passed to kill_process_tree is int for mypy (#17051) Harry Mellor 2025-04-23 15:32:26 +01:00
f7912cba3d [Doc] Add top anchor and a note to quantization/bitblas.md (#17042) Michael Yao 2025-04-23 22:32:16 +08:00
6317a5174a Categorize tests/kernels/ based on kernel type (#16799) Michael Goin 2025-04-23 07:21:07 -06:00
aa72d9a4ea Mistral-format support for compressed-tensors (#16803) Michael Goin 2025-04-23 06:46:23 -06:00
ce17db8085 [CI] Run v1/test_serial_utils.py in CI (#16996) Russell Bryant 2025-04-23 04:13:34 -04:00
8c87a9ad46 [Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (#16964) Chauncey 2025-04-23 15:24:09 +08:00
ec69124eb4 [Misc] Improve readability of get_open_port function. (#17024) huafeng 2025-04-23 14:16:53 +08:00
d0da99fb70 [BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (#16998) Lucas Wilkinson 2025-04-23 00:49:24 -04:00
b2f195c429 [V1] Avoid socket errors during shutdown when requests are in in-flight (#16807) Nick Hill 2025-04-22 21:36:29 -07:00
047797ef90 [Bugfix] Triton FA function takes no keyword arguments (#16902) vllmellm 2025-04-23 12:35:24 +08:00
eb8ef4224d [doc] add download path tips (#17013) Reid 2025-04-23 12:06:30 +08:00
56a735261c [INTEL-HPU][v0] Port delayed sampling to upstream (#16949) Chendi.Xue 2025-04-22 22:14:11 -05:00
e1cf90e099 [misc] tune some env vars for GB200 (#16992) youkaichao 2025-04-23 10:59:48 +08:00
6bc1e30ef9 Revert "[Misc] Add S3 environment variables for better support of MinIO." (#17021) Chauncey 2025-04-23 10:22:29 +08:00
7e081ba7ca [BugFix] Revert ROCm Custom Paged Attention Env Flag Check (#17022) vllmellm 2025-04-23 10:17:48 +08:00
1e013fa388 [V1][DP] More robust DP/EP dummy request coordination (#16277) Nick Hill 2025-04-22 19:12:15 -07:00
bc7c4d206b [Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305) Aleksandr Malyshev 2025-04-22 19:11:56 -07:00
f67e9e9f22 add Dockerfile build vllm against torch nightly (#16936) Yang Wang 2025-04-22 19:08:27 -07:00
36fe78769f [Bugfix] validate urls object for multimodal content parts (#16990) Guillaume Calmettes 2025-04-23 03:43:06 +02:00
83d933718c [Core][V1][TPU] Enable structured decoding on TPU V1 (#16499) Chenyaaang 2025-04-22 17:05:23 -07:00
5175b884f7 [BugFix] Remove default multiproc executor collective_rpc timeout (#17000) Nick Hill 2025-04-22 16:27:14 -07:00
5536b30a4c Fencing Kernels Tests for enabling on AMD (#16929) Alexei-V-Ivanov-AMD 2025-04-22 11:32:40 -05:00
7f58fb9718 Add assertion for no objects while hashing hf_config (#16930) Richard Zou 2025-04-22 12:32:22 -04:00
30bc3e0f66 [FEAT][ROCm]: Support AITER MLA (#15893) vllmellm 2025-04-23 00:31:13 +08:00
f34410715f [frontend] enhance tool_calls type check (#16882) Reid 2025-04-22 23:40:24 +08:00
68d4c33202 [Misc] Add S3 environment variables for better support of MinIO. (#16977) Chauncey 2025-04-22 22:27:36 +08:00
f961d7f6ef [BugFix] Pass in correct VLLM config in FlashInfer backend (#13207) (#16973) Zhengyuan Su (苏政渊) 2025-04-22 21:44:10 +08:00
d059110498 Improve configs - SpeculativeConfig (#16971) Harry Mellor 2025-04-22 13:55:36 +01:00
571e8dd65e [Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni (#16974) Yang Fan 2025-04-22 20:23:17 +08:00
4b91c927f6 [Misc] refactor example series (#16972) Reid 2025-04-22 19:44:21 +08:00
0e237f0035 [FEAT][ROCm] Integrate Paged Attention Kernel from AITER (#15001) vllmellm 2025-04-22 17:46:28 +08:00
8f7bace7c3 [Doc] Improve documentation for multimodal CLI args (#16960) Cyrus Leung 2025-04-22 16:35:35 +08:00
e4d6144232 [BugFix] Fix incremental detokenization perf issue (#16963) Nick Hill 2025-04-22 01:16:19 -07:00
8d32dc603d [Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036) Lei Wang 2025-04-22 16:01:36 +08:00
c4ab9f3e71 [V1] Remove pre-allocation for KV cache (#16941) Woosuk Kwon 2025-04-22 00:52:18 -07:00
2689d5c027 [Model] Use autoweightloader for mamba (#16950) Flora Feng 2025-04-22 00:48:15 -07:00
acba33a0f1 [Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767) Chauncey 2025-04-22 14:02:20 +08:00
a114bf20a3 [Perf] Optimize _update_states for GPU model runner (#16910) SnowCharm 2025-04-22 14:01:54 +08:00
3097ce3a32 [Doc] Update ai_accelerator/hpu-gaudi.inc.md (#16956) Michael Yao 2025-04-22 13:33:27 +08:00
d6da9322c8 [Bugfix] Fix f-string for Python 3.9-3.11 (#16962) Cyrus Leung 2025-04-22 12:45:55 +08:00
71ce44047f Support S3 Sharded loading with RunAI Model Streamer (#16317) omer-dayan 2025-04-22 07:21:49 +03:00
188b7f9b8c [Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830) Charlie Fu 2025-04-21 22:46:22 -05:00
b9b4746950 [V1] Remove additional_config check (#16710) wangxiyuan 2025-04-22 11:45:27 +08:00
7b8a2ab76f [Kernel] Add expert_map support to Cutlass FP8 MOE (#16861) Varun Sundar Rabindranath 2025-04-21 23:44:32 -04:00
c9acbf1141 [Misc] Remove the chunked prefill warning for LoRA (#16925) Jee Jee Li 2025-04-22 11:44:24 +08:00
5b794cae8d [ROCm] Add aiter tkw1 kernel for Llama4 fp8 (#16727) kliuae 2025-04-22 11:42:34 +08:00
0e4254492f [Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863) Jeffrey Li 2025-04-21 23:40:19 -04:00
1311913f55 [BugFix][Spec Decode] No in-place update to draft probs (#16952) Woosuk Kwon 2025-04-21 19:54:19 -07:00
29f395c97c [Doc] Remove unnecessary V1 flag (#16924) Cyrus Leung 2025-04-22 09:04:38 +08:00
fa3bba2a53 [TPU][V1] Enable Top-P (#16843) Nicolò Lucchesi 2025-04-22 02:46:07 +02:00
986537f1c3 [V1] V1 FlashInfer Attention (#16684) Michael Goin 2025-04-21 18:38:41 -06:00
210207525e [TPU][V1] Capture multimodal encoder during model compilation (#15051) Nicolò Lucchesi 2025-04-22 02:36:59 +02:00
71eda0bb76 Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml (#16946) Michael Goin 2025-04-21 18:35:32 -06:00
471fe65630 [TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871) Chengji Yao 2025-04-21 14:43:13 -07:00
3a0fba5cf4 [V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087) Woosuk Kwon 2025-04-21 12:38:50 -07:00
299ebb62b2 [Core] Speed up decode by remove synchronizing operation in sampler (#16436) Chanh Nguyen 2025-04-21 11:18:22 -07:00
f728ab8e35 [Doc] mention how to install in CPU editable mode (#16923) David Xia 2025-04-21 13:45:51 -04:00
63e26fff78 [doc] install required python3-dev apt package (#16888) David Xia 2025-04-21 12:15:18 -04:00
fe3462c774 [XPU][Bugfix] minor fix for XPU (#15591) Yan Ma 2025-04-22 00:02:57 +08:00
3b34fd5273 Raise error for data-parallel with benchmark_throughput (#16737) Kartik Ramesh 2025-04-21 10:51:43 -05:00
55d6d3fdb8 [Bugfix] Fix GLM rotary_dim issue and support v1 (#16912) Isotr0py 2025-04-21 22:26:34 +08:00
7272bfae77 [Misc] Refactor platform to get device specific stream and event (#14411) Shanshan Shen 2025-04-21 21:25:49 +08:00
d9ac9e3dc5 [Misc] fix collect_env version parse (#15267) wangxiyuan 2025-04-21 20:29:40 +08:00
d41faaf9df Restore buffers when wake up from level 2 sleep (#16564) (#16889) Han Zhang 2025-04-21 20:18:28 +08:00
b34f33438a [Doc] Split dummy_processor_inputs() in Multimodal Docs (#16915) Alex Brooks 2025-04-21 05:10:01 -06:00
26c0406555 [Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni (#16907) Yang Fan 2025-04-21 18:25:21 +08:00
4c41278b77 [CI/CD][V1] Add spec decode tests to CI (#16900) Woosuk Kwon 2025-04-20 22:37:16 -07:00
bb3605db85 [Bugfix] Fix v1/spec_decode/test_ngram.py (#16895) qizixi 2025-04-20 20:54:29 -07:00
fe742aef5a [easy] Pass compile_fx only the config patches (#16845) Richard Zou 2025-04-20 00:25:19 -04:00
4b07d36891 Improve configs - CacheConfig (#16835) Harry Mellor 2025-04-20 05:25:04 +01:00
87aaadef73 Serialize tensors using int8 views (#16866) Staszek Paśko 2025-04-19 19:28:34 +02:00
682e0b6d2f Log how much time loading a compiled artifact takes (#16848) Richard Zou 2025-04-19 12:50:46 -04:00
d6195a748b [doc] update hyperlink (#16877) Reid 2025-04-20 00:40:38 +08:00
205d84aaa9 [VLM] Clean up models (#16873) Cyrus Leung 2025-04-19 20:13:06 +08:00
5124f5bf51 [Model] Qwen2.5-Omni Cleanup (#16872) Roger Wang 2025-04-19 02:37:02 -07:00
83f3c3bd91 [Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477) Isotr0py 2025-04-19 17:26:11 +08:00
d9737ca1c6 [V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460) vie-serendipity 2025-04-19 17:25:19 +08:00
9d4ca19d50 [Misc] Benchmarks for audio models (#16505) Nicolò Lucchesi 2025-04-19 11:24:14 +02:00
2ef0dc53b8 [Frontend] Add sampling params to v1/audio/transcriptions endpoint (#16591) Nicolò Lucchesi 2025-04-19 09:03:54 +02:00
1d4680fad2 [rocm][MI300] llama4 maverick fp8 moe config tp8 (#16847) Divakar Verma 2025-04-19 01:21:43 -05:00
2c1bd848a6 [Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130) Yang Fan 2025-04-19 14:14:36 +08:00
5c9121203c [release] Publish neuron docker image (#16733) omrishiv 2025-04-19 01:11:25 +01:00
490b1698a5 [Doc] Updated Llama section in tool calling docs to have llama 3.2 config info (#16857) Justin Ho 2025-04-18 19:28:53 -04:00
5a5e29de88 [Misc] refactor examples series - Chat Completion Client With Tools (#16829) Reid 2025-04-19 07:24:42 +08:00
3d3ab3689f [New Model]: Snowflake Arctic Embed (Family) (#16649) wang.yuqi 2025-04-18 23:11:57 +08:00
686623c5e7 Fix nullable_kvs fallback (#16837) Harry Mellor 2025-04-18 13:58:39 +01:00
aadb656562 [Misc] Clean up Kimi-VL (#16833) Cyrus Leung 2025-04-18 20:15:09 +08:00
87e067de41 [Model] use AutoWeightsLoader for BigCode, GPT-J (#16823) Jonghyun Choe 2025-04-18 19:42:41 +09:00
26507f8973 [Docs] Fix a link and grammar issue in production-stack.md (#16809) Michael Yao 2025-04-18 14:42:58 +08:00
9c1d5b456d [Doc] add podman setup instructions for official image (#16796) Nathan Weinberg 2025-04-18 02:10:49 -04:00
e31045f95c [Bugfix] fix pp for llama4 (#16746) Lucia Fang 2025-04-17 22:51:30 -07:00
aaec845f8e [ROCm] [Attention] Cleanup ROCm output passing (#16431) Luka Govedič 2025-04-18 01:46:45 -04:00
7bdfd29a35 [Misc] add collect_env to cli and docker image (#16759) rongfu.leng 2025-04-18 13:13:35 +08:00
e78587a64c Improve-mm-and-pooler-and-decoding-configs (#16789) Harry Mellor 2025-04-18 06:13:32 +01:00