biondizzle/vllm

Commit Graph

56c76c2e0e [Bugfix] clean up duplicated code (#16485) rongfu.leng 2025-04-12 07:19:40 +08:00
c09632a66c Update openai_compatible_server.md (#16507) Christian Sears 2025-04-11 18:54:58 -04:00
a3bf8d4a2b [Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488) Yong Hoon Shin 2025-04-11 15:26:55 -07:00
16eda8c43a [Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463) Ye (Charlotte) Qi 2025-04-11 15:26:17 -07:00
cd77382ac1 Improve configs - LoadConfig (#16422) Harry Mellor 2025-04-11 21:27:27 +01:00
71b9cde010 [Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784) Travis Johnson 2025-04-11 13:59:50 -06:00
5285589f37 [Doc] Document InternVL3 support (#16495) Isotr0py 2025-04-12 03:41:09 +08:00
f41647ee6b [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366) Michael Goin 2025-04-11 11:54:08 -06:00
4d022cbc75 [TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483) Nicolò Lucchesi 2025-04-11 19:06:14 +02:00
70de35a881 Fix erroneous "model doesn't support compile" warning (#16486) Richard Zou 2025-04-11 12:24:36 -04:00
34b2cf3b33 [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779) Tomasz Zielinski 2025-04-11 16:38:36 +02:00
9e90c9f73f [Bugfix] Fix bugs of running Quark quantized models (#16236) chaow-amd 2025-04-11 22:18:32 +08:00
e9528f6dc6 [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) DefTruth 2025-04-11 20:50:50 +08:00
51baa9c333 Don't install triton on ppc64le platform (#16470) Harry Mellor 2025-04-11 11:11:00 +01:00
35e076b3a8 [Misc] update api_client example (#16459) Reid 2025-04-11 18:05:40 +08:00
a26f59ccbc [Misc] Raise error for V1 not supporting Long LoRA. (#16415) Jee Jee Li 2025-04-11 16:51:20 +08:00
aa3b3d76e0 Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447) Michael Goin 2025-04-11 02:09:52 -06:00
f7030df3be [Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990) Jee Jee Li 2025-04-11 15:32:37 +08:00
905e91e9ac Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453) DefTruth 2025-04-11 14:44:22 +08:00
f8f9c0ba62 [Bugfix] Don't set an upper bound on repetition penalty (#16403) Alex Brooks 2025-04-11 00:19:40 -06:00
dda811021a [CPU][Bugfix] Fix CPU docker issues (#16454) Li, Jiang 2025-04-11 14:19:07 +08:00
93195146ea [Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424) Isotr0py 2025-04-11 12:57:16 +08:00
ed37599544 Update supported_hardware.md for TPU INT8 (#16437) Michael Goin 2025-04-10 22:28:07 -06:00
99ef59cf7f [Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439) Yong Hoon Shin 2025-04-10 21:26:07 -07:00
d544d141ec update benchmark_serving_structured_output to include auto backend (#16438) Chenyaaang 2025-04-10 21:25:52 -07:00
3e397a9484 check input length of sonnet samples (#16423) Alexey Belyakov 2025-04-11 03:15:06 +01:00
268c325078 Fix range_ratio Bug in RandomDataset (#16126) WWW 2025-04-11 06:31:17 +08:00
3cc9af88ff [TPU][V1] Disable per-request seed/Generator (#16172) Nicolò Lucchesi 2025-04-10 23:05:44 +02:00
7cd0bd7212 [Bugfix] Fix output token length check logic (#16419) look 2025-04-11 04:16:48 +08:00
56d4aefa33 [VLM] Avoid unnecessary dummy multimodal data during processing (#16416) Cyrus Leung 2025-04-11 03:32:14 +08:00
dd143ef541 [V1] Zero-copy tensor/ndarray serialization/transmission (#13790) Nick Hill 2025-04-10 12:23:14 -07:00
daefed052c [Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423) Chih-Chieh Yang 2025-04-10 15:07:07 -04:00
5fbab20e02 [Bugfix] Fix bug when dataset is json (#15899) Chenyaaang 2025-04-10 11:35:41 -07:00
e8224f3dca [V1][Spec Decode] Eagle Model loading (#16035) Lily Liu 2025-04-10 11:21:48 -07:00
9665313c39 [V1] Set structured output backend to auto by default (#15724) Russell Bryant 2025-04-10 13:53:26 -04:00
0c54fc7273 Improve configs - ParallelConfig (#16332) Harry Mellor 2025-04-10 18:34:37 +01:00
c1b57855ec [TPU][V1] Use language_model interface for getting text backbone in MM (#16410) Nicolò Lucchesi 2025-04-10 19:32:04 +02:00
83b824c8b4 [VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item (#16408) Cyrus Leung 2025-04-11 00:06:58 +08:00
7678fcd5b6 Fix the torch version parsing logic (#15857) Lu Fang 2025-04-10 07:37:47 -07:00
8661c0241d [CI] Add auto update workflow for Dockerfile graph (#11879) wineandchord 2025-04-10 21:43:05 +08:00
ce8d6b75fc [doc] update the wrong link (#16401) Reid 2025-04-10 21:02:37 +08:00
61de3ef74b [Model] Remove image mm limit for LLaMa4 (#16365) Ye (Charlotte) Qi 2025-04-10 02:36:27 -07:00
ec1f9c8c91 Update Numba to 0.61.2 (#16376) cyyever 2025-04-10 15:59:37 +08:00
65e09094c4 [doc] add download model tips (#16389) Reid 2025-04-10 15:45:26 +08:00
c70cf0fe06 [Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038) Michael Goin 2025-04-10 01:08:47 -06:00
a5d11a54dc [Bugfix] Fix validation error for text-only Mllama 3.2 (#16377) Cyrus Leung 2025-04-10 14:19:42 +08:00
3d4c87758e [Misc] Update transformers version limits of multi-modal tests (#16381) Cyrus Leung 2025-04-10 14:03:33 +08:00
a9bd832fc5 [Model] use AutoWeightsLoader for deepseek_v2, internlm2 (#16383) Aaron Ang 2025-04-10 02:01:00 -04:00
417bcefbae fix sonnet dataset sample when prefix len is very small (#16379) Chenyaaang 2025-04-09 22:35:07 -07:00
baada0e737 [Bugfix][TPU] Fix TPU validate_request (#16369) Michael Goin 2025-04-09 22:55:12 -06:00
82eb61dd4c [misc] use tqdm.auto where appropriate (#16290) Benjamin Kitor 2025-04-09 21:54:54 -07:00
0d4d06fe2f [CI][Bugfix] Pin triton version for CPU (#16384) Roger Wang 2025-04-09 21:35:00 -07:00
4aed0ca6a2 [bugfix] Avoid the time consumption caused by creating dummy videos. (#16371) Jintao 2025-04-10 12:30:05 +08:00
1621b25288 [TPU] Fix dummy loading OOM (#16372) Chengji Yao 2025-04-09 21:06:16 -07:00
a564797151 [Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (#16325) Aaron Ang 2025-04-09 23:07:40 -04:00
1da6a09274 [Bugfix]: do not shutdown server if skip_special_use=False for MistralTokenizer (#14094) Guillaume Calmettes 2025-04-10 04:43:09 +02:00
1e44ffc3ff Add GLM-4-0414 support (#16338) Yuxuan Zhang 2025-04-10 09:19:42 +08:00
a454748544 [TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275) Chengji Yao 2025-04-09 17:51:51 -07:00
1bff42c4b7 [Misc] refactor Structured Outputs example (#16322) Reid 2025-04-10 07:32:42 +08:00
cb391d85dc [Hardware] add platform-specific request validation api (#16291) Joe Runde 2025-04-09 21:50:01 +02:00
fee5b8d37f [Build/CI] Add tracing deps to vllm container image (#15224) Russell Bryant 2025-04-09 15:14:06 -04:00
b2ce859bd2 Fix benchmark_throughput.py --backend=hf (#16352) Michael Goin 2025-04-09 13:09:28 -06:00
566f10a929 [CI]Fix hpu docker and numpy version for CI (#16355) Chendi.Xue 2025-04-09 12:52:26 -05:00
c3b5189137 [Bugfix] catch AssertionError in MistralTokenizer as ValueError (#16344) Guillaume Calmettes 2025-04-09 19:33:24 +02:00
a25866ac8d [Bugfix] Fix profiling.py (#16202) zh Wang 2025-04-10 01:03:34 +08:00
098900d7c2 Revert "Update label-tpu mergify and remove removal bot" (#16350) Michael Goin 2025-04-09 08:59:36 -06:00
98d01d3ce2 [Bugfix][Frontend] respect provided default guided decoding backend (#15476) Guillaume Calmettes 2025-04-09 14:11:10 +02:00
d55244df31 [Model] Add SupportsMultiModal.get_language_model interface (#16007) Nicolò Lucchesi 2025-04-09 13:12:54 +02:00
04149cce27 [BugFix] fix some typos found by typos. (#16314) yihong 2025-04-09 18:43:59 +08:00
24834f4894 update neuron config (#16289) ajayvohra2005 2025-04-09 06:43:22 -04:00
ec7da6fcf3 [BugFix] llama4 qknorm should be not shared across head (#16311) Lucia Fang 2025-04-09 00:59:14 -07:00
819d548e8a [BugFix] logger is not callable (#16312) yihong 2025-04-09 15:59:02 +08:00
477d2a8aa2 Update label-tpu mergify and remove removal bot (#16298) Michael Goin 2025-04-09 01:56:25 -06:00
e484e02857 [Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (#16273) Cyrus Leung 2025-04-09 15:51:27 +08:00
24f6b9a713 [Misc] Fix test_sharded_state_loader.py(#16004) (#16005) Accelerator1996 2025-04-09 14:47:30 +08:00
9cdde47289 [BugFix] Fix fusion test and add them to CI (#16287) Luka Govedič 2025-04-09 02:46:45 -04:00
b1eb4ca152 [TPU] Update PyTorch/XLA (#16288) Chengji Yao 2025-04-08 23:46:32 -07:00
87b4ac56c2 [CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221) Michael Goin 2025-04-08 22:14:46 -06:00
cb84e45ac7 [Core] Upgrade to xgrammar 0.1.18, add cache size limit (#16283) Russell Bryant 2025-04-08 22:13:22 -04:00
4716377fbc [Feature] Estimate max-model-len use available KV cache memory (#16168) rongfu.leng 2025-04-09 10:12:51 +08:00
4e9cf8c1dd [Bugfix] fix gettid method is not define (#16084) rongfu.leng 2025-04-09 10:12:44 +08:00
2976dc27e9 [Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (#16198) TJian 2025-04-09 10:12:34 +08:00
102bf967f0 [Model] Add smolvlm support (#16017) Chauncey 2025-04-09 10:12:17 +08:00
1f4b09b525 Add support to modelopt quantization of Mixtral model (#15961) yueshen2016 2025-04-08 18:53:31 -07:00
86c3369eb8 [CI/Build] Fix CI LoRA failure (#16270) Jee Jee Li 2025-04-09 09:13:56 +08:00
2755c34a8f [V1] Update structured output offline inference example (#15721) Russell Bryant 2025-04-08 18:34:09 -04:00
db10422184 [Bugfix] fix deepseek fp16 scale bug (#14809) Jinzhen Lin 2025-04-09 04:56:09 +08:00
e1a2c699dd [BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (#16209) Lucas Wilkinson 2025-04-08 14:56:51 -04:00
0115ccd5c0 Add warning that content below line in template will be removed (#16276) Harry Mellor 2025-04-08 19:18:40 +01:00
40b4284fe3 [Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear (#15328) Isotr0py 2025-04-09 01:02:23 +08:00
4ebc0b9640 [Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156) Cyrus Leung 2025-04-09 00:45:21 +08:00
dc96fd54c6 [Misc] Avoid stripping meaningful whitespace from nvidia-smi topo -m output in collect_env.py (#16272) Kero Liang 2025-04-09 00:08:09 +08:00
1f5d13ab9f [New Model]: jinaai/jina-embeddings-v3 (#16120) wang.yuqi 2025-04-08 23:39:12 +08:00
90cb44eb02 Update to transformers==4.51.1 (#16257) Harry Mellor 2025-04-08 14:53:39 +01:00
e11880deea [Bugfix] Remove triton do_bench fast_flush arg (#16256) Kebe 2025-04-08 21:51:06 +08:00
9351f91be9 [BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247) TY-AMD 2025-04-08 20:10:26 +08:00
5a1e1c8353 [Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (#16203) rongfu.leng 2025-04-08 19:05:47 +08:00
69ecaa7c79 [Misc] Add warning for multimodal data in LLM.beam_search (#16241) Alex Brooks 2025-04-08 05:05:27 -06:00
7f00899ff7 [Misc] format and refactor some examples (#16252) Reid 2025-04-08 18:42:32 +08:00
995e3d1f41 [Docs] Add Slides from Singapore Meetup (#16213) Simon Mo 2025-04-08 00:20:22 -07:00
