biondizzle/vllm - vllm

Author	SHA1	Message	Date
Keyun Tong	8db1b9d0a1	Support SSL Key Rotation in HTTP Server (#13495 )	2025-02-22 05:17:44 -08:00
youkaichao	2382ad29d1	[ci] fix linter (#13701 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 20:28:59 +08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696 )	2025-02-22 00:31:26 -08:00
Jee Jee Li	105b8ce4c0	[Misc] Reduce LoRA-related static variable (#13166 )	2025-02-22 00:21:30 -08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295 )	2025-02-22 00:20:45 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299 )	2025-02-22 00:20:00 -08:00
Yu Chin Fabian Lim	fca20841c2	Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size (#13660 )	2025-02-22 00:19:10 -08:00
Jennifer Zhao	da31b5333e	[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler (#13594 ) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-02-22 00:08:29 -08:00
Lu Fang	bb78fb318e	[v1] Support allowed_token_ids in v1 Sampler (#13210 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-22 14:13:05 +08:00
Robin	8aca27fa11	[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-02-22 14:10:38 +08:00
Dipika Sikka	95c617e04b	[Misc] Bump compressed-tensors (#13619 )	2025-02-21 22:09:04 -08:00
Shane A	9a1f1da5d1	[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (#13687 )	2025-02-21 22:07:45 -08:00
Gordon Wong	68d630a0c7	[ROCM] fix native attention function call (#13650 )	2025-02-21 22:07:04 -08:00
Jun Duan	68d535ef44	[Misc] Capture and log the time of loading weights (#13666 )	2025-02-21 22:06:34 -08:00
Robin	c6ed93860f	[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… (#13672 )	2025-02-21 22:05:28 -08:00
Keyun Tong	0ffdf8ce0c	[HTTP Server] Make model param optional in request (#13568 )	2025-02-21 21:55:50 -08:00
Yuan Tang	8c0dd3d4df	docs: Add a note on full CI run in contributing guide (#13646 )	2025-02-21 21:53:59 -08:00
Isotr0py	ada7c780d5	[Misc] Fix yapf linting tools etc not running on pre-commit (#13695 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-02-22 13:10:43 +08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00
John Zheng	900edbfa48	fix typo of grafana dashboard, with correct datasource (#13668 ) Signed-off-by: John Zheng <john.zheng@hp.com>	2025-02-21 18:21:05 +00:00
Isotr0py	b2c3fc5d65	[Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation (#13586 )	2025-02-20 22:24:17 -08:00
leoneo	839b27c6cc	[Kernel]Add streamK for block-quantized CUTLASS kernels (#12978 )	2025-02-20 22:14:24 -08:00
Kevin H. Luu	34ad27fe83	[ci] Fix metrics test model path (#13635 )	2025-02-20 22:12:10 -08:00
Gabriel Marinho	1c3c975766	[FEATURE] Enables /score endpoint for embedding models (#12846 )	2025-02-20 22:09:47 -08:00
Szymon Ożóg	1cdc88614a	Missing comment explaining VDR variable in GGUF kernels (#13290 )	2025-02-20 22:06:54 -08:00
Nick Hill	31aa045c11	[V1][Sampler] Avoid an operation during temperature application (#13587 )	2025-02-20 22:05:56 -08:00
Roger Wang	a30c093502	[Bugfix] Add `mm_processor_kwargs` to chat-related protocols (#13644 )	2025-02-20 22:04:33 -08:00
Harry Mellor	c7b07a95a6	Use pre-commit to update `requirements-test.txt` (#13617 )	2025-02-20 22:03:27 -08:00
Kaixi Hou	27a09dc52c	[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632 )	2025-02-20 22:01:48 -08:00
Edwin Hernandez	981f3c831e	[Misc] Adding script to setup ray for multi-node vllm deployments (#12913 )	2025-02-20 21:16:40 -08:00
Kante Yin	44c33f01f3	Add llmaz as another integration (#13643 ) Signed-off-by: kerthcet <kerthcet@gmail.com>	2025-02-21 03:52:40 +00:00
Lingfan Yu	33170081f1	[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth (#13245 ) Signed-off-by: Lingfan Yu <lingfany@amazon.com>	2025-02-20 17:45:45 -08:00
Michael Goin	71face8540	[Bugfix] Fix max_num_batched_tokens for MLA (#13620 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-20 17:45:20 -08:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
ajayvohra2005	6a417b8600	fix neuron performance issue (#13589 )	2025-02-20 10:59:36 -08:00
Woosuk Kwon	d3ea50113c	[V1][Minor] Print KV cache size in token counts (#13596 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-02-20 09:24:31 -08:00
Harry Mellor	34aad515c8	Update `pre-commit`'s `isort` version to remove warnings (#13614 )	2025-02-20 08:00:14 -08:00
chenxiaobing	ed6e9075d3	[Bugfix] Fix deepseekv3 grouped topk error (#13474 ) Signed-off-by: Chen-XiaoBing <chenxb002@whu.edu.cn> v0.7.3	2025-02-20 06:47:01 -08:00
Harry Mellor	992e5c3d34	Merge similar examples in `offline_inference` into single `basic` example (#12737 )	2025-02-20 04:53:51 -08:00
Varun Sundar Rabindranath	b69692a2d8	[Kernel] LoRA - Refactor sgmv kernels (#13110 )	2025-02-20 07:28:06 -05:00
Kevin H. Luu	a64a84433d	[2/n][ci] S3: Use full model path (#13564 ) Signed-off-by: <>	2025-02-20 01:20:15 -08:00
Kevin H. Luu	aa1e62d0db	[ci] Fix spec decode test (#13600 )	2025-02-20 16:56:00 +08:00
Michael Goin	497bc83124	[CI/Build] Use uv in the Dockerfile (#13566 )	2025-02-19 23:05:44 -08:00
Yuan Tang	3738e6fa80	[API Server] Add port number range validation (#13506 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-20 15:05:13 +08:00
Gregory Shtrasberg	0023cd2b9d	[ROCm] MI300A compile targets deprecation (#13560 )	2025-02-19 23:05:00 -08:00
燃	041e294716	[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533 )	2025-02-19 23:04:30 -08:00
Alex Brooks	9621667874	[Misc] Warn if the vLLM version can't be retrieved (#13501 ) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>	2025-02-20 06:24:48 +00:00
Simon Mo	8c755c3b6d	[bugfix] spec decode worker get tp group only when initialized (#13578 )	2025-02-20 04:46:28 +00:00
youkaichao	ba81163997	[core] add sleep and wake up endpoint and v1 support (#12987 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: cennn <2523403608@qq.com> Co-authored-by: cennn <2523403608@qq.com>	2025-02-20 12:41:17 +08:00


4746 Commits