Commit Graph

7056 Commits

Author SHA1 Message Date
Cyrus Leung
43faa0461a [Bugfix] Fix hybrid model tests (#17182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-25 15:14:37 -07:00
Daniel Li
48cb2109b6 [V1] Move usage stats to worker and start logging TPU hardware (#16211) 2025-04-25 14:06:01 -06:00
Russell Bryant
a5450f11c9 [Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-04-25 16:53:23 +00:00
Cyrus Leung
9d98ab5ec6 [Misc] Inline Molmo requirements (#17190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-25 16:41:44 +00:00
Reid
df5c879527 [doc] update wrong hf model links (#17184)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-25 16:40:54 +00:00
Harry Mellor
423e9f1cbe Use Transformers helper get_text_config() instead of checking for text_config (#17105)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-25 08:47:35 -07:00
Harry Mellor
0bd7f8fca5 Bump Transformers to 4.51.3 (#17116)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-25 08:34:34 -07:00
Jasmond L
d5615af9ae [Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769)
Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-25 07:26:30 -07:00
Cyrus Leung
19dcc02a72 [Bugfix] Fix mistral model tests (#17181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-25 06:03:34 -07:00
Alex Brooks
7feae92c1f [Doc] Move todo out of beam search docstring (#17183)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-04-25 04:44:58 -07:00
Michael Yao
f851b84266 [Doc] Add two links to disagg_prefill.md (#17168)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 10:23:57 +00:00
Lu Fang
fc966e9cc6 Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158) 2025-04-25 17:10:32 +08:00
Michael Yao
ef19e67d2c [Doc] Add headings to improve gptqmodel.md (#17164)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-25 01:13:13 -07:00
rasmith
a41351f363 [Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-04-25 00:45:02 -07:00
Sangyeon Cho
6aae216b4e [Bugfix] remove fallback in guided_json (int range, patterns) (#16725)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>
2025-04-25 06:54:43 +00:00
yexin(叶鑫)
b22980a1dc [Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
Signed-off-by: cynthieye <yexin93@qq.com>
Co-authored-by: MagnetoWang <magnetowang@outlook.com>
2025-04-25 14:52:28 +08:00
Lucas Wilkinson
881f735827 [Misc] Benchmark Serving Script Support Appending Results (#17028)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-24 22:53:55 -07:00
Mengqing Cao
2f54045508 [Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-04-24 22:51:02 -07:00
Lifu Huang
5aa6efb9a5 [Misc] Clean up redundant code in uniproc_executor.py (#16762)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-04-24 22:49:30 -07:00
Harry Mellor
6ca0234478 Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 22:48:53 -07:00
Michael Goin
649818995f [Docs] Fix True->true in supported_models.md (#17141) 2025-04-25 04:20:04 +00:00
Varun Sundar Rabindranath
7a0a9da72b [Doc] V1 : Update LoRA status (#17133)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
2025-04-24 20:17:22 -07:00
Zaida Zhou
69bff9bc89 fix float16 support for kimi-vl (#17156)
Co-authored-by: zhouzaida <zhouzaida@msh.team>
2025-04-24 20:16:32 -07:00
Lucas Wilkinson
41ca7eb491 [Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-24 20:12:21 -07:00
vllmellm
eef364723c [FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-04-25 11:06:50 +08:00
jglaser
0d6e187e88 Use custom address for listening socket (#15988)
Signed-off-by: Jens Glaser <glaserj@ornl.gov>
2025-04-25 01:57:16 +00:00
Michael Goin
9420a1fc30 Better error message for missing mistral params.json (#17132)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-24 23:43:08 +00:00
Rui Qiao
583e900996 [Misc] Add example to run DeepSeek with Ray Serve LLM (#17134)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-04-24 22:25:21 +00:00
Maximilien de Bayser
05e1fbfc52 Add chat template for Llama 4 models (#16428)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-04-24 20:19:36 +00:00
Yinghai Lu
fe92176321 Add collective_rpc to llm engine (#16999)
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
2025-04-24 20:16:52 +00:00
Russell Bryant
6d0df0ebeb [Docs] Generate correct github links for decorated functions (#17125)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-24 10:39:43 -07:00
Harry Mellor
0fa939e2d1 Improve configs - LoRAConfig + PromptAdapterConfig (#16980)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 10:29:34 -07:00
Harry Mellor
0422ce109f Add :markdownhelp: to EngineArgs docs so markdown docstrings render properly (#17124)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 10:28:45 -07:00
Eyshika Agarwal
47bdee409c Molmo Requirements (#17026)
Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com>
Signed-off-by: eyshika <eyshikaengineer@gmail.com>
2025-04-24 10:08:37 -07:00
Atilla
49f189439d existing torch installation pip command fix for docs (#17059) 2025-04-24 10:07:21 -07:00
Aaruni Aggarwal
5adf6f6b7f Updating builkite job for IBM Power (#17111)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
2025-04-24 10:06:17 -07:00
Russell Bryant
4115f19958 [CI] Add automation for the tool-calling github label (#17118)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-24 09:22:00 -07:00
Mark McLoughlin
340d7b1b21 [V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-24 08:57:40 -07:00
Reid
1bcbcbf574 [Misc] refactor example series - structured outputs (#17040)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-24 07:49:48 -07:00
Michael Goin
82e43b2d7e Add missing rocm_skinny_gemms kernel test to CI (#17060)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-24 07:49:37 -07:00
wang.yuqi
67309a1cb5 [Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970) 2025-04-24 07:06:28 -07:00
Shanshan Shen
b724afe343 [V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-24 06:15:03 -07:00
Harry Mellor
21f4f1c9a4 Improve static type checking in LoRAModelRunnerMixin (#17104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 06:14:47 -07:00
Isotr0py
b0c1f6202d [Misc] Remove OLMo2 config copy (#17066)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-24 06:14:32 -07:00
Rui Qiao
c0dfd97519 [V1][PP] Optimization: continue scheduling prefill chunks (#17080)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-04-24 05:27:08 -07:00
Harry Mellor
a9138e85b1 Fix OOT registration test (#17099)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 04:44:12 -07:00
Harry Mellor
0a05ed57e6 Simplify TokenizerGroup (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-24 04:43:56 -07:00
Michael Goin
14288d1332 Disable enforce_eager for V1 TPU sampler and structured output tests (#17016)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-24 02:50:09 -07:00
Woosuk Kwon
b411418ff0 [Chore] Remove Sampler from Model Code (#17084)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-24 02:49:33 -07:00
omer-dayan
2bc0f72ae5 Add docs for runai_streamer_sharded (#17093)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-24 01:03:21 -07:00