Cyrus Leung
|
43faa0461a
|
[Bugfix] Fix hybrid model tests (#17182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-25 15:14:37 -07:00 |
|
Daniel Li
|
48cb2109b6
|
[V1] Move usage stats to worker and start logging TPU hardware (#16211)
|
2025-04-25 14:06:01 -06:00 |
|
Russell Bryant
|
a5450f11c9
|
[Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-25 16:53:23 +00:00 |
|
Cyrus Leung
|
9d98ab5ec6
|
[Misc] Inline Molmo requirements (#17190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-25 16:41:44 +00:00 |
|
Reid
|
df5c879527
|
[doc] update wrong hf model links (#17184)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-25 16:40:54 +00:00 |
|
Harry Mellor
|
423e9f1cbe
|
Use Transformers helper get_text_config() instead of checking for text_config (#17105)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-25 08:47:35 -07:00 |
|
Harry Mellor
|
0bd7f8fca5
|
Bump Transformers to 4.51.3 (#17116)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-25 08:34:34 -07:00 |
|
Jasmond L
|
d5615af9ae
|
[Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769)
Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-25 07:26:30 -07:00 |
|
Cyrus Leung
|
19dcc02a72
|
[Bugfix] Fix mistral model tests (#17181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-25 06:03:34 -07:00 |
|
Alex Brooks
|
7feae92c1f
|
[Doc] Move todo out of beam search docstring (#17183)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-04-25 04:44:58 -07:00 |
|
Michael Yao
|
f851b84266
|
[Doc] Add two links to disagg_prefill.md (#17168)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-25 10:23:57 +00:00 |
|
Lu Fang
|
fc966e9cc6
|
Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158)
|
2025-04-25 17:10:32 +08:00 |
|
Michael Yao
|
ef19e67d2c
|
[Doc] Add headings to improve gptqmodel.md (#17164)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-25 01:13:13 -07:00 |
|
rasmith
|
a41351f363
|
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-04-25 00:45:02 -07:00 |
|
Sangyeon Cho
|
6aae216b4e
|
[Bugfix] remove fallback in guided_json (int range, patterns) (#16725)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>
|
2025-04-25 06:54:43 +00:00 |
|
yexin(叶鑫)
|
b22980a1dc
|
[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
Signed-off-by: cynthieye <yexin93@qq.com>
Co-authored-by: MagnetoWang <magnetowang@outlook.com>
|
2025-04-25 14:52:28 +08:00 |
|
Lucas Wilkinson
|
881f735827
|
[Misc] Benchmark Serving Script Support Appending Results (#17028)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-24 22:53:55 -07:00 |
|
Mengqing Cao
|
2f54045508
|
[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-04-24 22:51:02 -07:00 |
|
Lifu Huang
|
5aa6efb9a5
|
[Misc] Clean up redundant code in uniproc_executor.py (#16762)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
|
2025-04-24 22:49:30 -07:00 |
|
Harry Mellor
|
6ca0234478
|
Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 22:48:53 -07:00 |
|
Michael Goin
|
649818995f
|
[Docs] Fix True->true in supported_models.md (#17141)
|
2025-04-25 04:20:04 +00:00 |
|
Varun Sundar Rabindranath
|
7a0a9da72b
|
[Doc] V1 : Update LoRA status (#17133)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
|
2025-04-24 20:17:22 -07:00 |
|
Zaida Zhou
|
69bff9bc89
|
fix float16 support for kimi-vl (#17156)
Co-authored-by: zhouzaida <zhouzaida@msh.team>
|
2025-04-24 20:16:32 -07:00 |
|
Lucas Wilkinson
|
41ca7eb491
|
[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-24 20:12:21 -07:00 |
|
vllmellm
|
eef364723c
|
[FEAT] [ROCm]: AITER Fused MOE V1 Support (#16752)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-04-25 11:06:50 +08:00 |
|
jglaser
|
0d6e187e88
|
Use custom address for listening socket (#15988)
Signed-off-by: Jens Glaser <glaserj@ornl.gov>
|
2025-04-25 01:57:16 +00:00 |
|
Michael Goin
|
9420a1fc30
|
Better error message for missing mistral params.json (#17132)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-24 23:43:08 +00:00 |
|
Rui Qiao
|
583e900996
|
[Misc] Add example to run DeepSeek with Ray Serve LLM (#17134)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-04-24 22:25:21 +00:00 |
|
Maximilien de Bayser
|
05e1fbfc52
|
Add chat template for Llama 4 models (#16428)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-04-24 20:19:36 +00:00 |
|
Yinghai Lu
|
fe92176321
|
Add collective_rpc to llm engine (#16999)
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
|
2025-04-24 20:16:52 +00:00 |
|
Russell Bryant
|
6d0df0ebeb
|
[Docs] Generate correct github links for decorated functions (#17125)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-24 10:39:43 -07:00 |
|
Harry Mellor
|
0fa939e2d1
|
Improve configs - LoRAConfig + PromptAdapterConfig (#16980)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 10:29:34 -07:00 |
|
Harry Mellor
|
0422ce109f
|
Add :markdownhelp: to EngineArgs docs so markdown docstrings render properly (#17124)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 10:28:45 -07:00 |
|
Eyshika Agarwal
|
47bdee409c
|
Molmo Requirements (#17026)
Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com>
Signed-off-by: eyshika <eyshikaengineer@gmail.com>
|
2025-04-24 10:08:37 -07:00 |
|
Atilla
|
49f189439d
|
existing torch installation pip command fix for docs (#17059)
|
2025-04-24 10:07:21 -07:00 |
|
Aaruni Aggarwal
|
5adf6f6b7f
|
Updating builkite job for IBM Power (#17111)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
|
2025-04-24 10:06:17 -07:00 |
|
Russell Bryant
|
4115f19958
|
[CI] Add automation for the tool-calling github label (#17118)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-24 09:22:00 -07:00 |
|
Mark McLoughlin
|
340d7b1b21
|
[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-24 08:57:40 -07:00 |
|
Reid
|
1bcbcbf574
|
[Misc] refactor example series - structured outputs (#17040)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-24 07:49:48 -07:00 |
|
Michael Goin
|
82e43b2d7e
|
Add missing rocm_skinny_gemms kernel test to CI (#17060)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-24 07:49:37 -07:00 |
|
wang.yuqi
|
67309a1cb5
|
[Frontend] Using matryoshka_dimensions control the allowed output dimensions. (#16970)
|
2025-04-24 07:06:28 -07:00 |
|
Shanshan Shen
|
b724afe343
|
[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning (#16954)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-24 06:15:03 -07:00 |
|
Harry Mellor
|
21f4f1c9a4
|
Improve static type checking in LoRAModelRunnerMixin (#17104)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 06:14:47 -07:00 |
|
Isotr0py
|
b0c1f6202d
|
[Misc] Remove OLMo2 config copy (#17066)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-24 06:14:32 -07:00 |
|
Rui Qiao
|
c0dfd97519
|
[V1][PP] Optimization: continue scheduling prefill chunks (#17080)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-04-24 05:27:08 -07:00 |
|
Harry Mellor
|
a9138e85b1
|
Fix OOT registration test (#17099)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 04:44:12 -07:00 |
|
Harry Mellor
|
0a05ed57e6
|
Simplify TokenizerGroup (#16790)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 04:43:56 -07:00 |
|
Michael Goin
|
14288d1332
|
Disable enforce_eager for V1 TPU sampler and structured output tests (#17016)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-24 02:50:09 -07:00 |
|
Woosuk Kwon
|
b411418ff0
|
[Chore] Remove Sampler from Model Code (#17084)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-24 02:49:33 -07:00 |
|
omer-dayan
|
2bc0f72ae5
|
Add docs for runai_streamer_sharded (#17093)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-24 01:03:21 -07:00 |
|