Cyrus Leung
|
6879cd80ae
|
[Refactor] Pass tokenizer explicitly instead of binding to prompt update (#23542)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-25 06:31:57 -07:00 |
|
Cyrus Leung
|
e269be2ba2
|
[Doc] Add caution for API server scale-out (#23550)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-25 06:14:15 -07:00 |
|
Ayush Satyam
|
5c4b6e66fe
|
[Attention] Unify mamba and attention backend selection (#23171)
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>
|
2025-08-25 09:09:36 +00:00 |
|
youkaichao
|
d0a4a3f645
|
[misc] add shanghai meetup (#23535)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-08-25 17:00:03 +08:00 |
|
Cyrus Leung
|
ebafb0936d
|
[Bugfix] Allow dynamic number of patches for llava_onevision (#23525)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-25 08:34:54 +00:00 |
|
Breno Baldas Skuk
|
0cb7b065c3
|
Feature/benchmark/random mm data/images (#23119)
Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>
|
2025-08-25 01:28:35 -07:00 |
|
ZiTian Zhao
|
2da02dd0d8
|
[Fix] DeepSeek V3.1 tool parser error message (#23492)
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
|
2025-08-25 00:56:39 -07:00 |
|
Chenguang Zheng
|
d765cf01fe
|
[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-08-25 00:41:17 -07:00 |
|
Cyrus Leung
|
712d0f88d8
|
[Refactor] Dynamic target and content for prompt updates (#23411)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-24 23:39:58 -07:00 |
|
Yu Guo
|
49ab23b3cc
|
[gpt-oss] use reasoning channel for reasoning text in serving_chat (#22920)
Signed-off-by: Yu Guo <yuguo@meta.com>
|
2025-08-25 06:29:34 +00:00 |
|
LIYIFAN_liyifan
|
c9abb10489
|
[Bugfix] Fix Dense module loading for sentence-transformers embedding models (simplified V2) (#23408)
Signed-off-by: FFFfff1FFFfff <yifanli0919@gmail.com>
|
2025-08-25 05:39:24 +00:00 |
|
Benji Beck
|
787cdb3829
|
Migrate DonutImagePixelInputs to TensorSchema (#23509)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-25 05:02:15 +00:00 |
|
Benji Beck
|
a5203d04df
|
Migrate skyworkr1v inputs to TensorSchema (#23499)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-25 04:43:21 +00:00 |
|
Benji Beck
|
99f8094400
|
Migrate tarsier inputs to TensorSchema (#23500)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-25 04:42:36 +00:00 |
|
Jee Jee Li
|
170e8ea9ea
|
[Misc] Unified linear print info (#23516)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-24 20:13:51 -07:00 |
|
zifeitong
|
a71e4765cc
|
[Bugfix] Fix Qwen2.5-VL quantized model weights loading (#23512)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
|
2025-08-25 10:40:22 +08:00 |
|
Noam Gat
|
39971db3aa
|
Frontend: Adding LM Format Enforcer support to V1 engine (#22564)
Signed-off-by: Noam Gat <noamgat@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-24 19:31:22 -07:00 |
|
Ming Yang
|
504d914314
|
[Perf] Add Triton config for DeepSeek V3 FP8 EP32 H200 (#23504)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-08-24 18:06:35 -07:00 |
|
Didier Durand
|
47455c424f
|
[Doc]: fix various typos in multiple files (#23487)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-25 00:04:04 +00:00 |
|
Lucia Fang
|
c7fc6b1354
|
fix incompatibility with non cuda platform for nvfp4 (#23478)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-08-24 15:35:41 -07:00 |
|
Woosuk Kwon
|
ad78868450
|
[Misc] Remove unused slot_mapping buffer (#23502)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-24 14:03:36 -07:00 |
|
Cyrus Leung
|
e2db1164a1
|
[Model] Enable BLOOM on V1 (#23488)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-24 13:30:47 +00:00 |
|
汪志鹏
|
416f05929a
|
[New Model]Donut model (#23229)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-08-24 12:52:24 +00:00 |
|
TeeKen Lau
|
5e021b4981
|
(Misc): add missing test for zero truncation size. (#23457)
Signed-off-by: teekenl <teekenlau@gmail.com>
|
2025-08-24 18:12:47 +08:00 |
|
rongfu.leng
|
1b9b16649c
|
[Misc] update dict parse to EPLBConfig from json dumps to dict unpacking (#23305)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-24 08:06:34 +00:00 |
|
czhu-cohere
|
e76e233540
|
[kernel] Support W4A8 on Hopper (#23198)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-08-24 06:18:04 +00:00 |
|
Benji Beck
|
a75277285b
|
Migrate Paligemma inputs to TensorSchema (#23470)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-24 04:56:56 +00:00 |
|
22quinn
|
9dc30b7068
|
[Bugfix] Add strong reference to CUDA pluggable allocator callbacks (#23477)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Eric Marcus <eric.marcus@kaiko.ai>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-08-24 12:56:17 +08:00 |
|
Benji Beck
|
053278a5dc
|
Migrate Pixtral inputs to TensorSchema (#23472)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-24 04:55:53 +00:00 |
|
Jiangyun Zhu
|
c55c028998
|
[gpt-oss] Streaming Output for Python Tool (#23409)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-08-24 04:42:38 +00:00 |
|
Jee Jee Li
|
65197a5fb3
|
[Misc] Modify CacheConfig import (#23459)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-23 06:05:27 +00:00 |
|
Xu Wenqing
|
b8f17f5d98
|
Support DeepSeek-V3.1 tool call (#23454)
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
|
2025-08-23 05:50:16 +00:00 |
|
Aziz
|
d9a55204ba
|
fix(tests): Correct unreachable assertion in truncation test (#23425)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
|
2025-08-23 05:23:54 +00:00 |
|
Cyrus Leung
|
b4e9fd811f
|
Revert "[PERF] Use faster way of decode in tokenizer: avoid useless list-to-list conversion (#20000)" (#23396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-23 04:16:48 +00:00 |
|
Chenxi Yang
|
308fa287a8
|
Add glm4.5v tp2,4 fp8 config on H100_80GB (#23443)
Co-authored-by: Chenxi Yang <cxyang@meta.com>
|
2025-08-23 02:54:19 +00:00 |
|
Daifeng Li
|
fa78de9dc3
|
Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527)
Signed-off-by: feng <fengli1702@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-22 20:53:21 -06:00 |
|
Michael Goin
|
f6818a92cb
|
[UX] Move Dockerfile DeepGEMM install to tools/install_deepgemm.sh (#23360)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-22 20:52:50 -06:00 |
|
WeiQing Chen
|
23c939fd30
|
[Model] Support DP for ViT on MiniCPM-V-4 (#23327)
Signed-off-by: ycyaw66 <497410282@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
|
2025-08-23 02:14:41 +00:00 |
|
Nick Hill
|
add1adfec7
|
[BugFix] Fix MinPLogitsProcessor.update_states() (#23401)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-23 08:22:11 +08:00 |
|
Nick Hill
|
c80c53a30f
|
[BugFix] Fix batch updates for pooling models (#23398)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-23 08:20:41 +08:00 |
|
elvischenv
|
24d0c9e6ed
|
[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-22 22:09:05 +00:00 |
|
rasmith
|
cc7ae5e7ca
|
[BugFix][AMD][Quantization] Fix torch.compile issue where wvSplitKQ not being called when it should when using quantized FP8 model (#22281)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-08-22 21:47:57 +00:00 |
|
Ilya Markov
|
0313cf854d
|
[PERF] PyTorch Symmetric Memory All-Reduce (#20759)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-22 15:39:08 -06:00 |
|
Zhewen Li
|
0483fabc74
|
[CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-22 13:34:40 -07:00 |
|
Shiyan Deng
|
da65bec309
|
add an env var for path to pre-downloaded flashinfer cubin files (#22675)
|
2025-08-22 19:25:45 +00:00 |
|
Isotr0py
|
4645024d3a
|
[Quantization] Allow GGUF quantization to skip unquantized layer (#23188)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-22 13:04:22 -06:00 |
|
Isotr0py
|
cd7a3df26f
|
[Bugfix] Fix broken Florence-2 model (#23426)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-08-22 17:50:52 +00:00 |
|
Isotr0py
|
32d2b4064f
|
[Model] Add Ovis2.5 PP support (#23405)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-22 17:46:34 +00:00 |
|
Didier Durand
|
22cf679aad
|
[Doc]: fix various typos in multiple files (#23179)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-22 10:38:46 -07:00 |
|
Yong Hoon Shin
|
b6d7d34fc6
|
Add unit tests for batched guided and non-guided requests (#23389)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-22 10:31:24 -07:00 |
|