rongfu.leng
|
242a637aea
|
[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-06 05:52:01 -07:00 |
|
Isotr0py
|
c2a9671510
|
[Misc] Improve model redirect to accept json dictionary (#16119)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-06 05:51:45 -07:00 |
|
Paul Schweigert
|
d5ae4f7f42
|
[Doc][Bugfix] Add missing EOF in k8s deploy doc (#16025)
|
2025-04-06 12:10:57 +00:00 |
|
Reid
|
b6c502a150
|
[Misc] refactor example eagle (#16100)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-06 09:42:48 +00:00 |
|
Roger Wang
|
9ca710e525
|
[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar (#16117)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-04-06 16:18:00 +08:00 |
|
Ben Jackson
|
eb07c8cb5b
|
[Frontend] Fix typo in tool chat templates for llama3.2 and toolace (#14501)
Signed-off-by: Ben Jackson <ben@ben.com>
|
2025-04-06 07:44:36 +00:00 |
|
Hyesoo Yang
|
ba10801961
|
[Benchmark] Add sampling parameters to benchmark_serving. (#16022)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-04-06 12:30:35 +08:00 |
|
Lucia Fang
|
620fc2d09e
|
[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (#16112)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-04-05 21:23:40 -07:00 |
|
Jonghyun Choe
|
29283eaa7e
|
[Model] use AutoWeightsLoader for phi, gemma, deepseek (#16088)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
|
2025-04-05 20:34:38 -07:00 |
|
Jinzhen Lin
|
2fa66ef713
|
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-04-05 20:04:22 -07:00 |
|
Chauncey
|
13affc432d
|
[Misc] Remove redundant code (#16098)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-05 20:03:50 -07:00 |
|
Reid
|
d8f094a92a
|
[Misc] format output for encoder_decoder.py (#16095)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-05 19:57:18 -07:00 |
|
Harry Mellor
|
97ae6d777f
|
Fix some capitalisations in generated examples doc titles (#16094)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-05 13:44:03 +00:00 |
|
yihong
|
6baeee70d1
|
Revert "doc: add info for macos clang errors (#16049)" (#16091)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-05 11:51:51 +00:00 |
|
Reid
|
d2517a4939
|
[doc] fix 404 (#16082)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-05 11:39:18 +00:00 |
|
yihong
|
6342adc438
|
fix: support clang17 for macos and fix the real libomp (#16086)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-05 11:00:12 +00:00 |
|
Kevin H. Luu
|
0adba91547
|
[CI] Fix benchmark script level (#16089)
|
2025-04-05 03:36:01 -07:00 |
|
Tristan Leclercq
|
4285e423a6
|
[Misc] Auto detect bitsandbytes pre-quantized models (#16027)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
|
2025-04-04 23:30:45 -07:00 |
|
Woosuk Kwon
|
63375f0cdb
|
[V1][Spec Decode] Update N-gram Proposer Interface (#15750)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
v0.8.3rc1
|
2025-04-04 16:32:54 -07:00 |
|
Michael Goin
|
70ad3f9e98
|
[Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2025-04-04 23:31:19 +00:00 |
|
bnellnm
|
d6fc629f4d
|
[Kernel][Minor] Re-fuse triton moe weight application (#16071)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-04-04 23:27:34 +00:00 |
|
Roger Wang
|
af51d80fa1
|
Revert "[V1] Scatter and gather placeholders in the model runner" (#16075)
|
2025-04-04 14:50:57 -07:00 |
|
Cyrus Leung
|
f5722a5052
|
[V1] Scatter and gather placeholders in the model runner (#15712)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-04-04 21:26:44 +00:00 |
|
Nick Hill
|
651cf0fec1
|
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-04 12:56:43 -07:00 |
|
Kevin H. Luu
|
4dc52e1c53
|
[CI] Reorganize .buildkite directory (#16001)
Signed-off-by: kevin <kevin@anyscale.com>
|
2025-04-04 12:16:20 -07:00 |
|
Michael Goin
|
4708f13a9c
|
[Bugfix] Fix default behavior/fallback for pp in v1 (#16057)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-04 17:58:08 +00:00 |
|
Gregory Shtrasberg
|
a6d042df0a
|
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only (#15413)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-04-04 09:40:37 -07:00 |
|
Gregory Shtrasberg
|
40a36ccfeb
|
[ROCm][Bugfix] Use platform specific FP8 dtype (#15717)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-04-04 09:40:20 -07:00 |
|
Ilya Markov
|
ef608c37a7
|
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-04-04 09:39:08 -07:00 |
|
Li, Jiang
|
2386803f2a
|
[CPU] Change default block_size for CPU backend (#16002)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-04-04 09:39:05 -07:00 |
|
Ziji Shi (Steven)
|
95862f7b4d
|
[Benchmark][Doc] Update throughput benchmark and README (#15998)
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-04-04 09:39:02 -07:00 |
|
Isotr0py
|
230b131b54
|
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-04 09:38:58 -07:00 |
|
liuzhenwei
|
0812d8dd41
|
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
|
2025-04-04 09:38:55 -07:00 |
|
Jonghyun Choe
|
bf7e3c51ae
|
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
|
2025-04-04 09:38:52 -07:00 |
|
Mark McLoughlin
|
a35a8a8392
|
[V1][Spec Decode] Avoid logging useless nan metrics (#16023)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-04-04 08:52:41 -07:00 |
|
yihong
|
4ef0bb1fcf
|
doc: add info for macos clang errors (#16049)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-04 14:58:16 +00:00 |
|
Chengji Yao
|
fadc59c0e6
|
[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-04 07:48:50 -04:00 |
|
Reid
|
86cbd2eee9
|
[Misc] improve gguf check (#15974)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-04 01:33:36 +00:00 |
|
Huy Do
|
092475f738
|
[ROCm] Tweak the benchmark script to run on ROCm (#14252)
|
2025-04-03 17:12:48 -07:00 |
|
bnellnm
|
dcc56d62da
|
[Bugfix] Fix function names in test_block_fp8.py (#16033)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-04-03 23:01:34 +00:00 |
|
Robert Shaw
|
f15e70d906
|
[TPU] Switch Test to Non-Sliding Window (#15981)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-04-03 14:28:45 -07:00 |
|
iefgnoix
|
b6be6f8d1e
|
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-04-03 14:23:28 -07:00 |
|
Alexei-V-Ivanov-AMD
|
03a70eacaf
|
Re-enable the AMD Testing for the passing tests. (#15586)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-04-03 11:05:17 -07:00 |
|
yarongmu-google
|
45b1ff7a25
|
[Misc][Performance] Advance tpu.txt to the most recent nightly torch … (#16024)
|
2025-04-03 17:32:54 +00:00 |
|
bnellnm
|
15ba07ef25
|
[Minor] Fused experts refactor (#15914)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-04-03 10:19:38 -07:00 |
|
Liangfu Chen
|
d2b58ca203
|
[Neuron][kernel] Fuse kv cache into a single tensor (#15911)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
|
2025-04-03 09:51:32 -07:00 |
|
Kyle Sayers
|
82e7e19a6e
|
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-04-03 08:25:22 -07:00 |
|
Kyle Sayers
|
421c462948
|
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-04-03 08:23:19 -07:00 |
|
yihong
|
84884cd9ac
|
fix: tiny fix make format.sh excutable (#16015)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-03 15:18:05 +00:00 |
|
Reid
|
a43aa183dc
|
[doc] update contribution link (#15922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-03 10:47:31 +00:00 |
|