Isotr0py
|
ec2dcd80bc
|
[Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl (#19054)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-03 09:08:20 +00:00 |
|
Jee Jee Li
|
42243fbda0
|
[Doc] Add InternVL LoRA support (#19055)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-03 09:08:03 +00:00 |
|
Michael Goin
|
6d18ed2a2e
|
Update docker docs with ARM CUDA cross-compile (#19037)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2025-06-03 08:21:53 +00:00 |
|
Chen Zhang
|
f32fcd9444
|
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 08:01:48 +00:00 |
|
Lu Fang
|
d32aa2e670
|
[Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-03 00:16:17 -07:00 |
|
Michael Goin
|
cc977286e7
|
Reduce logs in CLI scripts and plugin loader (#18970)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 06:00:45 +00:00 |
|
Reid
|
17430e3653
|
[bugfix] small fix logic issue (#18999)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-03 05:35:12 +00:00 |
|
汪志鹏
|
1282bd812e
|
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-03 13:13:13 +08:00 |
|
Rui Qiao
|
bdce64f236
|
[V1] Support DP with Ray (#18779)
|
2025-06-02 21:15:13 -07:00 |
|
Gregory Shtrasberg
|
9e6f61e8c3
|
[ROCm][Build] Clean up the ROCm build (#19040)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-06-02 20:47:47 -07:00 |
|
Li, Jiang
|
8655f47f37
|
[CPU][CI] Re-enable the CPU CI tests (#19046)
Signed-off-by: jiang.li <jiang1.li@intel.com>
|
2025-06-02 20:46:47 -07:00 |
|
Concurrensee
|
4ce42f9204
|
Adding "LoRA Test %N" to AMD production tests (#18929)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
|
2025-06-02 20:46:44 -07:00 |
|
Tyler Michael Smith
|
8a57872b2a
|
[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-03 11:36:51 +08:00 |
|
Hyogeun Oh (오효근)
|
5bc1ad6cee
|
[Doc] Remove duplicate TOCs during MkDocs migration (#19021)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-06-02 19:49:48 -07:00 |
|
Siyuan Liu
|
9112b443a0
|
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
|
2025-06-03 00:06:20 +00:00 |
|
Calvin Chen
|
c57d577e8d
|
add an absolute path for run.sh (#18258)
Signed-off-by: calvin chen <120380290@qq.com>
|
2025-06-02 19:38:23 +00:00 |
|
Gregory Shtrasberg
|
ca2f6b9c30
|
[Bugfix][Model] Attempt to fix eagle in V0. (#18978)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-06-02 08:15:53 -07:00 |
|
Frαnçois
|
20133cfee2
|
[Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403)
Signed-off-by: François Paupier <francois.paupier@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-06-02 15:04:23 +00:00 |
|
jennyyyyzhen
|
ebb1ec9318
|
[Model] enable data parallel for Llama4 vision encoder (#18368)
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Co-authored-by: yZhen <yZhen@fb.com>
Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
|
2025-06-02 19:22:54 +08:00 |
|
Reid
|
5b168b6d7a
|
[doc] add pytest tips (#19010)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-02 11:07:26 +00:00 |
|
22quinn
|
9760fd8f6a
|
[Core] Support inplace model weights loading (#18745)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-02 17:38:50 +08:00 |
|
Robert Shaw
|
b9f61e1387
|
[Bugfix][Nixl] Fix DP Metadata Handshake (#19008)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-02 03:30:41 +00:00 |
|
zhrrr
|
d6fd3a33b8
|
[Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
|
2025-06-01 19:41:18 +00:00 |
|
Reid
|
432ec9926e
|
[doc] wrong output (#19000)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-01 11:26:14 +00:00 |
|
Nick Hill
|
2b102d51ad
|
[BugFix] Fix incorrect metrics shutdown error log message (#18992)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-01 11:42:23 +08:00 |
|
rongfu.leng
|
aa54a7bf7b
|
[BugFix] fix data parallel construct ipv6 url address (#18991)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-06-01 11:42:10 +08:00 |
|
Michael Goin
|
2ad6194a02
|
Let max_num_batched_tokens use human_readable_int for large numbers (#18968)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-01 11:41:29 +08:00 |
|
Reid
|
c594cbf565
|
[doc] small fix - mkdocs (#18996)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-31 20:23:43 -07:00 |
|
Isotr0py
|
a35ca765a5
|
[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-01 11:06:57 +08:00 |
|
Cyrus Leung
|
6aa8f9a4e7
|
[Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-01 11:04:23 +08:00 |
|
Benjamin Chislett
|
1bc86a3da1
|
[Bugfix] Fix EAGLE3 broken logits (#18909)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-05-31 19:58:07 -07:00 |
|
Ekagra Ranjan
|
bbfa0c61d1
|
[Misc][Benchmark] Add support for CustomDataset (#18511)
|
2025-05-31 19:07:38 +00:00 |
|
Reid
|
20079c6e36
|
[Misc] add return token strs for tokenize (#18941)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-31 18:00:11 +00:00 |
|
Nick Hill
|
9a1b9b99d7
|
[BugFix] Fix multi-node offline data-parallel (#18981)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
|
2025-05-31 08:34:52 -07:00 |
|
ptarasiewiczNV
|
8bf507d766
|
[P/D] NixlConnector use cache device index for memory registration (#18969)
Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
|
2025-05-31 11:19:18 -04:00 |
|
Charlie Fu
|
306d60401d
|
[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-05-31 07:40:05 -07:00 |
|
Fred Reiss
|
f2c3f66d59
|
[Bugfix] Fix for issue 17396 (#18773)
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
|
2025-05-31 11:58:17 +00:00 |
|
vllmellm
|
0f5e0d567e
|
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-05-31 03:39:31 -07:00 |
|
Luka Govedič
|
c55d804672
|
[BugFix] Pydantic part 2 (#18911)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-05-31 03:39:28 -07:00 |
|
Reid
|
749f5bdd38
|
[doc] fix the list rendering issue - security.md (#18982)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-05-31 10:39:21 +00:00 |
|
Satyajith Chilappagari
|
2a50ef5760
|
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
|
2025-05-31 10:39:11 +00:00 |
|
Lucia Fang
|
b8b904795d
|
fix security issue of logging llm output (#18980)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
|
2025-05-31 10:38:56 +00:00 |
|
Chauncey
|
ba5111f237
|
[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-05-31 09:20:54 +00:00 |
|
Yong Hoon Shin
|
1e123529d7
|
[Misc] Fix estimated max model len msg (#18966)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-05-31 16:43:44 +08:00 |
|
Pooya Davoodi
|
dff80b0e42
|
[Frontend] Add rerank support to run_batch endpoint (#16278)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
|
2025-05-31 07:40:01 +00:00 |
|
Yu Guo
|
7782464a17
|
create util function for batched arange (#18937)
|
2025-05-31 13:50:38 +08:00 |
|
Lukas Geiger
|
0f71e24034
|
[Docs] Correct multiprocessing design doc (#18964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-05-31 01:30:15 +00:00 |
|
Will Eaton
|
1dab4d5718
|
Tool parser regex timeout handling (#18960)
Signed-off-by: Will Eaton <weaton@redhat.com>
|
2025-05-30 21:02:54 +00:00 |
|
rongfu.leng
|
7f21e8052b
|
[Misc] add group_size is -1 in awq quantization (#18910)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-05-30 17:34:22 +00:00 |
|
Isotr0py
|
5a8641638a
|
[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-05-30 17:11:44 +00:00 |
|