Cody Yu
|
2c97eca1ff
|
[Misc] Validate grammar and fail early (#11119)
|
2024-12-12 18:34:26 +00:00 |
|
Jeff Cook
|
5d712571af
|
[Bugfix] Quick fix to make Pixtral-HF load correctly again after 39e227c7ae. (#11024)
|
2024-12-12 18:09:20 +00:00 |
|
Ramon Ziai
|
d4d5291cc2
|
fix(docs): typo in helm install instructions (#11141)
Signed-off-by: Ramon Ziai <ramon.ziai@bettermarks.com>
|
2024-12-12 17:36:32 +00:00 |
|
Roger Wang
|
4816d20aa4
|
[V1] Fix torch profiling for offline inference (#11125)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-12 15:51:53 +00:00 |
|
Jiaxin Shan
|
85362f028c
|
[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-12 09:25:16 +00:00 |
|
youkaichao
|
62de37a38e
|
[core][distributed] initialization from StatelessProcessGroup (#10986)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-12 09:04:19 +00:00 |
|
Sanju C Sudhakaran
|
8195824206
|
[Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) (#10565)
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
|
2024-12-12 08:09:28 +00:00 |
|
Woosuk Kwon
|
f092153fbe
|
[V1] Use more persistent buffers to optimize input preparation overheads (#11111)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-11 23:14:20 -08:00 |
|
Pooya Davoodi
|
1da8f0e1dd
|
[Model] Add support for embedding model GritLM (#10816)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
|
2024-12-12 06:39:16 +00:00 |
|
Russell Bryant
|
ccede2b264
|
[Core] cleanup zmq ipc sockets on exit (#11115)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-11 19:12:24 -08:00 |
|
Yuan Tang
|
24a36d6d5f
|
Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
|
2024-12-12 02:39:21 +00:00 |
|
Simon Mo
|
8fb26dac61
|
[Docs] Add media kit (#11121)
|
2024-12-11 17:33:11 -08:00 |
|
Clayton
|
7439a8b5fc
|
[Bugfix] Multiple fixes to tool streaming with hermes and mistral (#10979)
Signed-off-by: cedonley <clayton@donley.io>
|
2024-12-12 01:10:12 +00:00 |
|
Alexander Matveev
|
4e11683368
|
[V1] VLM preprocessor hashing (#11020)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-12 00:55:30 +00:00 |
|
Tyler Michael Smith
|
452a723bf2
|
[V1][Core] Remove should_shutdown to simplify core process termination (#11113)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-11 23:34:54 +00:00 |
|
Cyrus Leung
|
d1e21a979b
|
[CI/Build] Split up VLM tests (#11083)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-12 06:18:16 +08:00 |
|
Rui Qiao
|
72ff3a9686
|
[core] Bump ray to use _overlap_gpu_communication in compiled graph tests (#10410)
Signed-off-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Rui Qiao <ubuntu@ip-172-31-15-128.us-west-2.compute.internal>
|
2024-12-11 11:36:35 -08:00 |
|
youkaichao
|
66aaa7722d
|
[torch.compile] remove graph logging in ci (#11110)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-11 10:59:50 -08:00 |
|
Woosuk Kwon
|
d643c2aba1
|
[V1] Use input_ids as input for text-only models (#11032)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-11 10:49:23 -08:00 |
|
youkaichao
|
91642db952
|
[torch.compile] use depyf to dump torch.compile internals (#10972)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-11 10:43:05 -08:00 |
|
bingps
|
fd22220687
|
[Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103)
Signed-off-by: Guangda Liu <bingps@users.noreply.github.com>
Co-authored-by: Guangda Liu <bingps@users.noreply.github.com>
|
2024-12-11 15:43:24 +00:00 |
|
hissu-hyvarinen
|
b2f775456e
|
[CI/Build] Enable prefix caching test for AMD (#11098)
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
|
2024-12-11 15:23:37 +00:00 |
|
Cyrus Leung
|
cad5c0a6ed
|
[Doc] Update docs to refer to pooling models (#11093)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 13:36:27 +00:00 |
|
Cyrus Leung
|
8f10d5e393
|
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 01:28:00 -08:00 |
|
Rafael Vasquez
|
40766ca1b8
|
[Bugfix]: Clamp -inf logprob values in prompt_logprobs (#11073)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-12-11 01:27:39 -08:00 |
|
B-201
|
2e32f5d28d
|
[Bugfix] Fix Idefics3 fails during multi-image inference (#11080)
Signed-off-by: B-201 <Joy25810@foxmail.com>
|
2024-12-11 01:27:07 -08:00 |
|
Russell Bryant
|
61b1d2f6ae
|
[Core] v1: Use atexit to handle engine core client shutdown (#11076)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-11 01:26:36 -08:00 |
|
Kevin H. Luu
|
9974fca047
|
[ci/build] Fix entrypoints test and pin outlines version (#11088)
|
2024-12-11 01:01:53 -08:00 |
|
Kevin H. Luu
|
3fb4b4f163
|
[ci/build] Fix AMD CI dependencies (#11087)
|
2024-12-11 00:39:53 -08:00 |
|
Cyrus Leung
|
2e33fe4191
|
[CI/Build] Check transformers v4.47 (#10991)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 05:02:02 +00:00 |
|
Maximilien de Bayser
|
e39400a4b6
|
Fix streaming for granite tool call when <|tool_call|> is present (#11069)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-12-11 04:51:40 +00:00 |
|
Mor Zusman
|
ffa48c9146
|
[Model] PP support for Mamba-like models (#10992)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-12-10 21:53:37 -05:00 |
|
Aurick Qiao
|
d5c5154fcf
|
[Misc] LoRA + Chunked Prefill (#9057)
|
2024-12-11 10:09:20 +08:00 |
|
Tyler Michael Smith
|
9a93973708
|
[Bugfix] Fix Mamba multistep (#11071)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-11 00:16:22 +00:00 |
|
Woosuk Kwon
|
134810b3d9
|
[V1][Bugfix] Always set enable_chunked_prefill = True for V1 (#11061)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-10 14:41:23 -08:00 |
|
youkaichao
|
75f89dc44c
|
[torch.compile] add a flag to track batchsize statistics (#11059)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-10 12:40:52 -08:00 |
|
Russell Bryant
|
e739194926
|
[Core] Update to outlines >= 0.1.8 (#10576)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-10 12:08:16 -08:00 |
|
Flávia Béo
|
250ee65d72
|
[BUG] Remove token param #10921 (#11022)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
|
2024-12-10 17:38:15 +00:00 |
|
Joe Runde
|
9b9cef3145
|
[Bugfix] Backport request id validation to v0 (#11036)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-12-10 16:38:23 +00:00 |
|
Jee Jee Li
|
d05f88679b
|
[Misc][LoRA] Add PEFTHelper for LoRA (#11003)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-10 11:12:01 +00:00 |
|
Travis Johnson
|
beb16b2c81
|
[Bugfix] Handle <|tool_call|> token in granite tool parser (#11039)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-12-10 10:27:11 +00:00 |
|
Maxime Fournioux
|
fe2e10c71b
|
Add example of helm chart for vllm deployment on k8s (#9199)
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
|
2024-12-10 09:19:27 +00:00 |
|
Gene Der Su
|
82c73fd510
|
[Bugfix] cuda error running llama 3.2 (#11047)
|
2024-12-10 07:41:11 +00:00 |
|
Diego Marinho
|
bfd610430c
|
Update README.md (#11034)
|
2024-12-09 23:08:10 -08:00 |
|
Jeff Cook
|
e35879c276
|
[Bugfix] Fix xgrammar failing to read a vocab_size from LlavaConfig on PixtralHF. (#11043)
|
2024-12-10 14:54:22 +08:00 |
|
youkaichao
|
ebf778061d
|
monitor metrics of tokens per step using cudagraph batchsizes (#11031)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-09 22:35:36 -08:00 |
|
Tyler Michael Smith
|
28b3a1c7e5
|
[V1] Multiprocessing Tensor Parallel Support for v1 (#9856)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-10 06:28:14 +00:00 |
|
Patrick von Platen
|
bc192a2b09
|
[Pixtral] Improve loading (#11040)
|
2024-12-10 06:09:32 +00:00 |
|
Joe Runde
|
980ad394a8
|
[Frontend] Use request id from header (#10968)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-12-10 13:46:29 +08:00 |
|
Cyrus Leung
|
391d7b2763
|
[Bugfix] Fix usage of deprecated decorator (#11025)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-10 13:45:47 +08:00 |
|