Isotr0py
|
e56bf27741
|
[Bugfix] Fix InternVL2 inference with various num_patches (#8375)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-12 10:10:35 -07:00 |
|
Roger Wang
|
520ca380ae
|
[Hotfix][VLM] Fixing max position embeddings for Pixtral (#8399)
|
2024-09-12 09:28:37 -07:00 |
|
youkaichao
|
7de49aa86c
|
[torch.compile] hide slicing under custom op for inductor (#8384)
|
2024-09-12 00:11:55 -07:00 |
|
Woosuk Kwon
|
42ffba11ad
|
[Misc] Use RoPE cache for MRoPE (#8396)
|
2024-09-11 23:13:14 -07:00 |
|
Kevin Lin
|
295c4730a8
|
[Misc] Raise error when using encoder/decoder model with cpu backend (#8355)
|
2024-09-12 05:45:24 +00:00 |
|
Blueyo0
|
1bf2dd9df0
|
[Gemma2] add bitsandbytes support for Gemma2 (#8338)
|
2024-09-11 21:53:12 -07:00 |
|
tomeras91
|
5a60699c45
|
[Bugfix]: Fix the logic for deciding if tool parsing is used (#8366)
|
2024-09-12 03:55:30 +00:00 |
|
Michael Goin
|
b6c75e1cf2
|
Fix the AMD weight loading tests (#8390)
|
2024-09-11 20:35:33 -07:00 |
|
Woosuk Kwon
|
b71c956deb
|
[TPU] Use Ray for default distributed backend (#8389)
|
2024-09-11 20:31:51 -07:00 |
|
youkaichao
|
f842a7aff1
|
[misc] remove engine_use_ray (#8126)
|
2024-09-11 18:23:36 -07:00 |
|
Cody Yu
|
a65cb16067
|
[MISC] Dump model runner inputs when crashing (#8305)
|
2024-09-12 01:12:25 +00:00 |
|
Simon Mo
|
3fd2b0d21c
|
Bump version to v0.6.1 (#8379)
v0.6.1
|
2024-09-11 14:42:11 -07:00 |
|
Patrick von Platen
|
d394787e52
|
Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-11 14:41:55 -07:00 |
|
Lily Liu
|
775f00f81e
|
[Speculative Decoding] Test refactor (#8317)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-11 14:07:34 -07:00 |
|
Aarni Koskela
|
8baa454937
|
[Misc] Move device options to a single place (#8322)
|
2024-09-11 13:25:58 -07:00 |
|
bnellnm
|
73202dbe77
|
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2024-09-11 12:52:19 -07:00 |
|
Cyrus Leung
|
7015417fd4
|
[Bugfix] Add missing attributes in mistral tokenizer (#8364)
|
2024-09-11 11:36:54 -07:00 |
|
Alexey Kondratiev(AMD)
|
aea02f30de
|
[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation (#8373)
|
2024-09-11 18:31:41 +00:00 |
|
Li, Jiang
|
0b952af458
|
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)
|
2024-09-11 09:46:46 -07:00 |
|
Yang Fan
|
3b7fea770f
|
[Model][VLM] Add Qwen2-VL model support (#7905)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-11 09:31:19 -07:00 |
|
Pooya Davoodi
|
cea95dfb94
|
[Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347)
|
2024-09-11 05:30:11 +00:00 |
|
Yangshen⚡Deng
|
6a512a00df
|
[model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-09-10 22:21:36 -07:00 |
|
Pavani Majety
|
efcf946a15
|
[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112)
|
2024-09-11 00:38:40 -04:00 |
|
Isotr0py
|
1230263e16
|
[Bugfix] Fix InternVL2 vision embeddings process with pipeline parallel (#8299)
|
2024-09-11 10:11:01 +08:00 |
|
Jee Jee Li
|
e497b8aeff
|
[Misc] Skip loading extra bias for Qwen2-MOE GPTQ models (#8329)
|
2024-09-10 20:59:19 -04:00 |
|
Tyler Michael Smith
|
94144e726c
|
[CI/Build][Kernel] Update CUTLASS to 3.5.1 tag (#8043)
|
2024-09-10 23:51:58 +00:00 |
|
William Lin
|
1d5e397aa4
|
[Core/Bugfix] pass VLLM_ATTENTION_BACKEND to ray workers (#8172)
|
2024-09-10 23:46:08 +00:00 |
|
Alexander Matveev
|
22f3a4bc6c
|
[Bugfix] lookahead block table with cuda graph max capture (#8340)
[Bugfix] Ensure multistep lookahead allocation is compatible with cuda graph max capture (#8340)
|
2024-09-10 16:00:35 -07:00 |
|
Cody Yu
|
b1f3e18958
|
[MISC] Keep chunked prefill enabled by default with long context when prefix caching is enabled (#8342)
|
2024-09-10 22:28:28 +00:00 |
|
Prashant Gupta
|
04e7c4e771
|
[Misc] remove peft as dependency for prompt models (#8162)
|
2024-09-10 17:21:56 -04:00 |
|
Kevin Lin
|
5faedf1b62
|
[Spec Decode] Move ops.advance_step to flash attn advance_step (#8224)
|
2024-09-10 13:18:14 -07:00 |
|
sumitd2
|
02751a7a42
|
Fix ppc64le buildkite job (#8309)
|
2024-09-10 12:58:34 -07:00 |
|
Alexey Kondratiev(AMD)
|
f421f3cefb
|
[CI/Build] Enabling kernels tests for AMD, ignoring some of them that fail (#8130)
|
2024-09-10 11:51:15 -07:00 |
|
Cyrus Leung
|
8c054b7a62
|
[Frontend] Clean up type annotations for mistral tokenizer (#8314)
|
2024-09-10 16:49:11 +00:00 |
|
Daniele
|
6234385f4a
|
[CI/Build] enable ccache/sccache for HIP builds (#8327)
|
2024-09-10 08:55:08 -07:00 |
|
Cyrus Leung
|
da1a844e61
|
[Bugfix] Fix missing post_layernorm in CLIP (#8155)
|
2024-09-10 08:22:50 +00:00 |
|
Simon Mo
|
a1d874224d
|
Add NVIDIA Meetup slides, announce AMD meetup, and add contact info (#8319)
|
2024-09-09 23:21:00 -07:00 |
|
Dipika Sikka
|
6cd5e5b07e
|
[Misc] Fused MoE Marlin support for GPTQ (#8217)
|
2024-09-09 23:02:52 -04:00 |
|
Kyle Sayers
|
c7cb5c3335
|
[Misc] GPTQ Activation Ordering (#8135)
|
2024-09-09 16:27:26 -04:00 |
|
Vladislav Kruglikov
|
f9b4a2d415
|
[Bugfix] Correct adapter usage for cohere and jamba (#8292)
|
2024-09-09 11:20:46 -07:00 |
|
Adam Lugowski
|
58fcc8545a
|
[Frontend] Add progress reporting to run_batch.py (#8060)
Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io>
|
2024-09-09 11:16:37 -07:00 |
|
Kyle Mistele
|
08287ef675
|
[Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272)
|
2024-09-09 10:45:11 -04:00 |
|
Alexander Matveev
|
4ef41b8476
|
[Bugfix] Fix async postprocessor in case of preemption (#8267)
|
2024-09-07 21:01:51 -07:00 |
|
Joe Runde
|
cfe712bf1a
|
[CI/Build] Use python 3.12 in cuda image (#8133)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-09-07 13:03:16 -07:00 |
|
sumitd2
|
b962ee1470
|
ppc64le: Dockerfile fixed, and a script for buildkite (#8026)
|
2024-09-07 11:18:40 -07:00 |
|
Isotr0py
|
36bf8150cc
|
[Model][VLM] Decouple weight loading logic for Paligemma (#8269)
|
2024-09-07 17:45:44 +00:00 |
|
Isotr0py
|
e807125936
|
[Model][VLM] Support multi-images inputs for InternVL2 models (#8201)
|
2024-09-07 16:38:23 +08:00 |
|
Cyrus Leung
|
9f68e00d27
|
[Bugfix] Fix broken OpenAI tensorizer test (#8258)
|
2024-09-07 08:02:39 +00:00 |
|
youkaichao
|
ce2702a923
|
[tpu][misc] fix typo (#8260)
|
2024-09-06 22:40:46 -07:00 |
|
Wei-Sheng Chin
|
795b662cff
|
Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241)
|
2024-09-06 20:18:16 -07:00 |
|