Commit Graph

2057 Commits

Author SHA1 Message Date
科英
67a6882da4 [Misc] SpecDecodeWorker supports profiling (#9719)
Signed-off-by: Abatom <abatom@163.com>
2024-10-27 04:18:03 +00:00
kakao-kevin-us
6650e6a930 [Model] Add classification Task with Qwen2ForSequenceClassification (#9704)
Signed-off-by: Kevin-Yang <ykcha9@gmail.com>
Co-authored-by: Kevin-Yang <ykcha9@gmail.com>
2024-10-26 17:53:35 +00:00
Vasiliy Alekseev
07e981fdf4 [Frontend] Bad words sampling parameter (#9717)
Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>
2024-10-26 16:29:38 +00:00
ErkinSagiroglu
55137e8ee3 Fix: MI100 Support By Bypassing Custom Paged Attention (#9560) 2024-10-26 12:12:57 +00:00
Mengqing Cao
5cbdccd151 [Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716) 2024-10-26 10:59:06 +00:00
Sam Stoelinga
067e77f9a8 [Bugfix] Steaming continuous_usage_stats default to False (#9709)
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
2024-10-26 05:05:47 +00:00
Travis Johnson
6567e13724 [Bugfix] Fix crash with llama 3.2 vision models and guided decoding (#9631)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: pavlo-ruban <pavlo.ruban@servicenow.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-25 15:42:56 -07:00
Michael Goin
ca0d92227e [Bugfix] Fix compressed_tensors_moe bad config.strategy (#9677) 2024-10-25 12:40:33 -07:00
Woosuk Kwon
9645b9f646 [V1] Support sliding window attention (#9679)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-10-24 22:20:37 -07:00
Will Johnson
a6f3721861 [Model] add a lora module for granite 3.0 MoE models (#9673) 2024-10-24 22:00:17 -07:00
Michael Goin
c91ed47c43 [Bugfix] Remove xformers requirement for Pixtral (#9597)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-10-24 15:38:05 -07:00
Charlie Fu
59449095ab [Performance][Kernel] Fused_moe Performance Improvement (#9384)
Signed-off-by: charlifu <charlifu@amd.com>
2024-10-24 15:37:52 -07:00
Michael Goin
e26d37a185 [Log][Bugfix] Fix default value check for image_url.detail (#9663) 2024-10-24 10:44:38 -07:00
Alex Brooks
722d46edb9 [Model] Compute Llava Next Max Tokens / Dummy Data From Gridpoints (#9650)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-24 10:42:24 -07:00
Yongzao
d27cfbf791 [torch.compile] Adding torch compile annotations to some models (#9641)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 09:31:42 -07:00
litianjian
f58454968f [Bugfix]Disable the post_norm layer of the vision encoder for LLaVA models (#9653) 2024-10-24 07:52:07 -07:00
Yongzao
ad6f78053e [torch.compile] expanding support and fix allgather compilation (#9637)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 01:32:15 -07:00
Jee Jee Li
295a061fb3 [Kernel] add kernel for FATReLU (#9610)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-10-24 16:18:27 +08:00
Yongzao
8a02cd045a [torch.compile] Adding torch compile annotations to some models (#9639)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 00:54:57 -07:00
youkaichao
4fdc581f9e [core] simplify seq group code (#9569)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-10-24 00:16:44 -07:00
Woosuk Kwon
3770071eb4 [V1][Bugfix] Clean up requests when aborted (#9629)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-10-23 23:33:22 -07:00
Cyrus Leung
836e8ef6ee [Bugfix] Fix PP for ChatGLM and Molmo (#9422) 2024-10-24 06:12:05 +00:00
Yan Ma
056a68c7db [XPU] avoid triton import for xpu (#9440)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-24 05:14:00 +00:00
Vinay R Damodaran
33bab41060 [Bugfix]: Make chat content text allow type content (#9358)
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00
Michael Goin
b7df53cd42 [Bugfix] Use "vision_model" prefix for MllamaVisionModel (#9628)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-10-24 10:07:44 +08:00
Michael Goin
bb01f2915e [Bugfix][Model] Fix Mllama SDPA illegal memory access for batched multi-image (#9626)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-10-24 10:03:44 +08:00
Yunfei Chu
fc6c274626 [Model] Add Qwen2-Audio model support (#9248)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-23 17:54:22 +00:00
Alex Brooks
150b779081 [Frontend] Enable Online Multi-image Support for MLlama (#9393)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-23 17:28:57 +00:00
Yongzao
9013e24f7b [torch.compile] Adding torch compile annotations to some models (#9614) 2024-10-23 10:07:48 -07:00
Tyler Michael Smith
e5ac6a4199 [Bugfix] Fix divide by zero when serving Mamba models (#9617)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-10-23 16:40:43 +00:00
youkaichao
dbdd3b5e5a [misc] comment to avoid future confusion about baichuan (#9620)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-23 09:14:44 -07:00
Cyrus Leung
e7116c017c [Bugfix] Fix _init_vision_model in NVLM_D model (#9611)
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-23 14:09:04 +00:00
Alex Brooks
31a08f5bd2 [Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs (#9612)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-23 14:05:18 +00:00
Cyrus Leung
c18e1a3418 [VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args (#9217)
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-23 11:27:37 +00:00
Isotr0py
3ff57ebfca [Model] Initialize Florence-2 language backbone support (#9555) 2024-10-23 10:42:47 +00:00
Mengqing Cao
2394962d70 [Hardware][XPU] using current_platform.is_xpu (#9605) 2024-10-23 08:28:21 +00:00
Cyrus Leung
831540cf04 [Model] Support E5-V (#9576) 2024-10-23 11:35:29 +08:00
Flex Wang
29061ed9df [Misc] Add an env var VLLM_LOGGING_PREFIX, if set, it will be prepend to all logging messages (#9590) 2024-10-23 11:17:28 +08:00
yulei
b17046e298 [BugFix] Fix metrics error for --num-scheduler-steps > 1 (#8234) 2024-10-22 15:43:03 -07:00
Aurick Qiao
23b899a8e6 [Bugfix] fix detokenizer shallow copy (#5919) 2024-10-22 15:38:12 -07:00
youkaichao
17c79f3c36 [torch.compile] auto infer dynamic_arg_dims from type annotation (#9589) 2024-10-22 13:43:37 -07:00
Ronen Schaffer
cd5601ac37 [BugFix] Prevent exporting duplicate OpenTelemetry spans (#9017) 2024-10-22 11:11:53 -07:00
Yuhong Guo
434984e665 [Frontend] Support custom request_id from request (#9550)
Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2024-10-22 18:07:30 +00:00
gopalsarda
08075c3448 [Bugfix] Eagle: change config name for fc bias (#9580) 2024-10-22 16:14:22 +00:00
Isotr0py
bb392ea2d2 [Model][VLM] Initialize support for Mono-InternVL model (#9528) 2024-10-22 16:01:46 +00:00
xendo
9dbcce84a7 [Neuron] [Bugfix] Fix neuron startup (#9374)
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
2024-10-22 12:51:41 +00:00
Woosuk Kwon
6c5af09b39 [V1] Implement vLLM V1 [1/N] (#9289) 2024-10-22 01:24:07 -07:00
wangshuai09
3ddbe25502 [Hardware][CPU] using current_platform.is_cpu (#9536) 2024-10-22 00:50:43 -07:00
chenqianfzh
0d02747f2e support TP in qwen2 bnb (#9574) 2024-10-22 07:13:23 +00:00
Kuntai Du
ca30c3c84b [Core] Remove evictor_v1 (#9572) 2024-10-22 04:55:49 +00:00