Simon Mo
|
3185fb0cca
|
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750)
|
2024-09-24 05:45:20 +00:00 |
|
youkaichao
|
0250dd68c5
|
re-implement beam search on top of vllm core (#8726)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
|
2024-09-23 22:08:12 -07:00 |
|
sroy745
|
88577ac928
|
Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728)
|
2024-09-24 04:43:13 +00:00 |
|
Hongxia Yang
|
530821d00c
|
[Hardware][AMD] ROCm6.2 upgrade (#8674)
|
2024-09-23 18:52:39 -07:00 |
|
Alexander Matveev
|
1a2aef3e59
|
Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335)
|
2024-09-23 15:38:04 -07:00 |
|
jiqing-feng
|
5f7bb58427
|
Fix typical acceptance sampler with correct recovered token ids (#8562)
|
2024-09-23 12:32:27 -07:00 |
|
Russell Bryant
|
b05f5c9238
|
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-09-23 12:15:41 -07:00 |
|
Jee Jee Li
|
9b0e3ec970
|
[Kernel][LoRA] Add assertion for punica sgmv kernels (#7585)
|
2024-09-23 18:57:42 +00:00 |
|
Lucas Wilkinson
|
86e9c8df29
|
[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-09-23 13:46:26 -04:00 |
|
Daniele
|
ee5f34b1c2
|
[CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-23 09:44:26 -07:00 |
|
Jani Monoses
|
f2bd246c17
|
[VLM] Fix paligemma, fuyu and persimmon with transformers 4.45: use config.text_config.vocab_size (#8707)
|
2024-09-23 14:43:09 +00:00 |
|
Yanyi Liu
|
a79e522984
|
[Model] Support pp for qwen2-vl (#8696)
|
2024-09-23 13:46:59 +00:00 |
|
Li, Jiang
|
3e83c12b5c
|
[Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733)
|
2024-09-23 13:15:16 +00:00 |
|
Isotr0py
|
e551ca1555
|
[Hardware][CPU] Refactor CPU model runner (#8729)
|
2024-09-23 20:12:20 +08:00 |
|
Alex Brooks
|
9b8c8ba119
|
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-09-23 07:44:48 +00:00 |
|
Yan Ma
|
d23679eb99
|
[Bugfix] fix docker build for xpu (#8652)
|
2024-09-22 22:54:18 -07:00 |
|
Luka Govedič
|
57a0702e63
|
[Bugfix] Fix CPU CMake build (#8723)
Co-authored-by: Yuan <yuan.zhou@intel.com>
|
2024-09-22 20:40:46 -07:00 |
|
Tyler Michael Smith
|
3dda7c2250
|
[Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (#8702)
|
2024-09-22 22:24:59 -04:00 |
|
youkaichao
|
92ba7e7477
|
[misc] upgrade mistral-common (#8715)
|
2024-09-22 15:41:59 -07:00 |
|
youkaichao
|
d4a2ac8302
|
[build] enable existing pytorch (for GH200, aarch64, nightly) (#8713)
|
2024-09-22 12:47:54 -07:00 |
|
Lily Liu
|
c6bd70d772
|
[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701)
|
2024-09-22 12:34:14 -07:00 |
|
litianjian
|
5b59532760
|
[Model][VLM] Add LLaVA-Onevision model support (#8486)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-22 10:51:44 -07:00 |
|
Huazhong Ji
|
ca2b628b3c
|
[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703)
|
2024-09-22 10:44:09 -07:00 |
|
Alex Brooks
|
8ca5051b9a
|
[Misc] Use NamedTuple in Multi-image example (#8705)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-09-22 20:56:20 +08:00 |
|
Cyrus Leung
|
06ed2815e2
|
[Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407)
|
2024-09-22 12:24:21 +00:00 |
|
youkaichao
|
0e40ac9b7b
|
[ci][build] fix vllm-flash-attn (#8699)
|
2024-09-21 23:24:58 -07:00 |
|
Isotr0py
|
13d88d4137
|
[Bugfix] Refactor composite weight loading logic (#8656)
|
2024-09-22 04:33:27 +00:00 |
|
Tyler Michael Smith
|
d66ac62854
|
[Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643)
|
2024-09-21 23:45:02 +00:00 |
|
Divakar Verma
|
9dc7c6c7f3
|
[dbrx] refactor dbrx experts to extend FusedMoe class (#8518)
|
2024-09-21 15:09:39 -06:00 |
|
rasmith
|
ec4aaad812
|
[Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646)
|
2024-09-21 09:20:54 +00:00 |
|
Andy Dai
|
4dfdf43196
|
[Doc] Fix typo in AMD installation guide (#8689)
|
2024-09-21 00:24:12 -07:00 |
|
Cyrus Leung
|
5e85f4f82a
|
[VLM] Use SequenceData.from_token_counts to create dummy data (#8687)
|
2024-09-20 23:28:56 -07:00 |
|
Luka Govedič
|
71c60491f2
|
[Kernel] Build flash-attn from source (#8245)
|
2024-09-20 23:27:10 -07:00 |
|
youkaichao
|
0faab90eb0
|
[beam search] add output for manually checking the correctness (#8684)
|
2024-09-20 19:55:33 -07:00 |
|
Cyrus Leung
|
0455c46ed4
|
[Core] Factor out common code in SequenceData and Sequence (#8675)
|
2024-09-21 02:30:39 +00:00 |
|
Kunshang Ji
|
d4bf085ad0
|
[MISC] add support custom_op check (#8557)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-20 19:03:55 -07:00 |
|
Cyrus Leung
|
0057894ef7
|
[Core] Rename PromptInputs and inputs (#8673)
|
2024-09-20 19:00:54 -07:00 |
|
zyddnys
|
0f961b3ce9
|
[Bugfix] Fix incorrect llava next feature size calculation (#8496)
|
2024-09-20 22:48:32 +00:00 |
|
omrishiv
|
7f9c8902e3
|
[Hardware][AWS] update neuron to 2.20 (#8676)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
|
2024-09-20 15:19:44 -07:00 |
|
omrishiv
|
7c8566aa4f
|
[Doc] neuron documentation update (#8671)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
|
2024-09-20 15:04:37 -07:00 |
|
Patrick von Platen
|
b4e4eda92e
|
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640)
|
2024-09-20 14:33:03 -07:00 |
|
Pastel!
|
2874bac618
|
[Bugfix] Config got an unexpected keyword argument 'engine' (#8556)
|
2024-09-20 14:00:45 -07:00 |
|
Cyrus Leung
|
035fa895ec
|
[Misc] Show AMD GPU topology in collect_env.py (#8649)
|
2024-09-20 13:52:19 -07:00 |
|
saumya-saran
|
b28298f2f4
|
[Bugfix] Validate SamplingParam n is an int (#8548)
|
2024-09-20 12:46:02 -07:00 |
|
Alexey Kondratiev(AMD)
|
2940afa04e
|
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670)
|
2024-09-20 10:27:44 -07:00 |
|
Niklas Muennighoff
|
3b63de9353
|
[Model] Add OLMoE (#7922)
|
2024-09-20 09:31:41 -07:00 |
|
Jiaxin Shan
|
260d40b5ea
|
[Core] Support Lora lineage and base model metadata management (#6315)
|
2024-09-20 06:20:56 +00:00 |
|
William Lin
|
9e5ec35b1f
|
[bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474)
|
2024-09-19 20:49:54 -07:00 |
|
Amit Garg
|
18ae428a0d
|
[Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571)
|
2024-09-20 08:54:02 +08:00 |
|
bnellnm
|
de6f90a13d
|
[Misc] guard against change in cuda library name (#8609)
|
2024-09-20 06:36:30 +08:00 |
|