Commit Graph

14386 Commits

Author SHA1 Message Date
TJian
8a1f938e6f [Doc] Update Quantization Hardware Support Documentation (#12025)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-01-14 04:37:52 +00:00
Konrad Zawora
078da31903 [HPU][Bugfix] set_forward_context and CI test execution (#12014)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2025-01-14 11:04:18 +08:00
Woosuk Kwon
1a401252b5 [Docs] Add Sky Computing Lab to project intro (#12019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-13 17:24:36 -08:00
Steve Luo
f35ec461fc [Bugfix] Fix deepseekv3 gate bias error (#12002)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-01-13 13:43:51 -07:00
Yikun Jiang
289b5191d5 [Doc] Fix build from source and installation link in README.md (#12013)
Signed-off-by: Yikun <yikunkero@gmail.com>
2025-01-13 17:23:59 +00:00
elijah
c6db21313c bugfix: Fix signature mismatch in benchmark's get_tokenizer function (#11982)
Signed-off-by: elijah <f1renze.142857@gmail.com>
2025-01-13 15:22:07 +00:00
Shanshan Shen
a7d59688fb [Platform] Move get_punica_wrapper() function to Platform (#11516)
Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-01-13 13:12:10 +00:00
youkaichao
458e63a2c6 [platform] add device_control env var (#12009)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-13 20:59:09 +08:00
Harry Mellor
e8c23ff989 [Doc] Organise installation documentation into categories and tabs (#11935)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-13 12:27:36 +00:00
Roger Wang
cd8249903f [Doc][V1] Update model implementation guide for V1 support (#11998)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-01-13 11:58:54 +00:00
Chen Zhang
0f8cafe2d1 [Kernel] unified_attention for Attention.forward (#11967)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-13 19:28:53 +08:00
Alex Brooks
5340a30d01 Fix Max Token ID for Qwen-VL-Chat (#11980)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-01-13 08:37:48 +00:00
youkaichao
89ce62a316 [platform] add ray_device_key (#11948)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-13 16:20:52 +08:00
Chenguang Li
c3f05b09a0 [Misc]Minor Changes about Worker (#11555)
Signed-off-by: Chenguang Li <757486878@qq.com>
2025-01-13 15:47:05 +08:00
Concurrensee
cf6bbcb493 [Misc] Fix Deepseek V2 fp8 kv-scale remapping (#11947)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
2025-01-12 23:05:06 -08:00
Sungjae Lee
80ea3af1a0 [CI][Spec Decode] fix: broken test for EAGLE model (#11972)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
2025-01-13 06:50:35 +00:00
Siyuan Li
9dd02d85ca [Bug] Fix usage of .transpose() and .view() consecutively. (#11979) 2025-01-13 06:24:10 +00:00
Yangcheng Li
f7b3ba82c3 [MISC] fix typo in kv transfer send recv test (#11983) 2025-01-13 05:07:48 +00:00
Robert Shaw
619ae268c3 [V1] [2/n] Logging and Metrics - OutputProcessor Abstraction (#11973)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-13 04:54:10 +00:00
Isotr0py
d14e98d924 [Model] Support GGUF models newly added in transformers 4.46.0 (#9685)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-13 00:13:44 +00:00
Robert Shaw
9597a095f2 [V1][Core][1/n] Logging and Metrics (#11962)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-01-12 21:02:02 +00:00
Avshalom Manevich
263a870ee1 [Hardware][TPU] workaround fix for MoE on TPU (#11764) 2025-01-12 10:53:51 -05:00
Akshat Tripathi
8bddb73512 [Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00
Isotr0py
f967e51f38 [Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00
Rafael Vasquez
43f3d9e699 [CI/Build] Add markdown linter (#11857)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2025-01-12 00:17:13 -08:00
Roger Wang
b25cfab9a0 [V1] Avoid sending text prompt to core engine (#11963)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-01-12 06:36:38 +00:00
sixgod
4b657d3292 [Model] Add cogagent model support vLLM (#11742)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-11 19:05:56 +00:00
Nicolò Lucchesi
d697dc01b4 [Bugfix] Fix RobertaModel loading (#11940)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-01-11 14:05:09 +00:00
Cyrus Leung
a991f7d508 [Doc] Basic guide for writing unit tests for new models (#11951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 21:27:24 +08:00
Cyrus Leung
7a3a83e3b8 [CI/Build] Move model-specific multi-modal processing tests (#11934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 13:50:05 +08:00
shaochangxu
c32a7c7c0c [Bugfix] fused_experts_impl wrong compute type for float32 (#11921)
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
2025-01-11 13:49:39 +08:00
Sungjae Lee
2118d0565c [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (#11672)
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
2025-01-10 20:49:38 -08:00
youkaichao
899136b857 [ci] fix broken distributed-tests-4-gpus (#11937)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-11 09:07:24 +08:00
Fred Reiss
c9f09a4fe8 [mypy] Fix mypy warnings in api_server.py (#11941)
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
2025-01-11 01:04:58 +00:00
Travis Johnson
d45cbe70f5 [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#11939)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-01-10 23:26:00 +00:00
minmin
8a579408f3 [Misc] Update benchmark_prefix_caching.py fixed example usage (#11920)
Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn>
Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn>
2025-01-10 20:39:22 +00:00
Isotr0py
46fa98ccad [Misc] Clean up debug code in Deepseek-V3 (#11930)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-10 19:19:15 +00:00
Li, Jiang
aa1e77a19c [Hardware][CPU] Support MOE models on x86 CPU (#11831)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-01-10 11:07:58 -05:00
Kuntai Du
5959564f94 Doc fix in benchmark_long_document_qa_throughput.py (#11933)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-01-10 23:51:43 +08:00
Kuntai Du
f33e033e27 [Docs] Fix docstring in get_ip function (#11932)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-01-10 23:51:02 +08:00
Harry Mellor
482cdc494e [Doc] Rename offline inference examples (#11927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 23:50:29 +08:00
wangxiyuan
20410b2fda [platform] support custom torch.compile backend key (#11318)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-01-10 23:46:51 +08:00
Cyrus Leung
12664ddda5 [Doc] [1/N] Initial guide for merged multi-modal processor (#11925)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-10 14:30:25 +00:00
youkaichao
241ad7b301 [ci] Fix sampler tests (#11922)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-10 20:45:33 +08:00
Harry Mellor
d85c47d6ad Replace "online inference" with "online serving" (#11923)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 12:05:56 +00:00
wangxiyuan
ef725feafc [platform] support pytorch custom op pluggable (#11328)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-10 10:02:38 +00:00
cennn
d907be7dc7 [misc] remove python function call for custom activation op (#11885)
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-01-10 17:18:25 +08:00
youkaichao
d53575a5f0 [ci] fix gh200 tests (#11919)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-10 16:25:17 +08:00
Kunshang Ji
61af633256 [BUGFIX] Fix UnspecifiedPlatform package name (#11916)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-01-10 16:20:46 +08:00
Joe Runde
ac2f3f7fee [Bugfix] Validate lora adapters to avoid crashing server (#11727)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00