Commit Graph

  • 3ea7b94523 Move linting to pre-commit (#11975) Harry Mellor 2025-01-20 06:58:01 +00:00
  • 51ef828f10 [torch.compile] fix sym_tensor_indices (#12191) youkaichao 2025-01-20 11:37:50 +08:00
  • df450aa567 [Bugfix] Fix num_heads value for simple connector when tp enabled (#12074) shangmingc 2025-01-20 10:56:43 +08:00
  • bbe5f9de7d [Model] Support for fairseq2 Llama (#11442) Martin Gleize 2025-01-19 19:40:40 +01:00
  • 81763c58a0 [V1] Add V1 support of Qwen2-VL (#12128) Roger Wang 2025-01-19 03:52:13 -08:00
  • edaae198e7 [Misc] Add BNB support to GLM4-V model (#12184) Isotr0py 2025-01-19 19:49:22 +08:00
  • 936db119ed benchmark_serving support --served-model-name param (#12109) gujing 2025-01-19 17:59:56 +08:00
  • e66faf4809 [torch.compile] store inductor compiled Python file (#12182) youkaichao 2025-01-19 16:27:26 +08:00
  • 630eb5b5ce [Bugfix] Fix multi-modal processors for transformers 4.48 (#12187) Cyrus Leung 2025-01-19 11:16:34 +08:00
  • 4e94951bb1 [BUGFIX] Move scores to float32 in case of running xgrammar on cpu (#12152) Michal Adamczyk 2025-01-19 04:12:05 +01:00
  • 7a8a48d51e [V1] Collect env var for usage stats (#12115) Simon Mo 2025-01-18 19:07:15 -08:00
  • 32eb0da808 [Misc] Support register quantization method out-of-tree (#11969) yancong 2025-01-19 08:13:16 +08:00
  • 6d0e3d3724 [core] clean up executor class hierarchy between v1 and v0 (#12171) youkaichao 2025-01-18 14:35:15 +08:00
  • 02798ecabe [Model] Port deepseek-vl2 processor, remove dependency (#12169) Isotr0py 2025-01-18 13:59:39 +08:00
  • 813f249f02 [Docs] Fix broken link in SECURITY.md (#12175) Russell Bryant 2025-01-17 23:35:21 -05:00
  • da02cb4b27 [core] further polish memory profiling (#12126) youkaichao 2025-01-18 12:25:08 +08:00
  • c09503ddd6 [AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172) Hongxia Yang 2025-01-17 22:15:53 -05:00
  • 2b83503227 [misc] fix cross-node TP (#12166) youkaichao 2025-01-18 10:53:27 +08:00
  • 7b98a65ae6 [torch.compile] disable logging when cache is disabled (#12043) youkaichao 2025-01-18 04:29:31 +08:00
  • b5b57e301e [AMD][FP8] Using MI300 FP8 format on ROCm for block_quant (#12134) Gregory Shtrasberg 2025-01-17 12:12:26 -05:00
  • 54cacf008f [Bugfix] Mistral tokenizer encode accept list of str (#12149) Kunshang Ji 2025-01-18 00:47:53 +08:00
  • 58fd57ff1d [Bugfix] Fix score api for missing max_model_len validation (#12119) Wallas Henrique 2025-01-17 13:24:22 -03:00
  • 87a0c076af [core] allow callable in collective_rpc (#12151) youkaichao 2025-01-17 20:47:01 +08:00
  • d4e6194570 [CI/Build][CPU][Bugfix] Fix CPU CI (#12150) Li, Jiang 2025-01-17 19:39:52 +08:00
  • 07934cc237 [Misc][LoRA] Improve the readability of LoRA error messages (#12102) Jee Jee Li 2025-01-17 19:32:28 +08:00
  • 69d765f5a5 [V1] Move more control of kv cache initialization from model_executor to EngineCore (#11960) Chen Zhang 2025-01-17 15:39:35 +08:00
  • 8027a72461 [ROCm][MoE] moe tuning support for rocm (#12049) Divakar Verma 2025-01-17 00:49:16 -06:00
  • d75ab55f10 [Misc] Add deepseek_vl2 chat template (#12143) Isotr0py 2025-01-17 14:34:48 +08:00
  • d1adb9b403 [BugFix] add more is not None check in VllmConfig.__post_init__ (#12138) Chen Zhang 2025-01-17 13:33:22 +08:00
  • b8bfa46a18 [Bugfix] Fix issues in CPU build Dockerfile (#12135) Yuan Tang 2025-01-16 23:54:01 -05:00
  • 1475847a14 [Doc] Add instructions on using Podman when SELinux is active (#12136) Yuan Tang 2025-01-16 23:45:36 -05:00
  • fead53ba78 [CI]add genai-perf benchmark in nightly benchmark (#10704) Kunshang Ji 2025-01-17 12:15:09 +08:00
  • ebc73f2828 [Bugfix] Fix a path bug in disaggregated prefill example script. (#12121) Kuntai Du 2025-01-17 11:12:41 +08:00
  • d06e824006 [Bugfix] Set enforce_eager automatically for mllama (#12127) Chen Zhang 2025-01-17 04:30:08 +08:00
  • 62b06ba23d [Model] Add support for deepseek-vl2-tiny model (#12068) Isotr0py 2025-01-17 01:14:48 +08:00
  • 5fd24ec02e [misc] Add LoRA kernel micro benchmarks (#11579) Varun Sundar Rabindranath 2025-01-16 21:21:40 +05:30
  • 874f7c292a [Bugfix] Fix max image feature size for Llava-one-vision (#12104) Roger Wang 2025-01-16 06:54:06 -08:00
  • 92e793d91a [core] LLM.collective_rpc interface and RLHF example (#12084) youkaichao 2025-01-16 20:19:52 +08:00
  • bf53e0c70b Support torchrun and SPMD-style offline inference (#12071) youkaichao 2025-01-16 19:58:53 +08:00
  • dd7c9ad870 [Bugfix] Remove hardcoded head_size=256 for Deepseek v2 and v3 (#12067) Isotr0py 2025-01-16 18:11:54 +08:00
  • 9aa1519f08 Various cosmetic/comment fixes (#12089) Michael Goin 2025-01-16 04:59:06 -05:00
  • f8ef146f03 [Doc] Add documentation for specifying model architecture (#12105) Cyrus Leung 2025-01-16 15:53:43 +08:00
  • fa0050db08 [Core] Default to using per_token quantization for fp8 when cutlass is supported. (#8651) Elfie Guo 2025-01-15 20:31:27 -08:00
  • cd9d06fb8d Allow hip sources to be directly included when compiling for rocm. (#12087) tvirolai-amd 2025-01-15 23:46:03 +02:00
  • ebd8c669ef [Bugfix] Fix _get_lora_device for HQQ marlin (#12090) Varun Sundar Rabindranath 2025-01-16 01:29:42 +05:30
  • 70755e819e [V1][Core] Autotune encoder cache budget (#11895) Roger Wang 2025-01-15 11:29:00 -08:00
  • edce722eaa [Bugfix] use right truncation for non-generative tasks (#12050) Joe Runde 2025-01-15 09:31:01 -07:00
  • 57e729e874 [Doc]: Update OpenAI-Compatible Server documents (#12082) maang-h 2025-01-16 00:07:45 +08:00
  • de0526f668 [Misc][Quark] Upstream Quark format to VLLM (#10765) kewang-xlnx 2025-01-16 00:05:15 +08:00
  • 5ecf3e0aaf Misc: allow to use proxy in HTTPConnection (#12042) Yuan 2025-01-15 21:16:40 +08:00
  • 97eb97b5a4 [Model]: Support internlm3 (#12037) RunningLeon 2025-01-15 19:35:17 +08:00
  • 3adf0ffda8 [Platform] Do not raise error if _Backend is not found (#12023) wangxiyuan 2025-01-15 18:14:15 +08:00
  • ad388d25a8 Type-fix: make execute_model output type optional (#12020) Keyun Tong 2025-01-15 01:44:56 -08:00
  • cbe94391eb Fix: cases with empty sparsity config (#12057) Rahul Tuli 2025-01-15 04:41:24 -05:00
  • 994fc655b7 [V1][Prefix Cache] Move the logic of num_computed_tokens into KVCacheManager (#12003) Chen Zhang 2025-01-15 15:55:30 +08:00
  • 3f9b7ab9f5 [Doc] Update examples to remove SparseAutoModelForCausalLM (#12062) Kyle Sayers 2025-01-15 01:36:01 -05:00
  • ad34c0df0f [core] platform agnostic executor via collective_rpc (#11256) youkaichao 2025-01-15 13:45:21 +08:00
  • f218f9c24d [core] Turn off GPU communication overlap for Ray executor (#12051) Rui Qiao 2025-01-14 21:19:55 -08:00
  • 0794e7446e [Misc] Add multipstep chunked-prefill support for FlashInfer (#10467) Elfie Guo 2025-01-14 20:47:49 -08:00
  • b7ee940a82 [V1][BugFix] Fix edge case in VLM scheduling (#12065) Woosuk Kwon 2025-01-14 20:21:28 -08:00
  • 9ddac56311 [Platform] move current_memory_usage() into platform (#11369) Shanshan Shen 2025-01-15 11:38:25 +08:00
  • 1a51b9f872 [HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py (#12046) Konrad Zawora 2025-01-15 03:59:18 +01:00
  • 42f5e7c52a [Kernel] Support MulAndSilu (#11624) Jee Jee Li 2025-01-15 10:29:53 +08:00
  • a3a3ee4e6f [Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924) Jee Jee Li 2025-01-15 07:49:49 +08:00
  • 87054a57ab [Doc]: Update the Json Example of the Engine Arguments document (#12045) maang-h 2025-01-15 01:03:04 +08:00
  • c9d6ff530b Explain where the engine args go when using Docker (#12041) Harry Mellor 2025-01-14 16:05:50 +00:00
  • a2d2acb4c8 [Bugfix][Kernel] Give unique name to BlockSparseFlashAttention (#12040) Chen Zhang 2025-01-14 23:45:05 +08:00
  • 2e0e017610 [Platform] Add output for Attention Backend (#11981) wangxiyuan 2025-01-14 21:27:04 +08:00
  • 1f18adb245 [Kernel] Revert the API change of Attention.forward (#12038) Chen Zhang 2025-01-14 20:59:32 +08:00
  • bb354e6b2d [Bugfix] Fix various bugs in multi-modal processor (#12031) Cyrus Leung 2025-01-14 20:16:11 +08:00
  • ff39141a49 [HPU][misc] add comments for explanation (#12034) youkaichao 2025-01-14 19:24:06 +08:00
  • 8a1f938e6f [Doc] Update Quantization Hardware Support Documentation (#12025) TJian 2025-01-14 12:37:52 +08:00
  • 078da31903 [HPU][Bugfix] set_forward_context and CI test execution (#12014) Konrad Zawora 2025-01-14 04:04:18 +01:00
  • 1a401252b5 [Docs] Add Sky Computing Lab to project intro (#12019) Woosuk Kwon 2025-01-13 17:24:36 -08:00
  • f35ec461fc [Bugfix] Fix deepseekv3 gate bias error (#12002) Steve Luo 2025-01-14 04:43:51 +08:00
  • 289b5191d5 [Doc] Fix build from source and installation link in README.md (#12013) Yikun Jiang 2025-01-14 01:23:59 +08:00
  • c6db21313c bugfix: Fix signature mismatch in benchmark's get_tokenizer function (#11982) elijah 2025-01-13 23:22:07 +08:00
  • a7d59688fb [Platform] Move get_punica_wrapper() function to Platform (#11516) Shanshan Shen 2025-01-13 21:12:10 +08:00
  • 458e63a2c6 [platform] add device_control env var (#12009) youkaichao 2025-01-13 20:59:09 +08:00
  • e8c23ff989 [Doc] Organise installation documentation into categories and tabs (#11935) Harry Mellor 2025-01-13 12:27:36 +00:00
  • cd8249903f [Doc][V1] Update model implementation guide for V1 support (#11998) Roger Wang 2025-01-13 03:58:54 -08:00
  • 0f8cafe2d1 [Kernel] unified_attention for Attention.forward (#11967) Chen Zhang 2025-01-13 19:28:53 +08:00
  • 5340a30d01 Fix Max Token ID for Qwen-VL-Chat (#11980) Alex Brooks 2025-01-13 01:37:48 -07:00
  • 89ce62a316 [platform] add ray_device_key (#11948) youkaichao 2025-01-13 16:20:52 +08:00
  • c3f05b09a0 [Misc]Minor Changes about Worker (#11555) Chenguang Li 2025-01-13 15:47:05 +08:00
  • cf6bbcb493 [Misc] Fix Deepseek V2 fp8 kv-scale remapping (#11947) Concurrensee 2025-01-13 01:05:06 -06:00
  • 80ea3af1a0 [CI][Spec Decode] fix: broken test for EAGLE model (#11972) Sungjae Lee 2025-01-13 15:50:35 +09:00
  • 9dd02d85ca [Bug] Fix usage of .transpose() and .view() consecutively. (#11979) Siyuan Li 2025-01-13 14:24:10 +08:00
  • f7b3ba82c3 [MISC] fix typo in kv transfer send recv test (#11983) Yangcheng Li 2025-01-13 13:07:48 +08:00
  • 619ae268c3 [V1] [2/n] Logging and Metrics - OutputProcessor Abstraction (#11973) Robert Shaw 2025-01-12 23:54:10 -05:00
  • d14e98d924 [Model] Support GGUF models newly added in transformers 4.46.0 (#9685) Isotr0py 2025-01-13 08:13:44 +08:00
  • 9597a095f2 [V1][Core][1/n] Logging and Metrics (#11962) Robert Shaw 2025-01-12 16:02:02 -05:00
  • 263a870ee1 [Hardware][TPU] workaround fix for MoE on TPU (#11764) Avshalom Manevich 2025-01-12 17:53:51 +02:00
  • 8bddb73512 [Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100) Akshat Tripathi 2025-01-12 13:01:52 +00:00
  • f967e51f38 [Model] Initialize support for Deepseek-VL2 models (#11578) Isotr0py 2025-01-12 16:17:24 +08:00
  • 43f3d9e699 [CI/Build] Add markdown linter (#11857) Rafael Vasquez 2025-01-12 03:17:13 -05:00
  • b25cfab9a0 [V1] Avoid sending text prompt to core engine (#11963) Roger Wang 2025-01-11 22:36:38 -08:00
  • 4b657d3292 [Model] Add cogagent model support vLLM (#11742) sixgod 2025-01-12 03:05:56 +08:00
  • d697dc01b4 [Bugfix] Fix RobertaModel loading (#11940) Nicolò Lucchesi 2025-01-11 15:05:09 +01:00
  • a991f7d508 [Doc] Basic guide for writing unit tests for new models (#11951) Cyrus Leung 2025-01-11 21:27:24 +08:00