biondizzle/vllm
vllm/v1/worker at commit 8958217ad5a6830c4d911e5f15e6eb791df337b6
Latest commit: 8958217ad5 by Hiroaki Sugiyama, 2025-03-27 22:29:29 +08:00
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211)
Signed-off-by: h-sugi <h.sugi@ieee.org>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
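For context on the headline fix: cascade attention lets requests that share a common prompt prefix reuse one attention computation over that prefix, and the commit title indicates this path needed special handling for ALiBi-based models. Below is a minimal hypothetical sketch of such a guard; the function name, signature, and AttentionSpec type are invented for illustration and are not taken from the actual patch in gpu_model_runner.py.

    from dataclasses import dataclass

    @dataclass
    class AttentionSpec:
        """Minimal stand-in for a model's attention configuration (hypothetical)."""
        use_alibi: bool          # model uses ALiBi positional biases
        common_prefix_len: int   # tokens shared by all requests in the batch

    def use_cascade_attention(spec: AttentionSpec) -> bool:
        """Decide whether the cascade-attention path may be taken."""
        if spec.use_alibi:
            # Assumption mirrored from the commit title: ALiBi-based models
            # skip cascade attention and use the regular attention path.
            return False
        # Cascade only pays off when the batch actually shares a prefix.
        return spec.common_prefix_len > 0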
File                        Last commit                                                                             Date
__init__.py                 [V1] Implement vLLM V1 [1/N] (#9289)                                                    2024-10-22 01:24:07 -07:00
block_table.py              Update deprecated Python 3.8 typing (#13971)                                            2025-03-02 17:34:51 -08:00  (see typing sketch below)
gpu_input_batch.py          [V1] Aggregate chunked prompt logprobs in model runner (#14875)                         2025-03-24 12:27:57 -04:00
gpu_model_runner.py         [Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211)  2025-03-27 22:29:29 +08:00
gpu_worker.py               [v1] Refactor KVCacheConfig (#14079)                                                    2025-03-21 04:56:27 -07:00
lora_model_runner_mixin.py  [Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)                                  2025-03-18 09:47:53 +00:00
tpu_model_runner.py         [TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583)        2025-03-26 22:46:26 -07:00
tpu_worker.py               [TPU] support disabling xla compilation cache (#15567)                                  2025-03-27 00:09:28 +00:00
worker_base.py              [v1] Refactor KVCacheConfig (#14079)                                                    2025-03-21 04:56:27 -07:00
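The block_table.py entry above records a typing modernization: PEP 585 (Python 3.9+) allows builtin containers such as list and dict to be used directly as generic types, retiring the typing.List/typing.Dict aliases that Python 3.8 required. A generic before/after sketch follows; the function and variable names are hypothetical, not taken from block_table.py.

    from __future__ import annotations  # lets `X | None` annotations parse on older 3.x

    # Python 3.8 style that this kind of change retires:
    #   from typing import Dict, List, Optional
    #   def get_block_ids(table: Dict[int, List[int]], seq_id: int) -> Optional[List[int]]: ...

    def get_block_ids(table: dict[int, list[int]], seq_id: int) -> list[int] | None:
        """Return the block IDs recorded for a sequence, or None if absent."""
        return table.get(seq_id)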