biondizzle/vllm
Commit: f2036734fbf6d4b119d9362dddb8b4a6954e3591
Path: vllm/vllm/model_executor
Latest commit: f2036734fb [ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (#18160)
Author: Pavani Majety (Signed-off-by: Pavani Majety <pmajety@nvidia.com>)
Date: 2025-05-23 15:52:20 -07:00
guided_decoding/
    [V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (#17731) | 2025-05-23 03:38:23 -07:00
layers/
    [ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (#18160) | 2025-05-23 15:52:20 -07:00
model_loader/
    [Bugfix] Fix transformers model impl ignored for mixtral quant (#18602) | 2025-05-23 05:54:13 -07:00
models/
    [CI] Enable test_initialization to run on V1 (#16736) | 2025-05-23 15:09:44 -07:00
__init__.py
    [Misc] Add SPDX-License-Identifier headers to python source files (#12628) | 2025-02-02 11:58:18 -08:00
custom_op.py
    Update some more deprecated type hinting (#17998) | 2025-05-12 23:49:33 +00:00
parameter.py
    [Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036) | 2025-04-22 09:01:36 +01:00
pooling_metadata.py
    Update some more deprecated type hinting (#17998) | 2025-05-12 23:49:33 +00:00
sampling_metadata.py
    Update some more deprecated type hinting (#17998) | 2025-05-12 23:49:33 +00:00
utils.py
    Update some more deprecated type hinting (#17998) | 2025-05-12 23:49:33 +00:00