biondizzle/vllm
vllm/vllm/model_executor at 9de25c294b92e42a12d1fbbb3ab3f633fa80291c
Latest commit: d272415e57 by Dipika Sikka, 2025-08-27 05:00:21 +00:00
[Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs (#22674)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
Name | Last commit | Date
layers | [Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs (#22674) | 2025-08-27 05:00:21 +00:00
model_loader | [Quantization] Allow GGUF quantization to skip unquantized layer (#23188) | 2025-08-22 13:04:22 -06:00
models | [Model] Add Ernie4.5 VL Model Support (#22514) | 2025-08-26 21:02:55 -07:00
warmup | [Kernel] Add nvfp4 gemm flashinfer backends (#22346) | 2025-08-14 16:03:55 -04:00
__init__.py | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00
custom_op.py | Optimize configuration access with LRU cache in custom ops (#22204) | 2025-08-04 21:43:24 -07:00
parameter.py | [Misc] Add SPDX-FileCopyrightText (#19100) | 2025-06-03 11:20:17 -07:00
pooling_metadata.py | [Performance] V1 Pooling Models E2E Performance Optimization (#23162) | 2025-08-21 13:26:09 +00:00
sampling_metadata.py | Revert "Update sampling_metadata.py (#21937)" (#22088) | 2025-08-01 05:24:46 -07:00
utils.py | [Quantization] Enable BNB support for InternS1 (#21953) | 2025-08-01 11:09:54 +00:00