Model Roundup
Embedding
Qwen3-Embedding
- Context length: 32K
- Dimensions: 8B (32 to 4096), 4B (32 to 2560), 0.6B (32 to 1024)
- Supported languages: 100+
- Date: 2025-06-05
- Link: https://huggingface.co/Qwen/Qwen3-Embedding-4B
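The dimension ranges listed for Qwen3-Embedding mean an embedding can be shortened to a smaller size. A minimal sketch of how that is typically done, assuming the common Matryoshka-style convention (keep the leading components, then re-normalize). `truncate_embedding` is a hypothetical helper, and the random array only stands in for real model output.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and L2-normalize the result."""
    if not 1 <= dim <= vec.shape[-1]:
        raise ValueError(f"dim must be in [1, {vec.shape[-1]}], got {dim}")
    head = vec[..., :dim]
    norm = np.linalg.norm(head, axis=-1, keepdims=True)
    # Guard against division by zero for an all-zero prefix.
    return head / np.clip(norm, 1e-12, None)

# Stand-in for two full-size embeddings from the 4B model (2560 dims).
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 2560))

small = truncate_embedding(full, 256)
print(small.shape)
```

Smaller vectors cut index size and search cost, usually at some loss in retrieval quality, so the target dimension is worth checking on your own data.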
BAAI/bge-m3
- Context length: 8192
- Dimensions: 1024
- Supported languages: multilingual; unified fine-tuning (dense, sparse, and ColBERT) on top of bge-m3-unsupervised
- Date: 2024
- Link: https://www.modelscope.cn/models/BAAI/bge-m3
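The three retrieval modes named for bge-m3 (dense, sparse, ColBERT) score a query against a document in different ways. The sketch below shows the scoring math only, on toy inputs. It does not call the model, the function names are hypothetical, and averaging the ColBERT MaxSim values over query tokens is an assumption about normalization.

```python
import numpy as np

def dense_score(q: np.ndarray, d: np.ndarray) -> float:
    """Cosine similarity between two single-vector embeddings."""
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

def sparse_score(q_weights: dict, d_weights: dict) -> float:
    """Sum of weight products over tokens shared by query and document."""
    return float(sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights))

def colbert_score(q_vecs: np.ndarray, d_vecs: np.ndarray) -> float:
    """MaxSim: best document-token match per query token, averaged."""
    qn = q_vecs / np.linalg.norm(q_vecs, axis=1, keepdims=True)
    dn = d_vecs / np.linalg.norm(d_vecs, axis=1, keepdims=True)
    sim = qn @ dn.T                      # (query tokens, document tokens)
    return float(sim.max(axis=1).mean())

# Toy inputs standing in for model outputs.
dense = dense_score(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
sparse = sparse_score({"cat": 0.5, "sat": 0.2}, {"cat": 0.4, "mat": 0.9})
colbert = colbert_score(np.array([[1.0, 0.0], [0.0, 1.0]]),
                        np.array([[1.0, 0.0], [0.0, 2.0]]))
print(dense, sparse, colbert)
```

In a hybrid setup the three scores are usually combined with a weighted sum; the weights are a tuning choice.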
Reranker
Qwen3-Reranker
- Specs are largely the same as Qwen3-Embedding
- Link: https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-4B
- Easy-to-deploy variant: Qwen3-Reranker-4B (can be served directly with vLLM) · Model Hub
BAAI/bge-reranker-v2-m3
TTS
- Qwen3-TTS:
- Official release: https://github.com/QwenLM/Qwen3-TTS (inference was very slow in practical testing). A community-optimized version: https://github.com/dffdeeq/Qwen3-TTS-streaming
- MOSS-TTSD:
- Index-TTS:
- CosyTTS:
- Boson AI:
- VibeVoice:
- GLM-TTS:
ASR
- whisper:
- WhisperLiveKit:
- GLM-ASR-Nano-2512:
- https://www.modelscope.cn/models/ZhipuAI/GLM-ASR-Nano-2512
- Reportedly better than Whisper v3.
Translation
- Seed-X-PPO-7B · Model Hub
- HY-MT1.5-7B · Model Hub: Tencent's latest Hunyuan translation model.
- 2025-12-30
Lip sync
- LatentSync: bytedance/LatentSync: Taming Stable Diffusion for Lip Sync!
- InfiniteTalk: MeiGen-AI/InfiniteTalk: Unlimited-length talking video generation that supports image-to-video and video-to-video generation
- MultiTalk: https://github.com/MeiGen-AI/MultiTalk
- Ditto: https://github.com/antgroup/ditto-talkinghead
- InfiniteYou: InfiniteYou | ByteDance Intelligent Creation
Video generation
- Tongyi Wan series:
- Wan-Video/Wan2.1: Wan: Open and Advanced Large-Scale Video Generative Models
- Wan-Video/Wan2.2: Wan: Open and Advanced Large-Scale Video Generative Models
- Wan2.2-S2V-14B: audio-driven cinematic video generation model.
- Tencent, built on Wan2.1: WeChatCV/Stand-In: Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation.
Image generation
- INT4/FP4 quantization for acceleration:
- Z-Image:
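For the INT4 quantization mentioned in this section, the basic idea can be sketched as symmetric per-row rounding of weights to the signed 4-bit range. This is a generic illustration, not the method of any specific project listed here. The function names are hypothetical, and values are held in `int8` for simplicity, whereas real kernels pack two 4-bit values per byte.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-row quantization to the signed 4-bit range [-8, 7]."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)   # avoid dividing by zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map the integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale

# Random weights stand in for a real layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)

q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

The rounding error per weight is bounded by half the row's scale, which is why outlier-heavy rows lose the most precision under a scheme this simple.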
Music generation
- ASLP-lab/DiffRhythm: Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
- https://suno.com