Model Roundup

Embedding

Qwen3-Embedding

Context length: 32K
Dimensions: 8B (32-4096), 4B (32-2560), 0.6B (32-1024)
Supported languages: 100+
Date: 2025-06-05
Link: https://huggingface.co/Qwen/Qwen3-Embedding-4B
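
The dimension ranges listed for Qwen3-Embedding suggest the output size can be chosen per deployment. A minimal sketch of what that could look like, assuming Matryoshka-style embeddings where a prefix of the full vector stays usable once it is re-normalized. The `truncate_embedding` helper and the toy vector are hypothetical and not part of any Qwen API:

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and L2-normalize the result."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]

# Toy 8-dimensional vector standing in for a real model output (assumption).
full = [0.5, -0.5, 0.5, 0.5, 0.1, -0.1, 0.0, 0.2]
small = truncate_embedding(full, 4)
print(len(small), small)
```

Smaller vectors cut index size and search cost, usually at some cost in retrieval quality, so the chosen size is worth validating on your own data.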

BAAI/bge-m3

Context length: 8192
Dimensions: 1024
Supported languages: multilingual; unified fine-tuning (dense, sparse, and ColBERT) on top of bge-m3-unsupervised
Date: 2024
Link: https://www.modelscope.cn/models/BAAI/bge-m3
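
To make the three retrieval modes named for bge-m3 concrete, here is a rough sketch of how dense, sparse, and ColBERT-style scores are typically computed. All vectors and token weights are made-up toy values, and the function names are hypothetical rather than taken from the model's library:

```python
def dense_score(q_vec, d_vec):
    """Dot product of two sentence vectors (assumed already normalized)."""
    return sum(a * b for a, b in zip(q_vec, d_vec))

def sparse_score(q_weights, d_weights):
    """Sum of weight products over tokens that occur in both texts."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def colbert_score(q_tokens, d_tokens):
    """Late interaction: for each query token take its best-matching
    document token, then average over the query tokens."""
    best = [max(dense_score(q, d) for d in d_tokens) for q in q_tokens]
    return sum(best) / len(best)

# Toy inputs, invented for illustration only.
q_dense, d_dense = [1.0, 0.0], [0.6, 0.8]
q_sparse = {"cat": 0.5, "sat": 0.2}
d_sparse = {"cat": 0.4, "mat": 0.3}
q_tok = [[1.0, 0.0], [0.0, 1.0]]
d_tok = [[0.6, 0.8], [1.0, 0.0]]

print(dense_score(q_dense, d_dense))
print(sparse_score(q_sparse, d_sparse))
print(colbert_score(q_tok, d_tok))
```

The three scores can be used separately or combined with weights; which weighting works best depends on the task.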

Reranker

Qwen3-Reranker

Largely the same as Qwen3-Embedding
Link: https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-4B
Easy deployment: Qwen3-Reranker-4B (can be deployed directly with vLLM) · Model Hub

BAAI/bge-reranker-v2-m3

Largely the same as BAAI/bge-m3
Link: https://www.modelscope.cn/models/BAAI/bge-reranker-v2-m3
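
Embedding models and rerankers are usually paired in a two-stage pipeline: a cheap scorer shortlists candidates and a costlier one reorders the shortlist. The sketch below shows only that control flow. Both scorers are simple word-matching stand-ins, not real models, and every name in it is hypothetical:

```python
def retrieve_then_rerank(query, docs, first_stage, second_stage, top_k=2):
    """Shortlist docs with a cheap scorer, then reorder with a costlier one."""
    shortlist = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:top_k]
    return sorted(shortlist, key=lambda d: second_stage(query, d), reverse=True)

def word_overlap(query, doc):
    """Stand-in for embedding similarity: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def bigram_overlap(query, doc):
    """Stand-in for a reranker: number of shared adjacent word pairs."""
    def bigrams(text):
        words = text.lower().split()
        return set(zip(words, words[1:]))
    return len(bigrams(query) & bigrams(doc))

docs = [
    "vllm with reranker deploy notes",
    "deploy reranker quickly",
    "embedding models for search",
]
ranked = retrieve_then_rerank("deploy reranker with vllm", docs,
                              word_overlap, bigram_overlap)
print(ranked)
```

In this toy run the first stage prefers the document with the most shared words, while the second stage promotes the one that keeps the query's word order.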

TTS

Qwen3-TTS:
- Official version: https://github.com/QwenLM/Qwen3-TTS . In my own testing, however, inference was very inefficient; someone has published an optimized version: https://github.com/dffdeeq/Qwen3-TTS-streaming
MOSS-TTSD:
- The demos they published look quite impressive: https://mp.weixin.qq.com/s/GbqGNl6wW_T-C0ChQ49XFw
- Link: https://github.com/OpenMOSS/MOSS-TTSD
Index-TTS:
- https://github.com/index-tts/index-tts.git
CosyTTS:
Boson AI:
- boson-ai/higgs-audio: Text-audio foundation model from Boson AI
VibeVoice:
- https://www.modelscope.cn/models/microsoft/VibeVoice-Realtime-0.5B
GLM-TTS:
- GLM-TTS · Model Hub

ASR

Whisper:
- openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
WhisperLiveKit:
- QuentinFuxa/WhisperLiveKit: Real-time & local speech-to-text, translation, and speaker diarization. With server & web UI.
GLM-ASR-Nano-2512:
- https://www.modelscope.cn/models/ZhipuAI/GLM-ASR-Nano-2512
- Reportedly better than Whisper v3.

Translation

Seed-X-PPO-7B · Model Hub
HY-MT1.5-7B · Model Hub - Tencent's latest Hunyuan translation model.
- 2025-12-30

Lip Sync

LatentSync: bytedance/LatentSync: Taming Stable Diffusion for Lip Sync!
InfiniteTalk: MeiGen-AI/InfiniteTalk: Unlimited-length talking video generation that supports image-to-video and video-to-video generation
MultiTalk: https://github.com/MeiGen-AI/MultiTalk
Ditto: https://github.com/antgroup/ditto-talkinghead
InfiniteYou: InfiniteYou | ByteDance Intelligent Creation

Video Generation

Tongyi Wan series:
- Wan-Video/Wan2.1: Wan: Open and Advanced Large-Scale Video Generative Models
- Wan-Video/Wan2.2: Wan: Open and Advanced Large-Scale Video Generative Models
  - Wan2.2-S2V-14B: audio-driven cinematic video generation model.
Tencent, based on Wan2.1: WeChatCV/Stand-In: Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation.

Image Generation

INT4/FP4 quantization for acceleration:
- ComfyUI-nunchaku Documentation — ComfyUI-nunchaku 1.0.0 documentation
Z-Image:
- https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo
- Tried it briefly; both quality and speed are very good.
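
For readers unfamiliar with the INT4 quantization mentioned in this section, the sketch below shows the basic idea with plain symmetric round-to-nearest quantization of a few weights. This is a deliberately simple illustration and not the scheme the linked project implements; the function names and sample weights are hypothetical:

```python
def quantize_int4(weights):
    """Map floats to integers in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest step, then clamp to the signed 4-bit range.
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [v * scale for v in q]

weights = [0.7, -0.3, 0.12, 0.0]  # toy values, not real model weights
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
print(q, restored)
```

Each weight is stored in 4 bits instead of 16 or 32, which shrinks memory traffic; the price is a rounding error of at most half a quantization step per weight in this simple scheme.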

Music Generation

TalkingHead-Related

FLAME-related ecosystem:
- radekd91/inferno: 🔥🔥🔥 Set the world of 3D faces on fire with INFERNO 🔥🔥🔥