This image packages the Qwen3-ASR speech-recognition models together with a vLLM-based serving environment, exposing an OpenAI-compatible API and a web demo.
Base environment required to build and run this image.
Note: both the 0.6B and 1.7B models are supported; replace the model path in the commands below with the one you want to use.
cd /workspace
conda activate qwen-asr
qwen-asr-serve ./Qwen3-ASR-0.6B --gpu-memory-utilization 0.8 --host 0.0.0.0 --port 8000
When the command line shows:

the API is ready to be accessed.
Example curl request:
curl http://117.50.198.70:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": [
        {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"}}
      ]}
    ]
  }'
Example curl response:
{"id":"chatcmpl-8883581480530eb3","object":"chat.completion","created":1770027449,"model":"Qwen/Qwen3-ASR-1.7B","choices":[{"index":0,"message":{"role":"assistant","content":"language English<asr_text>Uh huh. Oh yeah, yeah. He wasn't even that big when I started listening to him, but and his solo music didn't do overly well, but he did very well when he started writing for other people.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":211,"total_tokens":260,"completion_tokens":49,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
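As the response above shows, the assistant `content` field packs the detected language and the transcript into one string separated by an `<asr_text>` marker. A small helper (a sketch assuming the `language X<asr_text>...` format shown above) can split the two:

```python
import re

def parse_asr_content(content: str) -> tuple[str, str]:
    """Split Qwen3-ASR chat output into (language, transcript)."""
    m = re.match(r"language\s+(.*?)<asr_text>(.*)", content, re.DOTALL)
    if not m:
        # Fall back to the raw content if the marker is absent.
        return "unknown", content.strip()
    return m.group(1).strip(), m.group(2).strip()

lang, text = parse_asr_content("language English<asr_text>Uh huh. Oh yeah, yeah.")
```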
Response time:
The OpenAI Python client can also be used:
from openai import OpenAI

# Initialize client (replace "ip" with the server's address)
client = OpenAI(
    base_url="http://ip:8000/v1",
    api_key="EMPTY"
)

# Create multimodal chat completion request
response = client.chat.completions.create(
    model="./Qwen3-ASR-0.6B",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "audio_url",
                    "audio_url": {
                        "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
                    }
                }
            ]
        }
    ],
)
print(response.choices[0].message.content)
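The examples above reference a remote URL. For local audio files, a common vLLM pattern (an assumption here, not verified against this image) is to send the bytes as a base64 `data:` URL in the same `audio_url` field. A helper that builds such a message:

```python
import base64

def build_audio_message(wav_bytes: bytes) -> dict:
    """Wrap raw WAV bytes as a chat message carrying a base64 data URL."""
    b64 = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "audio_url",
                "audio_url": {"url": f"data:audio/wav;base64,{b64}"},
            }
        ],
    }

# Placeholder bytes for illustration; read a real .wav file in practice.
msg = build_audio_message(b"RIFF....WAVE")
```

The resulting `msg` dict can be passed in the `messages` list exactly like the URL-based message above.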
Response time:
vLLM's OpenAI-compatible transcription API is also supported:
import httpx
from openai import OpenAI

# Initialize client
client = OpenAI(
    base_url="http://117.50.198.70:8000/v1",
    api_key="EMPTY"
)

audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
audio_file = httpx.get(audio_url).content

transcription = client.audio.transcriptions.create(
    model="./Qwen3-ASR-0.6B",
    file=audio_file,
)
print(transcription.text)
Response time:
First, make sure you are in the correct directory and conda environment:
cd /workspace
conda activate qwen-asr
# 1.7B model: vLLM backend + Forced Aligner (timestamps enabled)
qwen-asr-demo \
--asr-checkpoint ./Qwen3-ASR-1.7B \
--aligner-checkpoint ./Qwen3-ForcedAligner-0.6B \
--backend vllm \
--cuda-visible-devices 0 \
--backend-kwargs '{"gpu_memory_utilization":0.7,"max_inference_batch_size":8,"max_new_tokens":2048}' \
--aligner-kwargs '{"device_map":"cuda:0","dtype":"bfloat16"}' \
--ip 0.0.0.0 --port 8000
# 0.6B model: vLLM backend + Forced Aligner (timestamps enabled)
qwen-asr-demo \
--asr-checkpoint ./Qwen3-ASR-0.6B \
--aligner-checkpoint ./Qwen3-ForcedAligner-0.6B \
--backend vllm \
--cuda-visible-devices 0 \
--backend-kwargs '{"gpu_memory_utilization":0.7,"max_inference_batch_size":8,"max_new_tokens":2048}' \
--aligner-kwargs '{"device_map":"cuda:0","dtype":"bfloat16"}' \
--ip 0.0.0.0 --port 8000
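`--backend-kwargs` and `--aligner-kwargs` take JSON strings, where a stray quote or comma silently breaks the launch. A quick way to validate an edited value before passing it on the command line (plain Python, independent of this image):

```python
import json

backend_kwargs = '{"gpu_memory_utilization":0.7,"max_inference_batch_size":8,"max_new_tokens":2048}'
parsed = json.loads(backend_kwargs)  # raises json.JSONDecodeError if malformed
```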
Open http://ip:8000 in a browser.

The 0.6B model occupies about 15 GB of GPU memory.
# vLLM backend only (no Forced Aligner, timestamps disabled)
qwen-asr-demo \
--asr-checkpoint ./Qwen3-ASR-0.6B \
--backend vllm \
--cuda-visible-devices 0 \
--backend-kwargs '{"gpu_memory_utilization":0.7,"max_inference_batch_size":8,"max_new_tokens":2048}' \
--ip 0.0.0.0 --port 8000
Open http://ip:8000 in a browser.

Create a self-signed SSL certificate (HTTPS is required by browsers for microphone access):
openssl req -x509 -newkey rsa:2048 \
-keyout key.pem -out cert.pem \
-days 3650 -nodes \
-subj "/CN=localhost"
qwen-asr-demo \
--asr-checkpoint ./Qwen3-ASR-0.6B \
--aligner-checkpoint ./Qwen3-ForcedAligner-0.6B \
--backend vllm \
--cuda-visible-devices 0 \
--backend-kwargs '{"gpu_memory_utilization":0.7,"max_inference_batch_size":8,"max_new_tokens":2048}' \
--aligner-kwargs '{"device_map":"cuda:0","dtype":"bfloat16"}' \
--ssl-certfile cert.pem \
--ssl-keyfile key.pem \
--no-ssl-verify
Open https://ip:8000; the browser microphone can then be used.
If the page cannot be reached, check that port 8000 is open.
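To check whether port 8000 is reachable from another machine, a minimal TCP probe (generic Python, not part of this image) is often quicker than debugging the browser:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False from the client machine but the server is running, the port is likely blocked by a firewall or security-group rule.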
