
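# Speech Recognition

> A closer look at the features and best practices of the 「聽有 AI」 speech recognition technology

## Speech Recognition Overview

「聽有 AI」 is a speech recognition technology developed by 軟雲 and optimized for the Taiwan market. It supports Mandarin, Taiwanese, English, and mixed-language recognition, and reaches an industry-leading accuracy of 97.5%.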

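### Core Features

<CardGroup cols={2}>
  <Card title="Mixed-Language Recognition" icon="language">
    Seamless support for speech that mixes Mandarin, Taiwanese, and English, with automatic detection of language switches
  </Card>

  <Card title="Real-Time Processing" icon="bolt">
    Latency under 200 ms, suited to real-time applications
  </Card>

  <Card title="High Accuracy" icon="bullseye">
    97.5% recognition accuracy, with ongoing model improvements
  </Card>

  <Card title="Industry Optimization" icon="building">
    Dedicated models for industries such as healthcare, finance, and education
  </Card>
</CardGroup>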

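## Supported Languages

### Language Codes

| Language  | Code          | Description                                          |
| --------- | ------------- | ---------------------------------------------------- |
| Mandarin  | `zh-TW`       | Taiwanese Mandarin, including Traditional Chinese    |
| Taiwanese | `zh-TW-taigi` | Taiwanese Hokkien (Taigi)                            |
| English   | `en-US`       | American English                                     |
| Mixed     | `mixed`       | Automatically detects and switches between languages |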

### Language Configuration Examples

<CodeGroup>
  ```javascript Node.js theme={null}
  // Single-language recognition
  const result = await client.transcribe({
    audio: audioBuffer,
    language: 'zh-TW'
  });

  // Mixed-language recognition (recommended)
  const mixedResult = await client.transcribe({
    audio: audioBuffer,
    language: 'mixed'
  });
  ```

  ```python Python theme={null}
  # Run inside an async function, since top-level await is not valid in a script
  # Single-language recognition
  result = await client.transcribe(
      audio=audio_data,
      language='zh-TW'
  )

  # Mixed-language recognition (recommended)
  mixed_result = await client.transcribe(
      audio=audio_data,
      language='mixed'
  )
  ```
</CodeGroup>

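## Supported Audio Formats

### Supported Formats

| Format | Extension | Recommended use                          |
| ------ | --------- | ---------------------------------------- |
| WAV    | `.wav`    | High-quality recordings, uncompressed    |
| MP3    | `.mp3`    | General use, smaller files               |
| FLAC   | `.flac`   | Lossless compression, high quality       |
| AAC    | `.aac`    | Recordings from mobile devices           |
| OGG    | `.ogg`    | Open-source format                       |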
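
As a rough illustration of how an application might reject unsupported files before uploading, the sketch below checks a filename against the extensions in the table. The helper name `isSupportedAudioFile` is hypothetical and not part of the SDK. The check looks only at the extension, not at the file contents.

```javascript
const path = require('node:path');

// Extensions copied from the supported-formats table.
const SUPPORTED_EXTENSIONS = new Set(['.wav', '.mp3', '.flac', '.aac', '.ogg']);

// Hypothetical helper: true when the filename ends in a supported extension.
function isSupportedAudioFile(filename) {
  return SUPPORTED_EXTENSIONS.has(path.extname(filename).toLowerCase());
}
```

A caller could run this before reading the file and report `UNSUPPORTED_FORMAT`-style feedback locally instead of waiting for the API to respond.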

### Audio Parameters

| Parameter   | Recommended | Supported range       |
| ----------- | ----------- | --------------------- |
| Sample rate | 16 kHz      | 8 kHz - 48 kHz        |
| Bit depth   | 16-bit      | 16-bit, 24-bit        |
| Channels    | Mono        | Mono, stereo          |
| Encoding    | LINEAR16    | LINEAR16, FLAC, MULAW |

### Optimizing Audio Quality

```javascript theme={null}
// Recommended audio settings
const optimalConfig = {
  sampleRate: 16000,    // 16 kHz sample rate
  encoding: 'LINEAR16', // linear PCM encoding
  channels: 1,          // mono
  bitDepth: 16          // 16-bit depth
};

const result = await client.transcribe({
  audio: audioBuffer,
  language: 'mixed',
  ...optimalConfig
});
```
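
To catch out-of-range settings before a request is sent, a configuration object can be checked against the supported ranges listed in the audio parameters table. The following is a minimal sketch. `validateAudioConfig` is a hypothetical helper, and it assumes the same field names (`sampleRate`, `encoding`, `channels`, `bitDepth`) used in the example above.

```javascript
// Supported ranges copied from the audio parameters table.
const SUPPORTED = {
  sampleRate: { min: 8000, max: 48000 },
  bitDepth: [16, 24],
  channels: [1, 2],
  encoding: ['LINEAR16', 'FLAC', 'MULAW'],
};

// Hypothetical helper: returns a list of problems, empty when the config is valid.
function validateAudioConfig({ sampleRate, bitDepth, channels, encoding }) {
  const errors = [];
  const { min, max } = SUPPORTED.sampleRate;
  if (!(sampleRate >= min && sampleRate <= max)) {
    errors.push(`sampleRate ${sampleRate} is outside ${min}-${max} Hz`);
  }
  if (!SUPPORTED.bitDepth.includes(bitDepth)) {
    errors.push(`bitDepth ${bitDepth} is not one of ${SUPPORTED.bitDepth.join(', ')}`);
  }
  if (!SUPPORTED.channels.includes(channels)) {
    errors.push(`channels ${channels} is not one of ${SUPPORTED.channels.join(', ')}`);
  }
  if (!SUPPORTED.encoding.includes(encoding)) {
    errors.push(`encoding ${encoding} is not one of ${SUPPORTED.encoding.join(', ')}`);
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets the caller show every issue at once.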

## Recognition Modes

### 1. Real-Time Recognition

Suited to scenarios that need immediate feedback, such as customer service and meeting notes:

```javascript theme={null}
const stream = client.createRealTimeStream({
  language: 'mixed',
  sampleRate: 16000,
  encoding: 'LINEAR16',
  interimResults: true // emit interim results
});

stream.on('data', (result) => {
  if (result.isFinal) {
    console.log('Final result:', result.transcript);
  } else {
    console.log('Interim result:', result.transcript);
  }
});
```
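
When interim results are enabled, a display usually needs the finalized text plus the latest interim hypothesis, with each new interim result replacing the previous one. The sketch below shows one way to track that. `TranscriptBuffer` is a hypothetical class, and it assumes only the `isFinal` and `transcript` fields used in the handler above.

```javascript
// Hypothetical helper: keeps finalized text and the current interim hypothesis.
class TranscriptBuffer {
  constructor(separator = ' ') {
    this.separator = separator; // use '' for text without spaces between parts
    this.finalParts = [];
    this.interim = '';
  }

  // Feed one streaming result and get back the text to display.
  push(result) {
    if (result.isFinal) {
      this.finalParts.push(result.transcript);
      this.interim = ''; // the interim hypothesis is superseded by the final text
    } else {
      this.interim = result.transcript;
    }
    return this.text();
  }

  text() {
    return [...this.finalParts, this.interim].filter(Boolean).join(this.separator);
  }
}
```

Inside the `data` handler, calling `buffer.push(result)` and rendering the returned string keeps the display stable as interim text is revised.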

### 2. Batch Recognition

Suited to pre-recorded audio, such as video subtitles and podcast transcripts:

```javascript theme={null}
const fs = require('fs');

const result = await client.transcribe({
  audio: fs.readFileSync('recording.wav'),
  language: 'mixed',
  enableWordTimeOffsets: true, // include word-level timestamps
  enableSpeakerDiarization: true // separate speakers
});

console.log('Full transcript:', result.transcript);

// Print the timestamp of each word
result.words.forEach(word => {
  console.log(`${word.text} (${word.startTime}s - ${word.endTime}s)`);
});
```
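
Since video subtitles are one of the use cases named above, word-level timestamps can be turned into SubRip (SRT) cues. The sketch below is one possible approach. `formatSrtTime` and `wordsToSrt` are hypothetical helpers, and they assume each word carries `text`, `startTime`, and `endTime` in seconds, as in the example above. Grouping by a fixed word count is a simplification.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Group words into cues of at most maxWordsPerCue words each.
// joiner is ' ' by default; pass '' for text that is written without spaces.
function wordsToSrt(words, maxWordsPerCue = 8, joiner = ' ') {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWordsPerCue) {
    const group = words.slice(i, i + maxWordsPerCue);
    const start = formatSrtTime(group[0].startTime);
    const end = formatSrtTime(group[group.length - 1].endTime);
    const text = group.map((w) => w.text).join(joiner);
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}`);
  }
  return cues.length ? cues.join('\n\n') + '\n' : '';
}
```

The output of `wordsToSrt(result.words)` can be written to a `.srt` file with `fs.writeFileSync`.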

## Advanced Features

### Speaker Diarization

Automatically identifies and labels different speakers:

```javascript theme={null}
const result = await client.transcribe({
  audio: audioBuffer,
  language: 'mixed',
  enableSpeakerDiarization: true,
  maxSpeakers: 4 // up to 4 speakers
});

// Print each speaker segment
result.segments.forEach(segment => {
  console.log(`Speaker ${segment.speaker}: ${segment.text}`);
  console.log(`Time: ${segment.startTime}s - ${segment.endTime}s`);
});
```
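
For a readable transcript, consecutive segments from the same speaker are often merged into a single turn. The following sketch assumes the segment fields shown above (`speaker`, `text`, `startTime`, `endTime`). `mergeBySpeaker` is a hypothetical helper, not an SDK function.

```javascript
// Merge adjacent segments that share a speaker into one turn.
// joiner is ' ' by default; pass '' for text that is written without spaces.
function mergeBySpeaker(segments, joiner = ' ') {
  const turns = [];
  for (const segment of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === segment.speaker) {
      last.text += joiner + segment.text;
      last.endTime = segment.endTime; // extend the turn to cover this segment
    } else {
      turns.push({ ...segment }); // copy so the input is not mutated
    }
  }
  return turns;
}
```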

### Confidence Scores

Retrieve a confidence score for each word:

```javascript theme={null}
const result = await client.transcribe({
  audio: audioBuffer,
  language: 'mixed',
  enableWordConfidence: true
});

// Print low-confidence words
result.words.forEach(word => {
  if (word.confidence < 0.8) {
    console.log(`Low-confidence word: ${word.text} (${word.confidence})`);
  }
});
```
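
Beyond printing individual words, the scores can be summarized to decide whether a transcript needs human review. This is only a sketch. `summarizeConfidence` is a hypothetical helper, and the 0.8 threshold simply mirrors the example above.

```javascript
// Summarize word confidences: average score, share of flagged words, and their text.
function summarizeConfidence(words, threshold = 0.8) {
  const scored = words.filter((w) => typeof w.confidence === 'number');
  const flagged = scored.filter((w) => w.confidence < threshold);
  const total = scored.reduce((sum, w) => sum + w.confidence, 0);
  return {
    average: scored.length ? total / scored.length : null,
    flaggedRatio: scored.length ? flagged.length / scored.length : 0,
    flagged: flagged.map((w) => w.text),
  };
}
```

An application might, for instance, route a transcript to manual review when `flaggedRatio` exceeds a limit it chooses.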

## Performance Optimization

### Audio Preprocessing

```javascript theme={null}
// Suggested preprocessing pipeline.
// removeNoise, normalizeVolume, and trimSilence are placeholders that the
// application must supply.
const preprocessAudio = (audioBuffer) => {
  // 1. Noise reduction
  const denoisedAudio = removeNoise(audioBuffer);
  
  // 2. Volume normalization
  const normalizedAudio = normalizeVolume(denoisedAudio);
  
  // 3. Trim silent sections
  const trimmedAudio = trimSilence(normalizedAudio);
  
  return trimmedAudio;
};

const result = await client.transcribe({
  audio: preprocessAudio(rawAudio),
  language: 'mixed'
});
```
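
The helper functions in the pipeline above are not defined in this guide. As a rough sketch of what two of them could look like, the code below implements peak normalization and leading/trailing silence trimming. It assumes the audio is 16-bit PCM held in an `Int16Array`, and the default `targetPeak` and `threshold` values are illustrative. Noise reduction is not covered, and silence inside the recording is left untouched.

```javascript
// Scale samples so the loudest one reaches targetPeak (a fraction of full scale).
function normalizeVolume(samples, targetPeak = 0.9) {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return Int16Array.from(samples); // all silence: nothing to scale
  const gain = (targetPeak * 32767) / peak;
  return Int16Array.from(samples, (s) =>
    Math.max(-32768, Math.min(32767, Math.round(s * gain)))
  );
}

// Drop samples below the amplitude threshold at the start and end only.
function trimSilence(samples, threshold = 500) {
  let start = 0;
  let end = samples.length;
  while (start < end && Math.abs(samples[start]) < threshold) start++;
  while (end > start && Math.abs(samples[end - 1]) < threshold) end--;
  return samples.slice(start, end);
}
```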

### Batch Processing Optimization

```javascript theme={null}
// Process a large file in chunks.
// splitAudioIntoChunks and mergeResults are placeholders that the
// application must supply.
const processLargeFile = async (largeAudioFile) => {
  const chunks = splitAudioIntoChunks(largeAudioFile, 60); // 60-second chunks
  const results = [];
  
  for (const chunk of chunks) {
    const result = await client.transcribe({
      audio: chunk,
      language: 'mixed'
    });
    results.push(result);
  }
  
  return mergeResults(results);
};
```
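
The two helpers used above are left undefined in this guide. A minimal sketch of both follows, assuming raw PCM samples in an `Int16Array` at a known sample rate and results shaped like the earlier examples (`transcript`, and optionally `words` with `startTime` and `endTime` in seconds). The extra `sampleRate` and `chunkSeconds` parameters are assumptions added for illustration.

```javascript
// Split PCM samples into fixed-length chunks (views, no copying).
function splitAudioIntoChunks(samples, chunkSeconds, sampleRate = 16000) {
  const chunkSize = chunkSeconds * sampleRate;
  const chunks = [];
  for (let i = 0; i < samples.length; i += chunkSize) {
    chunks.push(samples.subarray(i, i + chunkSize));
  }
  return chunks;
}

// Join per-chunk results, shifting word timestamps by each chunk's offset.
function mergeResults(results, chunkSeconds = 60) {
  const merged = { transcript: '', words: [] };
  results.forEach((result, index) => {
    const offset = index * chunkSeconds;
    merged.transcript += (merged.transcript ? ' ' : '') + result.transcript;
    for (const word of result.words ?? []) {
      merged.words.push({
        ...word,
        startTime: word.startTime + offset,
        endTime: word.endTime + offset,
      });
    }
  });
  return merged;
}
```

Fixed boundaries can cut a word in half, so splitting at a pause near each boundary would likely give cleaner results than this sketch does.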

## Error Handling

### Common Error Types

```javascript theme={null}
try {
  const result = await client.transcribe({
    audio: audioBuffer,
    language: 'mixed'
  });
} catch (error) {
  switch (error.code) {
    case 'AUDIO_TOO_SHORT':
      console.error('Audio is too short; at least 1 second is required');
      break;
    case 'AUDIO_TOO_LONG':
      console.error('Audio is too long; the maximum is 60 minutes');
      break;
    case 'UNSUPPORTED_FORMAT':
      console.error('Unsupported audio format');
      break;
    case 'POOR_AUDIO_QUALITY':
      console.error('Audio quality is too low; consider re-recording');
      break;
    case 'NO_SPEECH_DETECTED':
      console.error('No speech detected');
      break;
    default:
      console.error('Recognition error:', error.message);
  }
}
```
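
The two length errors can sometimes be caught locally before any request is made. The sketch below estimates the duration of headerless PCM audio from its byte length and compares it with the limits quoted in the messages above (1 second and 60 minutes). `pcmDurationSeconds` and `checkDuration` are hypothetical helpers. Compressed formats would need decoding first, and a WAV header adds a few bytes that this estimate ignores.

```javascript
const MIN_SECONDS = 1;       // limit quoted for AUDIO_TOO_SHORT
const MAX_SECONDS = 60 * 60; // limit quoted for AUDIO_TOO_LONG

// Duration of raw PCM audio, derived from its size and format.
function pcmDurationSeconds(byteLength, { sampleRate = 16000, channels = 1, bitDepth = 16 } = {}) {
  const bytesPerSecond = sampleRate * channels * (bitDepth / 8);
  return byteLength / bytesPerSecond;
}

// Returns the error code the audio would likely trigger, or null if the length is fine.
function checkDuration(byteLength, format) {
  const seconds = pcmDurationSeconds(byteLength, format);
  if (seconds < MIN_SECONDS) return 'AUDIO_TOO_SHORT';
  if (seconds > MAX_SECONDS) return 'AUDIO_TOO_LONG';
  return null;
}
```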

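## Best Practices

### 1. Recording Quality

* Use a high-quality microphone
* Avoid background noise
* Keep an appropriate distance from the microphone (15-30 cm)
* Keep the volume moderate, neither too loud nor too quiet

### 2. Language Use

* Speak clearly and avoid speaking too fast
* Switch naturally when mixing languages
* Avoid heavily dialectal expressions

### 3. Application Integration

* Implement appropriate error handling
* Provide a user feedback mechanism
* Consider an offline fallback
* Monitor API usage

### 4. Performance Considerations

* Use an appropriate audio format
* Implement audio preprocessing
* Consider a caching mechanism
* Batch-process large numbers of files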
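
One way to approach the caching suggestion is to key results on a hash of the audio bytes plus the request options, so identical requests are sent only once. The sketch below is an in-memory illustration. `cacheKey` and `createCachedTranscriber` are hypothetical helpers, and the wrapped `transcribe` function is assumed to accept an object with an `audio` buffer and flat options, as in the examples in this guide.

```javascript
const { createHash } = require('node:crypto');

// Build a stable key from the audio bytes and the options (sorted by name).
function cacheKey(audio, options = {}) {
  const sortedOptions = Object.entries(options).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
  return createHash('sha256')
    .update(audio)
    .update(JSON.stringify(sortedOptions))
    .digest('hex');
}

// Wrap a transcribe function so repeated identical requests reuse one result.
function createCachedTranscriber(transcribe) {
  const cache = new Map();
  return function cachedTranscribe({ audio, ...options }) {
    const key = cacheKey(audio, options);
    if (!cache.has(key)) {
      const pending = Promise.resolve(transcribe({ audio, ...options })).catch((error) => {
        cache.delete(key); // do not keep failed requests in the cache
        throw error;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

Usage might look like `const transcribe = createCachedTranscriber((request) => client.transcribe(request));`. A production cache would also need size limits or expiry, which this sketch omits.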

***

Need more help with speech recognition? Contact us: [support@skiesoft.com](mailto:support@skiesoft.com)