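Digital music synthesis, digital speech synthesis, and digital speech recognition
Digital music synthesis
Digital music synthesis uses signal-generation techniques, covering both periodic and aperiodic signals, and arranges and combines the resulting digital signals to produce digital music.
Basic concepts of music
The basic elements of music are pitch, beats, and tempo.
How high or low a sound is is called its pitch. The solfège syllables are Do, Re, Mi, Fa, So, La, Si, and the corresponding English note names are C, D, E, F, G, A, B.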
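Piano keys are arranged in order of pitch, extending to the left and right of middle C. From one key to the next key with the same note name there are eight white keys, counting both ends (C, D, E, F, G, A, B, C), which is why the interval is called an octave. A white key and the black key next to it are a semitone apart. Two neighboring white keys are a whole tone apart when a black key lies between them, and only a semitone apart when none does (E to F and B to C).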
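Pitch-to-frequency table: frequencies are in hertz (Hz), and the columns are octave numbers. The value in parentheses is the distance in semitones from middle C (C4, 261.63 Hz).
Note | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |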
---|---|---|---|---|---|---|---|---|---|---|
C | 16.352 (−48) | 32.703 (−36) | 65.406 (−24) | 130.81 (−12) | 261.63 (0) | 523.25 (+12) | 1046.5 (+24) | 2093.0 (+36) | 4186.0 (+48) | 8372.0 (+60) |
C♯/D♭ | 17.324 (−47) | 34.648 (−35) | 69.296 (−23) | 138.59 (−11) | 277.18 (+1) | 554.37 (+13) | 1108.7 (+25) | 2217.5 (+37) | 4434.9 (+49) | 8869.8 (+61) |
D | 18.354 (−46) | 36.708 (−34) | 73.416 (−22) | 146.83 (−10) | 293.66 (+2) | 587.33 (+14) | 1174.7 (+26) | 2349.3 (+38) | 4698.6 (+50) | 9397.3 (+62) |
D♯/E♭ | 19.445 (−45) | 38.891 (−33) | 77.782 (−21) | 155.56 (−9) | 311.13 (+3) | 622.25 (+15) | 1244.5 (+27) | 2489.0 (+39) | 4978.0 (+51) | 9956.1 (+63) |
E | 20.602 (−44) | 41.203 (−32) | 82.407 (−20) | 164.81 (−8) | 329.63 (+4) | 659.26 (+16) | 1318.5 (+28) | 2637.0 (+40) | 5274.0 (+52) | 10548 (+64) |
F | 21.827 (−43) | 43.654 (−31) | 87.307 (−19) | 174.61 (−7) | 349.23 (+5) | 698.46 (+17) | 1396.9 (+29) | 2793.8 (+41) | 5587.7 (+53) | 11175 (+65) |
F♯/G♭ | 23.125 (−42) | 46.249 (−30) | 92.499 (−18) | 185.00 (−6) | 369.99 (+6) | 739.99 (+18) | 1480.0 (+30) | 2960.0 (+42) | 5919.9 (+54) | 11840 (+66) |
G | 24.500 (−41) | 48.999 (−29) | 97.999 (−17) | 196.00 (−5) | 392.00 (+7) | 783.99 (+19) | 1568.0 (+31) | 3136.0 (+43) | 6271.9 (+55) | 12544 (+67) |
G♯/A♭ | 25.957 (−40) | 51.913 (−28) | 103.83 (−16) | 207.65 (−4) | 415.30 (+8) | 830.61 (+20) | 1661.2 (+32) | 3322.4 (+44) | 6644.9 (+56) | 13290 (+68) |
A | 27.500 (−39) | 55.000 (−27) | 110.00 (−15) | 220.00 (−3) | 440.00 (+9) | 880.00 (+21) | 1760.0 (+33) | 3520.0 (+45) | 7040.0 (+57) | 14080 (+69) |
A♯/B♭ | 29.135 (−38) | 58.270 (−26) | 116.54 (−14) | 233.08 (−2) | 466.16 (+10) | 932.33 (+22) | 1864.7 (+34) | 3729.3 (+46) | 7458.6 (+58) | 14917 (+70) |
B | 30.868 (−37) | 61.735 (−25) | 123.47 (−13) | 246.94 (−1) | 493.88 (+11) | 987.77 (+23) | 1975.5 (+35) | 3951.1 (+47) | 7902.1 (+59) | 15804 (+71) |
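The values in the table follow twelve-tone equal temperament: each semitone multiplies the frequency by 2^(1/12), so twelve semitones (one octave) double it. The sketch below reproduces a few table entries, assuming the usual A4 = 440 Hz tuning reference (A4 is 9 semitones above middle C). The function name `semitone_to_hz` is an illustrative choice, not something from the original post.

```python
A4_HZ = 440.0   # assumed tuning reference
A4_OFFSET = 9   # A4 lies 9 semitones above middle C (C4)


def semitone_to_hz(n):
    """Frequency in Hz of the note n semitones away from middle C."""
    return A4_HZ * 2.0 ** ((n - A4_OFFSET) / 12.0)


if __name__ == "__main__":
    # Print a few notes to compare against the table
    for name, n in [("C0", -48), ("C4", 0), ("A4", 9), ("C5", 12)]:
        print(f"{name}: {semitone_to_hz(n):.3f} Hz")
```

Passing the semitone offset shown in parentheses in any table cell should return that cell's frequency, up to rounding.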
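In a piece of music, every note has its own length in beats, which says how long the note lasts. On the staff, this length is shown by the note symbol: whole note, half note, quarter note, and so on. For example, a time signature of C (common time) or 4/4 means each measure has 4 beats. A whole note fills the entire measure, so it lasts 4 beats; a half note is half of a whole note, so it lasts 2 beats.
The other element is tempo, the speed of the music. Tempo is now usually given in beats per minute (BPM). Common tempo markings include adagio (slow), andante (walking pace), moderato (moderate), and allegro (fast), and the choice depends on the emotion the music is meant to express.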
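Beats and tempo together give the actual length of a note in seconds: one beat lasts 60 / BPM seconds. A small sketch of that arithmetic follows; the helper names are illustrative.

```python
def seconds_per_beat(bpm):
    """Length of one beat in seconds at the given tempo."""
    return 60.0 / bpm


def note_seconds(beats, bpm):
    """Length in seconds of a note lasting `beats` beats."""
    return beats * seconds_per_beat(bpm)


if __name__ == "__main__":
    # At 120 BPM a beat is half a second, so a 4-beat whole note lasts 2 s
    print(seconds_per_beat(120), note_seconds(4, 120))
```

A program that allots 0.5 seconds to each beat is therefore playing at 120 BPM.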
Example: Little Bee (小蜜蜂)
import wave

import numpy as np

FS = 44100  # sampling rate in Hz (CD standard; the original listing used 44000)

# Frequencies in Hz of C, D, E, F, G, A, B (octave 4)
FREQUENCY = np.array([261.6, 293.7, 329.6, 349.2, 392.0, 440.0, 493.9])


# A note = pitch (1..7 for C..B) + duration in seconds
def note(pitch, duration):
    amplitude = 30000
    # np.linspace needs an integer sample count
    num_samples = int(round(duration * FS))
    t = np.arange(num_samples) / FS
    # Fade-out: the envelope falls linearly from 1 to 0 over the note
    a = np.linspace(1, 0, num_samples, endpoint=False)
    # Sinusoid at the note's frequency
    return amplitude * a * np.cos(2 * np.pi * FREQUENCY[pitch - 1] * t)


def main():
    file = "little_bee.wav"  # output file name
    # Pitches (1 = C, 2 = D, ..., 7 = B)
    pitches = np.array([5, 3, 3, 4, 2, 2, 1, 2, 3, 4, 5, 5, 5,
                        5, 3, 3, 4, 2, 2, 1, 3, 5, 5, 3,
                        2, 2, 2, 2, 2, 3, 4, 3, 3, 3, 3, 3, 4, 5,
                        5, 3, 3, 4, 2, 2, 1, 3, 5, 5, 1])
    # Length of each note in beats
    beats = np.array([1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2,
                      1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 4,
                      1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2,
                      1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 4])
    tempo = 0.5  # seconds per beat (120 BPM)

    # Synthesize every note and join them into one signal
    y = np.concatenate([note(p, b * tempo) for p, b in zip(pitches, beats)])

    with wave.open(file, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(FS)  # no compression is the default
        # Write all samples at once as little-endian 16-bit integers
        wav_file.writeframes(y.astype("<i2").tobytes())


if __name__ == "__main__":
    main()
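One way to check a generated file is to read its header back with the standard `wave` module. The sketch below is self-contained: it writes a one-second 440 Hz test tone to a temporary file and then reports the file's parameters. The helper names, the test-tone settings, and the file name `tone_check.wav` are assumptions for illustration, not part of the program above.

```python
import os
import tempfile
import wave

import numpy as np


def write_tone(path, freq=440.0, seconds=1.0, fs=44100):
    """Write a mono 16-bit sine tone to `path`."""
    t = np.arange(int(seconds * fs)) / fs
    x = 30000 * np.sin(2 * np.pi * freq * t)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(x.astype("<i2").tobytes())


def wav_info(path):
    """Return (channels, sample width in bytes, sampling rate, duration in s)."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return w.getnchannels(), w.getsampwidth(), rate, frames / rate


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "tone_check.wav")
    write_tone(path)
    print(wav_info(path))
```

Calling `wav_info("little_bee.wav")` after running the melody program should report one channel, a sample width of 2 bytes, and a duration equal to the total number of beats times the seconds per beat.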
Digital speech synthesis
This is text-to-speech (TTS) technology.
Python packages
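- Pyttsx (pyttsx3 on Python 3): offline text-to-speech
- gTTS: text-to-speech through Google Translate's online service
- eSpeak: open-source speech synthesizer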
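A minimal sketch of offline synthesis, assuming the `pyttsx3` package is installed and the operating system provides a speech engine. The wrapper name `speak` and the default rate of 150 are illustrative choices.

```python
def speak(text, rate=150):
    """Speak `text` aloud through the system's default TTS voice."""
    import pyttsx3  # third-party; imported lazily so this file loads without it

    engine = pyttsx3.init()           # pick the platform's default driver
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)                  # queue the utterance
    engine.runAndWait()               # block until playback finishes
```

Calling `speak("Hello, this is a text-to-speech test.")` should play the sentence through the default audio device. gTTS works differently: it sends the text to an online service and saves an MP3, for example `gTTS("hello", lang="en").save("hello.mp3")`, so it needs network access.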
Digital speech recognition
This is speech-to-text (STT) technology.
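Techniques developed for speech recognition include hidden Markov models (HMM), dynamic time warping (DTW), artificial neural networks, deep learning, and end-to-end automatic speech recognition. Recognition accuracy is affected by many factors: background noise, the speaker's gender, whether the speaker is an adult or a child, accent, and semantic context.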
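Of these techniques, dynamic time warping is the easiest to show in a few lines: it aligns two sequences that are spoken at different speeds and returns the cost of the best alignment. The sketch below is the textbook dynamic-programming version on one-dimensional sequences. A real recognizer would compare frames of acoustic features rather than single numbers, and the function name `dtw_distance` is an illustrative choice.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW cost between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # a advances alone
                                 cost[i, j - 1],      # b advances alone
                                 cost[i - 1, j - 1])  # both advance
    return cost[n, m]
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero, because the second sequence is the first one with its middle value held longer.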
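Among newer artificial-intelligence techniques, the recurrent neural network (RNN) combined with long short-term memory (LSTM) is the most representative. AI techniques are expected to become the mainstream of speech recognition.
Python speech recognition library: SpeechRecognition, which supports many engines and APIs: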
- CMU Sphinx
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection
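A minimal sketch of transcribing a WAV file, assuming the `SpeechRecognition` package is installed and the machine has network access for the Google Speech Recognition engine. The wrapper name `transcribe_wav` and its error handling are illustrative choices.

```python
def transcribe_wav(path, language="en-US"):
    """Return the transcript of a WAV file, or None if nothing was recognized."""
    import speech_recognition as sr  # third-party; imported lazily

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        # Sends the audio to Google's web API, so this call needs the network
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the engine could not make out any speech
    except sr.RequestError as err:
        raise RuntimeError(f"recognition service unavailable: {err}") from err
```

Other engines follow the same pattern through different `recognize_*` methods on the recognizer, for example `recognize_sphinx` for offline recognition with CMU Sphinx.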