Demo video: http://pan.baidu.com/s/1c1OsQYk (extraction code: 229p)
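1. The REST approach: Baidu's cross-platform, REST-based speech recognition. The API documentation states explicitly, however, that this approach only supports uploading audio files.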
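In the previous post we set up the Android environment and imported the Android demo code provided by Baidu. At that point we only saw one app, named "Speech Recorder". When I later restarted the emulator, a second app named "百度语音示例(2.x)" (Baidu Speech Sample 2.x) appeared alongside it.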
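The previous post also introduced the REST API, and I verified that uploading a pre-recorded audio file and then calling the API does return a result. The REST API is cross-platform, lightweight, and easy to use, but it only returns a recognition result after a recorded audio file has been uploaded, which is not very convenient. The approach taken here is: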
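- Draw a UI with Start and Stop buttons to control voice input
- When Start is clicked, open the microphone
- Add a thread that monitors the microphone and stores its input as a file in the chosen audio format
- When Stop is clicked, call the speech recognition REST API and read the audio file that was just generated
- Send the request to the remote recognition service and return the recognition result for the audio file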
    package com.baidu.speech.serviceapi;

    import javax.swing.*;
    import javax.xml.bind.DatatypeConverter;
    import org.json.JSONObject;
    import java.awt.*;
    import java.awt.event.*;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import javax.sound.sampled.*;

    public class AudioUI extends JFrame {

        AudioFormat audioFormat;
        TargetDataLine targetDataLine;
        // Kept so the Stop handler can wait for the file to be fully written
        Thread captureThread;

        final JButton captureBtn = new JButton("Capture");
        final JButton stopBtn = new JButton("Stop");

        final JPanel btnPanel = new JPanel();
        final ButtonGroup btnGroup = new ButtonGroup();
        final JRadioButton aifcBtn = new JRadioButton("AIFC");
        final JRadioButton aiffBtn = new JRadioButton("AIFF");
        final JRadioButton auBtn = // selected at startup
                new JRadioButton("AU", true);
        final JRadioButton sndBtn = new JRadioButton("SND");
        final JRadioButton waveBtn = new JRadioButton("WAVE");

        // Variables for the REST call
        private static final String serverURL = "http://vop.baidu.com/server_api";
        private static String token = "";
        // File to upload; this is the output.wav that CaptureThread writes
        // when WAVE is selected, so select WAVE before capturing
        private static final String testFileName = "C:\\Users\\Administrator\\workspace\\speechrecognition\\output.wav";
        // Put your own params here
        private static final String apiKey = "***";    // the apiKey shown on the card of the application you registered earlier
        private static final String secretKey = "***"; // the secretKey shown on the same application card
        private static final String cuid = "***";      // unique device id; I am on a PC, so I use the network card's MAC address

        public static void main(String args[]) {
            new AudioUI();
        }// end main

        public AudioUI() {// constructor
            captureBtn.setEnabled(true);
            stopBtn.setEnabled(false);

            // Register anonymous listeners
            captureBtn.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent e) {
                    captureBtn.setEnabled(false);
                    stopBtn.setEnabled(true);
                    // Capture input data from the microphone
                    // until the Stop button is clicked.
                    captureAudio();
                }// end actionPerformed
            }// end ActionListener
            );// end addActionListener()

            stopBtn.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent e) {
                    captureBtn.setEnabled(true);
                    stopBtn.setEnabled(false);
                    // Terminate the capturing of input data
                    // from the microphone.
                    targetDataLine.stop();
                    targetDataLine.close();
                    try {
                        // Wait until the capture thread has finished writing
                        // the audio file before reading and uploading it.
                        captureThread.join();
                        getToken();
                        method1();
                        method2();
                    } catch (Exception e1) {
                        e1.printStackTrace();
                    }
                }// end actionPerformed
            }// end ActionListener
            );// end addActionListener()

            // Put the buttons in the JFrame
            getContentPane().add(captureBtn);
            getContentPane().add(stopBtn);

            // Include the radio buttons in a group
            btnGroup.add(aifcBtn);
            btnGroup.add(aiffBtn);
            btnGroup.add(auBtn);
            btnGroup.add(sndBtn);
            btnGroup.add(waveBtn);

            // Add the radio buttons to the JPanel
            btnPanel.add(aifcBtn);
            btnPanel.add(aiffBtn);
            btnPanel.add(auBtn);
            btnPanel.add(sndBtn);
            btnPanel.add(waveBtn);

            // Put the JPanel in the JFrame
            getContentPane().add(btnPanel);

            // Finish the GUI and make visible
            getContentPane().setLayout(new FlowLayout());
            setTitle("Copyright 2003, R.G.Baldwin");
            setDefaultCloseOperation(EXIT_ON_CLOSE);
            setSize(300, 120);
            setVisible(true);
        }// end constructor

        // This method captures audio input from a
        // microphone and saves it in an audio file.
        private void captureAudio() {
            try {
                // Get things set up for capture
                audioFormat = getAudioFormat();
                DataLine.Info dataLineInfo = new DataLine.Info(TargetDataLine.class, audioFormat);
                targetDataLine = (TargetDataLine) AudioSystem.getLine(dataLineInfo);

                // Create a thread to capture the microphone
                // data into an audio file and start the
                // thread running. It will run until the
                // Stop button is clicked. This method
                // will return after starting the thread.
                captureThread = new CaptureThread();
                captureThread.start();
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(0);
            } // end catch
        }// end captureAudio method

        // This method creates and returns an
        // AudioFormat object for a given set of format
        // parameters. If these parameters don't work
        // well for you, try some of the other
        // allowable parameter values, which are shown
        // in comments following the declarations.
        private AudioFormat getAudioFormat() {
            float sampleRate = 8000.0F;  // 8000,11025,16000,22050,44100
            int sampleSizeInBits = 16;   // 8,16
            int channels = 1;            // 1,2
            boolean signed = true;       // true,false
            boolean bigEndian = false;   // true,false
            return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
        }// end getAudioFormat

        // =============================================//
        // Inner class to capture data from microphone
        // and write it to an output audio file.
        class CaptureThread extends Thread {
            public void run() {
                AudioFileFormat.Type fileType = null;
                File audioFile = null;

                // Set the file type and the file extension
                // based on the selected radio button.
                if (aifcBtn.isSelected()) {
                    fileType = AudioFileFormat.Type.AIFC;
                    audioFile = new File("output.aifc");
                } else if (aiffBtn.isSelected()) {
                    fileType = AudioFileFormat.Type.AIFF;
                    audioFile = new File("output.aif");
                } else if (auBtn.isSelected()) {
                    fileType = AudioFileFormat.Type.AU;
                    audioFile = new File("output.au");
                } else if (sndBtn.isSelected()) {
                    fileType = AudioFileFormat.Type.SND;
                    audioFile = new File("output.snd");
                } else if (waveBtn.isSelected()) {
                    fileType = AudioFileFormat.Type.WAVE;
                    audioFile = new File("output.wav");
                } // end if

                try {
                    targetDataLine.open(audioFormat);
                    targetDataLine.start();
                    // Blocks until the line is stopped and closed
                    AudioSystem.write(new AudioInputStream(targetDataLine), fileType, audioFile);
                } catch (Exception e) {
                    e.printStackTrace();
                } // end catch
            }// end run
        }// end inner class CaptureThread
        // =============================================//

        // Exchange the apiKey/secretKey pair for an access token
        private static void getToken() throws Exception {
            String getTokenURL = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials"
                    + "&client_id=" + apiKey + "&client_secret=" + secretKey;
            HttpURLConnection conn = (HttpURLConnection) new URL(getTokenURL).openConnection();
            token = new JSONObject(printResponse(conn)).getString("access_token");
        }

        // Upload the audio as Base64 text inside a JSON body
        private static void method1() throws Exception {
            File pcmFile = new File(testFileName);
            HttpURLConnection conn = (HttpURLConnection) new URL(serverURL).openConnection();

            // construct params
            JSONObject params = new JSONObject();
            params.put("format", "pcm");
            params.put("rate", 8000); // must match the sampleRate in getAudioFormat()
            params.put("lan", "en");
            params.put("channel", "1");
            params.put("token", token);
            params.put("cuid", cuid);
            params.put("len", pcmFile.length());
            params.put("speech", DatatypeConverter.printBase64Binary(loadFile(pcmFile)));

            // add request header
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");

            conn.setDoInput(true);
            conn.setDoOutput(true);

            // send request
            DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
            wr.writeBytes(params.toString());
            wr.flush();
            wr.close();

            printResponse(conn);
        }

        // Upload the raw audio bytes as the request body; no "lan" is set here
        private static void method2() throws Exception {
            File pcmFile = new File(testFileName);
            HttpURLConnection conn = (HttpURLConnection) new URL(serverURL
                    + "?cuid=" + cuid + "&token=" + token).openConnection();

            // add request header
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "audio/pcm; rate=8000");

            conn.setDoInput(true);
            conn.setDoOutput(true);

            // send request
            DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
            wr.write(loadFile(pcmFile));
            wr.flush();
            wr.close();

            printResponse(conn);
        }

        // Print the JSON response to the console and return it as a string
        private static String printResponse(HttpURLConnection conn) throws Exception {
            if (conn.getResponseCode() != 200) {
                // request error
                return "";
            }
            InputStream is = conn.getInputStream();
            BufferedReader rd = new BufferedReader(new InputStreamReader(is));
            String line;
            StringBuffer response = new StringBuffer();
            while ((line = rd.readLine()) != null) {
                response.append(line);
                response.append('\r');
            }
            rd.close();
            System.out.println(new JSONObject(response.toString()).toString(4));
            return response.toString();
        }

        // Read the whole file into a byte array
        private static byte[] loadFile(File file) throws IOException {
            InputStream is = new FileInputStream(file);

            long length = file.length();
            byte[] bytes = new byte[(int) length];

            int offset = 0;
            int numRead = 0;
            while (offset < bytes.length
                    && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
                offset += numRead;
            }

            if (offset < bytes.length) {
                is.close();
                throw new IOException("Could not completely read file " + file.getName());
            }

            is.close();
            return bytes;
        }
    }
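In `method1`, the `speech` field carries the Base64 text of the audio and `len` carries the size of the bytes before encoding. `javax.xml.bind.DatatypeConverter` was removed from the JDK in Java 11, so on a newer JDK the same encoding can be done with `java.util.Base64`. The following is only a minimal sketch of that substitution; `SpeechPayload` and `encode` are hypothetical names used for illustration and are not part of the Baidu sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only.
class SpeechPayload {

    // Base64-encode raw audio bytes for the "speech" field,
    // as a stand-in for DatatypeConverter.printBase64Binary.
    static String encode(byte[] audio) {
        return Base64.getEncoder().encodeToString(audio);
    }

    public static void main(String[] args) {
        // Placeholder bytes; in the program above these would come from loadFile(pcmFile).
        byte[] audio = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println("speech = " + encode(audio));
        // "len" is the byte count before encoding, not the length of the Base64 text.
        System.out.println("len    = " + audio.length);
    }
}
```

In the program above, this would mean replacing the `DatatypeConverter.printBase64Binary(...)` call with `SpeechPayload.encode(...)` and dropping the `javax.xml.bind` import.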
    {
        "access_token": "24.***a82646fffd31.259**2335-7980222",
        "refresh_token": "25.18***6f.315360000.1***-7980222",
        "scope": "public audio_voice_assistant_get wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian",
        "session_key": "***URC38LpHQ+crR5n6hQ***zVZRBK/rpVGeNviJXnmJpFIwpsT97C4xvsD",
        "session_secret": "***db82f505cba***",
        "expires_in": 2592000
    }

    {
        "result": [
            "hello how are you, ",
            "hello oh how are you, ",
            "hello how are u, ",
            "halo how are you, ",
            "hollow how are you, "
        ],
        "err_msg": "success.",
        "sn": "776663153181460775167",
        "corpus_no": "6273981573367938505",
        "err_no": 0
    }

    {
        "result": ["哈喽好玩哟,"],
        "err_msg": "success.",
        "sn": "496395116711460775168",
        "corpus_no": "6273981574058301747",
        "err_no": 0
    }
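The phrase I spoke was "hello, how are you", and the recognition results look very good. Baidu speech recognition supports three languages (Chinese, English, and Cantonese). The first recognition result above, from method1 (which sets lan to "en"), is the English version; the second, from method2 (which sets no lan), is the Chinese version.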
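An application usually needs a single string rather than the whole response. The sketch below pulls the first entry out of `result` when `err_no` is 0. It assumes the candidates are listed best-first, which the output above suggests but does not prove. `RecognitionResult` and `bestCandidate` are hypothetical names, and regular expressions are used only to keep the example free of dependencies; in the program above, the `org.json` `JSONObject` it already imports would be the more natural tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class RecognitionResult {

    // Matches: "err_no": <integer>
    private static final Pattern ERR_NO =
            Pattern.compile("\"err_no\"\\s*:\\s*(-?\\d+)");

    // Matches the first quoted string inside the "result" array
    private static final Pattern FIRST_RESULT =
            Pattern.compile("\"result\"\\s*:\\s*\\[\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the first candidate with surrounding whitespace trimmed,
    // or null if err_no is missing or non-zero, or if there is no result.
    // JSON escape sequences inside the string are not decoded.
    static String bestCandidate(String json) {
        Matcher err = ERR_NO.matcher(json);
        if (!err.find() || Integer.parseInt(err.group(1)) != 0) {
            return null;
        }
        Matcher first = FIRST_RESULT.matcher(json);
        return first.find() ? first.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Shortened copy of the English response shown above
        String json = "{ \"result\": [ \"hello how are you, \", \"hello oh how are you, \" ],"
                + " \"err_msg\": \"success.\", \"err_no\": 0}";
        System.out.println(bestCandidate(json));
    }
}
```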
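With microphone support, the REST API approach becomes much more practical: you can integrate it into your own application to implement live voice input, voice search, and similar features.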
The console shows that the site has loaded four commands: hello (there), show me *search, show :type report, and let's get started.
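Here I spoke the command "hello there". The console shows that the recognized text was "hello there" and that it matched the site's loaded command hello (there), so the site automatically jumped to the section that displays "hello".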
Here I said "show me voice search", which matched the show me *search command, and the page jumped to the "show me" section.
Adding annyang support to my own GitHub project angelloExtend takes only two steps:
{ file: '//cdnjs.cloudflare.com/ajax/libs/annyang/2.3.0/annyang.min.js'},
    if (annyang) {
      // Let's define our first command. First the text we expect, and then the function it should call
      var commands = {
        'show bar chart': function() {
          alert("hahahha~~~~");
          myUser.show('bar');
        }
      };

      // Add our commands to annyang
      annyang.addCommands(commands);

      // Start listening. You can call this here, or attach this call to an event, button, etc.
      annyang.start();
    }
2. Speech recognition based on the REST API, extended to call the microphone so that speech can be captured and recognized online.
