iOS SpeechKit API Explained


I needed speech-to-text and tried a few tools, but they were either clunky or paid. Then it occurred to me that iOS supports this natively — iPhones even ship with a dedicated AI chip — so I took a look at Apple's SpeechKit (the Speech framework). It turns out to be highly encapsulated and very convenient to use, and, best of all, the framework is extremely lightweight.

Looking at the umbrella header Speech.h, we can see:

#import <Speech/SFVoiceAnalytics.h>
#import <Speech/SFSpeechRecognitionMetadata.h>
#import <Speech/SFSpeechRecognitionResult.h>
#import <Speech/SFSpeechRecognitionRequest.h>
#import <Speech/SFSpeechRecognitionTask.h>
#import <Speech/SFSpeechRecognitionTaskHint.h>
#import <Speech/SFSpeechRecognizer.h>
#import <Speech/SFTranscriptionSegment.h>
#import <Speech/SFTranscription.h>

That's all — just these few files.

// Acoustic features, available on iOS 13 / macOS 10.15 and later
API_AVAILABLE(ios(13), macos(10.15))
@interface SFAcousticFeature : NSObject

// The feature value for each audio frame in the segment
@property (nonatomic, readonly, copy) NSArray<NSNumber *> *acousticFeatureValuePerFrame;

// The duration of each audio frame
@property (nonatomic, readonly) NSTimeInterval frameDuration;

@end

// Voice analysis corresponding to a recorded audio segment
API_AVAILABLE(ios(13), macos(10.15))
@interface SFVoiceAnalytics : NSObject

// Jitter measures pitch stability, expressed as a percentage
@property (nonatomic, readonly, copy) SFAcousticFeature *jitter;

// Shimmer measures amplitude (loudness) stability, in decibels
@property (nonatomic, readonly, copy) SFAcousticFeature *shimmer;

// Pitch measures how high or low the voice is, as the log of normalized pitch estimates
@property (nonatomic, readonly, copy) SFAcousticFeature *pitch;

// Voicing measures whether a frame is voiced, expressed as a probability
@property (nonatomic, readonly, copy) SFAcousticFeature *voicing;

@end

API_AVAILABLE(ios(14.5), macos(11.3))
@interface SFSpeechRecognitionMetadata : NSObject

// Speaking rate, in words per minute
@property (nonatomic, readonly) double speakingRate;

// Average pause duration between words, in seconds
@property (nonatomic, readonly) NSTimeInterval averagePauseDuration;

// Timestamp at which the speech starts
@property (nonatomic, readonly) NSTimeInterval speechStartTimestamp;

// Duration of the speech
@property (nonatomic, readonly) NSTimeInterval speechDuration;

@property (nonatomic, nullable, readonly) SFVoiceAnalytics *voiceAnalytics;

@end

API_AVAILABLE(ios(10.0), macos(10.15))
@interface SFSpeechRecognitionResult : NSObject

// The transcription with the highest confidence
@property (nonatomic, readonly, copy) SFTranscription *bestTranscription;

// Candidate transcriptions, sorted in descending order of confidence
@property (nonatomic, readonly, copy) NSArray<SFTranscription *> *transcriptions;

// Whether recognition has finished
@property (nonatomic, readonly, getter=isFinal) BOOL final;

@property (nonatomic, nullable, readonly) SFSpeechRecognitionMetadata *speechRecognitionMetadata API_AVAILABLE(ios(14.0), macos(11.0));

@end
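
To make the result and metadata types concrete, here is a small sketch (not from the original post) of how a result delivered by the framework might be inspected inside a recognition callback; the function name handleResult is my own:

```objc
// Hypothetical helper: inspect a result delivered by a recognition task.
void handleResult(SFSpeechRecognitionResult *result) {
    // The most likely transcription as plain text
    NSLog(@"best: %@", result.bestTranscription.formattedString);

    // Metadata is only populated on final results (iOS 14+)
    if (result.isFinal && result.speechRecognitionMetadata != nil) {
        SFSpeechRecognitionMetadata *meta = result.speechRecognitionMetadata;
        NSLog(@"rate: %.1f words/min, avg pause: %.2fs",
              meta.speakingRate, meta.averagePauseDuration);

        // Voice analytics (jitter, shimmer, pitch, voicing) may be nil
        SFVoiceAnalytics *analytics = meta.voiceAnalytics;
        if (analytics != nil) {
            NSLog(@"pitch frames: %lu",
                  (unsigned long)analytics.pitch.acousticFeatureValuePerFrame.count);
        }
    }
}
```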

// Speech recognition request
API_AVAILABLE(ios(10.0), macos(10.15))
@interface SFSpeechRecognitionRequest : NSObject

// A hint about the kind of recognition being performed
@property (nonatomic) SFSpeechRecognitionTaskHint taskHint;

// If true, partial (non-final) results for each utterance will be reported.
// Default is true
@property (nonatomic) BOOL shouldReportPartialResults;

// Phrases which should be recognized even if they are not in the system vocabulary
@property (nonatomic, copy) NSArray<NSString *> *contextualStrings;

// Whether recognition must run on-device; default is false
@property (nonatomic) BOOL requiresOnDeviceRecognition API_AVAILABLE(ios(13), macos(10.15));

@end

// Recognition request for an audio file
API_AVAILABLE(ios(10.0), macos(10.15))
@interface SFSpeechURLRecognitionRequest : SFSpeechRecognitionRequest

- (instancetype)init NS_UNAVAILABLE;

// Request to transcribe speech from an audio file at the given URL
- (instancetype)initWithURL:(NSURL *)URL NS_DESIGNATED_INITIALIZER;

@property (nonatomic, readonly, copy) NSURL *URL;

@end
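
A minimal file-transcription sketch (my own example, not from the post — the locale and file path are placeholders):

```objc
// Transcribe a local audio file in one shot.
SFSpeechRecognizer *recognizer =
    [[SFSpeechRecognizer alloc] initWithLocale:
        [NSLocale localeWithLocaleIdentifier:@"zh-CN"]];
NSURL *fileURL = [NSURL fileURLWithPath:@"/path/to/audio.m4a"];
SFSpeechURLRecognitionRequest *request =
    [[SFSpeechURLRecognitionRequest alloc] initWithURL:fileURL];
request.shouldReportPartialResults = NO; // only want the final transcript

[recognizer recognitionTaskWithRequest:request
                         resultHandler:^(SFSpeechRecognitionResult *result,
                                         NSError *error) {
    if (result != nil && result.isFinal) {
        NSLog(@"%@", result.bestTranscription.formattedString);
    }
}];
```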

// Recognition request for audio buffers
API_AVAILABLE(ios(10.0), macos(10.15))
@interface SFSpeechAudioBufferRecognitionRequest : SFSpeechRecognitionRequest

// Preferred audio format for optimal speech recognition
@property (nonatomic, readonly) AVAudioFormat *nativeAudioFormat;

// Append audio to the end of the recognition stream. Must currently be in native format.
- (void)appendAudioPCMBuffer:(AVAudioPCMBuffer *)audioPCMBuffer;
- (void)appendAudioSampleBuffer:(CMSampleBufferRef)sampleBuffer;

// Indicate that the audio source is finished and no more audio will be appended
- (void)endAudio;

@end
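
The buffer request is what you use for live microphone streaming. A sketch of the typical pattern, assuming an AVAudioEngine capture setup (error handling and session configuration omitted):

```objc
// Stream microphone audio into a recognition request.
AVAudioEngine *engine = [[AVAudioEngine alloc] init];
SFSpeechAudioBufferRecognitionRequest *request =
    [[SFSpeechAudioBufferRecognitionRequest alloc] init];

AVAudioInputNode *input = engine.inputNode;
[input installTapOnBus:0
            bufferSize:1024
                format:[input outputFormatForBus:0]
                 block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
    // Feed each captured buffer into the recognition stream
    [request appendAudioPCMBuffer:buffer];
}];
[engine prepare];
[engine startAndReturnError:nil];

// ...later, when capture stops:
// [request endAudio];
```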

SFSpeechRecognitionTask represents the task behind a recognition request; the most important part here is SFSpeechRecognitionTaskDelegate.

SFSpeechRecognitionTaskHint.h contains just a single enum, which is straightforward.

//  Hints on kind of speech recognition being performed
typedef NS_ENUM(NSInteger, SFSpeechRecognitionTaskHint) {
    SFSpeechRecognitionTaskHintUnspecified = 0,     // Unspecified recognition

    SFSpeechRecognitionTaskHintDictation = 1,       // General dictation/keyboard-style
    SFSpeechRecognitionTaskHintSearch = 2,          // Search-style requests
    SFSpeechRecognitionTaskHintConfirmation = 3,    // Short, confirmation-style requests ("Yes", "No", "Maybe")
} API_AVAILABLE(ios(10.0), macos(10.15));

SFSpeechRecognizer is the most important class: it is used to request recognition authorization and to create recognition tasks.
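
Before any task can run, the user must grant permission. A minimal authorization sketch (my own example; the app's Info.plist must also declare NSSpeechRecognitionUsageDescription, and NSMicrophoneUsageDescription for live capture):

```objc
// Ask the user for speech-recognition permission.
[SFSpeechRecognizer requestAuthorization:
    ^(SFSpeechRecognizerAuthorizationStatus status) {
    if (status == SFSpeechRecognizerAuthorizationStatusAuthorized) {
        // Safe to create SFSpeechRecognizer instances and start tasks now
    } else {
        // Denied, restricted, or not yet determined — recognition unavailable
    }
}];
```

Note that the callback may arrive on a background queue, so dispatch back to the main queue before touching UI.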

Original: https://blog.csdn.net/woshizshu/article/details/124221468
Author: woshizshu
Title: iOS SpeechKit API 解读