Analysis of Artificial Intelligence Speech Recognition Technology


Ye Na Lee, Ui Jeng Kang, Eun Jung Choi


Seoul Women’s University

621, Hwarang-ro, Nowon-gu, Seoul, Republic of Korea






As artificial intelligence (AI) evolves, it is being combined with many other technologies. Language and voice recognition have been the starting point for the new machine learning capabilities that have emerged in recent years. In this paper, we introduce trends in AI personal assistant services and speech recognition technology and propose a future direction for their development.


Keywords: AI, speech recognition technology


1. Introduction


AI refers to a program that finds solutions through machine learning by thinking, learning, and judging like a human being [1]. AI has been applied to various areas, and one of the areas where its achievements have been most prominent is speech recognition. According to Gartner, the AI speaker market is expected to grow to $2.1 billion by 2020, so many companies are competing to take an early lead in this market. In May 2018, at the Google I/O conference, Google introduced Duplex, which can make phone bookings on behalf of users. Its voice was natural enough to trick listeners into thinking that a robot was a human. In this paper, we introduce trends in AI personal assistant services and speech recognition technology and propose a future direction for their development.


2. Services using speech recognition


2.1. AI personal assistant

An AI personal assistant is software that provides services through voice or text conversations with the user. It analyzes the dialogue, extracts the user's intent from the context, and processes the relevant information to provide personalized services.

Goal-oriented spoken dialogue systems have been the most prominent component of today’s AI personal assistants.
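The intent extraction and goal-oriented dialogue described above can be illustrated with a deliberately small, rule-based sketch of a single dialogue turn. The intent names, keyword sets, and slot patterns are hypothetical and chosen only for illustration; commercial assistants use trained natural language understanding models rather than keyword overlap, and this is not the method of any product in Table 1.

```python
import re
from dataclasses import dataclass, field

# Hypothetical intents and trigger words, for illustration only.
INTENT_KEYWORDS = {
    "book_restaurant": {"book", "reserve", "reservation", "table"},
    "get_weather": {"weather", "rain", "forecast"},
    "play_music": {"play", "song", "music"},
}

# Slots that must be filled before each (hypothetical) intent can be executed.
REQUIRED_SLOTS = {"book_restaurant": ["time", "party_size"]}


@dataclass
class Frame:
    """The user's goal (intent) plus the slot values extracted so far."""
    intent: str
    slots: dict = field(default_factory=dict)


def extract_frame(utterance: str) -> Frame:
    text = utterance.lower()
    tokens = set(re.findall(r"[a-z]+", text))
    # Pick the intent whose keyword set overlaps most with the utterance.
    best = max(INTENT_KEYWORDS, key=lambda name: len(tokens & INTENT_KEYWORDS[name]))
    if not tokens & INTENT_KEYWORDS[best]:
        return Frame("unknown")
    frame = Frame(best)
    # Toy slot fillers: a clock time such as "7 pm" and a party size such as "for 4".
    time = re.search(r"\b(\d{1,2})\s*(am|pm)\b", text)
    if time:
        frame.slots["time"] = time.group(1) + time.group(2)
    size = re.search(r"\bfor (\d+)\b", text)
    if size:
        frame.slots["party_size"] = int(size.group(1))
    return frame


def next_action(frame: Frame) -> str:
    """Goal-oriented policy: ask for the first missing slot, otherwise execute."""
    if frame.intent == "unknown":
        return "ask_rephrase"
    for slot in REQUIRED_SLOTS.get(frame.intent, []):
        if slot not in frame.slots:
            return f"request:{slot}"
    return f"execute:{frame.intent}"
```

For example, "Reserve a table at 8 pm" yields the intent `book_restaurant` with only the `time` slot filled, so the policy asks for the party size before acting. Keeping the frame separate from the policy mirrors the usual split between language understanding and dialogue management.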

Table 1. AI personal assistant

Google Assistant
- Continued conversation
- Works with more than 5,000 smart home devices
- 38 languages
- Deletion of specific recordings
- Home IoT: Google Home

Apple Siri
- User profiling for voice input processing
- Context understanding
- Vision service
- Home IoT: Samsung Home IoT

Amazon Alexa
- Works with 12,000 smart home devices, such as Ring video devices
- Ordering and managing shopping lists
- Deletion of specific recordings
- Home IoT: Amazon Echo Show & Spot


A new Google Assistant feature shares a summary of a positive news story when the user prompts it with the simple phrase “Tell me something good.” Alexa has also expanded beyond what it learns from the user’s voice so that it can grasp visual information as well. Most AI personal assistants have been programs that simply display results, but the Google and Amazon assistants have been enhanced beyond this.


2.2. AI speech recognition technology

IBM Watson can improve recognition accuracy when important items, such as product names and related topics, are set as keywords, and it also provides a text-to-speech function. The Google Speech API identifies the spoken language among up to four candidate languages, transcribes it to text, and recognizes and annotates multi-channel audio. Kaldi is a popular open-source toolkit that is available free of charge on GitHub.
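The paper does not describe how Watson uses keywords internally. As one possible mechanism, sketched here under that assumption, a recognizer can rescore its n-best list of candidate transcripts and reward hypotheses that contain user-configured keywords such as product names. The function name, the scores, and the bonus value of 2.0 are hypothetical and do not reflect IBM's API.

```python
import re
from typing import List, Tuple


def rescore_with_keywords(
    nbest: List[Tuple[str, float]],
    keywords: List[str],
    bonus: float = 2.0,
) -> List[Tuple[str, float]]:
    """Re-rank recognizer hypotheses, rewarding those that contain user-set keywords.

    nbest: (hypothesis text, log score) pairs; higher scores are better.
    keywords: phrases configured in advance, e.g. product names.
    bonus: hypothetical log-score reward added per matched keyword.
    """
    # Match whole words only, so "show" does not match inside "showroom".
    patterns = [re.compile(r"\b" + re.escape(k.lower()) + r"\b") for k in keywords]
    rescored = []
    for text, score in nbest:
        hits = sum(1 for p in patterns if p.search(text.lower()))
        rescored.append((text, score + bonus * hits))
    # Best hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

With `nbest = [("buy an echo shore", -4.1), ("buy an echo show", -4.6)]` and the keyword `"Echo Show"`, the second hypothesis rises to a score of about -2.6 and becomes the top result. A larger bonus makes keywords win more often but also raises the risk of inserting them where they were never spoken.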


Table 2. AI speech recognition technology

IBM Watson
- Recognition of selected keywords
- Text to speech

Google Speech API
- Language identification and transcription for up to 4 languages
- Multi-channel recognition

Microsoft Bing Speech API
- Intent and entity extraction from text using LUIS
- Text to speech

Dialogflow API
- Support for wearables, mobile devices, smart cars, and speakers

CMU Sphinx
- Usable on mobile devices due to low resource requirements

Kaldi
- Available on GitHub
- Integration with finite state transducers
- Open license









Recent speech recognition services can also distinguish various kinds of noise, add punctuation when converting speech to text, and identify the speaker of each utterance in a conversation (speaker diarization).
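Identifying who spoke each utterance is commonly done by clustering speaker embeddings, i.e. fixed-length vectors that summarize a voice. The sketch below is a minimal greedy online clustering under the assumption that an upstream speaker model has already produced one embedding per utterance; the toy vectors and the cosine similarity threshold of 0.8 are hypothetical, and the paper does not state which method the commercial services use.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def diarize(embeddings: List[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Assign a speaker label to each utterance embedding, in order of arrival."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            # Close enough to a known speaker: reuse that label.
            k = sims.index(max(sims))
            counts[k] += 1
            # Update that speaker's centroid as a running mean.
            centroids[k] = [c + (e - c) / counts[k] for c, e in zip(centroids[k], emb)]
        else:
            # No known speaker is similar enough: start a new speaker.
            k = len(centroids)
            centroids.append(list(emb))
            counts.append(1)
        labels.append(k)
    return labels
```

For five utterances alternating between two voices, the function returns labels such as `[0, 0, 1, 0, 1]`. An online method like this can label utterances as they arrive, whereas offline systems usually cluster the whole recording at once for higher accuracy.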


2.3. Security risks of AI assistants

AI speakers carry many security risks. For example, an AI speaker may carry out commands issued by unauthorized users or by other devices. When an AI assistant is used on IoT devices, it is also exposed to many Wi-Fi vulnerabilities. The biggest concern is privacy: apart from Google and Amazon (Alexa), many vendors do not offer a way to delete specific voice recordings [5].
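One common mitigation for commands from unauthorized users is to match the speaker's voice against enrolled user profiles before executing sensitive commands. The sketch below assumes that voice embeddings are already available for the utterance and for each enrolled user; the intent names, the profiles, and the threshold of 0.75 are hypothetical. Voice matching alone can be spoofed by replayed or synthesized audio, so it addresses only part of the risk described above.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical commands that should require a recognized household voice.
SENSITIVE_INTENTS = {"make_purchase", "unlock_door"}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def identify_speaker(
    emb: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    threshold: float = 0.75,
) -> Optional[str]:
    """Return the best-matching enrolled user, or None if nobody reaches the threshold."""
    best_user, best_sim = None, threshold
    for user, ref in profiles.items():
        sim = cosine(emb, ref)
        if sim >= best_sim:
            best_user, best_sim = user, sim
    return best_user


def authorize(
    intent: str,
    emb: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    threshold: float = 0.75,
) -> bool:
    """Allow ordinary commands from anyone; sensitive ones only from enrolled voices."""
    if intent not in SENSITIVE_INTENTS:
        return True
    return identify_speaker(emb, profiles, threshold) is not None
```

With this design, anyone can still ask for music, but a purchase is rejected when the voice does not match any enrolled profile. Raising the threshold rejects more impostors at the cost of occasionally rejecting legitimate users.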


3. Conclusion


Recently, AI has developed from descriptive analytics to cognitive analytics. AI assistant technology such as Duplex is becoming increasingly similar to a human being, so the problem of distinguishing people from machines must be solved. AI assistants can easily access and collect our information, and these services may have other vulnerabilities as well. As is often the case, whenever a communication advancement like voice recognition starts to go mainstream, criminals looking to take advantage of it are not far behind [2].



This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the National Program for Excellence in SW (2016-0-00022) supervised by the IITP (Institute for Information & communications Technology Promotion).




[1] G. W. Lee, “Geo-Spatial Information System,” Goomibook, 2016.

[2] J. Markoff, “As Artificial Intelligence Evolves, So Does Its Criminal Potential,” The New York Times, October 2016.

[3] J. Hansen and T. Hasan, “Speaker Recognition by Machines and Humans: A tutorial review,” IEEE Signal Processing Magazine, vol. 32, pp. 74-99, November 2015, doi: 10.1109/MSP.2015.2462851.

[4] K. Myers et al., “An intelligent personal assistant for task and time management,” AI Magazine, 2007.

[5] C. Wueest, “A guide to the security of voice-activated smart speakers,” ISTR Special Report, Symantec, November 2017.