As artificial intelligence (AI) technology evolves from text-based to multimodal environments, voice AI is gaining renewed attention. Major information technology (IT) companies such as Apple, Naver, and Kakao are enhancing their voice AI technology and expanding its applications beyond smartphones to include vehicles, home appliances, and media platforms.
Multimodal AI refers to systems that combine several types of data, such as text, images, and voice, to deliver a more comprehensive and interactive user experience.
According to market research firm Mordor Intelligence on the 5th, the global voice AI market is expected to grow nearly threefold, from 20 trillion KRW in 2024 to 56 trillion KRW by 2029.
Gartner noted, "As voice recognition and natural conversation capabilities are integrated, user experience has significantly improved, leading to a sharp increase in demand."
◆ 'Open-source vs Closed-system'... The Battle for Dominance in Voice AI Intensifies
As the voice AI market rapidly grows, the competition for technological dominance is intensifying. Currently, the AI industry is divided into two paradigms.
One is the open-source AI camp. Companies like Meta, Mozilla, Kyutai, and Coqui are accelerating the spread of the technology by releasing their voice AI as open source.
Meta is strengthening its open-source technology through "MMS," which can recognize and generate over 4,000 languages. MMS can learn from speech data that has not been manually labeled.
The AI research institute Kyutai, often referred to as France's OpenAI, recently unveiled a voice AI called "Moshi." Moshi can run without an internet connection and generates speech with a latency of just 0.2 seconds.
In contrast, big tech companies like OpenAI, Google, and Apple continue to maintain closed models, building their own independent ecosystems.
In December last year, Google launched its voice AI "Gemini 2.0," which has enhanced multimodal capabilities. It is optimized for mobile environments, including smartphones, and offers 10 different voices, allowing users to choose the tone and style.
OpenAI also released "Voice Mode" for ChatGPT in December, with improved support for 50 languages, including Korean and Japanese. It features voice speed control and speaker emotion recognition, enabling more natural conversations.
◆ Naver, Kakao, and Other Domestic Companies Also Busy Strengthening Features
Apple also plans to integrate ChatGPT into its voice AI, Siri. Beta testing is underway, and some features are expected to be officially launched in 2025. This integration is expected to provide a more natural and sophisticated voice interface across HomePod, iPhone, and macOS.
An industry insider commented, "While open-source models increase technological accessibility and spread quickly, closed models focus on delivering powerful performance and differentiated features." He added, "Each has its strengths, and the choice will depend on the preferences of companies and consumers."
In particular, China's AI startup DeepSeek is emerging as a key player in expanding the open-source AI ecosystem. Earlier this year, DeepSeek released the AI inference model "R1" along with its voice AI "DeepSeek Voice." The voice AI can process text and voice simultaneously and operate without an internet connection, making it well suited to devices such as smartphones, in-car AI systems, and smart home products.
Domestic companies are also moving quickly in the voice AI market.
Naver is enhancing its capabilities by adding new features, such as an information search assistant, to its AI chatbot "Clova X."
Kakao is developing the AI voice assistant "Kanana," which is scheduled for release in the first half of this year. Kanana will be available in two versions: the personal AI "Nana" and the group-chat AI "Kana." Nana will engage in one-on-one conversations and offer personalized responses by remembering group chat content, while Kana will specialize in group chats, providing features such as quiz creation, answer scoring, and summaries.
Additionally, Kakao's "Kakao i" voice assistant is currently integrated with Kakao T as well as the company's shopping and banking services.
This article was translated using ChatGPT.
Copyright ⓒ Metro. All rights reserved. Unauthorized reproduction, copying, or distribution of any article or content of Metro Media Co., Ltd. is prohibited.