"The Era of Speaking AI"…Voice AI Gaining Attention Again…Why?

An AI-generated image depicting the evolution of artificial intelligence (AI) technology from a text-based environment to a multimodal one.

As artificial intelligence (AI) technology evolves from text-based to multimodal environments, voice AI is gaining renewed attention. Major information technology (IT) companies such as Apple, Naver, and Kakao are enhancing their voice AI technology and expanding its applications beyond smartphones to include vehicles, home appliances, and media platforms.

Multimodal refers to AI that integrates multiple types of input and output, such as text, images, and voice, for a more comprehensive and interactive user experience.

According to market research firm Mordor Intelligence on the 5th, the global voice AI market is expected to grow nearly threefold, from 20 trillion KRW in 2024 to 56 trillion KRW by 2029.

Gartner said, "As voice recognition and natural conversation capabilities are integrated, user experience has significantly improved, leading to a sharp increase in demand."

◆ 'Open-source vs Closed-system'... The Battle for Dominance in Voice AI Intensifies

As the voice AI market rapidly grows, the competition for technological dominance is intensifying. Currently, the AI industry is divided into two paradigms.

One is the open-source AI camp. Companies like Meta, Mozilla, Kyutai, and Coqui are accelerating the spread of technology by releasing their AI voice technologies as open source.

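Meta is strengthening its open-source technology through "MMS" (Massively Multilingual Speech), which can identify more than 4,000 spoken languages and supports speech recognition and synthesis for more than 1,100 of them. MMS builds on self-supervised learning, so much of its training does not require labeled data.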

The AI research institute "Kyutai," often referred to as France's OpenAI, recently unveiled a voice AI called "Moshi." Moshi can run without an internet connection and responds with speech in about 0.2 seconds.

In contrast, big tech companies like OpenAI, Google, and Apple continue to maintain closed models, building their own independent ecosystems.

Google launched its voice AI "Gemini 2.0," which strengthens multimodal capabilities, in December last year. It is optimized for mobile environments such as smartphones and offers 10 different voices, allowing users to choose the tone and style.

OpenAI also released "Voice Mode" for ChatGPT in December, with improved support for 50 languages, including Korean and Japanese. It features voice speed control and speaker emotion recognition, enabling more natural conversations.

◆ Naver, Kakao, and Other South Korean Companies Also Busy Strengthening Features

Apple also plans to integrate ChatGPT into its voice AI, Siri. Beta testing is underway, and some features are expected to be officially launched in 2025. This integration is expected to provide a more natural and sophisticated voice interface across HomePod, iPhone, and macOS.

An industry insider commented, "While open-source models increase technological accessibility and spread quickly, closed models focus on delivering powerful performance and differentiated features." He added, "Each has its strengths, and the choice will depend on the preferences of companies and consumers."

In particular, China's AI startup DeepSeek is emerging as a key player in expanding the open-source AI ecosystem. Earlier this year, DeepSeek released the AI inference model "R1" along with its voice AI "DeepSeek Voice." The voice model can process text and voice simultaneously and operate without an internet connection, making it highly applicable to devices such as smartphones, in-car AI systems, and smart home products.

South Korean companies are also moving quickly in the voice AI market.

Naver is enhancing its capabilities by adding new features, such as an information search assistant, to its AI chatbot "Clova X."

Kakao is developing the AI voice assistant "Kanana," which is scheduled for release in the first half of this year. Kanana will be available in two versions: the personal AI "Nana" and the group-chat AI "Kana." Nana will engage in one-on-one conversations and offer personalized responses by remembering group chat content, while Kana will specialize in group chats, providing features such as quiz creation, answer scoring, and summaries.

Additionally, Kakao's "Kakao i" voice assistant is currently integrated with Kakao T, shopping, and banking services.

This article was translated using ChatGPT.
