Briefing: Taiwan government releases voice data to bolster AI training

By Miles Goscha

MOST Launches “AI Voice Data Set” to Assist Chinese AI Language Technology – CTIMES

What happened: Taiwan’s Ministry of Science and Technology (MOST) released 400 hours of Chinese-language voice data to the public for use as training material for AI-powered voice applications. According to CTimes, the dataset includes self-recordings as well as “data related to police and educational broadcasts.” It will be uploaded to the National Center for High Performance Computing’s (NCHC) Data Market Platform and is the first of multiple planned releases from MOST’s collection of 2,000 to 3,000 hours of voice data.

Why it’s important: Voice recognition tech is one of the hottest subcategories of China’s rapidly growing AI industry, as evidenced by the recent efforts of iFlyTek, the country’s voice recognition unicorn valued at $9 billion, to raise up to $350 million to invest in AI startups worldwide. And while iFlyTek and China’s other tech giants may hold a sizable share of the $55 billion voice recognition market, MOST’s data release can be particularly helpful to smaller players that lack large training sets and are looking to improve the quality of their machine-learning processes.