Briefing: Taiwan government releases voice data to bolster AI training
Jun 27, 2019
What happened: Taiwan’s Ministry of Science and Technology (MOST) released 400 hours of Chinese-language voice data to the public for use as training material for AI-powered voice applications. According to CTimes, the dataset includes self-recordings as well as “data related to police and educational broadcasts.” The dataset will be uploaded to the National Center for High Performance Computing’s (NCHC) Data Market Platform, and is the first of multiple planned releases from MOST’s collection of 2,000 to 3,000 hours of voice data.
Why it’s important: Voice recognition tech is one of the hottest subcategories of China’s rapidly growing AI industry, as evidenced by recent efforts by iFlyTek, the country’s voice recognition unicorn valued at $9 billion, to raise up to $350 million to invest in AI startups worldwide. And while iFlyTek and China’s other tech giants may hold a sizable share of the $55 billion voice recognition market, MOST’s data release can be particularly helpful to smaller players that lack large training sets and are looking to improve the quality of their machine-learning processes.