Cross-cultural and Rhizome-like Music Information Retrieval and Classification System

Jung-Hua Liu

To create a big-data, rhizome-like, self-learning, auto-correcting classification system for Taiwanese, Mandarin, and Taiwanese aboriginal music.

The interaction of music in various languages in Taiwan has produced a complicated musical hybrid that should be classified using the rhizome concept with big data rather than a hierarchical classification system.

Taiwan is a multicultural society that has developed a hybrid of music from different languages, including Mandarin, Taiwanese, Hakka, and more than 10 aboriginal languages.

Some pop singers in Taiwan sing in two languages. For example, May Day is a famous Taiwanese band that sings both Taiwanese and Mandarin songs. Notably, May Day's Taiwanese songs differ from traditional or popular Taiwanese music because their melodies and lyrics are based on Mandarin pop. This hybrid offers an important opportunity to create a cross-cultural music information retrieval system that involves not only the technical analysis of music but also linguistics and anthropology.


Literature Review

a.      Languages

The heterogeneity of Taiwan's music provides an opportunity to rethink whether existing music taxonomy, such as R&B and jazz, represents the features of the songs. Benjamin Fields (2011) combined content-based information retrieval, graph networks, and social networks to explore how best to generate contextualized playlists for users. His study focused on personal music taxonomy rather than universal music taxonomy. Similar to Fields's research, this proposal intends to create a classification system that can contextualize music data rather than organize the data into fixed categories.

Music can be considered an invisible movie or an acoustic storybook that transmits complex emotions through its techniques, parts, and frames. For example, traditional Taiwanese songs emphasize "guikau" (氣口, the style of speaking) to strengthen the expression of meaning. If a Taiwanese song does not offer singers this feature, or if the singers do not express it, the song is considered a "new" or "strange" Taiwanese song. This feature is similar to that of Japanese enka, and it lies outside the ordinary units of music analysis, such as timbre, melody, harmony, and rhythm. In addition to the features of a song's genre, language is an important category in music taxonomy. However, separating songs by language may hinder the study of cross-language commonalities in songs. Nicolas Scaringella et al. (2006) have described this classification dilemma:

...[T]his semantic confusion within a single taxonomy can lead to redundancies that may not be confusing for human users but that may hardly be solved by automatic systems. Furthermore, genre taxonomies may be dependent on cultural references. For example, a song by the French singer Charles Aznavour would be considered variety in France but would be filed as world music in the United Kingdom (2006:134).

b.      Instruments

In addition to vocals, most songs are accompanied by instruments, which may play a role beyond making sound, especially for ethnomusicologists and anthropologists. For example, the sound of a nose flute among Taiwan's aboriginal Paiwan people can evoke emotions and memories for both the audience and the player, even if the player does not finish the song. This situation recalls the line by the Tang Dynasty poet Bai Juyi that describes "emotion before melody" (未成曲調先有情). This relationship between timbre and instruments is strong, and it should be classified carefully so that this information is not lost.


c. Taxonomy, and structured and document-oriented data
Traditional music classification is hierarchical, such as pop music divided into Mandarin or Taiwanese categories. For romantic pop music, an emotion tag is used to label the song (Figure 1). Figure 2 shows the simple and clear classification of music genres in Apple's iTunes store. However, this taxonomy provides only a simplified view of the music world, so more inter-relational data is needed to help explore and retrieve music information. For example, Zhijun Zhao et al. (2010) have explored how to detect emotions in Chinese and Western music based on timbre, rhythm, and pitch. They used four mood categories (anxiousness, depression, contentment, and exuberance) to classify the music in their auto-detection system. Even though their mood classification system is simplified, traditional structured classification cannot integrate their research into the existing system.

Figure 1
Figure 2


Data from ethnomusicology has exposed shortcomings in existing music taxonomy. Linton C. Freeman and Alan P. Merriam (1956) held that "the individuality of esthetic expression as it is shaped by the customs of a particular group is more sharply established." They computed the frequencies of use of major seconds, minor thirds, and total intervals in 20 Trinidad Rada and Brazilian Ketu songs. After the measurements were weighted via a lambda score, the statistics showed the differences between the songs of these two areas. They thus attempted to solve the classification challenge through statistics rather than by distinguishing songs via classification terms. Their approach not only reflected the debate over classification in cultural anthropology but also served as a reminder that a hierarchical classification system may be useful only in a specific field, such as the commercial music market.
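The interval counting described above can be illustrated with a minimal sketch. It assumes melodies are encoded as lists of MIDI pitch numbers, which is a hypothetical encoding chosen for illustration, and the two melodies are made up rather than taken from the 1956 study. The sketch shows only the frequency step; it does not reproduce Freeman and Merriam's lambda weighting.

```python
from collections import Counter

# Interval sizes in semitones for the two intervals named in the text.
INTERVAL_NAMES = {2: "major second", 3: "minor third"}


def interval_frequencies(melody):
    """Return the share of melodic steps that are major seconds or minor thirds.

    `melody` is a list of MIDI pitch numbers (an assumed encoding).
    Direction is ignored: a rising and a falling major second count the same.
    """
    steps = [abs(b - a) for a, b in zip(melody, melody[1:])]
    total = len(steps)
    if total == 0:
        return {name: 0.0 for name in INTERVAL_NAMES.values()}
    counts = Counter(steps)
    return {name: counts[size] / total for size, name in INTERVAL_NAMES.items()}


# Two invented melodies, used only to show that the profiles can differ.
song_a = [60, 62, 64, 62, 60, 63]
song_b = [60, 63, 60, 63, 65, 62]

profile_a = interval_frequencies(song_a)
profile_b = interval_frequencies(song_b)
```

Comparing such profiles across many songs is one way a statistical description could stand in for a fixed genre label.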

In contrast, document-oriented data contains more variations. For example, Samingad's (紀曉君) "Wandering" (流浪記) is a Mandarin song with a Taiwan aboriginal rhythm and melody. In a document-oriented database, this song could be stored in the following categories:

Entity 1. Samingad-> Singer, Wandering, Mandarin, Puyuma (卑南), Taiwan Aboriginal, Nostalgia.
Entity 2. Wandering-> Song, Mandarin, Samingad, Puyuma, Nostalgia.
Entity 3. Nostalgia-> Samingad, Song, Puyuma, Mandarin, Alone.
Entity 4. Lonesome-> Wandering, Mandarin, Nostalgia.

From the examples above, it can be seen that document-oriented taxonomy is flexible and resembles Gilles Deleuze and Félix Guattari's rhizome concept. They adopted this term to describe non-hierarchical data and knowledge and to emphasize multiplicity, as in music having various aspects.
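The four entities listed above can be sketched as plain documents with cross-links. This is a minimal illustration under stated assumptions: the dictionary layout and the helper names `build_links` and `related` are hypothetical, and an in-memory structure stands in for whatever document store the system would actually use.

```python
from collections import defaultdict

# Each entity is a free-form document: a name plus whatever tags apply to it.
documents = {
    "Samingad": ["Singer", "Wandering", "Mandarin", "Puyuma",
                 "Taiwan Aboriginal", "Nostalgia"],
    "Wandering": ["Song", "Mandarin", "Samingad", "Puyuma", "Nostalgia"],
    "Nostalgia": ["Samingad", "Song", "Puyuma", "Mandarin", "Alone"],
    "Lonesome": ["Wandering", "Mandarin", "Nostalgia"],
}


def build_links(docs):
    """Build an undirected link map so any entity or tag can be an entry point."""
    links = defaultdict(set)
    for name, tags in docs.items():
        for tag in tags:
            links[name].add(tag)
            links[tag].add(name)
    return links


def related(links, start, max_hops=2):
    """Return every node reachable from `start` within `max_hops` links."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        # Expand one hop outward, skipping nodes that were already visited.
        frontier = {n for node in frontier for n in links[node]} - seen
        seen |= frontier
    return seen - {start}


links = build_links(documents)
one_hop = related(links, "Samingad", max_hops=1)
two_hops = related(links, "Samingad", max_hops=2)
```

No node is a root here: a search can start from a singer, a language, or an emotion and reach the others, which is the non-hierarchical property the rhizome concept emphasizes.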


This system will handle different data sets that constitute big data, so it will be composed of multiple tools and algorithms:
A. Content-Based Information: Mel-scale Frequency Cepstral Coefficients and MATLAB.

B. Language Detection: Praat ( and sound recognition (

C. Document-Oriented Data:

a. Semantics: Apache Lucene project (

b. Searching: Elasticsearch (

c. Database: Redis ( and MySQL (

d. Programming Languages: Java, PHP, and Python.
e. Distributed Computing: Hadoop (
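
As an illustration of item A in the list above, the following minimal sketch shows the mel-scale mapping that underlies Mel-scale Frequency Cepstral Coefficients. It assumes one commonly used conversion formula, mel = 2595 * log10(1 + f/700); other variants exist. The function names are hypothetical, and Python is used here only because it is among the listed programming languages. A full MFCC pipeline would also need framing, a Fourier transform, log filter-bank energies, and a discrete cosine transform, none of which are shown.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (assumed formula, one common variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(low_hz, high_hz, n_bands):
    """Return center frequencies in Hz of bands spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_bands + 1)
    return [mel_to_hz(low + step * i) for i in range(1, n_bands + 1)]


# Ten bands between 0 Hz and 8000 Hz; the range and count are arbitrary choices.
centers = mel_band_centers(0.0, 8000.0, 10)
```

Because the bands are evenly spaced in mels, their centers crowd together at low frequencies and spread out at high frequencies, which roughly follows how listeners perceive pitch.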

When using melody or rhythm to detect music, we may find that some music has been mashed up with another song. For instance, is an improvisation tool that mixes different songs to produce a new one.


Commercial: Music store search engine.

Academic: Data warehouse and corpus.

Related Projects

a.       Structural Analysis of Large Amounts of Music Information (

b.      Harvesting Speech Datasets for Linguistic Research on the Web (

c.       Mining a Year of Speech: a Digging into Data challenge (

Deleuze, Gilles and Félix Guattari. (1980). A Thousand Plateaus. Brian Massumi (trans.). London and New York: Continuum.

Fields, Benjamin. (2011). Contextualize your listening: The playlist as recommendation engine. Doctoral dissertation, Goldsmiths, University of London.

Freeman, Linton C. and Alan P. Merriam. (1956). Statistical classification in anthropology: An application to ethnomusicology. American Anthropologist, New Series, 58(3):464-472.

Scaringella, Nicolas, Giorgio Zoia, and Daniel J. Mlynek. (2006). Automatic genre classification of music content: A survey. IEEE Signal Processing Magazine, 23(2).

Zhao, Zhijun, Lingyun Xie, Jing Liu, and Wen Wu. (2010). The analysis of mood taxonomy comparison between Chinese and Western music. Signal Processing Systems (ICSPS), 2010, 2nd International Conference, 1:606-610.