
Audio Analysis V6 Classifier

The Cyanite API exposes a variety of classifiers for your music.

BPM

The BPM classifier provides the BPM (beats per minute) of the track.

Key

The Key classifier provides you with the predicted key.

Mood

The mood multi-label classifier provides the following labels:

aggressive, calm, chilled, dark, energetic, epic, happy, romantic, sad, scary, sexy, ethereal, uplifting

Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given mood and 1 (100%) indicates a high probability that the track represents a given mood.

Since the mood of a track might not always be properly described by a single tag, the mood classifier is able to predict multiple moods for a given song instead of only one. A track could be classified with dark (Score: 0.9), while also being classified with aggressive (Score: 0.8).

The mood can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution. In addition to the score, the API also exposes a list that includes the most likely moods, or the term ambiguous in case the audio does not properly reflect any of our mood tags.

In addition, you can access the advanced mood classifier, which uses a more detailed taxonomy.
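As a rough illustration of how these scores might be consumed on the client side, the sketch below selects the most likely mood tags from a score map using a simple threshold and falls back to ambiguous when no label clears it. The dictionary layout and the 0.5 threshold are assumptions made for this example, not the API's actual response format or decision rule.

```python
# Illustrative only: the dict layout and the 0.5 threshold are assumptions,
# not the actual Cyanite response format or decision rule.
MOOD_THRESHOLD = 0.5

def likely_moods(mood_scores: dict[str, float], threshold: float = MOOD_THRESHOLD) -> list[str]:
    """Return mood labels whose score clears the threshold, highest first."""
    tags = sorted(
        (label for label, score in mood_scores.items() if score >= threshold),
        key=lambda label: mood_scores[label],
        reverse=True,
    )
    # Mirror the API's behaviour of reporting "ambiguous" when no mood fits well.
    return tags or ["ambiguous"]

# Example: a track that is both dark and aggressive.
print(likely_moods({"dark": 0.9, "aggressive": 0.8, "happy": 0.05}))
# -> ['dark', 'aggressive']
```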

Genre

The genre multi-label classifier provides the following labels:

ambient, blues, classical, electronicDance, folkCountry, funkSoul, jazz, latin, metal, pop, rapHipHop, reggae, rnb, rock, singerSongwriter

Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given genre and 1 (100%) indicates a high probability that the track represents a given genre.

Since music can cross genre borders, the genre classifier can predict multiple genres for a given song instead of only one. A track could be classified with rapHipHop (Score: 0.9) but also reggae (Score: 0.8).

The genre can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution. In addition to the score, the API also exposes a list that includes the most likely genres.
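The sketch below illustrates how segment-wise scores (one value per 15s window) relate to a track-level average. The list-of-dicts layout is an assumption made for illustration, not the API's actual response shape.

```python
# Sketch: relating segment-wise genre scores (one value per 15 s window)
# to a track-level average. The data layout is an assumption.
segments = [
    {"rapHipHop": 0.95, "reggae": 0.70},  # 0-15 s
    {"rapHipHop": 0.90, "reggae": 0.85},  # 15-30 s
    {"rapHipHop": 0.85, "reggae": 0.75},  # 30-45 s
]

def mean_scores(segments: list[dict[str, float]]) -> dict[str, float]:
    """Average each genre score over all segments of the track."""
    labels = {label for segment in segments for label in segment}
    return {
        label: sum(segment.get(label, 0.0) for segment in segments) / len(segments)
        for label in labels
    }

print(mean_scores(segments))  # -> rapHipHop ≈ 0.90, reggae ≈ 0.77
```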

Sub-genre

For some tracks an additional sub-genre can be predicted. Possible sub-genres include:

bluesRock, folkRock, hardRock, indieAlternative, psychedelicProgressiveRock, punk, rockAndRoll, popSoftRock, abstractIDMLeftfield, breakbeatDnB, deepHouse, electro, house, minimal, synthPop, techHouse, techno, trance, contemporaryRnB, gangsta, jazzyHipHop, popRap, trap, blackMetal, deathMetal, doomMetal, heavyMetal, metalcore, nuMetal, disco, funk, gospel, neoSoul, soul, bigBandSwing, bebop, contemporaryJazz, easyListening, fusion, latinJazz, smoothJazz, country, folk

Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given sub-genre and 1 (100%) indicates a high probability that the track represents a given sub-genre.

The sub-genre can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution. In addition to the score, the API also exposes a list that includes the most likely sub-genres.

caution

Some tracks don't have any sub-genre. In this case, the sub-genre tags field is an empty array and the averaged and segment-wise values are unavailable.
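Client code should therefore handle the empty case explicitly. The snippet below is a minimal sketch; the list argument is a hypothetical stand-in for whatever your client receives from the API.

```python
# Defensive handling of tracks without a sub-genre. The input is a
# hypothetical stand-in for the tags array a client receives.
def describe_subgenres(subgenre_tags: list[str]) -> str:
    if not subgenre_tags:  # empty array: no sub-genre was predicted
        return "no sub-genre available"
    return ", ".join(subgenre_tags)

print(describe_subgenres([]))                  # -> no sub-genre available
print(describe_subgenres(["trap", "popRap"]))  # -> trap, popRap
```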

Voice

The voice classifier categorizes the audio as female or male singing voice or instrumental (non-vocal).

Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to have the given voice elements and 1 (100%) indicates a high probability that the track contains the given voice elements.

The voice classifier results can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.

Voice Presence Profile

This label describes the amount of singing voice throughout the full duration of the track and may be none, low, medium or high.

Predominant Voice Gender

This label indicates whether the predominant singing voice is more likely to have female or male characteristics. It may be none if no singing voice is detected.

Voice Tags

The label provides tags for voice classification. Possible values are female, male and instrumental.
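The sketch below shows one way the three voice outputs could be interpreted together on the client side. The field names and the dataclass are assumptions made for the example, not the API's actual response types.

```python
# Sketch of interpreting the voice outputs together. Field names and the
# dataclass are assumptions, not the API's response types.
from dataclasses import dataclass

@dataclass
class VoiceResult:
    presence_profile: str    # "none", "low", "medium" or "high"
    predominant_gender: str  # "female", "male" or "none"
    tags: list[str]          # e.g. ["female"] or ["instrumental"]

def summarize_voice(voice: VoiceResult) -> str:
    if voice.presence_profile == "none":
        return "instrumental track (no singing voice detected)"
    return f"{voice.presence_profile} amount of {voice.predominant_gender} singing voice"

print(summarize_voice(VoiceResult("high", "female", ["female"])))
# -> high amount of female singing voice
```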

Instruments

The instrument classifier predicts the presence of the following instruments: percussion, synth, piano, acousticGuitar, electricGuitar, strings, bass, bassGuitar and brassWoodwinds.

It is possible to retrieve the presence of each instrument for each track segment, a list of the dominant instruments and a taxonomy that describes the presence of each instrument over the complete track.

The segment instrument score ranges from 0-1, where 0 (0%) indicates that the segment is unlikely to contain a given instrument and 1 (100%) indicates a high probability that the track segment contains a given instrument.

The taxonomy values absent, partially, frequently and throughout describe the presence of each instrument over the complete track:

| Taxonomy | Description |
| --- | --- |
| absent | Instrument has not been detected. |
| throughout | Instrument is detected throughout the full duration of the track. |
| frequently | Instrument is detected in major parts of the track. |
| partially | Instrument is detected in minor parts of the track. |
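To illustrate how segment scores and the presence taxonomy relate, the sketch below derives a taxonomy value from the fraction of segments in which an instrument is detected. The detection threshold and the coverage cut-offs are invented for this sketch; the API computes the track-level taxonomy itself.

```python
# Illustration only: the 0.5 detection threshold and the coverage cut-offs
# are invented; the API computes the track-level taxonomy itself.
def presence_taxonomy(segment_scores: list[float],
                      detection_threshold: float = 0.5) -> str:
    """Map per-segment instrument scores to absent/partially/frequently/throughout."""
    if not segment_scores:
        return "absent"
    covered = sum(score >= detection_threshold for score in segment_scores) / len(segment_scores)
    if covered == 0.0:
        return "absent"
    if covered >= 0.95:
        return "throughout"
    if covered >= 0.5:
        return "frequently"
    return "partially"

print(presence_taxonomy([0.9, 0.8, 0.7, 0.85]))  # -> throughout
print(presence_taxonomy([0.9, 0.1, 0.05, 0.1]))  # -> partially
```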

Valence / Arousal

The valence / arousal regression model predicts the degree of valence or arousal of a track.

Each label has a score ranging from -1 to 1 where -1 indicates the lowest degree (negative valence, negative arousal) and 1 indicates the highest degree (positive valence, positive arousal).

The valence / arousal results can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.
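A common use of these two values is to place a track in the valence/arousal plane. The quadrant names in the sketch below are illustrative shorthand, not part of the Cyanite taxonomy.

```python
# Sketch: placing a track in the valence/arousal plane. Quadrant names are
# illustrative shorthand, not part of the Cyanite taxonomy.
def quadrant(valence: float, arousal: float) -> str:
    """Both inputs range from -1 (lowest degree) to 1 (highest degree)."""
    if valence >= 0:
        return "positive, energetic" if arousal >= 0 else "positive, calm"
    return "negative, energetic" if arousal >= 0 else "negative, calm"

print(quadrant(0.7, -0.4))  # -> positive, calm
```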

Energy Level

The Energy Level is a label for the intensity of an analysed track and can be low, medium, high or variable.

A low Energy Level indicates a calm overall appearance of a track, while a high one stands for strong and powerful characteristics. A track with a variable energy level shows continual changes in its intensity profile.

Energy Dynamics

Energy Dynamics describes the progression of the Energy Level throughout the duration of the music piece, where the value low represents a stable trend and high depicts a strong variance between low and high energy levels. A high Energy Dynamics value corresponds to a variable Energy Level (see above).
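For client code it can help to make the documented value sets explicit, for instance as enums. This is a convenience sketch; the API returns plain strings, and only the values mentioned above are listed here.

```python
# Convenience sketch: the API returns plain strings; only the values
# documented above are listed.
from enum import Enum

class EnergyLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    VARIABLE = "variable"

class EnergyDynamics(Enum):
    LOW = "low"    # stable intensity over the track
    HIGH = "high"  # strong variance, i.e. a variable energy level

print(EnergyLevel("variable"), EnergyDynamics("high"))
```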

Musical Era

The musical era classifier describes the era the audio was likely produced in, or which the sound of production suggests.

Movement

The movement multi-label classifier provides the following labels:

bouncy, driving, flowing, groovy, nonrhythmic, pulsing, robotic, running, steady, stomping

Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given movement and 1 (100%) indicates a high probability that the track represents a given movement.

Since the movement of a track might not always be properly described by a single tag, the movement classifier is able to predict multiple movements for a given song instead of only one. A track could be classified with bouncy (Score: 0.9), while also being classified with robotic (Score: 0.8).

The movement can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.

The movement tags label provides tags for movement classification. Possible values are the same as the labels above, but without scores.

Character

The character multi-label classifier provides the following labels:

bold, cool, epic, ethereal, heroic, luxurious, magical, mysterious, playful, powerful, retro, sophisticated, sparkling, sparse, unpolished, warm

Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given character and 1 (100%) indicates a high probability that the track represents a given character.

Since the character of a track might not always be properly described by a single tag, the character classifier is able to predict multiple characters for a given song instead of only one. A track could be classified with cool (Score: 0.9), while also being classified with powerful (Score: 0.8).

The character can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.

The character tags label provides tags for character classification. Possible values are the same as the labels above, but without scores.
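Tag-style outputs such as mood, movement and character tags can be pooled into a single keyword set, for example for catalogue search or filtering. The sketch below assumes the tags arrive as plain lists of strings; the input layout is an assumption for the example.

```python
# Sketch: pooling tag-style outputs (mood, movement, character) into one
# keyword set, e.g. for catalogue search. The input layout is an assumption.
def keyword_set(**tag_lists: list[str]) -> set[str]:
    """Merge tag lists from several classifiers into a flat, de-duplicated set."""
    return {tag for tags in tag_lists.values() for tag in tags}

keywords = keyword_set(
    mood=["dark", "aggressive"],
    movement=["driving", "pulsing"],
    character=["bold", "powerful"],
)
print(sorted(keywords))
```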

Classical Epoch

The classical epoch multi-label classifier provides the following labels:

middleAge, renaissance, baroque, classical, romantic, contemporary

The classifier is only triggered when the track has been tagged with the classical main genre.

Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given classical epoch and 1 (100%) indicates a high probability that the track represents a given classical epoch.

The classical epoch can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.

The classical epoch tags label provides tags for classical epoch classification. Possible values are the same as the labels above, but without scores.
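Because the epoch output depends on the classical main genre being tagged, client code can guard on the genre tags before reading it. This is a minimal sketch with assumed list inputs.

```python
# Sketch: the classical epoch output is only meaningful when the classical
# main genre has been tagged, so guard on the genre tags first.
def epoch_or_none(genre_tags: list[str], epoch_tags: list[str]) -> list[str] | None:
    return epoch_tags if "classical" in genre_tags else None

print(epoch_or_none(["classical"], ["baroque"]))  # -> ['baroque']
print(epoch_or_none(["rock", "metal"], []))       # -> None
```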

Transformer Caption

The transformer caption is a string of max. 30 words describing the track in one or a few sentences.