Audio Analysis V6 Classifier
The Cyanite API exposes a variety of classifiers for your music.
BPM
The BPM classifier provides the BPM (beats per minute) of the track.
Key
The Key classifier provides you with the predicted key.
Mood
The mood multi-label classifier provides the following labels: aggressive, calm, chilled, dark, energetic, epic, happy, romantic, sad, scary, sexy, ethereal, uplifting.
Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given mood and 1 (100%) indicates a high probability that the track represents a given mood.
Since the mood of a track might not always be properly described by a single tag, the mood classifier is able to predict multiple moods for a given song instead of only one.
A track could be classified with dark (Score: 0.9), while also being classified with aggressive (Score: 0.8).
The mood can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.
In addition to the score, the API also exposes a list that includes the most likely moods, or the term ambiguous if the audio does not properly reflect any of our mood tags.
In addition, you can access advanced mood, which is a more detailed taxonomy.
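As a rough illustration of working with multi-label scores, the sketch below keeps every mood whose score reaches a cutoff. The `mood_scores` dictionary, the `labels_above` helper and the `0.5` threshold are assumptions made for this example. They are not defined by the API, which already exposes its own list of most likely moods.

```python
from typing import Dict, List


def labels_above(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return labels whose score is >= threshold, highest score first."""
    selected = [(label, score) for label, score in scores.items() if score >= threshold]
    # Sort by descending score; ties are broken alphabetically for stable output.
    selected.sort(key=lambda item: (-item[1], item[0]))
    return [label for label, _ in selected]


# Hypothetical averaged mood scores for one track (not real API output).
mood_scores = {"dark": 0.9, "aggressive": 0.8, "happy": 0.05, "calm": 0.1}
print(labels_above(mood_scores))  # ['dark', 'aggressive']
```

The same helper works for any of the multi-label classifiers on this page, since they all use scores in the 0-1 range.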
Genre
The genre multi-label classifier provides the following labels: ambient, blues, classical, electronicDance, folkCountry, funkSoul, jazz, latin, metal, pop, rapHipHop, reggae, rnb, rock, singerSongwriter.
Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given genre and 1 (100%) indicates a high probability that the track represents a given genre.
Since music can cross genre borders, the genre classifier can predict multiple genres for a given song instead of only one.
A track could be classified with rapHipHop (Score: 0.9) but also reggae (Score: 0.8).
The genre can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution. In addition to the score, the API also exposes a list that includes the most likely genres.
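To make the segment-wise view more concrete, the sketch below finds the highest-scoring genre for each segment. It assumes the segment data is available as a list of timestamps in seconds plus one equally long list of scores per label. That layout, the sample numbers and the `dominant_label_per_segment` helper are assumptions for illustration, not a description of the actual response.

```python
from typing import Dict, List, Tuple


def dominant_label_per_segment(
    timestamps: List[float], scores: Dict[str, List[float]]
) -> List[Tuple[float, str, float]]:
    """For each segment, return (timestamp, best label, best score)."""
    for label, values in scores.items():
        if len(values) != len(timestamps):
            raise ValueError(f"{label!r} has {len(values)} values, expected {len(timestamps)}")
    timeline = []
    for index, timestamp in enumerate(timestamps):
        # Pick the label with the highest score in this segment.
        best = max(scores, key=lambda label: scores[label][index])
        timeline.append((timestamp, best, scores[best][index]))
    return timeline


# Hypothetical data: three 15 s segments of one track.
timestamps = [0.0, 15.0, 30.0]
genre_segments = {
    "rapHipHop": [0.9, 0.7, 0.2],
    "reggae": [0.3, 0.8, 0.6],
}
for row in dominant_label_per_segment(timestamps, genre_segments):
    print(row)
```

With this sample data the track starts as rapHipHop and shifts to reggae from the second segment on, which a single averaged score would hide.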
Sub-genre
For some tracks an additional sub-genre can be predicted. Possible sub-genres include: bluesRock, folkRock, hardRock, indieAlternative, psychedelicProgressiveRock, punk, rockAndRoll, popSoftRock, abstractIDMLeftfield, breakbeatDnB, deepHouse, electro, house, minimal, synthPop, techHouse, techno, trance, contemporaryRnB, gangsta, jazzyHipHop, popRap, trap, blackMetal, deathMetal, doomMetal, heavyMetal, metalcore, nuMetal, disco, funk, gospel, neoSoul, soul, bigBandSwing, bebop, contemporaryJazz, easyListening, fusion, latinJazz, smoothJazz, country, folk.
Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given sub-genre and 1 (100%) indicates a high probability that the track represents a given sub-genre.
The sub-genre can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution. In addition to the score, the API also exposes a list that includes the most likely sub-genres.
Some tracks don't have any sub-genre. In this case, the sub-genre tags field is an empty array, and the averaged and segment-wise values are unavailable.
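Client code should therefore tolerate tracks without a sub-genre. The sketch below shows one way to do that. It assumes the tags arrive as a list of strings and that unavailable averaged values are represented as `None`. Both the representation and the `sub_genre_summary` helper are hypothetical.

```python
from typing import Dict, List, Optional


def sub_genre_summary(tags: List[str], averaged: Optional[Dict[str, float]]) -> str:
    """Summarise sub-genre output, tolerating tracks without any sub-genre."""
    if not tags or averaged is None:
        return "no sub-genre"
    # Use .get so a tag that is missing from the scores does not raise.
    parts = [f"{tag} ({averaged.get(tag, 0.0):.2f})" for tag in tags]
    return ", ".join(parts)


# Hypothetical values for a track with and without sub-genres.
print(sub_genre_summary(["trap", "popRap"], {"trap": 0.91, "popRap": 0.62}))
print(sub_genre_summary([], None))
```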
Voice
The voice classifier categorizes the audio as female or male singing voice or instrumental (non-vocal).
Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to have the given voice elements and 1 (100%) indicates a high probability that the track contains the given voice elements.
The voice classifier results can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.
Voice Presence Profile
This label describes the amount of singing voice throughout the full duration of the track and may be none, low, medium or high.
Predominant Voice Gender
This label indicates whether the predominant singing voice is more likely to have female or male characteristics. It may be none if no singing voice is detected.
Voice Tags
The label provides tags for voice classification. Possible values are female, male and instrumental.
Instruments
The instrument classifier predicts the presence of the following instruments: percussion, synth, piano, acousticGuitar, electricGuitar, strings, bass, bassGuitar and brassWoodwinds.
It is possible to retrieve the presence of each instrument for each track segment, a list of the dominant instruments and a taxonomy that describes the presence of each instrument over the complete track.
The segment instrument score ranges from 0-1, where 0 (0%) indicates that the segment is unlikely to contain a given instrument and 1 (100%) indicates a high probability that the track segment contains a given instrument.
The taxonomy values absent, partially, frequently and throughout describe the presence of each instrument:
Taxonomy | Description |
---|---|
absent | Instrument has not been detected |
partially | Instrument is detected in minor parts of the track |
frequently | Instrument is detected in major parts of the track |
throughout | Instrument is detected throughout the full duration of the track |
AudioAnalysisV6Segments.instruments
AudioAnalysisV6Result.instrumentPresence
AudioAnalysisV6InstrumentPresence
AudioAnalysisV6Result.instrumentTags
AudioAnalysisV6InstrumentTags
AudioAnalysisV6Result.instruments (deprecated)
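Because the four presence values form an ordered scale, they can be compared. The sketch below filters instruments by a minimum presence. It assumes the instrument presence data has been flattened into a dictionary from instrument name to presence value. That shape and the `instruments_at_least` helper are assumptions for this example.

```python
from typing import Dict, List

# Presence values from the table above, ordered from least to most present.
PRESENCE_ORDER = ["absent", "partially", "frequently", "throughout"]


def instruments_at_least(presence: Dict[str, str], minimum: str) -> List[str]:
    """Return instruments whose presence is at least `minimum`, sorted by name."""
    floor = PRESENCE_ORDER.index(minimum)
    return sorted(
        name for name, level in presence.items()
        if PRESENCE_ORDER.index(level) >= floor
    )


# Hypothetical presence values for one track.
presence = {
    "percussion": "throughout",
    "synth": "frequently",
    "piano": "partially",
    "strings": "absent",
}
print(instruments_at_least(presence, "frequently"))  # ['percussion', 'synth']
```

An unknown presence value raises a `ValueError` from `list.index`, which makes unexpected input visible instead of silently dropping it.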
Valence / Arousal
The valence / arousal regression model predicts the degree of valence or arousal of a track.
Each label has a score ranging from -1 to 1 where -1 indicates the lowest degree (negative valence, negative arousal) and 1 indicates the highest degree (positive valence, positive arousal).
The valence / arousal results can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.
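Since both values share the range -1 to 1, a track can be placed in one of four quadrants of a valence / arousal plane. The sketch below does this and also rescales a score to 0-1 for display next to the other classifiers. The quadrant wording and both helper names are illustrative assumptions, and treating exactly 0 as positive is an arbitrary choice; none of this is terminology from the API.

```python
def quadrant(valence: float, arousal: float) -> str:
    """Name the valence/arousal quadrant for scores in [-1, 1]."""
    if not (-1.0 <= valence <= 1.0 and -1.0 <= arousal <= 1.0):
        raise ValueError("valence and arousal must lie in [-1, 1]")
    # A score of exactly 0 is counted as positive/high here (arbitrary choice).
    horizontal = "positive valence" if valence >= 0 else "negative valence"
    vertical = "high arousal" if arousal >= 0 else "low arousal"
    return f"{horizontal}, {vertical}"


def to_unit_range(score: float) -> float:
    """Linearly rescale a score from [-1, 1] to [0, 1]."""
    return (score + 1.0) / 2.0


# Hypothetical averaged values for one track.
print(quadrant(0.6, -0.4))  # positive valence, low arousal
print(to_unit_range(0.0))   # 0.5
```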
Energy Level
The Energy Level is a label for the intensity of an analysed track, which can be low, medium, high or variable.
A low Energy Level indicates a calm overall appearance of a track, while a high one stands for stronger and more powerful characteristics.
A track with a variable Energy Level shows continual changes in its intensity profile.
Energy Dynamics
Energy Dynamics describes the progression of the Energy Level throughout the duration of the music piece, where the value low represents a stable trend and high depicts a strong variance between low and high energy levels.
The high value indicates a variable Energy Level (see above).
Musical Era
The musical era classifier describes the era the audio was likely produced in, or which the sound of production suggests.
Movement
The movement multi-label classifier provides the following labels: bouncy, driving, flowing, groovy, nonrhythmic, pulsing, robotic, running, steady, stomping.
Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given movement and 1 (100%) indicates a high probability that the track represents a given movement.
Since the movement of a track might not always be properly described by a single tag, the movement classifier is able to predict multiple movements for a given song instead of only one.
A track could be classified with bouncy (Score: 0.9), while also being classified with robotic (Score: 0.8).
The movement can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.
The movement tags label provides tags for movement classification. Possible values are the same labels as above, but without scores.
Character
The character multi-label classifier provides the following labels: bold, cool, epic, ethereal, heroic, luxurious, magical, mysterious, playful, powerful, retro, sophisticated, sparkling, sparse, unpolished, warm.
Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given character and 1 (100%) indicates a high probability that the track represents a given character.
Since the character of a track might not always be properly described by a single tag, the character classifier is able to predict multiple characters for a given song instead of only one.
A track could be classified with cool (Score: 0.9), while also being classified with powerful (Score: 0.8).
The character can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.
The character tags label provides tags for character classification. Possible values are the same labels as above, but without scores.
Classical Epoch
The classical epoch multi-label classifier provides the following labels: middleAge, renaissance, baroque, classical, romantic, contemporary.
The classifier only runs once the track is tagged with the classical main genre.
Each label has a score ranging from 0-1, where 0 (0%) indicates that the track is unlikely to represent a given classical epoch and 1 (100%) indicates a high probability that the track represents a given classical epoch.
The classical epoch can be retrieved both averaged over the whole track and segment-wise over time with 15s temporal resolution.
The classical epoch tags label provides tags for classical epoch classification. Possible values are the same labels as above, but without scores.
AudioAnalysisV6Result.classicalEpoch
AudioAnalysisV6Result.classicalEpochTags
AudioAnalysisV6Segments.classicalEpoch
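Because the classical epoch classifier depends on the classical main genre, client code may want to guard access to its output. The sketch below assumes the result has been loaded into a plain dictionary. The `genreTags` key, the dictionary shape and the `classical_epochs` helper are assumptions for this example; only `classicalEpochTags` is taken from the field list above.

```python
from typing import Any, Dict, List


def classical_epochs(result: Dict[str, Any]) -> List[str]:
    """Return classical epoch tags, or [] when the track is not tagged classical."""
    # "genreTags" is an assumed key name for the list of most likely genres.
    genre_tags = result.get("genreTags") or []
    if "classical" not in genre_tags:
        return []
    return list(result.get("classicalEpochTags") or [])


# Hypothetical results for a classical and a non-classical track.
print(classical_epochs({"genreTags": ["classical"], "classicalEpochTags": ["baroque"]}))
print(classical_epochs({"genreTags": ["pop"], "classicalEpochTags": None}))
```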
Transformer Caption
The transformer caption is a string of at most 30 words describing the track in one or a few sentences.