Plenty of artists have given us their spin on Dolly Parton's "Jolene" over the years, but Holly Herndon's take is without doubt the most intriguing rendition we've come across yet.
On first listen, it might sound like a decent, if somewhat uninspired, cover of the 1973 country classic. But the voice you're hearing in the track isn't the sound of electronic musician Holly Herndon's vocal cords - it's a neural network, imitating her singing voice through the power of machine learning.
The neural network, nicknamed Holly+, was developed by Herndon in collaboration with Never Before Heard Sounds. Described as Herndon's "digital twin", Holly+ is a voice model - a "deep neural network" capable of generating raw audio that convincingly mimics an individual voice.
The network is trained on recordings of speech and singing from a target voice, and can be used to synthesize sung phrases from text input, or to accomplish "audio style transfer", where audio from an existing recording is converted to resemble the target voice. Those curious about the technology's potential can submit their own audio to be reproduced in Holly+'s voice through a website set up by Herndon.
Herndon demonstrated Holly+ live in a recent TED talk, which you can watch below. In the video, she outlines the radical possibilities opened up by the technology and explores the ethical and legal questions they present, suggesting that soon, any producer or musician could sing in the voice of their favourite vocalist.
In the live demonstration, beginning at 5:25, vocalist Pher alternates between two microphones, performing a song both in his own voice and that of Holly+, before duetting with both at once.
Writing in a blog post published following the announcement of Holly+ last year, Herndon stated that she's "confident that generating convincing spoken and sung voices will soon become standard practice for artists and other creatives, as presaged by the popularity of celebrity vocal deep fakes already found all over YouTube."
"As our ability to produce more detailed and convincing voice generation evolves, so too will the need for comprehensive, high fidelity vocal training data, as well as the urgency of discerning provenance," she continues. "For this reason I believe that there will be demand for official, high fidelity, vocal models of public figures, so why not experiment with my own?"
A prominent question raised by Herndon's work is how music produced with authorized digital likenesses such as Holly+, as well as with unauthorized "deepfake" voice models, would be dealt with by the music industry's existing framework of intellectual property rights.
Herndon proposes that the public be given "open source access" to her voice model, with access, decision-making and profits from her digital twin governed through a DAO (Decentralized Autonomous Organization) - a blockchain-based organization whose members would collectively act as stewards of the voice model and any works produced using the technology.
"The Holly+ model creates a virtuous cycle," Herndon writes. "I release tools to allow for the creative usage of my likeness, the best artworks and license opportunities are approved by DAO members, and profit from those works will be shared amongst artists using the tools, DAO members, and a treasury to fund further development of the tools."