From 23a56bd50b04211da3cab45f72c3390711b2416b Mon Sep 17 00:00:00 2001 From: Mitja Felicijan Date: Wed, 12 Jul 2023 18:35:08 +0200 Subject: Moved notes and posts into subfolders --- ...22-07-05-what-would-dna-sound-if-synthesized.md | 364 +++++++++++++++++++++ 1 file changed, 364 insertions(+) create mode 100644 content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md (limited to 'content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md') diff --git a/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md b/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md new file mode 100644 index 0000000..968341c --- /dev/null +++ b/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md @@ -0,0 +1,364 @@ +--- +title: What would DNA sound if synthesized to an audio file +url: what-would-dna-sound-if-synthesized.html +date: 2022-07-05T12:00:00+02:00 +type: post +draft: false +--- + +## Introduction + +Lately, I have been thinking a lot about the nature of life, what are the +foundation blocks of life and things like that. It's remarkable how complex and +on the other hand simple the creation is when you look at it. The miracle of +life keeps us grounded when our imagination goes wild. If the DNA are the blocks +of life, you could consider them to be an API nature provided us to better +understand all of this chaos masquerading as order. + +I have been reading a lot about superintelligence and our somehow misguided path +to create general artificial intelligence. What would the building blocks or our +creation look like? Is the compression really the ultimate storage of +information? Will our creation also ponder this questions when creating new +worlds for themselves, or will we just disappear into the vastness of +possibilities? It is a little offensive that we are playing God whilst being +completely ignorant of our own reality. Who knows! Like many other +breakthroughs, this one will also come at a cost not known to us when it finally +happens. + +To keep things a bit lighter, I decided to convert some popular DNA sequences +into an audio files for us to listen to. I am not the first one, nor I will be +the last one to do this. But it is an interesting exercise in better +understanding the relationship between art and science. Maybe listening to DNA +instead of parsing it will find a way into better understanding, or at least +enjoying the creation and cryptic nature of life. + +## DNA encoding and primer example + +I have been exploring DNA in the past in my post from about 3 years ago in +[Encoding binary data into DNA +sequence](/encoding-binary-data-into-dna-sequence.html) where I have been +converting all sorts of data into DNA sequences. + +This will be a similar exercise but instead of converting to DNA, I will be +generating tones from Nucleotides. + +| Nucleotides | Note | Frequency | +| ---------------- | ---- | --------- | +| **A** (Adenine) | A | 440 Hz | +| **C** (Cytosine) | C | 783.99 Hz | +| **G** (Guanine) | G | 523.25 Hz | +| **T** (Thymine) | D | 587.33 Hz | + +Since we do not have T in equal-tempered scale, I choose D to represent T note. + +You can check [Frequencies for equal-tempered scale, A4 = 440 +Hz](https://pages.mtu.edu/~suits/notefreqs.html). For this tuning, we also +choose `Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`. + +Now that we have this out of the way, we can also brush up on the DNA sequencing +a bit. This is a famous quote I also used for the encoding tests, and it goes +like this. + +> How wonderful that we have met with a paradox. Now we have some hope of +> making progress. +> ― Niels Bohr + +```shell +>SEQ1 +GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA +GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA +ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA +ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT +GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT +GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC +AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC +AACC +``` + +This is what we gonna work with to get things rolling forward, when creating +parser and waveform generator. + +## Parsing DNA data + +This step is rather simple one. All we need to do is parse input DNA sequence in +[FASTA format](https://en.wikipedia.org/wiki/FASTA_format) well known in +[Bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics) to extract single +Nucleotides that will be converted into separate tones based on equal-tempered +scale explained above. + +```python +nucleotide_tone_map = { + 'A': 440, + 'C': 523.25, + 'G': 783.99, + 'T': 587.33, # converted to D +} + +def split(word): + return [char for char in word] + +def generate_from_dna_sequence(sequence): + for nucleotide in split(sequence): + print(nucleotide, nucleotide_tone_map[nucleotide]) +``` + +## Generating sine wave + +Because we are essentially creating a long stream of notes we will be appending +sine notes to a global array we will later use for creating a WAV file out of +it. + +```python +import math + +def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0): + global audio + + num_samples = duration_milliseconds * (sample_rate / 1000.0) + + for x in range(int(num_samples)): + audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate))) + + return +``` + +The sine wave generated here is the standard beep. If you want something more +aggressive, you could try a square or saw tooth waveform. + +## Generating a WAV file from accumulated sine waves + + +```python +import wave +import struct + +def save_wav(file_name): + wav_file = wave.open(file_name, 'w') + nchannels = 1 + sampwidth = 2 + + nframes = len(audio) + comptype = 'NONE' + compname = 'not compressed' + wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname)) + + for sample in audio: + wav_file.writeframes(struct.pack('h', int(sample * 32767.0))) + + wav_file.close() +``` + +44100 is the industry standard sample rate - CD quality. If you need to save on +file size, you can adjust it downwards. The standard for low quality is, 8000 or +8kHz. + +WAV files here are using short, 16 bit, signed integers for the sample size. +So, we multiply the floating-point data we have by 32767, the maximum value for +a short integer. + +> It is theoretically possible to use the floating point -1.0 to 1.0 data +> directly in a WAV file, but not obvious how to do that using the wave module +> in Python. + +## Generating Spectograms + +I have tried two methods of doing this and both were just fine. I however opted +out to use the [SoX - Sound eXchange, the Swiss Army knife of audio +manipulation](https://linux.die.net/man/1/sox) one because it didn't require +anything else. + +```shell +sox output.wav -n spectrogram -o spectrogram.png +``` + +An example spectrogram of Ludwig van Beethoven Symphony No. 6 First movement. + + + +![Ludwig van Beethoven Symphony No. 6 First movement](/assets/dna-synthesized/symphony-no6-1st-movement.png) + +The other option could also be in combination with +[gnuplot](http://www.gnuplot.info/). This would require an intermediary step, +however. + +```shell +sox output.wav audio.dat +tail -n+3 audio.dat > audio_only.dat +gnuplot audio.gpi +``` + +And input file `audio.gpi` that would be passed to gnuplot looks something like +this. + +``` +# set output format and size +set term png size 1000,280 + +# set output file +set output "audio.png" + +# set y range +set yr [-1:1] + +# we want just the data +unset key +unset tics +unset border +set lmargin 0 +set rmargin 0 +set tmargin 0 +set bmargin 0 + +# draw rectangle to change background color +set obj 1 rectangle behind from screen 0,0 to screen 1,1 +set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff" + +# draw data with foreground color +plot "audio_only.dat" with lines lt rgb 'red' +``` + +## Pre-generated sequences + +What I did was take interesting parts from an animal's genome and feed it to a +tone generator script. This then generated a WAV file and I converted those to +MP3, so they can be played in a browser. The last step was creating a +spectrogram based on a WAV file. + +### Niels Bohr quote + + + +![Spectogram](/assets/dna-synthesized/quote/spectogram.png) + +### Mouse + +This is part of a mouse genome `Mus_musculus.GRCm39.dna.nonchromosomal`. You +can get [genom data +here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/). + + + +![Spectogram](/assets/dna-synthesized/mouse/spectogram.png) + +### Bison + +This is part of a bison genome `Bison_bison_bison.Bison_UMD1.0.cdna`. You can +get [genom data +here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/). + + + +![Spectogram](/assets/dna-synthesized/bison/spectogram.png) + +### Taurus + +This is part of a taurus genome `Bos_taurus.ARS-UCD1.2.cdna`. You can get +[genom data +here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/). + + + +![Spectogram](/assets/dna-synthesized/taurus/spectogram.png) + +## Making a drummer out of a DNA sequence + +To make things even more interesting, I decided to send this data via MIDI to my +[Elektron Model:Samples](https://www.elektron.se/en/model-samples). This is a +really cool piece of equipment that supports MIDI in via USB and 3.5 mm audio +jack. + +Elektron is connected to my MacBook via USB cable and audio out is patched to a +Sony Bluetooth speaker I have that supports 3.5 mm audio in. Elektron doesn't +have internal speakers. + +![](/assets/dna-synthesized/elektron/IMG_0619.jpg) + +![](/assets/dna-synthesized/elektron/IMG_0620.jpg) + +![](/assets/dna-synthesized/elektron/IMG_0622.jpg) + +For communicating with Elektron, I choose `pygame` Python module that has MIDI +built in. With this, it was rather simple to send notes to the device. All I did +was map MIDI notes to the actual Nucleotides. + +Before all of this I also checked Audio MIDI Setup app under MacOS and checked +MIDI Studio by pressing ⌘-2. + +![](/assets/dna-synthesized/elektron/midi-studio.jpg) + +The whole script that parses and send notes to the Elektron looks like this. + +```python +import pygame.midi +import time + +pygame.midi.init() + +print(pygame.midi.get_default_output_id()) +print(pygame.midi.get_device_info(0)) + +player = pygame.midi.Output(1) +player.set_instrument(2) + +def send_note(note, velocity): + global player + player.note_on(note, velocity) + time.sleep(0.3) + player.note_off(note, velocity) + + +nucleotide_midi_map = { + 'A': 60, + 'C': 90, + 'G': 160, + 'T': 180, # is D +} + +with open("quote.fa") as f: + sequence = f.read().replace('\n', '') + +for nucleotide in [char for char in sequence]: + print("Playing nucleotide {} with MIDI note {}".format( + nucleotide, nucleotide_midi_map[nucleotide])) + send_note(nucleotide_midi_map[nucleotide], 127) + +del player +pygame.midi.quit() +``` + + + +All of this could be made much more interesting if I choose different +instruments for different Nucleotides, or doing more funky stuff with Elektron. +But for now, this should be enough. It is just a proof of concept. Something to +play around with. + +## Going even further + +As you probably notice, the end results are quite similar to each other. This is +to be expected because we are operating only with 4 notes essentially. What +could make this more interesting is using something like +[Supercollider](https://supercollider.github.io/) to create more interesting +sounds. By transposing notes or using effects based on repeated data in a +sequence. Possibilities are endless. + +It is really astonishing what can be achieved with a little bit of code and an +idea. I could see this becoming an interesting background soundscape instrument +if done properly. It could replace random note generator with something more +intriguing, biological, natural. + +I actually find the results fascinating. I took some time and listened to this +music of nature. Even though it's quite the same, it's also quite different. +The subtle differences on repeat kind of creates music on its own. Makes you +wonder. It kind of puts Occam’s Razor in its place. Nature for sure loves to +make things as energy efficient as possible. -- cgit v1.2.3