Diffstat (limited to '_posts/2022-07-05-what-would-dna-sound-if-synthesized.md')
| -rw-r--r-- | _posts/2022-07-05-what-would-dna-sound-if-synthesized.md | 365 |
1 files changed, 365 insertions, 0 deletions
diff --git a/_posts/2022-07-05-what-would-dna-sound-if-synthesized.md b/_posts/2022-07-05-what-would-dna-sound-if-synthesized.md new file mode 100644 index 0000000..7aaac68 --- /dev/null +++ b/_posts/2022-07-05-what-would-dna-sound-if-synthesized.md | |||
| @@ -0,0 +1,365 @@ | |||
| 1 | --- | ||
| 2 | title: What would DNA sound like if synthesized to an audio file | ||
| 3 | permalink: /what-would-dna-sound-if-synthesized.html | ||
| 4 | date: 2022-07-05T12:00:00+02:00 | ||
| 5 | layout: post | ||
| 6 | type: post | ||
| 7 | draft: false | ||
| 8 | --- | ||
| 9 | |||
| 10 | ## Introduction | ||
| 11 | |||
| 12 | Lately, I have been thinking a lot about the nature of life, what the | ||
| 13 | building blocks of life are, and things like that. It's remarkable how complex | ||
| 14 | and, at the same time, how simple creation is when you look at it. The miracle | ||
| 15 | of life keeps us grounded when our imagination goes wild. If DNA is the building | ||
| 16 | block of life, you could consider it an API that nature provided us to better | ||
| 17 | understand all of this chaos masquerading as order. | ||
| 18 | |||
| 19 | I have been reading a lot about superintelligence and our somewhat misguided path | ||
| 20 | to create general artificial intelligence. What would the building blocks of our | ||
| 21 | creation look like? Is compression really the ultimate storage of | ||
| 22 | information? Will our creation also ponder these questions when creating new | ||
| 23 | worlds for itself, or will we just disappear into the vastness of | ||
| 24 | possibilities? It is a little offensive that we are playing God whilst being | ||
| 25 | completely ignorant of our own reality. Who knows! Like many other | ||
| 26 | breakthroughs, this one will also come at a cost not known to us when it finally | ||
| 27 | happens. | ||
| 28 | |||
| 29 | To keep things a bit lighter, I decided to convert some popular DNA sequences | ||
| 30 | into audio files for us to listen to. I am not the first one, nor will I be | ||
| 31 | the last one to do this. But it is an interesting exercise in better | ||
| 32 | understanding the relationship between art and science. Maybe listening to DNA | ||
| 33 | instead of parsing it will lead to a better understanding, or at least | ||
| 34 | enjoyment, of the creation and cryptic nature of life. | ||
| 35 | |||
| 36 | ## DNA encoding and primer example | ||
| 37 | |||
| 38 | I explored DNA about 3 years ago in my post | ||
| 39 | [Encoding binary data into DNA | ||
| 40 | sequence](/encoding-binary-data-into-dna-sequence.html), where I | ||
| 41 | converted all sorts of data into DNA sequences. | ||
| 42 | |||
| 43 | This will be a similar exercise, but instead of converting to DNA, I will be | ||
| 44 | generating tones from nucleotides. | ||
| 45 | |||
| 46 | | Nucleotide | Note | Frequency | | ||
| 47 | | ---------------- | ---- | --------- | | ||
| 48 | | **A** (Adenine) | A4 | 440.00 Hz | | ||
| 49 | | **C** (Cytosine) | C5 | 523.25 Hz | | ||
| 50 | | **G** (Guanine) | G5 | 783.99 Hz | | ||
| 51 | | **T** (Thymine) | D5 | 587.33 Hz | | ||
| 52 | |||
| 53 | Since there is no note T in the equal-tempered scale, I chose D to represent T. | ||
| 54 | |||
| 55 | You can check [Frequencies for equal-tempered scale, A4 = 440 | ||
| 56 | Hz](https://pages.mtu.edu/~suits/notefreqs.html). That table also assumes | ||
| 57 | `Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr` for its wavelengths. | ||
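
The four frequencies used in this post can be reproduced from the equal-temperament formula `f = 440 * 2^((n - 69) / 12)`, where `n` is the MIDI note number and A4 is note 69. The following is only a minimal sketch: the helper name `note_frequency` is hypothetical, and the octave (A4, C5, D5, G5) is an assumption inferred from the frequency values, not something stated elsewhere in the post.

```python
# Equal-tempered frequency for a MIDI note number, with A4 (note 69) = 440 Hz.
def note_frequency(midi_note, a4=440.0):
    return a4 * 2 ** ((midi_note - 69) / 12)

# Assumed note numbers: A4 = 69, C5 = 72, D5 = 74, G5 = 79.
nucleotide_notes = {'A': 69, 'C': 72, 'G': 79, 'T': 74}

for nucleotide, note in nucleotide_notes.items():
    print(nucleotide, round(note_frequency(note), 2))
```

Deriving the values this way makes it easy to move the whole mapping to another octave or to pick a different note for T.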
| 58 | |||
| 59 | Now that we have this out of the way, we need a DNA sequence to work with. | ||
| 60 | This is a famous quote I also used for the encoding tests, shown here together | ||
| 61 | with its DNA-encoded form. | ||
| 62 | |||
| 63 | > How wonderful that we have met with a paradox. Now we have some hope of | ||
| 64 | > making progress. | ||
| 65 | > ― Niels Bohr | ||
| 66 | |||
| 67 | ```shell | ||
| 68 | >SEQ1 | ||
| 69 | GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA | ||
| 70 | GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA | ||
| 71 | ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA | ||
| 72 | ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT | ||
| 73 | GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT | ||
| 74 | GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC | ||
| 75 | AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC | ||
| 76 | AACC | ||
| 77 | ``` | ||
| 78 | |||
| 79 | This is what we are going to work with while creating the parser and the | ||
| 80 | waveform generator. | ||
| 81 | |||
| 82 | ## Parsing DNA data | ||
| 83 | |||
| 84 | This step is a rather simple one. All we need to do is parse the input DNA sequence in | ||
| 85 | [FASTA format](https://en.wikipedia.org/wiki/FASTA_format), well known in | ||
| 86 | [bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), to extract single | ||
| 87 | nucleotides that will be converted into separate tones based on the equal-tempered | ||
| 88 | scale explained above. | ||
| 89 | |||
| 90 | ```python | ||
| 91 | nucleotide_tone_map = { | ||
| 92 |     'A': 440, | ||
| 93 |     'C': 523.25, | ||
| 94 |     'G': 783.99, | ||
| 95 |     'T': 587.33, # converted to D | ||
| 96 | } | ||
| 97 | |||
| 98 | def split(word): | ||
| 99 |     return [char for char in word] | ||
| 100 | |||
| 101 | def generate_from_dna_sequence(sequence): | ||
| 102 |     for nucleotide in split(sequence): | ||
| 103 |         print(nucleotide, nucleotide_tone_map[nucleotide]) | ||
| 104 | ``` | ||
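
The snippet above expects a bare string of nucleotides. FASTA files, however, start each record with a header line such as `>SEQ1`, and genome downloads may also contain lowercase letters or symbols such as `N`. Here is a minimal sketch of a more forgiving parser; the function name `parse_fasta` and the decision to silently skip unknown symbols are assumptions for illustration only.

```python
def parse_fasta(text):
    """Yield nucleotides from FASTA text, skipping headers and unknown symbols."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('>'):
            continue  # header line such as ">SEQ1" or an empty line
        for symbol in line.upper():
            if symbol in 'ACGT':
                yield symbol  # anything else (e.g. N) is ignored

example = ">SEQ1\nGACA\nGCTN\n"
print(list(parse_fasta(example)))
```

Because it is a generator, it can feed nucleotides to the tone generator one at a time without building a second copy of the sequence.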
| 105 | |||
| 106 | ## Generating sine wave | ||
| 107 | |||
| 108 | Because we are essentially creating a long stream of notes, we append each | ||
| 109 | sine tone to a global array that we later use to create a WAV file out of | ||
| 110 | it. | ||
| 111 | |||
| 112 | ```python | ||
| 113 | import math | ||
| 114 | |||
| 115 | sample_rate = 44100  # samples per second (CD quality) | ||
| 116 | audio = []  # accumulated samples, each in the range -1.0 to 1.0 | ||
| 117 | |||
| 118 | def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0): | ||
| 119 |     # number of samples needed for the requested duration | ||
| 120 |     num_samples = duration_milliseconds * (sample_rate / 1000.0) | ||
| 121 | |||
| 122 |     for x in range(int(num_samples)): | ||
| 123 |         audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate))) | ||
| 124 | ``` | ||
| 125 | |||
| 126 | The sine wave generated here is the standard beep. If you want something more | ||
| 127 | aggressive, you could try a square or sawtooth waveform. | ||
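
As a rough sketch of what those alternatives could look like, the two helpers below compute a single sample of a square and a sawtooth wave at time `t` (in seconds). The function names are hypothetical and these are the naive, non-band-limited forms, so expect some aliasing at higher frequencies.

```python
import math

def square_sample(freq, t, volume=1.0):
    # +volume while the sine is non-negative, -volume otherwise
    return volume if math.sin(2 * math.pi * freq * t) >= 0 else -volume

def sawtooth_sample(freq, t, volume=1.0):
    # rises linearly from -volume to +volume once per cycle
    phase = freq * t
    return volume * 2 * (phase - math.floor(phase + 0.5))
```

Either one could stand in for the `math.sin(...)` term in `append_sinewave` by passing `t = x / sample_rate`.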
| 128 | |||
| 129 | ## Generating a WAV file from accumulated sine waves | ||
| 130 | |||
| 131 | |||
| 132 | ```python | ||
| 133 | import wave | ||
| 134 | import struct | ||
| 135 | |||
| 136 | def save_wav(file_name): | ||
| 137 |     wav_file = wave.open(file_name, 'w') | ||
| 138 |     nchannels = 1  # mono | ||
| 139 |     sampwidth = 2  # bytes per sample (16 bit) | ||
| 140 | |||
| 141 |     nframes = len(audio) | ||
| 142 |     comptype = 'NONE' | ||
| 143 |     compname = 'not compressed' | ||
| 144 |     wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname)) | ||
| 145 | |||
| 146 |     for sample in audio:  # '<h' = little-endian signed 16-bit, as WAV expects | ||
| 147 |         wav_file.writeframes(struct.pack('<h', int(sample * 32767.0))) | ||
| 148 | |||
| 149 |     wav_file.close() | ||
| 150 | ``` | ||
| 151 | |||
| 152 | A sample rate of 44100 Hz is the industry standard (CD quality). If you need to | ||
| 153 | save on file size, you can adjust it downwards. The standard for low quality is | ||
| 154 | 8000 Hz, or 8 kHz. | ||
| 155 | |||
| 156 | The WAV files here use short (16-bit) signed integers for the sample size, | ||
| 157 | so we multiply the floating-point data we have by 32767, the maximum value for | ||
| 158 | a short integer. | ||
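
One thing to watch out for: if a sample ever leaves the -1.0 to 1.0 range (for example with `volume` above 1.0, or when mixing several tones), the scaled value no longer fits into 16 bits and `struct.pack` raises an error. A minimal sketch of a safer conversion, with the hypothetical helper name `float_to_int16_bytes`, could look like this:

```python
import struct

def float_to_int16_bytes(sample):
    # clamp to the valid range so loud samples do not overflow
    clamped = max(-1.0, min(1.0, sample))
    # '<h' packs a little-endian signed 16-bit integer
    return struct.pack('<h', int(clamped * 32767.0))
```

Clamping distorts anything that exceeds the range, so it is a safety net rather than a substitute for keeping the volume in bounds.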
| 159 | |||
| 160 | > It is theoretically possible to use the floating point -1.0 to 1.0 data | ||
| 161 | > directly in a WAV file, but not obvious how to do that using the wave module | ||
| 162 | > in Python. | ||
| 163 | |||
| 164 | ## Generating Spectrograms | ||
| 165 | |||
| 166 | I tried two methods of doing this, and both worked just fine. In the end, I opted | ||
| 167 | to use [SoX - Sound eXchange, the Swiss Army knife of audio | ||
| 168 | manipulation](https://linux.die.net/man/1/sox) because it didn't require | ||
| 169 | anything else. | ||
| 170 | |||
| 171 | ```shell | ||
| 172 | sox output.wav -n spectrogram -o spectrogram.png | ||
| 173 | ``` | ||
| 174 | |||
| 175 | An example spectrogram of the first movement of Ludwig van Beethoven's Symphony No. 6. | ||
| 176 | |||
| 177 | <audio controls> | ||
| 178 | <source src="/assets/posts/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg"> | ||
| 179 | </audio> | ||
| 180 | |||
| 181 |  | ||
| 182 | |||
| 183 | The other option is to use SoX in combination with | ||
| 184 | [gnuplot](http://www.gnuplot.info/). This requires an intermediate step, | ||
| 185 | however. | ||
| 186 | |||
| 187 | ```shell | ||
| 188 | sox output.wav audio.dat | ||
| 189 | tail -n+3 audio.dat > audio_only.dat | ||
| 190 | gnuplot audio.gpi | ||
| 191 | ``` | ||
| 192 | |||
| 193 | The input file `audio.gpi` that is passed to gnuplot looks something like | ||
| 194 | this. | ||
| 195 | |||
| 196 | ```txt | ||
| 197 | # set output format and size | ||
| 198 | set term png size 1000,280 | ||
| 199 | |||
| 200 | # set output file | ||
| 201 | set output "audio.png" | ||
| 202 | |||
| 203 | # set y range | ||
| 204 | set yr [-1:1] | ||
| 205 | |||
| 206 | # we want just the data | ||
| 207 | unset key | ||
| 208 | unset tics | ||
| 209 | unset border | ||
| 210 | set lmargin 0 | ||
| 211 | set rmargin 0 | ||
| 212 | set tmargin 0 | ||
| 213 | set bmargin 0 | ||
| 214 | |||
| 215 | # draw rectangle to change background color | ||
| 216 | set obj 1 rectangle behind from screen 0,0 to screen 1,1 | ||
| 217 | set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff" | ||
| 218 | |||
| 219 | # draw data with foreground color | ||
| 220 | plot "audio_only.dat" with lines lt rgb 'red' | ||
| 221 | ``` | ||
| 222 | |||
| 223 | ## Pre-generated sequences | ||
| 224 | |||
| 225 | I took interesting parts of several animal genomes and fed them to the | ||
| 226 | tone generator script. Each run generated a WAV file, which I converted to | ||
| 227 | MP3 so it can be played in a browser. The last step was creating a | ||
| 228 | spectrogram from each WAV file. | ||
| 229 | |||
| 230 | ### Niels Bohr quote | ||
| 231 | |||
| 232 | <audio controls> | ||
| 233 | <source src="/assets/posts/dna-synthesized/quote/out.mp3" type="audio/mpeg"> | ||
| 234 | </audio> | ||
| 235 | |||
| 236 |  | ||
| 237 | |||
| 238 | ### Mouse | ||
| 239 | |||
| 240 | This is part of a mouse genome `Mus_musculus.GRCm39.dna.nonchromosomal`. You | ||
| 241 | can get [genome data | ||
| 242 | here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/). | ||
| 243 | |||
| 244 | <audio controls> | ||
| 245 | <source src="/assets/posts/dna-synthesized/mouse/out.mp3" type="audio/mpeg"> | ||
| 246 | </audio> | ||
| 247 | |||
| 248 |  | ||
| 249 | |||
| 250 | ### Bison | ||
| 251 | |||
| 252 | This is part of a bison genome `Bison_bison_bison.Bison_UMD1.0.cdna`. You can | ||
| 253 | get [genome data | ||
| 254 | here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/). | ||
| 255 | |||
| 256 | <audio controls> | ||
| 257 | <source src="/assets/posts/dna-synthesized/bison/out.mp3" type="audio/mpeg"> | ||
| 258 | </audio> | ||
| 259 | |||
| 260 |  | ||
| 261 | |||
| 262 | ### Taurus | ||
| 263 | |||
| 264 | This is part of a *Bos taurus* (cattle) genome `Bos_taurus.ARS-UCD1.2.cdna`. You can get | ||
| 265 | [genome data | ||
| 266 | here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/). | ||
| 267 | |||
| 268 | <audio controls> | ||
| 269 | <source src="/assets/posts/dna-synthesized/taurus/out.mp3" type="audio/mpeg"> | ||
| 270 | </audio> | ||
| 271 | |||
| 272 |  | ||
| 273 | |||
| 274 | ## Making a drummer out of a DNA sequence | ||
| 275 | |||
| 276 | To make things even more interesting, I decided to send this data via MIDI to my | ||
| 277 | [Elektron Model:Samples](https://www.elektron.se/en/model-samples). This is a | ||
| 278 | really cool piece of equipment that supports MIDI in via USB and 3.5 mm audio | ||
| 279 | jack. | ||
| 280 | |||
| 281 | The Elektron is connected to my MacBook via a USB cable, and its audio out is | ||
| 282 | patched to a Sony Bluetooth speaker that supports 3.5 mm audio in. The Elektron | ||
| 283 | doesn't have internal speakers. | ||
| 284 | |||
| 285 |  | ||
| 286 | |||
| 287 |  | ||
| 288 | |||
| 289 |  | ||
| 290 | |||
| 291 | For communicating with the Elektron, I chose the `pygame` Python module, which has | ||
| 292 | MIDI support built in. With this, it was rather simple to send notes to the device. | ||
| 293 | All I did was map nucleotides to MIDI notes. | ||
| 294 | |||
| 295 | Before all of this, I also opened the Audio MIDI Setup app on macOS and checked | ||
| 296 | MIDI Studio by pressing ⌘-2. | ||
| 297 | |||
| 298 |  | ||
| 299 | |||
| 300 | The whole script that parses the sequence and sends notes to the Elektron looks like this. | ||
| 301 | |||
| 302 | ```python | ||
| 303 | import pygame.midi | ||
| 304 | import time | ||
| 305 | |||
| 306 | pygame.midi.init() | ||
| 307 | |||
| 308 | print(pygame.midi.get_default_output_id()) | ||
| 309 | print(pygame.midi.get_device_info(0)) | ||
| 310 | |||
| 311 | player = pygame.midi.Output(1) | ||
| 312 | player.set_instrument(2) | ||
| 313 | |||
| 314 | def send_note(note, velocity): | ||
| 315 |     # play the note, hold it for 0.3 seconds, then release it | ||
| 316 |     player.note_on(note, velocity) | ||
| 317 |     time.sleep(0.3) | ||
| 318 |     player.note_off(note, velocity) | ||
| 319 | |||
| 320 | # NOTE: MIDI note numbers are 7-bit (0-127); 160 and 180 fall outside that range | ||
| 321 | nucleotide_midi_map = { | ||
| 322 |     'A': 60, | ||
| 323 |     'C': 90, | ||
| 324 |     'G': 160, | ||
| 325 |     'T': 180, # is D | ||
| 326 | } | ||
| 327 | |||
| 328 | with open("quote.fa") as f:  # skip FASTA header lines such as ">SEQ1" | ||
| 329 |     sequence = "".join(line.strip() for line in f if not line.startswith(">")) | ||
| 330 | |||
| 331 | for nucleotide in sequence: | ||
| 332 |     print("Playing nucleotide {} with MIDI note {}".format( | ||
| 333 |         nucleotide, nucleotide_midi_map[nucleotide])) | ||
| 334 |     send_note(nucleotide_midi_map[nucleotide], 127) | ||
| 335 | |||
| 336 | del player | ||
| 337 | pygame.midi.quit() | ||
| 338 | ``` | ||
| 339 | |||
| 340 | <video src="/assets/posts/dna-synthesized/elektron/elektron.mp4" controls></video> | ||
| 341 | |||
| 342 | All of this could be made much more interesting if I chose different | ||
| 343 | instruments for different nucleotides, or did more funky stuff with the Elektron. | ||
| 344 | But for now, this should be enough. It is just a proof of concept. Something to | ||
| 345 | play around with. | ||
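
One possible refinement would be to derive the MIDI notes from the same frequencies used for the WAV files, which also keeps every note inside the valid MIDI range of 0 to 127. This is only a sketch with a hypothetical helper name, and how a drum machine such as the Model:Samples reacts to particular note numbers depends on the device, so treat the resulting values as a starting point.

```python
import math

def frequency_to_midi_note(freq, a4=440.0):
    # inverse of the equal-temperament formula, rounded to the nearest note
    note = round(69 + 12 * math.log2(freq / a4))
    if not 0 <= note <= 127:
        raise ValueError("MIDI note numbers must be in the range 0-127")
    return note

nucleotide_tone_map = {'A': 440, 'C': 523.25, 'G': 783.99, 'T': 587.33}
nucleotide_midi_notes = {
    nucleotide: frequency_to_midi_note(freq)
    for nucleotide, freq in nucleotide_tone_map.items()
}
print(nucleotide_midi_notes)
```

The resulting dictionary has the same shape as `nucleotide_midi_map`, so it could be swapped in without touching the rest of the script.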
| 346 | |||
| 347 | ## Going even further | ||
| 348 | |||
| 349 | As you probably noticed, the end results are quite similar to each other. This is | ||
| 350 | to be expected because we are essentially operating with only 4 notes. What | ||
| 351 | could make this more interesting is using something like | ||
| 352 | [SuperCollider](https://supercollider.github.io/) to create more interesting | ||
| 353 | sounds, for example by transposing notes or applying effects based on repeated | ||
| 354 | data in a sequence. The possibilities are endless. | ||
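
To give a rough idea of what using repeated data could mean, the sketch below collapses a sequence into runs of identical nucleotides and raises the pitch for longer runs. Both function names and the rule itself (two semitones per extra repeat) are made up for illustration; they are not something the generated audio in this post uses.

```python
from itertools import groupby

def nucleotide_runs(sequence):
    """Collapse a sequence into (nucleotide, run_length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]

def transposed_frequency(base_freq, run_length, semitones_per_repeat=2):
    # hypothetical rule: each extra repeat raises the pitch by a fixed interval
    semitones = (run_length - 1) * semitones_per_repeat
    return base_freq * 2 ** (semitones / 12)

print(nucleotide_runs("GACAAGTT"))
```

A run of a single nucleotide keeps its base frequency, so sequences without repeats sound exactly as before.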
| 355 | |||
| 356 | It is really astonishing what can be achieved with a little bit of code and an | ||
| 357 | idea. I could see this becoming an interesting background soundscape instrument | ||
| 358 | if done properly. It could replace a random note generator with something more | ||
| 359 | intriguing, biological, natural. | ||
| 360 | |||
| 361 | I actually find the results fascinating. I took some time and listened to this | ||
| 362 | music of nature. Even though it's quite the same, it's also quite different. | ||
| 363 | The subtle differences on repeat kind of create music on their own. Makes you | ||
| 364 | wonder. It kind of puts Occam’s Razor in its place. Nature for sure loves to | ||
| 365 | make things as energy efficient as possible. | ||
