| author | Mitja Felicijan <m@mitjafelicijan.com> | 2023-07-08 23:25:41 +0200 |
|---|---|---|
| committer | Mitja Felicijan <m@mitjafelicijan.com> | 2023-07-08 23:25:41 +0200 |
| commit | cd6644ea4ddc78597934ab0ef5ba50e3c3daa927 (patch) | |
| tree | 03de331a8db6386dfd6fa75155bfbcea6b4feaf3 /content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md | |
| parent | 84ed124529ffeee1590295b8de3a8faf51848680 (diff) | |
| download | mitjafelicijan.com-cd6644ea4ddc78597934ab0ef5ba50e3c3daa927.tar.gz | |
Moved to a simpler SSG
Diffstat (limited to 'content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md')
| -rw-r--r-- | content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md | 363 |
1 files changed, 0 insertions, 363 deletions
diff --git a/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md b/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
deleted file mode 100644
index e26088b..0000000
--- a/content/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
+++ /dev/null
| @@ -1,363 +0,0 @@ | |||
| 1 | --- | ||
| 2 | title: What would DNA sound if synthesized to an audio file | ||
| 3 | url: what-would-dna-sound-if-synthesized.html | ||
| 4 | date: 2022-07-05T12:00:00+02:00 | ||
| 5 | draft: false | ||
| 6 | --- | ||
| 7 | |||
| 8 | ## Introduction | ||
| 9 | |||
| 10 | Lately, I have been thinking a lot about the nature of life, what the | ||
| 11 | building blocks of life are, and things like that. It's remarkable how complex and, | ||
| 12 | on the other hand, how simple creation is when you look at it. The miracle of | ||
| 13 | life keeps us grounded when our imagination goes wild. If DNA is the building block | ||
| 14 | of life, you could consider it an API that nature provided so we can better | ||
| 15 | understand all of this chaos masquerading as order. | ||
| 16 | |||
| 17 | I have been reading a lot about superintelligence and our somewhat misguided path | ||
| 18 | to creating artificial general intelligence. What would the building blocks of our | ||
| 19 | creation look like? Is compression really the ultimate storage of | ||
| 20 | information? Will our creation also ponder these questions when creating new | ||
| 21 | worlds for itself, or will we just disappear into the vastness of | ||
| 22 | possibilities? It is a little offensive that we are playing God whilst being | ||
| 23 | completely ignorant of our own reality. Who knows! Like many other | ||
| 24 | breakthroughs, this one will also come at a cost not known to us when it finally | ||
| 25 | happens. | ||
| 26 | |||
| 27 | To keep things a bit lighter, I decided to convert some popular DNA sequences | ||
| 28 | into audio files for us to listen to. I am not the first one to do this, nor will I be | ||
| 29 | the last. But it is an interesting exercise in better | ||
| 30 | understanding the relationship between art and science. Maybe listening to DNA | ||
| 31 | instead of parsing it will lead to a better understanding, or at least | ||
| 32 | enjoyment, of the creation and cryptic nature of life. | ||
| 33 | |||
| 34 | ## DNA encoding and primer example | ||
| 35 | |||
| 36 | I explored DNA about 3 years ago in my post | ||
| 37 | [Encoding binary data into DNA | ||
| 38 | sequence](/encoding-binary-data-into-dna-sequence.html), where I | ||
| 39 | converted all sorts of data into DNA sequences. | ||
| 40 | |||
| 41 | This will be a similar exercise, but instead of converting to DNA, I will be | ||
| 42 | generating tones from nucleotides. | ||
| 43 | |||
| 44 | | Nucleotides | Note | Frequency | | ||
| 45 | | ---------------- | ---- | --------- | | ||
| 46 | | **A** (Adenine) | A | 440 Hz | | ||
| 47 | | **C** (Cytosine) | C | 523.25 Hz | | ||
| 48 | | **G** (Guanine) | G | 783.99 Hz | | ||
| 49 | | **T** (Thymine) | D | 587.33 Hz | | ||
| 50 | |||
| 51 | Since there is no T note in the equal-tempered scale, I chose D to represent T. | ||
| 52 | |||
| 53 | You can check [Frequencies for equal-tempered scale, A4 = 440 | ||
| 54 | Hz](https://pages.mtu.edu/~suits/notefreqs.html). For this tuning, we also | ||
| 55 | choose `Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`. | ||
| 56 | |||
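The frequencies in the table can be reproduced from the equal-temperament formula. The following is a minimal sketch, assuming the notes are A4, C5, D5 and G5 (the octave is not stated in the post, but these are the octaves that match the listed frequencies). The function name `note_frequency` and the use of MIDI-style note numbers are illustrative choices, not part of the original script.

```python
# Equal-tempered pitch: each semitone multiplies the frequency by 2**(1/12).
# Note numbers follow the MIDI convention, where A4 is note 69 at 440 Hz.
A4_NOTE = 69
A4_HZ = 440.0


def note_frequency(note_number: int) -> float:
    """Return the frequency in Hz of an equal-tempered note number."""
    return A4_HZ * 2 ** ((note_number - A4_NOTE) / 12)


# Assumed octaves: chosen because they match the frequencies in the table.
notes = {"A4": 69, "C5": 72, "D5": 74, "G5": 79}

for name, number in notes.items():
    print(f"{name}: {note_frequency(number):.2f} Hz")
```

Rounded to two decimals, these values agree with the frequencies used in the Python tone map further down.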
| 57 | Now that we have this out of the way, we need a DNA sequence to work with. | ||
| 58 | This is a famous quote I also used for the encoding tests, shown here together | ||
| 59 | with its DNA-encoded form. | ||
| 60 | |||
| 61 | > How wonderful that we have met with a paradox. Now we have some hope of | ||
| 62 | > making progress. | ||
| 63 | > ― Niels Bohr | ||
| 64 | |||
| 65 | ```shell | ||
| 66 | >SEQ1 | ||
| 67 | GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA | ||
| 68 | GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA | ||
| 69 | ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA | ||
| 70 | ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT | ||
| 71 | GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT | ||
| 72 | GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC | ||
| 73 | AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC | ||
| 74 | AACC | ||
| 75 | ``` | ||
| 76 | |||
| 77 | This is what we are going to work with to get things rolling when creating | ||
| 78 | the parser and waveform generator. | ||
| 79 | |||
| 80 | ## Parsing DNA data | ||
| 81 | |||
| 82 | This step is rather simple. All we need to do is parse the input DNA sequence in | ||
| 83 | [FASTA format](https://en.wikipedia.org/wiki/FASTA_format), well known in | ||
| 84 | [Bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), to extract single | ||
| 85 | nucleotides that will be converted into separate tones based on the equal-tempered | ||
| 86 | scale explained above. | ||
| 87 | |||
| 88 | ```python | ||
| 89 | nucleotide_tone_map = { | ||
| 90 |     'A': 440, | ||
| 91 |     'C': 523.25, | ||
| 92 |     'G': 783.99, | ||
| 93 |     'T': 587.33, # converted to D | ||
| 94 | } | ||
| 95 | |||
| 96 | def split(word): | ||
| 97 |     return [char for char in word] | ||
| 98 | |||
| 99 | def generate_from_dna_sequence(sequence): | ||
| 100 |     for nucleotide in split(sequence): | ||
| 101 |         print(nucleotide, nucleotide_tone_map[nucleotide]) | ||
| 102 | ``` | ||
| 103 | |||
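The snippet above expects a bare sequence string. A FASTA file also contains header lines starting with `>` and line breaks, so a small reader is needed in front of it. The following is a minimal sketch under a few assumptions: the helper names `read_fasta_sequence` and `sequence_to_frequencies` are hypothetical, all records are concatenated into one sequence, and symbols outside A, C, G and T (such as `N`) are simply skipped.

```python
# Tone map taken from the post; T is played as the note D.
NUCLEOTIDE_TONE_MAP = {"A": 440.0, "C": 523.25, "G": 783.99, "T": 587.33}


def read_fasta_sequence(text: str) -> str:
    """Join all sequence lines of FASTA text, dropping '>' header lines."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers such as ">SEQ1"
        parts.append(line.upper())
    return "".join(parts)


def sequence_to_frequencies(sequence: str) -> list:
    """Map each known nucleotide to its frequency; unknown symbols are skipped."""
    return [NUCLEOTIDE_TONE_MAP[n] for n in sequence if n in NUCLEOTIDE_TONE_MAP]


example = ">SEQ1\nGACA\nGCTT\n"
sequence = read_fasta_sequence(example)
print(sequence)
print(sequence_to_frequencies(sequence))
```

Skipping unknown symbols keeps the generator from failing on real genome files, which often contain runs of `N`. Raising an error instead would be an equally reasonable choice.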
| 104 | ## Generating sine wave | ||
| 105 | |||
| 106 | Because we are essentially creating a long stream of notes, we append | ||
| 107 | sine wave samples to a global `audio` list that we later use to create a WAV | ||
| 108 | file. | ||
| 109 | |||
| 110 | ```python | ||
| 111 | import math | ||
| 112 | |||
| 113 | def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0): | ||
| 114 |     global audio  # `audio` (a list) and `sample_rate` (e.g. 44100.0) are module-level globals | ||
| 115 | |||
| 116 |     num_samples = duration_milliseconds * (sample_rate / 1000.0) | ||
| 117 | |||
| 118 |     for x in range(int(num_samples)): | ||
| 119 |         audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate))) | ||
| 120 | |||
| 121 |     return | ||
| 122 | ``` | ||
| 123 | |||
| 124 | The sine wave generated here is the standard beep. If you want something more | ||
| 125 | aggressive, you could try a square or sawtooth waveform. | ||
| 126 | |||
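As a rough illustration of those alternative waveforms, here is a minimal sketch of naive square and sawtooth generators. They are not part of the original script. The function names and the `render` helper are hypothetical, and the shapes are not band-limited, so some aliasing is to be expected at higher frequencies.

```python
import math


def square_sample(freq: float, t: float, volume: float = 1.0) -> float:
    """Naive square wave: +volume while the sine is non-negative, else -volume."""
    return volume if math.sin(2 * math.pi * freq * t) >= 0 else -volume


def sawtooth_sample(freq: float, t: float, volume: float = 1.0) -> float:
    """Naive sawtooth: ramps linearly from -volume to +volume once per period."""
    phase = (freq * t) % 1.0
    return volume * (2.0 * phase - 1.0)


def render(sample_fn, freq=440.0, duration_milliseconds=500, sample_rate=44100):
    """Return a list of samples in [-1.0, 1.0] for the given waveform function."""
    num_samples = int(duration_milliseconds * sample_rate / 1000)
    return [sample_fn(freq, x / sample_rate) for x in range(num_samples)]


square = render(square_sample, freq=440.0, duration_milliseconds=10)
saw = render(sawtooth_sample, freq=440.0, duration_milliseconds=10)
print(len(square), len(saw))
```

Both generators return values in the same -1.0 to 1.0 range as the sine version, so the resulting samples could be appended to the same list and written out in the same way.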
| 127 | ## Generating a WAV file from accumulated sine waves | ||
| 128 | |||
| 129 | |||
| 130 | ```python | ||
| 131 | import wave | ||
| 132 | import struct | ||
| 133 | |||
| 134 | def save_wav(file_name): | ||
| 135 |     wav_file = wave.open(file_name, 'w') | ||
| 136 |     nchannels = 1 | ||
| 137 |     sampwidth = 2 | ||
| 138 | |||
| 139 |     nframes = len(audio) | ||
| 140 |     comptype = 'NONE' | ||
| 141 |     compname = 'not compressed' | ||
| 142 |     wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname)) | ||
| 143 | |||
| 144 |     for sample in audio: | ||
| 145 |         wav_file.writeframes(struct.pack('<h', int(sample * 32767.0))) | ||
| 146 | |||
| 147 |     wav_file.close() | ||
| 148 | ``` | ||
| 149 | |||
| 150 | 44100 Hz is the industry-standard sample rate (CD quality). If you need to save on | ||
| 151 | file size, you can adjust it downwards. The standard for low quality is 8000 Hz, or | ||
| 152 | 8 kHz. | ||
| 153 | |||
| 154 | The WAV files here use short, 16-bit, signed integers for the sample size. | ||
| 155 | So, we multiply the floating-point data we have by 32767, the maximum value for | ||
| 156 | a short integer. | ||
| 157 | |||
| 158 | > It is theoretically possible to use the floating point -1.0 to 1.0 data | ||
| 159 | > directly in a WAV file, but not obvious how to do that using the wave module | ||
| 160 | > in Python. | ||
| 161 | |||
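To show how the pieces fit together, here is a minimal, self-contained sketch of the float-to-16-bit conversion and the WAV write. It is a variation on the code above rather than the original script: it passes the samples as an argument instead of using a global, clamps values to the -1.0 to 1.0 range before scaling, and packs all frames in one call. The names `tone` and `SAMPLE_RATE` and the output file name are assumptions made for the example.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # CD-quality sample rate, as discussed above


def tone(freq, duration_milliseconds=500, volume=1.0):
    """Return sine wave samples in [-1.0, 1.0] for one note."""
    num_samples = int(duration_milliseconds * SAMPLE_RATE / 1000)
    return [volume * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(num_samples)]


def save_wav(file_name, samples):
    """Write float samples as mono, 16-bit, little-endian PCM."""
    # Clamp first so that out-of-range values cannot overflow a signed short.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    frames = struct.pack("<%dh" % len(clipped),
                         *(int(s * 32767) for s in clipped))
    with wave.open(file_name, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(frames)


if __name__ == "__main__":
    # Two notes from the tone map: A (440 Hz) followed by C (523.25 Hz).
    save_wav("example.wav", tone(440.0) + tone(523.25))
```

Packing the whole buffer at once avoids one `writeframes` call per sample, which matters once the sequence grows to thousands of notes.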
| 162 | ## Generating Spectrograms | ||
| 163 | |||
| 164 | I have tried two methods of doing this and both worked just fine. However, I opted | ||
| 165 | for [SoX - Sound eXchange, the Swiss Army knife of audio | ||
| 166 | manipulation](https://linux.die.net/man/1/sox) because it didn't require | ||
| 167 | anything else. | ||
| 168 | |||
| 169 | ```shell | ||
| 170 | sox output.wav -n spectrogram -o spectrogram.png | ||
| 171 | ``` | ||
| 172 | |||
| 173 | An example spectrogram of the first movement of Ludwig van Beethoven's Symphony No. 6. | ||
| 174 | |||
| 175 | <audio controls> | ||
| 176 | <source src="/assets/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg"> | ||
| 177 | </audio> | ||
| 178 | |||
| 179 |  | ||
| 180 | |||
| 181 | The other option is to use SoX in combination with | ||
| 182 | [gnuplot](http://www.gnuplot.info/). This requires an intermediary step, | ||
| 183 | however. | ||
| 184 | |||
| 185 | ```shell | ||
| 186 | sox output.wav audio.dat | ||
| 187 | tail -n+3 audio.dat > audio_only.dat | ||
| 188 | gnuplot audio.gpi | ||
| 189 | ``` | ||
| 190 | |||
| 191 | The input file `audio.gpi` that is passed to gnuplot looks something like | ||
| 192 | this. | ||
| 193 | |||
| 194 | ``` | ||
| 195 | # set output format and size | ||
| 196 | set term png size 1000,280 | ||
| 197 | |||
| 198 | # set output file | ||
| 199 | set output "audio.png" | ||
| 200 | |||
| 201 | # set y range | ||
| 202 | set yr [-1:1] | ||
| 203 | |||
| 204 | # we want just the data | ||
| 205 | unset key | ||
| 206 | unset tics | ||
| 207 | unset border | ||
| 208 | set lmargin 0 | ||
| 209 | set rmargin 0 | ||
| 210 | set tmargin 0 | ||
| 211 | set bmargin 0 | ||
| 212 | |||
| 213 | # draw rectangle to change background color | ||
| 214 | set obj 1 rectangle behind from screen 0,0 to screen 1,1 | ||
| 215 | set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff" | ||
| 216 | |||
| 217 | # draw data with foreground color | ||
| 218 | plot "audio_only.dat" with lines lt rgb 'red' | ||
| 219 | ``` | ||
| 220 | |||
| 221 | ## Pre-generated sequences | ||
| 222 | |||
| 223 | What I did was take interesting parts from an animal's genome and feed them to the | ||
| 224 | tone generator script. This generated a WAV file, which I then converted to | ||
| 225 | MP3 so it can be played in a browser. The last step was creating a | ||
| 226 | spectrogram from the WAV file. | ||
| 227 | |||
| 228 | ### Niels Bohr quote | ||
| 229 | |||
| 230 | <audio controls> | ||
| 231 | <source src="/assets/dna-synthesized/quote/out.mp3" type="audio/mpeg"> | ||
| 232 | </audio> | ||
| 233 | |||
| 234 |  | ||
| 235 | |||
| 236 | ### Mouse | ||
| 237 | |||
| 238 | This is part of a mouse genome, `Mus_musculus.GRCm39.dna.nonchromosomal`. You | ||
| 239 | can get the [genome data | ||
| 240 | here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/). | ||
| 241 | |||
| 242 | <audio controls> | ||
| 243 | <source src="/assets/dna-synthesized/mouse/out.mp3" type="audio/mpeg"> | ||
| 244 | </audio> | ||
| 245 | |||
| 246 |  | ||
| 247 | |||
| 248 | ### Bison | ||
| 249 | |||
| 250 | This is part of a bison genome, `Bison_bison_bison.Bison_UMD1.0.cdna`. You can | ||
| 251 | get the [genome data | ||
| 252 | here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/). | ||
| 253 | |||
| 254 | <audio controls> | ||
| 255 | <source src="/assets/dna-synthesized/bison/out.mp3" type="audio/mpeg"> | ||
| 256 | </audio> | ||
| 257 | |||
| 258 |  | ||
| 259 | |||
| 260 | ### Taurus | ||
| 261 | |||
| 262 | This is part of a cattle (*Bos taurus*) genome, `Bos_taurus.ARS-UCD1.2.cdna`. You can get the | ||
| 263 | [genome data | ||
| 264 | here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/). | ||
| 265 | |||
| 266 | <audio controls> | ||
| 267 | <source src="/assets/dna-synthesized/taurus/out.mp3" type="audio/mpeg"> | ||
| 268 | </audio> | ||
| 269 | |||
| 270 |  | ||
| 271 | |||
| 272 | ## Making a drummer out of a DNA sequence | ||
| 273 | |||
| 274 | To make things even more interesting, I decided to send this data via MIDI to my | ||
| 275 | [Elektron Model:Samples](https://www.elektron.se/en/model-samples). This is a | ||
| 276 | really cool piece of equipment that supports MIDI in via USB and 3.5 mm audio | ||
| 277 | jack. | ||
| 278 | |||
| 279 | The Elektron is connected to my MacBook via a USB cable, and its audio out is patched to a | ||
| 280 | Sony Bluetooth speaker I have that supports 3.5 mm audio in. The Elektron doesn't | ||
| 281 | have internal speakers. | ||
| 282 | |||
| 283 |  | ||
| 284 | |||
| 285 |  | ||
| 286 | |||
| 287 |  | ||
| 288 | |||
| 289 | For communicating with the Elektron, I chose the `pygame` Python module, which has MIDI | ||
| 290 | built in. With this, it was rather simple to send notes to the device. All I did | ||
| 291 | was map nucleotides to MIDI notes. | ||
| 292 | |||
| 293 | Before all of this, I also opened the Audio MIDI Setup app on macOS and checked | ||
| 294 | MIDI Studio by pressing ⌘-2. | ||
| 295 | |||
| 296 |  | ||
| 297 | |||
| 298 | The whole script that parses the sequence and sends notes to the Elektron looks like this. | ||
| 299 | |||
| 300 | ```python | ||
| 301 | import pygame.midi | ||
| 302 | import time | ||
| 303 | |||
| 304 | pygame.midi.init() | ||
| 305 | |||
| 306 | print(pygame.midi.get_default_output_id()) | ||
| 307 | print(pygame.midi.get_device_info(0)) | ||
| 308 | |||
| 309 | player = pygame.midi.Output(1) | ||
| 310 | player.set_instrument(2) | ||
| 311 | |||
| 312 | def send_note(note, velocity): | ||
| 313 |     global player | ||
| 314 |     player.note_on(note, velocity) | ||
| 315 |     time.sleep(0.3) | ||
| 316 |     player.note_off(note, velocity) | ||
| 317 | |||
| 318 | |||
| 319 | nucleotide_midi_map = { | ||
| 320 |     'A': 60, | ||
| 321 |     'C': 90, | ||
| 322 |     'G': 160, # note: standard MIDI note numbers only go up to 127 | ||
| 323 |     'T': 180, # is D; also above 127 | ||
| 324 | } | ||
| 325 | |||
| 326 | with open("quote.fa") as f: | ||
| 327 |     sequence = f.read().replace('\n', '') | ||
| 328 | |||
| 329 | for nucleotide in [char for char in sequence]: | ||
| 330 |     print("Playing nucleotide {} with MIDI note {}".format( | ||
| 331 |         nucleotide, nucleotide_midi_map[nucleotide])) | ||
| 332 |     send_note(nucleotide_midi_map[nucleotide], 127) | ||
| 333 | |||
| 334 | del player | ||
| 335 | pygame.midi.quit() | ||
| 336 | ``` | ||
| 337 | |||
| 338 | <video src="/assets/dna-synthesized/elektron/elektron.mp4" controls></video> | ||
| 339 | |||
| 340 | All of this could be made much more interesting if I chose different | ||
| 341 | instruments for different nucleotides, or did more funky stuff with the Elektron. | ||
| 342 | But for now, this should be enough. It is just a proof of concept. Something to | ||
| 343 | play around with. | ||
| 344 | |||
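If the goal were to send the same pitches as in the frequency table instead of triggering arbitrary pads, the MIDI note numbers could be derived from the frequencies. The following is a minimal sketch of that conversion. It is an assumption about what one might want, not what the script above does, and a drum sampler may map incoming notes to tracks or samples in its own way. The function name `frequency_to_midi_note` is hypothetical, and no MIDI device is needed to run it.

```python
import math

# Frequencies from the tone map used earlier in the post.
nucleotide_tone_map = {"A": 440.0, "C": 523.25, "G": 783.99, "T": 587.33}


def frequency_to_midi_note(freq_hz: float) -> int:
    """Return the nearest MIDI note number for a frequency (A4 = 69 = 440 Hz)."""
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    if not 0 <= note <= 127:
        # MIDI note numbers are 7-bit values, so anything else is invalid.
        raise ValueError(f"{freq_hz} Hz is outside the MIDI note range")
    return note


nucleotide_midi_map = {
    nucleotide: frequency_to_midi_note(freq)
    for nucleotide, freq in nucleotide_tone_map.items()
}
print(nucleotide_midi_map)
```

The resulting values could be passed to `note_on` in place of the hand-picked numbers. Whether that sounds better on a given device is a matter of taste and of how the device interprets note numbers.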
| 345 | ## Going even further | ||
| 346 | |||
| 347 | As you have probably noticed, the end results are quite similar to each other. This is | ||
| 348 | to be expected because we are essentially operating with only 4 notes. What | ||
| 349 | could make this more interesting is using something like | ||
| 350 | [SuperCollider](https://supercollider.github.io/) to create more interesting | ||
| 351 | sounds, for example by transposing notes or applying effects based on repeated data in a | ||
| 352 | sequence. The possibilities are endless. | ||
| 353 | |||
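One way to read "transposing notes based on repeated data" is to raise a note by an octave each time the same nucleotide repeats in a row. The following is a minimal sketch of that idea in plain Python. It is only one possible interpretation: the function name `transpose_repeats` and the `max_octaves` cap are hypothetical, and none of this is part of the original script.

```python
from itertools import groupby

# Base frequencies from the tone map used earlier in the post.
BASE_FREQUENCIES = {"A": 440.0, "C": 523.25, "G": 783.99, "T": 587.33}


def transpose_repeats(sequence: str, max_octaves: int = 2) -> list:
    """Raise each consecutive repeat of a nucleotide by one more octave.

    The first occurrence in a run plays at the base frequency, the second
    one octave up, and so on, capped at `max_octaves` above the base.
    """
    frequencies = []
    for nucleotide, run in groupby(sequence):
        for position, _ in enumerate(run):
            octave = min(position, max_octaves)
            frequencies.append(BASE_FREQUENCIES[nucleotide] * 2 ** octave)
    return frequencies


print(transpose_repeats("GAAACT"))
```

Runs such as `AAA` then become a rising figure instead of the same beep three times, which breaks up the monotony without adding any randomness.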
| 354 | It is really astonishing what can be achieved with a little bit of code and an | ||
| 355 | idea. I could see this becoming an interesting background soundscape instrument | ||
| 356 | if done properly. It could replace a random note generator with something more | ||
| 357 | intriguing, biological, natural. | ||
| 358 | |||
| 359 | I actually find the results fascinating. I took some time and listened to this | ||
| 360 | music of nature. Even though it's quite the same, it's also quite different. | ||
| 361 | The subtle differences on repeat kind of create music of their own. Makes you | ||
| 362 | wonder. It kind of puts Occam’s Razor in its place. Nature for sure loves to | ||
| 363 | make things as energy efficient as possible. | ||
