Added new post about synthesized DNA

author: Mitja Felicijan <mitja.felicijan@gmail.com> 2022-07-05 04:40:26 +0200
committer: Mitja Felicijan <mitja.felicijan@gmail.com> 2022-07-05 04:40:26 +0200
commit: 672e2f7e1c3ed89ff3c2e192d646b56ce74702a3 (patch)
tree: f143640a169d8a17557dc5959c274c6e663844a3 /posts
parent: d99ba79d190d449f561cd4415d16a13584f43c10 (diff)
download: mitjafelicijan.com-672e2f7e1c3ed89ff3c2e192d646b56ce74702a3.tar.gz
1 files changed, 233 insertions, 0 deletions
diff --git a/posts/2022-07-05-what-would-dna-sound-if-synthesized.md b/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
new file mode 100644
index 0000000..0c41dd0
--- /dev/null
+++ b/posts/2022-07-05-what-would-dna-sound-if-synthesized.md
@@ -0,0 +1,233 @@
+---
+Title: What would DNA sound like if synthesized to an audio file
+Description: What would DNA sound like if synthesized
+Slug: what-would-dna-sound-if-synthesized
+Listing: true
+Created: 2022-07-05
+Tags: []
+---
+1. [Introduction](#introduction)
+2. [DNA encoding and primer example](#dna-encoding-and-primer-example)
+3. [Parsing DNA data](#parsing-dna-data)
+4. [Generating sine wave](#generating-sine-wave)
+5. [Generating a WAV file from accumulated sine waves](#generating-a-wav-file-from-accumulated-sine-waves)
+6. [Generating Spectograms](#generating-spectograms)
+7. [Pre-generated sequences](#pre-generated-sequences)
+   1. [Niels Bohr quote](#niels-bohr-quote)
+   2. [Mouse](#mouse)
+   3. [Bison](#bison)
+   4. [Taurus](#taurus)
+8. [Going even further](#going-even-further)
+## Introduction
+Lately, I have been thinking a lot about the nature of life and what its foundational building blocks are. It is remarkable how complex, and at the same time how simple, creation is when you look at it closely. The miracle of life keeps us grounded when our imagination goes wild. If DNA is the building block of life, you could consider it an API that nature provided so we can better understand all of this chaos masquerading as order.
+I have been reading a lot about superintelligence and our somewhat misguided path to creating artificial general intelligence. What would the building blocks of our creation look like? Is compression really the ultimate storage of information? Will our creations also ponder these questions when creating new worlds for themselves, or will we just disappear into the vastness of possibilities? It is a little offensive that we are playing God whilst being completely ignorant of our own reality. Who knows! Like many other breakthroughs, this one will also come at a cost not known to us until it finally happens.
+To keep things a bit lighter, I decided to convert some popular DNA sequences into audio files for us to listen to. I am not the first to do this, nor will I be the last. But it is an interesting exercise in better understanding the relationship between art and science. Maybe listening to DNA instead of parsing it will lead to a better understanding, or at least enjoyment, of the creation and cryptic nature of life.
+## DNA encoding and primer example
+I explored DNA about three years ago in my post [Encoding binary data into DNA sequence](/encoding-binary-data-into-dna-sequence.html), where I converted all sorts of data into DNA sequences.
+This will be a similar exercise, but instead of converting data to DNA, I will be generating tones from nucleotides.
+| Nucleotide       | Note | Frequency |
+| ---------------- | ---- | --------- |
+| **A** (Adenine)  | A4   | 440 Hz    |
+| **C** (Cytosine) | C5   | 523.25 Hz |
+| **G** (Guanine)  | G5   | 783.99 Hz |
+| **T** (Thymine)  | D5   | 587.33 Hz |
+Since there is no note named T in the equal-tempered scale, I chose D to represent T.
+You can check [Frequencies for equal-tempered scale, A4 = 440 Hz](https://pages.mtu.edu/~suits/notefreqs.html). That reference table also assumes `Speed of Sound = 345 m/s = 1130 ft/s = 770 miles/hr`, which only affects the wavelengths it lists, not the frequencies used here.
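The frequencies in that reference table follow from the equal-tempered rule that each semitone multiplies the frequency by the twelfth root of two. As a minimal sketch (the helper name `note_frequency` and the semitone offsets are my own illustration, not part of the original script), the four notes can be derived from A4 = 440 Hz like this:

```python
A4 = 440.0  # reference pitch in Hz

def note_frequency(semitones_from_a4):
    """Frequency of the note that many semitones above (or below) A4."""
    return A4 * 2 ** (semitones_from_a4 / 12)

# Semitone offsets from A4 for the notes used here (assumed octaves: A4, C5, D5, G5).
offsets = {'A': 0, 'C': 3, 'D': 5, 'G': 10}

for note, offset in offsets.items():
    print(note, round(note_frequency(offset), 2))
```

Rounded to two decimals, this gives 440.0, 523.25, 587.33 and 783.99 Hz for A, C, D and G.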
+Now that we have this out of the way, we can also brush up on DNA sequences a bit. This is a famous quote I also used for the encoding tests, and it goes like this.
+> How wonderful that we have met with a paradox. Now we have some hope of making progress.
+> ― Niels Bohr
+```shell
+>SEQ1
+GACAGCTTGTGTACAAGTGTGCTTGCTCGCGAGCGGGTACGCGCGTGGGCTAACAAGTGA
+GCCAGCAGGTGAACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGCTGGCGGGTGA
+ACAAGTGTGCCGGTGAGCCAACAAGCAGACAAGTAAGCAGGTACGCAGGCGAGCTTGTCA
+ACTCACAAGATCGCTTGTGTACAAGTGTGCGGACAAGCCAGCAGGTGCGCGGACAAGTAT
+GCTTGCTGGCGGACAAGCCAGCTTGTAAGCGGACAAGCTTGCGCACAAGCTGGCAGGCCT
+GCCGGCTCGCGTACAAATTCACAAGTAAGTACGCTTGCGTGTACGCGGGTATGTATACTC
+AACCTCACCAAACGGGACAAGATCGCCGGCGGGCTAGTATACAAGAACGCTTGCCAGTAC
+AACC
+```
+This sequence is what we are going to work with when creating the parser and the waveform generator.
+## Parsing DNA data
+This step is rather simple. All we need to do is parse the input DNA sequence, given in the [FASTA format](https://en.wikipedia.org/wiki/FASTA_format) well known in [bioinformatics](https://en.wikipedia.org/wiki/Bioinformatics), and extract single nucleotides, which are then converted into separate tones based on the equal-tempered scale described above.
+```python
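+# Frequencies in Hz, taken from the equal-tempered scale (A4 = 440 Hz).
+nucleotide_tone_map = {
+  'A': 440,
+  'C': 523.25,
+  'G': 783.99,
+  'T': 587.33,  # converted to D
+}
+
+def split(word):
+  # Break a string into a list of single characters.
+  return list(word)
+
+def generate_from_dna_sequence(sequence):
+  # Print each nucleotide together with the frequency it maps to.
+  for nucleotide in split(sequence):
+    print(nucleotide, nucleotide_tone_map[nucleotide])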
+```
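The snippet above expects a bare string of nucleotides, while a FASTA file also contains header lines and line breaks. A minimal sketch of that missing step might look like the following. The function name `read_fasta_sequence` and the decision to drop anything that is not A, C, G or T (such as `N`) are my assumptions, not something the original script is known to do.

```python
def read_fasta_sequence(text):
    """Return the nucleotides of a FASTA string as one upper-case string."""
    nucleotides = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and header lines such as ">SEQ1".
        if not line or line.startswith('>'):
            continue
        # Keep only the four bases; anything else (for example N) is dropped.
        nucleotides.extend(ch for ch in line.upper() if ch in 'ACGT')
    return ''.join(nucleotides)

example = """>SEQ1
GACAGCTTGT
aacc
"""
print(read_fasta_sequence(example))  # GACAGCTTGTAACC
```

The returned string can then be iterated nucleotide by nucleotide and looked up in the tone map.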
+## Generating sine wave
+Because we are essentially creating a long stream of notes, we will append each sine tone to a global array that we will later use to create a WAV file.
+```python
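+import math
+
+sample_rate = 44100.0  # samples per second
+audio = []             # global buffer of float samples in the range -1.0 to 1.0
+
+def append_sinewave(freq=440.0, duration_milliseconds=500, volume=1.0):
+  # Append one tone of the given frequency and length to the global buffer.
+  num_samples = duration_milliseconds * (sample_rate / 1000.0)
+  for x in range(int(num_samples)):
+    audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))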
+```
+The sine wave generated here is the standard beep. If you want something more aggressive, you could try a square or sawtooth waveform.
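For comparison, here is a rough sketch of how single samples of those two alternative waveforms could be computed. The function names `square_sample` and `sawtooth_sample` are hypothetical, and these are the naive textbook formulas, not anything taken from the original script.

```python
import math

def square_sample(freq, t):
    """Square wave: +1.0 for the first half of each cycle, -1.0 for the second."""
    return 1.0 if math.sin(2 * math.pi * freq * t) >= 0 else -1.0

def sawtooth_sample(freq, t):
    """Sawtooth wave: ramps linearly from -1.0 to 1.0 once per cycle."""
    phase = freq * t  # number of cycles elapsed at time t (in seconds)
    return 2.0 * (phase - math.floor(phase + 0.5))

# One hundredth of a second of a 440 Hz sawtooth at 44100 samples per second.
samples = [sawtooth_sample(440.0, x / 44100.0) for x in range(441)]
```

Either function could replace the `math.sin(...)` expression when filling the sample buffer, with `t` being `x / sample_rate`.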
+## Generating a WAV file from accumulated sine waves
+```python
+import wave
+import struct
+
+def save_wav(file_name):
+  # Write the global `audio` buffer as a mono, 16-bit PCM WAV file.
+  wav_file = wave.open(file_name, 'wb')
+  nchannels = 1  # mono
+  sampwidth = 2  # bytes per sample (16 bit)
+  nframes = len(audio)
+  comptype = 'NONE'
+  compname = 'not compressed'
+  wav_file.setparams((nchannels, sampwidth, int(sample_rate), nframes, comptype, compname))
+  for sample in audio:
+    # '<h' packs a little-endian signed 16-bit integer, as WAV expects.
+    wav_file.writeframes(struct.pack('<h', int(sample * 32767.0)))
+  wav_file.close()
+```
+44100 Hz is the industry-standard sample rate (CD quality). If you need to save on file size, you can adjust it downwards. The usual low-quality rate is 8000 Hz, or 8 kHz.
+The WAV files here use short, 16-bit signed integers for the samples. So we multiply the floating-point data by 32767, the maximum value of a short integer.
+> It is theoretically possible to use the floating point -1.0 to 1.0 data directly in a WAV file, but not obvious how to do that using the wave module in Python.
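To show how the pieces fit together, here is a self-contained sketch that goes from a nucleotide string to a WAV file. The names `sequence_to_samples` and `write_wav`, the 100 ms note length, the 0.5 volume and the clamping step are my assumptions for illustration. The original script may glue things together differently.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # CD quality
TONES = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def sequence_to_samples(sequence, note_ms=100, volume=0.5):
    """Turn a nucleotide string into a list of float samples in [-1.0, 1.0]."""
    samples = []
    per_note = int(SAMPLE_RATE * note_ms / 1000)
    for nucleotide in sequence:
        freq = TONES[nucleotide]
        for x in range(per_note):
            samples.append(volume * math.sin(2 * math.pi * freq * x / SAMPLE_RATE))
    return samples

def write_wav(file_name, samples):
    """Write float samples as mono 16-bit little-endian PCM."""
    frames = bytearray()
    for sample in samples:
        # Clamp first so that struct.pack never sees a value outside 16 bits.
        clamped = max(-1.0, min(1.0, sample))
        frames += struct.pack('<h', int(clamped * 32767))
    with wave.open(file_name, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(bytes(frames))

if __name__ == '__main__':
    write_wav('output.wav', sequence_to_samples('GACAGCTTGT'))
```

Collecting all frames first and writing them in one call avoids one `writeframes` call per sample, which matters once the input is a whole genome file rather than a short quote.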
+## Generating Spectograms
+I tried two methods of doing this, and both worked just fine. However, I opted to use [SoX - Sound eXchange, the Swiss Army knife of audio manipulation](https://linux.die.net/man/1/sox) because it didn't require anything else.
+```shell
+sox output.wav -n spectrogram -o spectrogram.png
+```
+An example spectrogram of the first movement of Ludwig van Beethoven's Symphony No. 6.
+<audio controls>
+  <source src="/assets/dna-synthesized/symphony-no6-1st-movement.mp3" type="audio/mpeg">
+</audio>
+![Ludwig van Beethoven Symphony No. 6 First movement](/assets/dna-synthesized/symphony-no6-1st-movement.png)
+The other option is to combine SoX with [gnuplot](http://www.gnuplot.info/). This requires an intermediary step, however.
+```shell
+sox output.wav audio.dat
+tail -n+3 audio.dat > audio_only.dat
+gnuplot audio.gpi
+```
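As far as I understand the text format SoX writes for `.dat` files, it starts with two comment lines (sample rate and channel count) followed by one line per sample holding the time and the amplitude, which is why the first two lines are skipped above. Under that assumption, the same data could be read in Python with a small sketch like this (the name `read_dat` and the sample text are made up for illustration):

```python
def read_dat(text):
    """Parse SoX-style .dat text into (time, amplitude) pairs for one channel."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        # Header lines start with ';' (assumed: sample rate and channel count).
        if not line or line.startswith(';'):
            continue
        fields = line.split()
        points.append((float(fields[0]), float(fields[1])))
    return points

# Hand-written sample in the assumed layout, not real SoX output.
sample_text = """; Sample Rate 44100
; Channels 1
               0                0
   2.2675737e-05            0.031
"""
print(read_dat(sample_text))
```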
+The input file `audio.gpi` that is passed to gnuplot looks something like this.
+```
+# set output format and size
+set term png size 1000,280
+# set output file
+set output "audio.png"
+# set y range
+set yr [-1:1]
+# we want just the data
+unset key
+unset tics
+unset border
+set lmargin 0
+set rmargin 0
+set tmargin 0
+set bmargin 0
+# draw rectangle to change background color
+set obj 1 rectangle behind from screen 0,0 to screen 1,1
+set obj 1 fillstyle solid 1.0 fillcolor rgbcolor "#ffffff"
+# draw data with foreground color
+plot "audio_only.dat" with lines lt rgb 'red'
+```
+## Pre-generated sequences
+### Niels Bohr quote
+<audio controls>
+  <source src="/assets/dna-synthesized/quote/out.mp3" type="audio/mpeg">
+</audio>
+![Spectrogram](/assets/dna-synthesized/quote/spectogram.png)
+### Mouse
+This is part of a mouse genome, `Mus_musculus.GRCm39.dna.nonchromosomal`. You can get [genome data here](http://ftp.ensembl.org/pub/release-106/fasta/mus_musculus/dna/).
+<audio controls>
+  <source src="/assets/dna-synthesized/mouse/out.mp3" type="audio/mpeg">
+</audio>
+![Spectrogram](/assets/dna-synthesized/mouse/spectogram.png)
+### Bison
+This is part of a bison genome, `Bison_bison_bison.Bison_UMD1.0.cdna`. You can get [genome data here](http://ftp.ensembl.org/pub/release-106/fasta/bison_bison_bison/cdna/).
+<audio controls>
+  <source src="/assets/dna-synthesized/bison/out.mp3" type="audio/mpeg">
+</audio>
+![Spectrogram](/assets/dna-synthesized/bison/spectogram.png)
+### Taurus
+This is part of a cattle (Bos taurus) genome, `Bos_taurus.ARS-UCD1.2.cdna`. You can get [genome data here](http://ftp.ensembl.org/pub/release-106/fasta/bos_taurus/cdna/).
+<audio controls>
+  <source src="/assets/dna-synthesized/taurus/out.mp3" type="audio/mpeg">
+</audio>
+![Spectrogram](/assets/dna-synthesized/taurus/spectogram.png)
+## Going even further
+As you have probably noticed, the end results are quite similar to each other. This is to be expected because we are essentially operating with only 4 notes. What could make this more interesting is using something like [SuperCollider](https://supercollider.github.io/) to create richer sounds, for example by transposing notes or applying effects based on repeated data in a sequence. The possibilities are endless.
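One way the transposition idea could look, as a purely hypothetical sketch: every time a nucleotide repeats the previous one, its note is raised by another octave, up to a limit. The function name `frequencies_with_run_transposition` and the `max_octaves` cap are invented for this example.

```python
TONES = {'A': 440.0, 'C': 523.25, 'G': 783.99, 'T': 587.33}

def frequencies_with_run_transposition(sequence, max_octaves=2):
    """Map nucleotides to frequencies, raising repeats by one octave per repeat."""
    frequencies = []
    previous = None
    run_length = 0
    for nucleotide in sequence:
        # Count how many times the same nucleotide has repeated in a row.
        run_length = run_length + 1 if nucleotide == previous else 0
        octaves = min(run_length, max_octaves)
        # One octave up doubles the frequency.
        frequencies.append(TONES[nucleotide] * 2 ** octaves)
        previous = nucleotide
    return frequencies

print(frequencies_with_run_transposition('AAAAC'))
# [440.0, 880.0, 1760.0, 1760.0, 523.25]
```

Long runs of the same base would then climb in pitch instead of sounding like one held note.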
+I actually find the results fascinating. I took some time and listened to this music of nature. Even though it all sounds much the same, it is also quite different. The subtle differences on repeat kind of create music of their own. Makes you wonder. It kind of puts Occam’s Razor in its place. Nature for sure loves to make things as energy efficient as possible.