A unified system for recording hereditary information in molecules. Code within code: second genetic code revealed.

The leading scientific journal Nature has announced the discovery of a second genetic code, a kind of "code within a code", recently cracked by molecular biologists and computer scientists. Moreover, to reveal it they relied not on evolutionary theory but on information technology.

The new code is called the Splicing Code. It resides within the DNA. This code controls the underlying genetic code in a very complex yet predictable way: it governs how and when genes and regulatory elements are assembled. Revealing this code within a code helps shed light on some long-standing mysteries of genetics that have surfaced since the Human Genome Project completed the sequencing of the human genome. One such mystery was why an organism as complex as a human being has only about 20,000 genes. (Scientists expected to find many more.) Why are genes broken into segments (exons) that are separated by non-coding elements (introns) and then joined together (i.e., spliced) after transcription? And why are genes turned on in some cells and tissues and not in others? For two decades molecular biologists have tried to elucidate the mechanisms of genetic regulation. This paper marks a very important step toward understanding what is really going on. It does not answer every question, but it does demonstrate that the internal code exists. This code is a communication system that can be deciphered so clearly that scientists could predict, with uncanny accuracy, how a genome might behave in certain situations.

Imagine that you hear an orchestra in the next room. You open the door, look inside and see only three or four musicians playing their instruments. This, says Brendan Frey, who helped break the code, is what the human genome looks like. He says: “We were only able to detect 20,000 genes, but we knew that they form a huge number of protein products and regulatory elements. How? One of the methods is called alternative splicing.” Different exons (parts of genes) can be assembled in different ways. “For example, three genes for the neurexin protein can create over 3,000 genetic messages that help control the brain’s wiring system,” Frey says. The same article notes that scientists know that 95% of our genes undergo alternative splicing and that, in most cases, transcripts (RNA molecules resulting from transcription) are expressed differently in different types of cells and tissues. There must be something that controls how these thousands of combinations are assembled and expressed. This is the task of the Splicing Code.
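To get a feel for how a few exons can yield many messages, here is a minimal Python sketch. It assumes a deliberately simplified toy model that is not taken from the paper: some exons are always kept, while each "cassette" exon is independently either included or skipped. The exon names and the function `splice_isoforms` are hypothetical, chosen only for illustration.

```python
from itertools import product

def splice_isoforms(exons, cassette):
    """Yield every isoform obtained by including or skipping each cassette exon.

    exons    -- exon names in genomic order
    cassette -- set of exon names that may be skipped (all others are always kept)
    """
    optional = [e for e in exons if e in cassette]
    # Each optional exon is either included (True) or skipped (False).
    for choices in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choices))
        # Constitutive exons are absent from the dict and default to "kept".
        yield tuple(e for e in exons if included.get(e, True))

# Hypothetical gene with five exons, three of which are optional.
exons = ["E1", "E2", "E3", "E4", "E5"]
isoforms = list(splice_isoforms(exons, cassette={"E2", "E3", "E4"}))
print(len(isoforms))  # 8
```

In this toy model the number of possible messages doubles with every optional exon (2 to the power k for k cassette exons), which is one reason a modest number of genes can give rise to a very large number of transcripts. Real splicing is far more constrained and tissue-dependent than this sketch suggests.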

Readers who want a quick overview of the discovery can read the Science Daily article entitled "Researchers who cracked the 'Splicing Code' unravel the mystery behind biological complexity". The article says: “Scientists at the University of Toronto have gained a fundamental new understanding of how living cells use a limited number of genes to form incredibly complex organs like the brain.” The coverage in Nature itself begins with Heidi Ledford's "Code Within Code". This was followed by a paper by Tejedor and Valcarcel titled "Gene Regulation: Breaking the Second Genetic Code". Finally, the decisive paper was "Deciphering the Splicing Code", by a group of researchers from the University of Toronto led by Benjamin J. Blencowe and Brendan J. Frey.

This paper is a victory for information science that brings to mind the codebreakers of World War II. Its methods included algebra, geometry, probability theory, vector calculus, information theory, program code optimization, and other advanced techniques. What the researchers did not need was evolutionary theory, which is never mentioned in these scientific papers. Reading the paper, you can sense the intensity of the authors' overture:

“We describe a ‘splicing code’ scheme that uses combinations of hundreds of RNA properties to predict tissue-mediated changes in alternative splicing of thousands of exons. The code establishes new classes of splicing patterns, recognizes different regulatory programs in different tissues, and establishes mutation-controlled regulatory sequences. We have uncovered widely used regulatory strategies, including: using unexpectedly large property pools; detection of low levels of exon inclusion, which are attenuated by the properties of specific tissues; the manifestation of properties in introns is deeper than previously thought; and modulation of the levels of the splice variant by the structural characteristics of the transcript. The code helped establish a class of exons whose inclusion mutes expression in adult tissues, activating mRNA degradation, and whose exclusion promotes expression during embryogenesis. The code facilitates the disclosure and detailed description of genome-wide regulated events of alternative splicing.”

The team that cracked the code included specialists from the Department of Electrical and Computer Engineering as well as from the Department of Molecular Genetics. (Frey himself works for Microsoft Research, a division of Microsoft Corporation.) Like the codebreakers of the past, Frey and Barash developed "a new computer-assisted biological analysis that detects 'code words' hidden within the genome". Using a huge amount of data generated by molecular geneticists, the researchers reverse-engineered the splicing code until they could predict how it would act. Once they had the hang of it, they tested the code against mutations and saw how exons were inserted or removed. They found that the code could even cause tissue-specific changes, or act differently depending on whether the mouse was an adult or an embryo. One gene, Xpo4, is associated with cancer. The researchers noted: “These data support the conclusion that Xpo4 gene expression must be tightly controlled to avoid potential detrimental effects, including oncogenesis (cancer), since it is active during embryogenesis but is reduced in adult tissues.” It turns out that they were thoroughly surprised by the level of control they saw. Intentionally or not, the clue Frey used was not random variation and selection but the language of intelligent design. He noted: "Understanding a complex biological system is like understanding a complex electronic circuit."

Heidi Ledford said that the apparent simplicity of the Watson-Crick genetic code, with its four bases, triplet codons, 20 amino acids, and 64 DNA "characters", hides a whole world of complexity. Enclosed within this simpler code, the splicing code is much more complicated.

But between DNA and proteins lies RNA, a separate world of complexity. RNA is a transformer that sometimes carries genetic messages and sometimes controls them, using many structures that can influence its function. In a paper published in the same issue, a team of researchers led by Benjamin J. Blencowe and Brendan J. Frey at the University of Toronto in Ontario, Canada, report an attempt to unravel a second genetic code, one that can predict how segments of messenger RNA transcribed from a particular gene can be mixed and matched to form a variety of products in different tissues. This process is known as alternative splicing. This time there is no simple table; instead, there are algorithms that combine more than 200 different properties of DNA with predictions of RNA structure.

The work of these researchers points to the rapid progress that computational methods have made in modeling RNA. In addition to understanding alternative splicing, computer science is helping scientists predict RNA structures and identify small regulatory fragments of RNA that do not code for proteins. "It's a wonderful time," says Christopher Burge, a computational biologist at the Massachusetts Institute of Technology in Cambridge. “In the future, we will have a huge success.”

Computer science, computational biology, algorithms, and codes were not part of Darwin's vocabulary when he developed his theory. Mendel had a very simplified model of how traits are distributed during inheritance. Moreover, the idea that traits are encoded was only introduced in 1953. We now see that the original genetic code is regulated by an even more complex code embedded within it. These are revolutionary ideas. Moreover, there is every indication that this level of control is not the last. Ledford reminds us that, for example, RNA and proteins have a three-dimensional structure. The function of a molecule can change when its shape changes. There must be something that controls folding so that the three-dimensional structure does what the function requires. In addition, access to genes appears to be controlled by another code, the histone code. This code is written in molecular markers, or "tails", on the histone proteins that serve as spools for DNA coiling and supercoiling. Describing our time, Ledford speaks of a "permanent renaissance in RNA informatics".

Tejedor and Valcarcel agree that complexity lies behind the simplicity. “In theory, everything looks very simple: DNA forms RNA, which then creates a protein,” they begin their article. “But the reality is much more complicated.” In the 1950s, we learned that all living organisms, from bacteria to humans, share a basic genetic code. But we soon realized that complex organisms (eukaryotes) have a strange and hard-to-understand property: their genomes contain peculiar sections, introns, that must be removed so that the exons can join together. Why? Today the fog is clearing. “The main advantage of this mechanism is that it allows different cells to choose alternative ways of splicing the messenger RNA precursor (pre-mRNA), and thus one gene forms different messages,” they explain, “and then different mRNAs can encode different proteins with various functions.” From less code you get more information, as long as there is this other code inside the code that knows how to do it.

What makes cracking the splicing code so difficult is that exon assembly is governed by many interacting factors: sequences near exon boundaries, intron sequences, and regulatory factors that either aid or inhibit the splicing machinery. Besides, "the effects of a certain sequence or factor may vary depending on its location relative to the intron-exon boundaries or other regulatory motifs," Tejedor and Valcarcel explain. "Therefore, a challenging task in predicting tissue-specific splicing is to compute the algebra of a myriad of motifs and the relationships between the regulatory factors that recognize them."

To solve this problem, the team fed the computer a huge amount of data about RNA sequences and the conditions under which they were formed. "The computer was then given the task of identifying the combination of properties that would best explain the experimentally established tissue-specific exon selection." In other words, the researchers reverse-engineered the code. Like the World War II codebreakers, once scientists know the algorithm they can make predictions: "It correctly and accurately identified alternative exons and predicted their differential regulation between pairs of tissue types." And, like any good scientific theory, the discovery provided new insights: “This allowed us to re-explain previously established regulatory motifs and pointed to previously unknown properties of known regulators, as well as unexpected functional relationships between them,” the researchers noted. “For example, the code implies that the inclusion of exons leading to processed proteins is a general mechanism for controlling the process of gene expression during the transition from embryonic tissue to adult tissue.”

Tejedor and Valcarcel consider the publication of the paper an important first step: "The work... is better seen as the discovery of the first fragment of the much larger Rosetta Stone needed to decipher the alternative messages of our genome." According to these scientists, future research will undoubtedly improve our knowledge of this new code. At the end of their article they mention evolution in passing, and they do it in a very unusual way. This does not mean that evolution created these codes; it means that progress will require an understanding of how the codes interact. Another surprise is that the degree of conservation observed to date raises the question of the possible existence of "species-specific codes".

The code probably works in every single cell, and therefore must account for more than 200 types of mammalian cells. It also has to cope with a huge variety of alternative splicing patterns, going beyond simple decisions about including or skipping a single exon. The limited evolutionary conservation of alternative splicing regulation (estimated to be about 20% between humans and mice) raises the question of the existence of species-specific codes. Moreover, the coupling between RNA processing and gene transcription influences alternative splicing, and recent evidence implicates the packaging of DNA by histone proteins and covalent histone modifications (the so-called epigenetic code) in the regulation of splicing. Therefore, future methods will have to establish the exact interaction between the histone code and the splicing code. The same applies to the still little understood influence of complex RNA structures on alternative splicing.

Codes, codes and more codes. The fact that the scientists say almost nothing about Darwinism in these papers indicates that evolutionary theorists, adherents of old ideas and traditions, will have a lot to think about after reading them. But those who are enthusiastic about the biology of codes will be at the forefront. They have a great opportunity to take advantage of the exciting web application that the codebreakers have created to encourage further exploration. It can be found on the University of Toronto website under the name "Alternative Splicing Prediction Website". Visitors will look in vain for any mention of evolution there, despite the old axiom that nothing in biology makes sense without it. A new, 2010 version of this expression might sound like this: "Nothing in biology makes sense unless viewed in the light of computer science".

Links and notes

We're glad we were able to tell you about this story on the day it was published. This may be one of the most significant scientific papers of the year. (Of course, every major discovery made by other groups of scientists, like that of Watson and Crick, is significant.) The only thing we can say to this is: “Wow!” This discovery is a remarkable confirmation of Designed Creation and a huge challenge to the Darwinian empire. It will be interesting to see how evolutionists try to patch up their simplified story of random mutation and natural selection, invented back in the 19th century, in the light of these new data.

Do you understand what Tejedor and Valcarcel are saying? Species may have their own species-specific codes. “Therefore, future methods will have to establish the exact interaction between the histone [epigenetic] code and the splicing code,” they note. In translation, this means: “Darwinists have nothing to do with it. They just can't handle it.” If the simple genetic code of Watson and Crick was a problem for the Darwinists, what will they say now about the splicing code, which creates thousands of transcripts from the same genes? And how will they deal with the epigenetic code that controls gene expression? And who knows, maybe in this incredible “interaction” that we are just beginning to learn about, other codes are involved, which, like the Rosetta Stone, are only beginning to emerge from the sand?

Now that we are thinking about codes and computer science, we begin to think about different paradigms for new research. What if the genome partly acts as a storage network? What if it employs cryptography or compression algorithms? We should keep modern information systems and information storage technologies in mind. Maybe we will even find elements of steganography. Undoubtedly, there are additional robustness mechanisms, such as duplication and error correction, that may help explain the existence of pseudogenes. Whole-genome duplication may be a response to stress. Some of these phenomena may be useful indicators of historical events that have nothing to do with a universal common ancestor, but that help to explore comparative genomics in terms of informatics and robustness design, and help to understand the causes of disease.

Evolutionists find themselves in a major quandary. The researchers tried to modify the code, but got only cancer and mutations. How are they going to navigate the fitness landscape when it is mined with catastrophes that are triggered as soon as someone starts tampering with these inextricably linked codes? We know there is some built-in robustness and portability, but the whole picture is of an incredibly complex, designed, optimized information system, not a jumble of pieces that can be played around with endlessly. The whole idea of a code is a concept of intelligent design.

A.E. Wilder-Smith emphasized this. A code presupposes a convention between two parties. A convention is an agreement made in advance. It implies planning and purpose. The SOS signal, as Wilder-Smith would say, is used by convention as a distress signal. SOS does not look like a disaster. It doesn't smell like a disaster. It doesn't feel like a disaster. People would not understand that these letters stand for disaster if they did not know the convention itself. Similarly, an alanine codon, GCC, does not look, smell, or feel like alanine. The codon would have nothing to do with alanine unless there were a pre-established convention between the two coding systems (the protein code and the DNA code) that "GCC shall stand for alanine." To implement this convention, a family of transducers, the aminoacyl-tRNA synthetases, is used to translate one code into the other.

This should have strengthened the case for design back in the 1950s, and many creationists preached it effectively. But evolutionists are like smooth-talking salesmen. They made up their tales about the Tinker Bell fairy, who deciphers the code and creates new species through mutation and selection, and convinced many people that miracles can still happen today. Well, it is now the 21st century, and we know of the epigenetic code and the splicing code, two codes that are much more complex and dynamic than the simple code of DNA. We know about codes within codes, about codes above codes and below codes; we know a whole hierarchy of codes. This time, evolutionists can't just stick a finger in the barrel of a gun and bluff us with their beautiful speeches when guns are trained on them from both sides, a whole arsenal aimed at their main structural elements. All this is a game. A whole era of computer science has grown up around them; they have long gone out of fashion and look like Greeks trying to attack modern tanks and helicopters with spears.

Sad to say, evolutionists don't understand this, or, even if they do, they are not going to give up. Incidentally, this week, just as the paper on the Splicing Code was published, the most vicious and hateful rhetoric against creation and intelligent design in recent memory has been pouring from the pages of pro-Darwinian magazines and newspapers. We will hear of many more such examples. And as long as they hold the microphones and control the institutions, many people will fall for them, thinking that science continues to give them good reason. We are telling you all this so that you will read this material, study it, understand it, and arm yourself with the information you need to combat this fanatical, misleading nonsense with the truth. Now, go ahead!

On the right is the largest human DNA helix built from people on the beach in Varna (Bulgaria), which was included in the Guinness Book of Records on April 23, 2016

Deoxyribonucleic acid. General information

DNA (deoxyribonucleic acid) is a kind of blueprint of life, a complex code that contains hereditary information. This complex macromolecule is capable of storing hereditary genetic information and transmitting it from generation to generation. DNA determines such properties of any living organism as heredity and variability. The information encoded in it determines the entire development program of any living organism. Genetically embedded factors predetermine the entire course of life of a person or of any other organism. Artificial or natural influences of the external environment can only slightly affect the overall expression of individual genetic traits or the course of programmed processes.

Deoxyribonucleic acid (DNA) is a macromolecule (one of the three main ones, the other two being RNA and proteins) that provides for the storage, transmission from generation to generation, and implementation of the genetic program for the development and functioning of living organisms. DNA contains information about the structure of various kinds of RNA and proteins.

In eukaryotic cells (animals, plants, and fungi), DNA is found in the cell nucleus as part of chromosomes, as well as in some cell organelles (mitochondria and plastids). In the cells of prokaryotic organisms (bacteria and archaea), a circular or linear DNA molecule, the so-called nucleoid, is attached to the inside of the cell membrane. Prokaryotes and lower eukaryotes (for example, yeast) also have small autonomous, mostly circular DNA molecules called plasmids.

From a chemical point of view, DNA is a long polymeric molecule consisting of repeating blocks, nucleotides. Each nucleotide is made up of a nitrogenous base, a sugar (deoxyribose), and a phosphate group. The bonds between nucleotides in a chain are formed by the deoxyribose and phosphate groups (phosphodiester bonds).


Fig. 2. A nucleotide consists of a nitrogenous base, a sugar (deoxyribose), and a phosphate group

In the overwhelming majority of cases (except for some viruses containing single-stranded DNA), the DNA macromolecule consists of two chains whose nitrogenous bases face each other. This double-stranded molecule is twisted into a helix.

There are four types of nitrogenous bases found in DNA (adenine, guanine, thymine, and cytosine). The nitrogenous bases of one chain are connected to the nitrogenous bases of the other chain by hydrogen bonds according to the principle of complementarity: adenine pairs only with thymine (A-T), and guanine only with cytosine (G-C). It is these pairs that make up the "rungs" of the helical "ladder" of DNA (see Fig. 2, 3 and 4).


Fig. 2. Nitrogenous bases

The sequence of nucleotides makes it possible to "encode" information about various types of RNA, the most important of which are messenger (mRNA), ribosomal (rRNA), and transfer (tRNA) RNA. All these types of RNA are synthesized on a DNA template by copying the DNA sequence into an RNA sequence during transcription, and they take part in protein biosynthesis (translation). In addition to coding sequences, cellular DNA contains sequences that perform regulatory and structural functions.


Fig. 3. DNA replication

The arrangement of the base combinations in DNA and the quantitative relationships between these combinations encode hereditary information.

Formation of new DNA (replication)

  1. The process of replication: unwinding of the DNA double helix, synthesis of complementary strands by DNA polymerase, and formation of two DNA molecules from one.
  2. The double helix "unzips" into two branches when enzymes break the bonds between the paired bases.
  3. Each branch serves as the template for a new DNA strand. New base pairs are formed in the same sequence as in the parent molecule.

Upon completion of the duplication, two independent helices are formed, built from the chemical compounds of the parent DNA and carrying the same genetic code. In this way, DNA is able to pass information from cell to cell.

More detailed information:

STRUCTURE OF NUCLEIC ACIDS


Fig. 4. Nitrogenous bases: adenine, guanine, cytosine, thymine

Deoxyribonucleic acid (DNA) is a nucleic acid. Nucleic acids are a class of irregular biopolymers whose monomers are nucleotides.

Nucleotides consist of a nitrogenous base connected to a five-carbon carbohydrate (pentose), deoxyribose (in the case of DNA) or ribose (in the case of RNA), which in turn is bound to a phosphoric acid residue (H2PO3-).

There are two types of nitrogenous bases: the pyrimidine bases uracil (only in RNA), cytosine, and thymine, and the purine bases adenine and guanine.


Fig. 5. The structure of nucleotides (left), the location of the nucleotide in DNA (bottom) and the types of nitrogenous bases (right): pyrimidine and purine


The carbon atoms in a pentose molecule are numbered from 1 to 5. Phosphate binds to the third and fifth carbon atoms. This is how nucleotides are linked together to form a nucleic acid chain. Thus, we can distinguish the 3' and 5' ends of a DNA strand:


Fig. 6. Identifying the 3' and 5' ends of the DNA strand

Two strands of DNA form a double helix. The chains in the helix are oriented in opposite directions. Nitrogenous bases on different strands are connected to each other by hydrogen bonds. Adenine always pairs with thymine, and cytosine always pairs with guanine. This is called the complementarity rule (see the principle of complementarity).

Complementarity rule:

A-T G-C

For example, if we are given a DNA strand that has the sequence

3'-ATGTCCTAGCTGCTCG-5',

then the second chain will be complementary to it and directed in the opposite direction, from the 5' end to the 3' end:

5'-TACAGGATCGACGAGC-3'.
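The complementarity rule can be checked mechanically. The following Python sketch is only an illustration (the names `PAIR` and `complement` are ours, not from any library): it pairs each base of the given strand with its partner, position by position. Because the given strand is written 3' to 5', the result reads 5' to 3', exactly as in the example above.

```python
# Watson-Crick pairing: A-T and G-C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

given = "ATGTCCTAGCTGCTCG"    # written 3' -> 5', as in the text
partner = complement(given)   # therefore reads 5' -> 3'
print(partner)                # TACAGGATCGACGAGC
```

Applying `complement` twice returns the original strand, which reflects the fact that each strand of the double helix fully determines the other.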


Fig. 7. The direction of the chains of the DNA molecule and the connection of nitrogenous bases using hydrogen bonds

DNA REPLICATION

DNA replication is the process of duplicating a DNA molecule by template synthesis. In most cases of natural DNA replication, the primer for DNA synthesis is a short RNA fragment (synthesized anew). This ribonucleotide primer is created by the enzyme primase (DNA primase in prokaryotes, DNA polymerase α in eukaryotes) and is subsequently replaced with deoxyribonucleotides by a polymerase that normally performs repair functions (correcting chemical damage and breaks in the DNA molecule).

Replication occurs in a semi-conservative manner. This means that the double helix of DNA unwinds and a new chain is completed on each of its strands according to the principle of complementarity. Each daughter DNA molecule thus contains one strand from the parent molecule and one newly synthesized strand. The parental strand is read in the 3' to 5' direction, so the new strand grows from its 5' end to its 3' end.
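The semi-conservative scheme can be sketched in a few lines of Python. This is a toy model under stated assumptions: strands are plain strings aligned position by position, enzymes and strand direction are ignored, and the function name `replicate` and the dictionary keys are hypothetical labels chosen for illustration.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the partner strand, base by base."""
    return "".join(PAIR[base] for base in strand)

def replicate(duplex):
    """Split a duplex and complete each parental strand with a new partner.

    duplex -- (strand, partner), aligned position by position
    Returns two daughter duplexes, each recording which strand is parental
    and which one was newly synthesized.
    """
    top, bottom = duplex
    daughter_1 = {"parental": top, "new": complement(top)}
    daughter_2 = {"parental": bottom, "new": complement(bottom)}
    return daughter_1, daughter_2

parent = ("ATGTCC", "TACAGG")
d1, d2 = replicate(parent)
print(d1)
print(d2)
```

Each daughter keeps exactly one parental strand, and both daughters carry the same pair of sequences as the parent molecule.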

Fig. 8. Replication (doubling) of the DNA molecule

DNA synthesis is not as complicated a process as it might seem at first glance. To begin with, what is synthesis? It is the process of bringing something together. The formation of a new DNA molecule takes place in several stages:

1) DNA topoisomerase, located ahead of the replication fork, cuts the DNA in order to facilitate its unwinding and untwisting.
2) DNA helicase, following the topoisomerase, carries out the "unwinding" of the DNA helix.
3) DNA-binding proteins bind to the separated DNA strands and stabilize them, preventing them from sticking to each other.
4) DNA polymerase δ (delta), coordinated with the speed of movement of the replication fork, synthesizes the leading strand of the daughter DNA in the 5' → 3' direction on the template of the parental DNA strand, reading it from its 3' end to its 5' end (at a speed of up to 100 base pairs per second). This is all that happens on this parental strand.



Fig. 9. Schematic representation of the DNA replication process: (1) Lagging strand, (2) Leading strand, (3) DNA polymerase α (Polα), (4) DNA ligase, (5) RNA primer, (6) Primase, (7) Okazaki fragment, (8) DNA polymerase δ (Polδ), (9) Helicase, (10) Single-stranded DNA-binding proteins, (11) Topoisomerase.

The synthesis of the lagging daughter DNA strand is described below (see the scheme of the replication fork and the functions of the replication enzymes).


5) Immediately after another strand of the parent molecule has been unwound and stabilized, DNA polymerase α (alpha) attaches to it and, in the 5' → 3' direction, synthesizes a primer (RNA primer), an RNA sequence of 10 to 200 nucleotides on the DNA template. After that, the enzyme is removed from the DNA strand.

In place of DNA polymerase α, DNA polymerase ε attaches to the 3' end of the primer.

6) DNA polymerase ε (epsilon) continues, as it were, to lengthen the primer, but inserts deoxyribonucleotides as its substrate (about 150-200 nucleotides). As a result, a continuous strand is formed from two parts: RNA (i.e. the primer) and DNA. DNA polymerase ε works until it encounters the primer of the previous Okazaki fragment (synthesized a little earlier). The enzyme is then removed from the chain.

7) DNA polymerase β (beta) takes the place of DNA polymerase ε, moves in the same direction (5' → 3'), and removes the primer ribonucleotides while inserting deoxyribonucleotides in their place. The enzyme works until the primer has been completely removed, i.e. until it reaches a deoxyribonucleotide (one synthesized even earlier by DNA polymerase ε). The enzyme cannot join the product of its work to the DNA ahead of it, so it leaves the chain.

As a result, a fragment of daughter DNA "lies" on the template of the parental strand. It is called an Okazaki fragment.

8) DNA ligase joins two adjacent Okazaki fragments, i.e. the 5' end of the segment synthesized by DNA polymerase ε and the 3' end of the chain built by DNA polymerase β.

STRUCTURE OF RNA

Ribonucleic acid (RNA) is one of the three main macromolecules (the other two are DNA and proteins) that are found in the cells of all living organisms.

Just like DNA, RNA is made up of a long chain in which each link is called a nucleotide. Each nucleotide is made up of a nitrogenous base, a ribose sugar, and a phosphate group. However, unlike DNA, RNA usually has one strand rather than two. The pentose in RNA is ribose, not deoxyribose (ribose has an additional hydroxyl group on the second carbon atom). Finally, RNA differs from DNA in the composition of its nitrogenous bases: instead of thymine (T), RNA contains uracil (U), which is also complementary to adenine.

The sequence of nucleotides allows RNA to encode genetic information. All cellular organisms use RNA (mRNA) to program protein synthesis.

Cellular RNAs are formed in a process called transcription, that is, the synthesis of RNA on a DNA template, carried out by special enzymes, RNA polymerases.

Messenger RNAs (mRNAs) then take part in a process called translation, i.e. protein synthesis on the mRNA template with the participation of ribosomes. Other RNAs undergo chemical modifications after transcription and, once their secondary and tertiary structures have formed, perform functions that depend on the type of RNA.

Fig. 10. The difference between DNA and RNA in terms of the nitrogenous base: instead of thymine (T), RNA contains uracil (U), which is also complementary to adenine.

TRANSCRIPTION

This is the process of RNA synthesis on a DNA template. DNA unwinds at one of its sites. One of the chains contains the information that needs to be copied onto the RNA molecule; this chain is called the coding strand. The second strand of DNA, complementary to the coding strand, is called the template strand. During transcription, the template strand is read in the 3' → 5' direction and an RNA chain complementary to it is synthesized. Thus, an RNA copy of the coding strand is created.

Fig. 11. Schematic representation of transcription

For example, if we are given the sequence of the coding strand

3'-ATGTCCTAGCTGCTCG-5',

then, according to the rule of complementarity, the template strand will carry the sequence

5'-TACAGGATCGACGAGC-3',

and the RNA synthesized from it is the sequence

3'-AUGUCCUAGCUGCUCG-5'.
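The complementarity rule used in this example can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the sketch, and the 5'/3' labels are tracked by the reader, because each strand is complemented position by position in the same left-to-right order in which the example writes it.

```python
# Base-pairing rules. Orientation labels (5'/3') are not stored in the strings.
DNA_PAIR = str.maketrans("ATGC", "TACG")  # DNA strand -> complementary DNA strand
RNA_PAIR = str.maketrans("ATGC", "UACG")  # DNA template -> complementary RNA


def complement_dna(strand: str) -> str:
    """Return the complementary DNA strand, position by position."""
    return strand.translate(DNA_PAIR)


def transcribe(template: str) -> str:
    """Return the RNA complementary to a DNA template strand."""
    return template.translate(RNA_PAIR)


coding = "ATGTCCTAGCTGCTCG"         # written 3'->5', as in the example
template = complement_dna(coding)   # reads 5'->3' in the same left-to-right order
rna = transcribe(template)          # reads 3'->5', like the coding strand

print(template)  # TACAGGATCGACGAGC
print(rna)       # AUGUCCUAGCUGCUCG
```

The RNA is identical to the coding strand except that U stands in place of T, which is exactly what "an RNA copy of the coding strand" means.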

TRANSLATION

Consider the mechanism of protein synthesis on an RNA template, as well as the genetic code and its properties.

Fig. 12. Process of protein synthesis: DNA codes for RNA, RNA codes for protein

GENETIC CODE

Genetic code - a method of encoding the amino acid sequence of proteins using a sequence of nucleotides. Each amino acid is encoded by a sequence of three nucleotides, called a codon or a triplet.

The genetic code is common to most pro- and eukaryotes. The table lists all 64 codons and the corresponding amino acids. The base order is from the 5' to the 3' end of the mRNA.

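Table 1. Standard genetic code

1st base | 2nd base: U | 2nd base: C | 2nd base: A | 2nd base: G | 3rd base
U | UUU (Phe/F) | UCU (Ser/S) | UAU (Tyr/Y) | UGU (Cys/C) | U
U | UUC (Phe/F) | UCC (Ser/S) | UAC (Tyr/Y) | UGC (Cys/C) | C
U | UUA (Leu/L) | UCA (Ser/S) | UAA Stop codon** | UGA Stop codon** | A
U | UUG (Leu/L) | UCG (Ser/S) | UAG Stop codon** | UGG (Trp/W) | G
C | CUU (Leu/L) | CCU (Pro/P) | CAU (His/H) | CGU (Arg/R) | U
C | CUC (Leu/L) | CCC (Pro/P) | CAC (His/H) | CGC (Arg/R) | C
C | CUA (Leu/L) | CCA (Pro/P) | CAA (Gln/Q) | CGA (Arg/R) | A
C | CUG (Leu/L) | CCG (Pro/P) | CAG (Gln/Q) | CGG (Arg/R) | G
A | AUU (Ile/I) | ACU (Thr/T) | AAU (Asn/N) | AGU (Ser/S) | U
A | AUC (Ile/I) | ACC (Thr/T) | AAC (Asn/N) | AGC (Ser/S) | C
A | AUA (Ile/I) | ACA (Thr/T) | AAA (Lys/K) | AGA (Arg/R) | A
A | AUG (Met/M)* | ACG (Thr/T) | AAG (Lys/K) | AGG (Arg/R) | G
G | GUU (Val/V) | GCU (Ala/A) | GAU (Asp/D) | GGU (Gly/G) | U
G | GUC (Val/V) | GCC (Ala/A) | GAC (Asp/D) | GGC (Gly/G) | C
G | GUA (Val/V) | GCA (Ala/A) | GAA (Glu/E) | GGA (Gly/G) | A
G | GUG (Val/V) | GCG (Ala/A) | GAG (Glu/E) | GGG (Gly/G) | G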

Among the triplets, there are 4 special sequences that act as "punctuation marks":

  • *The triplet AUG, which also encodes methionine, is called the start codon. The synthesis of a protein molecule begins at this codon, so the first amino acid incorporated during protein synthesis is always methionine.
  • **The triplets UAA, UAG and UGA are called stop codons and do not code for any amino acid. Protein synthesis stops at these sequences.

Properties of the genetic code

1. Triplet nature. Each amino acid is encoded by a sequence of three nucleotides, a triplet or codon.

2. Continuity. There are no additional nucleotides between the triplets; the information is read continuously.

3. Non-overlapping. One nucleotide cannot be part of two triplets at the same time.

4. Unambiguity. One codon can code for only one amino acid.

5. Degeneracy. One amino acid can be encoded by several different codons.

6. Universality. The genetic code is the same for nearly all living organisms.

Example. We are given the sequence of the coding strand:

3’- CCGATTGCACGTCGATCGTATA- 5’.

The template strand will have the sequence:

5’- GGCTAACGTGCAGCTAGCATAT- 3’.

Now we “synthesize” messenger RNA from this strand:

3’- CCGAUUGCACGUCGAUCGUAUA- 5’.

Protein synthesis goes in the direction 5' → 3', therefore, we need to flip the sequence in order to "read" the genetic code:

5’- AUAUGCUAGCUGCACGUUAGCC- 3’.

Now find the start codon AUG:

5’- AU AUG CUAGCUGCACGUUAGCC- 3’.

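Divide the sequence into triplets:

5’- AU AUG CUA GCU GCA CGU UAG CC- 3’.

Reading from the start codon with Table 1 gives Met-Leu-Ala-Ala-Arg, after which the stop codon UAG ends synthesis.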
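The steps of this example (copy the coding strand into mRNA, flip it to read 5' → 3', find AUG, read triplets until a stop codon) can be sketched in Python. The function names and the compact way of building the codon table are choices made for this sketch, not part of the original text; the table itself is the standard genetic code from Table 1, with one-letter amino acid symbols and "*" for stop codons.

```python
# Standard genetic code in U, C, A, G order (same layout as Table 1).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def mrna_from_coding(coding_3to5: str) -> str:
    """mRNA written 5'->3', given a coding strand written 3'->5'."""
    return coding_3to5.replace("T", "U")[::-1]


def translate(mrna_5to3: str) -> str:
    """Translate from the first AUG until a stop codon (or the end)."""
    start = mrna_5to3.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna_5to3) - 2, 3):
        amino_acid = CODON_TABLE[mrna_5to3[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: synthesis ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = mrna_from_coding("CCGATTGCACGTCGATCGTATA")
print(mrna)             # AUAUGCUAGCUGCACGUUAGCC
print(translate(mrna))  # MLAAR
```

In three-letter notation the result MLAAR is Met-Leu-Ala-Ala-Arg; the codon UAG that follows is a stop codon.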

The central dogma of molecular biology states: information is transferred from DNA to RNA (transcription) and from RNA to protein (translation). DNA can also be duplicated by replication, and reverse transcription, in which DNA is synthesized on an RNA template, is also possible, but that process is mainly characteristic of viruses.


Fig. 13. The central dogma of molecular biology

GENOME: GENES AND CHROMOSOMES

(general concepts)

Genome - the totality of all the genes of an organism; its complete chromosome set.

The term "genome" was proposed by Hans Winkler in 1920 to describe the totality of genes contained in the haploid set of chromosomes of organisms of one biological species. In its original meaning, the genome, in contrast to the genotype, was a genetic characteristic of the species as a whole rather than of an individual. With the development of molecular genetics, the meaning of the term has changed. DNA, which carries genetic information in most organisms and therefore forms the basis of the genome, includes more than genes in the modern sense of the word. Most of the DNA of eukaryotic cells consists of non-coding ("redundant") nucleotide sequences that carry no information about proteins or nucleic acids. Thus, the genome of any organism is, in essence, the entire DNA of its haploid set of chromosomes.

Genes are segments of DNA molecules that code for polypeptides and RNA molecules.

Over the past century, our understanding of genes has changed significantly. Previously, a gene was a region of a chromosome that encodes or determines one trait or phenotypic (visible) property, such as eye color.

In 1940, George Beadle and Edward Tatum proposed a molecular definition of a gene. They treated spores of the fungus Neurospora crassa with X-rays and other agents that cause changes in the DNA sequence (mutations), and found mutant strains of the fungus that had lost specific enzymes, which in some cases disrupted an entire metabolic pathway. Beadle and Tatum concluded that a gene is a section of genetic material that defines or codes for a single enzyme. This is how the "one gene, one enzyme" hypothesis arose. The concept was later extended to "one gene, one polypeptide", since many genes encode proteins that are not enzymes, and a polypeptide can be a subunit of a larger protein complex.

Fig. 14 shows how DNA triplets determine a polypeptide, the amino acid sequence of a protein, through the mediation of mRNA. One of the DNA strands serves as a template for the synthesis of mRNA, whose nucleotide triplets (codons) are complementary to the DNA triplets. In some bacteria and many eukaryotes, coding sequences are interrupted by non-coding regions (called introns).

The modern biochemical definition of a gene is even more specific. Genes are all sections of DNA that encode the primary sequence of end products, which include polypeptides or RNAs that have a structural or catalytic function.

Along with genes, DNA also contains other sequences that perform an exclusively regulatory function. Regulatory sequences may mark the beginning or end of genes, affect transcription, or indicate the site of initiation of replication or recombination. Some genes can be expressed in different ways, with the same piece of DNA serving as a template for the formation of different products.

We can roughly calculate the minimum size of a gene coding for an average protein. Each amino acid in a polypeptide chain is encoded by a sequence of three nucleotides; the sequence of these triplets (codons) corresponds to the chain of amino acids in the polypeptide encoded by the gene. A polypeptide chain of 350 amino acid residues (a chain of medium length) corresponds to a sequence of 1050 base pairs (bp). However, many eukaryotic genes and some prokaryotic genes are interrupted by DNA segments that carry no information about the protein, and therefore turn out to be much longer than this simple calculation suggests.
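The estimate above is plain arithmetic and can be written as a small Python helper. The function name and the optional `include_stop` flag are assumptions added for this sketch; the text's figure of 1050 bp counts only the codons for the 350 residues.

```python
NUCLEOTIDES_PER_CODON = 3


def min_coding_length_bp(n_residues: int, include_stop: bool = False) -> int:
    """Minimum length of an uninterrupted coding sequence, in base pairs.

    include_stop is a hypothetical option for this sketch: it adds one
    extra codon for the stop signal, which the text's estimate leaves out.
    """
    codons = n_residues + (1 if include_stop else 0)
    return codons * NUCLEOTIDES_PER_CODON


print(min_coding_length_bp(350))                     # 1050
print(min_coding_length_bp(350, include_stop=True))  # 1053
```

Introns, untranslated regions and regulatory sequences are not counted here, which is why real genes are usually longer than this lower bound.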

How many genes are on one chromosome?


Fig. 15. View of chromosomes in prokaryotic (left) and eukaryotic cells. Histones are a broad class of nuclear proteins that perform two main functions: they are involved in the packaging of DNA strands in the nucleus and in the epigenetic regulation of nuclear processes such as transcription, replication, and repair.

The DNA of prokaryotes is organized more simply: their cells have no nucleus, so the DNA is located directly in the cytoplasm in the form of a nucleoid.

As is known, a bacterial cell has a chromosome in the form of a DNA strand packed into a compact structure, the nucleoid. The chromosome of the prokaryote Escherichia coli, whose genome has been completely sequenced, is a circular DNA molecule (in fact not a regular circle but a loop without beginning or end) of 4,639,675 bp. This sequence contains approximately 4,300 protein genes and another 157 genes for stable RNA molecules. The human genome contains approximately 3.1 billion base pairs, corresponding to almost 29,000 genes located on 24 different chromosomes.

Prokaryotes (Bacteria).

The bacterium E. coli has one double-stranded circular DNA molecule. It consists of 4,639,675 bp and reaches a length of approximately 1.7 mm, which is about 850 times the length of the E. coli cell itself. In addition to the large circular chromosome in the nucleoid, many bacteria contain one or more small circular DNA molecules that lie freely in the cytosol. These extrachromosomal elements are called plasmids (Fig. 16).

Most plasmids consist of only a few thousand base pairs, some contain more than 10,000 bp. They carry genetic information and replicate to form daughter plasmids, which enter the daughter cells during the division of the parent cell. Plasmids are found not only in bacteria, but also in yeast and other fungi. In many cases, plasmids offer no advantage to the host cells and their only job is to reproduce independently. However, some plasmids carry genes useful to the host. For example, genes contained in plasmids can confer resistance to antibacterial agents in bacterial cells. Plasmids carrying the β-lactamase gene confer resistance to β-lactam antibiotics such as penicillin and amoxicillin. Plasmids can pass from antibiotic-resistant cells to other cells of the same or different bacterial species, causing those cells to also become resistant. Intensive use of antibiotics is a powerful selective factor that promotes the spread of plasmids encoding antibiotic resistance (as well as transposons that encode similar genes) among pathogenic bacteria, and leads to the emergence of bacterial strains with resistance to several antibiotics. Doctors are beginning to understand the dangers of widespread use of antibiotics and prescribe them only when absolutely necessary. For similar reasons, the widespread use of antibiotics for the treatment of farm animals is limited.

See also: Ravin N.V., Shestakov S.V. Genome of prokaryotes // Vavilov Journal of Genetics and Breeding, 2013. V. 17. No. 4/2. pp. 972-984.

Eukaryotes.

Table 2. DNA, genes and chromosomes of some organisms

Organism | Total DNA, bp | Number of chromosomes* | Approximate number of genes
Escherichia coli (bacterium) | 4 639 675 | | 4 435
Saccharomyces cerevisiae (yeast) | 12 080 000 | 16** | 5 860
Caenorhabditis elegans (nematode) | 90 269 800 | 12*** | 23 000
Arabidopsis thaliana (plant) | 119 186 200 | | 33 000
Drosophila melanogaster (fruit fly) | 120 367 260 | | 20 000
Oryza sativa (rice) | 480 000 000 | | 57 000
Mus musculus (mouse) | 2 634 266 500 | | 27 000
Homo sapiens (human) | 3 070 128 600 | | 29 000

Note. Information is constantly updated; For more up-to-date information, refer to individual genomic project websites.

* For all eukaryotes except yeast, the diploid set of chromosomes is given. The diploid set of chromosomes (from Greek diploos - double and eidos - form) is a double set of chromosomes (2n), in which each chromosome has a homologous partner.
** Haploid set. Wild strains of yeast typically have eight (octoploid) or more sets of these chromosomes.
*** For females with two X chromosomes. Males have one X chromosome but no Y, i.e. only 11 chromosomes.

A yeast cell, one of the smallest eukaryotes, has 2.6 times more DNA than an E. coli cell (Table 2). Cells of the fruit fly Drosophila, a classic object of genetic research, contain 35 times more DNA, and human cells contain about 700 times more DNA than E. coli cells. Many plants and amphibians contain even more DNA. The genetic material of eukaryotic cells is organized in the form of chromosomes. The diploid set of chromosomes (2n) depends on the type of organism (Table 2).

For example, a human somatic cell has 46 chromosomes (Fig. 17). Each chromosome in a eukaryotic cell, as shown in Fig. 17, A, contains one very large double-stranded DNA molecule. The twenty-four human chromosomes (22 paired chromosomes and the two sex chromosomes X and Y) differ in length by more than 25 times. Each eukaryotic chromosome contains a specific set of genes.


Fig. 17. Eukaryotic chromosomes. A - a pair of linked, condensed sister chromatids of a human chromosome. Eukaryotic chromosomes remain in this form after replication and in metaphase during mitosis. B - a complete set of chromosomes from a leukocyte of one of the authors of the book. Each normal human somatic cell contains 46 chromosomes.


The size and function of DNA as a matrix for storing and transmitting hereditary material explains the presence of special structural elements in the organization of this molecule. In higher organisms, DNA is distributed between chromosomes.

The set of DNA (chromosomes) of an organism is called the genome. Chromosomes are located in the cell nucleus and form a structure called chromatin. Chromatin is a complex of DNA and basic proteins (histones) in a 1:1 ratio. The length of DNA is usually measured in pairs of complementary nucleotides (bp). For example, human chromosome 3 is a DNA molecule of 160 million bp. Linearized DNA of about 3 × 10^6 bp is approximately 1 mm long; therefore, a linearized molecule of human chromosome 3 would be about 5 cm long, and the DNA of all 23 chromosomes (~3 × 10^9 bp, Mr = 1.8 × 10^12) of a haploid cell (an egg or sperm cell) would be about 1 m long in linearized form. With the exception of germ cells, all cells of the human body (there are about 10^13 of them) contain a double set of chromosomes. During cell division, all 46 DNA molecules replicate and reorganize into 46 chromosomes.

If the DNA molecules of the human genome (22 chromosomes plus X and Y, or X and X) are joined end to end, the result is a sequence about one meter long. Note: in all mammals and other organisms with a heterogametic male sex, females have two X chromosomes (XX) and males have one X chromosome and one Y chromosome (XY).

Most human cells are diploid, so the total DNA length in such a cell is about 2 m. An adult human has about 10^14 cells, so the total length of all DNA molecules is 2 × 10^11 km. For comparison, the circumference of the Earth is 4 × 10^4 km, and the distance from the Earth to the Sun is 1.5 × 10^8 km. That is how amazingly compactly DNA is packed in our cells!
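The order-of-magnitude comparison in this paragraph can be checked with a few lines of Python. All inputs are the rounded figures quoted in the text (about 2 m of DNA per cell, about 10^14 cells); the variable names are chosen for this sketch only.

```python
# Rounded figures from the text; this is an order-of-magnitude estimate.
DNA_PER_CELL_M = 2.0            # total DNA length in one diploid cell, metres
CELLS_IN_ADULT = 1e14           # approximate number of cells in an adult
EARTH_CIRCUMFERENCE_KM = 4e4
EARTH_SUN_DISTANCE_KM = 1.5e8

total_km = DNA_PER_CELL_M * CELLS_IN_ADULT / 1000  # metres -> kilometres

print(f"{total_km:.1e} km")                      # 2.0e+11 km
print(round(total_km / EARTH_CIRCUMFERENCE_KM))  # 5000000
print(round(total_km / EARTH_SUN_DISTANCE_KM))   # 1333
```

On these figures, the DNA of one person would wrap around the Earth about five million times, or cover the Earth-Sun distance more than a thousand times.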

In eukaryotic cells, there are other organelles containing DNA - these are mitochondria and chloroplasts. Many hypotheses have been put forward regarding the origin of mitochondrial and chloroplast DNA. The generally accepted point of view today is that they are the rudiments of the chromosomes of ancient bacteria that penetrated into the cytoplasm of the host cells and became the precursors of these organelles. Mitochondrial DNA codes for mitochondrial tRNA and rRNA, as well as several mitochondrial proteins. More than 95% of mitochondrial proteins are encoded by nuclear DNA.

STRUCTURE OF GENES

Consider the structure of the gene in prokaryotes and eukaryotes, their similarities and differences. Despite the fact that a gene is a section of DNA that codes for only one protein or RNA, in addition to the directly coding part, it also includes regulatory and other structural elements, which have a different structure in prokaryotes and eukaryotes.

Coding sequence - the main structural and functional unit of the gene; it contains the nucleotide triplets that encode the amino acid sequence. It starts with a start codon and ends with a stop codon.

Before and after the coding sequence are the untranslated 5' and 3' sequences. They perform regulatory and auxiliary functions; for example, they enable the ribosome to bind to the mRNA.

The untranslated and coding sequences together make up the transcription unit: the transcribed DNA region, that is, the DNA region from which mRNA is synthesized.

Terminator - a non-transcribed region of DNA at the end of a gene where RNA synthesis stops.

At the beginning of the gene is the regulatory region, which includes the promoter and the operator.

Promoter - the sequence to which the polymerase binds during transcription initiation. Operator - the region to which special proteins, repressors, can bind; repressors can reduce the activity of RNA synthesis from the gene, in other words, reduce its expression.

The structure of genes in prokaryotes

The general plan for the structure of genes in prokaryotes and eukaryotes does not differ - both contain a regulatory region with a promoter and operator, a transcription unit with coding and non-translated sequences, and a terminator. However, the organization of genes in prokaryotes and eukaryotes is different.

Fig. 18. Scheme of the structure of the gene in prokaryotes (bacteria)

At the beginning and at the end of the operon there are regulatory regions common to several structural genes. From the transcribed region of the operon a single mRNA molecule is read, which contains several coding sequences, each with its own start and stop codon. One protein is synthesized from each of these regions. Thus, several protein molecules are synthesized from one mRNA molecule.

Prokaryotes are characterized by the combination of several genes into a single functional unit, the operon. The work of the operon can be regulated by other genes, regulators, which may lie at a considerable distance from the operon itself. The protein translated from such a gene is called a repressor. It binds to the operator of the operon, regulating the expression of all the genes it contains at once.

Prokaryotes are also characterized by the coupling of transcription and translation.


Fig. 19. The coupling of transcription and translation in prokaryotes

This coupling does not occur in eukaryotes because the nuclear membrane separates the cytoplasm, where translation occurs, from the genetic material, on which transcription occurs. In prokaryotes, while RNA is being synthesized on the DNA template, a ribosome can immediately bind to the growing RNA molecule. Thus, translation begins even before transcription is complete. Moreover, several ribosomes can bind to one RNA molecule simultaneously, synthesizing several molecules of the same protein at once.

The structure of genes in eukaryotes

The genes and chromosomes of eukaryotes are very complexly organized.

Bacteria of many species have only one chromosome, and in almost all cases there is one copy of each gene on each chromosome. Only a few genes, such as rRNA genes, are contained in multiple copies. Genes and regulatory sequences make up almost the entire genome of prokaryotes. Moreover, almost every gene strictly corresponds to the amino acid sequence (or RNA sequence) that it encodes (Fig. 14).

The structural and functional organization of eukaryotic genes is much more complex. The study of eukaryotic chromosomes, and later the sequencing of complete eukaryotic genomes, brought many surprises. Many, if not most, eukaryotic genes have an interesting feature: their nucleotide sequences contain one or more DNA regions that do not encode the amino acid sequence of the polypeptide product. Such untranslated inserts disrupt the direct correspondence between the nucleotide sequence of the gene and the amino acid sequence of the encoded polypeptide. These untranslated segments within genes are called introns, or intervening sequences, and the coding segments are called exons. In prokaryotes, only a few genes contain introns.

So, in eukaryotes genes are almost never combined into operons, and the coding sequence of a eukaryotic gene is most often divided into translated regions, exons, and untranslated regions, introns.

In most cases, the function of introns has not been established. In general, only about 1.5% of human DNA is "coding", that is, it carries information about proteins or RNA. However, taking into account large introns, it turns out that 30% of human DNA consists of genes. Since genes make up a relatively small proportion of the human genome, a significant amount of DNA remains unaccounted for.

Fig. 16. Scheme of the structure of the gene in eukaryotes

From each gene, an immature RNA, or pre-RNA, is first synthesized; it contains both introns and exons.

Splicing then takes place: the intron regions are excised and a mature mRNA is formed, from which a protein can be synthesized.


Fig. 20. Alternative splicing process

Such an organization of genes makes it possible, for example, to synthesize different forms of a protein from a single gene, because exons can be joined in different combinations during splicing.

Fig. 21. Differences in the structure of genes of prokaryotes and eukaryotes
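How one gene can give several mature mRNAs can be illustrated with a deliberately simple model in Python. The model is an assumption made for this sketch: the first and last exons are always kept, internal exons may be skipped, and exon order is preserved. Real alternative splicing has more modes than exon skipping, and the exon labels are placeholders.

```python
from itertools import combinations


def splice_variants(exons):
    """Yield exon combinations under a simple exon-skipping model.

    Assumes at least two exons; the first and last are always retained.
    """
    first, *internal, last = exons
    for n_kept in range(len(internal) + 1):
        for kept in combinations(internal, n_kept):
            yield [first, *kept, last]


variants = list(splice_variants(["E1", "E2", "E3", "E4"]))
for variant in variants:
    print("-".join(variant))
# E1-E4
# E1-E2-E4
# E1-E3-E4
# E1-E2-E3-E4
```

With n internal exons this model gives 2^n variants, which hints at how a modest number of genes can produce a much larger number of transcripts.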

MUTATIONS AND MUTAGENESIS

A mutation is a persistent change in the genotype, that is, a change in the nucleotide sequence.

The process that leads to mutations is called mutagenesis, and an organism all of whose cells carry the same mutation is called a mutant.

Mutation theory was first formulated by Hugo de Vries in 1903. Its modern version includes the following provisions:

1. Mutations occur suddenly, abruptly.

2. Mutations are passed down from generation to generation.

3. Mutations can be beneficial, deleterious or neutral, dominant or recessive.

4. The probability of detecting mutations depends on the number of individuals studied.

5. Similar mutations can occur repeatedly.

6. Mutations are not directed.

Mutations can be caused by various factors. Some are caused by mutagenic agents: physical (e.g. ultraviolet light or other radiation), chemical (e.g. colchicine or reactive oxygen species) and biological (e.g. viruses). Mutations can also be caused by replication errors.

Depending on the conditions in which they appear, mutations are divided into spontaneous, that is, mutations that arise under normal conditions, and induced, that is, mutations that arise under special conditions.

Mutations can occur not only in nuclear DNA but also, for example, in the DNA of mitochondria or plastids. Accordingly, nuclear and cytoplasmic mutations are distinguished.

Mutations often give rise to new alleles. If the mutant allele suppresses the normal allele, the mutation is called dominant. If the normal allele suppresses the mutant one, the mutation is called recessive. Most mutations that give rise to new alleles are recessive.

By effect, mutations are divided into adaptive ones, which increase the organism's adaptation to the environment; neutral ones, which do not affect survival; harmful ones, which reduce the organism's adaptation to environmental conditions; and lethal ones, which lead to the death of the organism at early stages of development.

By consequence, there are mutations that lead to loss of protein function, mutations that give the protein a new function, and mutations that change the dose of a gene and, accordingly, the dose of the protein synthesized from it.

A mutation can occur in any cell of the body. If it occurs in a germ cell, it is called a germline (germinal, or generative) mutation. Such mutations do not show in the organism in which they arose, but they lead to the appearance of mutants among the offspring and are inherited, so they are important for genetics and evolution. If the mutation occurs in any other cell, it is called somatic. Such a mutation can manifest itself to some extent in the organism in which it arose, for example by leading to the formation of cancerous tumors. However, it is not inherited and does not affect the offspring.

Mutations can affect parts of the genome of different sizes. Gene, chromosomal, and genomic mutations are distinguished.

Gene mutations

Mutations that occur on a scale smaller than one gene are called gene, or point, mutations. Such mutations change one or several nucleotides in the sequence. Gene mutations include substitutions, in which one nucleotide is replaced by another; deletions, in which a nucleotide is lost; and insertions, in which an extra nucleotide is added to the sequence.


Fig. 23. Gene (point) mutations

By their effect on the protein, gene mutations are divided into: synonymous mutations, which (owing to the degeneracy of the genetic code) do not change the amino acid composition of the protein product; missense mutations, which replace one amino acid with another and can affect the structure of the synthesized protein, although they are often insignificant; nonsense mutations, which replace a coding codon with a stop codon; and mutations that disrupt splicing.


Fig. 24. Mutation schemes
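The classes of gene mutation listed above can be illustrated on a toy mRNA in Python. The sequence, the helper names and the classification rule are assumptions made for this sketch: the rule only compares the translated proteins and the change in sequence length, so it is a simplification and does not cover splicing mutations or every edge case.

```python
# Standard genetic code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from position 0 until a stop codon or the last full triplet."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def substitute(seq: str, pos: int, base: str) -> str:
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, pos: int) -> str:
    return seq[:pos] + seq[pos + 1:]


def classify(reference: str, mutant: str) -> str:
    """Simplified classification by effect on the protein."""
    if (len(mutant) - len(reference)) % 3 != 0:
        return "frameshift"  # reading frame shifts downstream of the change
    ref_protein, mut_protein = translate(reference), translate(mutant)
    if mut_protein == ref_protein:
        return "synonymous"
    if len(mut_protein) < len(ref_protein):
        return "nonsense"    # premature stop codon
    return "missense"


reference = "AUGGCUUGGAAAUAA"  # toy mRNA: Met-Ala-Trp-Lys-stop
print(translate(reference))                             # MAWK
print(classify(reference, substitute(reference, 5, "C")))  # synonymous (GCU -> GCC)
print(classify(reference, substitute(reference, 4, "A")))  # missense   (GCU -> GAU)
print(classify(reference, substitute(reference, 8, "A")))  # nonsense   (UGG -> UGA)
print(classify(reference, delete(reference, 3)))           # frameshift
```

Deleting a single nucleotide changes every downstream triplet (here the protein becomes MLGN), which is why insertions and deletions tend to affect the whole protein rather than one residue.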

Also, by their effect on the protein, frameshift mutations are distinguished; these arise from insertions and deletions. Like nonsense mutations, they occur at a single point in the gene but often affect the entire protein and can completely change its structure.

... when a segment of a chromosome rotates 180 degrees.

Fig. 28. Translocation

Fig. 29. Chromosome before and after duplication

Genomic mutations

Finally, genomic mutations affect the entire genome, that is, the number of chromosomes changes. Polyploidy is distinguished - an increase in the ploidy of the cell, and aneuploidy, that is, a change in the number of chromosomes, for example, trisomy (the presence of an additional homologue in one of the chromosomes) and monosomy (the absence of a homolog in the chromosome).


Chapter USE: 2.6. Genetic information in a cell. Genes, genetic code and its properties. Matrix nature of biosynthetic reactions. Biosynthesis of protein and nucleic acids

More than 6 billion people live on Earth. Apart from 25-30 million pairs of identical twins, all people are genetically different. This means that each person is unique, with unique hereditary characteristics, character traits, abilities, temperament and many other qualities. What determines such differences between people? Of course, the differences in their genotypes, i.e. the sets of genes in their organisms. Each person is unique, just as the genotype of an individual animal or plant is unique. But the genetic characteristics of a given person are embodied in the proteins synthesized in the body. Consequently, the structure of one person's proteins differs, although only slightly, from that of another person's. That is why the problem of organ transplant rejection arises, and why there are allergic reactions to foods, insect bites, plant pollen, and so on. This does not mean that people have no identical proteins. Proteins that perform the same function may be identical or may differ by one or two amino acids. But there are no people on Earth (with the exception of identical twins) in whom all proteins are the same.

Information about the primary structure of a protein is encoded as a sequence of nucleotides in a region of the DNA molecule - the gene. Gene is a unit of hereditary information of an organism. Each DNA molecule contains many genes. The totality of all the genes of an organism makes up its genotype.

Hereditary information is encoded using the genetic code. The code is similar to the well-known Morse code, which encodes information with dots and dashes. Morse code is universal for all radio operators, and the differences lie only in the translation of signals into different languages. The genetic code is also universal for all organisms and differs only in the order of the nucleotides that form the genes and code for the proteins of specific organisms.

Properties of the genetic code: triplet nature, specificity, universality, redundancy and non-overlapping.

So what is the genetic code? Initially, it consists of triplets of DNA nucleotides combined in different sequences, for example AAT, GCA, ACG, TGC, etc. Each triplet of nucleotides encodes a specific amino acid that will be built into the polypeptide chain. For example, the DNA triplet CGT encodes the amino acid alanine, and the triplet AAG encodes phenylalanine. There are 20 amino acids, and four nucleotides taken in groups of three give 64 possible combinations. Four nucleotides are therefore more than enough to encode 20 amino acids, which is why one amino acid can be encoded by several triplets. Some triplets do not encode amino acids at all, but start or stop protein biosynthesis.

The genetic code proper is considered to be the sequence of nucleotides in an mRNA molecule, because mRNA copies information from DNA (transcription) and translates it into a sequence of amino acids in the synthesized proteins (translation). mRNA is built from the nucleotides A, C, G and U. The nucleotide triplets of mRNA are called codons. The DNA triplets given above look like this in mRNA: the DNA triplet CGT becomes the codon GCA, and the DNA triplet AAG becomes the codon UUC. It is the codons of mRNA that represent the genetic code in written form. So, the genetic code is triplet, universal for all organisms on Earth, and degenerate (most amino acids are encoded by more than one codon). Between genes there are punctuation marks: triplets called stop codons, which signal the end of the synthesis of one polypeptide chain. There are tables of the genetic code that you need to be able to use to decode mRNA codons and build chains of protein molecules (complementary DNA in brackets).
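Two points from the paragraphs above lend themselves to a quick check in Python: the count of possible triplets, and the conversion of a DNA template triplet into its mRNA codon. The DNA triplets are written here as CGT and AAG, and the names used are illustrative only.

```python
from itertools import product

# Four nucleotides taken three at a time give 4**3 combinations.
triplets = ["".join(p) for p in product("ACGT", repeat=3)]
print(len(triplets))  # 64, enough for 20 amino acids plus stop signals
print(4 ** 2)         # 16, so two-letter "words" could not cover 20 amino acids

# Complementary pairing from a DNA template triplet to an mRNA codon.
TEMPLATE_TO_MRNA = str.maketrans("ACGT", "UGCA")


def codon_from_dna_triplet(dna_triplet: str) -> str:
    """Return the mRNA codon complementary to a DNA template triplet."""
    return dna_triplet.translate(TEMPLATE_TO_MRNA)


print(codon_from_dna_triplet("CGT"))  # GCA (alanine)
print(codon_from_dna_triplet("AAG"))  # UUC (phenylalanine)
```

The surplus of 64 triplets over 20 amino acids is what makes the code degenerate.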

The genetic code is a way of encoding the amino acid sequence of proteins using the sequence of nucleotides in the DNA molecule, characteristic of all living organisms.

The implementation of genetic information in living cells (that is, the synthesis of a protein encoded in DNA) is carried out using two matrix processes: transcription (that is, mRNA synthesis on a DNA matrix) and translation (synthesis of a polypeptide chain on an mRNA matrix).

DNA uses four nucleotides - adenine (A), guanine (G), cytosine (C), thymine (T). These "letters" make up the alphabet of the genetic code. RNA uses the same nucleotides, except for thymine, which is replaced by uracil (U). In DNA and RNA molecules, nucleotides line up in chains and, thus, sequences of “letters” are obtained.

In the nucleotide sequence of DNA there are code "words" for each amino acid of the future protein molecule - the genetic code. It consists in a certain sequence of nucleotides in the DNA molecule.

Three consecutive nucleotides encode the "name" of one amino acid, that is, each of the 20 amino acids is encrypted by a significant code unit - a combination of three nucleotides called a triplet or codon.

At present, the DNA code has been completely deciphered, and we can talk about certain properties that are characteristic of this unique biological system, which provides the translation of information from the "language" of DNA to the "language" of protein.

The carrier of genetic information is DNA, but since mRNA, a copy of one of the DNA strands, is directly involved in protein synthesis, the genetic code is most often written in the "RNA language".

Amino acid Coding RNA triplets
Alanine GCU GCC GCA GCG
Arginine CGU CGC CGA CGG AGA AGG
Asparagine AAU AAC
Aspartic acid GAU GAC
Valine GUU GUC GUA GUG
Histidine CAU CAC
Glycine GGU GGC GGA GGG
Glutamine CAA CAG
Glutamic acid GAA GAG
Isoleucine AUU AUC AUA
Leucine CUU CUC CUA CUG UUA UUG
Lysine AAA AAG
Methionine AUG
Proline CCU CCC CCA CCG
Serine UCU UCC UCA UCG AGU AGC
Tyrosine UAU UAC
Threonine ACU ACC ACA ACG
Tryptophan UGG
Phenylalanine UUU UUC
Cysteine UGU UGC
STOP UGA UAG UAA
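As a rough sketch of how such a table is used, the standard codon assignments can be stored as a lookup and applied to an mRNA string three letters at a time. The names `CODONS_BY_AMINO_ACID` and `translate` and the example sequence are illustrative assumptions, not part of the original article.

```python
# Standard codon assignments, written the same way as the table:
# amino acid name -> space-separated mRNA codons.
CODONS_BY_AMINO_ACID = {
    "Alanine": "GCU GCC GCA GCG",
    "Arginine": "CGU CGC CGA CGG AGA AGG",
    "Asparagine": "AAU AAC",
    "Aspartic acid": "GAU GAC",
    "Valine": "GUU GUC GUA GUG",
    "Histidine": "CAU CAC",
    "Glycine": "GGU GGC GGA GGG",
    "Glutamine": "CAA CAG",
    "Glutamic acid": "GAA GAG",
    "Isoleucine": "AUU AUC AUA",
    "Leucine": "CUU CUC CUA CUG UUA UUG",
    "Lysine": "AAA AAG",
    "Methionine": "AUG",
    "Proline": "CCU CCC CCA CCG",
    "Serine": "UCU UCC UCA UCG AGU AGC",
    "Tyrosine": "UAU UAC",
    "Threonine": "ACU ACC ACA ACG",
    "Tryptophan": "UGG",
    "Phenylalanine": "UUU UUC",
    "Cysteine": "UGU UGC",
    "STOP": "UGA UAG UAA",
}

# Invert the table: codon -> amino acid name.
CODON_TO_AMINO_ACID = {
    codon: name
    for name, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}


def translate(mrna: str) -> list:
    """Read an mRNA string codon by codon until a stop codon is met."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        name = CODON_TO_AMINO_ACID[mrna[i:i + 3]]
        if name == "STOP":
            break
        protein.append(name)
    return protein


print(translate("AUGGCUUGGUAA"))  # ['Methionine', 'Alanine', 'Tryptophan']
```

Storing the table by amino acid and inverting it keeps the code visually close to the printed table, which makes transcription errors easier to spot.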

Properties of the genetic code

  • Triplet nature

Each of the 20 amino acids is encoded by a meaningful code unit: a combination of three consecutive nucleotides (nitrogenous bases), called a triplet or codon.

A triplet (codon) is a sequence of three nucleotides (nitrogenous bases) in a DNA or RNA molecule that determines the inclusion of a certain amino acid in the protein molecule during its synthesis.

  • Unambiguity (discreteness)

One triplet cannot encode two different amino acids; it encodes only one amino acid. A certain codon corresponds to only one amino acid.

  • Degeneracy (redundancy)

Each amino acid can be specified by more than one triplet; the exceptions are methionine and tryptophan. In other words, several codons can correspond to the same amino acid.

  • Non-overlapping

The same base cannot be present in two adjacent codons at the same time.

  • Polarity (punctuation marks)

Some triplets do not encode amino acids but act as "road signs" that mark the end of individual genes (UAA, UAG, UGA). Each of them means that synthesis stops, and one is located at the end of each gene, so we can speak of the polarity of the genetic code.

  • Universality

In animals and plants, fungi, bacteria and viruses, the same triplet encodes the same amino acid; that is, the genetic code is the same for all living beings. In other words, universality is the ability of the genetic code to work in the same way in organisms of different levels of complexity, from viruses to humans. The universality of the DNA code confirms the unity of the origin of all life on our planet. Genetic engineering methods are based on this universality of the genetic code.

From the history of the discovery of the genetic code

The idea that a genetic code exists was first formulated by A. Dounce and G. Gamow in 1952-1954. Scientists showed that a nucleotide sequence that uniquely determines the synthesis of a particular amino acid must contain at least three units. Later it was proved that such a sequence consists of three nucleotides, called a codon or triplet.

The questions of which nucleotides are responsible for incorporating a particular amino acid into a protein molecule, and how many nucleotides determine this incorporation, remained unresolved until 1961. Theoretical analysis showed that the code cannot consist of a single nucleotide, since in that case only 4 amino acids could be encoded. Nor can the code be a doublet: combinations of two nucleotides from a four-letter "alphabet" cannot cover all amino acids, since only 16 such combinations are theoretically possible (4² = 16).

Three consecutive nucleotides are enough to encode 20 amino acids as well as a "stop" signal marking the end of the protein sequence, since the number of possible combinations is then 64 (4³ = 64).
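The counting argument can be checked directly by enumerating every possible "word" of a given length. The sketch below assumes that 21 distinct meanings are needed (20 amino acids plus one stop signal); the function name `count_code_words` is illustrative.

```python
from itertools import product

BASES = "ACGU"       # the four-letter RNA alphabet
NEEDED_MEANINGS = 21  # assumption: 20 amino acids + 1 stop signal


def count_code_words(length: int) -> int:
    """Count all distinct nucleotide 'words' of the given length."""
    return sum(1 for _ in product(BASES, repeat=length))


for length in (1, 2, 3):
    n = count_code_words(length)
    verdict = "enough" if n >= NEEDED_MEANINGS else "too few"
    print(length, n, verdict)
# 1 4 too few
# 2 16 too few
# 3 64 enough
```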

GENETIC CODE (Greek genetikos, referring to origin; syn.: code, biological code, amino acid code, protein code, nucleic acid code): a system for recording hereditary information in the nucleic acid molecules of animals, plants, bacteria and viruses as an alternating sequence of nucleotides.

Genetic information (Fig.) is transmitted from cell to cell and from generation to generation by replication of DNA molecules (see Replication); RNA-containing viruses are the exception. In the course of cell life, the hereditary information in DNA is realized through 3 types of RNA: messenger (mRNA, also called informational RNA), ribosomal (rRNA) and transfer (tRNA), all of which are synthesized on DNA as a template by the enzyme RNA polymerase. The sequence of nucleotides in the DNA molecule uniquely determines the sequence of nucleotides in all three types of RNA (see Transcription). Only mRNA carries the information of a gene that encodes a protein molecule. The end product of the realization of hereditary information is the synthesis of protein molecules, whose specificity is determined by the sequence of their amino acids (see Translation).

Since DNA and RNA each contain only 4 different nitrogenous bases [in DNA: adenine (A), thymine (T), guanine (G), cytosine (C); in RNA: adenine (A), uracil (U), cytosine (C), guanine (G)], whose sequence determines the sequence of 20 amino acids in the protein, the problem of the genetic code arises, i.e., the problem of translating the 4-letter alphabet of nucleic acids into the 20-letter alphabet of polypeptides.

The idea of template synthesis of protein molecules, with a correct prediction of the properties of the hypothetical template, was first formulated by N. K. Koltsov in 1928. In 1944, Avery et al. established that DNA molecules are responsible for the transfer of hereditary traits during transformation in pneumococci. In 1948, E. Chargaff showed that in all DNA molecules the corresponding nucleotides are present in equal amounts (A and T, G and C). In 1953, F. Crick, J. Watson and M. H. F. Wilkins, relying on this rule and on X-ray diffraction data, concluded that the DNA molecule is a double helix consisting of two polynucleotide strands linked by hydrogen bonds. Opposite an A in one strand there can only be a T in the other, and opposite a G only a C. Because of this complementarity, the nucleotide sequence of one strand uniquely determines the sequence of the other. The second significant conclusion that follows from this model is that the DNA molecule is capable of self-reproduction.

In 1954, G. Gamow formulated the problem of the genetic code in its modern form. In 1957, F. Crick put forward the adapter hypothesis, assuming that amino acids interact with the nucleic acid not directly but through intermediaries (now known as tRNA). In the years that followed, all the principal links in the general scheme for the transmission of genetic information, initially hypothetical, were confirmed experimentally. In 1957, mRNAs were discovered [A. S. Spirin, A. N. Belozersky et al.; E. Volkin and L. Astrachan], as was tRNA [M. B. Hoagland]; in 1960, DNA was synthesized outside the cell using existing DNA macromolecules as a template (A. Kornberg), and DNA-dependent RNA synthesis was discovered [S. B. Weiss et al.]. In 1961, a cell-free system was created in which protein-like substances were synthesized in the presence of natural RNA or synthetic polyribonucleotides [M. Nirenberg and J. H. Matthaei]. The study of the genetic code consisted of investigating the general properties of the code and of actually deciphering it, that is, finding out which combinations of nucleotides (codons) code for which amino acids.

The general properties of the code were established independently of its decoding, and mostly before it, by analyzing the molecular patterns of mutation formation (F. Crick et al., 1961; N. V. Luchnik, 1963). They come down to the following:

1. The code is universal, i.e. identical, at least in the main, for all living beings.

2. The code is triplet, that is, each amino acid is encoded by a triple of nucleotides.

3. The code is non-overlapping, i.e. a given nucleotide cannot be part of more than one codon.

4. The code is degenerate, that is, one amino acid can be encoded by several triplets.

5. Information about the primary structure of the protein is read from mRNA sequentially, starting from a fixed point.

6. Most of the possible triplets have "meaning", i.e., encode amino acids.

7. Of the three "letters" of the codon, only two (obligate) are of primary importance, while the third (optional) carries much less information.
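Several of these properties can be checked against the standard codon table. The sketch below is an illustration, not part of the original entry: it builds the table from a compact 64-letter string (one-letter amino acid codes in U, C, A, G order, with "*" for a triplet without "meaning") and counts the meaningful triplets and the number of codons per amino acid. All variable names are assumptions made for the example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in U, C, A, G order;
# "*" marks a nonsense (stop) codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Property 6: most of the possible triplets encode amino acids.
sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]

# Property 4: degeneracy, i.e. how many triplets encode each amino acid.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE), len(sense_codons))  # 64 61
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1
```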

Direct decoding of the code would consist of comparing the nucleotide sequence of a structural gene (or of the mRNA synthesized on it) with the amino acid sequence of the corresponding protein. However, this approach was not yet technically feasible. Two other approaches were used: protein synthesis in a cell-free system with artificial polyribonucleotides of known composition as templates, and analysis of the molecular patterns of mutation formation. The first produced results earlier and historically played a major role in deciphering the genetic code.

In 1961, M. Nirenberg and J. H. Matthaei used a homopolymer, synthetic polyuridylic acid (i.e., artificial RNA of the composition UUUU...), as a template and obtained polyphenylalanine. From this it followed that the codon for phenylalanine consists of several U's, i.e., in the case of a triplet code it is UUU. Later, along with homopolymers, polyribonucleotides consisting of different nucleotides were used. In this case only the composition of the polymers was known, while the arrangement of nucleotides in them was random, so the analysis of the results was statistical and gave only indirect conclusions. Fairly quickly, at least one triplet was found for each of the 20 amino acids. It turned out that organic solvents, changes in pH or temperature, some cations and especially antibiotics make the code ambiguous: the same codons begin to stimulate the incorporation of other amino acids, and in some cases one codon came to encode up to four different amino acids. Streptomycin affected the reading of information both in cell-free systems and in vivo, but only in streptomycin-sensitive bacterial strains. In streptomycin-dependent strains it "corrected" the reading of codons that had been altered by mutation. Such results gave grounds to doubt the correctness of decoding the genetic code with a cell-free system; confirmation was required, above all from in vivo data.

The main in vivo data on the genetic code were obtained by analyzing the amino acid composition of proteins in organisms treated with mutagens of known mechanism of action, for example nitrous acid, which causes the replacement of C by U and of A by G. Useful information also comes from the analysis of mutations caused by non-specific mutagens, comparison of differences in the primary structure of related proteins in different species, the correlation between the composition of DNA and of proteins, and so on.

Decoding the genetic code from in vivo and in vitro data gave matching results. Later, three more methods for deciphering the code in cell-free systems were developed: binding of aminoacyl-tRNA (i.e., tRNA with an attached activated amino acid) to trinucleotides of known composition (M. Nirenberg et al., 1965), binding of aminoacyl-tRNA to polynucleotides starting with a certain triplet (Matthaei et al., 1966), and the use as mRNA of polymers in which not only the composition but also the order of nucleotides is known (H. G. Khorana et al., 1965). All three methods complement each other, and the results are consistent with the data obtained in experiments in vivo.

In the 1970s, especially reliable methods for checking the results of decoding the genetic code appeared. It is known that mutations arising under the influence of proflavin consist of the loss or insertion of individual nucleotides, which shifts the reading frame. In the T4 phage, proflavin was used to induce a number of mutations in which the composition of lysozyme changed. This composition was analyzed and compared with the codons that should have resulted from a shift in the reading frame. There was a complete match. In addition, this method made it possible to establish which triplets of the degenerate code encode each of the amino acids. In 1970, J. M. Adams and his collaborators managed to partially decipher the genetic code by a direct method: in the R17 phage, the base sequence of a fragment 57 nucleotides long was determined and compared with the amino acid sequence of its coat protein. The results were in complete agreement with those obtained by less direct methods. Thus, the code has been deciphered completely and correctly.
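The effect of a reading-frame shift can be illustrated with a toy sequence. The sketch below is only a model of the idea, not a reconstruction of the T4 lysozyme experiments: the mRNA string is invented, the table is the standard codon table in one-letter amino acid codes, and the helper `translate` is a hypothetical name.

```python
from itertools import product

BASES = "UCAG"
# Standard codon table in U, C, A, G order; "*" marks a nonsense codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0; stop at a nonsense codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


original = "AUGAAAGGGCCCUUU"                   # invented example sequence
inserted = original[:3] + "C" + original[3:]  # one nucleotide inserted
restored = inserted[:9] + inserted[10:]       # one nucleotide lost further on

print(translate(original))  # MKGPF
print(translate(inserted))  # MQRAL
print(translate(restored))  # MQRPF
```

After the insertion every downstream codon changes; the later deletion restores the original frame, so only the amino acids between the two mutations differ from the original protein.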

The results of decoding are summarized in a table, which lists the composition of the mRNA codons. The composition of tRNA anticodons is complementary to the mRNA codons, i.e., instead of U they contain A, instead of A they contain U, instead of C they contain G, and instead of G they contain C; it corresponds to the codons of the structural gene (the DNA strand from which information is read), the only difference being that uracil takes the place of thymine. Of the 64 triplets that can be formed from 4 nucleotides, 61 have "sense", i.e., encode amino acids, and 3 are "nonsense" (devoid of meaning). There is a fairly clear relationship between the composition of triplets and their meaning, which was discovered during the analysis of the general properties of the code. In some cases, the triplets encoding a specific amino acid (e.g., proline, alanine) share the same first two nucleotides (obligate), while the third (optional) can be any nucleotide. In other cases (when encoding, for example, asparagine or glutamine), two similar triplets have the same meaning: their first two nucleotides coincide, and the third position is occupied by either purine or by either pyrimidine.
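The relationship between the first two nucleotides and the role of the third can be tabulated from the standard codon table. The sketch below is illustrative: the function `classify_prefix` and its three category labels are assumptions made for this example, not terminology from the entry.

```python
from itertools import product

BASES = "UCAG"
# Standard codon table in U, C, A, G order; "*" marks a nonsense codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_prefix(prefix: str) -> str:
    """Describe how the third nucleotide affects codons sharing two bases."""
    with_pyrimidine = {CODON_TABLE[prefix + base] for base in "UC"}
    with_purine = {CODON_TABLE[prefix + base] for base in "AG"}
    if len(with_pyrimidine | with_purine) == 1:
        return "third base free"          # e.g. proline, alanine
    if len(with_pyrimidine) == 1 and len(with_purine) == 1:
        return "purine/pyrimidine split"  # e.g. asparagine, glutamine
    return "other"


summary = {}
for first, second in product(BASES, repeat=2):
    kind = classify_prefix(first + second)
    summary.setdefault(kind, []).append(first + second)

for kind, prefixes in summary.items():
    print(kind, len(prefixes), " ".join(prefixes))
```

Only two of the sixteen two-letter prefixes (UG and AU) fall outside both patterns; they contain the single-codon amino acids tryptophan and methionine.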

Nonsense codons, which have special names derived from the designations of phage mutants (UAA, ochre; UAG, amber; UGA, opal), do not encode any amino acid, but they are of great importance in reading information: they encode the end of the polypeptide chain.

Information is read in the direction from the 5' end to the 3' end of the nucleotide chain (see Deoxyribonucleic acids). Protein synthesis proceeds from the amino acid with a free amino group to the amino acid with a free carboxyl group. The start of synthesis is encoded by the triplets AUG and GUG, which in this case bind a specific initiator aminoacyl-tRNA, namely N-formylmethionyl-tRNA. The same triplets, when located within the chain, encode methionine and valine, respectively. The ambiguity is removed by the fact that the start of reading is preceded by a nonsense codon. There is evidence that the boundary between mRNA regions encoding different proteins consists of more than two triplets and that the secondary structure of RNA changes in these places; this issue is under investigation. If a nonsense codon occurs within a structural gene, the corresponding protein is built only up to the location of this codon.
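The reading rules described here (start at a start triplet, read in the 5' to 3' direction, stop at the first nonsense codon) can be sketched as follows. This is a simplified model under stated assumptions: only AUG is treated as a start signal, the first AUG found is used, and the sequences are invented for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard codon table in U, C, A, G order; "*" marks a nonsense codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_start(mrna: str) -> str:
    """Read 5'->3' from the first AUG until a nonsense codon is reached."""
    start = mrna.find("AUG")  # simplification: GUG starts are ignored
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented sequences, written codon by codon for readability.
normal = "CC" + "AUG" + "GCU" + "UGG" + "AAA" + "UAA" + "CC"
# Same message with a nonsense codon (UGA) inside the coding region.
mutant = "CC" + "AUG" + "GCU" + "UGA" + "AAA" + "UAA" + "CC"

print(translate_from_start(normal))  # MAWK
print(translate_from_start(mutant))  # MA
```

The mutant protein is built only up to the position of the internal nonsense codon, as the paragraph above states.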

The discovery and decoding of the genetic code, an outstanding achievement of molecular biology, influenced all the biological sciences and in some cases laid the foundation for large specialized fields (see Molecular genetics). The impact of the discovery of the genetic code and of the research connected with it is compared with the impact of Darwin's theory on the biological sciences.

The universality of the genetic code is direct evidence of the universality of the basic molecular mechanisms of life in all representatives of the organic world. Meanwhile, the large differences in the functions and structure of the genetic apparatus in the transition from prokaryotes to eukaryotes and from unicellular to multicellular organisms are probably associated with molecular differences, the study of which is one of the tasks of the future. Since research on the genetic code dates only from recent years, the significance of its results for practical medicine is still indirect: it helps us understand the nature of diseases and the mechanism of action of pathogens and medicinal substances. However, the discovery of such phenomena as transformation, transduction and suppression points to the fundamental possibility of correcting pathologically altered hereditary information, the so-called genetic engineering.

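Table. GENETIC CODE

First   Second nucleotide of the codon                       Third
        U              C          A              G
U       Phenylalanine  Serine     Tyrosine       Cysteine    U
U       Phenylalanine  Serine     Tyrosine       Cysteine    C
U       Leucine        Serine     Nonsense*      Nonsense*   A
U       Leucine        Serine     Nonsense*      Tryptophan  G
C       Leucine        Proline    Histidine      Arginine    U
C       Leucine        Proline    Histidine      Arginine    C
C       Leucine        Proline    Glutamine      Arginine    A
C       Leucine        Proline    Glutamine      Arginine    G
A       Isoleucine     Threonine  Asparagine     Serine      U
A       Isoleucine     Threonine  Asparagine     Serine      C
A       Isoleucine     Threonine  Lysine         Arginine    A
A       Methionine**   Threonine  Lysine         Arginine    G
G       Valine         Alanine    Aspartic acid  Glycine     U
G       Valine         Alanine    Aspartic acid  Glycine     C
G       Valine         Alanine    Glutamic acid  Glycine     A
G       Valine**       Alanine    Glutamic acid  Glycine     G

* Encodes the end of the chain.

** Also encodes the beginning of the chain.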

Bibliography: Ycas M. The Biological Code, trans. from English, Moscow, 1971; Luchnik N. V. Biophysics of Cytogenetic Lesions and the Genetic Code, Leningrad, 1968; Molecular Genetics, trans. from English, ed. A. N. Belozersky, part 1, Moscow, 1964; Nucleic Acids, trans. from English, ed. A. N. Belozersky, Moscow, 1965; Watson J. D. Molecular Biology of the Gene, trans. from English, Moscow, 1967; Physiological Genetics, ed. M. E. Lobashev and S. G. Inge-Vechtomov, Leningrad, 1976, bibliogr.; Desoxyribonucleinsäure, Schlüssel des Lebens, hrsg. v. E. Geissler, Berlin, 1972; The Genetic Code, Cold Spr. Harb. Symp. quant. Biol., v. 31, 1966; Woese C. R. The Genetic Code, N. Y. a. o., 1967.