Medical Genetics

Principles of Molecular Genetics

What genes do

Genes are inherited units of information that determine phenotype at both the gross (morphology, e.g. eye colour) and molecular (polypeptide) level. Each amino acid in a protein is coded for by a codon (a sequence of three bases). Humans have 22 pairs of homologous autosomes plus the sex chromosomes, so most genes are present twice (two alleles). Most multicellular organisms have two sets of chromosomes, that is, they are diploid, and the paired chromosomes are referred to as homologous chromosomes. Diploid organisms have one copy of each gene (and therefore one allele) on each chromosome of a pair. If both alleles are the same, the individual is homozygous at that locus; if the alleles are different, the individual is heterozygous.

Genotypic interactions between the two alleles at a locus can be described as dominant or recessive, according to which of the two homozygous genotypes the phenotype of the heterozygote most resembles. Where the heterozygote is indistinguishable from one of the homozygotes, the allele involved is said to be dominant to the other, which is said to be recessive to the former.
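As a minimal sketch of these definitions, the Python below labels a genotype as homozygous or heterozygous and reports the phenotype under complete dominance. The allele symbols "A" and "a" and both function names are illustrative assumptions, not notation from the text.

```python
def zygosity(genotype):
    """Return 'homozygous' if both alleles match, else 'heterozygous'."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def phenotype(genotype, dominant="A"):
    """Complete dominance: one copy of the dominant allele decides the phenotype."""
    return "dominant" if dominant in genotype else "recessive"


for genotype in ("AA", "Aa", "aa"):
    print(genotype, zygosity(genotype), phenotype(genotype))
```

The heterozygote "Aa" is reported with the same phenotype as the homozygote "AA", which is what makes "A" dominant to "a" in this toy model.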

Mutations can occur in exons as well as in introns. If they occur in exons, they are classified as point mutations, single base insertions/deletions and multiple base insertions/deletions. If a single point mutation takes place, the result may be either:

  • no change in the amino acid sequence of the protein, because 4^3 = 64 codons code for only 20 amino acids (plus stop signals), so there is a large degree of redundancy in the genetic code
  • a change in the amino acid coded for, with results dependent on the importance of that AA in the protein, e.g. if the AA is primarily a scaffolding element, there may be little loss of function, while if it is at a binding site, complete loss of function can occur
  • if the mutation results in the mutated triplet coding for a stop codon, a truncated protein will result with (usually) complete loss of function.
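The three outcomes listed above can be sketched in Python. The standard genetic code is built here from its conventional compact ordering (bases in the order T, C, A, G, with "*" marking a stop codon); the function name classify_point_mutation and the example codons are chosen for illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-base substitution within one codon (position is 0, 1 or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        return "silent"      # redundancy: same amino acid
    if after == "*":
        return "nonsense"    # new stop codon, truncated protein
    return "missense"        # different amino acid


print(len(CODON_TABLE))                        # 4^3 = 64 codons
print(classify_point_mutation("GAA", 2, "G"))  # GAA -> GAG, both Glu
print(classify_point_mutation("GAG", 1, "T"))  # GAG -> GTG, Glu to Val
print(classify_point_mutation("TAT", 2, "A"))  # TAT -> TAA, Tyr to stop
```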

Single base insertions or deletions lead to frameshift mutations, which alter every codon after the mutation. Usually, a stop codon will occur early and the protein will be truncated, but even if it does not, the altered protein will virtually always be without function.

Insertions or deletions of anything other than multiples of three bases will lead to frameshift mutations, which, as with single base insertions or deletions, will usually lead to loss of function. Insertions/deletions of multiples of three bases lead to the insertion or deletion of entire amino acids. The loss of even one AA can be devastating, as illustrated by cystic fibrosis: in the most common genotype, delta-F508, the loss of Phe at position 508 (out of 1480) results in failure of the CFTR protein to traffic to the epithelial membrane.
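The difference between an in-frame deletion and a frameshift can be seen by splitting a sequence into codons before and after the deletion. This is a sketch on a made-up 18-base sequence; the helper names codons and delete are assumptions for the example.

```python
def codons(dna):
    """Split a sequence into complete triplets, reading from the first base."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def delete(dna, start, length):
    """Remove `length` bases beginning at index `start`."""
    return dna[:start] + dna[start + length:]


gene = "ATGGCTTTCGGAAAATGA"            # hypothetical sequence, not a real gene

print(codons(gene))                    # original reading frame
print(codons(delete(gene, 6, 3)))      # three bases lost: one codon removed, rest intact
print(codons(delete(gene, 6, 1)))      # one base lost: every later codon changes
```

With three bases removed, only the TTC codon disappears. With one base removed, all codons downstream of the deletion are different and the original stop codon is no longer read in frame.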

Dynamic mutations occur when nucleotide repeats expand from generation to generation; once a certain threshold number of repeats is reached, disease becomes apparent, e.g. Huntington's disease. This accounts for the phenomenon of anticipation, where a phenotype becomes progressively more severe from one generation to the next.
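A repeat expansion can be detected by counting the longest uninterrupted run of a repeat unit. The sketch below assumes a CAG unit (the triplet repeated in Huntington's disease); the function names are hypothetical, and the threshold is left as a caller-supplied parameter because the toy value used here is not a clinical cut-off.

```python
import re


def longest_repeat_run(sequence, unit="CAG"):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


def exceeds_threshold(sequence, threshold, unit="CAG"):
    """True if the longest run reaches the supplied repeat threshold."""
    return longest_repeat_run(sequence, unit) >= threshold


allele = "ATG" + "CAG" * 5 + "CCG" + "CAG" * 2   # made-up sequence
print(longest_repeat_run(allele))
print(exceeds_threshold(allele, threshold=4))    # toy threshold for illustration only
```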

Most mutations are associated with loss of function of the protein but, rarely, there can be a gain of function, e.g. a mutation in a growth factor receptor that leaves it constitutively activated.

Mutations can also occur in introns, and if the change is in a regulatory element such as a repressor-binding site or promoter, this can cause inappropriate up- or down-regulation of gene expression. Mutations in splice sites prevent correct exon splicing and lead to the inclusion (or exclusion) of exons, as well as premature stop codons and frameshift problems.

Connection between gene structure and function

Genes consist of deoxyribonucleic acid (DNA), which is a long polymer of nucleotides. Exon sequence is highly conserved between individuals, while intron sequence is not. The great majority of the genome (well over 90%) is non-coding sequence, including introns.

DNA consists of two long polymers made of nucleotides. The backbones are made of sugars and phosphate groups joined by ester bonds. Nucleosides (a nucleobase and a ribose/deoxyribose sugar) are the unphosphorylated form (without a phosphate group on the sugar's primary alcohol group). Amino acid metabolism produces nucleosides and they can be generated de novo in the liver. There are four nucleotides in DNA, distinguished by their bases: the purines adenine and guanine, and the pyrimidines thymine and cytosine. Base pairing occurs as adenine-thymine (two hydrogen bonds) and guanine-cytosine (three hydrogen bonds).
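The pairing rules can be written as two small lookup tables. This is a sketch only; the names PAIR, H_BONDS, paired_bases and hydrogen_bonds are assumptions made for the example.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}   # bonds formed by each base with its partner


def paired_bases(strand):
    """Base that pairs with each position of `strand`, listed in the same order."""
    return "".join(PAIR[base] for base in strand)


def hydrogen_bonds(strand):
    """Total hydrogen bonds holding `strand` to its fully paired partner."""
    return sum(H_BONDS[base] for base in strand)


print(paired_bases("ATGC"))     # TACG
print(hydrogen_bonds("ATGC"))   # 2 + 2 + 3 + 3 = 10
```

Because G-C pairs contribute three bonds each, a G-C-rich stretch is held together by more hydrogen bonds than an A-T-rich stretch of the same length.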

The DNA backbone is made from alternating phosphate and sugar residues; the sugar is 2-deoxyribose, which has 5 carbons. The sugars are linked together by phosphate groups that form phosphodiester bonds between the 3rd and 5th carbon atoms of adjacent sugar rings. This gives DNA a direction: the 5' end has a terminal phosphate group and the 3' end a hydroxy group. Naming goes from the 5' to the 3' end. Directionality has consequences, because DNA polymerase can only add nucleotides to the 3' end in a DNA strand. In RNA, the sugar is ribose instead of 2-deoxyribose.

In replication, the double helix is unwound (forming a replication fork) and each strand acts as a template for a new strand (a semi-conservative process). DNA polymerase synthesises a new strand by extending the 3' end of a primer (a short initial sequence), adding nucleotides matched to the template strand one at a time via the formation of phosphodiester bonds. The energy comes from two of the three phosphates attached to each free nucleotide. When a nucleotide is being added, two of the phosphates are removed and the energy released drives formation of a phosphodiester bond that attaches the remaining phosphate to the growing chain. This explains why DNA is synthesised from the 5' to the 3' end: if it were the other way around, the energy would have to come from the 5' end of the growing strand instead of from free nucleotides. DNA polymerases are extremely accurate, making fewer than 1 mistake per 10 million added nucleotides. After synthesis, the RNA primer is removed by the exonuclease activity of DNA polymerase I, which then synthesises DNA to fill the gap. The DNA fragments are joined by DNA ligase.
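Semi-conservative replication can be sketched by treating a duplex as a pair of strands, each written 5' to 3'. The model below ignores primers, enzymes and errors, and the function names are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Partner strand, also written 5'->3' (hence the reversal)."""
    return strand.translate(COMPLEMENT)[::-1]


def replicate(duplex):
    """Return two daughter duplexes, each keeping one parental strand."""
    old_top, old_bottom = duplex
    daughter_1 = (old_top, reverse_complement(old_top))        # new strand copied from old_top
    daughter_2 = (old_bottom, reverse_complement(old_bottom))  # new strand copied from old_bottom
    return daughter_1, daughter_2


parent = ("ATGC", "GCAT")
print(replicate(parent))
```

Each daughter contains one strand from the parent and one newly built strand, which is what "semi-conservative" means.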


Transfer RNA (tRNA) is an adaptor molecule composed of RNA used to bridge the four-letter genetic code (ACGU) in messenger RNA (mRNA) with the twenty-letter code of amino acids in proteins. One end of the tRNA carries the genetic code in a three-nucleotide sequence called the anticodon. The anticodon forms three base pairs with a codon in mRNA during protein biosynthesis. The mRNA encodes a protein as a series of contiguous codons, each of which is recognized by a particular tRNA. On the other end of its three-dimensional structure, each tRNA is covalently attached to the amino acid that corresponds to the anticodon sequence. This covalent attachment to the tRNA 3’ end is catalyzed by aminoacyl-tRNA synthetases. Each type of tRNA molecule can be attached to only one type of amino acid, but, because the genetic code contains multiple codons that specify the same amino acid, tRNA molecules bearing different anticodons may also carry the same amino acid.
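Codon-anticodon pairing is antiparallel, so the anticodon written 5' to 3' is the reverse complement of the codon. The sketch below assumes strict Watson-Crick pairing and ignores wobble at the third position; the function name anticodon is chosen for the example.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon(codon):
    """Anticodon (5'->3') that pairs perfectly with an mRNA codon (5'->3')."""
    return codon.translate(RNA_COMPLEMENT)[::-1]


print(anticodon("AUG"))   # CAU, the anticodon pairing with the methionine codon
```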


Regulation of gene expression

Housekeeping genes are expressed ubiquitously in cells at constant levels, but many other genes are expressed dynamically, under control that may be internal or external, and reversible or permanent. Transcription factors are the protein products of genes that control the expression of other genes. These internal signals cause long-term or permanent effects on cell differentiation. They are regulatory proteins with a DNA-binding motif (the most common type is the helix-turn-helix) and a transcriptional activation motif. They are said to act in a "trans" manner, affecting genes far away in the genome, and act by recruiting other regulatory proteins, which bind to the gene in question and increase the accessibility of the initiation site to RNA polymerase (or expose other regulatory sites); that is, they perturb the chromatin structure of the gene they regulate.

External signals are typified by steroid hormones, and their effects are generally reversible when the steroid is withdrawn. The lipid-soluble hormone diffuses from the extracellular fluid into cells, where it binds to a receptor protein; the hormone-receptor complex then moves into the nucleus, where it binds to response elements in the DNA and either up- or down-regulates gene expression. Hormones can also act via second messenger systems to affect gene expression (e.g. cAMP-sensitive response elements). Transcription factors that increase or inhibit gene transcription are known as activators and repressors respectively; the DNA sequences they bind are called enhancers and silencers.

DNA is tightly packaged around proteins to form chromatin. Modification of the packaging proteins (as well as of the DNA) alters the packaging and is an important mechanism in the regulation of gene expression. In histones, the most abundant DNA packaging proteins, linker regions and histone tails can wrap around the DNA to reduce its accessibility. Histone acetyltransferases acetylate the histone tails, which neutralises their positive charge and weakens their association with the DNA, typically promoting gene expression in that region; histone deacetylases reverse this and typically inhibit expression.

Transcription, RNA processing and translation

A number of RNAs must be produced during the process of transcription:

  • mRNA codes for the protein itself (normal complementary pairing occurs except uracil is paired with adenine)
  • ribosomal RNA (rRNA), synthesised by RNA polymerase I, is the RNA component of the ribosome and provides a mechanism for decoding mRNA into amino acids
  • transfer RNA (tRNA), synthesised by RNA polymerase III, is an adaptor used to form the amino acid sequence as described above and
  • small nuclear RNA (snRNA), synthesised by RNA polymerase II, has various roles in the nucleus including splicing

RNA polymerase II selects the correct ribonucleotide triphosphate and catalyses the formation of the phosphodiester bond. The RNA molecule does not require a primer and is synthesised in the 5' to 3' direction. A single enzyme transcribes a complete RNA. There is no proofreading mechanism since RNA is not inherited.
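Transcription can be sketched as building the RNA complement of the template strand, with uracil paired to adenine. The example assumes the template strand is written 5' to 3', so the result is reversed to give the mRNA in its own 5' to 3' direction; the function name transcribe and the six-base sequence are made up for the example.

```python
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")   # template base -> paired RNA base


def transcribe(template_strand):
    """mRNA (5'->3') made from a DNA template strand written 5'->3'."""
    return template_strand.translate(DNA_TO_RNA)[::-1]


coding_strand = "ATGGCA"     # hypothetical sequence
template_strand = "TGCCAT"   # its reverse complement
print(transcribe(template_strand))   # AUGGCA
```

The mRNA matches the coding strand with U in place of T, which is why the coding strand is sometimes called the sense strand.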

The RNA as transcribed by polymerase II is pre-mRNA and requires processing. It is first capped by the addition of a modified G residue (7-methylguanosine) to the 5' end of the pre-mRNA, to protect the 5' terminus from degradation and to improve ribosomal recognition. A poly(A) tail is then added (polyadenylation): the consensus sequence AAUAAA signals that the pre-mRNA is to be cleaved about 20 bases downstream by an endonuclease, and poly(A) polymerase then adds the poly(A) tail, which consists of up to a few hundred adenine residues and is thought to enhance translation and improve stability. Finally, spliceosomes, consisting of snRNA and associated proteins, splice out introns to produce the final mRNA.
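The three processing steps can be sketched on a string. This is a toy model under stated assumptions: the cap is represented by the text label "m7G-", intron positions are supplied by the caller as (start, end) index pairs, and the default offset and tail length are illustrative round numbers. The function name and the example sequence are hypothetical.

```python
POLY_A_SIGNAL = "AAUAAA"


def process_pre_mrna(pre_mrna, introns, cleavage_offset=20, tail_length=200):
    """Cap, cleave and polyadenylate, then splice a pre-mRNA string."""
    cap = "m7G-"                                   # step 1: 5' cap (label only)

    signal_start = pre_mrna.find(POLY_A_SIGNAL)    # step 2: cleave downstream of the signal
    if signal_start == -1:
        raise ValueError("no polyadenylation signal found")
    cut = signal_start + len(POLY_A_SIGNAL) + cleavage_offset
    body = pre_mrna[:cut]
    tail = "A" * tail_length                       # poly(A) tail

    # step 3: remove introns, rightmost first so earlier indices stay valid
    for start, end in sorted(introns, reverse=True):
        body = body[:start] + body[end:]

    return cap + body + tail


exon_1, intron, exon_2 = "GGAUG", "GUAAGUCCAG", "CCUGAC"
pre_mrna = exon_1 + intron + exon_2 + POLY_A_SIGNAL + "C" * 30
mature = process_pre_mrna(pre_mrna, introns=[(5, 15)], tail_length=10)
print(mature)
```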

mRNA leaves the nucleus through pores in the nuclear membrane and moves to the cytoplasm, where ribosomes on the cytoplasmic face of the rough ER (for proteins to be secreted, targeted to organelles or inserted into the membrane) or free in the cytoplasm (for proteins that remain in the cytoplasm) translate it into protein. Ribosomes consist of a large and a small subunit, each made up of a large number of proteins and one or more RNA molecules, with the key reactive sites being almost entirely RNA.

The large ribosome complex has E, P and A sites. The initiator Met-tRNA binds to the P site and the next tRNA (as specified by the mRNA) binds to the A site. A peptide bond is formed between the two amino acids by peptidyl transferase. The ribosome then translocates along the mRNA towards the 3' end by one codon, so that the tRNAs previously bound to the P and A sites now occupy the E and P sites (this process is driven by GTP and elongation factors). The newly vacated A site can accept a new tRNA, and the sequence is repeated until a stop codon is reached in the mRNA. The stop codon is recognised by release factor eRF1, which causes release of the newly synthesised polypeptide and dissociation of the ribosome.
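The elongation cycle amounts to reading the mRNA one codon at a time from the start codon until a stop codon is met. The sketch below uses the standard genetic code in its conventional compact ordering (bases U, C, A, G; "*" marks a stop) and starts at the first AUG it finds, which is a simplification of real initiation; the function name translate and the example mRNA are assumptions.

```python
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # advance one codon per cycle
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                  # stop codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGGCUUUCUAAGG"))   # AUG GCU UUC UAA -> Met-Ala-Phe
```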

For the majority of proteins to be secreted, which are synthesised in the RER, there is a signal sequence that allows the protein to enter the RER lumen as it is synthesised. The signal sequence is then cleaved. For membrane-bound proteins, there is believed to be an equivalent signal sequence allowing them to be inserted into the plasma membrane.

Organisation of the genome

The mammalian genome has single copy sequences, which code for specific functions (about half of the genome). Multiple copy genes usually encode products that must be expressed rapidly, such as histones and ribosomal RNA. Highly repeated sequences are found throughout the non-coding regions and make up a third of the total genome. Their functions are not understood but they have been implicated in disease. They can be either short interspersed nuclear elements (SINEs, about 300 bp long) or long interspersed nuclear elements (LINEs, about 6000 bp long).

Characterisation of genes at a molecular level

Cloning refers to methods used to assemble recombinant DNA molecules and to direct their replication within a living host. Generally, DNA to be cloned is obtained from an organism of interest and treated with enzymes to cut it into smaller pieces, which are ultimately connected using cloning vectors to produce the full-length sequence of interest. This is inserted into a host such as E. coli, where it is replicated as the bacteria multiply exponentially, producing a population of bacteria carrying the cloned DNA.

Restriction endonucleases cleave DNA at specific nucleotide sequences. Two incisions are made, one through each sugar-phosphate backbone of the DNA double helix. A cloning vector is a piece of DNA into which a foreign DNA fragment can be inserted. The insertion is carried out by treating both the fragment and the vector with restriction enzymes that create the same overhang, then ligating the segments together. Genetically engineered plasmids (DNA molecules that replicate separately from the chromosomal DNA in bacteria) and bacteriophages are the most commonly used vectors.
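A restriction digest can be sketched as cutting a string at every occurrence of a recognition site. The defaults below correspond to EcoRI, which recognises GAATTC and cuts after the first G; the model follows one strand of a linear molecule only, and the function name digest and the example sequence are assumptions.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut linear DNA at each recognition site, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments


print(digest("AAGAATTCGGGAATTCTT"))   # ['AAG', 'AATTCGGG', 'AATTCTT']
```

Every fragment produced by an internal cut begins with AATT, the overhang sequence that lets a fragment and a vector cut with the same enzyme anneal before ligation.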

PCR is used to amplify a DNA sequence, using sample DNA acting as a template, Taq DNA polymerase which catalyses DNA replication and can survive the high temperature of the reaction, nucleotides as building blocks and primers which bind to the DNA template and allow its elongation to create new DNA copies. The PCR steps in outline:

  • 95 degrees C allows separation of the DNA strands, giving access to the primer sequences
  • 55 degrees C subsequently permits the annealing of the primers to the sequences
  • 72 degrees C allows the polymerase to bind and elongate the DNA at the primer sites

Repeated cycling through these steps allows the exponential amplification of a specific DNA sequence, with the primers conferring specificity and permitting the detection of polymorphisms or the presence/absence of a gene. If the primer is complementary to the target sequence of DNA, there will be amplification detectable by electrophoresis. If the primer is not complementary (due to the target sequence being absent), amplification will not occur and no electrophoresis band will be seen.
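Both points, exponential growth and primer specificity, can be sketched in a few lines. The model is idealised: perfect doubling per cycle unless an efficiency is supplied, exact primer matching, and a single-stranded view of the template. All names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    return strand.translate(COMPLEMENT)[::-1]


def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Copy number after `cycles`; efficiency 1.0 means perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles


def amplicon(template, forward_primer, reverse_primer):
    """Region bounded by the two primers, or None if either fails to anneal."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand where its reverse complement occurs.
    target = reverse_complement(reverse_primer)
    end = template.find(target, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(target)]


template = "TT" + "ACGTGG" + "ATCCATCA" + "CCCTTT" + "AA"
print(copies_after(30))                          # about 1.07e9 copies from one molecule
print(amplicon(template, "ACGTGG", "AAAGGG"))    # product present: a band would be seen
print(amplicon(template, "GGGGGGGG", "AAAGGG"))  # None: no amplification, no band
```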

Electrophoresis allows the separation of DNA molecules based on their size. Negatively charged DNA strands are drawn through a gel towards the positive electrode, with smaller fragments migrating faster than larger ones.

Southern blotting is a technique for detecting specific genes or polymorphisms. Genomic DNA is isolated and fragmented using restriction enzymes, separated by gel electrophoresis and then transferred to a membrane by blotting, at which point a hybridisation probe can be used to detect the presence or absence of a gene.

DNA sequencing is done using dideoxynucleotides in low concentrations within a PCR reaction. These molecules can be added to the growing chain but lack the 3' hydroxyl group needed to continue its growth. When added at low concentrations, they terminate some of the nucleotide chains each time the specific nucleotide is added. Four reactions are run for each sequence. In each case, a PCR reaction is set up and a single dideoxynucleotide (e.g. ddATP) is added, along with only one primer so that only one strand is copied. The completed reactions can be run through a gel so that the resulting bands correspond to the presence of the chain-terminating dideoxynucleotide at that position in the sequence. This process has been sped up by the use of fluorescent labels, which allow automated computerised reading of the results.
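The logic of reading a sequence from four chain-termination reactions can be sketched as follows. The model assumes every possible terminated length is produced and resolved cleanly on the gel; the function names are illustrative.

```python
def sanger_lanes(new_strand):
    """For each ddNTP reaction, the lengths of the chains it terminates."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)   # a chain of this length ends in this base
    return lanes


def read_gel(lanes):
    """Read the sequence by taking bands from shortest to longest across all lanes."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = sanger_lanes("GATTACA")
print(lanes)             # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))   # GATTACA
```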


Chromosomal abnormalities occur due to problems at meiosis, are usually extremely severe and account for half of early miscarriages. The normal chromosome has a centromere, from which two arms project (long and short). Chromosomes condense during cell division and consist of two identical chromatids joined at the centromere (where the spindle microtubules attach). In metacentric chromosomes, the centromere is central; in submetacentric chromosomes, it is partially towards one end; and in acrocentric chromosomes, it is close to one end and the short arm contains only junk DNA and copies of ribosomal genes. Locations on the short arm are labelled 'p' (petit), counting outwards from the centromere, and locations on the long arm are labelled 'q'. There is further subdivision into bands, e.g. p11.1. Chromosomes exist as a single chromatid for most of the cell cycle, with the second copy being produced during the S phase of the cell cycle and separation occurring during mitosis or meiosis II.

Polyploidy involves additional complete sets of chromosomes. Triploid (3n) embryos result from the simultaneous fusion of two sperm with an egg. Triploidy is denoted e.g. 69,XXX, and embryos rarely survive to birth (polyploidy contributes to 15% of miscarriages). Abnormal diploid embryos that form from the chromosome sets of only one parent do not survive but can lead to hydatidiform moles.

Single chromosome abnormalities of the autosomes occur when the wrong number of copies of a specific chromosome is inherited as a result of incorrect separation during meiosis. Monosomy is invariably lethal in utero. Trisomies, where an extra copy of a chromosome is inherited, can survive to birth, but only one type (trisomy 21, Down syndrome) permits survival to adulthood. Trisomy 13 (Patau syndrome) and trisomy 18 (Edwards syndrome) lead to death in the first few weeks of life. Chromosomes 13, 18 and 21 have the fewest genes, hence the higher viability of these trisomies. Down syndrome is associated with heart defects, facial flatness, a gap between the first two toes, learning disability and infertility. Its incidence increases with maternal age.

Numerical abnormalities of the sex chromosomes are compatible with life because a single X chromosome is sufficient in men and women do not have a Y. Turner's syndrome, monosomy of chromosome X in women (XO), is common at conception and, although usually lethal, is seen in 1/3000 live births. Patients are infertile, do not go through puberty spontaneously and are abnormally short, often with coarctation of the aorta. If detected early, Turner's can be treated with hormone therapy and almost entirely corrected (except infertility).
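The numerical abnormalities described above can be told apart from the chromosome count that opens a karyotype string such as 69,XXX. The sketch below reads only that count, so it cannot detect structural abnormalities; the function name classify_karyotype and its category labels are assumptions for the example.

```python
HAPLOID_NUMBER = 23   # one human chromosome set


def classify_karyotype(karyotype):
    """Classify by total chromosome count, the first field of the karyotype."""
    count = int(karyotype.split(",")[0])
    sets, remainder = divmod(count, HAPLOID_NUMBER)
    if remainder == 0:
        names = {2: "diploid", 3: "triploid", 4: "tetraploid"}
        return names.get(sets, f"{sets} sets")
    if count == 2 * HAPLOID_NUMBER + 1:
        return "trisomy"      # one extra chromosome
    if count == 2 * HAPLOID_NUMBER - 1:
        return "monosomy"     # one chromosome missing
    return "other aneuploidy"


for karyotype in ("46,XY", "69,XXX", "47,XX,+21", "45,X"):
    print(karyotype, classify_karyotype(karyotype))
```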

Structural abnormalities are divided into:

  • translocations, the exchange of segments between non-homologous chromosomes, can be asymptomatic if the DNA break point does not disrupt a gene and if the translocation is balanced (e.g. Robertsonian translocations, where the long arms of acrocentric chromosomes fuse)
  • deletions, the loss of part of a chromosome (so partial monosomy), are incompatible with life if large and can lead to disease if small. Cri-du-chat syndrome is a rare (1/50000 births) deletion of the short arm of chromosome 5 and is characterised by a cat-like cry, short stature, facial abnormalities, breathing problems and congenital heart defects as well as some learning disabilities. Smaller deletions include Prader-Willi syndrome, a deletion on the paternally derived chromosome 15, leading to short stature, obesity and learning disability; and Angelman syndrome, a deletion on the maternally derived chromosome 15, leading to apraxic gait, jerky movements, epilepsy, inability to speak and severe learning disabilities
  • inversions are the reversals of a section of a chromosome and are balanced, leading to normal phenotypes, unless the break point disrupts a gene; they can be pericentric, involving the centromere, or paracentric, where only one arm is affected

Ring chromosomes occur when the arms break and join, losing the distal fragments, leading to serious phenotypes; they are unstable during mitosis, with some cells losing the ring and being monosomic.

It is essential that one chromosome in each pair is from the mother and the other from the father, because paternal and maternal chromosomes have different roles at different points in development and may be preferentially expressed in different cell types.

Genetics of disease

Single gene disorders

Mitochondrial inheritance

Genes in populations

The human genome, mapping and diagnosis

DNA polymorphisms

Genetic linkage

Mutation and human disease
