Excitement and opportunities in systems biology

Not so many years ago, in February 2001, two groups in the US published the first drafts of the human genome sequence in the journals Nature and Science. The sequence is that of approximately three billion "letters" that run through strands of DNA, the primary component of our chromosomes, where our genes reside (see note [1] at the end of the article for the meaning of acronyms). It has been known for a long time that a DNA letter can only be one of four – namely, A, T, C, or G – and it never ceases to amaze me that the linear sequence of these letters contains virtually all the information required to create a complex organism like us. Recent estimates put the number of human genes at about 30,000, which roughly corresponds to how many proteins our cells can make. The translation of the sequence of DNA letters to proteins requires the so-called "genetic code" that instructs cells which amino acids (the components of proteins) should be assembled for a given sequence of letters. The elucidation of the genetic code and the understanding of the molecular machinery for expressing genes as proteins are major achievements of molecular biology. The proteins are the workhorses of cells: they carry out enzymatic functions such as catalyzing metabolic reactions, serve structural functions such as those of protein filaments and other cellular fibers, and orchestrate physiological processes such as cell division and differentiation. So this means that the human genome sequence information tells us everything we need to know about human biology, right? Wrong.

To be fair, it is perfectly understandable why we should be excited by the availability of a human genome sequence. If we know where the genes are and what their sequences are, we can compare a normal gene with a mutated one (having at least one letter changed in the sequence), analyze the changes in the expressed protein, and then predict the consequences of this mutation for human health. A classic example is a mutation in the hemoglobin gene – namely, a specific GAG sequence in the gene is changed to GTG – which leads to the disease called sickle-cell anemia. So it is easy to see why there is so much expectation regarding the utility of the genome sequence in developing cures for genetic diseases.
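The comparison of a normal gene with a mutated one can be sketched in a few lines of Python. This is only an illustration, not a bioinformatics tool: the codon table is deliberately partial, the function names are made up for this example, and the short DNA fragment is an assumed stand-in for the beginning of a hemoglobin gene. The one thing taken from the text is the single-letter change of GAG to GTG, which swaps the amino acid glutamic acid (Glu) for valine (Val).

```python
# Partial table of coding-strand DNA codons; only the entries needed here.
CODON_TABLE = {
    "ATG": "Met", "GTG": "Val", "CAT": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}


def translate(dna):
    """Read the DNA string three letters at a time and look up each codon."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


def point_mutation(dna, index, new_letter):
    """Return a copy of the sequence with one letter replaced (0-based index)."""
    return dna[:index] + new_letter + dna[index + 1:]


def compare(normal, mutant):
    """List (position, normal amino acid, mutant amino acid) where the proteins differ."""
    pairs = zip(translate(normal), translate(mutant))
    return [(pos, a, b) for pos, (a, b) in enumerate(pairs, start=1) if a != b]


# Assumed illustrative fragment; the seventh codon is GAG.
NORMAL = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Change the middle letter of the seventh codon: GAG becomes GTG.
MUTANT = point_mutation(NORMAL, 19, "T")

print(translate(NORMAL))
print(translate(MUTANT))
print(compare(NORMAL, MUTANT))
```

A single changed letter out of 27 alters exactly one amino acid in the resulting chain, which is the kind of difference one would then try to relate to the behavior of the protein.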

The discipline called Bioinformatics is concerned with the analysis of genome sequences, including discovering genes in the three-billion-letter sequence of human DNA – work that is far from complete. But even when we get to know all of our genes, a far more daunting task will still be ahead of us. Genes can be turned "on" or "off" – meaning that they may or may not be producing proteins. It is the regulation of gene expression that is turning out to be horrendously complex. The making of proteins from DNA actually occurs via another class of molecules called RNA (see note [1]). A gene sequence on the DNA is "transcribed" to the corresponding RNA molecule, and this RNA is "translated" to the protein. The complexity stems from the fact that information flow is not linear. Many proteins can bind to DNA and affect RNA transcription. Many proteins can also bind to RNA and influence the protein translation process. RNA molecules can also affect both transcriptional and translational processes. And as if the picture were not complex enough, proteins interact with other proteins, with metabolites, or with other molecules when they carry out their cellular functions.

How are we to cope with such complexity? First, we try to identify and classify all the players in the system. We are in the era of Omics revolutions brought about by high-throughput data-acquisition technologies. Genomics is churning out genome sequences of hundreds of organisms; transcriptomics is developing microarray technologies that monitor RNA levels (the transcriptome); and proteomics aims to identify and characterize the set of cellular proteins (the proteome). There is also metabolomics, which aims to measure all the metabolites produced in a cell, and interactomics, which assays protein-protein and other molecular interactions (yes, scientists do get carried away!). Essentially these Omics will provide us with a comprehensive parts list of our cells. Second, it will be the integration of these parts and the analysis of the dynamics of their interactions that ultimately will bring about a comprehensive understanding of biological systems – this is where Systems Biology enters the picture.

The basic premise of Systems Biology is that the whole is greater than the sum of its parts. In other words, there are properties attributed to the system as a whole which cannot be predicted from studying the parts in isolation because these properties emerge from the interactions of the parts. An unmistakable attribute of Systems Biology is that it is interdisciplinary. The complexity of biological systems demands collaboration among biologists, chemists, physicists, mathematicians, computer scientists, etc.; this explains the recent proliferation of institutes of systems biology in North America, Europe, and Japan.

I was trained as a physical chemist in graduate school, and taught university chemistry courses for many years. After getting tenure in the late 1990s, and with all the buzz about genomics and transcriptomics at that time, I saw the opportunity to refocus my research and join in the fun. My re-education in molecular cell biology was wonderfully quick and painless – thanks to the excellent Internet infrastructure at my university. I cannot emphasize this enough: the Internet is a great global equalizer in scientific research. With a good connection, anybody in the world can access a universe of information, instantly. There are many free or open-access online scientific journals (see note [2]) and downloadable lecture notes and videos from eminent academic institutions such as the Massachusetts Institute of Technology in the US (for the open courseware, see note [3]). There are even popular biology textbooks that one can access online (see note [4]). To tell you the truth, I hardly go to our library now because Google searching (see note [5]) is often sufficient! However, it is always good practice to double-check information obtained from the Web.

My pet research project is in an area called cancer systems biology. Cancer is a complex disease involving the interplay among many cellular processes, including the cell division cycle, apoptosis (programmed cell death), angiogenesis (formation of blood vessels required to feed tumors), senescence (aging of cells), and others. Years of research in the cancer community, and more recently the Omics technologies mentioned above, are providing unprecedented details of relevant molecular pathways. For those interested in cellular pathways in general, visit Pathguide (see note [6]) to access over 200 databases of biological pathways. I think the analysis of these pathways is where a lot of research and business opportunities lie, not just in cancer research but also in understanding other human diseases. It is no surprise that big pharmaceutical companies are keenly interested in reliable pathways databases and are actively developing computer software for analyzing these pathways. Sprouting from academia in the US are biotech companies that are established very cheaply by a few smart students with modest computing power (visit, for example, the website given in note [7]). I believe that smart students in the Philippines can do the same. At the moment, a key problem in using online pathways databases is how to integrate them, cope with the uncertainties in the data, and extract network models for diseases of interest. At a recent international bioinformatics conference held in Brazil (see note [8]), I presented my ideas on this issue in a set of tutorial lectures which can be downloaded from the website given in note [9]. There is an urgent need for network analysis software that could identify drug targets. I think multi-drug therapeutic strategies will be the answer for cancer, and for many other diseases.
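To give a flavor of what integrating pathways databases and coping with uncertain data might look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two "databases" are toy lists with placeholder molecule names, the confidence score is simply the fraction of databases that agree on an interaction, and counting connections is only a crude first look at which molecules are central. It is not the approach presented in the tutorial lectures, and it is not how any real pathway resource works.

```python
from collections import defaultdict

# Toy edge lists standing in for two pathway databases (hypothetical data).
# Each entry is (source, target, effect).
DB_A = [("A", "B", "activates"), ("B", "C", "inhibits"), ("C", "D", "activates")]
DB_B = [("A", "B", "activates"), ("B", "C", "activates"), ("D", "B", "inhibits")]


def merge(*databases):
    """Count, for each (source, target) pair, how many databases report each effect."""
    votes = defaultdict(lambda: defaultdict(int))
    for db in databases:
        for source, target, effect in set(db):  # set() drops duplicates within one database
            votes[(source, target)][effect] += 1
    return votes


def consensus(votes, n_databases):
    """Keep edges with a single reported effect; set aside edges the databases dispute."""
    network, conflicts = {}, []
    for edge, effects in votes.items():
        if len(effects) > 1:
            conflicts.append(edge)
        else:
            effect, count = next(iter(effects.items()))
            network[edge] = (effect, count / n_databases)
    return network, conflicts


def degree(network):
    """Number of interactions each molecule takes part in."""
    counts = defaultdict(int)
    for source, target in network:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


network, conflicts = consensus(merge(DB_A, DB_B), n_databases=2)
print(network)
print(conflicts)
print(degree(network))
```

In this toy case the two sources agree that A activates B, disagree about the effect of B on C, and each contribute one interaction the other lacks. Deciding what to do with the disputed and the weakly supported interactions is exactly the hard part of the problem.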

When I was still an undergraduate at the University of the Philippines at Los Baños, members of the UPLB Chemical Society gathered regularly on Friday nights to play bidding bridge, the Society’s unofficial card game at that time. At some of those gatherings, I recall asking the question, "How can inanimate molecules organize and interact to create life?" Although I may not have been conscious of it, the longing for an answer steered the education and research I have pursued for almost three decades. My naivete was exposed as the years went by and I became more skeptical that I would ever hear an answer in my lifetime – until recently, when my optimism began to build up into excitement because I feel that the systems biology approach may ultimately give us the opportunity to formulate a good answer.
* * *
Notes:

[1] DNA = deoxyribonucleic acid; "letters" A, T, C, G correspond to the first letter of the following chemical bases: adenine, thymine, cytosine, guanine. RNA = ribonucleic acid.

[2] www.plos.org and www.biomedcentral.com

[3] ocw.mit.edu

[4] www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books

[5] www.google.com

[6] www.pathguide.org

[7] www.gnsbiotech.com

[8] www.ismb2006.cbi.cnptia.embrapa.br/tutorials.html

[9] www.mbi.ohio-state.edu/publications/pub2006.html
* * *
Dr. Baltazar Aguda is currently a visiting scholar at the Mathematical Biosciences Institute, Ohio State University, USA where he is writing a book on mathematical models of cell-fate regulation. He obtained his PhD in Chemistry (chemical physics program) in 1986 from the University of Alberta (Canada) and his BSc Agricultural Chemistry degree from the University of the Philippines at Los Baños in 1978. He can be e-mailed at bdaguda@gmail.com.
