AgBioForum
Volume 17 // Number 1 // Article 6
The Existence and the Socio-Economic Implications of Genetic Networks: A Meta-Analysis
University of Rome Tor Vergata
Genetic networks are a recent paradigm for the inheritance of traits from parents. They can be defined as relational structures composed of genes, some of which carry genetic information, and linkages with structural or regulating properties. Previous studies have found that biological networks are characterized by scale-free or power-law distributions, with the development of highly influential genes with many (but sparse) connections to other genes, and with key regulatory roles in phenotypical expression. The highly influential genes are “hotspots” of regulatory functions and represent the top of the network. Seven basic typologies of genetic networks have been discovered; in these networks, genes play different roles. In this article, we address the issue of the characteristics of genetic networks and their social and economic implications by reviewing recent literature on the subject and developing a meta-analysis of a sample of recent studies. We explore the implications of a model where the desirable traits depend not only on the properties of the individual genes, but also on their connections and the architecture of the network. The model suggests that under reasonable hypotheses and interpretations of past research, several important consequences follow for the interpretation of the roles of genes in everyday life, their interaction with the environment, and their socio-economic role. A major implication for agricultural research on biotechnology is that a strategy aimed at selecting varieties on the basis of the topological properties of the underlying genetic network, and their regulatory role, may be more successful than one focusing only on the direct association between specific genes and desirable traits.
Key words: network analysis, co-expression networks, genetic networks.
Introduction

In the recent past, the scientific community experienced a renewed interest in the study of complex networks. Numerous scientists from different fields were interested in studying the topological features and the interactions among the components of complex networks. Intense research activity was directed towards biological networks, where network theory finds its natural application and genes are considered nodes with links as the interconnections among them.

These studies have produced remarkable progress, not only in understanding the topological and chemical structures of the genes (which, today, can be described and determined precisely), but also in improving agricultural crops. Thanks to the deep genetic knowledge acquired, genes can be modified and recombined into the cells of living organisms. For this reason, scientists started to use the recombinant DNA (rDNA) technique to improve crop productivity or to make crops more resistant to stress, diseases, and chemical treatments.

In order to unleash the potential of cultivated crops, agriculturists can also use the rDNA technique, which is improving crop breeding, in an indirect way. In this case, scientists do not introduce novel genes but instead rely on the development of marker technologies. These technologies allow scientists to identify and map the parts of a chromosome that contain genes relevant to traits of agronomic interest.

These studies have been developed for most crop species and have led to important discoveries. In studying the individual loci that control a quantitatively inherited trait, scientists thought at first that complex traits were determined by a large number of genes, but they more recently noticed that the effects of these loci are very uneven and that, in many cases, only a few loci affect complex traits. Following this discovery, plant biologists performed different experimental studies focusing on the genomes of different crops.

In order to define gene targeting for functional identification and for investigations of regulatory mechanisms, plant biologists constructed models where traits are the result of the cooperative expression (co-expression) of genes, organized according to the topology of networks. A co-expression network is constructed by determining the tendency of m transcripts to exhibit similar expression patterns across a set of n microarrays (Ficklin & Feltus, 2011). The information contained in the co-expression network is the key to understanding the biological systems at a molecular level as a series of relationships among co-expression modules in addition to gene-to-gene relationships (Obayashi & Kinoshita, 2010). On the other hand, to obtain reliable estimates of co-expressed gene relationships, biologists need a large amount of data from DNA microarray experiments. Today, these data are available in large public data repository sources (containing information about a huge variety of crops) that are widely used.
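
As a minimal sketch of the construction described above, the following Python snippet computes the Pearson correlation between every pair of transcripts across a set of arrays and links two transcripts when the correlation reaches a cutoff. The gene names, the expression values, and the 0.9 cutoff are illustrative assumptions, not data or parameters taken from the studies cited; published pipelines typically add normalization and significance testing.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, cutoff=0.9):
    """Return the gene pairs whose profiles correlate at or above the cutoff.

    `expression` maps a gene name to its expression values across n arrays.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= cutoff:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values for m = 4 transcripts across n = 5 arrays.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # proportional to geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # mirror image of geneA
    "geneD": [1.0, 3.0, 1.0, 3.0, 1.0],   # unrelated oscillation
}

edges = coexpression_edges(expression)
edge_pairs = [(g1, g2) for g1, g2, _ in edges]
```

In this toy input only geneA and geneB end up linked; geneC is perfectly anti-correlated with geneA and is therefore not treated as co-expressed under the signed cutoff assumed here.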

Genetic networks have important implications for further research strategies. They also imply socio-economic consequences, in that both conceptual representations of evolution and genetic causality are likely to be substantially changed by a new structuralist paradigm, as an innovative basis for policies and practices both in the private and public domain. More specifically, the chain of causation suggested by the emerging network paradigm implies that gene expression is a complex phenomenon, where the structure of the genes and their relations may dominate the individual nature of the genes involved. At the same time, and perhaps paradoxically, selected individual genes may be more important because of their roles as hubs of complex networks, where their position as crossroads of connections and engines of co-expression may be key to understanding how traits are inherited and expressed in practice.

While the traditional view of population genetics sees evolution as a process involving the change in frequency of distinct gene variants (alleles) differing in fitness over time, the molecular basis of phenotypic variance seems to depend very little on gene and protein sequences, especially for the characters that appear to confer adaptive benefit to the bearers. Moreover, comparison of homologous DNA sequences of various species shows that gene expressions seem to drive evolution more forcefully than the genes themselves. Gene expression in turn may depend in part on the fact that genes are linked to each other in functional networks whose products exhibit interrelated expression profiles. This has led some authors to propose a new theory of the “selfish gene network” (Boldogkoi, 2004), based on four main propositions: (1) Instead of individual genes, gene networks (GNs) are responsible for the determination of traits and behaviors. (2) The primary source of microevolution is the intraspecific polymorphism in GNs and not the allelic variation in either the coding or the regulatory sequences of individual genes. (3) GN polymorphism is generated by the variation in the regulatory regions of the component genes and not by the variance in their coding sequences. (4) Evolution proceeds through continuous restructuring of the composition of GNs rather than fixing of specific alleles or GN variants.

Genetic networks also relate importantly to climate change and green growth. For example, a growing literature explores the way landscape patterns are related to the spread and establishment of plant pathogens and their genetic variations across their geographical distribution. The dispersal of organisms on a geographic scale (Garrett, Dendy, Frank, Rouse, & Travers, 2006; Stukenbrock, Banke, & McDonald, 2006), resilience, and adaptation to climate change may be the consequence of how genetic dispersion depends on the underlying network structure and on the presence of dominant genes in a scale-free structure. This relates in turn to biodiversity, plant disease epidemiology, and human and animal pathologies (Burdon, Thrall, & Ericson, 2006; Gilligan, Brenner, & Venkatesh, 2002; Jaeger et al., 2004). Landscape pathology and ecology also appear to be related to intrinsic network properties of gene flow (Geils, 1992; Holdenrieder, Stieber, & Pawel, 2004; Lundquist & Hamelin, 2005; Pautasso, Holdenrieder, & Stenlid, 2005).

A second, most important, field where research on GNs may have significant socioeconomic consequences is the management of gene conservation and crop genetic diversity. Traditional practices, consisting of mere crop or character diversification, may fail because they are approximate and largely based on expressed characters. On the other hand, information on the underlying genetic structure—such as the role played by key nodal genes in the co-expression networks—may be crucial to improve the management of crop diversity both off and on farm.

Finally, economic theory and research may itself be influenced by the development of the new level of inquiry into network connections and dominant genes due to its attention to evolutionary patterns as underlying determinants of socioeconomic systems. Evolutionary economics and some of its more daring hypotheses, in particular, may be linked to the interpretive model provided by gene networks and co-expression clusters. For example, in an important field known as genetic programming (originally developed by Holland, 1975, and further extended by Koza, 1992), which is based on biological analogy, artificial adaptive agents (genes or genetic algorithms) with the capacity for autonomous discovery (autonomous programming ability) are introduced in economic modeling. The study of the behavior of a network of such agents in a market-like environment suggests a close resemblance with the ways in which genetic network theories are taking shape (see, for example, Andrews & Praeger, 1994; Chen & Yeh, 1996, 1997, 1999, 2000a, 2000b; Lensberg, 1999).

In this study we focus our attention on three crops: maize, rice, and Arabidopsis thaliana to perform a meta-analysis of recent studies about co-expression GNs. Data from these studies provide detailed information about GN constituents, such as the number of genes, the number of edges, the number of co-expressed genes, the number of clusters, and the number of modules, as well as on the related co-expression traits. These data allowed us to analyze the relationship that links the nodes and the edges of the GNs constructed by the studies examined, as well as the other network variables and the co-expression traits.

Our objective in the study is to uncover evidence of self-organization that may be relevant for socioeconomic analysis and future research strategies from the three points of view illustrated above: the intrinsic network structure of gene expression and its importance for management of the environment, climate change adaptation, and green growth; the implications for gene conservation and crop diversification; and the potential relevance of the new paradigm for evolutionary economics.

In line with the current literature on complex networks (Menezes & Barabasi, 2008), we find that a measure of traffic dispersion (in our case the number of linkages) is linked to a measure of information flow (that we assume proportional to the number of nodes) by a simple, scale-free function, whereby the number of edges tends to increase percentage-wise more than proportionally with respect to the number of nodes, with an important negative effect on such a positive relationship being exercised by co-expression.
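
One way to read the relationship described above is as a power law between edges E and nodes N, E = a * N^b with b > 1, which becomes linear after taking logarithms. The sketch below fits that log-log line by ordinary least squares. The functional form with a single regressor, the function name, and the synthetic data are assumptions made for illustration; the co-expression covariate mentioned in the text is deliberately left out, and none of the numbers come from the studies analyzed.

```python
from math import exp, log


def fit_power_law(nodes, edges):
    """Least-squares fit of log(E) = log(a) + b * log(N); returns (a, b)."""
    xs = [log(n) for n in nodes]
    ys = [log(e) for e in edges]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = exp(mean_y - b * mean_x)
    return a, b


# Synthetic networks generated from E = 2 * N**1.5 (not data from the studies).
nodes = [100, 400, 900, 1600, 2500]
edges = [2 * n ** 1.5 for n in nodes]

a, b = fit_power_law(nodes, edges)
# b > 1 means a 1% increase in nodes goes with a more than 1% increase in edges.
```

Because the synthetic data follow the assumed law exactly, the fit recovers the exponent used to generate them; with real network counts the estimate would carry sampling error.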

In what follows, we introduce the recombinant DNA technique and its application to biotechnology and then present network theories. We then present the studies and the model, followed by our conclusions.

From Multiplicity of Characters to Co-expression

Over the past 100 years, the knowledge of genetic engineers has grown exponentially. Since the 1970s, the classical genetic approaches used for improving organisms were replaced by a more advanced and efficient approach, known as recombinant DNA (rDNA) technique. This new approach has allowed scientists to carry out procedures using genes and DNA that are extremely innovative and powerful. As a consequence, remarkable progress has been made, especially in understanding genes’ chemical structures, divisions, and risk assessment.

The most profound consequence of rDNA technology is our increased knowledge of fundamental life processes. Today, genes are no longer black boxes, but can be described in precise chemical terms, manipulated, and reintroduced into the cells of living organisms, with enormous potential for further innovation and progress. Intensive research activity has stimulated the increasing production of hormones, vaccines, therapeutic agents, and diagnostic tools. In turn, this has given great impetus to the biotechnology industry, with the creation of a whole range of new products through the development of a close relationship between universities and industry. Recently, the public debate on rDNA techniques has enhanced public interest in molecular engineering research, even though these discussions are often exacerbated by interests in receiving grants and in promoting commercialization.

While in the 1970s the principal concern was the effects of rDNA on public health and safety, today the focus is on ethical, legal, and environmental issues and the use of genetically modified plants and animals. A particular concern is that gene therapy, which is intended to modify somatic cells, could also alter human germ-line genes.

The scientific community is concerned by the evidence that this technique could modify the germ-line genes with uncontrolled consequences. The environmental and ethical neutrality of modified genes is questionable, and there is no absolute assurance of complete safety. Furthermore, the consequences of the creation of new germ lines are still unknown, so the widespread concern, even in the scientific community, over the uncertainties surrounding the replication of these new forms of life, which could unleash energies that cannot be controlled, appears justified.

rDNA consists of DNA sequences resulting from laboratory methods that bring together genetic material from multiple sources. In this way, it is possible to create sequences that would not have otherwise been found in nature and that constitute a new unit, representing something different and more than the sum of the single original elements.

One of the most important applications of rDNA technologies is the improvement of agricultural crops. Research on maps of genetic linkages has made it possible to study the chromosomal locations of genes for improving crops and other complex traits playing an important role in agriculture (Tanksley & McCouch, 1997).

Scientists succeeded in isolating genes responsible for main adaptive and improvement traits and were able to determine their chemical structure, together with their functions. This knowledge was then used to develop the potential of our wild and cultivated germplasm resources for improving agricultural crops. This means that scientists use DNA to investigate biological information and encoded genetic instructions for the development and the functioning of all known living organisms. Figure 1 illustrates the structure of part of the DNA.

Figure 1. A section of DNA.
Source: Wikipedia page (http://en.wikipedia.org/wiki/DNA)

Because of growing population pressure and lagging production, in recent years higher agricultural productivity has been pursued through various strategies that combine the use of greater farming inputs, such as pesticides, fertilizers, and water, with innovative agronomic practices. Agriculturists have also pursued genetic crop improvements by crossing genetically related modern varieties. These crops, which can be defined as cross-genetic transgenic and are genetically less productive, have been improved thanks to the modification of only a few genes, which allow them to be more resistant to cold temperatures and impede early germination if they are planted in cold countries.

Although the above techniques and the ensuing construction of seed banks are still important, they are not a guarantee of success for future productivity because scientists must gain deeper knowledge on how to use the genetic material developed. Crop improvement, in fact, requires the ability to introduce into crops genes from a wide variety of sources and is the result of an accurate selection originated from the gene banks.

Furthermore, this procedure works well when it is required only to improve resistance to diseases and insects because this typically requires the introduction of one single dominant gene, but fails for the most important traits in agriculture, which are conditioned not by a single gene, but by several different ones. Scientists noticed that gene banks contain a huge variety of genes, and there is the possibility that, among them, some favorable ones are not yet discovered. In the past few years, biological researchers focused their attention on the genome of different crops. The aim of these analyses was to improve understanding of the domestication and the agricultural improvements of these crops and to set the stage for further investigations. One of the major objectives was the discovery and decoding of genomes of plants, which has been followed by the genome-wide identification of genes. Within crops, understanding complex interactions underlying agronomic traits is of great importance to improve plant breeding.

Plant biologists have performed many different experimental studies to define gene targeting for functional identification, investigation of regulatory mechanisms, or to find potential partners in protein-protein interactions. One increasingly important method used to identify interacting gene sets is represented by the construction of gene co-expression networks. While gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, co-expression networks include genes involved in related biological pathways, which are expressed cooperatively for their functions. Co-expression networks thus constitute a strategy for storing information on the discovery of non-random gene-gene expression dependencies.

In order to build a reliable estimation of co-expressed gene relationships, biologists need a large amount of data from DNA microarray experiments. In recent years, and especially since the decoding of the Arabidopsis genome, all the information relating to the genome sequences has been stored in different databases. Thanks to the various experiments performed by plant biologists, many microarray datasets have been assembled, covering, for example, different tissues and chemical treatments. These datasets can be used to predict co-expressed genes and to provide biological information to investigate gene functions (Ogata, Suzuki, Sakurai, & Shibata, 2010).

Co-regulation vs. Co-expression

Since the dawn of microarray technology, scientists have been able to investigate and obtain information about gene-to-gene functional relationships. A vast amount of expression data (stored in public databases) has thus been produced from various species and evaluated on the basis of the similarity of expression patterns. Gene co-expression data can be used not only to classify genes, but also to create gene maps. Furthermore, these data enable scientists to identify new genes that are functionally related to a phenomenon under investigation. Accumulating evidence of this sort indicates that gene order is not completely random and that genes with similar expression levels tend to be clustered within the same genomic neighborhoods (Michalak, 2008). One important goal of the analysis of gene expression data is to clarify the differences between co-regulated and co-expressed genes. Transcription of a gene is determined by the interaction of regulatory proteins (that is, transcription factors) with DNA sequences in the gene’s promoter region (Yeung, Medvedovic, & Bumgarner, 2004). Co-regulated genes are accordingly defined as genes that are regulated by at least one common known transcription factor, while co-expressed genes are genes that share similar expression patterns, as discovered by cluster analysis, or whose expression levels are highly correlated (Yeung et al., 2004).
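
The two definitions can be contrasted in a short Python sketch. The gene names, transcription factor names, and cluster labels below are hypothetical; the cluster labels stand in for the output of whatever cluster analysis is applied to the expression profiles.

```python
# Hypothetical annotations: transcription factors known to regulate each gene.
regulators = {
    "gene1": {"TF_A", "TF_B"},
    "gene2": {"TF_B"},
    "gene3": {"TF_C"},
}

# Hypothetical cluster labels, as might be produced by a cluster analysis
# of expression profiles.
cluster_of = {"gene1": 0, "gene2": 1, "gene3": 1}


def co_regulated(g1, g2):
    """True if the two genes share at least one known transcription factor."""
    return bool(regulators[g1] & regulators[g2])


def co_expressed(g1, g2):
    """True if the two genes fall in the same expression cluster."""
    return cluster_of[g1] == cluster_of[g2]


# gene1 and gene2 share TF_B but sit in different clusters;
# gene2 and gene3 share a cluster but no known regulator.
pair_12 = (co_regulated("gene1", "gene2"), co_expressed("gene1", "gene2"))
pair_23 = (co_regulated("gene2", "gene3"), co_expressed("gene2", "gene3"))
```

The toy annotations are chosen so that the two properties disagree for both pairs, which is the situation the distinction is meant to capture.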

In order to investigate the relations between co-regulation and co-expression, Zhang, Zha, Wang, and Chu (2004) retrieved regulator-regulon pairs from the Yeast Promoter Database and examined the expression profiles of the regulons with the same regulator. The expression data came from Cho et al. (1998), and the authors used them to generate the plots in Figures 2 and 3.

Figure 2 shows a partial co-expression relationship between genes. This means that genes can be co-regulated only in some particular phase of the cell cycle or of cell development. However, genes can also be regulated by multiple regulators, as can be seen in Figure 3. Co-regulation does not guarantee a global similarity in gene profiles (Zhang et al., 2004), so new clustering algorithms that also consider the internal connections are needed. To date, despite the many studies that have tried to investigate the driving force behind the formation of co-expression clusters, the scientific community still lacks a comprehensive understanding of how a mosaic of co-expression clusters is created. The different studies performed on a wide variety of species have noted large discrepancies regarding the sizes and locations of these clusters within the same species, with no clear-cut boundaries.

Figure 2. Expression profiles of co-regulated genes.
Source: Zhang et al. (2004)

Figure 3. Expression profiles of co-regulated gene groups. Each curve represents the expression profile of a gene. Each sub-plot represents gene expression profiles of a co-regulated gene group. The time range is from 0 min to 160 min.
Source: Zhang et al. (2004)

The Importance of Networks in Biotechnology

After spending decades disassembling nature, and having provided a wealth of knowledge about the individual components and their functions, biological scientists turned their attention to a holistic alternative paradigm of investigation, according to which nothing happens in isolation in nature and most of the characteristics of living beings derive from the interactions among their constituents. Scientists thus developed a theory of complexity as a compelling architecture where everything depends on everything else, with networks being the dominant topology. In the words of E.O. Wilson (Strogatz, 2001, p. 1):

    “The greatest challenge today not just in cell biology and ecology but in all of science, is the accurate and complete description of complex systems. Scientists have broken down many kinds of systems. They think they know most of the elements and forces. The next task is to reassemble them, at least in mathematical models that capture the key properties of the entire ensemble.”

The availability of large databases on the topology of various real networks and the increased computing power have offered scientists the chance to investigate networks of millions of nodes in quantitative terms. New high-level data-collection techniques, with the widespread use of microarrays, allow scientists to analyze the status of a cell’s components at any given time. Furthermore, the new technological platforms are able to determine whether single molecules can interact with each other (Barabasi & Oltvai, 2004).

Understanding and unraveling the interactions between the elements of a cell constitutes a major goal for biologists of the genome era. The structure of the interaction network appears relevant to the functioning of the cell, and the network approaches are used to integrate various types of genomics data in order to increase the reliability of predicted interactions. In particular, one can envision that the topology of intracellular networks may provide constraints for the manipulation and the design of cells (Noort, Snel, & Huynen, 2004). An example is represented by the yeast protein interaction network, which is illustrated in Figure 4.

Figure 4. Yeast protein interaction network.
Source: Barabasi and Oltvai (2004)

Intracellular networks can be reconstructed using genomes as sources of data. Examples are represented by the protein interaction networks, genomic association networks, and the evolutionarily conserved co-expression networks. It is possible also to translate gene co-expression networks into discrete networks, because, although these networks are continuously observable, their underlying principles are discrete—the sharing of regulatory elements (Noort et al., 2004).

The studies performed on these networks indicate that the intricate interwoven relationships that govern cellular functions follow a universal law, which is shared by the majority of the complex networks present in nature. In other words, these networks share the same architectural features that characterize other complex networks because they are scale-free, modular, hierarchical, and small world types, characterized by short paths between any two nodes with highly clustered connections.

An example is the gene regulation system or co-expression network, where the genes are the nodes, which are connected to each other when co-expressed. Moreover, not only the physical interactions between molecules can be represented using graph theory, but also more complicated functional interactions can be analyzed through this representation (Barabasi & Oltvai, 2004). Gene regulation is a general name for a number of sequential processes, the best known and understood being transcription and translation, which control the level of a gene’s expression and ultimately result in a specific quantity of a target protein. More specifically, a gene regulation system consists of genes, cis-elements, and regulators. The regulators are most often proteins, called transcription factors, as well as small molecules, such as RNAs and metabolites. The interaction and binding of regulators to cis-elements in the cis-region of genes are responsible for the mode and the level of gene expression during transcription. The genes, regulators, and the regulatory connections between them, together with an interpretation scheme, form gene networks.

Gene networks thus appear to conform to the evolution of complex systems, which first originate from an orchestrated activity of many interacting components that can be represented as a series of nodes connected with each other through links or edges. A link connects two nodes (or vertices), and the ensemble constituted by nodes and links generates a graph. A simple, undirected graph with three vertices and three edges is represented in Figure 5.

Figure 5. A simple undirected graph with three vertices and three edges. Each vertex has degree two, so this is also a regular graph.
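
The graph in Figure 5 can be written down directly as adjacency sets, which makes the degree of each vertex and the edge count easy to check. The vertex labels 1, 2, and 3 are an assumption for illustration.

```python
# Adjacency sets for the triangle graph on vertices 1, 2, 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}

# The degree of a vertex is the number of edges incident with it.
degrees = {v: len(neighbors) for v, neighbors in graph.items()}

# Each undirected edge appears in two adjacency sets, so halve the degree sum.
num_edges = sum(degrees.values()) // 2

# A graph is regular when every vertex has the same degree.
is_regular = len(set(degrees.values())) == 1
```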

Erdos and Renyi (1959) proposed a random growth network theory, based on the hypothesis that a fixed number of nodes are connected randomly to each other. In random networks, the node degree (defined as the number of nodes to which a node is connected) follows a Poisson distribution, which shows that most nodes have approximately the same number of links, and nodes that have significantly more or fewer links than the average degree are very rare (Barabasi & Oltvai, 2004).
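
A small simulation can illustrate how narrow the degree distribution of a random network is. The sketch below uses the variant in which every pair of nodes is linked independently with probability p, which is an assumption made for simplicity and differs slightly from the fixed-number-of-edges formulation; the values of n, p, and the seed are arbitrary.

```python
import random
from collections import Counter
from itertools import combinations


def random_graph(n, p, seed=0):
    """Link each of the n*(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    adjacency = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


n, p = 1000, 0.01
graph = random_graph(n, p)
degrees = [len(neighbors) for neighbors in graph.values()]

mean_degree = sum(degrees) / n      # expected value is p * (n - 1) = 9.99
degree_counts = Counter(degrees)    # histogram: degree -> number of nodes
max_degree = max(degrees)
```

Printing `sorted(degree_counts.items())` shows the node counts concentrated around the mean degree, with no node holding a degree many times larger than the average.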

The random graph model was used for decades, until in 1999 Barabasi and Albert proposed an alternative model based on the observation that most real networks are open systems. These systems grow by the continuous addition of new nodes and, in contrast to the Poisson degree distribution of random networks, the fraction of nodes having k edges decays, in most cases, according to a power law. The Barabasi-Albert model (1999) was inspired by the topological structure of the World Wide Web, a network in continuous evolution in which the number of sites increases dynamically. By exploring several large databases describing the topology of large networks, Barabasi (2003) found that, for most large networks, the degree distribution deviates from the Poisson law and, in most cases, follows a power law for large k:

$$P(k) \sim k^{-\gamma} \qquad (1)$$

In Equation 1, k stands for the node degree, which represents the number of edges incident with the node, while P stands for the probability that a node chosen uniformly at random has degree k. The value of the exponent γ typically varies between 2 and 3.
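
A short worked step shows why a distribution of this kind is called scale-free. Reading Equation 1 as a decaying power law, with the exponent entering with a negative sign and γ > 0, rescaling the degree by any constant factor c changes the probability only by a constant factor that does not depend on k:

```latex
\frac{P(ck)}{P(k)} \sim \frac{(ck)^{-\gamma}}{k^{-\gamma}} = c^{-\gamma}
```

For example, taking γ = 2.5 as an illustrative value within the range stated above, nodes with twice the degree (c = 2) are about 2^{-2.5} ≈ 0.18 times as frequent, whether the comparison is between degrees 5 and 10 or between degrees 50 and 100. A Poisson distribution has no such property, because its tail falls off much faster around a characteristic average degree.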

The topological characteristic of a scale-free network is determined by two mechanisms that interact inside the network and are absent in the classical random network model: growth and preferential attachment. Growth and preferential attachment have a common origin in protein networks, where the scale-free topology traces back to a biological mechanism called gene duplication. Duplicated genes produce identical proteins, which interact with the same protein partners (Barabasi & Oltvai, 2004).

In this way, highly connected proteins have an advantage because they are more likely to gain new links when a protein is duplicated than the weakly connected ones. However, even though this feature is able to lead to a scale-free topology, there is no certain proof that this mechanism is the only one which is responsible for the creation of power laws in cellular networks.
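
The combined effect of growth and preferential attachment can be illustrated with a short simulation. This is a sketch of the generic growth rule associated with the Barabasi-Albert model, not of the gene duplication mechanism itself; the function name, the network size, the number of links per new node, and the seed are assumptions chosen for illustration.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow a network to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a fully connected core of m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


n, m = 2000, 2
edges = preferential_attachment(n, m)

degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

max_degree = max(degree.values())
median_degree = sorted(degree.values())[len(degree) // 2]
```

Comparing `max_degree` with `median_degree` shows the characteristic imbalance: most nodes keep roughly the m links they arrived with, while a few early, well-connected nodes accumulate many more.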

Recently, an important development in our understanding of the cellular network architecture was the finding that different cellular networks exhibit scale-free topology, at least approximately. The main example is represented by metabolic networks, which have been analyzed with respect to 43 different organisms, with the results indicating a scale-free topology. An additional scale-free network is represented by the protein-protein interactions in different eukaryotic species, where most of the proteins participate in only a few interactions, while a few participate in dozens (Barabasi & Oltvai, 2004). This feature is typical of scale-free networks and may have great relevance to explain dynamic phenomena, such as the capacity of ecosystems to rebound under severe exogenous stress, and, more generally, the very uneven correspondence between phenotypical expression and gene presence.

The analysis of intracellular network topology allows us to include genetic regulatory networks in the typical scale-free organizations, even though not all networks within the cell are characterized by a scale-free distribution. The signature of scale-free networks is represented by the power-law distribution that predicts the number of different genes that interact with a transcription factor. The power law synthetically expresses the fact that many nodes have only a few connections, while a small but still significant number of nodes are characterized by many connections. Cellular networks, in fact, are characterized by a small number of highly connected nodes, which play a fundamental role in determining the network behavior.

Gene co-expression networks have been observed and translated into discrete networks, taking into consideration the sharing of regulatory elements. This analysis was performed in 2004 by Noort et al. In these networks, genes are the nodes that are connected to each other when they are co-expressed. With respect to other kinds of intracellular networks, this model covers a more inclusive array of functional relations between gene products (Noort et al., 2004). The results of Noort et al.’s analysis show that the distribution of the number of links per node is scale-free and that, although the average number of connections is 32, most genes are connected to only one specific gene, demonstrating the presence of hubs inside the network.

Furthermore, scale-free networks are robust to failures, and that is the reason why:

    “we rarely notice the effect of router errors and why the disappearance of a few species doesn’t lead to an environmental catastrophe” (Barabasi, 2003, p. 121).
Understanding the topology of complex networks is important for analyzing their robustness. By robustness, or tolerance to errors, we generally mean the ability of the system to maintain its connectivity after the random deletion of a fraction of its nodes and edges. Network robustness, together with network structure, creates the foundations of the network’s functional organization.

Random networks appear fragile because they tend to disintegrate once a critical fraction of the nodes is removed. Scale-free networks, by contrast, have a different topology that makes them very robust to accidental failures. They do not exhibit a critical threshold for disintegration, because their connectivity is held together by a few hubs with a large number of links, whereas random failures mainly affect the numerous low-degree nodes, whose absence does not disrupt network integrity. Despite their robustness, however, scale-free networks have an Achilles’ heel: they are extremely vulnerable to attacks in which several of the largest hubs are removed simultaneously. When random nodes are removed, network integrity is not in danger, and the accidental removal of a single hub will not be fatal either; but if the nodes are no longer selected randomly, an attack on the hubs results in a major disruption. Today, the understanding of cellular network robustness is still far from complete, even though recent results support the hypothesis that these types of networks are robust to many varied perturbations (Barabasi & Oltvai, 2004).
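
The contrast between random failures and targeted attacks can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the graph is grown by a preferential-attachment rule in the spirit of Barabasi and Albert (1999), and the network size, the number of links per new node, and the share of nodes removed are hypothetical values chosen for the example, not quantities taken from the studies reviewed here.

```python
import random
from collections import deque


def preferential_attachment_graph(n, m, seed=0):
    """Grow a graph in which every new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Seed the process with a fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # A node appears in `stubs` once per link it holds, so a uniform draw
    # from this list is a degree-proportional draw from the nodes.
    stubs = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend([new, t])
    return adj


def largest_component_fraction(adj, removed):
    """Share of the original nodes left in the largest connected component."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour in alive and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best / len(adj)


# Hypothetical setting: 2,000 nodes, 2 links per new node, 5% of nodes removed.
graph = preferential_attachment_graph(n=2000, m=2, seed=1)
n_removed = 100
random_removed = random.Random(2).sample(sorted(graph), n_removed)
hubs_removed = sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:n_removed]

random_fraction = largest_component_fraction(graph, random_removed)
attack_fraction = largest_component_fraction(graph, hubs_removed)
print(f"largest component after random failures: {random_fraction:.2f}")
print(f"largest component after hub attack:      {attack_fraction:.2f}")
```

In this toy setting, removing the same number of nodes leaves a smaller connected core when the most connected nodes are targeted than when nodes are removed at random, which is the asymmetry described above.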

Summarizing, among the inherent properties of networks, robustness and adaptation appear important, with vulnerability arising when hubs are attacked and disrupted. Several studies showed also the existence of an interlink between robustness and modularity, because the ability of a module to evolve plays a key role in developing or limiting robustness (Barabasi & Oltvai, 2004).

Dorogovtsev, Goltsev, and Mendes (2002) found that in the deterministic scale-free networks, clustering coefficients behave according to the following expression.

$C(k) \sim k^{-1}$ (2)

This indicates that nodes with few links have a high clustering coefficient and belong to small, highly interconnected modules, while highly connected hubs have a low clustering coefficient, their role being to link modules that would otherwise be isolated from one another.
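
A toy example may help to fix ideas. The sketch below assumes the usual definition of the local clustering coefficient of a node with k neighbours, namely 2n / (k(k - 1)), where n is the number of links among those neighbours. The graph, made of one hub attached to three small triangular modules, is hypothetical and serves only to show why a hub that bridges modules ends up with a low coefficient.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Local clustering coefficient: realised links among the neighbours of
    `node`, divided by the number of links that could exist among them."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(neighbours), 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


# Hypothetical graph: three triangular modules, all attached to a single hub.
modules = [("a1", "a2", "a3"), ("b1", "b2", "b3"), ("c1", "c2", "c3")]
edges = []
for module in modules:
    edges.extend(combinations(module, 2))               # links inside the module
    edges.extend(("hub", member) for member in module)  # links to the hub

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

hub_clustering = local_clustering(adj, "hub")
module_clustering = local_clustering(adj, "a1")
print(f"hub:         degree {len(adj['hub'])}, C = {hub_clustering:.2f}")
print(f"module node: degree {len(adj['a1'])}, C = {module_clustering:.2f}")
```

The module node has three neighbours that are all linked to one another, while most pairs of the hub's nine neighbours sit in different modules and are not linked, so the coefficient falls as the degree rises.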

The existence of a scaling law in the degree of clustering and the scale-free property were both used by Ravasz and Barabasi (2003) to identify hierarchical organization in complex networks. These authors noted that both properties have attracted considerable attention and that a scale-free topology and a high degree of clustering coexist in a large number of real networks.

In gene networks, on the other hand, the strength and the temporal aspects of the interactions must be considered. Despite the significant advances in the past few years, scientists know little about the temporal aspects of the interactions, while they have gained more information about the intensity of connections in the genetic-regulatory networks.

Studies on the degree to which each pair of genes is co-expressed indicate that GNs contain several “hot” links, marked by significant correlation coefficients, that are embedded in a web of less active interactions (Barabasi & Oltvai, 2004). Highly correlated pairs appear to correspond to direct regulatory and protein interactions, as the correlations are higher between proteins that belong to the same interactive cluster than between proteins that do not interact directly.

The Socio-economic Implications

Gene networks that are dominated by a power law have important implications for the economic organization of society and its capacity to evolve and adapt. Randomly dispersed genes would cause evolution to occur as the alteration of the frequency of distinct gene variants (alleles). GNs, on the other hand, based as they are on dominant genes, determine traits and behaviors (through intra-specific polymorphism) mainly by variation in the regulatory regions of their major component genes and through continuous restructuring of the components of the network.

The first implication of this changing paradigm is a new concept of robustness and fragility of the environment. The power law distribution of genes within networks, in fact, implies robustness toward random attacks, since dominant genes are protected by their low number and the plurality of their connections within the network. At the same time, GNs are vulnerable to attacks directed to connections rather than genes, since these attacks end up by compromising the “hubs” and ultimately threaten the collapse of the whole network.

What does this mean in practice? Under the old paradigm, threats to ecotypes from lack of diversification and the extinction of selected species could be countered by limiting environmental damage and by storing DNA information in gene banks. The new paradigm, instead, implies a different strategy, in that protection of the environment should be highly selective, preserve critical genes, and control contaminants according to their capacity to attack not only genes directly, but also gene connections.

The selective robustness of the ecotypes also calls for different strategies against external shocks, since the network architecture implies a much higher capacity to rebound after seemingly destructive violent shocks, but potentially irreversible destruction under more continuous damage directed at the key elements of the network. Thus, for example, one-shot events such as oil spills or climatic disasters (tornados, hurricanes, and floods) may be less harmful than persistent contamination of the environment, because the former are unselective attacks that reach the “hub genes” with only low probability, while the latter causes pervasive damage that attacks the connections rather than the genes and thus may compromise the functions of those genes that are connected in more ways with the other genes.

More generally, the results on the widespread existence and regularity of GNs point to more profound potential implications for social life. The genetic-evolutionary paradigm, in fact, has long been based on the notion of bio-organisms as stable entities, inheriting the basic building blocks of their potential—DNA—from their parents and with DNA remaining largely unchanged over lifetimes. This perspective has been assumed true for all biological organisms, including humans, and years of scientific inquiry have been framed in terms of “genes by environment” and “nature versus nurture.” The very basis of the theory of evolution, in this respect, has been that genes are physiologically autonomous from the external environment. In the case of human civilization, this has meant that individuals have been considered fundamentally separate from their ever-changing social environment, despite their obvious dependence on it, on the basis of the notion that genes influence behavior, but not the other way around.

GNs, on the other hand, propose a different model of functioning whereby genes are polymorphous entities, whose expression and frequent co-expression depend on multiple influences of the environment on the other genes and on the linkages involved in complex networks. According to this model, genes can be turned on and off by environmental conditions and, because of the power function regulating the distribution of their links in the network, a few crucial genes may be more important than the majority of the other genes in determining the phenotypical expressions that ensue from the reaction of the network to different environmental conditions. Consistent with these hypotheses, a growing body of evidence (Slavich & Cole, 2013) shows that specific molecular mechanisms mediate the effects of external social conditions on gene expression and that these dynamics can cause social experiences to become biologically embedded.

Meta Analysis

The discussion of rDNA techniques and their application to agricultural crops directed our attention to the regulatory mechanisms in protein-to-protein or gene-to-gene interactions. In the course of the past decade, plant biologists were able to decode the genomes of different plants, and the information stored has been used and combined in a systems genetics approach to analyze the molecular subsystems underlying complex traits.

The most important crops subjected to these studies are those included in the Poaceae family and specifically rice (Oryza sativa), maize (Zea mays), wheat (Triticum spp.), and sugarcane (Saccharum officinarum). Within these crops, understanding the complex interactions underlying agronomic traits is of great significance to improve plant breeding along the difficult road of genetic engineering, especially as regards the trade-off that the new varieties seem to offer between desirable and non-desirable traits.

In recent years, the complete Arabidopsis genome was decoded and all the relevant information relating to the genome sequences has been stored in different databases. This has allowed plant biologists to assemble many microarray datasets, covering conditions such as different tissues and chemical treatments, that can be used to predict co-expressed genes and to provide biological information in order to facilitate the investigation of gene functions (Ogata et al., 2010).

The storage of information in different databases has not only facilitated the diffusion of knowledge of the individual genes, but has also allowed the cross-species comparison of relevant co-expressed gene groups. This kind of comparative analysis can be performed by associating co-expressed gene modules with biological information, such as gene ontology and metabolic pathways, and then comparing these modules across plant species. The results showed that genes that are highly connected all participate in similar biological processes.

For this reason, in recent years, plant gene co-expression networks have become increasingly popular in different fields of research. Several research groups have identified subsets of highly correlated genes within large gene co-expression networks in Arabidopsis thaliana, rice (Oryza sativa), and maize (Zea mays).

Maize represents an important model organism for fundamental research into the inheritance and functions of genes, the linkages of genes to chromosomes, the recombination, and transposition (Schnable, 2009). Rice plays an important role as a staple food for more than one-half of the world’s population, and it is also one of the most studied grasses. Rice is characterized by different attractive properties: its genome-size is compact, its genome-sequence is known, and high-density genetic maps are available. Because of its close relationship with other cereals, furthermore, rice can provide a rich source of genetic hypotheses associated with complex traits which could easily be translated to other grasses with poor genetic resources.

Arabidopsis thaliana, even though it is not an agricultural species, plays an important role in plant biology because of its experimental characteristics (germination time, observability, measurability of relevant traits, etc.), which became especially attractive when its genome was completely sequenced. Since then, thousands of experiments have been performed, and the array data obtained were stored in public databases.

In this article, we report our analysis of the GN experiments performed on these three species (maize, rice, and Arabidopsis thaliana) by focusing on the outcomes of different research projects aimed at identifying gene co-expression networks by examining the co-expression patterns of genes over a large number of experimental conditions. Although gene co-expression networks may model and estimate gene interdependencies for a broad range of plants, they suffer from limitations because they cannot capture all the possible interactions arising among genes in different environmental or temporal conditions. Furthermore, co-expression can be measured only if genes are consistently co-expressed, in the sense that correlations among them can be observed over a sufficiently broad range of outcomes. Despite these limitations, co-expression networks appear to constitute a valuable instrument that can be used to provide glimpses into complex gene-gene interactions.

The different research reports examined show that numerous analytical tools have been used to extract gene relationships and functions from microarray data. Most of these tools are based on the weighted gene correlation network analysis (WGCNA), the Pearson correlation coefficient, and Fisher’s test, with most scientists using methods of cluster analysis to group genes that show similar expression patterns under well-designed experimental conditions (Lee, 2009).
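
As an illustration of the simplest of these tools, the sketch below builds a co-expression network by linking two genes whenever the Pearson correlation of their expression profiles exceeds a threshold in absolute value. It is a minimal sketch and not the procedure of any particular study: the gene names, the expression values, the threshold of 0.9, and the choice of treating strong negative correlations as links are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold):
    """Link every pair of genes whose |correlation| reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression levels of four genes over five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relation to the others
}

edges = coexpression_edges(expression, threshold=0.9)
for g1, g2, r in edges:
    print(f"{g1} -- {g2}  (r = {r})")
print(f"nodes: {len(expression)}, edges: {len(edges)}")
```

The node and edge counts produced in this way are the kind of quantities that the estimates in the following section take as data.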

The aim of the clustering methods is to calculate pair-wise relations and similarity measures among genes and gene clusters. Other methods used, for which we refer the interested reader to the specialized literature, include the K-means clustering algorithms, the random matrix theory (RMT), the enrichment analysis, the regularized graphical Gaussian method (GGM), the clique finder, and the robust multi-array average method. We first analyzed 28 studies and collected the data presented on their results; we then extended our analysis to a total of 57 studies. Data from these studies include the number of genes (nodes), the number of edges, the number of co-expressed genes, the number of clusters, modules, and the number of genes that can be assigned to a cluster. In order to develop comparisons across species, we also collected the information available about the traits associated with co-expressed gene modules and the 41 and 101 networks analyzed.

The Estimates

Our approach is based on the idea that edges and nodes can be considered proxies for the information circulating in a network. This information is exchanged by the nodes and contributes to determine the traits expressed in the plant. We conjecture that topological complexity is at the root of the difficulties that bio-scientists encounter in identifying the correspondence between the underlying genes and the observed traits. While co-expression further increases this complexity, it can also provide an important avenue to develop networks or clusters of genes that can simultaneously enhance more traits. Given these premises, we separately estimated the following two equations by means of ordinary least squares (OLS).

$L_i = \alpha_0 + \alpha_1 N_i + \alpha_2 X' + e_i$ (3)

$C_i = \alpha_0 + \alpha_1 N_i + \alpha_2 X' + v_i$ (4)

In Equation 3, L indicates the natural logarithm of the number of edges connecting pairs of genes of the network studied, N indicates the natural logarithm of the number of genes in the network, and X′ denotes a vector of dummy variables (see Table 1). Finally, e_i denotes the error terms.

Table 1. List of dummy variables used in our estimations.
Dummy variables: 1=Presence of the plant/trait/criterion; 0=Absence
Dummy Arabidopsis thaliana
Dummy seed development
Dummy biosynthesis
Dummy resistance
Dummy photosynthesis
Dummy metabolic pathways
Dummy signaling pathways
Dummy stress response
Dummy Pearson correlation coefficient (PCC)

In Equation 4, C denotes the logarithm of the number of co-expressed genes, while N stands for the logarithm of the number of genes inside the network and X′ denotes the same vector of dummy variables as in Equation 3. Finally, v_i denotes the error terms, which are assumed to be normally distributed with zero mean and constant variance.
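
The mechanics of estimating an equation such as Equation 3 can be sketched as follows. The example is only illustrative: the six networks are hypothetical and noise-free, generated from assumed coefficients so that the estimator can be seen to recover them; a single dummy stands in for the vector X′; and the small solver for the normal equations is written out only to keep the example self-contained. It does not reproduce the bootstrapped standard errors reported in the tables.

```python
from math import exp, log


def ols(X, y):
    """Ordinary least squares: solve (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical networks: (number of genes, trait dummy). Not data from the studies.
design = [(200, 0), (500, 1), (1500, 0), (4000, 1), (9000, 0), (20000, 1)]
assumed = (0.5, 1.0, -0.8)  # alpha_0, alpha_1, alpha_2 used to generate the edges

edges = [exp(assumed[0] + assumed[1] * log(n) + assumed[2] * d) for n, d in design]

X = [[1.0, log(n), float(d)] for n, d in design]  # constant, log genes, dummy
y = [log(e) for e in edges]                       # log edges

alpha_0, alpha_1, alpha_2 = ols(X, y)
print(f"elasticity of edges with respect to genes: {alpha_1:.2f}")
print(f"shift associated with the trait dummy:     {alpha_2:.2f}")
```

Because both variables enter in logarithms, the coefficient on the number of genes reads directly as an elasticity, while the coefficient on the dummy is a proportional shift in the number of edges when the trait is present.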

Tables 2, 3, 4, and 5 present the estimation results. We performed two different kinds of estimations of the first model, which can be divided into two groups of analysis. The first contains only 19 observations, while the second contains 41 observations. In the first set of regressions, we considered only the studies that contained the same information, while in the second set we tried to consider all the variables by attributing estimated values (see Endnote 1) to the missing observations. In general, results appear to be robust, and significant coefficients show the same signs across all regressions.

Our findings in Table 2 point to a scale-free relationship between the number of edges and the number of genes, with a significant percentage increase in edges in response to a percentage increase in nodes inside the network. The robustness of this outcome is confirmed by the fact that this relationship is significant and quantitatively stable in all model specifications. The response revolves around (and is not significantly different from) unity. Among the variables representing the traits, biosynthesis capacity, stress response, and seed development exhibit a negative impact, while photosynthesis capacity appears to be positively associated with the number of edges.

Table 2. OLS estimate of the (log-log) relationship between the number of edges and the number of nodes (genes) in a sample of genetic networks.
Dependent variable: Number of edges

| Variable | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Nodes | 1.13*** (7.65) | 0.99*** (7.35) | 1.37*** (6.26) | 1.36*** (8.21) | 1.01*** (8.71) | 0.36*** (4.28) |
| Coex | | | | | | -0.40*** (-4.45) |
| Dummy Arabidopsis | 0.99* (1.75) | | -1.73* (-1.83) | | | |
| Dummy biosynthesis | | | | | | -1.57*** (-3.28) |
| Dummy metabolic pathways | | | | | | |
| Dummy photosynthesis | | 2.30** (3.43) | 1.57* (1.82) | | | |
| Dummy seed development | | | -0.41 (-0.44) | | | |
| Dummy signaling pathways | | 1.48** (2.23) | | | 1.27** (2.26) | |
| Dummy stress response | | | -0.16 (-1.27) | | | -0.81*** (-2.65) |
| PCC threshold* edges | | | | 3.01*** (3.62) | 4.86*** (5.30) | |
| C | -0.05 (-0.03) | 0.96 (0.79) | 0.95 (0.43) | -0.54 (0.37) | 0.99 (0.94) | 13.97*** (16.63) |
| Observations | 41 | 41 | 19 | 21 | 41 | 101 |
| R-squared | 0.61 | 0.69 | 0.82 | 0.85 | 0.77 | 0.36 |
We control for the imputation by inserting dummy variables, bootstrapped standard errors
* significant at 10%; ** significant at 5%; *** significant at 1%.
Absolute value of t statistics in parentheses

Results in Table 3, on the other hand, show that a significant scale-free relation can also be estimated between the genes co-expressed in a cluster and the total number of genes of an organism. This relationship, however, is strictly less than proportional, with the number of co-expressed genes exhibiting an elasticity between 0.60 and 0.79 with respect to the total number of genes.

Table 3. OLS estimate of the (log-log) relationship between the number of co-expressed genes and the number of genes in a sample of genetic networks.
Dependent variable: Number of co-expressed genes

| Variable | (1) | (2) | (3) |
|---|---|---|---|
| Nodes | 0.60*** (5.44) | 0.79*** (10.72) | 0.65*** (5.82) |
| Dummy Arabidopsis | | | 0.33 (0.85) |
| Dummy biosynthesis | | | -0.83* (-1.73) |
| Dummy metabolic pathways | -0.75** (-2.17) | -0.53** (-2.81) | |
| Dummy photosynthesis | | | -0.30 (-0.60) |
| Dummy seed development | | | |
| Dummy signaling pathways | | | -1.24*** (-3.08) |
| Dummy stress response | -1.25*** (-3.35) | -0.90*** (-3.42) | -0.83* (-1.73) |
| C | 2.56* (2.40) | 2.34 (2.20) | 1.70 (1.60) |
| Observations | 54 | 101 | 54 |
| R-squared | 0.56 | 0.61 | 0.55 |
We control for the imputation by inserting dummy variables, bootstrapped standard errors
* significant at 10%; ** significant at 5%; *** significant at 1%.
Absolute value of t statistics in parentheses

The results in Table 4 show that, for the degree (number of edges per node) as well, a significant scale-free relationship can be measured with respect to both the number of edges and the number of co-expressed genes, with similar elasticities (around 0.5) but of opposite signs. This implies that the number of edges per node tends to vary positively with the square root of the number of genes and negatively with the square root of the number of co-expressed genes. In other words, a 1% increase in the number of genes is met with a roughly 0.5% increase in the number of edges per gene (the “degree” of the gene), while a 1% increase in the number of co-expressed genes is met with a reduction of roughly 0.5% in the same degree.
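
The step from an elasticity of about 0.5 to a square-root relationship follows directly from the log-log form of the regression. The lines below spell it out for a generic dependent variable y and regressor x; the symbols a and b are placeholders for an intercept and a slope, and the values of b are rounded for illustration rather than taken from a specific column of Table 4.

```latex
\begin{align*}
\log y = a + b\,\log x \quad &\Longrightarrow \quad y = e^{a}\,x^{b},\\
b = 0.5 \quad &\Longrightarrow \quad y \propto \sqrt{x},\\
b = 0.5,\; x \to 1.01\,x \quad &\Longrightarrow \quad \frac{\Delta y}{y} = 1.01^{0.5} - 1 \approx 0.50\%,\\
b = -0.5,\; x \to 1.01\,x \quad &\Longrightarrow \quad \frac{\Delta y}{y} = 1.01^{-0.5} - 1 \approx -0.50\%.
\end{align*}
```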

Table 4. OLS estimates of the (log-log) relationship between the number of the edges/nodes and the number of genes in a sample of genetic networks.
Dependent variable: Number of edges/number of nodes

| Variable | (1) | (2) | (3) |
|---|---|---|---|
| Nodes | 1.39*** (3.68) | | -0.57*** (-6.74) |
| Edges | | 0.55*** (5.95) | |
| Coex | -0.96** (-2.62) | -0.53*** (-5.77) | -0.47*** (-5.25) |
| Dummy Arabidopsis | | | |
| Dummy biosynthesis | | | -0.97*** (-3.06) |
| Dummy metabolic pathways | | | |
| Dummy seed development | | 0.56* (1.68) | |
| Dummy resistance | | | 1.68** (2.37) |
| Dummy signaling pathways | | | |
| Dummy stress response | | -0.60* (-1.81) | -1.21*** (-3.78) |
| PCC threshold* edges | | -0.21* (-1.80) | |
| C | -1.44 (-0.89) | 1.40 (0.84) | 14.22*** (16.84) |
| R-squared | 0.45 | 0.52 | 0.57 |
| Observations | 21 | 101 | 101 |
We control for the imputation by inserting dummy variables, bootstrapped standard errors
* significant at 10%; ** significant at 5%; *** significant at 1%
Absolute value of t statistics in parentheses

Table 5 shows that the ratio between co-expressed genes and edges is also a scale-free function of both the number of nodes and of edges, and that this relationship is less than proportional for the nodes and more than proportional for the edges.

Table 5. OLS estimates of the (log-log) relationship between the number of co-expressed genes/number of edges and the number of genes in a sample of genetic networks.
Dependent variable: Number of co-expressed genes/number of edges

| Variable | (2) | (3) |
|---|---|---|
| Nodes | 0.47*** (5.66) | 0.40*** (4.77) |
| Nodes^2 | | |
| Edges | -1.44*** (-16.01) | -1.39*** (-15.53) |
| Edges^2 | | |
| Coex | | |
| Dummy Arabidopsis | | |
| Dummy biosynthesis | -0.58* (-1.78) | -0.59* (-1.86) |
| Dummy metabolic pathways | | |
| Dummy seed development | | |
| Dummy resistance | | |
| Dummy signaling pathways | | |
| Dummy stress response | -1.24*** (-4.21) | -1.09*** (-3.71) |
| PCC threshold* edges | | 7.84** (2.62) |
| C | 10.35*** (8.05) | 9.99*** (7.95) |
| R-squared | 0.73 | 0.75 |
| Observations | 101 | 101 |
We control for the imputation by inserting dummy variables, bootstrapped standard errors
* significant at 10%; ** significant at 5%; *** significant at 1%
Absolute value of t statistics in parentheses

Tables 6-9 summarize the effects on connectivity of the various traits considered in the studies. With few exceptions, these effects are negative, suggesting that the presence of the traits tends to reduce the number of nodes, edges, co-expressed genes, and their ratios.

Table 6. Effect of traits on network connectivity.
Dependent variable: Number of edges

| Variable | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Dummy biosynthesis | | | | | | -1.57*** (-3.28) |
| Dummy metabolic pathways | | | | | | |
| Dummy photosynthesis | | 2.30** (3.43) | 1.57* (1.82) | | | |
| Dummy seed development | | | -0.41 (-0.44) | | | |
| Dummy signaling pathways | | 1.48** (2.23) | | | 1.27** (2.26) | |
| Dummy stress response | | | -0.16 (-1.27) | | | -0.81*** (-2.65) |
| Observations | 41 | 41 | 19 | 21 | 41 | 101 |

Table 7. Effect of traits on network connectivity.
Dependent variable: Number of co-expressed genes

| Variable | (1) | (2) | (3) |
|---|---|---|---|
| Dummy biosynthesis | | | -0.83* (-1.73) |
| Dummy metabolic pathways | -0.75** (-2.17) | -0.53** (-2.81) | |
| Dummy photosynthesis | | | -0.30 (-0.60) |
| Dummy seed development | | | |
| Dummy signaling pathways | | | -1.24*** (-3.08) |
| Dummy stress response | -1.25*** (-3.35) | -0.90*** (-3.42) | -0.83* (-1.73) |
| Observations | 54 | 101 | 54 |

Table 8. Effect of traits on network connectivity.
Dependent variable: Number of edges/number of nodes

| Variable | (1) | (2) | (3) |
|---|---|---|---|
| Dummy biosynthesis | | | -0.97*** (-3.06) |
| Dummy metabolic pathways | | | |
| Dummy seed development | | 0.56* (1.68) | |
| Dummy resistance | | | 1.68** (2.37) |
| Dummy signaling pathways | | | |
| Dummy stress response | | -0.60* (-1.81) | -1.21*** (-3.78) |
| PCC threshold* edges | | -0.21* (-1.80) | |
| R-squared | 0.45 | 0.52 | 0.57 |

In sum, connectivity, defined as the number of edges that link a given number of genes, appears to be a network characteristic that is associated through a scale-free relationship with the size of the network (as measured by the number of genes). This relationship can be interpreted as the result of information exchanges, i.e., as a relationship between the information contained and the information exchanged by the genes. Co-expression appears to be a strategy to achieve the same level of information content with lower connection costs and, interestingly, this strategy appears to be echoed by the presence of several co-expressed traits, but not by photosynthesis.

The scale-free relationship between the probability that a node is of degree K and the degree itself is also known in statistics as the Pareto law and can be expressed mathematically as $f(K \mid A, V) = A V^{A} / K^{A+1}$, where V is the minimum value of K and A is the shape parameter. In order to test the hypothesis that the average degree of a genetic node follows this law, we first fitted the Pareto distribution to the distribution of the ratios between edges and nodes in our sample. As Table 9 and Figure 6 show, the Pareto distribution fits the data well, and the values of the test allow us to reject the null hypothesis (that the underlying distribution is not a Pareto). Subsequently, we generated the probability values corresponding to our observations and regressed the logarithms of these values against the logarithms of the ratios (i.e., the average degree of each observed node) and the other variables representing the structure of the network (co-expressed genes, clusters, and groups). The results (Table 10) confirm the relationship estimated by Barabasi and Albert (1999), namely $P(K) \sim K^{-\gamma}$, with γ = 1.21. Finally, Table 11 reports the empirical distribution test for K = number of edges/nodes.
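
The first step of this procedure, fitting a Pareto law to a sample of ratios, can be sketched as follows. The example assumes the standard maximum-likelihood estimators for a Pareto distribution (the sample minimum for V, and the sample size divided by the sum of the logarithms of the observations relative to that minimum for A); the article does not state which estimator was used. The sample is simulated from hypothetical parameter values and is not the data set analyzed here.

```python
import random
from math import log


def fit_pareto(sample):
    """Maximum-likelihood estimates (A, V) for f(K) = A V^A / K^(A+1), K >= V."""
    v_hat = min(sample)
    a_hat = len(sample) / sum(log(x / v_hat) for x in sample)
    return a_hat, v_hat


# Hypothetical parameters used to simulate the sample (not estimates from the article).
true_shape, true_scale = 1.5, 2.0
rng = random.Random(42)
# random.paretovariate draws from a Pareto law with minimum 1; rescale to minimum V.
sample = [true_scale * rng.paretovariate(true_shape) for _ in range(20000)]

a_hat, v_hat = fit_pareto(sample)
print(f"estimated shape A: {a_hat:.3f}")
print(f"estimated scale V: {v_hat:.3f}")
print(f"implied log-log slope of the density: {-(a_hat + 1):.3f}")
```

The last line recalls that, in density form, a Pareto law with shape A appears as a straight line of slope -(A + 1) on a log-log plot, which is the kind of slope a regression such as the one in Table 10 estimates.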

Table 9. Effect of traits on network connectivity.
Dependent variable: Number of co-expressed genes/number of edges

| Variable | (2) | (3) |
|---|---|---|
| Dummy biosynthesis | -0.58* (-1.78) | -0.59* (-1.86) |
| Dummy metabolic pathways | | |
| Dummy seed development | | |
| Dummy resistance | | |
| Dummy signaling pathways | | |
| Dummy stress response | -1.24*** (-4.21) | -1.09*** (-3.71) |
| PCC threshold* edges | | 7.84** (2.62) |
| Observations | 101 | 101 |

Figure 6. Empirical distribution test for ratio.

Table 10. OLS estimates of the scale free (log-log) relationship between the probability that an average node is of degree K, the degree K, and other network characteristics.
Dependent variable: Log(P(K))

| Variable | Coefficient (t statistic) |
|---|---|
| Log (K) | -1.21*** (-263.61) |
| Log (co-expressed genes) | -0.02*** (-4.42) |
| Log (clusters) | -0.01*** (-2.82) |
| Log (group) | -0.13*** (-18.19) |
| R-squared | 0.99 |
| Observations | 101 |

Table 11. Empirical distribution test for K=number of edges/nodes.
Method Test for K
Cramer-von-Mises (W2) 3.61***
Watson (U2) 2.51***
Anderson-Darling 24.34***

Conclusions

Our analysis confirms the existence of several scale-free relations between the components of bionetworks. By focusing on GNs and in particular on maize, rice, and Arabidopsis thaliana, the novelty of our approach consists in the investigation of the role of genes, edges, and co-expressed genes across a wide variety of experiments, based on the hypothesis that gene expression follows the network topology. We found that a number of scale-free relations fit the experimental data well and suggest that, as the number of genes inside the network increases, the number of edges increases proportionally, while the number of co-expressed genes increases less than proportionally. We also found that the number of edges per node increases more than proportionally with the number of nodes and less than proportionally both with respect to an increase in the number of edges and a decrease in the number of co-expressed genes. Finally, we found that the probability that a gene has a given number of edges (the “degree”) also follows a scale-free relation with the number of edges of the type found by Barabasi and Albert (1999), and a negative relation with other connective properties of the networks, such as co-expression, clustering, and grouping. These findings appear robust and suggest several conclusions.

First, the hypothesis of GNs appears well supported by quantitative analysis, which confirms both the ubiquity and the consistency of the gene-to-gene relations evidenced by numerous experimental studies. Second, the results of these studies also consistently support the scale-free hypothesis on the topology of the networks and the existence of dominant hub genes that render the networks particularly robust to non-targeted exogenous shocks. Third, in spite of the ecological resilience that can be inferred from the scale-free topology, the regulatory function of the networks implies an interactive relationship between the genes and the environment, which can make them more sensitive than was formerly believed. Somewhat paradoxically, and perhaps even more importantly for sustainable economic development, one can thus expect ecotypes to be more robust to external shocks and, at the same time, gene co-expression to be more sensitive to the conditions of the environment. Finally, a strategy of biotechnological research aimed at identifying relevant clusters of co-expression may be more successful than one aimed at identifying single traits or groups of traits and the corresponding gene determinants. The fact that the presence of desirable traits is mostly associated with a reduction of the number of genes, edges, and other network connections seems to reinforce this conjecture.

Endnotes

1 For the missing data, we used the predicted values of the dependent variable (based on the equation estimated with the smaller number of observations) and the average values of the independent variables.

References

Andrews, M., & Praeger, R. (1994). Genetic programming for the acquisition of double auction market strategies. In K. Kinnear (Ed.), Advances in genetic programming. Cambridge, MA: MIT Press.

Angelovici, R., Fait, A., Zhu, X., Szymanski, J., Feldmesser, E., Fernie, A.R., & Galili, G. (2009). Deciphering transcriptional and metabolic networks associated with lysine metabolism during Arabidopsis seed development. Plant Physiology, 151(4), 2058-2070.

Aoki, K., Ogata, Y., & Shibata, D. (2007). Approaches for extracting practical information from gene co-expression networks in plant biology. Plant & Cell Physiology, 48(3), 381-390.

Barabasi, A.L. (2003). Linked. New York: Plume.

Barabasi, A.L. (2009). Scale-free networks: A decade and beyond. Science, 325(5939), 412-413.

Barabasi, A.L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509.

Barabasi, A.L., & Oltvai, Z.N. (2004). Network biology: Understanding the cell’s functional organization. Nature Reviews Genetics, 5, 101-113.

Bassel, G.W., Glaab, E., Marquez, J., Holdsworth, M.J., & Bacardit, J. (2011). Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets. The Plant Cell, 23(9), 3101-3116.

Boldogkoi, Z. (2004). Gene network polymorphism is the raw material of natural selection: The selfish gene network hypothesis. Journal of Molecular Evolution, 59(3), 340-357.

Burdon, J.J., Thrall, P.H., & Ericson, L. (2006). The current and future dynamics of disease in plant communities. Annual Review of Phytopathology, 44, 19-39.

Carrera, J., Rodrigo, G., Jaramillo, A., & Elena, S. (2009). Reverse engineering the Arabidopsis thaliana transcriptional network under changing environmental conditions. Genome Biology, 10(9), R96.

Chen, S.H., & Yeh, C.H. (1996). Genetic programming learning and the cobweb model. In P. Angeline (Ed.), Advances in genetic programming (pp. 443-466). Cambridge, MA: MIT Press.

Chen, S.H., & Yeh, C.H. (1997). Towards a computable approach to the efficient market hypothesis: An application of genetic programming. Journal of Economic Dynamics and Control, 21, 1043-1063.

Chen, S.H., & Yeh, C.H. (1999). Modeling the expectations of inflation in the OLG model with genetic programming. Soft Computing, 3(2), 53-62.

Chen, S.H., & Yeh, C.H. (2000a). Simulating economic tradition processes by genetic programming. Annals of Operations Research, 97, 265-286.

Chen, S.H., & Yeh, C.H. (2000b). Evolving traders and the business school with genetic programming: A new architecture of agent-based artificial stock market. Journal of Economic Dynamics and Control, 25, 363-393.

Childs, K.L., Davidson, R.M., & Buell, C.R. (2011). Gene co-expression network analysis as a source of functional annotation for rice genes. PLOS ONE, 6(7), e22196.

Cho, R.J., Campbell, M.J., Winzeler, E.A., Steinmetz, L., Conway, A., Wodicka, L., et al. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 2(1), 65-73.

Davidson, R.M., Hansey, C.N., Gowda, M., Childs, K.L., Lin, H., Vaillancourt, B., et al. (2011). Utility of RNA sequencing for analysis of maize reproductive transcriptomes. The Plant Genome, 4(3), 191-203.

Derbyshire, P., Drea, S., Shaw, P.J., Doonan, J.H., & Dolan, L. (2008). Proximal-distal patterns of transcription factor gene expression during Arabidopsis root development. Journal of Experimental Botany, 59(2), 235-245.

Dorogovtsev, S.N., Goltsev, A.V., & Mendes, J.F.F. (2002). Pseudofractal scale-free web. Physical Review E, 65, 066122.

Erdos, P., & Renyi, A. (1959). On random graphs. Publicationes Mathematicae (Debrecen), 6, 290-297.

Ficklin, S.P., & Feltus, A. (2011). Gene co-expression network alignment and conservation of gene modules between two grass species: Maize and rice. Plant Physiology, 156(3), 1244-1256.

Ficklin, S.P., Luo, F., & Feltus, A. (2010). The association of multiple interacting genes with specific phenotypes in rice using gene co-expression networks. Plant Physiology, 154(1), 13-24.

Fu, F.F., & Xue, H.W. (2010). Co-expression analysis identifies rice starch regulator: A rice AP2/EREBP family transcription factor as a novel rice starch biosynthesis regulator. Plant Physiology, 154(2), 927-938.

Fukushima, A., Kanaya, S., & Arita, M. (2009). Characterizing gene co-expression modules in Oryza sativa based on a graph-clustering approach. Plant Biotechnology, 26(5), 485-494.

Garrett, K.A., Dendy, S.P., Frank, E.E., Rouse, M.N., & Travers, S.E. (2006). Climate change effects on plant disease: Genomes to ecosystems. Annual Review of Phytopathology, 44, 489-509.

Geils, B.W. (1992). Analyzing landscape patterns caused by forest pathogens: A review of the literature. In S. Frankle (Ed.), Proceedings of the 40th Western International Forest Disease Work Conference. Durango, CO: USDA, Forest Service, Pacific Southwest Region, 21-32.

Gifford, M.L., Dean, A., Gutiérrez, R.A., Coruzzi, G.M., & Birnbaum, K.D. (2008). Cell-specific nitrogen responses mediate developmental plasticity. Proceedings of the National Academy of Sciences, 105(2), 803-808.

Gilligan, P., Brenner, S., & Venkatesh, B. (2002). Fugu and human sequence comparison identifies novel human genes and conserved non-coding sequences. Gene, 294(1-2), 35-44.

Gutiérrez, R.A., Lejay, L.V., Dean, A., Chiaromonte, F., Shasha, D.E., & Coruzzi, G.M. (2007). Qualitative network models and genome-wide expression data define carbon/nitrogen-responsive molecular machines in Arabidopsis. Genome Biology, 8(1), R7.

Gutiérrez, R.A., Gifford, M.L., Poultney, C., Wang, R., Shasha, D.E., Coruzzi, G.M., & Crawford, N.M. (2007). Insights into the genomic nitrate response using genetics and the Sungear Software System. Journal of Experimental Botany, 58(9), 2359-2367.

Gutiérrez, R.A., Stokes, T.L., Thum, K., Xu, X., Obertello, M., Katari, M.S., et al. (2008). Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1. Proceedings of the National Academy of Sciences, 105(12), 4939-4944.

Hamada, K., Hongo, K., Suwabe, K., Shimizu, A., Nagayama, T., Abe, R., et al. (2011). OryzaExpress: An integrated database of gene expression networks and omics annotations in rice. Plant & Cell Physiology, 52(2), 220-229.

Hirai, M.Y., Yano, M., Goodenowe, D.B., Kanaya, S., Kimura, T., Awazuhara, M., et al. (2004). Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana. Proceedings of the National Academy of Sciences, 101(27), 10205-10210.

Holdenrieder, S., Stieber, P., & Pawel, J. (2004). Circulating nucleosomes predict the response to chemotherapy in patients with advanced non-small cell lung cancer. Clinical Cancer Research, 10, 5981-5987.

Holland, J.L. (1975). Dilemmas and remedies. Personnel and Guidance Journal, 53, 517-519.

Honys, D., & Twell, D. (2004). Transcriptome analysis of haploid male gametophyte development in Arabidopsis. Genome Biology, 5(11), R85.

Horan, K., Jang, C., Bailey-Serres, J., Mittler, R., Shelton, C., Harper, J.F., et al. (2008). Annotating genes of known and unknown function by large scale co-expression analysis. Plant Physiology, 147(1), 41-57.

Jaeger, J., Blagov, M., Kosman, D., Kozlov, K.N., Manu, Myasnikova, E., et al. (2004). Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics, 167(4), 1721-1737.

Jen, C.H., Manfield, I.W., Michalopoulos, I., Pinney, J.W., Willats, W.G., Gilmartin, P.M., & Westhead, D.R. (2006). The Arabidopsis co-expression tool (ACT): A WWW-based tool and database for microarray-based gene expression analysis. Plant Journal, 46(2), 336-348.

Koichiro, A., Go, S., Keita, S., Tokunori, H., Hirokazu, T., Katsuhiro, S., et al. (2011). Comprehensive network analysis of anther-expressed genes in rice by the combination of 33 laser microdissection and 143 spatiotemporal microarrays. PLOS ONE, 6(10), e26162.

Koza, J.R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.

Lee, T.H., Kim, Y.K., Pham, T.T., Song, S.I., Kim, J.K., Kang, K.Y., et al. (2009). RiceArrayNet: A database for correlating gene expression from transcriptome profiling and its application to the analysis of co-expressed genes in rice. Plant Physiology, 151(1), 16-33.

Lee, I., Seo, Y.S., Coltrane, D., Hwang, S., Oh, T., Marcotte, E.M., & Ronald, P.C. (2011). Genetic dissection of the biotic stress response using a genome-scale gene network for rice. Proceedings of the National Academy of Sciences, 108(45), 18548-18553.

Lensberg, T. (1999). Investment behavior under Knightian uncertainty: An evolutionary approach. Journal of Economic Dynamics and Control, 23, 1587-1604.

Liu, X., Fu, J., Gu, D., Liu, W., Liu, T., Peng, Y., et al. (2008). Genome-wide analysis of gene expression profiles during the kernel development of maize (Zea mays L.). Genomics, 91(4), 378-387.

Lian, X., Wang, S., Zhang, J., Feng, Q., Zhang, L., Fan, D., et al. (2006). Expression profiles of 10,422 genes at early stage of low nitrogen stress in rice assayed using a cDNA microarray. Plant Molecular Biology, 60, 617-631.

Lundquist, J.E., & Hamelin, R.C. (2005). Forest pathology: From genes to landscapes. American Phytopathological Society, 175.

Ma, S., Gong, Q., & Bohnert, H.J. (2007). An Arabidopsis gene network based on the graphical Gaussian model. Genome Research, 17(11), 1614-1625.

Mao, L., Hemert, J., Das, S., & Dickerson, J.A. (2009). Arabidopsis gene co-expression network and its functional modules. BMC Bioinformatics, 10, 346.

Meier, S., Gehring, C., Ross, C., MacPherson, Kaur, M., Maqungo, M., et al. (2008). The promoter signatures in rice LEA genes can be used to build a co-expressing LEA gene network. Rice, 1(2), 177-187.

Menezes, M.A., & Barabasi, A.L. (2004). Fluctuations in network dynamics. Physical Review Letters, 92, 028701.

Michalak, P. (2008). Coexpression, coregulation and cofunctionality of neighboring genes in eukaryotic genomes. Genomics, 91(3), 243-248.

Movahedi, S., Van de Peer, Y., & Vandepoele, K. (2011). Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice. Plant Physiology, 156, 1316-1330.

Nikiforova, V., Freitag, J., Kempa, S., Adamik, M., Hesse, H., & Hoefgen, R. (2003). Transcriptome analysis of sulfur depletion in Arabidopsis thaliana: Interlacing of biosynthetic pathways provides response specificity. The Plant Journal: For Cell and Molecular Biology, 33(4), 633-650.

Nikiforova, V.J., Daub, C.O., Hesse, H., Willmitzer, L., & Hoefgen, R. (2005). Integrative gene-metabolite network with implemented causality deciphers informational fluxes of sulphur stress response. Journal of Experimental Botany, 56(417), 1887-1896.

Noort, V., Snel, B., & Huynen, M.A. (2004). The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Reports, 5(3), 280-284.

Obayashi, T., Kinoshita, K., Nakai, K., Shibaoka, M., Hayashi, S., Saeki, M., et al. (2007). ATTED-II: A database of co-expressed genes and cis-regulatory elements for identifying co-regulated gene groups in Arabidopsis. Nucleic Acids Research, 35, D863-869.

Obayashi, T., & Kinoshita, K. (2010). COXPRESdb: A database to compare gene co-expression in seven model animals. Nucleic Acids Research, 39, D1016-1022.

Ogata, Y., Suzuki, H., Sakurai, N., & Shibata, D. (2010). CoP: A database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics, 26(9), 1267-1268.

Pautasso, M., Holdenrieder, O., & Stenlid, J. (2005). Susceptibility to fungal pathogens of forests differing in tree diversity. In M. Scherer-Lorenzen, C. Koerner, & D. Schulze (Eds.), Forest diversity and function, 263-289.

Persson, S., Wei, H., Milne, J., Page, G.P., & Somerville, C.R. (2005). Identification of genes required for cellulose synthesis by regression analysis of public microarray data sets. Proceedings of the National Academy of Sciences, 102(24), 8633-8638.

Qiu, D., Xiao, J., Xie, W., Hongbo, L., Xianghua, L., Xiong, L., & Wang, S. (2008). Rice gene network inferred from expression profiling of plants overexpressing OsWRKY13, a positive regulator of disease resistance. Molecular Plant, 1(3), 538-551.

Ravasz, E., & Barabasi, A.L. (2003). Hierarchical organization in complex networks. Physical Review E, 67, 026112.

Ruan, J., Perez, J., Hernandez, B., Lei, C., Sunter, G., & Sponsel, M.V. (2011). Systematic identification of functional modules and cis-regulatory elements in Arabidopsis thaliana. BMC Bioinformatics, 12, S2.

Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., et al. (2009). The B73 maize genome: Complexity, diversity and dynamics. Science, 326(5956), 1112-1115.

Shinozaki, K., & Yamaguchi-Shinozaki, K. (2006). Gene networks involved in drought stress response and tolerance. Journal of Experimental Botany, 58(2), 221-227.

Slavich, G.M., & Cole, S.W. (2013). The emerging field of human social genomics. Clinical Psychological Science, 1(3), 331-348.

Stein, R.J., & Waters, B.M. (2012). Use of natural variation reveals core genes in the transcriptome of iron-deficient Arabidopsis thaliana roots. Journal of Experimental Botany, 63(2), 1039-1055.

Strogatz, S.H. (2001). Exploring complex networks. Nature, 410, 268-276.

Stukenbrock, E.H., Banke, S., & McDonald, B.A. (2006). Global migration patterns in the fungal wheat pathogen Phaeosphaeria nodorum. Molecular Ecology, 15, 2895-2904.

Swarbreck, S.M., Defoin-Platel, M., Hindle, M., Saqi, M., & Habash, D.Z. (2011). New perspectives on glutamine synthetase in grasses. Journal of Experimental Botany, 62(4), 1511-1522.

Tanksley, S.D., & McCouch, S.R. (1997). Seed banks and molecular maps: Unlocking genetic potential from the wild. Science, 277(5329), 1063-1066.

Vandepoele, K., Quimbaya, M., Casneuf, T., Veylder, L., & Van De Peer, Y. (2009). Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and co-expression networks. Plant Physiology, 535-549.

Wang, R., Guegler, K., LaBrie, S.T., & Crawford, N.M. (2000). Genomic analysis of a nutrient response in Arabidopsis reveals diverse expression patterns and novel metabolic and potential regulatory genes induced by nitrate. The Plant Cell, 12(8), 1491-1509.

Wei, H., Persson, S., Mehta, T., Srinivasainagendra, V., Chen, L., Page, G.P., et al. (2006). Transcriptional coordination of the metabolic network in Arabidopsis. Plant Physiology, 142(6), 762-774.

Wille, A., Zimmermann, P., Vranová, E., Fürholz, A., Laule, O., Bleuler, S., et al. (2004). Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology, 5, R92.

Wilson, E.O. (1998). Consilience. New York: Knopf.

Xue, L.J., Zhang, J.-J., & Xue, H. (2012). Genome-wide analysis of complex transcriptional networks of rice developing seeds. PLOS ONE, 7(2), e31081.

Yang, T.J.W., Lin, W.-D., & Schmidt, W. (2010). Transcriptional profiling of the Arabidopsis iron deficiency response reveals conserved transition metal homeostasis networks. Plant Physiology, 152(4), 2130-2141.

Yeung, K.Y., Medvedovic, M., & Bumgarner, R.E. (2004). From coexpression to coregulation: How many microarray experiments do we need? Genome Biology, 5(7), R48.

Zhang, Y., Zha, H., Wang, J.Z., & Chu, C.H. (2004). Gene co-regulation vs. co-expression. University Park, PA: The Pennsylvania State University.

Zheng, X., Liu, T., Yang, Z., & Wang, J. (2011). Large clique in Arabidopsis gene co-expression network and motif discovery. Plant Physiology, 168(6), 611-618.

Appendix

Tables A1 and A2 summarize the studies considered in our analysis.

Table A1. Summary of the studies considered in our analysis.
Study title | Authors | Journal | Type of plant
The association of multiple interacting genes with specific phenotypes in rice using gene co-expression networks | Ficklin, Luo, and Feltus (2010) | Plant Physiology | Rice
Gene co-expression network alignment and conservation of gene modules between two grass species: Maize and rice | Ficklin and Feltus (2011) | Plant Physiology | Rice and maize
Gene co-expression network analysis as a source of functional annotation for rice genes | Childs, Davidson, and Buell (2011) | PLOS ONE | Rice
RiceArrayNet: A database for correlating gene expression from transcriptome profiling, and its application to the analysis of co-expressed genes in rice | Lee et al. (2009) | Plant Physiology | Rice
Utility of RNA sequencing for analysis of maize reproductive transcriptomes | Davidson et al. (2011) | The Plant Genome | Maize
Comprehensive network analysis of anther-expressed genes in rice by the combination of 33 laser microdissection and 143 spatiotemporal microarrays | Koichiro et al. (2011) | PLOS ONE | Rice
Genetic dissection of the biotic stress response using a genome-scale gene network for rice | Lee et al. (2011) | Proceedings of the National Academy of Sciences (PNAS) | Rice and maize
The promoter signatures in rice LEA genes can be used to build a co-expressing LEA gene network | Meier et al. (2008) | Rice | Rice
Rice gene network inferred from expression profiling of plants overexpressing OsWRKY13, a positive regulator of disease resistance | Qiu et al. (2008) | Molecular Plant | Rice
Genome-wide analysis of complex transcriptional networks of rice developing seeds | Xue, Zhang, and Xue (2012) | PLOS ONE | Rice
Genome-wide analysis of gene expression profiles during the kernel development of maize (Zea mays L.) | Liu et al. (2008) | Genomics | Maize
Characterizing gene co-expression modules in Oryza sativa based on a graph-clustering approach | Fukushima, Kanaya, and Arita (2009) | Plant Biotechnology | Rice
Co-expression analysis identifies rice starch regulator: A rice AP2/EREBP family transcription factor as a novel rice starch biosynthesis regulator | Fu and Xue (2010) | Plant Physiology | Rice
OryzaExpress: An integrated database of gene expression networks and omics annotations in rice | Hamada et al. (2011) | Plant and Cell Physiology | Rice
Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice | Movahedi, Van de Peer, and Vandepoele (2011) | Plant Physiology | Arabidopsis thaliana and rice
Arabidopsis gene co-expression network and its functional modules | Mao, Hemert, Das, and Dickerson (2009) | BMC Bioinformatics | Arabidopsis thaliana
Systematic identification of functional modules and cis-regulatory elements in Arabidopsis thaliana | Ruan et al. (2011) | BMC Bioinformatics | Arabidopsis thaliana
An Arabidopsis gene network based on the graphical Gaussian model | Ma, Gong, and Bohnert (2007) | Genome Research | Arabidopsis thaliana
Annotating genes of known and unknown function by large scale co-expression analysis | Horan et al. (2008) | Plant Physiology | Arabidopsis thaliana
Qualitative network models and genome-wide expression data define carbon/nitrogen-responsive molecular machines in Arabidopsis | Gutiérrez et al. (2007) | Genome Biology | Arabidopsis thaliana
Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets | Bassel, Glaab, Marquez, Holdsworth, and Bacardit (2011) | The Plant Cell | Arabidopsis thaliana
Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and co-expression networks | Vandepoele, Quimbaya, Casneuf, Veylder, and Van de Peer (2009) | Plant Physiology | Arabidopsis thaliana
Transcriptional coordination of the metabolic network in Arabidopsis | Wei et al. (2006) | Plant Physiology | Arabidopsis thaliana
Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana | Wille et al. (2004) | Genome Biology | Arabidopsis thaliana
The Arabidopsis co-expression tool (ACT): A WWW-based tool and database for microarray-based gene expression analysis | Jen et al. (2006) | Plant Journal | Arabidopsis thaliana
Large clique in Arabidopsis gene co-expression network and motif discovery | Zheng, Liu, Yang, and Wang (2011) | Plant Physiology | Arabidopsis thaliana
ATTED-II: A database of co-expressed genes and cis-elements for identifying co-regulated gene groups in Arabidopsis | Obayashi et al. (2007) | Nucleic Acids Research | Arabidopsis thaliana
Reverse engineering the Arabidopsis thaliana transcriptional network under changing environmental conditions | Carrera, Rodrigo, Jaramillo, and Elena (2009) | Genome Biology | Arabidopsis thaliana
Integrative gene-metabolite network with implemented causality deciphers informational fluxes of sulphur stress response | Nikiforova et al. (2005) | Journal of Experimental Botany | Arabidopsis thaliana
Deciphering transcriptional and metabolic networks associated with lysine metabolism during Arabidopsis seed development | Angelovici et al. (2009) | Plant Physiology | Arabidopsis thaliana
Transcriptome analysis of haploid male gametophyte development in Arabidopsis | Honys and Twell (2004) | Genome Biology | Arabidopsis thaliana
Proximal-distal patterns of transcription factor gene expression during Arabidopsis root development | Derbyshire, Drea, Shaw, Doonan, and Dolan (2008) | Journal of Experimental Botany | Arabidopsis thaliana
Transcriptome analysis of sulfur depletion in Arabidopsis thaliana: Interlacing of biosynthetic pathways provides response specificity | Nikiforova et al. (2003) | The Plant Journal | Arabidopsis thaliana
New perspectives on glutamine synthetase in grasses | Swarbreck, Defoin-Platel, Hindle, Saqi, and Habash (2011) | Journal of Experimental Botany | Rice
Approaches for extracting practical information from gene co-expression networks in plant biology | Aoki, Ogata, and Shibata (2007) | Plant and Cell Physiology | Arabidopsis thaliana
Identification of genes required for cellulose synthesis by regression analysis of public microarray data sets | Persson, Wei, Milne, Page, and Somerville (2005) | PNAS | Arabidopsis thaliana
Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana | Hirai et al. (2004) | PNAS | Arabidopsis thaliana
Use of natural variation reveals core genes in the transcriptome of iron-deficient Arabidopsis thaliana roots | Stein and Waters (2012) | Journal of Experimental Botany | Arabidopsis thaliana
Transcriptional profiling of the Arabidopsis iron deficiency response reveals conserved transition metal homeostasis networks | Yang, Lin, and Schmidt (2010) | Plant Physiology | Arabidopsis thaliana
Gene networks involved in drought stress response and tolerance | Shinozaki and Yamaguchi-Shinozaki (2006) | Journal of Experimental Botany | Arabidopsis thaliana
Insights into the genomic nitrate response using genetics and the Sungear Software System | Gutiérrez et al. (2007) | Journal of Experimental Botany | Arabidopsis thaliana
Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1 | Gutiérrez et al. (2008) | PNAS | Arabidopsis thaliana
Genomic analysis of a nutrient response in Arabidopsis reveals diverse expression patterns and novel metabolic and potential regulatory genes induced by nitrate | Wang, Guegler, LaBrie, and Crawford (2000) | The Plant Cell | Arabidopsis thaliana
Cell-specific nitrogen responses mediate developmental plasticity | Gifford, Dean, Gutiérrez, Coruzzi, and Birnbaum (2008) | PNAS | Arabidopsis thaliana
Expression profiles of 10,422 genes at early stage of low nitrogen stress in rice assayed using a cDNA microarray | Lian et al. (2006) | Plant Molecular Biology | Rice

Table A2. Summary of the studies examined for our analysis.
Item Hypothesis Analysis performed Features of the network Results Traits
The association of multiple interacting genes with specific phenotypes in rice using gene co-expression networks Construct a gene co-expression network WGCNA,A RMT,B and Fisher’s test 4,528 genes;
43,144 edges;
45 modules
The network is scale-free, modular, small-world, and hierarchical Molecular functions
Gene co-expression network alignment and conservation of gene modules between two grass species: Maize and rice Identify co-expressed gene sets between maize and rice WGCNA, RMT- and Functional Enrichment Analysis The maize network: 31,983 edges
2,708 genes
The rice network: 2,257 genes
43,144 edges
This method is successful in identifying candidate genes for specific traits Seed germination, biotic and abiotic stresses, seed development, influorescence, signaling pathways
Gene co-expression network analysis as a source of functional annotation for rice genes Identify modules of highly co-expressed genes WGCNA 18,598 genes The condition-dependent gene expression experiment is extremely useful as a modules identifier Biotic and abiotic stress, seed germination time course, inflorescenze, seed development, signaling pathway, cytokinin, photoperiod, thermoperiod time courses
RiceArrayNet: A database for correlating gene expression from transcriptome profiling, and its application to the analysis of co-expressed genes in rice Construct a database able to study co-expression patterns in rice Pearson’s r, p-value, and z-score 58,417 genes It gives information on co-expressed patterns in rice Response to abiotic stress
Utility of RNA sequencing for analysis of maize reproductive transcriptomes Identify modules of highly correlated genes WGCNA 8,751 genes;
17 modules
Modules of genes have been identified Developing seed, male, female, leaf tissues
Comprehensive network analysis of anther-expressed genes in rice by the combination of 33 laser micro-dissection and 143 spatiotemporal microarrays Construct high-resolution co-expressed sub networks Pearson’s r, Functional enrichment analysis 24,258 genes Meiosis and pollen-wall synthesis networks were created Meiosis and pollen-wall synthesis
Genetic dissection of the biotic stress response using a genome-scale gene network for rice Create a network that can predict the functions of different rice genes Bayesian log likelihood scoring scheme, weighted sum method 18,377 genes;
588,221 edges
RICENET is highly predictive for diverse processes in rice Biotic stress response
The promoter signatures in rice LEA genes can be used to build a co-expressing LEA gene network Identify groups of co-expressed genes during certain cellular processes MPSSC analysis 41,047 genes; 11,488 co-expressed genes 73.36% of the genes co-expressed with the LEA genes Late embrypgenesis abundant genes, maturing rice embryos
Rice gene network inferred from expressing profiling of plants overexpressing OsWRKY13, a positive regulate of disease resistance Verify that OsWRKY13 is a regulator of disease resistance SAM,D Fisher’s test 22,295 genes;
12,597 co-expressed genes
OsWRKY13 is a regulator of disease resistance Disease resistance and biotic and abiotic stress response
Genome-wide analysis of complex transcriptional networks of rice developing seeds Verify if the transcriptome analysis can provide clues on rice development GC_RMAE method and PCCF 46,857 genes;
31,451 co-expressed genes
The transcriptome analysis can provide clues on rice development Rice development
Genome-wide analysis of gene expression profiles during the Kernel development of maize (Zea Mays L.) Examine the pathways during the kernel development k-means clustering algorithm 58,000 genes;
3,445 co-expressed genes
The pathways during the kernel development have been elucidated Cell division and kernel filling
Characterizing gene co-expression modules in oryza sativa based on a graph-clustering approach Identify co-expression modules in rice GC_RMA, PCC 4,495 genes;
32,544 edges;
1,220 clusters
This network has a modular structure and a scale-free nature Seed development, biotic stress
Co-expression analysis identifies rice starch regulator: A rice AP2/EREBP family transcription factor as a novel rice starch biosynthesis regulator Studying the starch biosynthesis PCC, qRT-PRCG 46,284 genes 928 genes of 46,284 are associated with starch biosynthesis Starch biosynthesis
OryzaExpress: An integrated database of gene expression networks and osmic annotations in rice Construct a gene co-expression network RMA,H Correspondence analysis (CA) calculations 15,743 genes OryzaExpress provides integrated information on GENs in rice Metabolic pathways, protein-protein interactions
Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice Measure expression context conservation (ECC) Comparative analysis of relationships 4,600 genes;
3,565 co-expressed genes
Incorporating expression information helps to provide better results Expression context conservation, tissues specificities and protein evolution
Arabidopsis gene co-expression network and its functional modules Comparing gene co-expression and random networks Pearson’s r 6,206 genes;
512,936 edges
The co-expression network is scale-free and modular Metabolic pathways, photosynthesis, chloroplast organization, biogenesis, DNA metabolism, cofactor metabolism
Systematic identification of functional modules and cis-regulatory elements in Arabidopsis thaliana Construct a parameter-free model for generating a gene co-expression network PCC, mKNN (mutual k-nearest neighbor), HQcut (module detect algorithm) Co-expression network:
22,591 genes
707,602 edges
Random network:
3,183 genes
42,354 edges
This method is effective in finding functional modules Biosynthesis and signaling
An Arabidopsis gene network based on the graphical Gaussian model Verify the robustness of the GGM model Regularized GGMI Pilot study:
820 genes
828 edges
Real network:
6,760 genes
18,625 edges
GGM reveals important gene interactions Metabolic functions, stress response, biochemical pathways, cell wall metabolism, and cell response
Annotating genes of known and unknown function by large scale co-expression analysis Analyze the genome-wide protein of unknown function inside the network Clusters analysis and enrichment analysis 11,077 genes;
916 clusters
It allows to identify many groups of potentially co-regulated genes Abiotic stress response, photosynthesis, cell wall pathways, and ribosome assembly
Qualitative network models and genome-wide expression data define carbon/nitrogen-responsive molecular machine in Arabidopsis Identify carbon and nitrogen metabolites in the Arabidopsis whole genome ANOVAJ 7,635 genes;
230,900 edges
Carbon and nitrogen coordinates regulation of sets of molecular machines in the plant cell Interactions between carbon and nitrogen, metabolic pathways, protein degradation, and auxin signaling
Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets Investigate the machine learning methods BioHELK, MCODEL 13,532 genes; 146,933 edges It was demonstrated the utility of seed co prediction network (SCoPNet) Seed development, signal pathways and germination
Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and co-expression networks Investigate the applicability of co-expression networks to infer functional information on genes PCC, clustering method, enrichment analysis, PCC, RMA ATHO90:
19,716 genes
13,580,283 edges
ATH95:
18,861 genes
6,765,135 edges
ATH99:
14,187 genes; 1,504,781 edges
It was possible to link many genes and motifs to specific biological functions Photosynthesis, biosynthesis, light reaction, starch metabolism
Transcriptional coordination of the metabolic network in Arabidopsis Verify if genes belonging to the same pathways are more co-expressed than genes from different metabolic pathways Regression analysis 1,330 genes Genes belonging to the same pathways are more co-expressed than genes from different metabolic pathways Metabolic pathways
Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana Elucidate the regulatory network of the two isoprenoid biosynthesis pathways GGM 40 genes
178 edges
Interactions between the two isoprenoid biosynthesis pathways have been reported Isoprenoid biosynthesis
The Arabidopsis co-expression tool (ACT): A WWW-based tool and database for microarray-based gene expression analysis Construct a new WWW-based tool for plant gene analysis Clique finder 21,891 genes ACT offers a wide range of analysis features Biosynthesis, response to environmental stimuli
Large clique in Arabidopsis gene co-expression network and motif discovery Detect large clusters and predict corresponding motifs PCC, clique-finding algorithm 1,087,660 edges
20,796 genes
23 clusters
4,600 motifs
TFBs detection is a powerful approach Biosynthesis, defense response, metabolism pathways
ATTED-II: A database of co-expressed genes and cis-elements for identifying co-regulated gene groups in Arabidopsis Construct a trans-factor and cis-element prediction database RMA 22,263 genes
300 co-expressed genes
ATTED-II is a useful trans-factor and cis-element prediction database Biosynthesis, defense response, metabolic pathways
Reverse engineering the Arabidopsis thaliana transcriptional network under changing environmental conditions To dissect the transcriptional control of Arabidopsis thaliana Robust multi-array average method, Gauss-Seidel methodInferGene algorithm 22,094 genes; 18,169 co-expressed genes High connectivity in terms of transcriptional regulations among cellular functions Response and adaptation to changing environment
Integrative gene-metabolite network with implemented causality deciphers informational fluxes of sulphur stress response To reconstruct an unbiased gene-metabolites network of correlations from transcripts and metabolite profiles PCC, trM, tMIN 541 nodes;
5,212 edges
Reconstruction of the gene-metabolite network with implemented causal directionality provides an extension of systems biology Stress response
Deciphering transcriptional and metabolic networks associated with lysine metabolism during Arabidopsis seed development Study the metabolic regulations system GC-MS (gas chromatography-mass spectometry), ANOVA 1,400 genes;
300 co-expressed genes;
4 clusters
This study explores novel metabolic and transcriptional network interactions associated with Lys metabolism and the behavior of these interactions Biosynthesis, stress response, seed development, metabolic pathways
Transcriptome analysis of haploid male gametophyte development in Arabidopsis Define the male haploid transcriptome throughout development Scatter-plot analysis, RT-PCR, EPCLUST (clustering analysis software) 22,591 genes; 13,977 co-expressed genes The pollen is characterized by large-scale repression of early program genes Male development
Proximal-distal patterns of transcription factor gene expression during Arabidopsis root development Develop a high-throughput method of whole-mount in situ hybridization of a large number of genes MPSS 137 genes; 112 co-expressed genes The transcripts of the majority of root-expressed genes are present throughout the root tip, but a smaller number of genes are expressed in discrete cell type-specific patterns Seed development, germination, defense response, root development, stress, phytohormone
Transcriptome analysis of sulfur depletion in Arabidopsis thaliana: Interlacing of biosynthesis pathways provides response specificity Create an array hybridization/transcript profiling method to analyze the temporal expression behavior (R), scatterplot analysis 7,200 genes The complex co-ordination of systemic responses to sulfur depletion is provided via integration of flavonoid, auxin and jasmonate pathway elements Biosynthesis, stress response, signaling pathways, resistance
New perspective on glutamine synthetase in grasses Explore patterns of co-correlated metabolic pathways PCC 15,743 genes; 100 co-expressed genes Differences in co-expression between chloroplastic and cytosolic GS genes Metabolic pathways
Approaches for extracting practical information from gene co-expression networks in plant biology Describe co-expression network analysis PCC 22,263 genes Network analysis has provided an intuitive way to represent complex co-expression patterns between many genes Metabolic pathways
Identification of genes required for cellulose synthesis by regression analysis of public microarray data sets Reveal networks of genes involved in linked processes RMA 23,750 genes A number of genes are co-regulated to varying degrees with two distinct types of CESA (cellulose synthase) genes Cell wall synthesis, biosynthesis
Development and evaluation of an Arabidopsis whole genome Affymetrix probe array Describe the development of a high-density Arabidopsis whole genome oligonucleotide probe array for expression analysis PCR 26,155 genes; 12,985 co-expressed genes 60% of the genes showed evidence of expression 3-acetic acid treated seedling
Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana Investigate the gene-to-metabolite networks regulating sulfur and nitrogen nutrition in Arabidopsis PCA (principal component analysis), BL-SOM (batch-learning self-organizing map), DNA array analysis, GC-MS 9,000 genes Depicts a whole picture of the regulation of plant metabolism under stress Metabolic pathways, stress response
Use of natural variation reveals core genes in the transcriptome of iron-deficient Arabidopsis thaliana roots Identify a large number of genes regulated by Fe deficiency in roots of three Arabidopsis ecotypes RT-PCR, MAS5 algorithm 1,842 genes Demonstrates the use of natural variation to identify central iron-deficiency-regulated genes and identified genes with potential new roles in signaling during iron deficiency Seed development, metabolic pathways, stress response
Transcriptional profiling of the Arabidopsis iron deficiency response reveals conserved transition metal homeostasis networks Investigate the Fe deficiency syndrome according to gene co-expression RT-PCR, PCC 1,436 genes The authors attributed important roles to gene candidates that have no previously described function in the iron deficiency response Photosynthesis, metabolic pathways, biosynthesis, iron deficiency
Gene networks involved in drought stress response and tolerance Analyze the gene expression during the drought stress response in plants cDNA Microarray technology analysis 8,000 genes (A); 1,700 genes (R) Elucidate the functions of genes implicated in the stress response and/or tolerance Stress response, biosynthesis
New perspectives on glutamine synthetase in plants Search genes having patterns of expression similar to glutamine synthetase PCC 51,275 genes; 100 co-expressed genes New hypotheses have emerged from searching for co-expressed genes across multiple unfiltered experimental data sets Photosynthesis, metabolic pathways, stress response
Insights into the genomic nitrate response using genetics and the Sungear software system Identify genes that respond to nitrate when the production of downstream metabolites of nitrate is blocked PCR, Q-PCR analyses 1,596 genes; 595 co-expressed; 345 genes; 260 co-expressed Almost 10% of the detectable transcriptome responds rapidly to low nitrate, and roots were more responsive than shoots under these conditions Nitrate assimilation, biosynthesis, metabolic pathways
Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1 Identify gene networks whose expression is regulated by Glu or Glu-derived metabolites in plants RT-qPCR, network model of plant gene interactions 5,904 genes The metabolic gene network provides molecular evidence for regulation of nitrogen-use at the level of gene expression Biosynthesis, metabolic pathways
Genomic analysis of a nutrient response in Arabidopsis reveals diverse expression patterns and novel metabolic and potential regulatory genes induced by nitrate Perform a microarray analysis of nitrate-induced gene expression RNA gel blot analysis 5,524 genes They found more than 15 new nitrate-responsive genes Metabolic pathways
Cell-specific nitrogen responses mediate developmental plasticity The authors used cellular profiling of five Arabidopsis root cell types in response to an influx of nitrogen to uncover cell-specific responses ANOVA 6,092 genes Coordinated developmental response in distinct cell types or tissues Metabolic pathways
Expression profiles of 10,422 genes at early stage of low nitrogen stress in rice assayed using a cDNA microarray Understand how plant genes respond to low nitrogen stress Microarray analysis 11,494 genes; 471 co-expressed This study helped to characterize the functional roles of the low nitrogen stress response gene in nitrogen metabolism Biotic and abiotic stress, photosynthesis, metabolic pathways
A Weighted gene correlation network analysis
B Random matrix theory (RMT) is used for automatic threshold (signal-to-noise) identification
C Massively parallel signature sequencing (MPSS) provides a comprehensive assessment of gene expression by generating short sequence tags, each 20 bp long, produced from a defined position for each transcript
D Significance analysis of microarrays
E GeneChip robust multi-array average
F Pearson correlation coefficient
G Real time polymerase chain reaction
H Robust multi-array average
I Graphical Gaussian method
J Analysis of variance
K Bioinformatics-oriented hierarchical evolutionary learning
L MCODE is a Cytoscape plugin that finds clusters
M Transformed Pearson correlation coefficient
N Transformed mutual information
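
Many of the studies tabulated above build co-expression networks from the Pearson correlation coefficient (note F). As a rough illustration only, and not the procedure of any listed study, the Python sketch below links two genes when the absolute PCC of their expression profiles reaches a cutoff. The gene names, the expression values, the 0.9 threshold, and the function names `pearson` and `coexpression_edges` are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles.

    Assumes neither profile is constant (a constant profile has zero
    variance and would cause a division by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, pcc) for every pair with |PCC| >= threshold.

    The threshold is an illustrative assumption; studies choose it in
    different ways (for example, fixed cutoffs or automatic selection).
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values across four conditions (not real data).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # proportional to geneA
    "geneC": [4.0, 1.0, 3.0, 2.0],  # weakly anti-correlated with geneA
}

edges = coexpression_edges(profiles)

# Degree (number of co-expression partners) per gene, the quantity used
# to spot highly connected "hub" genes in a network.
degree = {gene: 0 for gene in profiles}
for g1, g2, _ in edges:
    degree[g1] += 1
    degree[g2] += 1
```

In this toy input only geneA and geneB are linked, so each has degree 1 and geneC remains isolated. On real microarray data the same pairwise loop grows quadratically with the number of genes, which is one reason the tools in the table rely on optimized implementations.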


Suggested citation: Scandizzo, P.L., & Imperiali, A. (2014). The existence and the socio-economic implications of genetic networks: A meta-analysis. AgBioForum, 17(1), 44-69. Available on the World Wide Web: http://www.agbioforum.org.
© 2014 AgBioForum