The World Wide Web can be seen as a directed graph in which each page is a node and each hyperlink between two pages is an edge. This graph can be naturally partitioned by host: if we collapse all pages on the same host into a single node, keeping their links to pages on other hosts, we obtain a new graph called the Hostgraph. Bharat et al. study this graph extensively, grouping hosts by country-code top-level domain, and observe geographical connections between the most linked countries.
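The collapsing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_hostgraph`, the edge weighting by number of page-level links, and the toy URLs are hypothetical and do not come from the collections used in this study.

```python
from collections import Counter
from urllib.parse import urlsplit


def build_hostgraph(page_links):
    """Collapse a page-level link list into weighted host-level edges.

    page_links: iterable of (source_url, target_url) pairs.
    Returns a Counter mapping (source_host, target_host) to the number
    of page-level links between the two hosts. Links that stay inside
    a single host are dropped.
    """
    edges = Counter()
    for source_url, target_url in page_links:
        source_host = urlsplit(source_url).hostname
        target_host = urlsplit(target_url).hostname
        if source_host is None or target_host is None:
            continue  # skip URLs without a parseable host
        if source_host != target_host:
            edges[(source_host, target_host)] += 1
    return edges


# Hypothetical toy data, for illustration only.
links = [
    ("http://a.example.cl/index.html", "http://a.example.cl/about.html"),
    ("http://a.example.cl/index.html", "http://b.example.es/"),
    ("http://a.example.cl/news.html", "http://b.example.es/page.html"),
    ("http://b.example.es/", "http://c.example.gr/"),
]
hostgraph = build_hostgraph(links)
```

Keeping a count per host pair, rather than a plain set of edges, preserves the information needed later to total the links between country-code domains.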
We focus on the relationship between Web links and commercial trade, using several Web collections obtained from crawls carried out between 2002 and 2004. Table 1 shows their characteristics; for each collection, we show the number of external links to country-code top-level domains, excluding .com, .net, .org, .biz and other generic domains.
| Country | Pages | Number of external links |
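A minimal sketch of how the external-link counts per country-code domain could be obtained from host-level edges follows. It rests on two assumptions that are not from the source: the set `GENERIC_TLDS` is only a partial, illustrative list of generic domains, and any other two-letter top-level domain is treated as a country code. The function names and the toy edges are hypothetical.

```python
from collections import Counter

# Assumption: a partial list of generic TLDs; the text names only
# .com, .net, .org and .biz explicitly.
GENERIC_TLDS = {"com", "net", "org", "biz", "info", "edu", "gov", "mil", "int"}


def country_code(host):
    """Return the two-letter ccTLD of a host name, or None if it has none."""
    tld = host.rsplit(".", 1)[-1].lower()
    # Heuristic (assumption): every non-generic two-letter TLD is a ccTLD.
    if tld in GENERIC_TLDS or len(tld) != 2 or not tld.isalpha():
        return None
    return tld


def external_links_by_country(host_edges, home_cc):
    """Count links from hosts under `home_cc` to hosts under other ccTLDs.

    host_edges: mapping (source_host, target_host) -> link count.
    """
    totals = Counter()
    for (source, target), count in host_edges.items():
        target_cc = country_code(target)
        if country_code(source) == home_cc and target_cc not in (None, home_cc):
            totals[target_cc] += count
    return totals


# Hypothetical toy data, for illustration only.
edges = {
    ("a.example.cl", "b.example.es"): 2,
    ("a.example.cl", "www.example.com"): 5,   # generic domain, ignored
    ("a.example.cl", "c.example.cl"): 3,      # same country, ignored
    ("x.example.es", "y.example.gr"): 1,      # source is not under .cl
}
chile_links = external_links_by_country(edges, "cl")
```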
We obtained the number of links to pages in other countries; they are shown in Figure 1. A good model for the number of links is an exponential distribution (with CDF $F(x) = 1 - e^{-\lambda x}$). The fitted parameters $\lambda$ for the different countries are quite similar.
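As an illustration of the exponential model mentioned above, the sketch below estimates the rate parameter by maximum likelihood (the reciprocal of the sample mean) and evaluates the standard exponential CDF, 1 - exp(-rate * x). The fitting procedure actually used for Figure 1 is not stated in the text, so the use of maximum likelihood is an assumption, and the sample values are hypothetical.

```python
import math


def fit_exponential_rate(samples):
    """Maximum-likelihood estimate of the rate of an exponential distribution."""
    samples = list(samples)
    if not samples or any(x < 0 for x in samples):
        raise ValueError("need a non-empty list of non-negative values")
    mean = sum(samples) / len(samples)
    if mean == 0:
        raise ValueError("all samples are zero; the rate is undefined")
    return 1.0 / mean


def exponential_cdf(x, rate):
    """CDF of the exponential distribution: F(x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


# Hypothetical values, for illustration only.
samples = [0.5, 1.0, 1.5, 3.0, 4.0]
rate = fit_exponential_rate(samples)
```

Comparing `exponential_cdf` at each observed value with the empirical cumulative fraction gives a quick visual check of how well the model describes a given country.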
For the data about commercial trade, we use the Commodity Trade Database (COMTRADE) of the United Nations Statistics Division (available online at http://unstats.un.org/unsd/comtrade/). The distribution of exports from the countries in our collection to other countries is shown in Figure 2.
Earlier work on the topology of the world trade web considered an unweighted version of this data, using a graph in which two countries are linked if their volume of trade is above a certain threshold. That work found that this graph exhibits scale-free properties. In our case, except for the first few countries (roughly 10), which appear to follow a power law, the behavior for most of the trade partners is roughly an exponential distribution, with one parameter for the exports and another for the imports; the fitted values indicate that in these countries exports are slightly more diversified than imports. The variations in the parameter depend on how diversified the trade of the country is. Chile has the smallest diversification in this sample, and Spain the largest.
We found a relationship between the number of links to pages in other countries and the amount of trade with those countries, as shown in Figure 3. We include in the calculation only pairs of countries for which both trade and links exceed a minimum fraction of the total, because below that threshold the data becomes very noisy. We have also removed one or two outliers from some graphs to improve the fit; they are marked with a cross in their graphs.
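The filtering and comparison described above could look roughly like the following sketch. Several things here are assumptions rather than the procedure used for Figure 3: the threshold value `min_share=0.01`, the use of the Pearson coefficient on shares of the total, the function names, and all of the numbers in the toy data.

```python
import math


def shares(values):
    """Convert absolute amounts per country into fractions of the total."""
    total = sum(values.values())
    return {country: amount / total for country, amount in values.items()}


def filtered_pairs(links, trade, min_share):
    """Keep countries whose link share and trade share both exceed min_share."""
    link_share = shares(links)
    trade_share = shares(trade)
    common = sorted(set(link_share) & set(trade_share))
    return [
        (c, link_share[c], trade_share[c])
        for c in common
        if link_share[c] > min_share and trade_share[c] > min_share
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical link counts and trade volumes for one source country.
links = {"ar": 400, "es": 300, "br": 200, "de": 99, "fj": 1}
trade = {"ar": 350, "es": 250, "br": 300, "de": 98, "fj": 2}

pairs = filtered_pairs(links, trade, min_share=0.01)  # threshold is an assumption
r = pearson([p[1] for p in pairs], [p[2] for p in pairs])
```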
One explanation for this correlation is that the Web captures economic relationships. Another is that the observed correlations merely recover the link and trade "popularity" of the countries; that is, the recipients of the largest amounts of links and trade will always be the same countries, no matter which collection we observe. To test this idea, we measured the correlation of the ranked list of links of each country with the ranked lists of trade of every country in our collection. We expect a country's links to be more similar to that country's own trade than to the trade of other countries. The results are mixed, as shown in Table 2.
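The test described above can be illustrated with a rank correlation between one country's link destinations and the trade partners of each country. The text does not say which rank correlation measure was used, so the Spearman coefficient below is an assumption; the function names and the toy figures are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation of two dicts over their shared keys."""
    keys = sorted(set(a) & set(b))
    if len(keys) < 2:
        raise ValueError("need at least two shared keys")
    ra = average_ranks([a[k] for k in keys])
    rb = average_ranks([b[k] for k in keys])
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant ranks")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data: links from .cl, and trade of two countries.
links = {"cl": {"ar": 400, "es": 300, "us": 200, "de": 100}}
trade = {
    "cl": {"ar": 90, "es": 70, "us": 50, "de": 10},
    "gr": {"ar": 1, "es": 20, "us": 30, "de": 40},
}
same_country = spearman(links["cl"], trade["cl"])
other_country = spearman(links["cl"], trade["gr"])
```

If links only reflected global popularity, `same_country` and `other_country` would be about equally high; a clearly larger value for the country's own trade supports the first explanation.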
Using the total number of links tends to generate less confusion (between different countries) than using the number of different sites. In the latter case, the sample of Brazil seems too small to give meaningful results. For Chile, Greece, and Spain, the results are better, as the countries are more related to themselves than to other countries. This metric tends to put the United Kingdom and Spain closer to Greece than to themselves, which suggests that there might be a relationship between the ranked lists of trade partners of the U.K., Spain and Greece.
Preliminary results suggest that the ordering of trade partners is indeed strongly correlated with geographical distance and cultural ties, and we are currently analyzing this relationship. So far, our results show that the high-level structure of Web links among country-code domains is clearly related to commercial trade; to the best of our knowledge, this relationship has not been described before.
We worked with Vicente Lopez in the study of the Spanish Web, with Efthimis N. Efthimiadis in the study of the Greek Web, with Felipe Ortiz, Barbara Poblete and Felipe Saint-Jean in the studies of the Chilean Web and with Marco Modesto and Nivio Ziviani for obtaining the data collection of the Brazilian Web. Marcin Sydow provided valuable comments on a preliminary version of this paper.
We also thank the Laboratory of Web Algorithmics, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano (http://law.dsi.unimi.it/), for making their Web collections available for research.
R. Baeza-Yates, C. Castillo, et al. Characterization of national Web domains. Technical report, Universitat Pompeu Fabra, July 2005.
K. Bharat, B. W. Chang, M. Henzinger, and M. Ruhl. Who links to whom: Mining linkage between web sites. In ICDM, pp. 51-58, San Jose, California, USA, 2001. IEEE CS.
S. Dill, R. Kumar, K. S. McCurley, S. Rajagopalan, D. Sivakumar, et al. Self-similarity in the web. ACM Trans. Inter. Tech., 2(3):205-223, 2002.
A. M. Serrano et al. Topology of the world trade web. Physical Review E, 68(1):015101+, July 2003.