Two small sample files (about 2200 web sites total) were mapped to the "cybergrid" using a test version of cybermap.aml that uses only the top-level domain (TLD) and 2 characters from the second-level domain of each URL. Preliminary results revealed that cyberspace is still mainly "ocean" (empty space), scattered with about 210 "islands" (2-letter TLDs) and 7 "continents" (3-letter TLDs). Of the 7 continents, one (.int) appears to be virtually uninhabited, while another (.com) looks seriously overcrowded. A cyberspace map may prove to be a convenient way to depict the rapid changes occurring in this newborn universe, especially in these first few years after the "big (cyber)bang".
The ability to spatially perceive the real world does not help us navigate the Web for one simple reason: In cyberspace, there is no coordinate system.
The Office of Mines and Minerals at the Illinois Department of Natural Resources maintains a library of links to Internet resources on GIS, GPS, geology, the mining industry, and environmental regulations. Early attempts to categorize web sites by content were frustrated by the fuzzily defined topic boundaries and sites that defiantly refused to be pigeonholed into a specific category. A need was perceived for an objective arrangement that could precisely map each Internet address without ambiguity.
The current project is only one of several approaches to the cartographic challenge of cybermapping. Some efforts, such as Matrix Information and Directory Services, project cyberspace onto real space, showing servers and backbones as points and lines positioned by latitude and longitude. Others, such as Boardwatch Magazine, map the Internet by tracing its connections, emphasizing topological continuity over precise location. An excellent, link-filled compendium of cybergeography research can be found in Martin Dodge's Atlas of Cyberspaces at the Centre for Advanced Spatial Analysis at University College London.
The pages in the LRD Web-site library are organized by URL fields and arranged alphabetically within these fields. The general format of the organization scheme is:
http://h.s.m.t/d/f
where
t = top-level domain
m = main domain
s = subdomain(s)
h = host
d = directory(ies)
f = file
Fields m and t are present in every URL. If h and f are present, they occur only once; s and d may be present multiple times. The URLs are alphabetized by fields in the following order:
t -> m -> s -> h -> d -> f
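The field order above amounts to sorting on the host name read from right to left, followed by the path. A minimal shell sketch of that ordering, assuming one http:// URL per line and no tabs inside the URLs; url_field_sort is a hypothetical name, and because it compares host labels by position from the right, it cannot tell a subdomain from a host name the way a hand-maintained library can.

```sh
#!/bin/sh
# Sketch: list URLs in t -> m -> s -> h -> d -> f order.
# url_field_sort is a hypothetical helper, not part of the original tools.
# Assumptions: one http:// URL per line on stdin, no tabs in the URLs,
# host labels compared by position from the right.

url_field_sort() {
  tab=$(printf '\t')
  awk '
  {
    url = $0
    sub(/^[Hh][Tt][Tt][Pp]:\/\//, "", url)   # drop the scheme
    slash = index(url, "/")
    if (slash > 0) {
      host = substr(url, 1, slash - 1)
      path = substr(url, slash + 1)
    } else {
      host = url
      path = ""
    }
    sub(/:.*/, "", host)                      # drop any :port
    n = split(tolower(host), label, ".")
    key = label[n]                            # t
    for (i = n - 1; i >= 1; i--)              # m, s..., h
      key = key "\t" label[i]
    if (path != "") {
      gsub(/\//, "\t", path)                  # d..., f
      key = key "\t" path
    }
    print $0 "\t" key                         # URL first, sort key after it
  }' | LC_ALL=C sort -t "$tab" -k 2 | cut -f 1
}
```

The tab separator sorts below every character that can appear in a host label or path, so a shorter field such as "a" still sorts ahead of "a-b", which keeps the comparison field by field rather than character by character.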
One way to create a grid for plotting URL points would be to use IP numbers, which are unique to each host. This would be less complicated for two reasons:
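#!/bin/sh
# mapsites: convert a list of URLs to a numbered file of decimal character codes
if [ $# -lt 2 ]
then
  echo 'Usage: mapsites <infile> <outfile>'
else
  echo "Generating coordinate file $2 from URL list $1..."
  # Mark the start of each host name with '{' (scheme matched in any case)
  grep -i 'http://' "$1" | sed 's@[Hh][Tt][Tt][Pp]://@{@' > xx00
  # Keep only the host name, in lower case
  cut -d'{' -f2 xx00 | cut -d'/' -f1 | tr '[A-Z]' '[a-z]' > xx01
  # Keep names containing a dot and a letter, remove duplicates, strip :port
  grep '\.' xx01 | grep '[a-z]' | sort -u | cut -d: -f1 > xx02
  # Split each host name into up to seven dot-separated fields
  cut -d'.' -f1 xx02 > xf1
  cut -d'.' -f2 xx02 > xf2
  cut -d'.' -f3 xx02 > xf3
  cut -d'.' -f4 xx02 > xf4
  cut -d'.' -f5 xx02 > xf5
  cut -d'.' -f6 xx02 > xf6
  cut -d'.' -f7 xx02 > xf7
  # Reverse the field order (TLD first) and squeeze out the empty fields
  paste -d@ xf7 xf6 xf5 xf4 xf3 xf2 xf1 | tr -s '@' '@' > xx03
  # Dump "TLD@second-level" as decimal byte values; the sed patterns below
  # assume od prints zero-padded three-digit values such as 010 and 064
  cut -d@ -f2-3 xx03 | od -tu1 | cut -c9-80 > xx04
  # 010 (newline) ends a record; 064 ('@') becomes the Z separator
  sed 's/010/@/g' xx04 | sed 's/064/Z/g' > xx05
  # Rejoin the od lines, split into one record per host, and number them
  paste -d' ' -s xx05 | tr '@' '\012' | sed 's/^ //' | nl > "$2"
  echo "Your coordinate file is $2."
fi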
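The script reduces each host name to its TLD and second-level domain, written as decimal character codes with Z between the two parts. Roughly the same record layout (a record number, the TLD codes, Z, the second-level codes) can be produced with awk alone, which avoids depending on how the local od pads its numbers. This is a sketch, not the original tool: host_codes is a hypothetical name, and it assumes its input is already a deduplicated list of host names, one per line.

```sh
#!/bin/sh
# Sketch: write one "id codes Z codes" record per host name, where the codes
# are zero-padded decimal character codes of the TLD and of the second-level
# domain. host_codes is a hypothetical helper; input is assumed to be a
# deduplicated host list, one name per line.

host_codes() {
  awk '
  BEGIN {
    for (i = 32; i < 127; i++)            # printable ASCII lookup table
      ord[sprintf("%c", i)] = i
  }
  function codes(s,    i, out) {
    out = ""
    for (i = 1; i <= length(s); i++)
      out = out (i > 1 ? " " : "") sprintf("%03d", ord[substr(s, i, 1)])
    return out
  }
  {
    host = tolower($0)
    sub(/:.*/, "", host)                  # drop any :port
    n = split(host, label, ".")
    if (n < 2) next                       # skip names without a dot
    printf "%d %s Z %s\n", ++id, codes(label[n]), codes(label[n - 1])
  }'
}
```

For example, www.isgs.uiuc.edu becomes 1 101 100 117 Z 117 105 117 099.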
/* cybermap.aml: Converts decimal dump of URL list to X-Y coordinate file
/* Revised 03 April 1998
/*
&args infile outfile
&if [null %infile%] or [null %outfile%] &then &do
  &type 'Usage: &r cybermap <infile> <outfile>'
  &return
&end
&else
/*
/* Open input and output files, and read input file:
/*
&s openin [open %infile% openinstat -read]
&s openout [open %outfile% openoutstat -write]
&s line [read %openin% readstatus]
&do &while %readstatus% = 0
  /*
  /* Extract elements from current line and process them:
  /*
  &s id [extract 1 [unquote [before %line% Z]]]
  &s n1 [extract 2 [unquote [before %line% Z]]]
  &s n2 [extract 3 [unquote [before %line% Z]]]
  &s n3 [extract 4 [unquote [before %line% Z]]]
  &s n4 [extract 1 [unquote [after %line% Z]]]
  &s n5 [extract 2 [unquote [after %line% Z]]]
  &s n6 [extract 3 [unquote [after %line% Z]]]
  &s n7 [extract 4 [unquote [after %line% Z]]]
  &s x1 [calc ( %n1% - 110 ) * 100]
  &s y0 [calc ( %n2% - 110 ) * 100]
  &if [null %n3%] &then &s y1 %y0%
  &else &s y1 [calc %y0% + ( %n2% - 50 )]
  &s x2 [calc ( %n4% - 110 ) * 10]
  &s y2 [calc ( %n5% - 110 ) * 10]
  &if ^ [null %n7%] &then &do
    &s x3 [calc %n6% - 110 ]
    &s y3 [calc %n7% - 110 ]
  &end
  &else &do
    &s x3 0
    &s y3 0
  &end
  &s x [calc %x1% + %x2% + %x3%]
  &s y [calc %y1% + %y2% + %y3%]
  /*
  /* Write record to output file, and read next line:
  /*
  &s record %id% %x% %y%
  &s write [write %openout% [quote %record%]]
  &s line [read %openin% readstatus]
&end
/*
/* Close files and exit:
/*
&s closein [close %openin%]
&s closeout [close %openout%]
&type Your output file is %outfile%.
&return
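The arithmetic in cybermap.aml places the first two TLD letters on a 100-unit grid, the first two second-level letters on a 10-unit grid, and the third and fourth second-level letters (when both are present) at single units, all measured from character code 110 ('n'). A 3-letter TLD also gets an extra y offset of (n2 - 50), which the sketch reproduces exactly as the AML writes it. grid_xy is a hypothetical awk port for checking records by hand; it assumes records in the "id n1 n2 [n3] Z n4 n5 [n6 n7]" layout.

```sh
#!/bin/sh
# Sketch: the coordinate arithmetic of cybermap.aml, ported to awk.
# grid_xy is a hypothetical helper. Input: "id n1 n2 [n3] Z n4 n5 [n6 n7]";
# output: "id x y".

grid_xy() {
  awk '
  {
    z = index($0, "Z")
    if (z == 0) next                           # not a coordinate record
    split(substr($0, 1, z - 1), b, " ")        # id, n1, n2, n3 (TLD)
    split(substr($0, z + 1), a, " ")           # n4..n7 (second level)
    x1 = (b[2] - 110) * 100
    y0 = (b[3] - 110) * 100
    y1 = (b[4] == "") ? y0 : y0 + (b[3] - 50)  # same offset as the AML
    x2 = (a[1] - 110) * 10
    y2 = (a[2] - 110) * 10
    if (a[4] != "") { x3 = a[3] - 110; y3 = a[4] - 110 } else { x3 = 0; y3 = 0 }
    print b[1], x1 + x2 + x3, y1 + y2 + y3
  }'
}
```

For example, the record 1 101 100 117 Z 117 105 117 099 (uiuc.edu) lands at x = -823, y = -1011, in the lower left of the grid, consistent with .edu plotting at the bottom left.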
Arc: generate ccmap
Generate: input genxy
Generate: points
Creating points with coordinates loaded from genxy
Generate: q
Externalling BND and TIC...
Arc: build ccmap point
Arc: additem ccmap.pat ccmap.pat URL 40 40 c
Arc: additem ccmap.pat ccmap.pat TLD 3 3 c
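The two additem commands add a 40-character URL item and a 3-character TLD item to the point attribute table. One way to prepare values for them is a comma-separated table keyed by point ID. This sketch assumes the input is the script's deduplicated host list (the xx02 file), so that row n should match record n of the numbered coordinate file; attr_table is a hypothetical name, it fills the URL item with the host name, and loading the table into ccmap.pat is not shown.

```sh
#!/bin/sh
# Sketch: build "id,URL,TLD" rows from a host list, truncated to the 40- and
# 3-character item widths. attr_table is a hypothetical helper.

attr_table() {
  awk -F. '
  {
    $0 = tolower($0)
    sub(/:.*/, "")                    # drop any :port (fields are re-split)
    if (NF < 2) next                  # skip names without a dot
    printf "%d,%s,%s\n", ++n, substr($0, 1, 40), substr($NF, 1, 3)
  }'
}
```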
The figure above shows the TLD-based cyberspace grid with two point coverages plotted. The cyan coverage, representing 1600 randomly selected sites, shows high-density clustering of sites in some of the 3-letter domains, especially .com (left center), .edu (bottom left), and .net (bottom center). The 600 red points were collected primarily from four sites known to have links to a large number of top-level domains:
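The clustering visible in the figure can also be expressed as a count of sites per TLD. tld_counts is a hypothetical helper; it assumes one host name per line and prints the count and the TLD, largest first, with ties in alphabetical order.

```sh
#!/bin/sh
# Sketch: count host names per top-level domain.
# tld_counts is a hypothetical helper; output lines are "count tld".

tld_counts() {
  awk -F. '
  {
    $0 = tolower($0)
    sub(/:.*/, "")                      # drop any :port
    if (NF >= 2) count[$NF]++
  }
  END { for (t in count) print count[t], t }
  ' | sort -k 1,1nr -k 2,2
}
```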
The figure below is an ArcView display of URL text for numerous sites in the northwest part of the map. ArcView can append a field from a TLD table to the active theme table, so that when a point is clicked, the "Identify Results" box lists the country associated with its TLD. The selected element (shown in yellow) is identified as a web site in Azerbaijan.
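The join ArcView performs between the point table and the TLD table can be mimicked on plain text with awk. append_country is a hypothetical helper; it assumes a comma-separated lookup file of tld,country lines (country names without commas) and an id,URL,TLD table on standard input, and it leaves the country empty when a TLD is missing from the lookup.

```sh
#!/bin/sh
# Sketch: append a country name to each "id,URL,TLD" row by TLD.
# append_country is a hypothetical helper; $1 is the lookup file.

append_country() {
  awk -F, '
  FILENAME == ARGV[1] { country[$1] = $2; next }   # lookup file: tld,country
  {
    c = ($3 in country) ? country[$3] : ""
    print $0 "," c
  }
  ' "$1" -
}
```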