255 lines
13 KiB
Plaintext
255 lines
13 KiB
Plaintext
Anarchy Internet University, Fall Session '93 - Lesson 2
|
|
|
|
What you want is what you get is what you want
|
|
or,
|
|
I never Metadata I didn't like.
|
|
|
|
(An)Archie, Veronica, and Jughead--
|
|
You don't need a Weatherbee to know which way the wind blows
|
|
|
|
Remember at the end of the film Brazil, when Michael Palin was wearing that
|
|
funny mask and torturing Sam, all part of the Information Retrieval process?
|
|
You might wish that you could take to the internet with a good pair of
|
|
needlenose and wrest its riches out by force, but fortunately its physical
|
|
resources are so scattered that you'd probably do more damage to your PC or
|
|
work site (which in some cases wouldn't be that bad a thing :>) than convince
|
|
the net to fork over what you're looking for.
|
|
|
|
Net hacks have written tools that allow you to search metadata--data about data.
|
|
This is a key problem in the proliferation of network access points (the
|
|
internet is 'growing' at 12% per month): How do you keep track of such vast
|
|
volumes of data, text, pictures, sounds, movies, numbers? The task of
|
|
organizing such a heterogeneous mix of resources has never been attempted and
|
|
is such a complex task that it will never fully succeed :) but is an interesting
|
|
library research problem.
|
|
|
|
The tools that go out and graze the net use the various protocols. We already
|
|
talked about ftp, the File Transfer Protocol. There are many other protocols
|
|
on the internet that serve many different functions that the end user never
|
|
need know about--this message came to you courtesy of the SMTP: Simple Mail
|
|
Transfer Protocol (and the number 25). A handful of the protocols, though
|
|
are of worth a cursory familiarity if you want to optimize your time in front
|
|
of the screen.
|
|
|
|
PROTOCOL ACRONYM EXPANSION CLIENT WHAT IT DOES
|
|
-------------------------------------------------------------------------
|
|
FTP File Transfer ftp Get/Put files on remote site
|
|
Protocol Remote file system manipulation.
|
|
|
|
GOPHER Gopher Information Server gopher Browse menu heirarchy and
|
|
Protocol xgopher retrieve data. Character based,
|
|
Not an acronym :) graphics by separate program.
|
|
|
|
NNTP Network News Transport nn,rn Read/Post news articles.
|
|
Protocol
|
|
|
|
WAIS Wide Area Information waissearch Search and retrieve documents
|
|
Server xwais Networked database access.
|
|
|
|
HTTP HyperText Transport xmosaic Browse and search networked
|
|
Protocol cello hypertext using above protocols.
|
|
midas Unix X, Mac, Windows clients.
|
|
viola incorporates support for movies,
|
|
lynx images, sound, point and click
|
|
tkWWW graphic interface. slick.
|
|
|
|
Each protocol has its own meta-data search method. Computer geeks have
|
|
taken the name for the ftp meta-data search system, archie, and extended it
|
|
to include gopher's search system, veronica, and just last week I heard
|
|
announced a meta-data knowbot for HTTP, jughead. Remember what Anarchy said:
|
|
"You don't need a Weatherbee to know which way the wind blows."
|
|
|
|
ARCHIE - FTP
|
|
|
|
The most second commonly used protocol on the internet is FTP. In the last
|
|
lesson, we talked about how to log onto a ftp site and retrieve a file.
|
|
Unless you have a network of friends who constantly use the most popular
|
|
protocol on the internet SMTP to keep you up to date on whats new out there,
|
|
you have to have ways to ftp what you want. Archie is the tool you use to
|
|
do this. Its easy. Just type:
|
|
|
|
% archie country-codes
|
|
|
|
You get back a set of citations that include ftp sites, pathnames and
|
|
filenames. I've edited this for space. The real query has many more
|
|
hits, but I've left some to show you how the country codes distribute
|
|
in an archie query., For the first cite, the host is plaza.aarnet.edu.au.
|
|
The pathname is /usrnet/FAQs/alt.answers/mail and the filename is
|
|
country-codes. (FAQ = Frequently Asked Questions) Read for stuff you
|
|
are interested in and you won't bother people with the most common questions.
|
|
|
|
Host plaza.aarnet.edu.au AUSTRALIA -too far away
|
|
Location: /usenet/FAQs/alt.answers/mail
|
|
FILE -r--r--r-- 19681 Oct 13 02:10 country-codes
|
|
|
|
Host rzsun2.informatik.uni-hamburg.de GERMANY -too far away
|
|
Location: /pub/doc/news.answers/mail
|
|
FILE -rw-r--r-- 18947 Oct 13 10:27 country-codes
|
|
|
|
Host bloom-picayune.mit.edu MIT - a good, fast bet
|
|
Location: /pub/usenet-by-group/alt.answers/mail
|
|
FILE -rw-rw-r-- 19681 Oct 13 02:10 country-codes
|
|
|
|
Host charon.mit.edu Mega MIT server - may be busy but fast
|
|
Location: /pub/usenet-by-group/alt.answers/mail
|
|
FILE -rw-rw-r-- 19612 Sep 1 06:30 country-codes
|
|
|
|
Host sunsite.unc.edu Mega Univ of NC server - fast and busy
|
|
Location: /pub/docs/about-the-net
|
|
FILE -rw-r--r-- 20137 Jun 3 15:40 country-codes
|
|
|
|
Host grasp1.univ-lyon1.fr FRANCE - blew '68 so why ftp from them?
|
|
Location: /pub/faq-by-newsgroup/alt/alt.internet.services/mail
|
|
FILE -rw-r--r-- 19560 Sep 1 05:01 country-codes
|
|
|
|
Host han.hana.nm.kr KR? check out the country-codes for info
|
|
Location: /netinfo/sh.cs.net
|
|
FILE -rw-r--r-- 16442 Jan 14 1993 country-codes
|
|
|
|
Host nnsc.nsf.net .net = network support site
|
|
Location: /info
|
|
FILE -rw-rw-r-- 17455 Oct 20 1992 country-codes
|
|
|
|
Host svin02.info.win.tue.nl NETHERLANDS - slow link
|
|
Location: /pub/usenet/news.answers/mail
|
|
FILE -rw-r--r-- 19652 Sep 1 02:00 country-codes
|
|
|
|
Host ugle.unit.no NORWAY - exotic, but slow link
|
|
Location: /faq/comp.answers/mail
|
|
FILE -rw-rw-r-- 19633 Sep 1 05:04 country-codes
|
|
|
|
Usually the information in the citation helps you decide if its worth your
|
|
time to go check it out you get the file size in bytes (characters) the
|
|
last modification data and its location. You need to check out the location
|
|
of the site too. Although the network is fast, you are limited in the speed
|
|
of your search by the slowest link in the virtual circuit between you and the
|
|
ftp site you are on. Usually, the closer the ftp site to you physically the
|
|
faster the transfer and response will be faster, all things being equal. If
|
|
I can get something from Berkeley or MIT instead of from Finland, New Zealand
|
|
or Taiwan, I'll get it from the backbone instead of from the spines. Anyway,
|
|
if you want to find out the country codes so you don't ftp the otherside of
|
|
the world, this archie search will do it for you:
|
|
|
|
All you have to do is
|
|
|
|
1) ftp to the ftp site,
|
|
2) cd to the pathname,
|
|
3) get file.
|
|
|
|
VERONICA - GOPHER
|
|
|
|
Gopher is a menu-driven networked information retrieval system developed at
|
|
the University of Minnesota. I never read the manual on gopher and it was
|
|
just totally intuitive to use if you have ever used a computer for anything.
|
|
All you have to do is hit return and use the up/down arrow keys, pgup and pgdn
|
|
and the 'u' key to go up levels. Otherwise follow instructions and you can't
|
|
go wrong. Have fun in 'gopherspace.'
|
|
|
|
The Veronica server is located off of the root gopher at gopher.umich.edu.
|
|
All you have to do is:
|
|
|
|
1) run gopher. Typing gopher automatically connects to the root gopher
|
|
server at the Univ of Minn: gopher.tc.umn.edu. You can also connect to
|
|
any other gopher server by typing '% gopher host.id.domain' where
|
|
host.id.domain is the real id for the system you are interested in.
|
|
|
|
% gopher
|
|
|
|
2) select menu item 8. Other Gopher and Information Servers/
|
|
slink down the menu with the down arrow key till you get to number
|
|
8 and then hit return. You can also type 8 and then return.
|
|
(names ending with a / mean that there is another gopher level
|
|
available when you hit return for that item. names without a /
|
|
are terminal, in that they point to a resource.)
|
|
|
|
3) select item 2. Search titles in Gopherspace using veronica/.
|
|
|
|
4) You will get a list of available search methods. You can experiment
|
|
to find which works better for you. There are some search methods that
|
|
return results that are 'protected' and you can't read. But you won't
|
|
know until you try :>. All of these menu items (ending with a <?>)
|
|
are searchable indexes. If you hit return on them, you will be prompted
|
|
for a search term. When you enter it, blammo, a new gopher level is
|
|
created for you of all gopher items that matched your query.
|
|
|
|
WAIS
|
|
|
|
WAIS is a searching protocol based on the NISO Z39.50 Information Retrieval
|
|
standard. It currently exists as a networked set of database servers that
|
|
can register with central sites. You search by querying the central site (a
|
|
centralization that cannot continue for long if the WAIS server community
|
|
continues to grow as it has, but is now quake.think.com) and it returns a set
|
|
of servers that you would use in a second query, that would return documents
|
|
or referrals to other sources. It, like all of the protocols discussed, is
|
|
a client-server system, which means that there are clients that operate on
|
|
many different platforms, PC's, Windows, Mac's, probably VMS as well as
|
|
several UNIX X front-ends.
|
|
|
|
Most all of this software is freely distributed in source form (this means
|
|
that you have the human-readable program) which means that anyone who knows
|
|
the programming language can alter it to suit their needs. This public domain
|
|
software is cool because its free, its a standard that no corporation has
|
|
control over and the most popular serious operating system, UNIX, is
|
|
practically free in source form. The only catch is that you cannot use the
|
|
code for profit, unless you pay a hefty fee, but no one ever said that
|
|
undermining the state was a profit-motivated endeavor, at least as far as
|
|
the law is concerned :).
|
|
|
|
You can ftp to think.com and wais.com for information and sources for WAIS
|
|
servers and clients. If you are going to use WAIS to put up content that
|
|
is non-profit and contributes to making the state an endangered species, drop
|
|
me an e-mail and I can provide some pro-bono consulting. You basically take
|
|
your text (and associated images or sounds and stuff) index it through your
|
|
own database system or one that comes with WAIS. You then set up a WAIS
|
|
server.
|
|
|
|
When someone asks a question of a WAIS client it connects to a server via a
|
|
special network place (a port) that the WAIS server has been patiently listening
|
|
to. The server spins off a copy of itself to handle the search and returns
|
|
to its regimen of listening. The handler process then searches the database,
|
|
and returns a set of matches, or hits that include a direct key that contains
|
|
an unique identifier so you have all the data you need to snarf the data file
|
|
in the next transaction.
|
|
|
|
This way the server can be STATELESS in that it needn't remember any previous
|
|
transaction because each transaction contains enough data to completely
|
|
describe the query. The client puts the hit list on the screen, the user then
|
|
selects some of the hits to retrieve. The client whisks the direct-access
|
|
key off to the server, which returns the requested data. This whole
|
|
transaction session is described in formal internet protocols so that any
|
|
host that has a Z39.50 server can be queried by a user on any computer that
|
|
has a Z39.50 client.
|
|
|
|
JUGHEAD - HTTP
|
|
|
|
This package has just been announced and it probably won't be functional for
|
|
several months yet. Today someone mentioned a jughead for gopher, so its up
|
|
in the air.
|
|
|
|
The implementation of HTTP, with clients and servers and a hypertext scripting
|
|
markup language makes up the World Wide Web, or WWW, www, W3 or just the web.
|
|
The cool part is that you can edit a page in vi, entering HTML, the hypertext
|
|
markup language, descriptions of a page, that contains 'hot spot' links that
|
|
can point to any resource available on the network. One document on anarchy,
|
|
for example, can point to resources--sound, video, images, text, data retrieval
|
|
anything--around the world that make a kind of exhibit that can be accessed
|
|
by anyone else on the network so equipped. The browsers all incorporate
|
|
support for ftp, wais and gopher into their interface, so users need only
|
|
learn the HTTP client.
|
|
|
|
The so equipped part is the problem right now. Most of these clients are
|
|
networked applications, in that they do more than put characters up on a
|
|
terminal screen like a kermit or procomm type connection does. They
|
|
interoperate with special servers that run graphics screens, so you have to
|
|
have a network connection. This kind of network connections transmit so much
|
|
data (high bandwidth) that most people can't afford it. There are text-based
|
|
browsers that peruse this hypertext jungle, but they don't have the glitz of
|
|
the X (unix window system) implementation. Soon enough we will most all have
|
|
access to high-speed networking, either by the local library or at home, so
|
|
if you're into this, it shouldn't be a problem.
|
|
|
|
No test afterward. Go out, drink a beer and netnavigate the state away.
|
|
|
|
Coming soon--encryption.
|
|
|