Anarchy Internet University, Fall Session '93 - Lesson 2
What you want is what you get is what you want
or,
I never Metadata I didn't like.
(An)Archie, Veronica, and Jughead--
You don't need a Weatherbee to know which way the wind blows
Remember at the end of the film Brazil, when Michael Palin was wearing that
funny mask and torturing Sam, all part of the Information Retrieval process?
You might wish that you could take to the internet with a good pair of
needlenose and wrest its riches out by force, but fortunately its physical
resources are so scattered that you'd probably do more damage to your PC or
work site (which in some cases wouldn't be that bad a thing :>) than convince
the net to fork over what you're looking for.
Net hacks have written tools that allow you to search metadata--data about data.
Keeping track of metadata is a key problem as network access points proliferate
(the internet is 'growing' at 12% per month): How do you keep track of such vast
volumes of data, text, pictures, sounds, movies, numbers? Organizing such a
heterogeneous mix of resources has never been attempted before, and the task is
so complex that it will never fully succeed :) but it is an interesting
library research problem.
The tools that go out and graze the net use the various protocols. We already
talked about ftp, the File Transfer Protocol. There are many other protocols
on the internet, serving many different functions, that the end user never
need know about--this message came to you courtesy of SMTP, the Simple Mail
Transfer Protocol (and the number 25). A handful of the protocols, though,
are worth a cursory familiarity if you want to optimize your time in front
of the screen.
PROTOCOL  ACRONYM EXPANSION           CLIENT      WHAT IT DOES
----------------------------------------------------------------------------------
FTP       File Transfer               ftp         Get/Put files on remote site.
          Protocol                                Remote file system manipulation.
GOPHER    Gopher Information Server   gopher      Browse menu hierarchy and
          Protocol                    xgopher     retrieve data. Character based,
          Not an acronym :)                       graphics by separate program.
NNTP      Network News Transfer       nn, rn      Read/Post news articles.
          Protocol
WAIS      Wide Area Information       waissearch  Search and retrieve documents.
          Server                      xwais       Networked database access.
HTTP      HyperText Transfer          xmosaic     Browse and search networked
          Protocol                    cello       hypertext using above protocols.
                                      midas       Unix X, Mac, Windows clients.
                                      viola       Incorporates support for movies,
                                      lynx        images, sound, point and click
                                      tkWWW       graphic interface. Slick.
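The "number 25" mentioned above is SMTP's well-known TCP port: each protocol
has a conventional port that its servers listen on. As a rough sketch (the
dictionary and the function name port_for are made up for illustration; the
port numbers themselves are the standard assignments), the table can be
extended like this:

```python
# Conventional TCP ports for the protocols in the table above.
# The names WELL_KNOWN_PORTS and port_for are illustrative, not from any library.
WELL_KNOWN_PORTS = {
    "ftp": 21,      # control connection; file data travels on a second connection
    "smtp": 25,     # the "number 25" that delivered this message
    "gopher": 70,
    "http": 80,
    "nntp": 119,
    "wais": 210,    # registered under the name z39.50
}


def port_for(protocol):
    """Return the conventional TCP port for a protocol name, ignoring case."""
    return WELL_KNOWN_PORTS[protocol.lower()]


print(port_for("SMTP"))
```

A server can be set up on a different port, so treat these as defaults
rather than guarantees.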
Each protocol has its own meta-data search method. Computer geeks have
taken the name for the ftp meta-data search system, archie, and extended it
to include gopher's search system, veronica, and just last week I heard
announced a meta-data knowbot for HTTP, jughead. Remember what Anarchy said:
"You don't need a Weatherbee to know which way the wind blows."
ARCHIE - FTP
The second most commonly used protocol on the internet is FTP. In the last
lesson, we talked about how to log onto an ftp site and retrieve a file.
Unless you have a network of friends who constantly use the most popular
protocol on the internet, SMTP, to keep you up to date on what's new out there,
you need a way to find what you want to ftp. Archie is the tool you use to
do this. It's easy. Just type:
% archie country-codes
You get back a set of citations that include ftp sites, pathnames and
filenames. I've edited this for space. The real query has many more
hits, but I've left some to show you how the country codes distribute
in an archie query. For the first cite, the host is plaza.aarnet.edu.au.
The pathname is /usenet/FAQs/alt.answers/mail and the filename is
country-codes. (FAQ = Frequently Asked Questions.) Read the FAQs on stuff you
are interested in and you won't bother people with the most common questions.

Host plaza.aarnet.edu.au    AUSTRALIA -too far away
    Location: /usenet/FAQs/alt.answers/mail
        FILE -r--r--r--  19681  Oct 13 02:10  country-codes

Host rzsun2.informatik.uni-hamburg.de    GERMANY -too far away
    Location: /pub/doc/news.answers/mail
        FILE -rw-r--r--  18947  Oct 13 10:27  country-codes

Host bloom-picayune.mit.edu    MIT - a good, fast bet
    Location: /pub/usenet-by-group/alt.answers/mail
        FILE -rw-rw-r--  19681  Oct 13 02:10  country-codes

Host charon.mit.edu    Mega MIT server - may be busy but fast
    Location: /pub/usenet-by-group/alt.answers/mail
        FILE -rw-rw-r--  19612  Sep  1 06:30  country-codes

Host sunsite.unc.edu    Mega Univ of NC server - fast and busy
    Location: /pub/docs/about-the-net
        FILE -rw-r--r--  20137  Jun  3 15:40  country-codes

Host grasp1.univ-lyon1.fr    FRANCE - blew '68 so why ftp from them?
    Location: /pub/faq-by-newsgroup/alt/alt.internet.services/mail
        FILE -rw-r--r--  19560  Sep  1 05:01  country-codes

Host han.hana.nm.kr    KR? check out the country-codes for info
    Location: /netinfo/sh.cs.net
        FILE -rw-r--r--  16442  Jan 14 1993   country-codes

Host nnsc.nsf.net    .net = network support site
    Location: /info
        FILE -rw-rw-r--  17455  Oct 20 1992   country-codes

Host svin02.info.win.tue.nl    NETHERLANDS - slow link
    Location: /pub/usenet/news.answers/mail
        FILE -rw-r--r--  19652  Sep  1 02:00  country-codes

Host ugle.unit.no    NORWAY - exotic, but slow link
    Location: /faq/comp.answers/mail
        FILE -rw-rw-r--  19633  Sep  1 05:04  country-codes

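If you save a listing like this to a file, a few lines of Python will pull
the citations apart for you. This is only a sketch: parse_archie and
top_level_domain are names made up here, and it assumes the Host /
Location: / FILE layout shown above with no spaces in the filenames.
Whatever follows the hostname on a Host line (my commentary, in this lesson)
is ignored.

```python
def parse_archie(text):
    """Turn a Host / Location: / FILE listing into a list of dicts."""
    hits = []
    host = location = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "Host" and len(fields) > 1:
            host = fields[1]          # anything after the hostname is commentary
        elif fields[0] == "Location:" and len(fields) > 1:
            location = fields[1]
        elif fields[0] == "FILE" and len(fields) >= 7:
            # FILE <permissions> <size> <month> <day> <time-or-year> <name>
            hits.append({
                "host": host,
                "location": location,
                "size": int(fields[2]),
                "modified": " ".join(fields[3:6]),
                "name": fields[6],
            })
    return hits


def top_level_domain(host):
    """Last label of the hostname: a country code like 'au', or 'edu', 'net'."""
    return host.rsplit(".", 1)[-1]


SAMPLE = """\
Host plaza.aarnet.edu.au AUSTRALIA -too far away
Location: /usenet/FAQs/alt.answers/mail
FILE -r--r--r-- 19681 Oct 13 02:10 country-codes
Host bloom-picayune.mit.edu MIT - a good, fast bet
Location: /pub/usenet-by-group/alt.answers/mail
FILE -rw-rw-r-- 19681 Oct 13 02:10 country-codes
"""

for hit in parse_archie(SAMPLE):
    print(top_level_domain(hit["host"]), hit["host"], hit["location"], hit["size"])
```

The top-level domain is the part the country-codes file explains, so it
makes a handy first filter for "is this site anywhere near me?"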
Usually the information in the citation helps you decide if it's worth your
time to go check it out: you get the file size in bytes (characters), the
last modification date and the file's location. You need to check out the
location of the site too. Although the network is fast, the speed of your
transfer is limited by the slowest link in the virtual circuit between you and
the ftp site you are on. Usually, the closer the ftp site is to you physically,
the faster the transfer and response will be, all things being equal. If
I can get something from Berkeley or MIT instead of from Finland, New Zealand
or Taiwan, I'll get it from the backbone instead of from the spines. Anyway,
if you want to find out the country codes so you don't ftp to the other side of
the world, this archie search will do it for you.
All you have to do is:
1) ftp to the ftp site,
2) cd to the pathname,
3) get file.
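The three steps above can be sketched with Python's standard ftplib module.
The function name fetch and its ftp_factory parameter are made up for this
example (the parameter is only there so you can try the logic without a live
connection), and the hosts in the archie listing may no longer answer, so
read this as an illustration rather than a recipe.

```python
import ftplib
import io


def fetch(host, path, filename, ftp_factory=ftplib.FTP):
    """Anonymous ftp: connect to host, cd to path, get filename as bytes."""
    ftp = ftp_factory(host)              # 1) ftp to the ftp site
    try:
        ftp.login()                      # no arguments means anonymous login
        ftp.cwd(path)                    # 2) cd to the pathname
        buf = io.BytesIO()
        ftp.retrbinary("RETR " + filename, buf.write)   # 3) get file
        ftp.quit()                       # say goodbye politely
        return buf.getvalue()
    finally:
        ftp.close()                      # make sure the connection is released


# Example call (needs a reachable server, so it is left commented out):
# data = fetch("bloom-picayune.mit.edu",
#              "/pub/usenet-by-group/alt.answers/mail",
#              "country-codes")
```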
VERONICA - GOPHER
Gopher is a menu-driven networked information retrieval system developed at
the University of Minnesota. I never read the manual on gopher; it is
just totally intuitive to use if you have ever used a computer for anything.
All you have to do is hit return and use the up/down arrow keys, pgup and pgdn
and the 'u' key to go up levels. Otherwise follow instructions and you can't
go wrong. Have fun in 'gopherspace.'
The Veronica server is located off of the root gopher at gopher.tc.umn.edu.
All you have to do is:
1) run gopher. Typing gopher automatically connects to the root gopher
   server at the Univ of Minn: gopher.tc.umn.edu. You can also connect to
   any other gopher server by typing '% gopher host.id.domain' where
   host.id.domain is the real id for the system you are interested in.
      % gopher
2) select menu item 8. Other Gopher and Information Servers/
   Slink down the menu with the down arrow key till you get to number
   8 and then hit return. You can also type 8 and then return.
   (Names ending with a / mean that there is another gopher level
   available when you hit return for that item. Names without a /
   are terminal, in that they point to a resource.)
3) select item 2. Search titles in Gopherspace using veronica/.
4) You will get a list of available search methods. You can experiment
   to find which works better for you. There are some search methods that
   return results that are 'protected' and you can't read. But you won't
   know until you try :>. All of these menu items (ending with a <?>)
   are searchable indexes. If you hit return on them, you will be prompted
   for a search term. When you enter it, blammo, a new gopher level is
   created for you of all gopher items that matched your query.
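Under the hood, a gopher menu is just lines of text: a one-character item
type glued to the title, then a selector, host and port separated by tabs,
with a lone "." closing the list. Type 1 is another menu level (shown with a
trailing /), type 7 is a searchable index (shown with <?>) and type 0 is a
plain file. Here is a rough sketch of how a client could turn that into the
menus you see. The names GopherItem, parse_menu and label are made up, and
the sample selectors and the host veronica.example are invented for
illustration.

```python
from collections import namedtuple

GopherItem = namedtuple("GopherItem", "type title selector host port")


def parse_menu(raw):
    """Parse the text a gopher server sends back for a menu."""
    items = []
    for line in raw.splitlines():
        if line == ".":               # a lone dot ends the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue                  # skip anything malformed
        title, selector, host, port = fields[:4]
        items.append(GopherItem(line[0], title, selector, host, int(port)))
    return items


def label(item):
    """Decorate a title the way the menus in this lesson look."""
    if item.type == "1":
        return item.title + "/"       # another gopher level
    if item.type == "7":
        return item.title + " <?>"    # searchable index
    return item.title                 # terminal item


SAMPLE = (
    "0About this server\t0/about\tgopher.tc.umn.edu\t70\r\n"
    "1Other Gopher and Information Servers\t1/others\tgopher.tc.umn.edu\t70\r\n"
    "7Search gopherspace\t\tveronica.example\t70\r\n"
    ".\r\n"
)

for number, item in enumerate(parse_menu(SAMPLE), 1):
    print(f"{number}. {label(item)}")
```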
WAIS
WAIS is a searching protocol based on the NISO Z39.50 Information Retrieval
standard. It currently exists as a networked set of database servers that
can register with central sites. You search by querying the central site
(now quake.think.com, a centralization that cannot continue for long if the
WAIS server community continues to grow as it has). It returns a set
of servers that you then use in a second query, which returns documents
or referrals to other sources. WAIS, like all of the protocols discussed, is
a client-server system, which means that there are clients that operate on
many different platforms: PCs, Windows, Macs, probably VMS, as well as
several UNIX X front-ends.
Most all of this software is freely distributed in source form (this means
that you have the human-readable program), which means that anyone who knows
the programming language can alter it to suit their needs. This freely
distributed software is cool because it's free, it's a standard that no
corporation has control over, and the most popular serious operating system, UNIX, is
practically free in source form. The only catch is that you cannot use the
code for profit, unless you pay a hefty fee, but no one ever said that
undermining the state was a profit-motivated endeavor, at least as far as
the law is concerned :).
You can ftp to think.com and wais.com for information and sources for WAIS
servers and clients. If you are going to use WAIS to put up content that
is non-profit and contributes to making the state an endangered species, drop
me an e-mail and I can provide some pro-bono consulting. You basically take
your text (and associated images or sounds and stuff) and index it through your
own database system or one that comes with WAIS. You then set up a WAIS
server.
When someone asks a question of a WAIS client, it connects to a server via a
special network place (a port) that the WAIS server has been patiently listening
to. The server spins off a copy of itself to handle the search and returns
to its regimen of listening. The handler process then searches the database
and returns a set of matches, or hits, each of which includes a direct-access
key containing a unique identifier, so you have all the data you need to snarf
the data file in the next transaction.
This way the server can be STATELESS, in that it needn't remember any previous
transaction, because each transaction contains enough data to completely
describe the query. The client puts the hit list on the screen, and the user
then selects some of the hits to retrieve. The client whisks the direct-access
key off to the server, which returns the requested data. This whole
transaction session is described in formal internet protocols so that any
host that has a Z39.50 server can be queried by a user on any computer that
has a Z39.50 client.
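To make the stateless idea concrete, here is a toy sketch in Python. It is
not the Z39.50 wire format and not the real WAIS ranking; the document store,
the word-count scoring, and the names DOCUMENTS, search and retrieve are all
invented for illustration. The point is that search hands back a key with
every hit, and retrieve needs nothing but that key, so nothing has to be
remembered between the two transactions.

```python
# A pretend database: direct-access key -> document text.
DOCUMENTS = {
    "doc-1": "country codes for internet mail addresses",
    "doc-2": "how to use anonymous ftp",
    "doc-3": "gopher and veronica for beginners",
}


def search(query):
    """First transaction: return a hit list, best match first."""
    words = query.lower().split()
    hits = []
    for doc_id, text in DOCUMENTS.items():
        doc_words = text.lower().split()
        score = sum(doc_words.count(word) for word in words)
        if score:
            hits.append({"id": doc_id, "score": score, "headline": text[:40]})
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits


def retrieve(doc_id):
    """Second transaction: the key alone is enough to fetch the document."""
    return DOCUMENTS[doc_id]


hits = search("country codes")
for hit in hits:
    print(hit["id"], hit["score"], hit["headline"])
print(retrieve(hits[0]["id"]))
```

Because neither function keeps anything between calls, any copy of the
server could answer either request.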
JUGHEAD - HTTP
This package has just been announced and it probably won't be functional for
several months yet. Today someone mentioned a jughead for gopher, so it's up
in the air.
The implementation of HTTP, with clients and servers and a hypertext
markup language, makes up the World Wide Web, or WWW, www, W3 or just the web.
The cool part is that you can edit a page in vi, entering an HTML (hypertext
markup language) description of a page that contains 'hot spot' links that
can point to any resource available on the network. One document on anarchy,
for example, can point to resources--sound, video, images, text, data retrieval,
anything--around the world that make a kind of exhibit that can be accessed
by anyone else on the network so equipped. The browsers all incorporate
support for ftp, wais and gopher into their interface, so users need only
learn the HTTP client.
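A 'hot spot' is nothing more than an anchor tag whose href names a resource,
and the href can just as well start with ftp:// or gopher:// as with
http://. As a small sketch, here is Python that writes out such a page. The
function name page and the link titles are made up; the two addresses are
pieced together from hosts and paths mentioned earlier in this lesson and may
not answer any more.

```python
from html import escape


def page(title, links):
    """Build a minimal HTML page whose list items are 'hot spot' links."""
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for text, url in links
    )
    return (
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        f"</body></html>\n"
    )


LINKS = [
    ("Country codes FAQ (ftp)",
     "ftp://bloom-picayune.mit.edu/pub/usenet-by-group/alt.answers/mail/country-codes"),
    ("Root gopher (gopher)", "gopher://gopher.tc.umn.edu/"),
]

print(page("Net resources", LINKS))
```

You could type the same tags into vi by hand; the script only saves the
typing.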
The 'so equipped' part is the problem right now. Most of these clients are
networked applications, in that they do more than put characters up on a
terminal screen like a kermit or procomm type connection does. They
interoperate with special servers that run graphics screens, so you have to
have a network connection. This kind of network connection transmits so much
data (high bandwidth) that most people can't afford it. There are text-based
browsers that peruse this hypertext jungle, but they don't have the glitz of
the X (unix window system) implementation. Soon enough most all of us will have
access to high-speed networking, either at the local library or at home, so
if you're into this, it shouldn't be a problem.
No test afterward. Go out, drink a beer and netnavigate the state away.
Coming soon--encryption.