October 1993 "Mining the Internet" column, The Computing Teacher

[Electronically reprinted with permission from The Computing Teacher journal, published by the International Society for Technology in Education.]

The information on this page is provided for archival purposes only. Most of the links that it contains have expired. More recent articles on similar topics can be found here: http://ccwf.cc.utexas.edu/~jbharris/Virtual-Architecture/Foundation/index.html .

Electronic "Packaging:" File Types for File Transfer

by Judi Harris

Have you ever thought about how many different types of packages are delivered by surface (non-electronic) mail? Among others, there are postcards, letters in small envelopes, longer documents in larger envelopes, small objects in padded mailers, larger objects in boxes stuffed with newspaper, and even larger objects in larger boxes cushioned with Styrofoam. All are carried by government-supported postal systems and various commercial delivery services on every business day.

Packages of electronic information (text, graphics, sound, and computer code) are similarly carried in different types of "containers" on the Internet. These electronic packages can travel attached to electronic mail messages, in response to a request for information from a Gopher, or as a result of an FTP file transfer command issued while visiting a public file archive. (For more information on Gopher tools, please see the August/September 1993 "Mining the Internet" column; for more information on FTP file transfer, see the December/January and February 1993 columns in the same series.) In order to be able to use the information included in these files, one must be able first to recognize the type of container in which it arrived, and then know how to "open" the container without damaging its contents.

Electronic Package Types

Think of an electronic mail message as a postcard. When you receive a postcard in your mailbox, all that you need to do to access the textual information that it contains is to read the writing. No special container must be dealt with before you can read the text. Although the process by which you access your electronic mail may not be as uncomplicated as picking up a postcard and reading it, remember that once you have launched the electronic mail program that you use, the information contained in the messages that you have received is directly accessible and (hopefully) comprehensible. The same is true with publicly posted newsgroup articles, electronic bulletin board messages, and computer conference items and responses; imagine postcards tacked to a large bulletin board in a public place.

Many of the publicly accessible files of information that are available on the Internet are "packaged" like postcards in the surface mail; they do not require any extra effort to open because the information that they contain is in a standard text format that is readable on any computer platform. This format is called ASCII text; ASCII is an acronym that stands for "American Standard Code for Information Interchange." Most of the files that are accessible via Gopher tools, and many files that are available for anonymous transfer from FTP archives, exist in this standard text format and, once transferred, require no special procedures to open and read. They can be viewed online or as a document displayed with a word processing program. Usually, files that exist in text format in archives on the Internet have filenames that end in one of the following extensions:

     .txt      (Example:  alice27a.txt)
     .text
     .doc
     .ascii
     .vox 
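
A filename extension is only a hint, so it can be useful to look at the contents as well. The following is a minimal Python sketch, not part of any Internet tool, that treats a file as ASCII text if every byte in a sample is a 7-bit character. The function names and the 4096-byte sample size are assumptions chosen for illustration.

```python
def looks_like_ascii(data: bytes) -> bool:
    """Return True if every byte is 7-bit ASCII and none is a NUL byte."""
    return all(b < 128 and b != 0 for b in data)


def file_looks_like_ascii(path: str, sample_size: int = 4096) -> bool:
    """Check only the first sample_size bytes of the file at path."""
    with open(path, "rb") as handle:
        return looks_like_ascii(handle.read(sample_size))
```

A file that passes this check can usually be read as is; a file that fails it is probably one of the binary types described below.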

Now think about a letter that is delivered in an envelope. In order to access the information contained in the text of the letter, you must first open the envelope. If the envelope is made out of lightweight paper, you can use your fingers to extract the information from the package. But if the container is made of a thicker material, a cutting tool is probably needed to assist in opening the parcel. If there is packaging material that surrounds the object (information) that has been sent in the parcel, that, too, must be removed before the information can be viewed in a way that is understandable to you.

Binary Files

The files available in Internet archives that are not text files are considered to be binary files. Binary files can be specially formatted text files, such as those that contain PostScript printing commands (with filenames often ending in .ps), graphics files, sound files, video files, or software files. Some binary file filenames end in .bin. These are typically relatively small files that contain computer programs specific to a particular microcomputer type, such as the Macintosh. Smaller software programs that run on IBM and IBM-compatible microcomputers are often stored in files whose names end in .exe. Graphics images that are contained in binary files have names that end with many different extensions, each indicating the format of the image. Some of the more common graphics file types include those whose names end with:

     .gif
     .tiff
     .jpeg
     .pict

Sound bytes that are contained in binary files have names that can end with .wav (WAVE files) or .voc (Sound Blaster files). Compressed video sequences, such as those created with Macintosh QuickTime software, also have special filename extensions, such as .qt (although many have been encoded as .hqx files; see below). Sound, video, and graphic image files are relatively large.
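
The extensions mentioned so far can be gathered into a simple lookup table. This Python sketch uses only the extensions named in this column; the category labels and the function name guess_kind are illustrative assumptions, and a real archive may use many other extensions.

```python
import os

# Extensions named in this column, grouped by the kind of file they suggest.
EXTENSION_KINDS = {
    ".txt": "text", ".text": "text", ".doc": "text",
    ".ascii": "text", ".vox": "text",
    ".ps": "PostScript",
    ".bin": "program", ".exe": "program",
    ".gif": "graphics", ".tiff": "graphics",
    ".jpeg": "graphics", ".pict": "graphics",
    ".wav": "sound", ".voc": "sound",
    ".qt": "video",
}


def guess_kind(filename: str) -> str:
    """Guess a file's kind from its extension, ignoring letter case."""
    _, extension = os.path.splitext(filename.lower())
    return EXTENSION_KINDS.get(extension, "unknown")
```

For example, guess_kind("alice27a.txt") reports a text file, while a name with no recognized extension is reported as unknown.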

The contents of binary files usually cannot be viewed online or seen in their entirety with a word processor. Instead, they must first be "unpackaged" with a special file transfer process or another piece of software that is specific to the file type. This can be as simple as printing a .ps file from a word processing program that has printer drivers for PostScript printers, or as complex as downloading and uncompressing software that is used to view .gif images, in addition to downloading the images themselves. In the same way that packages received by surface mail require different procedures to separate the items being sent from the packaging material included to protect those items while the parcel was in transit, binary files transferred via the Internet require different procedures to "unpackage" the information that they contain, making it available for our use.

Binary File Transfers

How are binary files transferred from public archives to our personal computers? A portion of the answer to that question depends upon the type of connection that you are using to access your Internet-based account. Since most readers at this point in time use modems connected to regular voice telephone lines, let's envision file transfer as a two-step process:

1. First, the file that you want to have must be transferred from the FTP archive, Gopher site, or electronic mail message attachment to your filespace in your Internet account,

2. and then it must be downloaded from your account space to your personal computer's hard drive or a floppy diskette.

For those of you lucky enough to have a direct connection (usually involving a unique I.P. address) from your personal computer to the Internet (usually via an Ethernet connection or other type of high-speed local area network), file transfer can occur in just one step, from the Internet source directly to your personal computer.

The default file transfer method enabled at most Internet-accessible sites is a method used for text files. Therefore, if you want to transfer any type of binary file from an Internet site, you must issue a command before the file transfer begins that will allow the file to be transferred as it exists at the site, rather than as a text file. (If a binary file is transferred as a text file, it will be unusable.)
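
Why does a text-mode transfer damage a binary file? One reason is that text mode may rewrite line-ending characters to suit the receiving computer. The Python sketch below imitates that single change (it is a simulation under that assumption, not the behavior of any particular site) and shows that text survives it while binary data does not. The function name and sample bytes are hypothetical.

```python
def simulate_text_mode_transfer(data: bytes) -> bytes:
    """Imitate one common text-mode change: rewrite LF line endings as CR LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


text_file = b"Dear reader,\nHello.\n"
# Made-up binary contents in which the byte value 0x0A happens to appear twice.
binary_file = bytes([0x47, 0x49, 0x46, 0x0A, 0x00, 0xFF, 0x0A, 0x10])

received_text = simulate_text_mode_transfer(text_file)
received_binary = simulate_text_mode_transfer(binary_file)

# The text is still readable; only its line endings changed.
# The binary data gained extra bytes, so it no longer matches the original.
print(len(binary_file), len(received_binary))
```

In a picture or program file, a byte that happens to look like a line ending is really part of the data, so even this small change leaves the file unusable.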

Fortunately, it is simple to set the mode for file transfer to accommodate files stored in binary format. Once a non-interactive connection to an FTP archive has been established and a binary file that you would like to transfer has been located in a specific subdirectory path, send a binary or i command to switch the file transfer mode to binary before issuing the get command to obtain a copy of the file. The remote site will respond to confirm the mode switch.

ftp> binary
200 Type set to I.

The mode will remain as binary for the rest of that session unless it is switched back to text mode with the ascii command.
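
The same choice between text and binary mode can be made from a program. Here is a minimal sketch using Python's standard ftplib module, in which retrbinary requests a binary transfer and retrlines requests a text transfer. The function names, the list of text extensions, and the idea of choosing the mode from the filename are assumptions made for illustration.

```python
from ftplib import FTP

# Extensions treated as text in this sketch; everything else goes as binary.
TEXT_EXTENSIONS = (".txt", ".text", ".doc", ".ascii")


def is_text_name(filename: str) -> bool:
    """Return True if the filename ends in one of the text extensions."""
    return filename.lower().endswith(TEXT_EXTENSIONS)


def fetch(host: str, directory: str, filename: str) -> None:
    """Anonymous FTP download, choosing the transfer mode from the filename."""
    with FTP(host) as ftp:
        ftp.login()                      # no arguments means anonymous login
        ftp.cwd(directory)
        if is_text_name(filename):
            # Text mode: lines arrive without their line endings.
            with open(filename, "w", encoding="ascii", errors="replace") as out:
                ftp.retrlines("RETR " + filename,
                              lambda line: out.write(line + "\n"))
        else:
            # Binary mode: bytes are written exactly as received.
            with open(filename, "wb") as out:
                ftp.retrbinary("RETR " + filename, out.write)
```

A call such as fetch("ftp.example.org", "pub/images", "photo.gif"), where the host and path are placeholders, would retrieve the file in binary mode.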

Now that the file is stored in the proper format in your account space, it must be downloaded to your personal computer's hard drive or a floppy diskette before it can be read with your word processor and printed with a PostScript printer. How this process is accomplished depends upon the type of telecommunications software that you are using and the provisions for file transfer that your Internet account managers have made. Although you will have to consult the software manual for your telecommunications package, and perhaps the documentation supplied by the computer center for the specific procedures to follow for downloading, please remember that this second-step file transfer must also be completed in binary mode for the information in the file that you have retrieved to be viewable.

A Special Case: .hqx (BinHex) Files

As microcomputer software functions continue to evolve, an increasing number of packages support the creation of documents that combine text and graphics, or include specially formatted text (such as text written with different fonts, boldfacing, underlining, italics, or different colors, and text formatted like the text that you see on this page). These documents, though they appear to be primarily textual, must be transferred as binary files if the special formatting is to be conserved and if the graphic images are to be included. As you saw in the example above, this is not a problem if an FTP file transfer can be completed. But what if one person with an electronic mail account wants to send a binary file to another person with an electronic mail account? On some Email systems, files can be attached in their original forms to Email messages, then sent intact to addressees. But if this feature is not available to either of the correspondents, how might the direct exchange of a binary file take place?

The answer lies in a powerful piece of freeware (freely copyable and distributable software) developed by Yves Lempereur for the Macintosh called BinHex. This program takes Macintosh files created with word processors, desktop publishing programs, graphic generators, sound generators, etc. and encodes them into files of text characters, which can then be sent via text-based services, such as electronic mail, or transferred to and from FTP archives in text mode. These text characters mean nothing to the human eye, as the following sample from a BinHexed file illustrates.

!%rm!#3%-J2S!!"2r!!N"$,rkr`$6r`!0#3br[(J+[ir#L*2r!!d*$+%4Jd!H-m$
)Nrm!$3N-U5BidlMmb@L6r`!0#3bj-Iiririm2j2r!!d*$,"Rrm%2rj`NNrm!$3N
-XCrrrRrra#66r`!0#3bbFrrahrrbIP2r!!d*$+6*rmH(rrQa8rm!$3N-Upcr(lR
rr-(6r`!0#3bhXaMrTRrrB02r!!d*$,FK`cqK2rqKdrm!$3N-X2c1[k#2rja6r`!
0#3bR"MkrZqIrcP2r!!d*$,airVqlmrrN8rm!$3N-ZF2q[lJ0rr66r`!0#3bhMrk
rNp,rmp2r!!d*$,EcrVqf5hrldrm!$3N-V"Rq[k`T2rh6r`!0#3bSK2krV-QIrG2
r!!d*$+[NrVqb'Frmdrm!$3N-SMEq[l2darl6r`!0#3bQ9[krZH6ArY2r!!d*$+E
QrVq`""[qdrm!$3N-U!Eq[k$6ZIl6r`!0#3bN4[krZC(crY2r!!d*$,I1rVqI8%c
qdrm!$3N-YJVq[jM6Q2l6r`!0#3b5@[krN!$6*[l6r`!0#3bD@[krN9l)IY2r!!d
*$*QbrVq4'E*qdrm!$3N-R+,q[j3bC2l6r`!0#3b5f[krQ1bEI02r!!d*$*0#rVq

But once the full document is received and downloaded, it can then be decoded with the BinHex program to reveal its true nature.
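
The general idea of turning binary data into text characters and back again can be tried in a few lines of Python. Please note that this sketch uses the base64 module from Python's standard library as a stand-in; base64 is a different encoding from BinHex, and it is shown here only because it follows the same principle.

```python
import base64

# Stand-in for the contents of any binary file: every possible byte value.
original = bytes(range(256))

# Encode into printable ASCII characters, broken into lines.
encoded = base64.encodebytes(original)

# The recipient reverses the process to recover the exact original bytes.
decoded = base64.decodebytes(encoded)

print(encoded[:40].decode("ascii"))
```

The encoded form is larger than the original, which is the price paid for being able to send it through a text-only system.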

Here's an example of a word processed document that incorporates graphics.

Please Notice

...that all formatting, font styles, etc. are preserved, even though this file just "travelled" to your electronic mailbox via a text-only system.

Given its powerful functions, BinHex is a remarkably small program, available at many Internet sites as an uncompressed, plain binary file. Several of these archive addresses and the subdirectory paths for each that will lead you to copies of the BinHex software are:


     FTP Address              Subdirectory Path

     mac.archive.umich.edu    mac/util/compression
     ftp.cs.umn.edu           pub/mac/util/compression
     tamu.edu                 pub/archivers/mac
     nysernet.org             israel/software/macintosh

IMPORTANT: When transferring binary software files to a Macintosh personal computer, the MacBinary option in the telecommunications software must be selected.

Please note that many of the BinHexed files that are publicly available on the Internet were encoded with version 4.0 of the software, rather than version 5.0, so the older version should be used first. Also, please be aware that some BinHexed files have additional, readable text inserted at the beginnings of the documents that can cause error messages to be generated by BinHex when it is asked to decode them. If that happens to you, load the BinHexed file that you are attempting to decode into a word processor, delete the readable text at the beginning of the file, leaving the rest of the file intact, and save the document with the same filename (ending in .hqx) in a "text only" file before attempting again to decode it with the BinHex program.
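
The cleanup step described above can also be done with a short program. This Python sketch assumes that the file carries the marker line that BinHex 4.0 files normally begin with, "(This file must be converted with BinHex 4.0)", and discards everything before it. The function name strip_preamble is hypothetical, and files produced by other versions may not contain this marker.

```python
# Beginning of the line that normally opens a BinHex 4.0 file.
MARKER = "(This file must be converted with BinHex"


def strip_preamble(hqx_text: str) -> str:
    """Drop any readable text that precedes the BinHex marker line."""
    start = hqx_text.find(MARKER)
    if start == -1:
        raise ValueError("no BinHex marker line found")
    return hqx_text[start:]
```

The result can then be saved as a "text only" file with the same .hqx filename and handed to BinHex for decoding.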

Encoded vs. Compressed Files

The binary files that have already been discussed in this article can be considered to have been encoded so that they can be transferred via Internet links. As you (hopefully) have seen, it is necessary to know how these files have been "packaged" to deduce how to "unwrap" them so that the information that they contain can be made comprehensible to us.

Many files that are publicly available on the Internet are too large to merely be encoded before storing them at FTP archives or Gopher sites. Files that are this large would take up too much disk space at the archive site, and due to their size, would take much too long to be downloaded with most modems and telephone line connections.

This is why these larger files (which are often software files) are first compressed, or encoded so that they become smaller in size, before they are stored in Internet archives. These "smaller but heavier" electronic packages can then be transferred to your personal computer's hard drive or floppy diskette, but...you guessed it...the compressed file must then be uncompressed before the information in the file can be used.
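
The effect of compression is easy to see with a small experiment. This Python sketch uses the standard zlib module purely to illustrate the idea; it is not the software that produced any particular archive file, and the sample data is made up.

```python
import zlib

# Repetitive sample data, which compresses especially well.
original = b"The Computing Teacher " * 200

compressed = zlib.compress(original)    # smaller, but not directly usable
restored = zlib.decompress(compressed)  # must be uncompressed before use

print(len(original), len(compressed))
```

The compressed bytes are much smaller than the original, yet they mean nothing until they have been uncompressed again.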

Some compressed files can be downloaded (in binary format, of course) with "built-in" software that can be used to uncompress the file once it is on your personal computer's hard drive or floppy diskette. Names for files such as these typically end in .sea (which stands for self-extracting archive). Once downloaded, launching the program (using a Macintosh, this means double-clicking on the program icon) will produce step-by-step interactive instructions that will lead you through a simple uncompression sequence.

To uncompress most files retrieved from Internet archives, though, special additional software, also available via the Internet, must be used. The filename extensions (such as .zip, .tar, .sit, and .cpt) for these compressed files give clues as to which piece of software (usually also compressed) must be used to uncompress the file. These file types, how to find the software that can be used to uncompress them, and more will be addressed in the next "Mining the Internet" column.

Until then, dear readers, transfer away! ...but please wait to "open" some of your electronic packages until next month, when the many mysteries of file decompression will be revealed to you right here in The Computing Teacher.

[Judi Harris, jbharris@tenet.edu; Department of Curriculum and Instruction; 406 Education Building; University of Texas at Austin; Austin, TX 78712-1294.]

Other "Mining the Internet" columns are available on the Learning Resource Server at the College of Education, University of Illinois, Urbana-Champaign.