C & G web

Using Compression and Encoding

Whether you know it or not, you are already using compression and encoding. If you send files via email, your email software may compress your file to a fraction of its original size, reducing the time it takes to send your messages and attachments. It probably also encodes the file so that it is transported across the Internet without any problems.

The same is true when downloading a file from an FTP site or a website. The files may have been compressed to save you downloading time.

So how do compression and encoding work, and how do you set about using them?

Let's look at compression first.

What is file compression?
There are two general types of compression:

Lossless file compression works by eliminating or minimising unnecessary data in a file, making your files smaller without losing any information.

Every character on your computer, whether a letter, digit or punctuation mark, is stored as a sequence of symbols in computer code. A simple example of compression is this: if the sequence "AAAADDDDDDD" represents a letter, one type of compression software can rewrite it as "4A7D", saving seven characters and making that line about 64% smaller. This technique is known as run-length encoding. Compression software uses algorithms such as this one to shrink files.
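The "4A7D" example can be tried out in a few lines of Python. This is only a minimal sketch of run-length encoding, not the method any particular compression program uses. The function names rle_encode and rle_decode are made up for this illustration, and the sketch assumes the text being compressed contains no digits.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with '<count><character>'."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))


def rle_decode(encoded: str) -> str:
    """Rebuild the original text from '<count><character>' pairs.

    Assumes the original text contained no digits (an assumption of this sketch).
    """
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts may have more than one digit
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)


original = "AAAADDDDDDD"
packed = rle_encode(original)
saving = 1 - len(packed) / len(original)

print(packed)                     # 4A7D
print(f"{saving:.0%} smaller")    # 64% smaller
print(rle_decode(packed) == original)  # True: nothing was lost
```

Because decoding gives back exactly the text you started with, this counts as lossless compression.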

Lossy file compression makes files smaller but loses data in the process. This is no use if you want to compress data where everything is important, such as a database of clients' addresses. But it is useful if you want to make images smaller and don't mind reducing the image quality slightly.
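To see what "losing data" means, here is a toy Python sketch. It rounds a row of made-up brightness values (0 to 255) down to the nearest multiple of 16. This is purely an illustration: real image formats use far more elaborate techniques, and the function name quantise, the step size and the sample values are all assumptions chosen for this example.

```python
def quantise(samples, step=16):
    """Round each value down to a multiple of `step`, discarding fine detail."""
    return [(value // step) * step for value in samples]


# Hypothetical brightness values for eight neighbouring pixels.
pixels = [200, 201, 203, 198, 57, 60, 61, 58]
reduced = quantise(pixels)

print(reduced)            # [192, 192, 192, 192, 48, 48, 48, 48]
print(len(set(pixels)))   # 8 distinct values before
print(len(set(reduced)))  # 2 distinct values after
```

The reduced row now contains long runs of identical values, which compress very well, but the original values cannot be recovered from it. That is acceptable for a photograph and unacceptable for a database of addresses.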

The GIF file format uses lossless compression. GIFs can use a maximum of 256 colours, which limits their colour range but keeps file sizes small. GIFs are popular for creating graphics on websites, as many can be used on a page while maintaining a reasonable download time.

The JPEG file format uses lossy compression. JPEGs are particularly useful when preparing photographs for a web page, as they can work with 24-bit colour (over 16.7 million colours) and still create file sizes suitable for the Web.


Question

Why would there be more JPEGs on a photographer's website than a cartoonist's website?
  • Cartoons are not serious enough
  • JPEGs will not lose any information so that the photographs look better
  • JPEG's lossy compression technique works better with photographs
  • GIF's small file sizes mean that more photographs can be displayed on the site

ASCII and Binary Files

You may also see the terms ASCII and Binary. At heart, there are only two types of file formats on the Internet.

ASCII (American Standard Code for Information Interchange) is a code that gives every character of the alphabet its own unique number, so that text is stored in the same way on every computer. ASCII files are text files that you can read in any word processor. Files with the extension .txt or .html are both essentially ASCII.

Binary files contain non-ASCII characters. If you display a binary file on your screen, the screen fills with gobbledygook. Compressed files, images and a range of multimedia formats are essentially Binary.
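One rough way to tell the two apart is to check whether every byte in a file falls within the ASCII range of 0 to 127. The Python sketch below does this for two small samples. The function name looks_like_ascii and both samples are invented for this illustration, and the check is only a rule of thumb rather than a guaranteed test.

```python
def looks_like_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0-127)."""
    return all(byte < 128 for byte in data)


# A snippet of HTML, stored as ASCII text.
text_sample = "<html><body>Hello</body></html>".encode("ascii")

# A few arbitrary bytes, some of them outside the ASCII range.
binary_sample = bytes([0x50, 0x4B, 0x03, 0x04, 0xFF, 0x00, 0x9C])

print(looks_like_ascii(text_sample))    # True
print(looks_like_ascii(binary_sample))  # False
```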


Question

Which of the following are the two general file types on the Internet?
  • Binary and ASCII
  • SIT and ZIP
  • Binary and MIME

Compression Formats

A number of different programs can perform different types of compression. Some of these store files in archives:

Archives
An archive is a file that holds a number of compressed files. An archive can contain a single file or many files, which can be in many folders inside the archive. Each archive is a single document.

Most files available on FTP sites, bulletin boards and electronic services like CompuServe are distributed as archives.
There are two main benefits to using archives for electronic file distribution:
- Only one download is required to obtain all related files.
- File transfer time is minimised because the files in an archive are compressed.

Compression types vary between platforms:

PC
Files that are compressed on a Windows PC typically have names that end with .zip. The WinZip software most commonly produces these.

Let's look at one of the major formats in a little more detail.

ZIP files
Zipping is the act of packaging a set of files into a single file or archive called a zip file.

Several popular tools exist for zipping:

  • PKZIP in the DOS operating system.
  • WinZip and NetZIP in Windows.
  • MacZip for Macintosh.
  • Zip and UnZip in UNIX systems.

The result of zipping is a single file with a .zip suffix. Most zip files compress the included files.
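If you have Python installed, its standard zipfile module can create and read zip files, which makes it easy to see an archive at work. The sketch below builds a small archive in memory rather than on disk. The file names and contents are made up for this example.

```python
import io
import zipfile

# Hypothetical files to package: name inside the archive -> text content.
files = {
    "readme.txt": "Thanks for downloading.\n" * 50,
    "docs/manual.txt": "Chapter 1\n" * 200,
}

# Write both files into a single compressed archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, text in files.items():
        archive.writestr(name, text)

# Reopen the archive, list its contents and extract one file.
with zipfile.ZipFile(buffer) as archive:
    for info in archive.infolist():
        print(info.filename, info.file_size, "->", info.compress_size, "bytes")
    restored = archive.read("docs/manual.txt").decode("ascii")

print(restored == files["docs/manual.txt"])  # True
```

The listing shows each file's original size next to its compressed size, and the final check confirms that the extracted file matches the original exactly. Note that one of the files sits in a folder inside the archive, yet everything still travels as a single download.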


Question

An archive is...
  • A Macintosh file format
  • A compressed file or group of files
  • A UNIX file format


Computeach International Ltd

Christopher Ward London Limited
