About File Formats
File Format Basics
As we move further along in the transition from paper-based media to electronic media, the sharing of electronic resources becomes increasingly important. Unfortunately, there is no single standard for all electronic files that guarantees they can be shared. In fact, the number of different electronic formats is quite large. In many cases this is because different file types must contain kinds of information not supported by other formats. In some cases it is a matter of software companies seeking competitive advantage. For users who need to share electronic files, it can sometimes be a challenge to move them "inter-application," or between different software, and "cross-platform," from a PC to a Mac for instance. A newer challenge is moving from proprietary formats to the Web, which is platform independent through the use of HTML.
Often the request to "just send me an electronic file" can be almost meaningless. There may be numerous electronic files that the requester would not even be able to open. This is particularly true in the graphics and print production field, where highly specialized software applications may require proprietary formats to support features not available in common applications. The classic example of this is page layout software. QuarkXPress is the industry standard for building and producing publications. But its feature set is so specialized that there is no way to save a Quark file and send it to someone so that they can open it with Microsoft Word. Another common example is the difference between Corel WordPerfect and Microsoft Word formats. Although each tries to be somewhat compatible with the other through a conversion process, there are still enough differences to make absolute conversion unreliable.
More elaborate ways of achieving file exchange are often required. These usually involve trade-offs and limitations. There are stand-alone file translation utilities that can be very useful if you exchange files with users who have different applications or platforms. For most text conversions and a few graphic conversions, these include software such as DataViz MacLink Plus for Macintosh users and DataViz Conversion Plus for PC users. For graphical formats, Adobe Photoshop has the ability to read and convert a wide variety of raster formats on either platform. Vector graphic format conversions tend to be more problematic due to the complexity of the format. For PC users one solution is IMSI HiJaak Pro.
Fortunately, there are some common formats that are supported by most of the major software applications. We'll discuss the categories related to graphics and text below. These all enable some degree of cross-platform or inter-application compatibility and are the most useful for our purposes.
Graphic Formats
Graphic files can be created and saved using two completely different methods. These are called "vector" and "raster" and may exist either singly or together as part of the same file format.
Vector is a method where graphics are created and stored as "objects" using coordinate geometry. Each object has attributes that govern how it will display. Vector images are usually composed of numerous objects combined to collectively portray the overall image.
One advantage of vector graphics is that they exhibit something called "device independent resolution." This simply means that they will always display or print at the highest-possible resolution since the image is transmitted mathematically to whatever display device is used. It also means that they can be scaled and modified in ways that will always preserve the highest image quality. Fonts are one example of vector graphics. Both Postscript Type 1 and TrueType fonts use vectors, also called outlines, to achieve the best possible display, at any point-size, to any printer. Adobe Illustrator is an example of a vector-based software application. There are numerous others and these are often referred to as "drawing" applications -- as opposed to raster software which is generally referred to as "painting" applications.
Raster graphics are created and stored using "pixels" to describe the image. Like the pieces of a jigsaw puzzle, pixels are the individual bits, or dots, that collectively comprise the larger image. Each pixel has characteristics that affect how the image will look. These characteristics determine things such as whether the image is color, grayscale or line-art, and how much resolution it has. Pixels are how scanners, computer monitors and digital cameras record and display information. Ultimately, even vector graphics get converted to raster images when they are viewed on screen or printed. This process is called "rasterization." The device independence of vectors ensures that this will always happen at the highest pixel resolution the device offers.
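As a rough illustration of rasterization, the sketch below describes a circle as a vector object (a center and a radius, in inches) and samples it onto pixel grids at two resolutions. The function name rasterize_circle and the dpi values are illustrative assumptions, not taken from any particular application.

```python
def rasterize_circle(cx, cy, radius, size_in, dpi):
    """Sample a circle (units: inches) onto a square pixel grid.

    Returns a list of strings, one per pixel row. '#' marks a pixel whose
    center falls inside the circle; '.' marks a pixel outside it.
    """
    pixels = round(size_in * dpi)
    rows = []
    for y in range(pixels):
        row = []
        for x in range(pixels):
            # Convert the pixel center back to inches before testing it.
            px = (x + 0.5) / dpi
            py = (y + 0.5) / dpi
            inside = (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2
            row.append("#" if inside else ".")
        rows.append("".join(row))
    return rows


# The same vector description rendered for two hypothetical devices.
coarse = rasterize_circle(0.5, 0.5, 0.4, size_in=1.0, dpi=8)
fine = rasterize_circle(0.5, 0.5, 0.4, size_in=1.0, dpi=24)

print("\n".join(coarse))
print(len(coarse), "rows at 8 dpi;", len(fine), "rows at 24 dpi")
```

The vector description never changes; only the grid it is sampled onto does, which is why the finer device produces a smoother edge from the same object.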
One strong advantage of raster graphics is that they are much better suited to depicting photographic or highly detailed images. A significant disadvantage is that raster images are much more dependent on resolution, which is tied to their physical size and is often expressed in dots per inch (dpi), and they generally cannot be enlarged without diminishing their quality. Another disadvantage is that very large file sizes are required for high-resolution applications such as printing. Adobe Photoshop is an example of a raster-based software application.
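The link between resolution and file size can be worked out with simple arithmetic. The sketch below estimates the uncompressed size of an 8 x 10 inch image; the 72 dpi and 300 dpi figures are assumed typical values for screen and print, and the function name is hypothetical.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, channels, bits_per_channel=8):
    """Estimate the uncompressed size of a raster image in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_pixel = channels * bits_per_channel / 8
    return int(width_px * height_px * bytes_per_pixel)


# Assumed values: 72 dpi RGB for screen use, 300 dpi CMYK for print.
screen = uncompressed_size_bytes(8, 10, dpi=72, channels=3)
press = uncompressed_size_bytes(8, 10, dpi=300, channels=4)

print(screen)  # 1244160 bytes, about 1.2 MB
print(press)   # 28800000 bytes, about 28.8 MB
```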
A note about extensions: file extensions such as the old DOS style suffixes are not required for Macintosh users but are still very helpful for PC users. It is highly recommended that all users should apply this convention in order to aid file compatibility.
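Extensions are only a naming convention, and many of the formats described below also begin with a recognizable byte signature. As a minimal sketch, the following checks the first few bytes of a file against some well-known signatures. The list is deliberately incomplete, the function names are hypothetical, and a short signature such as BMP's can match unrelated files, so treat the result as a hint rather than proof.

```python
# Leading byte signatures for some of the formats discussed in this document.
SIGNATURES = [
    (b"%!PS", "Postscript or EPS"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"8BPS", "PSD"),
    (b"BM", "BMP"),
]


def sniff_format(header: bytes) -> str:
    """Guess a format name from the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))


print(sniff_format(b"GIF89a\x10\x00"))  # GIF
print(sniff_format(b"%PDF-1.3\n"))      # PDF
```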
Postscript (PS or .ps)
Adobe Postscript is a page description, or graphic imaging language. It is the primary
technology that enabled and advanced the desktop publishing revolution. It is basically
an ASCII text file that contains coded program language that instructs a graphic interpreter,
such as a Postscript enabled printer, how to create an image. Whether it's a page of
type, a vector drawing, a raster image, or any combination thereof, it can be described
by Postscript. Very few applications save files in pure Postscript format, but many
applications include Postscript code as part of their file information. In many applications
Postscript code is created on the fly by the application when needed for printing.
It is also useful as a print-to-disk file that can be downloaded to printers or used
to distill PDF files. Only a few applications can "parse," or create images directly
from Postscript files. Other applications will simply show the code, or do nothing.
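To make the "text file of coded instructions" idea concrete, here is a minimal sketch that builds a tiny Postscript program as an ordinary string. The helper name postscript_line_page and the output filename are hypothetical; the operators used (newpath, moveto, lineto, setlinewidth, stroke, showpage) are standard Postscript operators.

```python
def postscript_line_page(x0, y0, x1, y1, line_width=2):
    """Return a tiny Postscript program that strokes one line on a page.

    Coordinates are in points (1/72 inch) measured from the lower-left
    corner of the page, which is the Postscript default.
    """
    return "\n".join([
        "%!PS",
        "newpath",
        f"{x0} {y0} moveto",
        f"{x1} {y1} lineto",
        f"{line_width} setlinewidth",
        "stroke",
        "showpage",
        "",
    ])


program = postscript_line_page(72, 72, 540, 720)
print(program)

# The program is plain ASCII, so it can be saved like any other text file.
with open("line.ps", "w", encoding="ascii") as handle:
    handle.write(program)
```

Opening line.ps in a text editor shows the code itself; only an application or printer with a Postscript interpreter will draw the line.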
Encapsulated Postscript (EPS or .eps)
You cannot generally view a Postscript formatted image -- it is a text file. However,
when importing Postscript files into other applications, it is always useful to be
able to see a representation of the image. Encapsulated Postscript is a variation
of Postscript that also contains a preview image. This is by far a more common and
useful version of Postscript and can be used by many applications as an exchangeable
format. It is a highly recommended vector format.
Tagged Image File Format (TIFF or .tif)
TIFF is a high-resolution raster image file format that supports RGB, CMYK, grayscale
and bitmap images. It is a very widely used format that is supported by most graphic
applications and is ideal for cross-platform and inter-application exchange. It is
also a highly favored format for photographs to be printed. TIFF does not use file
compression in the base standard. However, it does work well with certain types of
lossless compression schemes such as LZW. When in doubt, make it a TIFF.
Joint Photographic Experts Group (JPEG or .jpg -- also known as JFIF)
JPEG is a very popular and useful format for raster images on the Internet. It is
platform independent and can be exchanged among different computers. It supports both
RGB and CMYK color modes and uses compression to minimize file sizes. The level of
compression is configurable and can be optimized to maintain better image quality
or smaller file sizes. In some cases it may be used for images intended for print.
However, this must be done carefully since JPEG's compression is lossy, meaning it
physically alters the image permanently by discarding information. This is more critical
for print images. Moreover, continual re-saving in the JPEG format can compound the
degradation. As a general rule, JPEG is best suited for Internet use.
Graphics Interchange Format (GIF or .gif)
This is the most popular format for raster images on the Internet. It is platform
independent and can be exchanged among different computers. It only supports RGB color
mode and must index colors to a color table. Consequently, although it does not support
the highest image quality, it is an extremely efficient compressed format and displays
flat color especially well. Compression is somewhat configurable by modifying the
depth of the color table. In no case may the table contain more than 256 colors. It
is not a useful format for working in print and is best suited exclusively to the
Web.
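The idea of indexing colors to a table can be sketched in a few lines. This is only an illustration of the principle: the function name is hypothetical, and a real GIF encoder would also reduce an image with too many colors to fit the table and compress the resulting indexes, both of which are skipped here.

```python
def index_colors(pixels, max_colors=256):
    """Convert a list of (r, g, b) pixels into a color table plus indexes."""
    table = []     # unique colors, in order of first appearance
    lookup = {}    # color -> position in the table
    indexes = []   # one small integer per pixel
    for color in pixels:
        if color not in lookup:
            if len(table) >= max_colors:
                raise ValueError("image needs more colors than the table allows")
            lookup[color] = len(table)
            table.append(color)
        indexes.append(lookup[color])
    return table, indexes


red, white, blue = (255, 0, 0), (255, 255, 255), (0, 0, 255)
table, indexes = index_colors([red, red, white, red, blue, white])

print(table)    # [(255, 0, 0), (255, 255, 255), (0, 0, 255)]
print(indexes)  # [0, 0, 1, 0, 2, 1]
```

Each pixel is stored as a small table index instead of a full RGB value, which is part of why flat-color artwork with few distinct colors suits this kind of format.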
Device Independent Bitmap (BMP or .bmp -- also known as DIB)
This is the standard Windows format for raster graphics. It is not a preferred production
format since it does not support CMYK color. But due to the prevalence of Windows it is
a format that is frequently encountered. Professional graphic producers will typically
convert these to other formats, such as TIFF, for the final production. BMP does support
RGB, grayscale and bitmap modes.
Portable Document Format (PDF or .pdf)
This is a hybrid format based on Postscript that can contain both raster and vector
information. It provides the highly compatible exchange of all kinds of documents
across virtually all platforms. It is based on Adobe's Acrobat system of tools. In
order to view PDFs the user must have the Adobe Acrobat Reader and associated operating
system files installed. The reader and its Web browser plug-ins are freeware and are
available from the Adobe Web site. To create PDFs a special printing driver or separate
conversion utility is required. This software is called Acrobat Exchange or Acrobat
Distiller and is only available for purchase.
PDF is an extremely useful format for what it does, but because it has to do so many things, it does have limitations. One of these is a limited ability to edit or modify files once created -- a very important aspect of print production workflow. It is a highly configurable format whose compression can range from low to high, depending on the fidelity required for the exchange. Acrobat uses a font substitution scheme that may allow for some variation in appearance for the sake of file size. When working in professional print production, high fidelity requirements generally result in large file sizes. PDF is rapidly becoming a new preferred standard for dealing with professional pre-press work. It is not a recommended format when you need someone to be able to edit your file.
Photoshop Document (PSD or .psd)
Although the Photoshop format is a proprietary format, due to its prevalence as a
standard image-editing application, it is an extremely useful format for exchange.
The native PSD format is cross-platform and can be used by both Macs and PCs. The
PSD format is primarily a raster format that provides extended support for features
such as layering, alpha channels, true spot color, vector type and paths, and many
other features. Converting a PSD to another raster format will "flatten" or disable
many of these features. It is a highly recommended format where professional image
editing may be required.
Text Formats
Word Document (DOC or .doc)
Due to the prevalence of Microsoft Word, its document format has become a de facto standard for text exchange. This is a cross-platform format that can easily be shared between Mac and PC users. Some consideration must be given to font selection, since fonts are not transported with the file and may not be available to other users. Most print production shops will translate it into other formats for final production.
Plain Text (.txt)
Plain text is a bare bones, text-only format that virtually every word processing
application supports. It uses ASCII encoding for text characters and does not support
any internal formatting for font selection or styles. It is a workhorse for text file
exchange. When in doubt, make it Plain Text.
There are also a couple of variations of Plain Text that can be useful when dealing with certain spreadsheet or database files. These include Tab-delimited Text and Comma-delimited Text. Both of these are still basic Plain Text files.
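As a small sketch of these two variations, the example below writes the same table as comma-delimited and tab-delimited text using Python's standard csv module, then reads the tab-delimited version back. The sample rows and the helper name to_delimited are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
rows = [
    ["name", "format", "type"],
    ["logo", "EPS", "vector"],
    ["photo", "TIFF", "raster"],
]


def to_delimited(rows, delimiter):
    """Serialize rows as delimited plain text and return it as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


comma_text = to_delimited(rows, ",")
tab_text = to_delimited(rows, "\t")

print(comma_text)

# Reading the tab-delimited text back recovers the original rows.
parsed = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
print(parsed == rows)  # True
```

Both results are still ordinary Plain Text; only the character that separates the fields differs.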
Rich Text Format (RTF or .rtf)
RTF is a more robust ASCII text format that also supports some internal text styles
and formatting, such as color, bold, italic, tabs, etc. RTF is not as widely supported
as Plain Text.
WordPerfect (WPD or .wpd)
This is a proprietary word processing format. Due to limited compatibility features
between Microsoft Word and WordPerfect, it is still somewhat viable as an exchange
format. Most print production shops do not use WordPerfect and will translate it into
other formats for final production.
Portable Document Format (PDF or .pdf)
PDF is also a viable text exchange format. For more information, see the PDF description
in the above graphic section.
Hypertext Markup Language (HTML or .html - also .htm)
HTML is the file format for Web pages. It is essentially a Plain Text format with
special coded tags that control the display of text and graphics on a Web page. The
key to HTML is the tags and understanding how to use them. There are numerous resources
for this available on the Web. HTML files can be created manually using text editors
or automatically by applications that create Web pages. Making sense of pure HTML
pages can be difficult for non-programmers, though the plain text can be viewed by numerous
applications, including the Web browsers themselves. By far the most common use for
these files is for Web browsers to parse into formatted pages. Except for a few specialized
cases, this is not a very useful format for file exchange.
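The point that HTML is Plain Text with coded tags can be shown with a short sketch that separates the tags from the text they mark up. It uses Python's standard html.parser module; the sample page and the class name TextExtractor are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect tag names and visible text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip runs that contain only whitespace.
        if data.strip():
            self.text.append(data.strip())


# A hypothetical one-line Web page.
page = ("<html><body><h1>File Formats</h1>"
        "<p>HTML is <b>plain text</b> with tags.</p></body></html>")

parser = TextExtractor()
parser.feed(page)
parser.close()

print(parser.tags)            # ['html', 'body', 'h1', 'p', 'b']
print(" ".join(parser.text))  # File Formats HTML is plain text with tags.
```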