Archive and Image Files
Quick Resume:- Archive and Image files are specialised storage files that can store the data from a particular source and keep it intact so that its content can be restored at a later time. The term image file relates here to an 'image of data' and has nothing to do with pictures or photos.
Such storage files can be compressed to facilitate storage or data transfer. They can represent a collection of files and folders or alternatively store the underlying raw binary data. How we use the terms image file and archive file is outlined below, as is the concept of File System Containers (FSCs).
Introduction
- Some basic understanding of how data is stored can be helpful (possibly crucial) when creating but especially when restoring from storage files and in deciding which method is most appropriate for one's circumstances. We hope to cover the basics and present an overview of the various software options on this page. Specific software will be covered elsewhere soon but if you want something to get you started maybe have a look at our BiNG Page or try a functional trial of its very straightforward cousin ImageForDOS.
Block Devices and File System Containers (FSCs)
- Confusion is often caused by the variety of terminology used when referring to partitions, volumes, disks, drives, sets, spans and so on. The term logical crops up frequently but can obfuscate rather than clarify what is really going on, since so much of what goes on in software is logical or virtual. A block device (such as a CD, floppy diskette or hard disk) is a binary data-container in which the data is referenced and moved around in blocks of a specific size. Many block devices can also contain a file system and we will refer to these as FSCs.
- Hard disks (Block Devices in their own right) can be sub-divided into areas containing sub-blocks, usually known as partitions. Partitions can be considered as Block Devices in their own right with their own specific block sizes and some partitions may also become FSCs. Operating systems can only utilise or access specific file systems by first mapping these Block-device FSCs (mostly with letters of the alphabet) under DOS/Windows or 'mounting' them under Linux/Unix. The term mounting derives from the time when tape spools were physically mounted on early computers to make their data accessible. Extended partitions and the Master Boot Record on a hard drive are two examples of Block Devices which just contain metadata and are not therefore discrete FSCs.
Archiving -v- Imaging
- The term Imaging (often also called cloning or ghosting) will be used herein to describe the backup of a whole hard disk, floppy disk or CD/DVD, or of individual FSCs that they may contain. Backing-up or packing or making archives of files and folders will be referred to as Archiving rather than Imaging even though there is software that combines both methods. The resultant files created by either method will be collectively called Storage Files hereafter, with Image Files being the result of imaging and Archive Files being the result of archiving. We concede that we cannot be too pedantic in this area but will at times need to differentiate between the different methodologies.
- Archiving is pre-eminently used for data storage and Imaging to allow restitution of whole partitions, CDs etc.
Understanding Metadata and its significance
- Data and metadata are, at the binary level, identical. Metadata is simply that data which is data about data. Outside computers, a file index or a table of contents (TOC) are, for example, types of metadata. In any system there may be many layers or hierarchies of metadata and that is as true of the binary data on storage media as of anything else.
- Some of the metadata relevant to data storage is within and some is outside any FSCs. Often the first and last track (or even a whole cylinder) of a hard disk will contain metadata. This could be the Master Boot Record (MBR) at the very start of a hard drive (which delineates where to find the disk's own primary FSCs) or RAID information stored at the end of a hard drive, which does something pretty similar. An FSC on CDs is basically a TOC and there may be one or more than one created/recreated each time a new session is closed. Special metadata sectors outside any TOC/FSC on a CD can, for example, make the CD emulate a floppy disk or a hard disk and thus augment/modify bootability by pointing to an image file of a floppy or hard disk stored in the data area.
- Within any Partition/FSC of a hard disk some of the metadata is held in its own Partition Boot Sector (PBS) and some in other special areas that map out where within the FSC certain file-data is kept. Typically these include the File Allocation Tables on FAT file systems and the $MFT or Master File Table on NTFS systems. Unless CDs are erased the data on them is never overwritten nor fragmented. Files with changed data are rewritten in total (even on rewritable media) and one or more TOCs are then updated. Flash Memory storage is similar to magnetic media from the user's perspective but there are secondary layers of complexity involved, which we won't go into here just now. We will just say that Flash Memory can however be split into more than one part, where one part might mimic a hard drive and another part a floppy or a CD.
- The final metadata components within the FSCs that do concern us here are the attributes of the files themselves. Things like the name, size, date of creation and so forth. These are stored quite differently within a FAT FSC than within an NTFS or a ReiserFS FSC but always in a distinctly different area from the file data itself. A file can even be empty of data (have 0 bytes) but still have a name and other attributes. Attributes are stored in various proprietary (usually simplified) forms when files are stored in archived formats such as .zip or .rar files.
- Of course files themselves can contain their own specific metadata, often called "headers" or "header information". Storage files are no different unless they are raw image files. Raw images are true clones or literal copies of the original binary data and so have no such file headers. Other storage files will contain header information to allow the original to be reconstructed from any compressed content and maybe to allow for data integrity to be quickly checked. Another approach sometimes used is the construction of two files; one containing the data and one containing the "headers" or other metadata.
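- To make the idea of metadata outside any FSC a little more concrete, here is a minimal Python sketch of reading the partition table held in a classic MBR sector. It assumes the conventional layout (four 16-byte entries starting at byte 446, with the signature bytes 0x55 0xAA at the end of the 512-byte sector). The function names and the demo sector are made up for illustration and it works on a sector built in memory rather than on a real disk.

```python
import struct

SECTOR_SIZE = 512    # size of a classic MBR sector
TABLE_OFFSET = 446   # the partition table starts at this byte
ENTRY_SIZE = 16      # four entries of 16 bytes each


def parse_mbr(sector: bytes):
    """Return (type_id, start_lba, sector_count) for each used table entry."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start_byte = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start_byte:start_byte + ENTRY_SIZE]
        # Layout: status(1) CHS-start(3) type(1) CHS-end(3) LBA-start(4) count(4)
        _status, _chs_a, ptype, _chs_b, start, count = struct.unpack(
            "<B3sB3sII", entry
        )
        if ptype != 0:  # a type of 0 marks an unused slot
            partitions.append((ptype, start, count))
    return partitions


def make_demo_mbr() -> bytes:
    """Build a hypothetical MBR describing one partition of type 0x0B."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack(
        "<B3sB3sII", 0x80, b"\x00" * 3, 0x0B, b"\x00" * 3, 63, 204800
    )
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


print(parse_mbr(make_demo_mbr()))
```

- Note that nothing in these 512 bytes says anything about files or folders. The sector only records where each partition begins and how long it is, which is why it counts as metadata rather than as an FSC.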
Archive Files
- The most straightforward of such files are .zip and .rar files and a variety of Linux tarballs and their ilk. The native Windows MS-Backup, as well as a number of proprietary back-up programs such as SyncBack and SecondCopy, can create similar types of archive files. The specialist program WinImage is very versatile in the range of archives (and images) that it can produce. These utilities can typically copy either small or large numbers of files and folders and then compress and "inject" them into a single archive file along with header information about the file and folder attributes of the content. Some will also contain information about where the original was made from and so forth. When the program that created them is used to restore from the archive it first reads its own header information from the file and should thus be able to restore the content appropriately. Some utilities can make self-extracting archives whereby the content can be automatically restored to a pre-defined or to a different new location.
- Many archiving and back-up utilities of this sort can additionally add and subtract files to and from the archive file. Specialised archive or back-up files can also be created whereby both incremental and differential backup of the relevant data can be made. Others can make multiple copies of files that may have changed in the interim. These more elaborate actions are typically the province of good back-up software, which can usually also be configured to archive or back-up the material automatically on a schedule.
- Many such utilities can additionally encrypt the contents (but if you do this do not forget the password) and break-up large storage files into more manageable chunks that can be stitched back together at restoration time. Bear in mind that you will need to run the same program to restore the files so these are not good ways (in our opinion) to attempt to backup whole systems or the full content of system FSCs. Imaging is in our considered opinion a much superior way of creating storage files that represent the full contents of any FSC or other Block Device. If creating images of system FSCs then we also highly recommend that this is not instituted or even attempted from within the operating system itself.
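- As a small illustration of how an archive keeps file attributes alongside the compressed data, the following Python sketch uses the standard zipfile module to pack two in-memory "files" and then read their attributes back from the archive's own headers. The file names and contents are invented for the example and the helper names are ours, not part of any back-up product.

```python
import io
import zipfile


def build_archive(files: dict) -> bytes:
    """Compress the given {name: data} mapping into a .zip held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def list_attributes(archive_bytes: bytes):
    """Read name, original size, stored size and timestamp from the headers."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [
            (info.filename, info.file_size, info.compress_size, info.date_time)
            for info in zf.infolist()
        ]


# Hypothetical content: one repetitive text file and one empty file.
demo_files = {"docs/readme.txt": b"hello " * 1000, "empty.dat": b""}
archive = build_archive(demo_files)

for name, size, stored, stamp in list_attributes(archive):
    print(name, size, stored, stamp)
```

- The empty file still appears in the listing with a name and a timestamp, which matches the earlier point that a file can hold 0 bytes of data and yet carry attributes.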
Image Files
- Image files can best be thought of as files that represent storage disks as a whole or of any FSCs within them or, in the case of RAID, of FSCs that are partitions within the array itself; the array being composed from a number of disks. Hard disk partitions and CDROMs are the two most commonly imaged media, with the latter's images often being referred to simply as "ISOs", whether or not they have been given the .iso file extension.
- As with Archive Files, Image Files can be compressed or not. Well-performed compression should have no effect on the underlying data when restored but the more compression, the longer such imaging and restoration is likely to take.
- Image files by their very nature are inclined to be very large and so it is likely that they should be split into smaller chunks. The first such chunk would hold the headers referencing all the other necessary chunks in the set and would be the only file used to initiate a restoration. Typical sizes would be say 600MB to fit onto CDs, or just under 2GB or 4GB so that each chunk stays within the maximum file size of a FAT16 or FAT32 partition respectively.
- Imaging can be done in two basic ways: by copying the data sector by sector or by copying it file by file. In the latter case specific attention needs to be given to the FSC's metadata.
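- The splitting of a large image into chunks, with header information that lets the set be checked and stitched back together, can be sketched in Python as follows. This is only an illustration under our own assumptions: the manifest layout (a total size plus one SHA-256 digest per chunk) is hypothetical and does not correspond to the format of any particular imaging program, and the data is kept in memory for brevity.

```python
import hashlib


def split_image(data: bytes, chunk_size: int):
    """Cut data into chunks and build a manifest describing the whole set."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    manifest = {
        "total_size": len(data),
        "chunk_digests": [hashlib.sha256(c).hexdigest() for c in chunks],
    }
    return manifest, chunks


def join_image(manifest: dict, chunks: list) -> bytes:
    """Verify every chunk against the manifest, then stitch them together."""
    if len(chunks) != len(manifest["chunk_digests"]):
        raise ValueError("missing or extra chunks")
    for number, (chunk, digest) in enumerate(
        zip(chunks, manifest["chunk_digests"])
    ):
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError(f"chunk {number} is corrupt")
    data = b"".join(chunks)
    if len(data) != manifest["total_size"]:
        raise ValueError("restored size does not match the manifest")
    return data


# Tiny stand-in for an image: 2560 bytes split into 1000-byte chunks.
original = bytes(range(256)) * 10
manifest, chunks = split_image(original, 1000)
print(len(chunks), join_image(manifest, chunks) == original)
```

- Checking each chunk before joining means a damaged or missing piece is reported by number rather than silently producing a bad restoration.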
Sector by Sector Images
- Imaging any FSC by simply copying each sector inside it means that the imaging software requires no knowledge of either the metadata or of the file system. Thus hard drive partitions formatted with any file system can be imaged and restored using such software. Uncompressed (or raw or literal) images of this sort are basically pure clones of the original. The only time they fail as pure clones is when they contain any bad (and hence uncopiable) sectors. These Images can be compressed either during or after imaging without affecting the ability to restore a complete and identical clone.
- It is because these files always contain a replica of all the metadata that was on the original that they can be used to recreate an identical structure at a later time. They are "snapshots" of FSCs frozen in time. A fragmented partition would still be fragmented for example.
- Such files should be created whilst nothing else accesses that FSC during the imaging of it. Often this is best done by booting to a boot floppy or boot CD containing an imaging utility. We highly recommend this way of doing things because you are extremely likely to need these disks again for any restoration and so making the images in the first place by using the same software that you will later use to restore them has great merit. If data or partition restoration is really important to you please do this using such dedicated boot media and don't rely on such imaging from within your normal operating system.
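- The sector by sector approach can be sketched in a few lines of Python. In this example an ordinary temporary file stands in for a partition (reading a real block device needs administrative rights and, as stressed above, is best done from dedicated boot media). Each 512-byte block is copied into a gzip-compressed image, the image is restored, and SHA-256 digests confirm that the restored copy is identical. The file names and the block size are assumptions made for the demonstration.

```python
import gzip
import hashlib
import os
import tempfile

BLOCK_SIZE = 512  # assumed sector size for this demonstration


def sha256_of(path: str) -> str:
    """Hash a file in pieces so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for piece in iter(lambda: handle.read(BLOCK_SIZE * 128), b""):
            digest.update(piece)
    return digest.hexdigest()


def image_blocks(source_path: str, image_path: str) -> int:
    """Copy the source block by block into a compressed image file."""
    blocks = 0
    with open(source_path, "rb") as src, gzip.open(image_path, "wb") as img:
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            img.write(block)
            blocks += 1
    return blocks


def restore_blocks(image_path: str, target_path: str) -> None:
    """Decompress the image and write the blocks back out unchanged."""
    with gzip.open(image_path, "rb") as img, open(target_path, "wb") as dst:
        while True:
            block = img.read(BLOCK_SIZE)
            if not block:
                break
            dst.write(block)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "partition.bin")
        image = os.path.join(tmp, "partition.img.gz")
        restored = os.path.join(tmp, "restored.bin")
        # Stand-in "partition": 4 blocks of random data, 60 blocks of zeros.
        with open(source, "wb") as handle:
            handle.write(os.urandom(BLOCK_SIZE * 4) + bytes(BLOCK_SIZE * 60))
        blocks = image_blocks(source, image)
        restore_blocks(image, restored)
        return blocks, sha256_of(source) == sha256_of(restored)


print(demo())
```

- The copying code never looks at what the blocks contain, which is exactly why this method works with any file system. A real utility would also need a policy for bad sectors, which this sketch does not attempt.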
File by File Images
- File by file imaging is in essence the creation of an archive file containing (a) all the files and folders of the relevant FSC packed or compressed into it and (b) the metadata needed to recreate a complete or partial copy of it. Because files rather than the underlying data sectors are being manipulated there is more “flexibility” in what can be done with the data.
- It is possible, for example, to zero files such as the pagefile, to allow its complete compression and thus reduce the size of the resulting image file. It is possible, as with archive files, to inject or remove files from this specialised archive and to allow differential and incremental backup files to be included in it.
- It should be relatively easy for the software to restore into different (but appropriately-sized) partitions and even in some instances to allow the restored file format to be different from the original. If the files are copied back into a freshly formatted partition one at a time then any restored partition could also be effectively defragmented at one and the same time.
- Under some operating systems (e.g. WinXP) many locked (in-use) files can be unlocked during imaging and so it is at least theoretically possible to create images of partitions containing files currently in use from within Windows. We personally don’t like this practice and would only ever use it for purely non-system data partitions, with no other programs running and only as a temporary measure.
- There exist elaborate, expensive and user-friendly packages that can do this sort of imaging and if this suits your purposes and encourages you to make back-ups then we won't try to deter you. We would simply reiterate that it is simple to create (or appear to create) backups but it is what you later need to do to restore them that can be the real problem. At the very least we recommend that you run a disk-checking utility before making images of this type. Take note too that because these images are proprietary in nature (unlike the generic sector by sector images), accessing or repairing them is going to be much more complex if the image files themselves become corrupted. Also note that sector by sector image creation and restoration will generally be faster than file by file methods.
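- The earlier remark about zeroing a file such as the pagefile so that it compresses almost completely is easy to check for yourself. The Python sketch below compares the compressed size of one megabyte of random bytes (standing in for an in-use pagefile) with one megabyte of zeros, using the standard zlib module. The sizes and the compression level are arbitrary choices for the demonstration.

```python
import os
import zlib

SIZE = 1024 * 1024  # one megabyte of stand-in data


def compressed_size(data: bytes) -> int:
    """Return how many bytes the data occupies after zlib compression."""
    return len(zlib.compress(data, 6))


random_like = os.urandom(SIZE)  # behaves like a busy, unpredictable pagefile
zeroed = bytes(SIZE)            # the same space after being zeroed

print("random data :", compressed_size(random_like), "bytes")
print("zeroed data :", compressed_size(zeroed), "bytes")
```

- Random data barely shrinks at all whereas the zeroed block collapses to a tiny fraction of its size, so zeroing files whose content need not be kept can make a noticeable difference to the size of the image.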
Image File Uses
- As well as being used to restore your system to an earlier snapshot in time, image files have other uses. These include the ability to easily create loads-n-loads of clones, to reduce the required storage space and to allow quicker electronic transfer of data. Such files can themselves be mounted in an operating system to create "Virtual Drives" that can then function or be "Explored" just like a real drive. In particular using ISOs as virtual drives makes accessing such "CDs" much faster from within the operating system. Raw image files have particular uses in some data recovery situations and for the storage and analysis of data for forensic purposes (e.g. with WinHex and EnCase).
Restoration Practice
- We have already alluded to creating and restoring image files using a specialised or dedicated floppy or CD. If system recovery is going to be vital to you then (regardless of the software you determine to use) do first practice making and restoring images on a brand new or spare disk until you are truly comfortable with the methods. There is nothing worse than believing you can restore your system because you have put everything in place only to find that you cannot or don't have a clue how to restore the backup.