mikekunz.com

Using Image File Headers To Verify Image Format

A good method of identifying the format of an image file is to read the image file header. A Clarion code sample is included.

Published May 15, 2007 by mjkunz
Last updated on Jul 28, 2008

As a standard, most image files have a header or marker segment that allows you to determine the image type. You can always look at the file extension, but it is not a reliable way to determine the image type, especially if the extension has been changed. For example, suppose you have a software application for uploading images, and the only supported format is JPEG. Most likely some end user will try to upload an image in a different format, probably by renaming its file extension. Depending on what your application checks and needs to do with the image, this can cause headaches. A simple solution is to have your software read the image header to determine and verify the real image format.

There are all kinds of information in the header of an image file. Fortunately, the bytes that identify the image type are at the very beginning of the file. Here is the header information to look for to identify some of the more common image formats.

JPEG

The JPEG image format is probably the most widely used compressed image format. Strictly speaking, JPEG is a compression method, and the compressed data is stored in one of several file formats. The most common are JFIF (JPEG File Interchange Format) and Exif. JFIF is a simple, minimal file format and is considered the standard for most applications that read and write JPEG files.

When we look at a JPEG header, there are several parts we can use to identify the type of image and the format used.

- The first part to look at is the first two bytes of the file. The hex values FF D8 (the Start Of Image marker) identify the start of the image file. This is often enough to know that you have an actual JPEG file.
- The next two bytes are the application marker, typically FF E0 (APP0, used by JFIF). This marker can change depending on the application used to modify or save the image. I have seen this marker as FF E1 (APP1, which holds the Exif data written by most digital cameras) when pictures were created by Canon digital cameras.
- The next two bytes give the length of the application segment. The value varies, so skip them.
- Then read the next five bytes to identify the application segment. Normally this is 4A 46 49 46 ("JFIF") followed by 00 to terminate the string. Using the previous example of Canon digital cameras, the string will instead be 45 78 69 66 ("Exif") followed by 00.

Most image editors handle all of these JPEG variants unless a proprietary format is used that does not follow the JPEG standard.

Here is a quick summary. Look for the following byte sequence at the beginning of the file, but remember that the application marker (and with it the identifier string) can vary: FF D8 FF E0 <skip two bytes> 4A 46 49 46 00

Byte   1    2    3    4    5     6     7    8    9    10   11
Hex    FF   D8   FF   E0   Skip  Skip  4A   46   49   46   00
Char   ÿ    Ø    ÿ    à    Skip  Skip  J    F    I    F    NUL

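As a rough illustration of the byte sequence above, here is a minimal Python sketch. Python is used only for illustration; the write-up's own example is in Clarion. The function name `jpeg_app_segment` is hypothetical, and the sketch assumes the first few bytes of the file have already been read into a `bytes` object. It does the following:

- checks the FF D8 marker;
- accepts any application marker from FF E0 to FF EF;
- skips the two length bytes;
- returns the marker byte together with the zero-terminated identifier.

```python
def jpeg_app_segment(header: bytes):
    """Return (marker_byte, identifier) for a JPEG header, or None.

    Hypothetical helper: marker_byte is 0xE0 for APP0 (JFIF), 0xE1 for
    APP1 (Exif), and so on; identifier is the zero-terminated string.
    """
    # Bytes 1-2: FF D8, the start-of-image marker.
    if len(header) < 6 or header[:2] != b"\xff\xd8":
        return None
    # Bytes 3-4: an application marker, FF E0 through FF EF.
    if header[2] != 0xFF or not 0xE0 <= header[3] <= 0xEF:
        return None
    # Bytes 5-6 hold the segment length, which varies, so skip them.
    # From byte 7 on, read up to the terminating zero byte.
    end = header.find(b"\x00", 6)
    if end == -1:
        return None
    return header[3], header[6:end].decode("ascii", errors="replace")
```

Some files start with FF D8 but have a different segment first. This sketch returns None for them even though they are still JPEG files, which is why the plain FF D8 check remains the more forgiving test.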

TIFF (Tag Image File Format)

The TIFF image format was designed to become a standard in image file exchange. Even though it is widely used, it never became the universal standard that was envisioned; nowadays you will most commonly see it used by document scanners. The image header for a TIFF image is a fixed 8-byte segment that always occurs at the beginning of the file.

So that TIFF images can be read properly by both PCs (Intel processors) and Macintosh computers, the header indicates a byte order in its first two bytes. These are either hex 49 49 (II) for Intel (little-endian) byte order, or 4D 4D (MM) for Motorola (big-endian) byte order, which early Macintosh processors used. The next two bytes hold the number 42 (hex 2A) as a 16-bit integer in that byte order. They are therefore 2A 00 in an Intel file and 00 2A in a Motorola file. This number never changes; the TIFF 6.0 specification describes it as "an arbitrary but carefully chosen number" that further identifies the file as TIFF. The remaining four bytes of the header give the offset of the first image file directory, and we can ignore them. For our purpose of identifying the image format, the first four bytes are all we need to look at.

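Intel (II) byte order
Byte   1    2    3    4
Hex    49   49   2A   00
Char   I    I    *    NUL

Motorola (MM) byte order
Byte   1    2    3    4
Hex    4D   4D   00   2A
Char   M    M    NUL  *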

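The byte-order rule can also be sketched in Python. This is a minimal sketch that assumes at least the first four header bytes are available; `tiff_byte_order` is a hypothetical name. Rather than comparing against both fixed patterns, it reads the byte-order mark first. It then decodes bytes 3-4 as a 16-bit integer in that order, and the result must equal 42.

```python
def tiff_byte_order(header: bytes):
    """Return "little" for an II (Intel) TIFF, "big" for MM (Motorola), else None."""
    if len(header) < 4:
        return None
    # Bytes 1-2: the byte-order mark.
    order = {b"II": "little", b"MM": "big"}.get(header[:2])
    if order is None:
        return None
    # Bytes 3-4: the number 42, stored in the byte order just identified.
    if int.from_bytes(header[2:4], order) != 42:
        return None
    return order
```

Decoding with the file's own byte order means a Motorola file is not rejected just because its 42 is stored as 00 2A instead of 2A 00.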

Bitmap (BMP)

The bitmap graphics format is usually uncompressed. Because of this, a bitmap file uses more storage space than a file in a compressed image format such as JPEG. The first two bytes of the bitmap file header, 42 4D ("BM"), identify its image format.

Byte 1 2
Hex 42 4D
Char B M

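Going one small step beyond the two-byte check: a BMP file header (the Windows BITMAPFILEHEADER structure) stores the total file size as a little-endian 32-bit value in bytes 3-6. The Python sketch below uses the hypothetical name `bmp_declared_size`. It checks for "BM" and returns that stored size, which could be compared against the real file size as an extra sanity check. It assumes the first six bytes are already in memory.

```python
import struct


def bmp_declared_size(header: bytes):
    """Return the file size stored in a BMP header, or None if not a BMP."""
    # Bytes 1-2: "BM". Bytes 3-6: total file size, little-endian unsigned.
    if len(header) < 6 or header[:2] != b"BM":
        return None
    (size,) = struct.unpack("<I", header[2:6])
    return size
```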

GIF (Graphics Interchange Format)

A GIF file (pronounced "jiff") is a compressed image format. It uses LZW, a lossless data compression algorithm from the same Lempel-Ziv family as the DEFLATE algorithm used by zip and gzip. Lossless compression ensures that there is no data loss or image degradation. GIF files are widely used for animated images. In the early years of the internet, you would be hard pressed to find a website that did not use some form of animated GIF.

To identify a GIF file, read the first three bytes of the file. The three bytes after them give the version, either 87a or 89a.

Byte 1 2 3
Hex 47 49 46
Char G I F

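Here is a minimal Python sketch of the GIF check, which also reads the three version bytes that follow the signature. `gif_version` is a hypothetical name. Treating anything other than 87a or 89a as unrecognized is an assumption of this sketch.

```python
def gif_version(header: bytes):
    """Return "87a" or "89a" for a GIF header, or None if not recognized."""
    # Bytes 1-3: the "GIF" signature.
    if header[:3] != b"GIF":
        return None
    # Bytes 4-6: the version string.
    version = header[3:6]
    if version in (b"87a", b"89a"):
        return version.decode("ascii")
    return None
```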

Code Examples

Clarion Code

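The code to perform these checks in Clarion is quite simple. Here is a basic example that checks the file header for the formats mentioned above. This code will not compile on its own. For example, GetfileExtension is not a built-in function and MAX_PATH_AND_FILENAME_LENGTH must be defined elsewhere.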

VerifyImageType PROCEDURE (STRING fileNameAndPath, *STRING myErrorDescription)

imageFileExt          STRING(4), AUTO
headerBytes           STRING(10), AUTO       !Copy of the header bytes, kept after CLOSE

imageFileName         STRING(MAX_PATH_AND_FILENAME_LENGTH), AUTO
IMAGE_FILE            FILE, DRIVER('DOS'),Name(imageFileName),PRE(IMFIL)
Record                  RECORD
fileBuffer                STRING(10)         !Receives the first 10 bytes of the file
                        END
                      END
  CODE

  imageFileName = fileNameAndPath
  myErrorDescription = ''

  !Extract the file extension - not a built in function
  IF ~GetfileExtension(imageFileName, imageFileExt)
    !Report Error
    myErrorDescription = 'Error getting file extension'
    Return(FALSE)
  END

  Open(IMAGE_FILE, ReadOnly)
  IF ErrorCode()
    !Report Error
    myErrorDescription = 'Error opening ' & Clip(imageFileName)
    Return(FALSE)
  END

  !Read the first record, i.e. the first 10 bytes of the file
  Set(IMAGE_FILE)
  Next(IMAGE_FILE)
  IF ErrorCode()
    myErrorDescription = 'No data found in image file'
    Close(IMAGE_FILE)
    Return(FALSE)
  END

  !Copy the header and close the file now, so no Return below leaves it open
  headerBytes = IMFIL:fileBuffer
  Close(IMAGE_FILE)

  CASE Upper(Clip(imageFileExt))
  OF 'JPG'
  OROF 'JPEG'

    !FF D8 = JPEG start of image marker
    IF headerBytes[1:2] = '<0FFh,0D8h>'
      !Success, File was recognized, Perform some task.
      Return(TRUE)
    ELSE
      !Report error and Perform some task
      myErrorDescription = 'Image header does not match the file extension'
      Return(FALSE)
    END

  OF 'BMP'

    !42 4D = 'BM'
    IF headerBytes[1:2] = '<042h,04Dh>'
      !Success, File was recognized, Perform some task.
      Return(TRUE)
    ELSE
      !Report error and Perform some task
      myErrorDescription = 'Image header does not match the file extension'
      Return(FALSE)
    END

  OF 'TIFF'
  OROF 'TIF'

    !49 49 2A 00 = Intel byte order, 4D 4D 00 2A = Motorola byte order
    IF headerBytes[1:4] = '<049h,049h,02Ah,00h>' |
       OR headerBytes[1:4] = '<04Dh,04Dh,00h,02Ah>'
      !Success, File was recognized, Perform some task.
      Return(TRUE)
    ELSE
      !Report error and Perform some task
      myErrorDescription = 'Image header does not match the file extension'
      Return(FALSE)
    END

  OF 'GIF'

    !47 49 46 = 'GIF'
    IF headerBytes[1:3] = '<047h,049h,046h>'
      !Success, File was recognized, Perform some task.
      Return(TRUE)
    ELSE
      !Report error and Perform some task
      myErrorDescription = 'Image header does not match the file extension'
      Return(FALSE)
    END

  ELSE
    !Handle unsupported images if needed.
    myErrorDescription = 'Error, non supported image extension encountered'
    Return(FALSE)
  END

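Python Code

For comparison, here is a minimal Python sketch of the same procedure. It is not a translation guaranteed to match every behaviour of the Clarion version. `SIGNATURES`, `check_header` and `verify_image_type` are hypothetical names. The extension is taken with `os.path.splitext` in place of the non-built-in GetfileExtension. The header comparison is split into its own function so it can be tested without touching the file system. The error messages and the order of the checks follow the Clarion code.

```python
import os

# Expected leading bytes per extension, taken from the tables in this write-up.
SIGNATURES = {
    "jpg": (b"\xff\xd8",),
    "jpeg": (b"\xff\xd8",),
    "bmp": (b"BM",),
    "tif": (b"II\x2a\x00", b"MM\x00\x2a"),   # Intel or Motorola byte order
    "tiff": (b"II\x2a\x00", b"MM\x00\x2a"),
    "gif": (b"GIF",),
}


def check_header(ext: str, header: bytes):
    """Compare the leading bytes against the extension; return (ok, error)."""
    if not header:
        return False, "No data found in image file"
    ext = ext.lower()
    if ext not in SIGNATURES:
        return False, "Error, non supported image extension encountered"
    # bytes.startswith accepts a tuple of candidate prefixes.
    if header.startswith(SIGNATURES[ext]):
        return True, ""
    return False, "Image header does not match the file extension"


def verify_image_type(path: str):
    """File-based wrapper, loosely mirroring the Clarion VerifyImageType."""
    ext = os.path.splitext(path)[1].lstrip(".")
    if not ext:
        return False, "Error getting file extension"
    try:
        with open(path, "rb") as f:  # leaving the with-block closes the file
            header = f.read(10)      # first 10 bytes, like fileBuffer
    except OSError:
        return False, "Error opening " + path
    return check_header(ext, header)
```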

If you have any feedback, questions, ideas, or suggestions regarding this write-up please contact us and let us know. We are always working to improve mikekunz.com and we appreciate your feedback.
