Back File Scanning Project Overview and Tips



I get asked all the time about back file projects and what the critical steps in the process are. I have personally been involved in several large projects over my career, so while I would not call myself an expert, I do have at least 5,000,000 scans under my belt. I wanted to put together a simple overview of the back file scanning process, along with a few tips and best practices you should follow.

The document scanning process has 4 main steps: document prep, scanning, indexing, and storing the documents in a document management repository.

Step 1 – Document Preparation – This is probably the most important part of any back file project. It is the process of making the paper documents ready for scanning. Doc prep can be very time consuming, but it needs to be done right to ensure a successful back file scanning project. For mission critical applications, do a page count before scanning and verify it against the page count after scanning. Doc prep can include any of the following items:

–          Removing the pages from boxes, folders or binders

–          Removing all staples and paperclips – make sure you get them all out before scanning

–          Unfolding of pages and shuffling pages to make sure pages are not stuck together

–          Document repair where required

–          Inserting document or batch separator sheets, which can be used for document bursting or initial indexing
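The page count check mentioned above can be as simple as comparing two tallies per batch. Here is a minimal sketch in Python, assuming you record the prep count and the scanned count for each box or batch. The batch IDs and the function name are made up for illustration and do not come from any particular scanning product.

```python
def verify_page_counts(expected, scanned):
    """Return the batches whose scanned page count differs from the prep count.

    Both arguments map a batch ID to a page count. The result maps each
    mismatched batch ID to a (prep_count, scanned_count) tuple.
    """
    mismatches = {}
    for batch_id, expected_pages in expected.items():
        # A batch that was prepped but never scanned counts as 0 pages.
        scanned_pages = scanned.get(batch_id, 0)
        if scanned_pages != expected_pages:
            mismatches[batch_id] = (expected_pages, scanned_pages)
    return mismatches


# Hypothetical counts: BOX-002 lost two pages, probably to a double feed.
prep_counts = {"BOX-001": 250, "BOX-002": 312}
scan_counts = {"BOX-001": 250, "BOX-002": 310}
print(verify_page_counts(prep_counts, scan_counts))  # {'BOX-002': (312, 310)}
```

Any batch that shows up in the result should go back for a rescan or a manual check before it moves on to indexing.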


Step 2 – Scanning – This is the process of converting the paper document into a digital format. There are many things to consider when scanning, because file size is greatly affected by the scanner's DPI setting and by whether you scan in black and white or color. The higher the DPI setting on the scanner, the larger the file. The industry standard for most business class documents is 200-300 DPI scanned in black and white. TIFF and PDF are the preferred file formats for scanning, but the industry seems to be moving towards PDF files. There are some great compression tools on scanners for color images that will help reduce the overall file size, but you should test this before scanning a large batch in color. Here are some general scanning tips:


–          Do some test scanning to see the results at 200 and 300 DPI in black and white to verify quality and then determine what the default DPI setting will be. Lower quality paper documents will need to be scanned at higher DPI settings.

–          If bitonal (black & white) scanning won’t work, you might need to scan in grayscale or color JPEG.

–          If you are planning to do OCR or zone OCR on documents, then scanning at 300 DPI or greater is recommended.
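To get a feel for how DPI and color depth drive file size, here is a rough sketch in Python that computes the uncompressed size of a single page. It assumes a letter size page (8.5 x 11 inches) and the usual bit depths of 1 bit per pixel for black and white, 8 for grayscale and 24 for color. Real files will be much smaller once the scanner applies compression, so treat these numbers as an upper bound and not as a forecast.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size in bytes for one page, before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Letter size page at the settings discussed above.
for label, dpi, bits in [
    ("200 DPI black and white", 200, 1),
    ("300 DPI black and white", 300, 1),
    ("300 DPI grayscale", 300, 8),
    ("300 DPI color", 300, 24),
]:
    size = uncompressed_bytes(8.5, 11, dpi, bits)
    print(f"{label}: {size:,} bytes")
```

Going from 200 to 300 DPI multiplies the pixel count by 2.25, and going from black and white to color multiplies the raw size by 24, which is why test scans at each setting are worth the time.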


Step 3 – Indexing – This is the process of assigning metadata or key search terms to documents that you scan in. This is a very important part of the overall back file project because if the documents are not labeled correctly you won’t be able to easily find them in the digital document repository.


–          When indexing documents, try to avoid double keying as much as possible. If you can key only a few fields and use a database look-up against a line of business application to fill in the other index data, that will save a lot of time.

–          Use barcodes for document bursting, or for both bursting and assigning the document type

–          Use drop down lists where applicable – avoids input errors and allows for indexing consistency

–          Take your time and remember that garbage in means garbage out. Don’t take shortcuts when indexing your documents.
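The look-up and drop down list tips can be sketched together in a few lines of Python. This is only an illustration: the document types, the account number and the in-memory dictionary standing in for the line of business database are all hypothetical, and a real project would query the actual application instead.

```python
# Hypothetical drop down values; restricting input to this set keeps indexing consistent.
ALLOWED_DOC_TYPES = {"Invoice", "Contract", "Correspondence"}

# Hypothetical stand-in for a line of business database, keyed by account number.
LOB_RECORDS = {
    "10042": {"customer_name": "Acme Supply", "region": "West"},
}


def build_index(account_number, doc_type):
    """Key two fields by hand and fill in the rest from the look-up."""
    if doc_type not in ALLOWED_DOC_TYPES:
        raise ValueError(f"Unknown document type: {doc_type!r}")
    record = LOB_RECORDS.get(account_number)
    if record is None:
        raise KeyError(f"No line of business record for account {account_number!r}")
    # The operator typed only the account number and picked the type;
    # customer_name and region come from the look-up, not from re-keying.
    return {"account_number": account_number, "doc_type": doc_type, **record}


print(build_index("10042", "Invoice"))
```

Rejecting an unknown account number at indexing time is deliberate here: it is cheaper to catch a mistyped key at the scan station than to hunt for a misfiled document later.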


Step 4 – Storing the final documents in your document repository – This step might be automatic for some customers, because the scanning software could be the same solution they use for document management, but the same rules apply in either case. Once the documents are scanned and indexed, they will be in some sort of scan queue folder. It is recommended that some level of QA is done on the batch of scanned documents before they are moved to the main document repository. The QA process will be different for every type of process or vertical industry. Some will want to review all documents and index values, and others only a small sample. I would highly recommend some level of QA, even if it is a very small sample.
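If you go the sampling route, pulling the sample at random keeps the reviewer from only checking the first few documents in the queue. Here is a minimal sketch in Python. The 5 percent default and the document IDs are assumptions for illustration only, and the right sample size depends on your process and industry.

```python
import math
import random


def pick_qa_sample(document_ids, percent=5, minimum=1, seed=None):
    """Pick a random subset of a scanned batch for manual QA review."""
    document_ids = list(document_ids)
    if not document_ids:
        return []
    size = max(minimum, math.ceil(len(document_ids) * percent / 100))
    size = min(size, len(document_ids))  # never ask for more than the batch holds
    # A fixed seed makes the selection repeatable, which helps when auditing.
    return random.Random(seed).sample(document_ids, size)


# Hypothetical batch of 200 documents waiting in the scan queue.
batch = [f"DOC-{n:04d}" for n in range(1, 201)]
sample = pick_qa_sample(batch, percent=5, seed=1)
print(len(sample), "documents selected for review")
```

Setting percent to 100 gives the full review that some industries require, using the same code path.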


–          Now that the hard work of scanning these files is done, make sure you have proper backups of your document management repository and database.

–          Test the backups regularly and do a full restore on a test server

–          Make sure the document management server has enough CPU and memory to manage the amount of data and users that will be accessing the system

–          Make sure that any copier or scanning hardware that might store images on the machine gets cleared or erased. We have seen it too many times: sensitive data is protected in a document management solution, but it is still kept locally on the copier that was used as the scanning input device. Most copier vendors have a way to auto erase files after a certain period of time. Please verify what best practices your scanning hardware vendors recommend.
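For the restore test, one simple way to confirm that a restored copy matches the live repository is to compare file checksums. The sketch below is in Python and assumes the repository stores its images as plain files under a folder. The function names are my own, and this only covers the image files, not the database, which needs its own restore check.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch, but very
            # large images would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def compare_restore(live_root, restored_root):
    """Return (missing, changed) file lists for a restored copy."""
    live = file_digests(live_root)
    restored = file_digests(restored_root)
    missing = sorted(set(live) - set(restored))
    changed = sorted(
        name for name in live.keys() & restored.keys()
        if live[name] != restored[name]
    )
    return missing, changed
```

Calling compare_restore with the live folder and the folder restored on the test server should return two empty lists. Anything else means the backup is not as good as you thought.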