Is Chat GPT a Viable Alternative to OCR Systems?

July 4, 2024

Introduction

With the release of GPT-4o, OpenAI introduced a multimodal model capable of understanding the content of images. One of the notable tasks this model can perform is Optical Character Recognition (OCR). But how does it stack up against a specialized OCR system like Azure Read? This blog post delves into a comparative analysis between Chat GPT and Azure Read, focusing on their OCR capabilities.

Methodology

While Azure Read returns detections within bounding boxes, Chat GPT outputs plain text, making it impossible to use standard OCR datasets directly. We needed to create a small, custom dataset for comparison. We selected scans of old documents from the HathiTrust database. The chosen documents were of lower quality, presenting a more significant challenge for the models. We downloaded 10 images of pages and supporting texts, then we manually cleaned them to ensure accuracy. Both systems were then used to perform OCR on these images. We evaluated the OCR performance using Word Error Rate (WER) and Match Error Rate (MER) metrics from the jiwer library (https://pypi.org/project/jiwer/).

Examples of scanned documents from HathiTrust database

Results

Quality

Both systems performed remarkably well in OCR tasks, demonstrating high levels of accuracy. Azure Read showed slightly better performance, evidenced by its lower error rates:

  • WER Azure: 0.019671
  • WER Chat GPT: 0.055454
  • MER Azure: 0.019444
  • MER Chat GPT: 0.052878

For context, a good WER (Word Error Rate) and MER (Match Error Rate) value for OCR systems is typically below 0.5. Both Azure Read and Chat GPT significantly outperform this benchmark, showcasing their high accuracy. While Chat GPT's performance was slightly less accurate, it still provided a very good level of accuracy suitable for production environments.

Time

Here is where the real differences start to emerge. Azure Read is significantly faster, processing 10 images in 78.6 seconds compared to Chat GPT's 183.4 seconds for the same number of images. This indicates that Azure Read is a more time-efficient option, and its speed advantage is likely to scale better with larger images and higher volumes of text.

  • Azure: 78.6 seconds for 10 images
  • Chat GPT: 183.4 seconds for 10 images

Price

Azure Read is more cost-effective, which may not seem significant for just 10 images. However, when designing a system that handles thousands of images, this cost difference becomes a critical factor. Azure Read's lower price can lead to substantial savings in large-scale applications.

  • Azure: $0.015 for 10 pages
  • Chat GPT: $0.05 for 10 pages

Other Aspects

There are several other important aspects to consider when comparing these two systems. These might include:

  • Bounding Boxes: Azure Read returns bounding boxes, which are incredibly useful in many production environments where spatial text information is crucial.
  • Rate Limits: Azure Read has better rate limits, accommodating higher volumes of OCR requests.
  • Additional Logic: Chat GPT can apply additional logic to OCR results, such as structuring text as markdown, if you want to summarise the text from image, you might consider doing it in a single step with Chat GPT.

Summary

Both systems performed really well, with Azure Read showing slightly better accuracy and efficiency. However, the quality of Chat GPT's OCR is still sufficient for many production workflows. The choice between these technologies depends largely on specific project requirements.

If you are considering implementing an OCR system or other AI solutions, contact Baies Analytics. Baies Analytics is dedicated to guiding clients through the complexities of AI and data, enabling them to derive maximum value from these technologies. Whether you need expert advice on OCR implementation or broader AI integration, Baies Analytics is here to help you navigate and succeed in leveraging these powerful tools.

Let’s have a chat

Full name
Email
Message
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.