
How to Implement a High-Precision OCR Tool with 100 Lines of Python Code

Recently, a Python-based screenshot tool called Textshot was open-sourced on GitHub. It gained 500+ stars within just half a month.

Over the past two days, I took some time to read through Textshot's source code. It is indeed a project worth introducing.

Compared with most OCR tools, which involve complex engineering yet deliver unsatisfactory results, Textshot has two obvious advantages:

  • Simple project
  • Rich technical points
1. Simple Project

The entire Textshot project consists of a single Python file with 139 lines of code. It relies on no complicated third-party libraries and makes few back-end algorithm calls.

2. Rich Technical Points

Although the Textshot project is only 139 lines long, it touches on many areas of Python:

  • Front-end UI development
  • Screenshot tool development
  • Back-end engine call

Through this short project, you can learn not only how to build a user interface with PyQt5, but also how to use pyscreenshot to develop your own screenshot tool. In addition, you can learn how to call the tesseract back-end engine.

In other words, these 139 lines of code cover the entire process from front end to back end and connect a screenshot tool to an OCR engine. So although Textshot is not a big project, it is complete and well worth studying.

This article analyzes the project's source code and walks you step by step through building your own screenshot & OCR tool that is free forever.

1. Tesseract

There are countless OCR tools, but most of them are just different wrappers around the same back-end algorithms. When it comes to a truly strong OCR core, tesseract is the one.

Tesseract was developed at HP Labs as early as 1985, and in 1995 it was rated one of the three most accurate OCR engines. It was later open-sourced, and after continuous optimization and upgrades by Google it has become a benchmark tool for OCR. Many open-source and paid OCR tools call tesseract directly or apply only minor optimizations to it.

Textshot, the tool introduced today, also calls the tesseract back-end engine directly for OCR. In other words, Textshot only implements a screenshot tool that connects the front end to the back end; it does no work on the OCR recognition algorithm itself.

Tesseract Installation

Since Textshot's OCR calls the tesseract back-end engine, tesseract needs to be installed first.

On macOS, you can install it with Homebrew by running brew install tesseract.
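Once tesseract is installed, it can be handy to confirm from Python that the executable is on your PATH, since Python wrappers around tesseract typically run the installed binary. The following is an optional sanity check, not part of Textshot; find_tesseract and parse_tesseract_version are hypothetical helpers built on the standard tesseract --version command:

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version number from `tesseract --version` output (hypothetical helper)."""
    # The first line usually looks like "tesseract 5.3.0"; some builds add a "v" prefix.
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def find_tesseract() -> Optional[str]:
    """Return the installed tesseract version, or None if the binary is not on PATH."""
    path = shutil.which("tesseract")
    if path is None:
        return None
    # Older releases print the version to stderr, newer ones to stdout,
    # so both streams are searched.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_tesseract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = find_tesseract()
    print(f"tesseract {version}" if version else "tesseract not found on PATH")
```

If this reports that tesseract is missing, the OCR step later on will fail as well, so it is worth fixing the installation first.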

2. Textshot

Textshot is an OCR tool that recognizes text in screenshots, so it involves two main stages:

  • Screenshot
  • OCR recognition

Textshot first takes a screenshot to obtain the image whose text needs to be recognized, then runs OCR on that image and outputs the result.

As mentioned earlier, Textshot's OCR stage simply calls tesseract, so it takes only one line of code.

Therefore, most of Textshot's work goes into implementing the front-end window and the screenshot tool.
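The two-stage flow described above can be sketched as a small function that receives a capture stage and an OCR stage as parameters. This is an illustrative structure, not Textshot's actual code; run_textshot_pipeline and the stub stages are hypothetical names:

```python
from typing import Any, Callable, Tuple

BBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in screen pixels


def run_textshot_pipeline(
    bbox: BBox,
    capture: Callable[[BBox], Any],
    recognize: Callable[[Any], str],
) -> str:
    """Hypothetical two-stage flow: take the screenshot, then OCR that image."""
    image = capture(bbox)    # stage 1: grab the selected screen region
    text = recognize(image)  # stage 2: the single OCR call
    return text.strip()      # drop surrounding whitespace such as trailing newlines


if __name__ == "__main__":
    # Stub stages let the flow run without a display or a tesseract install.
    def fake_capture(box: BBox) -> dict:
        return {"region": box}

    def fake_recognize(image: dict) -> str:
        return f"text from {image['region']}\n"

    print(run_textshot_pipeline((0, 0, 100, 50), fake_capture, fake_recognize))
    # prints: text from (0, 0, 100, 50)
```

With real stages, capture could be lambda box: ImageGrab.grab(bbox=box) and recognize could be pytesseract.image_to_string (assuming Pillow and the pytesseract wrapper are installed). Keeping the stages pluggable reflects the point above: the OCR is one call, and the engineering effort sits on the capture side.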

3. Screenshot Tool

Screenshot tools are something we use all the time, but how do you implement one?

Many people assume it is complicated. In fact, Python has plenty of libraries that can take screenshots. For example, Pillow's ImageGrab module and the pyscreenshot library both provide a grab function that captures a region of the screen.
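As a rough illustration (a sketch, not Textshot's exact code), capturing a region with Pillow's ImageGrab looks like this; pyscreenshot can be swapped in because it offers the same grab(bbox=...) call. The grab_region wrapper and its use_pyscreenshot flag are hypothetical, and a graphical session is assumed:

```python
def grab_region(x1: int, y1: int, x2: int, y2: int, use_pyscreenshot: bool = False):
    """Capture the screen area between (x1, y1) and (x2, y2) as a PIL image.

    Hypothetical helper: (x1, y1) must be the top-left corner and (x2, y2)
    the bottom-right corner, the order that grab's bbox argument expects.
    """
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"empty or inverted box: {(x1, y1, x2, y2)}")
    # Import lazily so this module can be imported on machines without a display.
    if use_pyscreenshot:
        import pyscreenshot as ImageGrab  # exposes the same grab(bbox=...) call
    else:
        from PIL import ImageGrab
    return ImageGrab.grab(bbox=(x1, y1, x2, y2))


# Example (needs a display and Pillow):
# img = grab_region(100, 100, 500, 300)
# img.save("region.png")
```

The bbox check up front avoids asking the backend for a zero-sized or inverted region.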

In other words, taking the screenshot only requires passing the coordinates of the start and end points of the mouse selection to the grab method (as its bbox argument).

So the question becomes: how do we get the start and end points of the mouse selection?

Textshot uses PyQt5 for this: it subclasses QWidget and overrides the event handlers involved in dragging out a selection box with the mouse, which lets it record the selection's start and end points.

The QWidget methods that Textshot overrides are mainly the following:

  • keyPressEvent(self, event): keyboard response function
  • paintEvent(self, event): UI drawing function
  • mousePressEvent(self, event): mouse click event
  • mouseMoveEvent(self, event): mouse movement event
  • mouseReleaseEvent(self, event): mouse release event

Together, these overrides cover every action involved in taking a screenshot:

  • Click the mouse
  • Drag and draw screenshot frame
  • Release the mouse

The specific code is as follows:

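```python
import sys

from PyQt5 import QtCore, QtGui, QtWidgets
from PyQt5.QtCore import Qt


class Snipper(QtWidgets.QWidget):
    def __init__(self, parent=None, flags=Qt.WindowFlags()):
        super().__init__(parent=parent, flags=flags)

        self.setWindowTitle("TextShot")
        # Borderless, always-on-top overlay window
        self.setWindowFlags(
            Qt.FramelessWindowHint | Qt.WindowStaysOnTopHint | Qt.Dialog
        )

        # Cover the whole screen (maximized on macOS, full screen elsewhere)
        self.is_macos = sys.platform.startswith("darwin")
        if self.is_macos:
            self.setWindowState(self.windowState() | Qt.WindowMaximized)
        else:
            self.setWindowState(self.windowState() | Qt.WindowFullScreen)

        # Semi-transparent black background dims the screen during selection
        self.setStyleSheet("background-color: black")
        self.setWindowOpacity(0.5)

        QtWidgets.QApplication.setOverrideCursor(QtGui.QCursor(QtCore.Qt.CrossCursor))

        # Start and end points of the selection, in global screen coordinates
        self.start, self.end = QtCore.QPoint(), QtCore.QPoint()

    def keyPressEvent(self, event):
        # Esc cancels the screenshot and exits
        if event.key() == Qt.Key_Escape:
            QtWidgets.QApplication.quit()

        return super().keyPressEvent(event)

    def paintEvent(self, event):
        # Nothing to draw until the user has dragged
        if self.start == self.end:
            return super().paintEvent(event)

        # White outline with a translucent white fill for the selection box
        painter = QtGui.QPainter(self)
        painter.setPen(QtGui.QPen(QtGui.QColor(255, 255, 255), 3))
        painter.setBrush(QtGui.QColor(255, 255, 255, 100))

        # QCursor.pos() returns global coordinates; on macOS the maximized
        # window does not start at the screen origin, so map them into the widget
        if self.is_macos:
            start, end = (self.mapFromGlobal(self.start), self.mapFromGlobal(self.end))
        else:
            start, end = self.start, self.end

        painter.drawRect(QtCore.QRect(start, end))
        return super().paintEvent(event)

    def mousePressEvent(self, event):
        # Record where the drag starts
        self.start = self.end = QtGui.QCursor.pos()
        self.update()
        return super().mousePressEvent(event)

    def mouseMoveEvent(self, event):
        # Track the current corner and repaint the selection box
        self.end = QtGui.QCursor.pos()
        self.update()
        return super().mouseMoveEvent(event)

    def mouseReleaseEvent(self, event):
        # A click without dragging selects nothing
        if self.start == self.end:
            return super().mouseReleaseEvent(event)

        # Normalize so (x1, y1) is the top-left and (x2, y2) the bottom-right corner
        x1, x2 = sorted((self.start.x(), self.end.x()))
        y1, y2 = sorted((self.start.y(), self.end.y()))
        # The box (x1, y1, x2, y2) is then used for the screenshot and OCR steps.
```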

Next, launch the screenshot interface. The code is as follows:

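```python
import sys

from PyQt5 import QtCore, QtWidgets
from PyQt5.QtCore import Qt

# Disable Qt's high-DPI scaling; this must be set before QApplication is created
QtCore.QCoreApplication.setAttribute(Qt.AA_DisableHighDpiScaling)

app = QtWidgets.QApplication(sys.argv)

window = QtWidgets.QMainWindow()
snipper = Snipper(window)
snipper.show()

# Start the Qt event loop; it ends when QApplication.quit() is called
sys.exit(app.exec_())
```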

When the user drags out a selection, the widget records the coordinates of its start and end points. At that point, passing those coordinates to ImageGrab.grab takes the screenshot and yields the text image that needs OCR recognition.
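Because the user can drag in any direction, the start point is not necessarily the top-left corner, which is why mouseReleaseEvent sorts the x and y coordinates before they are used. The same normalization as a standalone function looks like this (selection_bbox is a hypothetical name, and the points are plain (x, y) tuples rather than QPoint objects):

```python
from typing import Tuple

Point = Tuple[int, int]


def selection_bbox(start: Point, end: Point) -> Tuple[int, int, int, int]:
    """Turn two drag corners into the (left, top, right, bottom) box grab() expects."""
    x1, x2 = sorted((start[0], end[0]))  # left edge first, whichever way the mouse moved
    y1, y2 = sorted((start[1], end[1]))  # top edge first (screen y grows downward)
    return x1, y1, x2, y2


# Dragging from bottom-right to top-left still yields a valid box.
print(selection_bbox((640, 480), (120, 90)))  # (120, 90, 640, 480)
```

The resulting tuple is what goes into ImageGrab.grab(bbox=...). Note that the grab should happen after the semi-transparent overlay window has been hidden; otherwise the darkened overlay would end up in the screenshot.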
4. OCR Text Recognition

With the text image captured by ImageGrab.grab, the next step is to pass it to the tesseract back-end engine, which converts the image into a string.
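A minimal sketch of this step, assuming the pytesseract wrapper, whose image_to_string function accepts a PIL image and (in recent versions) a timeout in seconds. The helper name image_to_text, its engine parameter and the 2-second default are illustrative choices; when tesseract exceeds the timeout, pytesseract raises RuntimeError, which is treated here as "no text":

```python
from typing import Any, Callable, Optional


def image_to_text(
    image: Any,
    timeout: float = 2,
    engine: Optional[Callable[..., str]] = None,
) -> Optional[str]:
    """Run OCR on a captured image; return the text, or None if nothing was found."""
    if engine is None:
        # Lazy import: pytesseract needs the tesseract binary installed.
        import pytesseract

        engine = pytesseract.image_to_string
    try:
        text = engine(image, timeout=timeout)
    except RuntimeError:
        # pytesseract raises RuntimeError when tesseract exceeds the timeout.
        return None
    text = text.strip()  # remove trailing newlines and page-break characters
    return text or None  # an empty string means nothing was recognized


# Example (requires a display, Pillow, pytesseract and tesseract):
# from PIL import ImageGrab
# print(image_to_text(ImageGrab.grab(bbox=(100, 100, 500, 300))))
```

Together with the grab call, this is the whole capture-then-recognize core: one call to take the screenshot and one call to turn it into text. The engine parameter only exists so the function can be exercised without tesseract installed.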

At this point, we have a highly accurate OCR tool that is free forever.

Looking back at the Textshot project, only two lines of code are needed to capture the selected region and run OCR on it; most of the code is devoted to obtaining the start and end coordinates of the selection. In other words, Textshot changes nothing in the core of OCR; its cleverness lies in how it packages the product.

This article has shown how to build a high-precision, free OCR tool in Python. The sample code is explained in detail, and I hope it serves as a useful reference for your study or work.
