Jimmy's Notes: Using OCR in Python

Saturday, November 7, 2020

Using OCR in Python

Using Tesseract-OCR and the Pytesseract package

Tesseract download location:

https://github.com/tesseract-ocr/tesseract/wiki
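If Tesseract might not be in the default folder, a small helper can look for the executable before its path is assigned to pytesseract.pytesseract.tesseract_cmd (the example further down hard-codes that path instead). This is only a sketch: the name find_tesseract is my own, and it assumes the default Windows install path used in this post, falling back to whatever tesseract executable is on the PATH.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default Windows install path, the same one used in this post's example.
DEFAULT_WINDOWS_CMD = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates: Iterable[str] = (DEFAULT_WINDOWS_CMD,)) -> Optional[str]:
    """Return the first candidate path that exists as a file.

    If none of the candidates exist, fall back to searching the PATH
    environment variable; return None if Tesseract is not found at all.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    # shutil.which looks the command up on PATH, like a shell would.
    return shutil.which("tesseract")
```

If the result is not None, it can be assigned to pytesseract.pytesseract.tesseract_cmd; if it is None, Tesseract still needs to be installed.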

Download location for the Chinese recognition data files:

https://github.com/tesseract-ocr/tessdata

After downloading, place them in the C:\Program Files\Tesseract-OCR\tessdata directory.
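To confirm the language files landed in the right place, one option is to list the *.traineddata files in tessdata, since each file name (minus the extension) is the code passed to lang, e.g. chi_tra or eng. A minimal sketch, assuming the default tessdata path above; the helper names installed_languages and check_languages are hypothetical.

```python
from pathlib import Path
from typing import List

# Assumed default tessdata location, matching the directory given above.
DEFAULT_TESSDATA = Path(r"C:\Program Files\Tesseract-OCR\tessdata")


def installed_languages(tessdata_dir: Path = DEFAULT_TESSDATA) -> List[str]:
    """List the language codes of every *.traineddata file in tessdata_dir."""
    if not tessdata_dir.is_dir():
        return []
    # "chi_tra.traineddata" -> "chi_tra"
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))


def check_languages(lang: str, tessdata_dir: Path = DEFAULT_TESSDATA) -> List[str]:
    """Return the codes in a '+'-joined lang string (e.g. 'chi_tra+eng')
    that have no matching .traineddata file, i.e. the missing ones."""
    available = set(installed_languages(tessdata_dir))
    return [code for code in lang.split("+") if code not in available]
```

For example, check_languages('chi_tra+eng') returning an empty list means both files are in place.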

Pytesseract package usage reference:

https://pypi.org/project/pytesseract/
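Besides image_to_string, pytesseract provides image_to_data; called with output_type=pytesseract.Output.DICT it returns a dict of parallel lists, including 'text' and 'conf' (per-word confidence, -1 for rows that are not words). A sketch for dropping low-confidence words; the function name confident_words and the threshold of 60 are my own assumptions, not part of pytesseract.

```python
from typing import Dict, List, Sequence, Tuple


def confident_words(data: Dict[str, Sequence], min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Keep (word, confidence) pairs whose confidence is at least min_conf.

    `data` is expected to have the shape returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Depending on the pytesseract version, conf may be an int, a float
        # or a string, so normalize it with float().
        score = float(conf)
        # Skip empty entries (block/line rows) and low-confidence words.
        if text.strip() and score >= min_conf:
            words.append((text, score))
    return words
```

Usage would look like data = pytesseract.image_to_data(Image.open('D:\\Test.png'), lang='chi_tra+eng', output_type=pytesseract.Output.DICT), followed by confident_words(data).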

Example:

from PIL import Image
import pytesseract

# Set the Tesseract install location.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Convert the image to text (Traditional Chinese + English).
print(pytesseract.image_to_string(Image.open('D:\\Test.png'), lang='chi_tra+eng'))

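# Convert to a searchable PDF (pass lang as well, otherwise only English is recognized).
with open('D:\\ToPDF.pdf', 'wb') as f:
    f.write(pytesseract.image_to_pdf_or_hocr('D:\\Test.png', extension='pdf', lang='chi_tra+eng'))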
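The same PDF conversion can be looped over a whole folder. This sketch assumes the tesseract_cmd assignment from the example has already run; images_to_pdfs and the convert parameter are hypothetical names. The converter is injectable so the loop can be tried without Tesseract installed, and pytesseract is only imported when the default converter is used.

```python
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical converter signature: (image_path, lang) -> PDF bytes.
PdfConverter = Callable[[str, str], bytes]


def _default_converter(image_path: str, lang: str) -> bytes:
    # Imported lazily so the helper below can be used without pytesseract.
    import pytesseract
    return pytesseract.image_to_pdf_or_hocr(image_path, extension="pdf", lang=lang)


def images_to_pdfs(src_dir: str, out_dir: str, lang: str = "chi_tra+eng",
                   pattern: str = "*.png",
                   convert: Optional[PdfConverter] = None) -> List[Path]:
    """OCR every image matching `pattern` in src_dir and write one
    searchable PDF per image into out_dir. Returns the written paths."""
    convert = convert or _default_converter
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(src_dir).glob(pattern)):
        target = out / (image.stem + ".pdf")
        # Binary write, like the 'wb' open in the example above.
        target.write_bytes(convert(str(image), lang))
        written.append(target)
    return written
```

Calling images_to_pdfs('D:\\Images', 'D:\\PDFs') would then produce one PDF per PNG, with the default Chinese + English language setting.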
