(final?) yak-shaving tool for writing my PhD thesis
Pipeline
Scan the cards
Register them in a DB. Each scan is assigned an id (serial int), a filename, and a scanned_at timestamp. There is no intent to use this in a distributed manner yet. We might modify the image to raise the OCR success rate. I might use MinIO to store the image files.
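A minimal sketch of the registry, assuming Postgres via psycopg2 (the driver and exact column names are my assumptions; the text above only specifies a serial int id, a filename, and a scanned-at timestamp):

```python
# Minimal sketch of the card registry. Postgres + psycopg2 is an assumption;
# the pipeline only specifies the columns: id (serial int), filename, scanned_at.
import psycopg2

conn = psycopg2.connect("dbname=cards")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS cards (
            id         SERIAL PRIMARY KEY,
            filename   TEXT NOT NULL,
            scanned_at TIMESTAMPTZ NOT NULL DEFAULT now()
        )
    """)
    # Register one scan; the DB hands back the serial id.
    cur.execute(
        "INSERT INTO cards (filename) VALUES (%s) RETURNING id",
        ("card_0001.jpg",),
    )
    card_id = cur.fetchone()[0]
```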
Preparation command before feeding the scans into OCR:
mogrify -path processed \
  -rotate -90 \
  -density 300 \
  -colorspace Gray \
  -contrast-stretch 0 \
  -despeckle \
  *.jpg
OCR the images with Azure
The images will be sent to the Azure Read API to OCR the handwritten text. The result will be stored in a separate table.
curl -v -X POST \
  "https://umesata-vision.cognitiveservices.azure.com/vision/v3.2/read/analyze?language=ja" \
  -H "Ocp-Apim-Subscription-Key: ${AZURE_KEY}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@image.jpg"
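The Read call is asynchronous: the POST returns 202 with an Operation-Location header, and that URL is then polled for the result. A sketch of the full round trip, mirroring the curl command above (requests is my library choice):

```python
# Sketch of the async Read round trip. The POST returns 202 plus an
# Operation-Location header; that URL is polled until the analysis finishes.
import os
import time

import requests

ENDPOINT = "https://umesata-vision.cognitiveservices.azure.com"
KEY = os.environ["AZURE_KEY"]

with open("image.jpg", "rb") as f:
    resp = requests.post(
        f"{ENDPOINT}/vision/v3.2/read/analyze",
        params={"language": "ja"},
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "application/octet-stream"},
        data=f.read(),
    )
resp.raise_for_status()
op_url = resp.headers["Operation-Location"]

while True:
    result = requests.get(op_url, headers={"Ocp-Apim-Subscription-Key": KEY}).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(1)

# Flatten the recognized lines; these go into the separate OCR results table.
lines = [line["text"]
         for page in result["analyzeResult"]["readResults"]
         for line in page["lines"]]
```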
Recreate a Markdown document using an LLM
Feed the OCR output to an LLM to convert the result into a Markdown document. The purpose of this pass is to eliminate OCR errors and reconstruct the content into a comprehensible chunk of text. I might edit this Markdown by hand to supplement information that was lost or needs follow-up. Versions will be organized by timestamp to store history (in other words, there will be multiple versions under one id).
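A minimal sketch of this pass, assuming the OpenAI chat completions API as the LLM (no provider is named above, and the model choice is hypothetical); the trailing comment shows one way to key the versions table so multiple versions live under one card id:

```python
# Sketch of the OCR-cleanup pass. OpenAI's chat API and the gpt-4o model
# are assumptions; the pipeline only says "feed the OCR output to an LLM".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("The following is noisy OCR output from a handwritten note card. "
          "Fix obvious OCR errors and reconstruct the content as a coherent "
          "Markdown document. Output only the Markdown.")

def rebuild_markdown(ocr_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": ocr_text}],
    )
    return resp.choices[0].message.content

# One way to store the history: key versions by (card_id, created_at), so
# multiple versions live under one id and the latest is easy to select.
#   CREATE TABLE markdown_versions (
#       card_id    INT REFERENCES cards(id),
#       created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
#       body       TEXT NOT NULL,
#       PRIMARY KEY (card_id, created_at)
#   );
```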
- Split the latest version of the Markdown document into sentences and compute an embedding for each sentence, then take the geometric mean of the embeddings to represent the whole card (see the sketch after this list).
- Cluster the cards.
- Visualize (the interface is by itself a whole project, so this comes later).
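A sketch of the embedding and clustering bullets, under stated assumptions: sentence-transformers for embeddings and scikit-learn k-means for clustering, neither of which is named above. One caveat: embedding components are typically signed, so an elementwise geometric mean is not well defined; the sketch substitutes the arithmetic mean (centroid) and says so in a comment rather than silently changing the method.

```python
# Sketch of the embedding + clustering steps. sentence-transformers and
# scikit-learn are my choices; neither is named in the pipeline notes.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

def card_vector(markdown_body: str) -> np.ndarray:
    # Crude sentence split; Japanese text may need a proper splitter.
    sentences = [s.strip()
                 for s in markdown_body.replace("。", ".").split(".")
                 if s.strip()]
    embeddings = model.encode(sentences)  # shape: (n_sentences, dim)
    # Embeddings contain negative components, so an elementwise geometric
    # mean is not directly defined; the centroid stands in for it here.
    return embeddings.mean(axis=0)

def cluster_cards(vectors: list[np.ndarray], k: int = 8) -> np.ndarray:
    # Plain k-means; k is a knob to tune, not something the notes specify.
    return KMeans(n_clusters=k, n_init=10).fit_predict(np.vstack(vectors))
```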