Both the image in docx and pdf will not be converted to base64 encoded content #58
Comments
Better handling of images in documents is certainly something I would like to support, but just to be clear, what is your expected behavior here?
My preference would be for the first option: save to disk and reference in Markdown. I've seen the issues base64 can cause and would suggest avoiding that one as well. Perhaps a flag could toggle between extraction and the LLM pipeline?
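For illustration, the save-to-disk option might look roughly like the sketch below. It does not use MarkItDown's API. It relies only on the fact that a .docx file is a ZIP archive whose embedded images sit under `word/media/`. The helper name `extract_docx_images` and the output layout are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir):
    """Copy embedded images out of a .docx and return Markdown image references.

    Hypothetical helper for illustration; not part of MarkItDown.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    refs = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Embedded images live under word/media/ inside the archive.
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            target = out / Path(name).name
            target.write_bytes(archive.read(name))
            # Reference the saved file instead of inlining its bytes.
            refs.append(f"![{target.stem}]({target.as_posix()})")
    return refs
```

This only collects the images and their references. Placing each reference at the right position in the converted text would need the document's relationship data, which the sketch leaves out.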
Duplicate of #56
I think images should be encoded as base64, since this tool is used to convert documents to Markdown and the images should be displayed correctly. Saving the image to a file and referencing it is also acceptable. The final goal, I think, is to display the image correctly.
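As a rough sketch of the base64 option, an image can be inlined as a data URI inside a Markdown image link. This uses only the Python standard library and is not MarkItDown's implementation. The function name `image_to_data_uri_markdown` is hypothetical.

```python
import base64
import mimetypes


def image_to_data_uri_markdown(filename, data, alt="image"):
    """Return a Markdown image tag with the bytes inlined as a base64 data URI.

    Hypothetical helper for illustration; not part of MarkItDown.
    """
    # Guess the MIME type from the file extension, with a generic fallback.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"![{alt}](data:{mime};base64,{encoded})"
```

Base64 makes the payload about a third larger than the raw bytes, and some Markdown renderers do not display data URIs, so this may be worth keeping behind an option rather than on by default.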
How can I read a DOCX file so that an image in it is either recognized as text through OCR or given an image description?
Hello, I've found this repo, and it is great for converting documents to Markdown. But when I use this tool to convert `docx` or `pdf` files to Markdown, the images in the file are not converted correctly. The file is as follows:
The version of MarkItDown I used is 0.0.1a2, which was installed from pypi.org using `pip install markitdown`.