Answered by myhloli on Aug 6, 2024
This doesn't look right. Please zip the auto directories from both parsing runs and upload them so we can analyze them.
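The auto mode in 0.10.x includes optimizations for PDFs with irregular spans like this one. You can update to the latest version or test the file on our gradio demo.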
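This document is unusual: its spans are laid out with some misalignment and stray inserted spaces.
On the left are the spans produced by auto mode, which uses the text-based extraction logic; on the right are the spans extracted in ocr mode.
For documents like this, you can force OCR parsing by adding --method ocr on the command line, which gives noticeably better results.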
ocr.zip
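
For anyone scripting this, here is a minimal Python sketch of passing `--method ocr` to the parser. Only the `--method ocr` flag comes from the answer above. The executable name `magic-pdf`, the `-p`/`-o` options, and the file names are assumptions for illustration, so check them against your installed version's `--help` output.

```python
import shutil
import subprocess


def build_command(pdf_path, output_dir, method="ocr", executable="magic-pdf"):
    """Build the argument list for a parse run.

    Assumptions (not confirmed by the thread): the CLI is called
    `magic-pdf` and takes `-p` for the input PDF and `-o` for the
    output directory. `--method ocr` is the flag suggested above.
    """
    return [
        executable,
        "-p", str(pdf_path),
        "-o", str(output_dir),
        "--method", method,
    ]


def run_parse(pdf_path, output_dir, method="ocr"):
    """Run the parser and raise if it is missing or exits non-zero."""
    cmd = build_command(pdf_path, output_dir, method)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    print(" ".join(build_command("example.pdf", "output")))
```

Running the same file once with `method="auto"` and once with `method="ocr"` into separate output directories makes it easy to compare the two results side by side.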