PDF parsing doesn't support partially numbered lists #68

Open
majdalsado opened this issue Dec 16, 2024 · 1 comment
Labels: bug (Something isn't working), open for contribution (Invites open-source developers to contribute to the project)

Comments

@majdalsado

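When parsing a PDF file that uses partial numbering (.1, .2, and so on), as is common in MasterFormat, a format used in most construction and government documents across the US and Canada, the parser fails to render the numbered lists properly: the markers are emitted in a block of their own, detached from the text of their items. See the example below.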

MarkItDown Output:

RFP for Construction Management Services
Rotary Clubs of Grande Prairie Wellness Centre Society
Ken Sargent House

Section 00 00 43
Instructions to Respondents
Page 1 of 8

INTENT

.1

.2

.3

.4

.5

.6

.7

.8

The intent of this Request for Proposal (RFP) is to solicit submissions, in the format
detailed in this document, from qualified Construction Managers for the following project:

KEN SARGENT HOUSE
GRANDE PRAIRIE, ALBERTA

Available information relative to the project is included in Section 00 00 45 – Description
of Project.

Actual Document

[Image: screenshot of the actual document page]

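
The symptom in the output above (markers such as `.1` stranded on lines of their own, with no item text between them) can be detected mechanically, which may help when checking whether a candidate fix works. The following is a minimal sketch under stated assumptions: `find_orphaned_marker_runs` and `MARKER_LINE` are hypothetical names, not part of MarkItDown, and the check only flags the problem. It does not try to reattach markers to their items, since the converted text alone does not say which paragraph belongs to which marker.

```python
import re

# Hypothetical helper, not part of MarkItDown: matches a line that holds
# nothing but a partial-numbering marker such as ".1" or ".12".
MARKER_LINE = re.compile(r"^\.\d+$")


def find_orphaned_marker_runs(text, min_run=2):
    """Return runs of markers that sit on consecutive non-blank lines
    with no item text between them."""
    runs, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not break a run
        if MARKER_LINE.match(stripped):
            current.append(stripped)
        else:
            # Any other text ends the current run.
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:
        runs.append(current)
    return runs


sample = """INTENT

.1

.2

.3

The intent of this Request for Proposal (RFP) is to solicit submissions
"""
print(find_orphaned_marker_runs(sample))  # [['.1', '.2', '.3']]
```

A marker followed by text on the same line (for example `.1 The intent of ...`) is not flagged, so correctly converted output would return an empty list.
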
@afourney (Member)

Yeah, PDFs are not the nicest format for conversion, and their conversion has perhaps the lowest fidelity. I'd like to improve that, and will investigate some options while keeping things lightweight.

@gagb added the bug and open for contribution labels on Dec 17, 2024.