We process contracts that are anywhere from 5 to 200+ pages. For shorter ones, Qomplement works great. But for the really long ones (100+ pages), I’m finding that:
- Processing takes a while (expected)
- The extracted data sometimes misses clauses that appear deep in the document
Any tips for handling very long documents? Should I split them up?
For long contracts we split by section. Most contracts have clear section headers, so we wrote a quick script to split the PDF at those boundaries, process each section with the appropriate template, then merge the results. It adds complexity, but the accuracy is much better.
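For anyone who wants a starting point, here is a rough sketch of the split/process/merge idea, not the actual script described above. It assumes the page text has already been extracted from the PDF into a list of strings (one per page) and that headers look like "1. Definitions" or "Section 2 ...". `HEADER_RE`, `split_by_section`, `process_and_merge`, and the `extract` callback are all hypothetical names; `extract` stands in for whatever per-section processing you run and is not a Qomplement API.

```python
import re
from typing import Callable

# Assumed header style: "1. Definitions" or "Section 2 - Term".
# Adjust this pattern to match how your contracts are actually formatted.
HEADER_RE = re.compile(r"^(?:(?:SECTION|Section)\s+\d+\b.*|\d+\.\s+[A-Z].*)$")


def split_by_section(pages: list[str]) -> list[dict]:
    """Group page text into sections, starting a new one at each header line."""
    sections = [{"title": "Preamble", "start_page": 1, "lines": []}]
    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            stripped = line.strip()
            if HEADER_RE.match(stripped):
                sections.append(
                    {"title": stripped, "start_page": page_no, "lines": []}
                )
            else:
                sections[-1]["lines"].append(line)

    result = []
    for section in sections:
        body = "\n".join(section["lines"]).strip()
        # Drop the placeholder preamble if nothing came before the first header.
        if body or section["title"] != "Preamble":
            result.append(
                {
                    "title": section["title"],
                    "start_page": section["start_page"],
                    "text": body,
                }
            )
    return result


def process_and_merge(
    pages: list[str], extract: Callable[[str, str], dict]
) -> dict:
    """Run `extract(title, text)` per section and merge results by field name.

    Each value keeps the section title and start page it came from, so a
    clause found deep in the document can be traced back to its source.
    """
    merged: dict = {}
    for section in split_by_section(pages):
        fields = extract(section["title"], section["text"])
        for key, value in fields.items():
            merged.setdefault(key, []).append(
                {
                    "value": value,
                    "section": section["title"],
                    "page": section["start_page"],
                }
            )
    return merged
```

Keeping the section title and page next to each extracted value makes it easier to resolve conflicts when the same field shows up in more than one section.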
Similar approach here. For regulatory filings that are 100+ pages, we split into logical sections first. It also makes the results more usable downstream since each section maps to a specific compliance requirement.