Working on a data pipeline that needs to process ~10,000 PDFs (research papers, 5-20 pages each). Looking for the most efficient approach.
Currently doing sequential uploads at ~15 seconds per document, which works out to 41+ hours for the full set. Not great. Current loop:
import time

import requests

results = []
for pdf_path in pdf_files:  # pdf_files: list of paths to the ~10,000 PDFs
    with open(pdf_path, 'rb') as f:
        resp = requests.post(
            'https://api.qomplement.com/v1/parse',
            files={'file': f},
            headers={'Authorization': 'Bearer <token>'}
        )
    results.append(resp.json())
    time.sleep(1)  # pause between uploads
Is there a batch endpoint? Max concurrent requests? Async processing option?
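If there is no batch or async endpoint, my fallback would be client-side concurrency against the same `/v1/parse` endpoint. Below is a rough sketch of what I have in mind, not something I've confirmed the API allows. The helper names (`upload_one`, `process_all`), the `max_workers=8` default, and the 120-second timeout are all my own guesses, since I don't know the actual rate limit or concurrency cap.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_one(pdf_path):
    """Upload a single PDF to the same /v1/parse endpoint as the loop above."""
    import requests  # imported here so process_all can be exercised offline

    with open(pdf_path, 'rb') as f:
        resp = requests.post(
            'https://api.qomplement.com/v1/parse',
            files={'file': f},
            headers={'Authorization': 'Bearer <token>'},
            timeout=120,  # assumed value, not from the API docs
        )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()


def process_all(pdf_files, upload=upload_one, max_workers=8):
    """Run uploads in a bounded thread pool.

    Returns (results, failures), both dicts keyed by PDF path.
    max_workers=8 is a placeholder; the real limit depends on the API.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its path so results can be matched up.
        futures = {pool.submit(upload, path): path for path in pdf_files}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # keep going; failed paths can be retried later
                failures[path] = exc
    return results, failures
```

Usage would be `results, failures = process_all(pdf_files)`. At 8 workers and ~15 seconds per document, that would be roughly 150,000 / 8 seconds, or a bit over 5 hours, assuming the server doesn't throttle parallel requests. That assumption is exactly what I'm asking about.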