Liberating Structured Data from PDF Prisons
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents
Interactive LLM-Powered Data Processing with DocWrangler
DocWrangler is an IDE that provides instant feedback, visual exploration tools, and AI assistance for building and iterating on LLM-powered data processing pipelines
Reimagining LLM-Powered Unstructured Data Analysis with DocETL
DocETL is an open-source system for building LLM-powered data processing pipelines, offering declarative operators and powerful optimization for complex document analysis tasks
Lightweight Nudges for More Accurate Retrieval in RAG Pipelines
Make your retrieval pipelines more effective with this novel and lightweight fine-tuning approach