Low-Cost LLM-Powered Data Processing with BARGAIN

BARGAIN reduces the cost of LLM-powered data processing while providing statistical guarantees on output quality

Liberating Structured Data from PDF Prisons

TWIX is an open-source data extraction tool that accurately and cheaply reconstructs structured data from documents at scale by inferring the visual template the documents share

Interactive LLM-Powered Data Processing with DocWrangler

DocWrangler is an IDE that provides instant feedback, visual exploration tools, and AI assistance for building and iterating on LLM-powered data processing pipelines

Reimagining LLM-Powered Unstructured Data Analysis with DocETL

DocETL is an open-source system for building LLM-powered data processing pipelines, offering declarative operators and pipeline optimization for complex document analysis tasks