Using Unstructured with Snowflake Cortex Search for RAG


Use Snowflake Cortex and RAG to do natural-language searches across a Snowflake table that contains data provided by Unstructured. Additional Snowflake Cortex functions are also explored.
Unstructured API Snowflake Cortex RAG Search Workflows S3

Agentic RAG with LangGraph and Together AI


Build Agentic RAG with LangGraph and Together AI, and compare the results with Vanilla RAG in pure Python.
Unstructured API Workflows Agents LangGraph Together AI Astra DB

Getting Started with Unstructured API and Snowflake


Learn how to create data processing workflows with the Unstructured API and its Python SDK to preprocess all of your unstructured data from Azure Blob Storage into a Snowflake table.
Unstructured API Workflows Azure Blob Storage Snowflake

Building Graph-Based RAG Applications


Learn how to use the Unstructured API to create a Graph RAG-based workflow that writes data with named entity recognition (NER) to your Astra DB.
Unstructured API Workflows Graph RAG NER Astra DB

Getting Started with Unstructured API and Delta Tables in Databricks


Learn how to create data processing workflows with Unstructured API and its Python SDK to preprocess all of your unstructured data into your Delta Table.
Unstructured API Workflows Databricks S3

RAG for Online Documentation


Crawl websites with Firecrawl and build a RAG workflow powered by Unstructured and MongoDB Atlas vector search.
Unstructured API Workflows MongoDB

Unstructured Workflow Endpoint Quickstart


Build an end-to-end workflow in Unstructured programmatically by using the Unstructured Workflow Endpoint.
Unstructured API Workflows S3

RAG with Databricks Vector Search with Context from Multiple Sources


Build RAG with Databricks Vector Search with context preprocessed from multiple sources by Unstructured.
Databricks Introductory notebook

Agentic RAG with Hugging Face smolagents vs Vanilla RAG


Build Agentic RAG with the smolagents library and compare the results with Vanilla RAG in pure Python.
GPT-4o smolagents Agents DataStax S3 Advanced notebook

Llama3.2 RAG evaluation on unstructured text


Evaluate Llama3.2 for your RAG system with Unstructured, GPT-4o, Ragas, and LangChain
GPT-4o Ragas LangChain Llama3.2 Pinecone S3 Advanced notebook

Multimodal RAG: Enhancing RAG outputs with image results


Process a file in S3 with Unstructured and return images in your RAG output
S3 FAISS GPT-4o-mini Advanced notebook

Quantitative Reasoning with tables inside PDFs


From Pixels to Insights: Seamlessly Extracting and Visualizing Table Data with Unstructured and Hex
Unstructured API Hex Advanced notebook

PII removal with GLiNER in unstructured data ETL


Remove Personally Identifiable Information (PII) as a part of unstructured data preprocessing.
Unstructured API PII GLiNER Advanced notebook

Custom metadata extraction and self-querying retrieval


Extract custom metadata, and enable metadata pre-filtering in your RAG.
Unstructured API MongoDB Metadata Advanced notebook

Selecting an embedding model for custom data


End-to-end data processing pipeline using Unstructured Serverless API.
Unstructured API Hugging Face Advanced notebook

RAG with PDFs, LangChain and Llama 3


A RAG system with the Llama 3 model from Hugging Face.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook

Unstructured data ETL from S3 to SingleStore DB


Learn to ingest, partition, chunk, embed and load data from an S3 bucket into SingleStore DB.
Unstructured API SingleStoreDB AWS S3 Introductory notebook

Google Drive to DataStax Astra DB


Embed your Google Drive Docs in an Astra Vector Database with Unstructured Serverless API
Unstructured API Google DataStax Introductory notebook

Weaviate RAG quickstart


Embed your local documents in a Weaviate Vector Database with Unstructured Serverless API
Unstructured API OpenAI Weaviate Introductory notebook

Preprocess PDFs in AWS S3, load into Elasticsearch


Ingest PDF documents from an S3 bucket, transform them into a normalized JSON with Unstructured Serverless API, chunk, embed and load into Elasticsearch.
Unstructured API AWS S3 Elasticsearch Introductory notebook

Preprocess documents in Google Drive, load into Databricks Volume


Preprocess documents from Google Drive with Unstructured Serverless API and load them into a Databricks Volume.
Unstructured API Google Drive Databricks Introductory notebook

Source references in RAG responses


Add document source references to RAG responses based on document metadata.
Unstructured API RAG LangChain Intermediate notebook

Query processed PDF with HuggingChat


Send a PDF to Unstructured for processing, and send a subset of the returned PDF’s processed text to HuggingChat for chatbot-style querying.
Unstructured API 🤗 Hugging Face 🤗 HuggingChat Introductory notebook

Llama 3 Local RAG with emails


Build a local RAG app for your emails with Unstructured, LangChain and Ollama.
Unstructured API LangChain Ollama Llama 3 Introductory notebook

Building RAG With PowerPoint presentations


A RAG solution that is based on PowerPoint files.
Unstructured API 🤗 Hugging Face LangChain Llama 3 Introductory notebook

Synthetic test dataset generation


Build a Synthetic Test Dataset for your RAG system in 5 easy steps
Unstructured API GPT-4o Ragas LangChain Advanced notebook