The Unstructured Partition Endpoint, part of the Unstructured API, is intended for rapid prototyping of Unstructured’s various partitioning strategies, with limited support for chunking. It is designed to process local files only, one file at a time. Use the Unstructured Workflow Endpoint for production-level scenarios, file processing in large batches, files and data in remote locations, generating embeddings, applying post-transform enrichments, using the latest and highest-performing models, and for the highest quality results at the lowest cost.

Get started

To call the Unstructured Partition Endpoint, you need an Unstructured account and an Unstructured API key:

If you signed up for Unstructured through the For Enterprise page, or if you are using a self-hosted deployment of Unstructured, the following information about signing up, signing in, and getting your Unstructured API key might apply differently to you. For details, contact Unstructured Sales at sales@unstructured.io.

  1. Go to https://platform.unstructured.io and use your email address, Google account, or GitHub account to sign up for an Unstructured account (if you do not already have one) and sign into the account at the same time. The Unstructured user interface (UI) appears.

  2. Get your Unstructured API key:

    a. In the Unstructured UI, click API Keys on the sidebar.
    b. Click Generate API Key.
    c. Follow the on-screen instructions to finish generating the key.
    d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.

By following the preceding instructions, you are signed up for a Developer pay-per-page account by default.

To save money, consider switching to a Subscribe & Save account instead. To save even more money, consider switching to an Enterprise account instead.

Try the quickstart.

Set up billing

If you signed up for a pay-per-page plan, you can enjoy a free 14-day trial with usage capped at 1000 pages per day.

If you initially signed up for a subscribe-and-save plan instead, or if you signed up through the For Enterprise page, your billing setup and terms will be different. For billing guidance, contact Unstructured Sales at sales@unstructured.io.

At the end of the 14-day free trial, or if you need to go past the trial’s page processing limits during the 14-day free trial, you must set up your billing information to keep using the Unstructured Partition Endpoint:

  1. Sign in to your Unstructured account, at https://platform.unstructured.io.
  2. At the bottom of the sidebar, click your user icon, and then click Account Settings.
  3. Click the Billing tab.
  4. Click Manage Payment Method, follow the on-screen instructions to enter or update your payment details, and then click Save card.

Your card is billed monthly based on your usage. The Billing page shows a billing overview for the current month and a list of your billing invoices.

You can save money by switching from a pay-per-page plan to a subscribe-and-save plan. To do this, go to the Unstructured Subscribe & Save page and complete the on-screen instructions.

We calculate a page as follows:

  • For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
  • For .docx files that have page metadata, we calculate the number of pages based on that metadata.
  • For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.
  • For non-file data, we calculate a page as 100 KB of incoming data to be processed.
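
As a rough illustration of the size-based rule above, the following shell function estimates the page count for a file that is counted by size. It is only a sketch: the function name estimate_pages is hypothetical, and it assumes that 100 KB means 100,000 bytes and that a partial page rounds up to a whole page, neither of which is stated on this page.

```shell
# Hypothetical helper: estimate pages for a file that is billed by size.
# Assumptions (not confirmed by this page): 100 KB = 100000 bytes,
# and any partial page is rounded up to a whole page.
estimate_pages() {
    local size_bytes="$1"
    local page_bytes=100000
    # Integer ceiling division: add (divisor - 1) before dividing.
    echo $(( (size_bytes + page_bytes - 1) / page_bytes ))
}

# Example: a 250000-byte text file.
estimate_pages 250000   # prints 3
```

To estimate from a real file, you could pass its size in bytes, for example estimate_pages "$(wc -c < notes.txt)", where notes.txt is a placeholder file name.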

Quickstart

This example uses the curl utility on your local machine to call the Unstructured Partition Endpoint. It sends one or more source (input) files from your local machine to the Unstructured Partition Endpoint, which then delivers the processed data to a destination (output) location, also on your local machine. Data is processed on Unstructured-hosted compute resources.

If you do not have source files readily available, you can use, for example, a sample PDF file containing the text of the United States Constitution, available for download from https://constitutioncenter.org/media/files/constitution.pdf.

1

Set environment variables

From your terminal or Command Prompt, set the following two environment variables.

  • Replace <your-unstructured-api-url> with the Unstructured Partition Endpoint base URL, which is https://api.unstructuredapp.io.
  • Replace <your-unstructured-api-key> with your Unstructured API key, which you generated earlier on this page.
export UNSTRUCTURED_API_URL="<your-unstructured-api-url>"
export UNSTRUCTURED_API_KEY="<your-unstructured-api-key>"
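
Before running the later commands, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch; the function name require_env is hypothetical and is not part of any Unstructured tooling.

```shell
# Hypothetical helper: fail early if a required variable is unset or empty.
require_env() {
    local name value
    for name in "$@"; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing environment variable: $name" >&2
            return 1
        fi
    done
}

# Usage: run once before the curl commands on this page.
# require_env UNSTRUCTURED_API_URL UNSTRUCTURED_API_KEY
```
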
2

Create a partition job

Run the following curl command, replacing <path/to/file-1> with the path to the source file on your local machine. To specify multiple files, repeat the --form 'files=@<path/to/file-N>;type=application/pdf' option in this command for each additional file.

If the source file is not a PDF file, then remove ;type=application/pdf from the related --form option in this command.

curl --request 'POST' \
"$UNSTRUCTURED_API_URL/v1/partition_async" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'content-Type: multipart/form-data' \
--form 'strategy=vlm' \
--form 'vlm_model_provider=openai' \
--form 'vlm_model=gpt-4o' \
--form 'output_format=application/json' \
--form 'files=@<path/to/file-1>;type=application/pdf' \
--form 'files=@<path/to/file-N>;type=application/pdf'
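
When there are many files, repeating the --form option by hand gets tedious. The following sketch wraps the same request in a shell function that adds one --form 'files=@...' option per path and appends ;type=application/pdf only for paths ending in .pdf. The function name submit_partition_job is hypothetical; the endpoint, headers, and form fields are copied from the command above, and --silent and --show-error are standard curl options added here to keep the output clean.

```shell
# Hypothetical helper: submit every path given as an argument in one request.
submit_partition_job() {
    local path
    # Rebuild the argument list so each path becomes its own --form option.
    # The loop list is expanded once, so changing "$@" inside is safe.
    for path in "$@"; do
        shift
        case "$path" in
            *.pdf|*.PDF) set -- "$@" --form "files=@$path;type=application/pdf" ;;
            *)           set -- "$@" --form "files=@$path" ;;
        esac
    done
    curl --silent --show-error --request 'POST' \
        "$UNSTRUCTURED_API_URL/v1/partition_async" \
        --header 'accept: application/json' \
        --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
        --header 'content-Type: multipart/form-data' \
        --form 'strategy=vlm' \
        --form 'vlm_model_provider=openai' \
        --form 'vlm_model=gpt-4o' \
        --form 'output_format=application/json' \
        "$@"
}

# Usage (placeholder paths):
# submit_partition_job ./report.pdf ./notes.docx
```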

The results are printed to your terminal or Command Prompt with a format similar to the following:

{
    "partition_id": "<job-id>",
    "partition_status": "scheduled",
    "partition_status_message": "Partition job created"
}

Make a note of the <job-id> value, as you will need it in the next step.
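
Instead of copying the value by hand, you could capture it in a shell variable. This sketch assumes that the jq utility is installed and that the response has the shape shown above; the function name extract_job_id is hypothetical.

```shell
# Hypothetical helper: read a job-creation response on stdin and
# print its partition_id. Exits non-zero if the field is missing or null.
extract_job_id() {
    jq --exit-status --raw-output '.partition_id'
}

# Usage: pipe the output of the job-creation curl command into the helper.
# job_id="$(curl ... | extract_job_id)"
```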

3

Check the status of the job

Run the following curl command, replacing <job_id> with the <job-id> value that you noted in the previous step.

curl --request 'GET' \
"$UNSTRUCTURED_API_URL/v1/partition_async/<job_id>" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY"

The results are printed to your terminal or Command Prompt with a format similar to the following:

{
    "partition_id": "<job-id>",
    "partition_status": "in_progress",
    "partition_status_message": "Started processing partition request",
    "elements": null
}

If the job is still in progress, repeat the curl command until the job is complete.
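
Repeating the command by hand can be replaced with a small polling loop. The following is a sketch under stated assumptions: it requires jq, the function name wait_for_partition and its delay and attempt defaults are hypothetical, and it treats a job as finished once the elements field is no longer null. Because this page does not list the status values for failed jobs, a failed job is not detected separately and simply runs out of attempts.

```shell
# Hypothetical helper: poll a job until "elements" is no longer null.
# Arguments: job ID, seconds between attempts (default 5),
# maximum number of attempts (default 60).
wait_for_partition() {
    local job_id="$1" delay="${2:-5}" attempts="${3:-60}" response
    while [ "$attempts" -gt 0 ]; do
        response="$(curl --silent --show-error --request 'GET' \
            "$UNSTRUCTURED_API_URL/v1/partition_async/$job_id" \
            --header 'accept: application/json' \
            --header "unstructured-api-key: $UNSTRUCTURED_API_KEY")" || return 1
        # Assumption: a finished job has a non-null "elements" field.
        if printf '%s' "$response" | jq --exit-status '.elements != null' >/dev/null 2>&1; then
            printf '%s\n' "$response"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    echo "Job $job_id did not finish in time" >&2
    return 1
}

# Usage: wait_for_partition "<job_id>" 5 60
```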

4

Examine the results

After the job has completed successfully, running the preceding command prints results to your terminal or Command Prompt that contain the processed data within the elements array, for example:

{
    "partition_id": "<job-id>",
    "partition_status": "...",
    "partition_status_message": "...",
    "elements": [
        {
            "type": "...",
            "element_id": "...",
            "text": "...",
            "metadata": {
                "...": "..."
            }
        },
        {
            "type": "...",
            "element_id": "...",
            "text": "...",
            "metadata": {
                "...": "..."
            }
        }
    ]
}

By default, the JSON is printed without indenting or other whitespace. You can pretty-print the JSON output by using utilities such as jq in future command runs.

You can also save the JSON output to a local file by using the curl option -o, --output <file> in future command runs.
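
The two tips above can be combined: save the finished job's JSON to a file, then inspect the file with jq. This is a sketch that assumes jq is installed; the function names save_job_result and count_element_types are hypothetical, and the jq filter relies only on the elements array and type field shown in the example output.

```shell
# Hypothetical helper: save a job's JSON response to a local file.
# Arguments: job ID, output file path.
save_job_result() {
    curl --silent --show-error --request 'GET' \
        "$UNSTRUCTURED_API_URL/v1/partition_async/$1" \
        --header 'accept: application/json' \
        --header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
        --output "$2"
}

# Hypothetical helper: print one "type<TAB>count" line per element type
# found in a saved result file. Fails if "elements" is still null.
count_element_types() {
    jq --raw-output \
        '[.elements[].type] | group_by(.) | map("\(.[0])\t\(length)") | .[]' \
        "$1"
}

# Usage (placeholder file name):
# save_job_result "<job_id>" result.json
# jq . result.json              # pretty-print the saved JSON
# count_element_types result.json
```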

You can also call the Unstructured Partition Endpoint by using the Unstructured Python SDK or the Unstructured JavaScript/TypeScript SDK.