Documents AI

From 3B Knowledge
Revision as of 19:02, 19 March 2025 by Admin (talk | contribs) (→‎Limits)

Intro

Version 5.11 of 3B Forms introduces the new Vision AI capability directly within Forms. With Vision AI you can use images as input prompts and generate responses based on the data inside those images.

In simple terms, Admins (form builder users) can create forms that accept an image, provide natural language instructions, and let the AI extract the data from that image using OCR (optical character recognition).

Setup

Head over to Setup -> Settings -> 3B Forms General Settings and set the field 3B AI Endpoint to https://x3b-ai-76854d6fa2eb.herokuapp.com/extractImage. This is all the setup required to enable Vision AI in forms.

Use Cases

There are many use cases for our Vision AI capability - from validating ID documents, extracting validity dates, analyzing document authenticity all the way to parsing CVs, Invoices, Timesheets and even Shift Imports.

How To

File Upload Component

To start using Vision AI, drag and drop a File question type onto a form. The File component is context aware, so you can place it at the top level (context object) or inside a Parent/Child panel. Based on the context where the File question is added, you will be able to select the context object's field(s) to prompt against. So, if you are building a form that validates a certificate of a given type, add the Child Panel with all of the fields you want pre-populated and drop the File question inside that panel. Then add a prompt to each field you want Vision AI to help pre-populate.


There are a few options under the new AI tab:

AI Prompt Config

Require Manual Interaction - this option will create a button that the user has to click on to initiate Vision AI. The default behaviour is that this option is disabled and Vision AI will be triggered as soon as a file is uploaded.

General Prompt - this is a text-based general prompt, instructing the Vision AI agent. Prompt engineering here is important to ensure that the AI stays within boundaries. A good general prompt is "You are a helpful assistant that analyzes documents uploaded by prospective job applicants and extracts information from these documents."

Extraction Fields - this table allows you to pick a field (from the current context) and provide a prompt. The prompt applies to that field only. Make sure to tell the AI the expected result and any field restrictions (e.g. max field length, whether you expect a boolean, string or date response, and the format of that response).

Sample Field Extraction Prompts

  • "Find the birth date on the document and return it in YYYY-MM-DD format. If no birth date is found on the document, return null"
  • "Analyze the document provided and if it looks like an international Passport, return a boolean true, otherwise return a boolean false"
  • "The document uploaded should be a forklift license. If the document looks like a forklift training/certificate/driving license, return a boolean true, otherwise return a boolean false"
  • "The document should have the names of the person (holder). Extract the First Name and normalize it by ensuring proper letter spacing, capitalization and remove any special characters"
  • "The document provided should be a CV. Return a comma separated list of employment skills you think are relevant based on the provided CV"
  • "The provided document should be some type of identity document. If the document is a passport, return "Passport", if the document is EU Id Card, return "Identity Card", if the document is a biometric residence permit, return "Biometric Residence Card", if the document is a birth certificate, then return "Birth Certificate" and in all other cases, return "Unknown""
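The sample prompts above ask for replies in a fixed shape (a YYYY-MM-DD date or null, a boolean, or one of a fixed set of labels). As a minimal sketch of what checking such replies could look like, the Python below coerces raw text replies into typed values. The function names are hypothetical and this is not how 3B Forms itself handles replies; it only illustrates why stating the expected format in the prompt makes the result easy to verify.

```python
from datetime import date, datetime
from typing import Optional

# Labels taken from the identity-document sample prompt above.
ALLOWED_DOCUMENT_TYPES = {
    "Passport",
    "Identity Card",
    "Biometric Residence Card",
    "Birth Certificate",
    "Unknown",
}


def parse_date_reply(reply: str) -> Optional[date]:
    """Parse a YYYY-MM-DD reply; 'null' or an empty reply means no value."""
    text = reply.strip().strip('"')
    if text.lower() in ("", "null"):
        return None
    # Raises ValueError if the reply is not in the requested format.
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_boolean_reply(reply: str) -> bool:
    """Accept only an explicit true/false reply (hypothetical helper)."""
    text = reply.strip().strip('"').lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"expected a boolean reply, got {reply!r}")


def parse_document_type(reply: str) -> str:
    """Map any reply outside the allowed labels to 'Unknown'."""
    text = reply.strip().strip('"')
    return text if text in ALLOWED_DOCUMENT_TYPES else "Unknown"
```

A reply such as "12 March 1990" would raise an error in `parse_date_reply`, which is a signal that the prompt should state the format more firmly.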

Tips

  • In Extraction Fields, be sure to define the expected response type - string, date, boolean or something else.
  • If you want the AI to check a checkbox, ask it to return a boolean true.
  • You can ask the AI to format and normalize values, e.g. ask it to normalize names, remove odd characters, fix spacing, etc.
  • You can ask the AI to analyze the image and categorize it. This is especially useful if you want users to upload a specific image type.
  • You could include language in your prompt to specify that you want to return empty parameters, or a specific sentence, if the AI detects that the input is incompatible with the task.
  • Results can still contain mistakes. If you see mistakes, try adjusting your instructions, providing examples in the system instructions, or splitting tasks into simpler subtasks.

Restrictions

Although Vision AI is capable of reading many file types (PDFs, Word documents, images and other specialised files), please restrict the input file format to images only until further notice.

You can set the Accepted file types property to image/* to restrict file input formats against the File question type.
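To illustrate what an `image/*` pattern accepts, here is a small Python sketch that guesses a file's MIME type from its name and matches it against the pattern. It assumes the property follows the usual wildcard convention for MIME types; how the File question type evaluates the property internally is not described in this article, and `is_accepted` is a hypothetical helper.

```python
import mimetypes
from fnmatch import fnmatchcase


def is_accepted(filename: str, accepted: str = "image/*") -> bool:
    """Return True if the MIME type guessed from the file name matches the pattern."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown or missing extension: treat as not accepted.
        return False
    return fnmatchcase(mime_type, accepted)
```

With the default pattern, `passport.png` is accepted and `cv.pdf` is rejected, because its guessed type is `application/pdf`.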

Prompt Engineering

As with most AI systems, the power of this tool is limited by your ability to engineer your prompts. Clear, well-defined prompts will improve data quality and reliability.

Write clear instructions

AI can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less the AI has to guess at what you want, the more likely you’ll get it. Tactics:

  • Include details in your query to get more relevant answers
  • Ask the AI to adopt a persona
  • Use delimiters to clearly indicate distinct parts of the input
  • Specify the steps required to complete a task
  • Provide examples
  • Specify the desired length of the output
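Several of the tactics above (explicit steps, delimiters, an example, and a length limit) can be combined in one field prompt. The Python sketch below assembles such a prompt as plain text. `build_field_prompt` and its parameters are hypothetical; in 3B Forms you would type the resulting text into the Extraction Fields table yourself.

```python
from typing import List


def build_field_prompt(task: str, steps: List[str], example: str, max_length: int) -> str:
    """Assemble a prompt with numbered steps, a delimited example and a length limit."""
    lines = [task, "", "Follow these steps:"]
    # Numbered steps make the required order explicit.
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    lines += [
        "",
        # Triple quotes act as delimiters around the example value.
        f'Example of the expected output: """{example}"""',
        f"Return at most {max_length} characters and nothing else.",
    ]
    return "\n".join(lines)
```

For example, `build_field_prompt("Extract the holder's first name.", ["Locate the name field.", "Normalize capitalization."], "Maria", 40)` produces a prompt that states the task, the steps, the output format and the length limit in that order.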

Provide reference text

Language AI can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to these AI models can help in answering with fewer fabrications.

Split complex tasks into simpler subtasks

Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to an LLM. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be redefined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
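The idea of feeding the output of one simple task into the next can be sketched in Python as follows. This illustrates the general technique only and is not a 3B Forms feature: `ask` is a hypothetical stand-in for whatever sends a single prompt about the uploaded image and returns the reply, and `fake_ask` returns canned replies so the sketch can run without any AI service.

```python
from typing import Callable, Dict

# Hypothetical: sends one prompt about the uploaded image and returns the reply text.
Ask = Callable[[str], str]


def extract_in_steps(ask: Ask) -> Dict[str, str]:
    """Classify the document first, then use that result in the second prompt."""
    doc_type = ask(
        'Classify the document. Return "Passport", "Identity Card" or "Unknown".'
    ).strip()
    if doc_type == "Unknown":
        # No point asking for details of a document we could not classify.
        return {"document_type": doc_type}
    expiry = ask(
        f"The document is a {doc_type}. Find its expiry date and return it "
        "in YYYY-MM-DD format, or null if none is found."
    ).strip()
    return {"document_type": doc_type, "expiry_date": expiry}


def fake_ask(prompt: str) -> str:
    """Canned replies for demonstration only; no real model is called."""
    return "Passport" if prompt.startswith("Classify") else "2031-05-14"
```

Each step has one narrow job, so a wrong answer is easier to spot and the prompt for that step is easier to adjust.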

Prompt examples

Worse: Summarize the document.
Better: Summarize the document in a single paragraph. Then write a markdown list of the document type, unique characteristics, any dates worth noting and whether the document provided looks authentic.

Limitations

Version 5.11 of 3B Forms only supports contextual data extraction with single layer field mapping (i.e. we can only extract and write to the fields of the contextual object where the File question is embedded).
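As a rough illustration of single layer field mapping, the Python sketch below writes extracted values only to fields that already exist on the context record and skips anything nested or repeatable. The record shape and field names are hypothetical and do not reflect the internal data model of 3B Forms.

```python
from typing import Any, Dict


def apply_extraction(context_record: Dict[str, Any], extracted: Dict[str, Any]) -> Dict[str, Any]:
    """Copy flat extracted values onto matching fields of the context record."""
    updated = dict(context_record)
    for field, value in extracted.items():
        is_flat = not isinstance(value, (list, dict))
        # Only fields of the context object itself are written; nested or
        # repeatable data (lists, child records) is ignored in this sketch.
        if field in context_record and is_flat:
            updated[field] = value
    return updated
```

Given a record with fields `First_Name` and `Expiry_Date`, an extracted list of skills would be dropped, while a flat `Expiry_Date` value would be written.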

A future release will add the capability to extract data into repeatable records (like experiences, skills, employment and education and more). This article will be updated once this feature has been released.

Limits

  • Medical images: The AI is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.
  • Non-English: The AI may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.
  • Small text: Enlarge text within the image to improve readability, but avoid cropping important details.
  • Rotation: The AI may misinterpret rotated or upside-down text and images.
  • Visual elements: The AI may struggle to understand graphs or text where colors or styles—like solid, dashed, or dotted lines—vary.
  • Spatial reasoning: The AI struggles with tasks requiring precise spatial localization, such as identifying chess positions.
  • Accuracy: The AI may generate incorrect descriptions or captions in certain scenarios.
  • Image shape: The AI struggles with panoramic and fisheye images.
  • Metadata and resizing: The AI doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
  • Counting: The AI may give approximate counts for objects in images.
  • CAPTCHAs: For safety reasons, our system blocks the submission of CAPTCHAs.
  • Number of requests per minute: we restrict requests to 500 per minute per org, meaning you can have up to 500 file extractions every minute. Contact us to increase this limit.
  • File restrictions:
      • File types: PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp) and non-animated GIF (.gif)
      • Size limits: up to 20MB per image
      • Other requirements: no watermarks or logos, no text, no NSFW content, and clear enough for a human to understand
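The file type, file size and request rate limits listed above can be sketched as simple client-side checks in Python. This is an illustration under assumptions, not the service's actual enforcement: it treats 20MB as 20 × 1024 × 1024 bytes, checks the file extension only (so it cannot tell an animated GIF from a still one), and uses a sliding 60-second window for the 500 requests per minute limit. All names are hypothetical.

```python
import time
from collections import deque
from pathlib import Path
from typing import Deque, Optional

ALLOWED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp", ".gif"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: 20MB interpreted as 20 MiB
REQUESTS_PER_MINUTE = 500


def check_file(filename: str, size_bytes: int) -> Optional[str]:
    """Return a rejection reason, or None if the file passes both checks."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return "unsupported file type"
    if size_bytes > MAX_BYTES:
        return "file larger than 20MB"
    return None


class MinuteLimiter:
    """Sliding-window counter: at most `limit` requests in any 60 seconds."""

    def __init__(self, limit: int = REQUESTS_PER_MINUTE) -> None:
        self.limit = limit
        self.sent: Deque[float] = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget requests that are 60 seconds old or older.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True
```

Checking a file before upload avoids spending one of the per-minute requests on a file that would be rejected anyway.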

Testing

Testing your form can only be done once the form is published. Vision AI is disabled within the builder (in the Preview tab).

Security

Vision AI is not a Salesforce native tool. It uses off-platform processing: uploaded files are sent to specialized Heroku servers that run an LLM (large language model). Data is encrypted in transit, but it does leave Salesforce. Heroku is a Salesforce PaaS (platform as a service), so most if not all compliance certifications still apply (e.g. SOC1/SOC2, ISO certifications, HIPAA and GDPR).

The requests (files) sent to our service are analyzed and processed only during the lifecycle of the transaction. We do not retain a copy of the files sent to our service; however, we do collect usage analytics of the service, which include (but are not limited to):

- Org Id - we log the Salesforce Organization Identifier of the org that made the request

- Date/Time of the request

- User Id

- File size of the request

- Analytics data such as response time, memory usage, and other technical details about the request excluding the actual request itself (so we never retain a copy of the files sent to our API)
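To make the distinction between request metadata and file content concrete, the Python sketch below builds a usage record that keeps only the kinds of fields listed above. It is a hypothetical illustration, not the actual logging schema of the service: the class, the field names and the placeholder identifiers are all assumptions.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageRecord:
    """Metadata about one request; deliberately has no field for file content."""

    org_id: str
    user_id: str
    requested_at: datetime
    file_size_bytes: int
    response_time_ms: float


def record_usage(org_id: str, user_id: str, file_bytes: bytes, response_time_ms: float) -> UsageRecord:
    """Derive the record from a request, keeping only the size of the file."""
    return UsageRecord(
        org_id=org_id,
        user_id=user_id,
        requested_at=datetime.now(timezone.utc),
        file_size_bytes=len(file_bytes),  # the bytes themselves are not stored
        response_time_ms=response_time_ms,
    )
```

Calling `asdict(record_usage("ORG-PLACEHOLDER", "USER-PLACEHOLDER", b"...", 812.5))` yields only the five metadata fields.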

Costs and Pricing

This service is free to use for the first 500 requests per client/org. Pricing is yet to be determined; however, 3B is planning to introduce transaction-based pricing.

Data Processing Agreement

By using this service, our clients agree to our standard DPA.