Documents AI: Difference between revisions

Revision as of 18:50, 19 March 2025

Intro

Version 5.11 of 3B Forms introduces the new Vision AI capability directly within Forms. With Vision AI you can use images as input prompts and generate responses based on the data inside those images.

In simple terms, Admins (form builder users) can create forms that accept an image, provide natural language instructions, and let the AI use OCR (optical character recognition) to extract the data from these images.

Setup

Head over to Setup -> Settings -> 3B Forms General Settings and set the field 3B AI Endpoint to https://x3b-ai-76854d6fa2eb.herokuapp.com/extractImage. This is all the setup required to enable Vision AI in forms.

Use Cases

There are many use cases for our Vision AI capability, from validating ID documents, extracting validity dates and analyzing document authenticity all the way to parsing CVs, Invoices, Timesheets and even Shift Imports.

How To

File Upload Component

To start using Vision AI, drag and drop a File question type onto a form. The File component is context aware, so you can place it at the top level (context object) or inside a Parent/Child panel. Based on the context where the File question is added, you will be able to select the context object's field(s) to prompt against. For example, if you are building a form that validates a certificate of a given type, add the Child Panel with all of the fields you want pre-populated and drop the File question inside that panel. Then add a prompt to each field you want Vision AI to help pre-populate.


There are a few options under the new AI tab:

AI Prompt Config

Require Manual Interaction - this option will create a button that the user has to click on to initiate Vision AI. The default behaviour is that this option is disabled and Vision AI will be triggered as soon as a file is uploaded.

General Prompt - this is a text-based general prompt, instructing the Vision AI agent. Prompt engineering here is important to ensure that the AI stays within boundaries.

Extraction Fields - this table allows you to pick a field (contextually) and provide a prompt. Each prompt applies to its own field only. Make sure to tell the AI the expected results and field restrictions (e.g. max field length, whether you expect a boolean, string or date response, and the format of that response).

Sample Field Extraction Prompts

  • "Find the birth date on the document and return it in YYYY-MM-DD format. If no birth date is found on the document, return null"
  • "Analyze the document provided and if it looks like an international Passport, return a boolean true, otherwise return a boolean false"
  • "The document uploaded should be a forklift license. If the document looks like a forklift training/certificate/driving license, return a boolean true, otherwise return a boolean false"
  • "The document should have the names of the person (holder). Extract the First Name and normalize it by ensuring proper letter spacing, capitalization and remove any special characters"
  • "The document provided should be a CV. Return a comma separated list of employment skills you think are relevant based on the provided CV"
  • "The provided document should be some type of identity document. If the document is a passport, return "Passport", if the document is EU Id Card, return "Identity Card", if the document is a biometric residence permit, return "Biometric Residence Card", if the document is a birth certificate, then return "Birth Certificate" and in all other cases, return "Unknown""
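
The sample prompts above ask for three kinds of answers: a date in YYYY-MM-DD (or null), a boolean, and one value from a fixed list. The following is a minimal Python sketch of how such answers could be checked before being trusted. It assumes each answer arrives as a plain string; the actual response format of the Vision AI service is not documented here, and the function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Categories taken from the identity-document sample prompt above.
ALLOWED_DOCUMENT_TYPES = {
    "Passport",
    "Identity Card",
    "Biometric Residence Card",
    "Birth Certificate",
    "Unknown",
}


def parse_date(raw: str) -> Optional[date]:
    """Parse a YYYY-MM-DD answer; treat "null" or an empty answer as no date."""
    text = raw.strip()
    if text.lower() in ("", "null"):
        return None
    # Raises ValueError if the answer is not a valid calendar date.
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_boolean(raw: str) -> bool:
    """Accept only "true" or "false" (any letter case) as a boolean answer."""
    text = raw.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"expected true or false, got {raw!r}")


def parse_document_type(raw: str) -> str:
    """Map any answer outside the allowed categories to "Unknown"."""
    text = raw.strip().strip('"')
    return text if text in ALLOWED_DOCUMENT_TYPES else "Unknown"
```

Rejecting anything outside the expected shape mirrors the advice in the prompts themselves: the narrower the requested format, the easier it is to detect an answer that drifted from it.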

Tips

  • In Extraction Fields, be sure to define the expected response type - string, date, boolean or something else.
  • If you want the AI to check a checkbox, ask it to return a boolean true
  • You can ask the AI to format and normalize values - e.g. ask it to normalize names, remove odd characters, spacing etc
  • You can ask the AI to analyse the image and categorize it. This is especially useful if you want users to upload a specific image type
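
One way to apply these tips consistently is to assemble each field prompt from the same building blocks: the instruction, the expected response type, an optional format, an optional length limit, and a fallback value. The Python sketch below is only an illustration of that idea. The helper `build_field_prompt` and its parameters are hypothetical and are not part of 3B Forms; the resulting text would simply be pasted into the Extraction Fields table.

```python
from typing import Optional


def build_field_prompt(
    instruction: str,
    response_type: str,
    response_format: Optional[str] = None,
    max_length: Optional[int] = None,
    fallback: Optional[str] = None,
) -> str:
    """Compose a field prompt that always states the expected response type."""
    # Normalize the instruction so it ends with exactly one period.
    parts = [instruction.strip().rstrip(".") + "."]
    parts.append(f"Return the result as a {response_type}.")
    if response_format:
        parts.append(f"Use the format {response_format}.")
    if max_length is not None:
        parts.append(f"The result must not exceed {max_length} characters.")
    if fallback is not None:
        parts.append(f"If the value cannot be found, return {fallback}.")
    return " ".join(parts)


birth_date_prompt = build_field_prompt(
    "Find the birth date on the document",
    response_type="date",
    response_format="YYYY-MM-DD",
    fallback="null",
)
print(birth_date_prompt)
```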

Restrictions

Although Vision AI is capable of reading any file type (PDFs, Word documents, images and other specialised files), please restrict the input file format to images only until further notice.

You can set the Accepted file types property to image/* on the File question type to restrict the accepted input formats.
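
To make the meaning of the image/* pattern concrete, here is a small Python sketch that checks a file name against a comma-separated list of accepted MIME patterns. It only illustrates how a wildcard MIME pattern matches; how the File question type evaluates the property internally is not documented here, and `is_accepted` is a hypothetical helper.

```python
import mimetypes
from fnmatch import fnmatch


def is_accepted(filename: str, accepted: str = "image/*") -> bool:
    """Return True if the file's guessed MIME type matches an accepted pattern."""
    # The MIME type is guessed from the file extension only.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        return False
    patterns = [pattern.strip() for pattern in accepted.split(",")]
    return any(fnmatch(mime_type, pattern) for pattern in patterns)


print(is_accepted("passport.jpg"))  # image/jpeg matches image/*
print(is_accepted("cv.pdf"))        # application/pdf does not match image/*
```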

Prompt Engineering

As with most AI systems, the power of this tool is limited by your ability to engineer your prompts. Clear, well-defined prompts improve data quality and reliability.

Limitations

Version 5.11 of 3B Forms only supports contextual data extraction with single layer field mapping (i.e. we can only extract and write to the fields of the contextual object where the File question is embedded).

A future release will add the capability to extract data into repeatable records (like experiences, skills, employment and education and more). This article will be updated once this feature has been released.

Limits

  • Medical images: The AI is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.
  • Non-English: The AI may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.
  • Small text: Enlarge text within the image to improve readability, but avoid cropping important details.
  • Rotation: The AI may misinterpret rotated or upside-down text and images.
  • Visual elements: The AI may struggle to understand graphs or text where colors or styles—like solid, dashed, or dotted lines—vary.
  • Spatial reasoning: The AI struggles with tasks requiring precise spatial localization, such as identifying chess positions.
  • Accuracy: The AI may generate incorrect descriptions or captions in certain scenarios.
  • Image shape: The AI struggles with panoramic and fisheye images.
  • Metadata and resizing: The AI doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
  • Counting: The AI may give approximate counts for objects in images.
  • CAPTCHAs: For safety reasons, our system blocks the submission of CAPTCHAs.

Testing

Testing your form can only be done once the form is published. Vision AI is disabled within the builder (in the Preview tab).

Security

Vision AI is not a Salesforce native tool. We use off-platform processing, meaning that the uploaded files are sent to specialized Heroku servers that run an LLM (large language model). Although data is encrypted in transit, it still leaves Salesforce. Heroku is a Salesforce PaaS (platform as a service), so most if not all compliance certifications still apply (e.g. SOC1/SOC2, ISO certifications, HIPAA and GDPR).

The requests (files) sent to our service are analyzed and processed only during the lifecycle of the transaction. We do not retain a copy of the files sent to our service; however, we do collect usage analytics for the service, including (but not limited to):

- Org Id - we log the Salesforce Organization Identifier of the org that made the request

- Date/Time of the request

- User Id

- File size of the request

- Analytics data such as response time, memory usage, and other technical details about the request excluding the actual request itself (so we never retain a copy of the files sent to our API)
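
As an illustration of the list above, the Python sketch below models a usage record that keeps only metadata about a request and never the file itself. This is a hypothetical structure, not the actual schema used by the service: the class name, field names and units (milliseconds, megabytes) are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageRecord:
    """Metadata about one request; deliberately has no field for file content."""
    org_id: str
    user_id: str
    requested_at: datetime
    file_size_bytes: int
    response_time_ms: float  # assumed unit
    memory_usage_mb: float   # assumed unit


def record_usage(
    org_id: str,
    user_id: str,
    file_bytes: bytes,
    response_time_ms: float,
    memory_usage_mb: float,
) -> UsageRecord:
    """Build a usage record from a request, keeping only the file's size."""
    return UsageRecord(
        org_id=org_id,
        user_id=user_id,
        requested_at=datetime.now(timezone.utc),
        file_size_bytes=len(file_bytes),
        response_time_ms=response_time_ms,
        memory_usage_mb=memory_usage_mb,
    )
```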

Costs and Pricing

This service is free to use for the first 500 requests per client/org. Pricing is yet to be determined; however, 3B is planning to introduce transaction-based pricing.

Data Processing Agreement

By using this service, our clients agree to our standard DPA.