Amazon Kendra is an intelligent search service powered by machine learning (ML), enabling organizations to provide relevant information to customers and employees, when they need it.
Amazon Kendra uses ML algorithms to enable users to use natural language queries to search for information scattered across a company’s various data sources, including commonly used document storage systems such as Microsoft OneDrive.
OneDrive is an online cloud storage service that lets you host your content and automatically sync it across multiple devices. Amazon Kendra can index document formats such as Microsoft OneNote, HTML, PDF, Microsoft Word, Microsoft PowerPoint, Microsoft Excel, Rich Text, JSON, XML, CSV, XSLT, and plain text.
We’re excited to announce that we have updated the OneDrive connector for Amazon Kendra to add even more capabilities. For example, we have added support to search OneNote documents. Additionally, you can now choose to use identity or ACL information to make your searches more granular.
The connector indexes documents and their access control information to limit search results to only those documents the user is allowed to access. To show search results based on user access rights using only the user information, the connector provides an identity crawler that loads principal information, such as user and group mappings, into a principal store.
In this post, we demonstrate how to set up multiple data sources in Amazon Kendra to provide a central place to search your document repository.
Solution overview
For our solution, we demonstrate how to index a OneDrive repository or folder using the Amazon Kendra connector for OneDrive. The solution consists of the following steps:
- Create and configure an application in the Microsoft Azure Portal and obtain authentication credentials.
- Create a OneDrive data source using the Amazon Kendra console.
- Index the data in the OneDrive repository.
- Run a sample query to get the information.
- Filter the query by users or groups.
Prerequisites
To try out the Amazon Kendra connector for OneDrive, you need the following:
Configure an Azure application and assign connection permissions
Before setting up the OneDrive data source, we need some details about the OneDrive repository. Complete the following steps:
- Sign in to the Azure portal.
- After logging in with your account credentials, choose App registrations, then choose New registration.
- Give your app a suitable name and register it.
- Collect the client ID, tenant ID, and other application details.
- To get a client secret, choose Add a certificate or secret under Client credentials.
- Choose New client secret and provide an appropriate description and expiration.
- Note the client-id, tenant-id, and secret-id values. We use them to authenticate the OAuth2 application.
- Navigate to your app, choose API permissions in the navigation pane, and choose Add a permission.
- Choose Microsoft Graph.
- Under Application permissions, enter File in the search bar and, under Files, select Files.Read.All.
- Choose Add permissions.
- Similarly, add the following permissions under the Microsoft Graph option for the app you created:
Group.Read.All
Notes.Read.All
Upon completion, the API permissions page lists the permissions you added for the app.
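Before handing the credentials to Amazon Kendra, it can help to confirm that they work. The following is a minimal sketch, not part of the original walkthrough, that builds an OAuth 2.0 client-credentials token request for the Microsoft identity platform. The function name and the placeholder values are hypothetical; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body) for a client-credentials token request.

    The values come from the app registration steps above. The scope
    asks for the application permissions already granted on Microsoft Graph.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "https://graph.microsoft.com/.default",
        }
    )
    return url, body


# Placeholder values for illustration only.
url, body = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
print(url)
```

Sending the body as a form-encoded POST to the URL with any HTTP client should return an access token if the client ID, tenant ID, and secret are valid.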
Set up the Amazon Kendra connector for OneDrive
To configure the Amazon Kendra connector, follow these steps:
- In the Amazon Kendra console, choose Create an index.
- For Index name, enter a name for the index (for example, my-onedrive-index).
- Enter an optional description.
- Choose Create a new role.
- For Role name, enter an IAM role name.
- Configure optional encryption settings and tags.
- Choose Next.
- In the Configure user access control section, select Yes under Access control settings.
- For Token type, choose JSON from the drop-down menu.
- Leave the remaining values as their defaults.
- Choose Next.
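For readers who script their setup, the console choices above correspond roughly to the parameters of the Amazon Kendra CreateIndex API. The sketch below only builds the request dictionary. The helper name, the role ARN, and the token field names (username and groups) are assumptions for illustration and should match whatever you configure.

```python
def build_create_index_request(name, role_arn,
                               user_field="username", group_field="groups"):
    """Build keyword arguments for the Kendra CreateIndex call.

    user_field and group_field are assumed names for the JSON token
    fields; use the names configured for your own index.
    """
    return {
        "Name": name,
        "Edition": "DEVELOPER_EDITION",
        "RoleArn": role_arn,
        # Filter search results using the token passed at query time.
        "UserContextPolicy": "USER_TOKEN",
        "UserTokenConfigurations": [
            {
                "JsonTokenTypeConfiguration": {
                    "UserNameAttributeField": user_field,
                    "GroupAttributeField": group_field,
                }
            }
        ],
    }


# Hypothetical account ID and role name, for illustration only.
request = build_create_index_request(
    "my-onedrive-index",
    "arn:aws:iam::123456789012:role/AmazonKendra-us-west-2-onedrive",
)
print(request["UserContextPolicy"])
```

With boto3 installed and credentials configured, the dictionary could be passed as boto3.client("kendra").create_index(**request).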
Before moving on to the next configuration step, we need to provide Amazon Kendra with a role that has the necessary permissions to connect to the site. These include permission to obtain and decrypt the AWS Secrets Manager secret that contains the application ID and secret key needed to connect to the OneDrive site.
- Open another tab for your AWS account, and on the IAM console, navigate to the role that you created earlier (for example, AmazonKendra-us-west-2-onedrive).
- Choose Add permissions and Create inline policy.
- For Service, choose Kendra.
- For Actions, choose Write and specify BatchPutDocument.
- For Resources, choose All resources.
- Choose Review policy.
- For Name, enter a name (for example, BatchPutPolicy).
- Choose Create policy.
- Add this policy to the role you created.
- Additionally, attach the SecretsManagerReadWrite AWS managed policy to the role.
- Return to the Amazon Kendra tab.
- Select Developer edition and choose Create.
This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.
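The inline policy built in the IAM steps above can also be written out as a policy document. This is a sketch of what the console generates for those choices (the Kendra service, the BatchPutDocument action, all resources); the function name is hypothetical.

```python
import json


def batch_put_policy() -> dict:
    """Policy document allowing kendra:BatchPutDocument on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "kendra:BatchPutDocument",
                "Resource": "*",
            }
        ],
    }


print(json.dumps(batch_put_policy(), indent=2))
```

If you prefer scripting over the console, the JSON string could be supplied as the PolicyDocument argument of the IAM put_role_policy call in boto3, together with the role name and a policy name such as BatchPutPolicy.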
- Return to the Amazon Kendra console, choose Data sources in the navigation pane, and choose Add data source.
- Under OneDrive Connector V2.0, choose Add connector.
- For Data source name, enter a name (for example, my-onedrive).
- Enter an optional description.
- Choose Next.
- For OneDrive tenant ID, enter the tenant ID you collected earlier.
- For Configure VPC and security group, leave the default value (No VPC).
- Keep Identity crawler is on selected. This imports identity information into the index.
- For IAM role, choose Create a new role.
- Enter a role name, such as AmazonKendra-us-west-2-onedrive, then choose Next.
- In the authentication section, choose Create and add a secret.
- Create a secret with clientId and clientSecret as keys.
- Add their respective values using the information you gathered earlier.
- Choose Next.
- In the Configure sync settings section, add the OneDrive users whose documents you want to index.
- Select the sync mode for the index. For this post, we select New, modified, or deleted content sync.
- For the indexing frequency, choose Run on demand, then choose Next.
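The secret created in the authentication step is a small JSON object with the two keys named above. The following sketch shows one way to build it and store it through a Secrets Manager client. The function names and the secret name are hypothetical, and the client is passed in so the helper can be exercised without AWS access.

```python
import json


def build_onedrive_secret(client_id: str, client_secret: str) -> str:
    """Return the secret value as JSON with clientId and clientSecret keys."""
    return json.dumps({"clientId": client_id, "clientSecret": client_secret})


def store_onedrive_secret(secrets_client, name, client_id, client_secret):
    """Create the secret using a Secrets Manager client (for example, boto3)."""
    return secrets_client.create_secret(
        Name=name,
        SecretString=build_onedrive_secret(client_id, client_secret),
    )


# Placeholder values for illustration only.
print(build_onedrive_secret("my-client-id", "my-client-secret"))
```

With boto3, the client would typically be boto3.client("secretsmanager"), and the name could be any label you choose for the secret.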
Field mappings let you set the searchability and relevance of fields. For example, the lastUpdatedAt field can be used to sort or boost the ranking of documents based on how recently they were updated.
- Keep all the default values in the Set field mappings section and choose Next.
- On the review page, choose Add data source.
- Choose Sync now.
Synchronization may take up to 30 minutes to complete.
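The same on-demand sync can be started and checked through the Amazon Kendra API. Below is a minimal sketch that wraps the StartDataSourceSyncJob and ListDataSourceSyncJobs calls. The helper names are hypothetical, the client is passed in, and pagination of the job history is ignored for brevity.

```python
def start_sync(kendra, index_id, data_source_id):
    """Start a sync job and return its execution ID."""
    response = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )
    return response["ExecutionId"]


def sync_status(kendra, index_id, data_source_id, execution_id):
    """Return the status of one sync job, or None if it is not listed.

    Only the first page of the job history is inspected.
    """
    response = kendra.list_data_source_sync_jobs(
        Id=data_source_id, IndexId=index_id
    )
    for job in response.get("History", []):
        if job.get("ExecutionId") == execution_id:
            return job.get("Status")
    return None
```

With boto3, kendra would be boto3.client("kendra"). Calling sync_status periodically until it returns SUCCEEDED or FAILED gives a simple way to wait for the sync to finish.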
Try the solution
Now that you’ve indexed your OneDrive content, you can test it by querying the index.
- Go to your index in the Amazon Kendra console and choose Search indexed content in the navigation pane.
- Enter a search term and press Enter.
Note that without a token, ACLs prevent any search results from being returned.
- Expand Test query with an access token and choose Apply token.
- Enter the token for a user who has permission to read the file and choose Apply.
- Search again for the information present in OneDrive.
You can verify that Amazon Kendra returns ranked results as expected.
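The console test above can be reproduced in code by passing the token in the UserContext parameter of the Amazon Kendra Query API. The sketch below assumes the index uses a JSON token whose fields are named username and groups. Those field names, the helper names, and the sample values are assumptions for illustration.

```python
import json


def build_json_token(username, groups=None,
                     user_field="username", group_field="groups"):
    """Build a JSON token string; field names must match the index settings."""
    token = {user_field: username}
    if groups:
        token[group_field] = list(groups)
    return json.dumps(token)


def search_as_user(kendra, index_id, query_text, token):
    """Run a query with a user token and return (title, uri) pairs."""
    response = kendra.query(
        IndexId=index_id,
        QueryText=query_text,
        UserContext={"Token": token},
    )
    return [
        (item.get("DocumentTitle", {}).get("Text", ""),
         item.get("DocumentURI", ""))
        for item in response.get("ResultItems", [])
    ]


# Hypothetical user, for illustration only.
print(build_json_token("user@example.com"))
```

With boto3, kendra would be boto3.client("kendra"). To filter by user or group without a token, UserContext also accepts UserId and Groups keys in place of Token.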
Congratulations, you’ve configured Amazon Kendra to index and search documents in OneDrive and control access to them using ACLs.
Conclusion
With the Microsoft OneDrive V2 connector for Amazon Kendra, organizations can securely search commonly used enterprise document stores using intelligent search powered by Amazon Kendra. You can enhance the search experience by integrating the data source with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion.
About the authors
Pravinchandra Varma is a Senior Customer Delivery Architect with the AWS Professional Services team and is passionate about machine learning applications and AI services.
Supratim Barat is a Software Development Engineer with the AWS Kendra Yellowbadge Team and is a blockchain and cybersecurity enthusiast.