Unlock Insights from your Amazon S3 data with intelligent search

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s spread across multiple locations and content repositories in your organization. You can search with keywords or natural language questions, and Amazon Kendra uses ML to find the most relevant documents, provide answers, and classify the documents. Amazon Kendra can index data from Amazon Simple Storage Service (Amazon S3) or a third-party document repository. Amazon S3 is a scalable, highly available object storage service where you can store large amounts of data, including product manuals, project and research documents, and more.

In this post, you learn how to deploy a provided AWS CloudFormation template to index the documents in an Amazon S3 bucket. The template creates an Amazon Kendra data source for an index and synchronizes your data source according to your needs: on demand, hourly, daily, weekly, or monthly. AWS CloudFormation lets you deliver Infrastructure as Code (IaC) so you can spend less time managing resources, quickly replicate your infrastructure, and monitor and track infrastructure changes.

Solution overview

The CloudFormation template configures an Amazon Kendra data source with a connection to Amazon S3. The template also creates a role for the Amazon Kendra data source service. You can specify an S3 bucket, sync schedule, and include/exclude patterns. When the sync job is finished, you can search the indexed content through the search console. The diagram below illustrates this workflow.

This post will guide you through the following steps:

  1. Deploy the provided template.
  2. Upload the documents to the S3 bucket you create. If you provide a bucket with documents, you can skip this step.
  3. Wait until the index finishes crawling the data source.

Prerequisites

For this guide, you should have the following prerequisites:

  • An AWS account where the proposed solution can be deployed.
  • An Amazon Kendra index to which the stack attaches a data source.
  • The set of documents used to build the Amazon Kendra index. In this solution, you are using an AWS whitepaper zip file.

Deploy the solution with AWS CloudFormation

To deploy the CloudFormation template, follow these steps:

  1. choose

You will be redirected to the AWS CloudFormation console.

  2. You can modify the parameters or use the default values:
    • The Amazon Kendra data source name is automatically set using the stack name and associated bucket name.
    • For KendraIndexId, enter the ID of the Amazon Kendra index where you will attach the data source.
    • You can also choose when to run data source synchronization using KendraSyncSchedule. By default, it is set to On demand.
    • For S3BucketName, you can enter a bucket you’ve already created or leave it empty. If you leave it empty, a bucket is created for you. Either way, the bucket is used as the Amazon Kendra data source. For this post, we leave it empty.

The stack takes about 5 minutes to deploy the Amazon Kendra data source attached to the Amazon Kendra index.

  3. On the Outputs tab of the CloudFormation stack, copy the created bucket name, data source name, and data source ID.
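The console steps above could also be scripted. The following is a minimal sketch, assuming boto3, that passes the three parameters named in this post and returns the stack outputs as a dictionary. The stack name, template URL, the literal parameter value "On demand", and the CAPABILITY_NAMED_IAM capability are assumptions for illustration; check the template you deploy for the actual allowed values and output names.

```python
from typing import Any, Dict, List


def build_stack_parameters(kendra_index_id: str,
                           sync_schedule: str = "On demand",
                           s3_bucket_name: str = "") -> List[Dict[str, str]]:
    """Build the CloudFormation Parameters list for the template."""
    return [
        {"ParameterKey": "KendraIndexId", "ParameterValue": kendra_index_id},
        # Assumption: the template accepts the literal value "On demand".
        {"ParameterKey": "KendraSyncSchedule", "ParameterValue": sync_schedule},
        # An empty value asks the template to create a bucket for you.
        {"ParameterKey": "S3BucketName", "ParameterValue": s3_bucket_name},
    ]


def deploy_stack(cfn_client: Any, stack_name: str, template_url: str,
                 parameters: List[Dict[str, str]]) -> Dict[str, str]:
    """Create the stack, wait until it completes, and return its outputs."""
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=parameters,
        # Assumption: needed because the template creates a named IAM role.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Example usage (hypothetical names and URL):
#   import boto3
#   outputs = deploy_stack(
#       boto3.client("cloudformation"),
#       "kendra-s3-demo",
#       "https://example.com/kendra-s3-template.yaml",
#       build_stack_parameters("<your-kendra-index-id>"),
#   )
```

The returned dictionary holds the same values you would copy from the Outputs tab.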

The created stack deploys a role: <stack-name>-KendraDataSourceRole. It is a best practice to deploy a role for each data source you create. This role allows the Amazon Kendra data source to add or remove files from the Amazon Kendra index and to fetch objects from the Amazon S3 bucket.
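To make the role's permissions concrete, here is a sketch of the kind of policy documents such a role typically carries, following the permissions Amazon Kendra documents for S3 data sources. The exact statements in the provided template may differ, and the bucket name and index ARN below are placeholders.

```python
import json
from typing import Any, Dict


def kendra_trust_policy() -> Dict[str, Any]:
    """Trust policy that lets the Amazon Kendra service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "kendra.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def s3_data_source_policy(bucket_name: str, index_arn: str) -> Dict[str, Any]:
    """Permissions to read the bucket and to add or remove index documents."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Fetch objects from the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
            {   # List the bucket contents during a sync.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {   # Add or remove documents in the index.
                "Effect": "Allow",
                "Action": ["kendra:BatchPutDocument", "kendra:BatchDeleteDocument"],
                "Resource": [index_arn],
            },
        ],
    }


# Print the policy for a placeholder bucket and index ARN.
print(json.dumps(
    s3_data_source_policy(
        "my-example-bucket",
        "arn:aws:kendra:us-east-1:111122223333:index/example-index-id",
    ),
    indent=2,
))
```

Scoping each role to a single bucket and index is what makes the one-role-per-data-source practice useful.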

Upload files to S3 bucket

Amazon Kendra can handle multiple document types, including .html, .pdf, .csv, .json, .docx, and .ppt. You can also have a combination of document types in a single index. The text contained in these documents is indexed in the provided Amazon Kendra index. The sample set contains over 60 downloadable PDF files, so you can run keyword searches on AWS topics such as best practices, databases, machine learning, and security. For example, if you want to know where to find more information about caching in AWS whitepapers, Amazon Kendra can help you find documents related to databases and best practices.

When you download the AWS Whitepapers.zip file and unzip the file, you’ll see these six folders: Best_Practices, Databases, General, Machine_Learning, Security, Well_Architected. Upload these folders to your S3 bucket.
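If you prefer to script the upload, the following sketch walks the unzipped directory and uploads every file, keeping the folder names as S3 key prefixes. It assumes boto3's S3 client; the local directory and bucket names are placeholders.

```python
import os
from typing import Any, Iterator, Tuple


def iter_upload_pairs(local_root: str) -> Iterator[Tuple[str, str]]:
    """Yield (local_path, s3_key) for every file under local_root."""
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in sorted(filenames):
            local_path = os.path.join(dirpath, name)
            relative = os.path.relpath(local_path, local_root)
            # S3 keys always use forward slashes, whatever the local OS uses.
            yield local_path, relative.replace(os.sep, "/")


def upload_folder(s3_client: Any, local_root: str, bucket: str) -> int:
    """Upload all files under local_root and return how many were sent."""
    count = 0
    for local_path, key in iter_upload_pairs(local_root):
        s3_client.upload_file(local_path, bucket, key)
        count += 1
    return count


# Example usage (placeholder names):
#   import boto3
#   upload_folder(boto3.client("s3"), "AWS_Whitepapers", "<your-bucket-name>")
```

A file such as Databases/example.pdf on disk ends up under the key Databases/example.pdf in the bucket.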

Sync your Amazon Kendra data source

The Amazon Kendra data source can sync your data on a preconfigured schedule, or you can trigger a sync manually on demand. By default, the CloudFormation template configures the data source with an on-demand sync schedule, so you trigger syncs manually as needed.

To manually trigger the sync job from the Amazon Kendra console, navigate to the Amazon Kendra index used in the CloudFormation stack deployment. Under Data management in the navigation pane, choose Data sources, select your data source, and then choose Sync now. This starts synchronizing the data source with the S3 bucket.

When the Amazon Kendra data source starts syncing, you should see Current sync status as Syncing.
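The manual trigger and the status check can also be done through the API. This is a minimal sketch, assuming boto3's Kendra client, that starts a sync job and polls the job history until the job leaves its in-progress states. The polling interval and limit are arbitrary choices, and the index and data source IDs are placeholders.

```python
import time
from typing import Any

# Job states that mean the sync is still running.
IN_PROGRESS = ("SYNCING", "SYNCING_INDEXING", "STOPPING")


def sync_and_wait(kendra_client: Any, index_id: str, data_source_id: str,
                  poll_seconds: float = 30.0, max_polls: int = 120) -> str:
    """Start a sync job and return its final status, for example SUCCEEDED."""
    started = kendra_client.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id)
    execution_id = started["ExecutionId"]
    for _ in range(max_polls):
        history = kendra_client.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id)["History"]
        job = next(
            (j for j in history if j.get("ExecutionId") == execution_id), None)
        # The job may not be listed yet right after it starts.
        status = job["Status"] if job else "SYNCING"
        if status not in IN_PROGRESS:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Sync job {execution_id} did not finish in time")


# Example usage (placeholder IDs):
#   import boto3
#   status = sync_and_wait(boto3.client("kendra"), "<index-id>", "<data-source-id>")
```

A returned status of SUCCEEDED corresponds to a finished sync in the console; statuses such as FAILED or INCOMPLETE are worth investigating in the sync history.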

When the data source sync finishes, Last sync status appears as Successful and Current sync status as Idle. You can now search the indexed content.
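Besides the search console, you can query the index from code. The sketch below assumes boto3's Kendra client and reduces each result to its type, title, and URI. The query text is only an example, and the index ID is a placeholder.

```python
from typing import Any, Dict, List


def search(kendra_client: Any, index_id: str, query_text: str,
           top_n: int = 5) -> List[Dict[str, str]]:
    """Run a query and return a compact summary of the first top_n results."""
    response = kendra_client.query(IndexId=index_id, QueryText=query_text)
    results = []
    for item in response.get("ResultItems", [])[:top_n]:
        results.append({
            "type": item.get("Type", ""),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return results


# Example usage (placeholder index ID):
#   import boto3
#   for hit in search(boto3.client("kendra"), "<index-id>", "caching best practices"):
#       print(hit["type"], hit["title"], hit["uri"])
```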

Set up the sync schedule

The template allows you to run the sync every hour at minute 0, for example at 13:00, 14:00, or 15:00. You also have the option to run it daily at 00:00 UTC. The weekly configuration runs on Mondays at 00:00 UTC, and the monthly configuration runs on the first day of every month at 00:00 UTC.
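The schedules described above can be written as cron expressions. The mapping below is a sketch, assuming the six-field AWS cron syntax (minute, hour, day of month, month, day of week, year) wrapped in cron(...); the option names and the exact strings used inside the provided template are assumptions.

```python
from typing import Dict

# Assumed mapping from schedule option to a six-field AWS cron expression.
# An empty string stands for "no schedule", that is, on-demand syncs only.
SCHEDULES: Dict[str, str] = {
    "On demand": "",
    "Hourly": "cron(0 * * * ? *)",      # every hour at minute 0
    "Daily": "cron(0 0 * * ? *)",       # every day at 00:00 UTC
    "Weekly": "cron(0 0 ? * MON *)",    # every Monday at 00:00 UTC
    "Monthly": "cron(0 0 1 * ? *)",     # first day of each month at 00:00 UTC
}


def schedule_expression(option: str) -> str:
    """Return the cron expression for a schedule option."""
    try:
        return SCHEDULES[option]
    except KeyError:
        raise ValueError(f"Unknown schedule option: {option!r}") from None


print(schedule_expression("Weekly"))
```

In this syntax, the day-of-month and day-of-week fields cannot both be specific values, which is why one of them is always `?`.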

To change the schedule after creating the Amazon Kendra data source, on the Actions menu, choose Edit. Under Configure sync settings, you find the Sync run schedule section.

Under Frequency, you can select Hourly, Daily, Weekly, Monthly, or Custom. Each option allows you to schedule your sync down to the minute.
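The same change can be made through the API. The following sketch, assuming boto3's Kendra client and the six-field AWS cron syntax, builds a daily schedule at a chosen minute and applies it to an existing data source. The helper names and IDs are hypothetical.

```python
from typing import Any


def daily_schedule(hour: int, minute: int) -> str:
    """Build a cron expression that runs every day at hour:minute UTC."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"cron({minute} {hour} * * ? *)"


def update_sync_schedule(kendra_client: Any, index_id: str,
                         data_source_id: str, schedule: str) -> None:
    """Apply a new sync schedule to an existing data source."""
    kendra_client.update_data_source(
        Id=data_source_id, IndexId=index_id, Schedule=schedule)


# Example usage (placeholder IDs): sync every day at 13:30 UTC.
#   import boto3
#   update_sync_schedule(boto3.client("kendra"), "<index-id>",
#                        "<data-source-id>", daily_schedule(13, 30))
```

Keep in mind that changing a stack-managed data source outside of CloudFormation makes the stack drift from its template, so updating the KendraSyncSchedule parameter may be the cleaner route.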

Add exclusion patterns

The provided CloudFormation template allows you to add exclusion patterns. By default, .png and .jpg files are added to the Exclusion patterns parameter. Additional file formats can be added to the exclusion patterns as a comma-separated list. In the same way, the Inclusion patterns parameter accepts a comma-separated list of file formats to configure an inclusion pattern. If you do not provide an inclusion pattern, all files are indexed except those matched by the exclusion patterns.
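To illustrate how these two parameters interact, the sketch below parses comma-separated pattern lists and applies them to object keys locally. It approximates the matching with Python's fnmatch rather than calling Amazon Kendra, and the glob form of the defaults (*.png, *.jpg) is an assumption about how the template writes them. The final helper shows where the lists would go in the S3Configuration structure of the Kendra API.

```python
from fnmatch import fnmatch
from typing import Any, Dict, List


def parse_patterns(raw: str) -> List[str]:
    """Split a comma-separated parameter value into a clean list."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_indexed(key: str, inclusion: List[str], exclusion: List[str]) -> bool:
    """Local approximation: exclusion wins, empty inclusion means everything."""
    if any(fnmatch(key, p) for p in exclusion):
        return False
    if inclusion:
        return any(fnmatch(key, p) for p in inclusion)
    return True


def s3_configuration(bucket: str, inclusion_raw: str = "",
                     exclusion_raw: str = "*.png,*.jpg") -> Dict[str, Any]:
    """Build the S3Configuration block of a Kendra data source."""
    config: Dict[str, Any] = {"BucketName": bucket}
    inclusion = parse_patterns(inclusion_raw)
    exclusion = parse_patterns(exclusion_raw)
    if inclusion:
        config["InclusionPatterns"] = inclusion
    if exclusion:
        config["ExclusionPatterns"] = exclusion
    return {"S3Configuration": config}


exclude = parse_patterns("*.png, *.jpg")
for key in ["Databases/diagram.png", "Databases/paper.pdf"]:
    print(key, is_indexed(key, [], exclude))
```

With the default exclusions and no inclusion pattern, the PDF is indexed and the PNG is skipped. When a key matches both lists, the exclusion takes precedence.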

Clean up

To avoid incurring costs, you can delete the stack from the AWS CloudFormation console. On the Stacks page, select the stack you created, choose Delete, and confirm the deletion of the stack.

If you haven’t provided an S3 bucket, the stack creates one. If the bucket is empty, it is automatically removed. Otherwise, you need to empty the bucket and delete it manually. If you have provided a bucket, it is not deleted, even if it is empty. The Amazon Kendra index is not deleted either. Only the Amazon Kendra data source created by the stack is deleted.
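The cleanup can be scripted as well. This sketch, assuming boto3's S3 and CloudFormation clients, empties the stack-created bucket first so that the stack deletion can remove it, as described above. It handles current object versions only, so a versioned bucket would need extra steps. The stack and bucket names are placeholders.

```python
from typing import Any, Optional


def empty_bucket(s3_client: Any, bucket: str) -> int:
    """Delete all current objects in the bucket and return how many."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # Each page holds at most 1,000 keys, the delete_objects limit.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


def clean_up(cfn_client: Any, s3_client: Any, stack_name: str,
             created_bucket: Optional[str] = None) -> int:
    """Empty the stack-created bucket (if any), then delete the stack."""
    deleted = 0
    if created_bucket:
        # Only pass a bucket that the stack created, never your own bucket.
        deleted = empty_bucket(s3_client, created_bucket)
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
    return deleted


# Example usage (placeholder names):
#   import boto3
#   clean_up(boto3.client("cloudformation"), boto3.client("s3"),
#            "kendra-s3-demo", created_bucket="<bucket-created-by-stack>")
```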

Conclusion

In this post, we provided a CloudFormation template to easily sync your text documents in an S3 bucket with your Amazon Kendra index. This solution is useful if you have multiple S3 buckets that you want to index because you can build all the components needed to query documents with a few clicks in a consistent and repeatable way. You can also see how image-based text documents can be managed in Amazon Kendra. For more information about specific schedule patterns, see Schedule Expressions for Rules.

Leave a comment and learn more about creating Amazon Kendra indexes in the next Amazon Kendra Essentials+ workshop.

Special thanks to Jose Mauricio Mani Yanez for his help in creating the example code and compiling the content of this post.


About the author

Rajesh Kumar Ravi is an AI/ML solutions architect at Amazon Web Services specializing in intelligent document search with Amazon Kendra and generative AI. He is a builder and problem solver, and contributes to the development of new ideas. He likes to walk and loves to go on short trips outside of work.
