Introduction
This document provides best practices for customers preparing their SharePoint library as a Knowledge Base (KB) for use with AI solutions, such as those orchestrated by Powell Software. The focus is on structuring and maintaining the KB to ensure efficient integration and utilization.
1. Folder Structure and Document Organization
Hierarchical Organization: Structure folders hierarchically for easy navigation. Top-level folders could represent broad categories (e.g., HR, Sales, Marketing).
Consistent Naming Conventions: Use clear and consistent naming for folders and documents. Include relevant keywords for easy searchability.
Document Categorization: Categorize documents according to their purpose and content (e.g., policies, procedures, templates).
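A naming convention is easiest to keep consistent when it can be checked automatically. The minimal Python sketch below assumes a hypothetical convention, <Category>_<Topic>_<DocType>.<ext>, and takes its categories and document types from the examples in this section. The convention, the CATEGORIES and DOC_TYPES sets, and the check_name helper are illustrations only, not part of the product.

```python
import re

# Hypothetical convention, for illustration only:
#   <Category>_<Topic>_<DocType>.<ext>, e.g. "HR_Parental-Leave_Policy.docx"
# Categories and document types echo the examples in this section;
# replace both sets with your own taxonomy.
CATEGORIES = {"HR", "Sales", "Marketing"}
DOC_TYPES = {"Policy", "Procedure", "Template"}

# Topic: one or more capitalised words joined by hyphens.
NAME_PATTERN = re.compile(
    r"^(?P<category>[A-Za-z]+)"
    r"_(?P<topic>[A-Z][A-Za-z0-9]*(?:-[A-Z][A-Za-z0-9]*)*)"
    r"_(?P<doctype>[A-Za-z]+)"
    r"\.(?P<ext>[a-z]+)$"
)


def check_name(filename: str) -> list[str]:
    """Return the problems found in a file name (an empty list means it follows the convention)."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return ["does not match <Category>_<Topic>_<DocType>.<ext>"]
    problems = []
    if match["category"] not in CATEGORIES:
        problems.append(f"unknown category {match['category']!r}")
    if match["doctype"] not in DOC_TYPES:
        problems.append(f"unknown document type {match['doctype']!r}")
    return problems


if __name__ == "__main__":
    for name in ["HR_Parental-Leave_Policy.docx", "final v2 (copy).docx", "Sales_Discounts_Memo.pdf"]:
        print(name, check_name(name) or "OK")
```

Returning a list of problems, rather than a simple pass/fail result, lets library owners see exactly what to rename.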
2. Document Quantity Management
Human Limit on Document Management: There may be no strict technical limit on the number of documents in an index, but there is a practical 'human' limit: the number of documents that stakeholders can effectively manage while keeping the information up to date and free of duplication.
Practical Limit: The manageable number is often a few dozen documents per language. At this scale, the responsible parties can oversee and maintain the content without being overwhelmed. Beyond it, keeping documents updated and consistent becomes significantly harder, which can reduce the quality and reliability of the information in the knowledge base.
Strategic Organization: It's important to organize these documents strategically. Consolidating information into comprehensive documents rather than numerous fragmented ones can help manage this limit effectively.
Note: In the current version of the product, when a document is identified as relevant to an employee's question, the document's entire indexed content is added to the prompt. Limit each document to a maximum of 2,000 words (around 3,000 tokens).
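A few lines of Python can serve as a rough pre-publication check of the word limit above. This sketch assumes the document's text has already been extracted to a string. Extracting text from DOCX or PDF files requires a separate library, which is not shown here. It also uses the 1.5 tokens-per-word ratio implied by the 2,000-word / 3,000-token guideline, and the product's actual tokenizer may count differently. The size_report helper and its constants are hypothetical.

```python
import re

MAX_WORDS = 2000        # maximum recommended in the note above
TOKENS_PER_WORD = 1.5   # assumption: ratio implied by 2,000 words ~ 3,000 tokens


def size_report(text: str) -> dict:
    """Count words, estimate tokens, and flag text that exceeds MAX_WORDS."""
    words = len(re.findall(r"\S+", text))  # a "word" is any run of non-whitespace
    estimated_tokens = round(words * TOKENS_PER_WORD)
    return {
        "words": words,
        "estimated_tokens": estimated_tokens,
        "over_limit": words > MAX_WORDS,
    }


if __name__ == "__main__":
    print(size_report("Employees accrue annual leave monthly."))
    print(size_report("word " * 2500))
```

Splitting on whitespace gives only a crude word count. It is still accurate enough to distinguish a document comfortably within the limit from one that should be split.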
3. Preferred File Types for Indexing
Optimal File Types: Some file types are indexed more effectively than others in a SharePoint library. CSV, DOCX, and PDF are among the most effective for indexing and retrieval.
Other File Types (XLS, PPT): XLS (Excel spreadsheets) and PPT (PowerPoint presentations) are technically indexable, but they may be less effective for search and AI processing.
Note: In the current version of the product, images and other media content associated with a document are not indexed.
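To see how an existing library matches these recommendations, you can group its files by extension. The following is a minimal sketch under stated assumptions. The triage helper is hypothetical, and it checks only file extensions, not content. Treating .xlsx and .pptx the same way as XLS and PPT is also an assumption, because this guide does not mention those formats.

```python
from collections import defaultdict
from pathlib import PurePath

PREFERRED = {".csv", ".docx", ".pdf"}
# XLS and PPT are named in this section; grouping .xlsx/.pptx with them is an assumption.
LESS_EFFECTIVE = {".xls", ".xlsx", ".ppt", ".pptx"}


def triage(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by how well their type suits indexing, based only on the extension."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        suffix = PurePath(path).suffix.lower()
        if suffix in PREFERRED:
            groups["preferred"].append(path)
        elif suffix in LESS_EFFECTIVE:
            groups["less_effective"].append(path)
        else:
            groups["review"].append(path)  # type not covered by this guidance
    return dict(groups)


if __name__ == "__main__":
    library = ["HR/Leave_Policy.docx", "Sales/Q3_Deck.PPTX", "Marketing/logo.png", "HR/Holidays.csv"]
    for group, files in triage(library).items():
        print(group, files)
```

Files in the "review" group are neither confirmed nor ruled out by this guide. They are candidates for conversion to one of the preferred formats.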
4. Creating an Information Tree to Avoid Topic Duplication
When building a KB, particularly for AI applications such as a search index or an AI-driven bot, it is crucial to avoid duplicating topics across documents. Duplication can lead to contradictions, outdated information, and difficulty retrieving accurate answers. This section explains how to split topics into subtopics and build an effective information tree.
Single Source Principle: Cover each topic in only one document. This prevents contradictions and confusion when the AI system retrieves information.
Topic Segmentation: Break down broader topics into specific subtopics. Each subtopic should be distinct and comprehensive.
Building an Information Tree: Develop a hierarchical structure that organizes information logically from general to specific (e.g., HR > Leave > Parental Leave).
Consistency in Structure: Maintain a consistent format across the KB. This consistency helps users and AI systems to predict and locate information quickly.
Regular Audits: Periodically review the KB to identify and resolve any duplicated or contradictory content.
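Part of a regular audit can be automated by flagging headings that appear in more than one document, which often indicates that a topic is covered twice. The sketch below assumes that headings have already been extracted from each document, for example from DOCX heading styles, into a dictionary. Both helpers are hypothetical. Because the sketch matches only near-identical headings, it supplements human review of overlapping topics but does not replace it.

```python
import re
from collections import defaultdict


def normalise(heading: str) -> str:
    """Lower-case a heading and drop punctuation so near-identical headings compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", heading.lower()))


def find_duplicate_topics(headings_by_doc: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each normalised heading used in more than one document to those documents."""
    docs_by_topic: dict[str, set[str]] = defaultdict(set)
    for doc, headings in headings_by_doc.items():
        for heading in headings:
            docs_by_topic[normalise(heading)].add(doc)
    # A heading repeated inside a single document is not a cross-document duplicate.
    return {topic: sorted(docs) for topic, docs in docs_by_topic.items() if len(docs) > 1}


if __name__ == "__main__":
    kb = {
        "HR_Leave_Policy.docx": ["Annual Leave", "Parental Leave"],
        "HR_Onboarding_Procedure.docx": ["First Day", "Annual leave"],
        "Sales_Discounts_Policy.pdf": ["Approval Thresholds"],
    }
    for topic, docs in find_duplicate_topics(kb).items():
        print(f"'{topic}' appears in: {', '.join(docs)}")
```

Each flagged topic should then be consolidated into a single document, in line with the Single Source Principle.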
5. Document Structure and Length
Clarity and Brevity: Ensure all text is clear, direct, and concise. Avoid excessive wordiness that could detract from the main message.
Headings and Subheadings: Use descriptive headings and subheadings to organize content logically. This improves readability and helps users quickly find the information they need.
Table Usage: Use tables to present data where appropriate, with clear, descriptive headers. In table cells, prefer complete sentences or descriptive phrases over a bare "Yes/No" to provide more context (e.g., "Yes, after six months of service" rather than "Yes").
Bullet Points and Numbered Lists: To improve structure and readability, use bullet points for unordered lists and numbered lists for sequences or steps.
Conciseness: Aim to keep documents short, ideally under 5 pages. At this length, information stays digestible and can be processed effectively by the AI.
Segmentation of Complex Topics: For topics that require extensive coverage, split the content into smaller, focused documents, for example a series of linked documents.
Summary Sections: For longer documents, include a summary or abstract at the beginning to give an overview of the content.
Professional and Clear: Use a professional tone appropriate for a business environment. The language should be formal yet accessible.
Jargon-Free: Avoid industry jargon or technical terms that might not be easily understood by all employees. If technical terms are necessary, include definitions or explanations.