Graphrag integration #4612
base: main
Hi @lspinheiro, this is exciting. It's also marked as DRAFT in the subject line but not marked as a draft in the PR. I'm marking it as a draft; please set it back by clicking "Ready for review" when you are ready.
Exciting to see this!! I love the tool idea. The tool itself can also be stateful and shared by multiple agents.
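To illustrate what a stateful, shared tool could look like, here is a minimal sketch in plain Python. It is not AutoGen's `BaseTool` API: `CachedSearchTool`, `fake_search`, and `agent` are hypothetical names, and `fake_search` stands in for a real GraphRAG search. One tool instance caches answers per query and is shared by two coroutines playing the role of agents, so the second agent's identical question is served from the cache.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class CachedSearchTool:
    """Hypothetical stateful tool shared by several agents (not AutoGen's BaseTool).

    Answers are cached per query, so a question already asked by one agent
    is served from memory when another agent asks it.
    """

    def __init__(self, search_fn: Callable[[str], Awaitable[str]]) -> None:
        self._search_fn = search_fn  # placeholder for a real GraphRAG search call
        self._cache: Dict[str, str] = {}
        self._lock = asyncio.Lock()  # serializes access to the shared cache
        self.search_calls = 0  # how many times the underlying search actually ran

    async def run(self, query: str) -> str:
        # Holding the lock during the search keeps the sketch simple; a real tool
        # might lock per query instead so unrelated searches can run concurrently.
        async with self._lock:
            if query not in self._cache:
                self.search_calls += 1
                self._cache[query] = await self._search_fn(query)
            return self._cache[query]


async def fake_search(query: str) -> str:
    await asyncio.sleep(0)  # simulate an awaitable call to the model/index
    return f"answer to: {query}"


async def agent(name: str, tool: CachedSearchTool, query: str) -> str:
    return f"{name} got {await tool.run(query)}"


async def main() -> Tuple[List[str], int]:
    tool = CachedSearchTool(fake_search)  # one instance shared by both agents
    results = await asyncio.gather(
        agent("agent_a", tool, "Who is Dr. Becher?"),
        agent("agent_b", tool, "Who is Dr. Becher?"),
    )
    return list(results), tool.search_calls


print(asyncio.run(main()))
```

The lock guarantees that two agents asking the same question at the same time trigger only one underlying search.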
Thanks @ekzhu and @rysweet. This should be ready for review now. It still needs the improvements mentioned in the description, but the tools can be used. I used the following test script:

```python
import asyncio

from autogen_core import CancellationToken
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from autogen_ext.tools.graphrag import (
    GlobalSearchTool,
    LocalSearchTool,
    GlobalDataConfig,
    LocalDataConfig,
    EmbeddingConfig,
)
from azure.identity import DefaultAzureCredential, get_bearer_token_provider


async def main():
    # Chat model used by both search tools (Azure OpenAI with Azure AD auth)
    openai_client = AzureOpenAIChatCompletionClient(
        model="gpt-4o-mini",
        azure_endpoint="https://<resource-name>.openai.azure.com",
        azure_deployment="gpt-4o-mini",
        api_version="2024-08-01-preview",
        azure_ad_token_provider=get_bearer_token_provider(
            DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
        ),
    )

    # Global search example: points at the GraphRAG index output directory
    global_config = GlobalDataConfig(input_dir="./autogen-test/ragtest/output")
    global_tool = GlobalSearchTool.from_config(
        openai_client=openai_client,
        data_config=global_config,
    )
    global_args = {"query": "What does the station-master say about Dr. Becher?"}
    global_result = await global_tool.run_json(global_args, CancellationToken())
    print("\nGlobal Search Result:")
    print(global_result)

    # Local search example: same index, plus an embedding model configuration
    local_config = LocalDataConfig(input_dir="./autogen-test/ragtest/output")
    embedding_config = EmbeddingConfig(
        model="text-embedding-3-small",
        api_base="https://<resource-name>.openai.azure.com",
        deployment_name="text-embedding-3-small",
        api_version="2023-05-15",
        api_type="azure",
        azure_ad_token_provider=get_bearer_token_provider(
            DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
        ),
        max_retries=10,
        request_timeout=180.0,
    )
    local_tool = LocalSearchTool.from_config(
        openai_client=openai_client,
        data_config=local_config,
        embedding_config=embedding_config,
    )
    local_args = {"query": "What does the station-master say about Dr. Becher?"}
    local_result = await local_tool.run_json(local_args, CancellationToken())
    print("\nLocal Search Result:")
    print(local_result)


if __name__ == "__main__":
    asyncio.run(main())
```
@jackgerrits, I had to add
Thank you! More documentation would help me review this PR. I would like to be able to build the docs page on this PR and see the example.
python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_model_adapter.py
Related #4438
@gagb, I added a sample with a README and some docstrings that should help with the review.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main    #4612      +/-   ##
==========================================
+ Coverage   68.61%   72.42%   +3.80%
==========================================
  Files         156      112      -44
  Lines       10053     6494    -3559
==========================================
- Hits         6898     4703    -2195
+ Misses       3155     1791    -1364
```

Flags with carried forward coverage won't be shown.
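As a sanity check on the report, the percentages follow from hits divided by lines. This minimal sketch assumes Codecov's default of two-decimal precision rounded down (configurable in `codecov.yml`) and uses exact fractions to avoid floating-point surprises; `coverage` and `as_percent_down` are illustrative helper names.

```python
import math
from fractions import Fraction


def coverage(hits: int, lines: int) -> Fraction:
    """Exact coverage ratio (0..1) from hit and total line counts."""
    return Fraction(hits, lines)


def as_percent_down(ratio: Fraction) -> str:
    """Format a non-negative ratio as a percentage truncated to 2 decimals."""
    hundredths = math.floor(ratio * 10000)  # percent * 100, rounded down
    return f"{hundredths // 100}.{hundredths % 100:02d}"


base = coverage(6898, 10053)  # main
head = coverage(4703, 6494)   # #4612

print(as_percent_down(base))         # 68.61
print(as_percent_down(head))         # 72.42
print(as_percent_down(head - base))  # 3.80
```

Note that the +3.80% delta comes from the unrounded ratios; subtracting the rounded figures would give 3.81.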
python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_global_search.py
python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_local_search.py
…obal_search.py Co-authored-by: Eric Zhu <[email protected]>
…cal_search.py Co-authored-by: Eric Zhu <[email protected]>
Let's add some unit tests; see the code coverage result. Is it possible to run a simple setup procedure with a mini dataset, perhaps generated?
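One way to act on the "perhaps generated" idea, sketched under assumptions: a hypothetical `write_mini_corpus` helper writes a handful of synthetic sentences into an `input/` folder under a GraphRAG project root, so a test setup could index it from scratch. The `ragtest/input` layout follows the GraphRAG quickstart convention, but check it against your own settings; the names and sentences here are invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical generator: a few recurring names and places give an indexer
# some entities and relationships to extract, while keeping the input tiny.
PEOPLE = ["Alice", "Bob", "Carol", "Dave", "Erin"]
PLACES = ["Eyford", "Reading", "London"]


def make_sentences(n: int = 10) -> List[str]:
    """Deterministically generate n short sentences linking people and places."""
    sentences = []
    for i in range(n):
        a = PEOPLE[i % len(PEOPLE)]
        b = PEOPLE[(i + 1) % len(PEOPLE)]
        place = PLACES[i % len(PLACES)]
        sentences.append(f"{a} met {b} in {place}.")
    return sentences


def write_mini_corpus(root: Path, n: int = 10) -> Path:
    """Write the corpus to <root>/input/mini.txt and return the file path."""
    input_dir = root / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    path = input_dir / "mini.txt"
    path.write_text("\n".join(make_sentences(n)) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    corpus = write_mini_corpus(Path(tmp) / "ragtest")
    lines = corpus.read_text(encoding="utf-8").splitlines()
    print(len(lines), lines[0])  # 10 Alice met Bob in Eyford.
```

Because the text is deterministic, the resulting index should be reproducible enough to check in or regenerate in CI, though LLM-driven extraction itself may still vary unless it is mocked.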
How much data do you think is OK to add? I think the Sherlock Holmes book generates roughly 10 MB of data across the parquet and vector DB files. I can try something smaller, but I don't know how to estimate the output size from the input in GraphRAG, so it's hard to say how much I would need to store in the repo as test data files.
How about a text file with 10 sentences? What is the size of the index?
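To answer "what is the size of the index?" empirically rather than by estimate, a small helper can total the files in a GraphRAG output folder by extension (for example, `.parquet` tables versus vector store files). `index_size_by_suffix` and `human` are hypothetical helpers; the demo builds a throwaway directory, and in practice you would point it at something like the `./autogen-test/ragtest/output` folder from the test script.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict


def index_size_by_suffix(output_dir: Path) -> Dict[str, int]:
    """Total bytes per file suffix under output_dir (recursively).

    Files without a suffix are grouped under "<none>".
    """
    totals: Dict[str, int] = defaultdict(int)
    for path in Path(output_dir).rglob("*"):
        if path.is_file():
            totals[path.suffix or "<none>"] += path.stat().st_size
    return dict(totals)


def human(n: int) -> str:
    """Render a byte count with binary units, e.g. 10485760 -> '10.0 MiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


# Demo on a throwaway directory; point it at your real output folder instead.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tables").mkdir()
    (root / "tables" / "entities.parquet").write_bytes(b"x" * 300)
    (root / "tables" / "text_units.parquet").write_bytes(b"x" * 200)
    (root / "vectors.bin").write_bytes(b"x" * 2048)
    demo_sizes = index_size_by_suffix(root)

for suffix, n in sorted(demo_sizes.items()):
    print(suffix, human(n))
```

Running it on indexes built from inputs of a few sizes gives a rough input-to-output ratio without needing to know GraphRAG's internals.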
Is there a mirror we can fetch it from instead of including it in the repo?
@ekzhu @jackgerrits, I added the data in a conftest file. Since we are mocking the LLM calls, the data size won't matter as much.
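The mocking approach can be illustrated with the standard library alone. In this sketch, `answer_question` is a hypothetical function standing in for code that calls a chat model, and the model client is replaced by a `Mock` whose `create` method is a `unittest.mock.AsyncMock` returning a canned reply. The `create`/`.content` shape only loosely mirrors AutoGen model clients; this is not the PR's actual conftest.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List
from unittest.mock import AsyncMock, Mock


async def answer_question(client: Any, context: List[str], query: str) -> str:
    """Hypothetical code under test: builds a prompt and asks the model."""
    prompt = "\n".join(context) + f"\n\nQuestion: {query}"
    result = await client.create([prompt])
    return result.content.strip()


def make_fake_client(reply: str) -> Mock:
    """A stand-in model client whose async create() returns a canned reply."""
    client = Mock()
    client.create = AsyncMock(return_value=SimpleNamespace(content=reply))
    return client


fake = make_fake_client("  Dr. Becher is mentioned by the station-master.  ")
answer = asyncio.run(answer_question(fake, ["chunk 1", "chunk 2"], "Who is Dr. Becher?"))
print(answer)
```

Because no real model is called, the test is fast and deterministic, and it can still assert on exactly what prompt was sent.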
python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_global_search.py
python/packages/autogen-ext/src/autogen_ext/tools/graphrag/_local_search.py
```python
async def main():
    # Initialize the OpenAI client
    openai_client = AzureOpenAIChatCompletionClient(
```
With the recent addition of component configuration, we can now use a config file to specify the endpoints.
See another sample: https://github.com/microsoft/autogen/tree/main/python/samples/core_chess_game
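A minimal sketch of that suggestion, assuming the component-config mechanism in `autogen_core` (`ChatCompletionClient.load_component`, which accepts a dict with `provider` and `config` keys). Endpoint details move out of the code into a JSON file. The file name `model_config.json` and the helper names are illustrative; the linked chess sample uses a YAML file instead. This sketch uses an API-key placeholder and does not cover keyless Azure AD auth. Only `load_model_client` needs AutoGen installed, so its import is deferred.

```python
import json
from pathlib import Path
from typing import Any, Dict


def azure_model_config(
    endpoint: str, deployment: str, api_version: str, api_key: str, model: str = "gpt-4o-mini"
) -> Dict[str, Any]:
    """Build a component config dict for an Azure OpenAI chat client."""
    return {
        "provider": "autogen_ext.models.openai.AzureOpenAIChatCompletionClient",
        "config": {
            "model": model,
            "azure_endpoint": endpoint,
            "azure_deployment": deployment,
            "api_version": api_version,
            "api_key": api_key,  # placeholder; keep real keys out of source control
        },
    }


def save_config(config: Dict[str, Any], path: Path) -> None:
    """Write the config as pretty-printed JSON."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")


def read_config(path: Path) -> Dict[str, Any]:
    """Read a config previously written by save_config."""
    return json.loads(path.read_text(encoding="utf-8"))


def load_model_client(path: Path) -> Any:
    """Instantiate the client from the file (requires autogen-core/autogen-ext)."""
    from autogen_core.models import ChatCompletionClient  # deferred optional import

    return ChatCompletionClient.load_component(read_config(path))
```

Usage: call `save_config(azure_model_config(...), Path("model_config.json"))` once, then `openai_client = load_model_client(Path("model_config.json"))` at the top of `main()` instead of constructing the client inline.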
…obal_search.py Co-authored-by: Eric Zhu <[email protected]>
…cal_search.py Co-authored-by: Eric Zhu <[email protected]>
Co-authored-by: Eric Zhu <[email protected]>
Why are these changes needed?
This PR adds an initial integration between GraphRAG and AutoGen by exposing local and global search as tools that can be used in `autogen-agentchat`. A user guide/cookbook will follow. I added no tests because the test data I used was fairly large, and I'm not sure we have an established way to add tests for these more complex integrations, but there is a test script that I used. The indexing needs to be done in GraphRAG first; the goal is to illustrate the end-to-end steps in a notebook. I would appreciate some initial feedback, and I hope to gradually extend this with more flexible configuration, DRIFT search integration, and examples.
Related issue number
Checks