[Bug]: Error executing verb "create_base_entity_graph" in create_base_entity_graph: 'name' #1603

Redhair957 · 2025-01-09T09:39:32Z

Do you need to file an issue?

I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

graphrag 0.9

Traceback (most recent call last):
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
    result = await result
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/workflows/v1/subflows/create_base_entity_graph.py", line 47, in create_base_entity_graph
    await create_base_entity_graph_flow(
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/flows/create_base_entity_graph.py", line 58, in create_base_entity_graph
    merged_entities = _merge_entities(entity_dfs)
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/flows/create_base_entity_graph.py", line 119, in _merge_entities
    all_entities.groupby(["name", "type"], sort=False)
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/pandas/core/frame.py", line 9183, in groupby
    return DataFrameGroupBy(
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1329, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'name'
17:32:39,625 graphrag.callbacks.file_workflow_callbacks INFO Error executing verb "create_base_entity_graph" in create_base_entity_graph: 'name' details=None
17:32:39,628 graphrag.index.run.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/run/run.py", line 260, in run_pipeline
    result = await _process_workflow(
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/run/workflow.py", line 103, in _process_workflow
    result = await workflow.run(context, callbacks)
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
    result = await result
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/workflows/v1/subflows/create_base_entity_graph.py", line 47, in create_base_entity_graph
    await create_base_entity_graph_flow(
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/flows/create_base_entity_graph.py", line 58, in create_base_entity_graph
    merged_entities = _merge_entities(entity_dfs)
  File "/data/graphrag-dify/graphrag-0.9.0/graphrag/index/flows/create_base_entity_graph.py", line 119, in _merge_entities
    all_entities.groupby(["name", "type"], sort=False)
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/pandas/core/frame.py", line 9183, in groupby
    return DataFrameGroupBy(
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1329, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/pandas/core/groupby/grouper.py", line 1043, in get_grouper
    raise KeyError(gpr)
KeyError: 'name'
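
For readers following the traceback: the last frames show pandas raising `KeyError: 'name'` because the frame passed to `groupby` has no `name` column. One way this can happen is when every extraction result is empty, for example if the LLM calls failed (see the "Error Invoking LLM" entry that follows). The sketch below reproduces that pandas behavior in isolation. `reproduce_key_error` and `missing_entity_columns` are hypothetical helpers, not GraphRAG functions, and the assumption that the entity frames were empty is not confirmed by the report.

```python
import pandas as pd


def missing_entity_columns(df: pd.DataFrame, required=("name", "type")) -> list:
    """Return the required columns that are absent from df (hypothetical guard)."""
    return [column for column in required if column not in df.columns]


def reproduce_key_error() -> str:
    """Group an empty, column-less frame the same way the traceback does."""
    # Assumption: each per-chunk extraction produced an empty frame.
    empty_results = [pd.DataFrame(), pd.DataFrame()]
    all_entities = pd.concat(empty_results, ignore_index=True)
    try:
        all_entities.groupby(["name", "type"], sort=False)
    except KeyError as exc:
        # pandas reports the first grouping key it cannot find.
        return repr(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce_key_error())
    print(missing_entity_columns(pd.DataFrame()))
```

A guard like `missing_entity_columns` would let a pipeline fail with a clearer message ("no entities were extracted") instead of a bare `KeyError`.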

{
"type": "error",
"data": "Error Invoking LLM",
"stack": "Traceback (most recent call last):\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/base/base.py", line 112, in __call__\n return await self._invoke(prompt, **kwargs)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/base/base.py", line 128, in _invoke\n return await self._decorated_target(prompt, **kwargs)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/services/json.py", line 71, in invoke\n return await delegate(prompt, **kwargs)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/services/retryer.py", line 109, in invoke\n result = await execute_with_retry()\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/services/retryer.py", line 93, in execute_with_retry\n async for a in AsyncRetrying(\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 166, in __anext__\n do = await self.iter(retry_state=self._retry_state)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 153, in iter\n result = await action(retry_state)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner\n return call(*args, **kwargs)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in <lambda>\n self._add_action_func(lambda rs: rs.outcome.result())\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/concurrent/futures/_base.py", line 438, in result\n return self.__get_result()\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/concurrent/futures/_base.py", line 390, in __get_result\n raise self._exception\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/services/retryer.py", line 101, in execute_with_retry\n return await attempt()\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/services/retryer.py", line 78, in attempt\n return await delegate(prompt, **kwargs)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/services/rate_limiter.py", line 70, in invoke\n result = await delegate(prompt, **args)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/base/base.py", line 152, in _decorator_target\n output = await self._execute_llm(prompt, **kwargs)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/openai/llm/chat_text.py", line 155, in _execute_llm\n completion = await self._call_completion_or_cache(\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/openai/llm/chat_text.py", line 127, in _call_completion_or_cache\n return await self._cache.get_or_insert(\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/fnllm/services/cache_interactor.py", line 50, in get_or_insert\n entry = await func()\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1661, in create\n return await self._post(\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/openai/_base_client.py", line 1843, in post\n return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/openai/_base_client.py", line 1537, in request\n return await self._request(\n File "/home/user/ENTER/envs/graphrag0.9/lib/python3.10/site-packages/openai/_base_client.py", line 1638, in _request\n raise self._make_status_error_from_response(err.response) from None\nopenai.APIStatusError: Error code: 402 - {'error': {'message': 'Insufficient Balance (request id: 2025010916490549354044674414679)', 'type': 'unknown_error', 'param': '', 'code': 'invalid_request_error'}}\n",
"source": "Error code: 402 - {'error': {'message': 'Insufficient Balance (request id: 2025010916490549354044674414679)', 'type': 'unknown_error', 'param': '', 'code': 'invalid_request_error'}}",
"details": {
"prompt": "\n-Goal-\nGiven a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.\n\n-Steps-\n1. Identify all entities. For each identified entity, extract the following information:\n- entity_name: Name of the entity, capitalized\n- entity_type: One of the following types: [service_object, acceptance_conditions, processing_type, legal_processing_time, promised_processing_time, fee_status, consultation_method, implementing_authority, authority_nature, power_source, service_location, service_hours, administrative_level, processing_form, implementing_authority_code, service_object_type, application_materials, processing_flow, supervision_complaint_method, rights_and_obligations]\n- entity_description: Comprehensive description of the entity's attributes and activities\nFormat each entity as ("entity"<|><entity_name><|><entity_type><|><entity_description>)\n\n2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related to each other.\nFor each pair of related entities, extract the following information:\n- source_entity: name of the source entity, as identified in step 1\n- target_entity: name of the target entity, as identified in step 1\n- relationship_description: explanation as to why you think the source entity and the target entity are related to each other\n- relationship_strength: an integer score between 1 to 10, indicating strength of the relationship between the source entity and target entity\nFormat each relationship as ("relationship"<|><source_entity><|><target_entity><|><relationship_description><|><relationship_strength>)\n\n3. Return output in Chinese as a single list of all the entities and relationships identified in steps 1 and 2. Use ## as the list delimiter.\n4.Return Chinese format .\n\n5. When finished, output <|COMPLETE|>.\n\n-Examples-\n######################\n\nExample 1:\n\nentity_types: [service_object, acceptance_conditions, processing_type, legal_processing_time, promised_processing_time, fee_status, consultation_method, implementing_authority, authority_nature, power_source, service_location, service_hours, administrative_level, processing_form, implementing_authority_code, service_object_type, application_materials, processing_flow, supervision_complaint_method, rights_and_obligations]\ntext:\
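
The `source` field above carries the HTTP status and the provider's message. Below is a minimal sketch for pulling both out of such a string, assuming the `Error code: <status> - <dict>` layout seen in this log. `parse_llm_error` is a hypothetical helper, not part of fnllm or the openai package.

```python
import ast
import re
from typing import Optional, Tuple

# Layout assumed from the log entry above: "Error code: 402 - {...}".
ERROR_PATTERN = re.compile(r"Error code: (\d{3}) - (\{.*\})", re.DOTALL)


def parse_llm_error(source: str) -> Optional[Tuple[int, str]]:
    """Return (status_code, message) from an error string, or None if it does not match."""
    match = ERROR_PATTERN.search(source)
    if match is None:
        return None
    status = int(match.group(1))
    raw_payload = match.group(2)
    try:
        # The payload is a Python dict repr (single quotes), so JSON parsing would fail.
        payload = ast.literal_eval(raw_payload)
        message = payload["error"]["message"]
    except (ValueError, SyntaxError, KeyError, TypeError):
        # Fall back to the raw text if the payload has an unexpected shape.
        message = raw_payload
    return status, message


if __name__ == "__main__":
    logged = (
        "Error code: 402 - {'error': {'message': 'Insufficient Balance "
        "(request id: 2025010916490549354044674414679)', 'type': 'unknown_error', "
        "'param': '', 'code': 'invalid_request_error'}}"
    )
    print(parse_llm_error(logged))
```

On the string logged here, this returns status 402 with the "Insufficient Balance" message, which points at the account balance on the chat endpoint rather than at the indexing code itself.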

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_CHAT_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: ${GRAPHRAG_CHAT_MODEL}
  model_supports_json: false # recommended if this is available for your model.
  # audience: "https://cognitiveservices.azure.com/.default"
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: ${GRAPHRAG_API_BASE}
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
#  temperature: 1 # temperature for sampling
#  top_p: 0.8 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  # batch_size: 16 # the number of documents to send in a single request
  # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default # A prefix for the vector store to create embedding containers. Default: 'default'.
    overwrite: true
  # vector_store: # configuration for AI Search
    # type: azure_ai_search
    # url: <ai_search_endpoint>
    # api_key: <api_key> # if not set, will attempt to use managed identity. Expects the `Search Index Data Contributor` RBAC role in this case.
    # audience: <optional> # if using managed identity, the audience to use for the token
    # overwrite: true # or false. Only applicable at index creation time
    # container_name: default # A prefix for the AzureAISearch to create indexes. Default: 'default'.
  llm:
    api_key: Xinference
    type: openai_embedding # or azure_openai_embedding
    model:  bge-m3-embeddings
    api_base: http://10.168.2.227:9997/v1
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made

chunks:
  size: 1200
  overlap: 200
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

#update_index_storage: # Storage to save an updated index (for incremental indexing). Enabling this performs an incremental index run
#   type: file # or blob
#   base_dir: "update_output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "logs"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
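  entity_types: [业务办理项编码, 实施主体, 事项名称, 实施主体编码, 权力来源] # business item code, implementing authority, item name, implementing authority code, power source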
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
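
The `llm` block above resolves `api_key`, `model`, and `api_base` from environment variables. Here is a small sketch for checking that each `${...}` placeholder has a value before starting an indexing run. `unset_placeholders` is a hypothetical helper, and the comment stripping is deliberately naive (it assumes no `#` inside a value).

```python
import os
import re
from typing import List, Mapping

# Matches ${VARIABLE_NAME} placeholders as used in the settings above.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unset_placeholders(settings_text: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """List placeholder names in active (uncommented) settings that have no value in env."""
    missing: List[str] = []
    for line in settings_text.splitlines():
        # Drop anything after '#', so commented-out settings are ignored.
        active = line.split("#", 1)[0]
        for name in PLACEHOLDER.findall(active):
            if not env.get(name) and name not in missing:
                missing.append(name)
    return missing


if __name__ == "__main__":
    sample = (
        "llm:\n"
        "  api_key: ${GRAPHRAG_CHAT_API_KEY}\n"
        "  model: ${GRAPHRAG_CHAT_MODEL}\n"
        "  api_base: ${GRAPHRAG_API_BASE}\n"
    )
    print(unset_placeholders(sample))
```

An empty list means every placeholder resolved. It does not confirm that the key is valid or that the account has balance.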

Logs and screenshots

No response

Additional Information

GraphRAG Version: 0.9
Operating System:
Python Version: 3.10
Related Issues:

Redhair957 · 2025-01-09T09:40:10Z

Operating System: Ubuntu

Redhair957 added the bug and triage labels · Jan 9, 2025
