cannot load table thru glue catalog #1501

Open
1 of 3 tasks
xpj01 opened this issue Jan 9, 2025 · 3 comments

Comments

@xpj01

xpj01 commented Jan 9, 2025

Apache Iceberg version

0.8.1 (latest release)

Please describe the bug 🐞

I can load the Glue catalog and list tables from it, but loading a table fails.
I searched previous issues like
#515
#892

I tried those fixes, but it still doesn't work for me.

Here is my sample code; I tried to put all the potentially relevant properties into it.

catalog = load_catalog(
        "glue",
        **{
            "type": "glue",
            "profile_name": "default",
            "glue.profile-name": "default",
            "glue.access-key-id": aws_credentials['aws_access_key_id'],
            "glue.secret-access-key": aws_credentials['aws_secret_access_key'],
            "glue.session-token": aws_credentials.get('aws_session_token', None),
            "glue.region": "us-west-2",
            "s3.access-key-id": aws_credentials['aws_access_key_id'],
            "s3.secret-access-key": aws_credentials['aws_secret_access_key'],
            "s3.session-token": aws_credentials.get('aws_session_token', None),
            "s3.region": "us-west-2",
            "aws_access_key_id": aws_credentials['aws_access_key_id'],
            "aws_secret_key_id": aws_credentials['aws_access_key_id'],
            "aws_secret_access_key": aws_credentials['aws_secret_access_key'],
            "aws_session_token": aws_credentials.get('aws_session_token', None),
            "region_name": "us-west-2",
            "warehouse": "s3://<bucket>/<warehouse folder>",
            "client.region": "us-west-2",
            "client.access-key-id": aws_credentials['aws_access_key_id'],
            "client.secret-access-key": aws_credentials['aws_secret_access_key'],
            "client.session-token": aws_credentials.get('aws_session_token', None)  # Use get() in case token doesn't exist
        }
    )

Listing tables works well for me:

    tables = catalog.list_tables("<db name>")
    for table in tables:
        print(table)

Loading a table doesn't work:

    table = catalog.load_table("<db name>.<table name>")

Traceback:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 762, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 422, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/operators/python.py", line 238, in execute
    return_value = self.execute_callable()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/operators/python.py", line 256, in execute_callable
    return runner.run(*self.op_args, **self.op_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/dags/iceberg_dag.py", line 61, in append_to_iceberg
    table = catalog.load_table("<db name>.<table name>")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/pyiceberg/catalog/glue.py", line 472, in load_table
    return self._convert_glue_to_iceberg(self._get_glue_table(database_name=database_name, table_name=table_name))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/pyiceberg/catalog/glue.py", line 295, in _convert_glue_to_iceberg
    metadata = FromInputFile.table_metadata(file)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/pyiceberg/serializers.py", line 113, in table_metadata
    with input_file.open() as input_stream:
         ^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/pyiceberg/io/pyarrow.py", line 262, in open
    input_file = self._filesystem.open_input_file(self._path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_fs.pyx", line 789, in pyarrow._fs.FileSystem.open_input_file
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: When reading information for key 'iceberg_test/metadata/00004-804bd1cf-09b6-48b4-8d8b-4e332d971b13.metadata.json' in bucket '<bucket>': AWS Error ACCESS_DENIED during HeadObject operation: No response body.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@kevinjqliu
Contributor

OSError: When reading information for key 'iceberg_test/metadata/00004-804bd1cf-09b6-48b4-8d8b-4e332d971b13.metadata.json' in bucket '': AWS Error ACCESS_DENIED during HeadObject operation: No response body.

This is likely a permissions issue with the AWS resources. Specifically, the credentials passed to the S3 FileIO do not have permission to access the file.

            "s3.access-key-id": aws_credentials['aws_access_key_id'],
            "s3.secret-access-key": aws_credentials['aws_secret_access_key'],
            "s3.session-token": aws_credentials.get('aws_session_token', None),
            "s3.region": "us-west-2",

https://py.iceberg.apache.org/configuration/#s3
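For comparison, here is a minimal sketch of what the catalog configuration could be trimmed down to, assuming static credentials and keeping only the `glue.*` and `s3.*` properties described in the linked docs (duplicate and unused keys such as `aws_secret_key_id` dropped). The helper function and placeholder credential dict are illustrative, not pyiceberg API:

```python
def glue_catalog_properties(creds: dict, region: str = "us-west-2") -> dict:
    """Build a minimal property dict for a pyiceberg Glue catalog.

    The "glue."-prefixed keys configure the Glue client; the "s3." keys
    configure the S3 FileIO that reads the table metadata file.
    """
    props = {
        "type": "glue",
        "glue.region": region,
        "glue.access-key-id": creds["aws_access_key_id"],
        "glue.secret-access-key": creds["aws_secret_access_key"],
        "s3.region": region,
        "s3.access-key-id": creds["aws_access_key_id"],
        "s3.secret-access-key": creds["aws_secret_access_key"],
    }
    token = creds.get("aws_session_token")
    if token:  # only pass a session token when one actually exists
        props["glue.session-token"] = token
        props["s3.session-token"] = token
    return props

# Usage (not run here):
# from pyiceberg.catalog import load_catalog
# catalog = load_catalog("glue", **glue_catalog_properties(aws_credentials))
```

Passing `"glue.session-token": None` explicitly (as in the original snippet) is worth avoiding; building the dict conditionally sidesteps any ambiguity about how a `None` property value is handled.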

@xpj01
Author

xpj01 commented Jan 10, 2025

Hi @kevinjqliu ,

I used the same credentials to upload and download files to the same bucket with S3Hook, which works. Are the permission requirements different for pyiceberg?

  s3_hook.load_bytes(
        bytes_data=parquet_buffer.getvalue(),
        key=file_name,
        bucket_name='<bucket>',
        replace=True
    )

@xpj01
Author

xpj01 commented Jan 10, 2025

Further, I tried both the temporary key (with a session token) and the long-term key (without a session token). Neither works.
I also compared the owner ID with mine, and they match.
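One way to narrow this down is to issue the same HeadObject call that failed in the traceback, with the same credentials, against the exact metadata key. Note that `S3Hook.load_bytes` only exercises `s3:PutObject`, while reading table metadata needs `s3:GetObject` on the warehouse prefix, so the two can succeed and fail independently. This is a hypothetical debugging sketch (the `metadata_key` helper and placeholder names are not part of pyiceberg):

```python
def metadata_key(table_prefix: str, metadata_file: str) -> str:
    """Build the S3 key pyiceberg reads, e.g. 'iceberg_test/metadata/00004-....metadata.json'."""
    return f"{table_prefix}/metadata/{metadata_file}"

def can_head_object(bucket: str, key: str, region: str = "us-west-2") -> bool:
    """Try the same HeadObject operation shown failing in the traceback."""
    import boto3  # imported lazily so metadata_key() is usable without AWS dependencies
    s3 = boto3.client("s3", region_name=region)
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except s3.exceptions.ClientError:
        return False
```

If this returns False with the same credentials that S3Hook uses successfully, the gap is usually a bucket policy or IAM statement that allows `PutObject` but not `GetObject` on the `metadata/` prefix, or a KMS key policy if the bucket uses SSE-KMS.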
