PyIceberg Cookbook #1201
Comments
Copying over from community sync Cookbook suggestions
@kevinjqliu are you accepting contributions for this cookbook yet? Happy to help if so!
Hi @shiv-io, yes, we're accepting contributions. We currently don't have a page set up for the cookbook yet.
Hey! I'm building a PoC with PyIceberg for a project, and I'm quite interested in incremental processing. Previously (using Delta with Spark) I relied on MERGE operations to update the table with data from a DataFrame. Is this possible yet? The closest thing seems to be overwrite + overwrite_filter, but I can't really use that with a DataFrame; I'd have to pass the filter as a string, right? In that case, an IN clause with thousands of IDs would hurt performance.
hey @francocalvo
The writes work with PyArrow tables and dataframes. I don't think you need to pass it as a string.
It depends on the exact logic, but we do some optimizations, such as filter pushdown, to speed up reads and writes.
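To make the overwrite + overwrite_filter idea concrete, here is a minimal sketch that builds the filter as an expression object instead of a string. It assumes `table` is a PyIceberg `Table`, `new_rows` is a `pyarrow.Table`, and the key column is called `id`; the helper names `distinct_ids` and `overwrite_rows_by_id` are made up for this illustration. It makes no claim about how well an `In` filter with thousands of IDs performs.

```python
from typing import Any, Iterable, List


def distinct_ids(ids: Iterable[Any]) -> List[Any]:
    """Return the distinct IDs in first-seen order, skipping nulls."""
    seen = set()
    result = []
    for value in ids:
        if value is None or value in seen:
            continue
        seen.add(value)
        result.append(value)
    return result


def overwrite_rows_by_id(table, new_rows, id_column: str = "id") -> None:
    """Replace every row whose ID appears in `new_rows` with the rows in `new_rows`.

    Assumption: `table` is a pyiceberg Table and `new_rows` is a pyarrow.Table
    whose schema matches the Iceberg table.
    """
    # Imported here so the pure helper above works without pyiceberg installed.
    from pyiceberg.expressions import In

    ids = distinct_ids(new_rows.column(id_column).to_pylist())
    if not ids:
        return  # nothing to replace

    # Rows matching the filter are deleted and `new_rows` is appended.
    table.overwrite(new_rows, overwrite_filter=In(id_column, ids))
```

Called as `overwrite_rows_by_id(table, arrow_table)`, this replaces the matched rows and appends the incoming ones. It is not a full MERGE, since it has no separate matched and not-matched branches.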
Thank you for the prompt answer!
Yes, what I mean is the case where I need to update an Iceberg table using an Arrow table. Elsewhere I used a MERGE with a WHEN MATCHED UPDATE clause, which let me 'soft-delete' old versions (it's an SCD Type 2 table). In some cases I need to update 10k+ rows in one go, matching them on an ID. In any case, I'm glad this exists, and I hope the cookbook becomes a good starting point for people who are trying this out.
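The SCD Type 2 'soft-delete' pattern described here can be sketched in plain Python, independent of any table format. This is only an illustration of the matching logic: the column names `id`, `is_current`, `valid_from`, and `valid_to` and the function `scd2_apply` are assumptions, not part of PyIceberg, and writing the result back to an Iceberg table is left open.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def scd2_apply(
    current: List[Row], incoming: List[Row], now: datetime, key: str = "id"
) -> Tuple[List[Row], List[Row]]:
    """Return (closed_rows, new_versions) for an SCD Type 2 update.

    closed_rows:  current versions that matched an incoming key, now marked
                  as no longer current (the 'soft delete').
    new_versions: one fresh current row per incoming key.
    """
    # If a key appears more than once in `incoming`, the last row wins.
    incoming_by_key = {row[key]: row for row in incoming}

    closed_rows = [
        {**row, "is_current": False, "valid_to": now}
        for row in current
        if row["is_current"] and row[key] in incoming_by_key
    ]
    new_versions = [
        {**row, "is_current": True, "valid_from": now, "valid_to": None}
        for row in incoming_by_key.values()
    ]
    return closed_rows, new_versions


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    current = [
        {"id": 1, "name": "a", "is_current": True, "valid_from": t0, "valid_to": None},
        {"id": 2, "name": "b", "is_current": True, "valid_from": t0, "valid_to": None},
    ]
    incoming = [{"id": 1, "name": "a2"}, {"id": 3, "name": "c"}]
    closed, new = scd2_apply(current, incoming, now)
    print(len(closed), len(new))
```

Keys that match get one closed-out row plus one new current row; keys that do not match only get a new current row, which corresponds to the WHEN NOT MATCHED INSERT branch of a MERGE.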
Feature Request / Improvement
It was brought up at the recent community sync that we should start a cookbook to capture different use cases with PyIceberg, similar to the Tabular Iceberg cookbook.
I'm starting this issue to track the creation of the cookbook and, more importantly, to collect the items people would like to see included in it.
Feel free to add suggestions below.