Collections public dumps #513
base: master
Pinging @paramsingh instead of a review request which doesn't work…
The changes look reasonable to me. However, I'm wondering if it would be better to keep the collections in the same dump. Separate data dump files make it harder for users to get into the project or consume the data. Is there a reason why we did the two dumps?
I didn't find any way to avoid dumping the private collections other than creating a dump without the collections and a dump with the public collections only (consisting of selected rows of three tables). Do you think I could just concatenate the two SQL dump files? I must admit I haven't tried that.
I did end up concatenating the two dump files, and it works like a charm, thanks @paramsingh! I also realized my duplicated tables didn't have their foreign keys, and had to add them to the collection dump SQL script.
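For readers following along, the concatenation mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name and file paths are hypothetical, and the PR's own scripts may do this differently. The main dump is written first on the assumption that the collection tables reference tables it creates.

```python
import shutil
from pathlib import Path


def concatenate_dumps(main_dump: Path, collections_dump: Path, output: Path) -> None:
    """Write main_dump followed by collections_dump into a single output file."""
    with output.open("wb") as out:
        for part in (main_dump, collections_dump):
            with part.open("rb") as src:
                # Stream the file so large dumps are never loaded fully into memory.
                shutil.copyfileobj(src, out)
            # Guard against a missing trailing newline gluing two statements together.
            out.write(b"\n")
```

The extra newline after each part is harmless in SQL and avoids joining the last statement of one file to the first statement of the next.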
Please rename the tables to make it clear they are temporary. Next, what happens with private collections? Will there be a follow-up PR to handle those?
-- duplicate user_collection table with public collections only
CREATE table if not exists public_user_collection (LIKE bookbrainz.user_collection INCLUDING ALL);
I would prefer to have these tables named starting with tmp_ to make it clear that they are dump tables. The clean-public-collection-dump-tables.sql script is a bit terrifying to read without the knowledge that these are not temp tables.
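To make the filtered-copy idea under discussion concrete, here is a minimal, self-contained sketch. It deliberately swaps PostgreSQL for an in-memory SQLite database so it can run anywhere, and it uses `CREATE TABLE ... AS SELECT` rather than the `LIKE ... INCLUDING ALL` form from the PR. The column names (`public`, `collection_id`, `bbid`, `collaborator_id`), the item and collaborator table names, and the `tmp_` prefixed names are assumptions for illustration, not the actual BookBrainz schema.

```python
import sqlite3

# Hypothetical, simplified schema: the real tables have more columns and constraints.
SCHEMA = """
CREATE TABLE user_collection (id INTEGER PRIMARY KEY, name TEXT, public INTEGER);
CREATE TABLE user_collection_item (collection_id INTEGER, bbid TEXT);
CREATE TABLE user_collection_collaborator (collection_id INTEGER, collaborator_id INTEGER);
"""

# Copy only public collections, then only the rows that belong to them.
COPY_PUBLIC = """
CREATE TABLE tmp_public_user_collection AS
    SELECT * FROM user_collection WHERE public = 1;
CREATE TABLE tmp_public_user_collection_item AS
    SELECT * FROM user_collection_item
    WHERE collection_id IN (SELECT id FROM tmp_public_user_collection);
CREATE TABLE tmp_public_user_collection_collaborator AS
    SELECT * FROM user_collection_collaborator
    WHERE collection_id IN (SELECT id FROM tmp_public_user_collection);
"""


def build_public_copies(conn: sqlite3.Connection) -> None:
    """Create the filtered tmp_ tables inside a single transaction."""
    conn.executescript("BEGIN;" + COPY_PUBLIC + "COMMIT;")
```

Filtering the item and collaborator tables against the already filtered collection table keeps the three copies consistent with each other, so no row in the copies points at a private collection.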
Create two dumps: one without any user collections and one with public user collections only. We can then import both of those files and bob's your uncle!
Replace the existing single file import with importing both the main and user collection dumps, after extracting them from the tarball. Modified instructions and links accordingly
In a transaction, with proper foreign keys, ready to be dumped.
& revert changes in instructions to that of a single .sql.bz2 file
55e4d8f to 447600b
Problem
With the introduction of user collections, private collections would currently be exported in the public dumps.
Solution
This PR aims to create a database dump without collections, and another dump with only the public collections.
For that purpose, we create temporary tables (i.e. user_collection -> public_user_collection) with the appropriate select statements to ignore private collections, their items and their collaborators, and dump those three tables to a file before removing them. We also want to rename the tables in the collections dump file once it has been created (for example, renaming bookbrainz.public_user_collection -> bookbrainz.user_collection
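
The renaming step described above could be done as a plain text substitution over the dump file. The sketch below is one hedged way to do it, not necessarily what the PR's scripts do: the function name is hypothetical, and it assumes the item and collaborator copies follow the same `public_` prefix pattern as `public_user_collection`.

```python
import re

# Matches the schema-qualified duplicated table names, for example
# bookbrainz.public_user_collection_item -> bookbrainz.user_collection_item.
PUBLIC_TABLE = re.compile(r"\bbookbrainz\.public_(user_collection\w*)")


def rename_public_tables(dump_sql: str) -> str:
    """Strip the public_ prefix from duplicated table names in dump text."""
    return PUBLIC_TABLE.sub(r"bookbrainz.\1", dump_sql)
```

For instance, `rename_public_tables("COPY bookbrainz.public_user_collection (id) FROM stdin;")` returns `"COPY bookbrainz.user_collection (id) FROM stdin;"`. Note that a text substitution also rewrites any row data that happens to contain the same string, and it only touches schema-qualified names, so the result is worth checking before publishing a dump.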