
Releases: IQSS/dataverse

v6.5

12 Dec 21:10
d9cc5eb

Dataverse 6.5

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.5 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights

Highlights for Dataverse 6.5 include:

  • new API endpoints, including editing of collections, Search API file counts, listing of exporters, comparing dataset versions, and auditing data files
  • UX improvements, especially Preview URLs
  • increased harvesting flexibility
  • performance gains
  • a security vulnerability addressed
  • many bug fixes
  • and more! Please see below.

Features Added

Private URL Renamed to Preview URL and Improved

The name of the URL that may be used by dataset administrators to share a draft version of a dataset has been changed from Private URL to Preview URL.

Also, additional information about the creation of Preview URLs has been added to the popup accessed via the edit menu of the Dataset Page.

Users of the Anonymous Preview URL will no longer be able to see the name of the Dataverse that the dataset is in but will be able to see the name of the repository.

Any Private URLs created in previous versions of Dataverse will continue to work.

The old "privateUrl" API endpoints for the creation and deletion of Preview (formerly Private) URLs have been deprecated. They will continue to work but please switch to the "previewUrl" equivalents that have been documented in the API Guide.

See also #8184, #8185, #10950, #10961, and #11085.

Showing Differences Between Dataset Versions is More Scalable

Showing differences between dataset versions, which is done during dataset edit operations and to populate the dataset page versions table, has been made significantly more scalable. See #10814 and #10818.

Version Differences Details Sorting Added

In order to facilitate the comparison between the draft version and the published version of a dataset, a sort on subfields has been added. See #10969.

Reindexing After a Role Assignment is Less Memory Intensive

Adding or removing a user from a role on a collection, particularly the root collection, could lead to a significant increase in memory use, resulting in Dataverse itself failing with an out-of-memory condition. Such changes now consume much less memory. A Solr reindexing step is included in the upgrade instructions below. See also #10697 and #10698.

Longer Custom Questions in Guestbooks

Custom questions in Guestbooks can now be more than 255 characters and the bug causing a silent failure when questions were longer than this limit has been fixed. See also #9492, #10117, #10118.

PostgreSQL and Flyway Updates

This release bumps the version of PostgreSQL and Flyway used in containers as well as the PostgreSQL JDBC driver used in all installations, including classic (non-Docker) installations. PostgreSQL and its driver have been bumped to version 17. Flyway has been bumped to version 10.

PostgreSQL 13 remains the version used with automated testing, leading us to continue to recommend that version for classic installations.

As of Flyway 10, supporting older versions of PostgreSQL no longer requires a paid subscription. While we don't encourage the use of older PostgreSQL versions, this flexibility may benefit some of our long-standing installations in their upgrade paths.

As part of this update, the containerized development environment now uses Postgres 17 instead of 16. Developers must delete their data (rm -rf docker-dev-volumes) and start with an empty database (rerun the quickstart in the dev guide), as explained on the dev mailing list.

The Docker compose file used for evaluations or demos has been upgraded from Postgres 13 to 17.

See also #10889 and #10912.

Harvesting "oai_dc" Metadata Prefix When Extended With Specific Namespaces

Some data repositories extend the "oai_dc" metadata prefix with specific namespaces. In this case, harvesting of these datasets into Dataverse was not possible because an XML parsing error was raised.

Harvesting of these datasets has been fixed by excluding tags with namespaces that are not "dc:". That is, only harvesting metadata with the "dc" namespace. See #10837.
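The filtering described above can be illustrated with a minimal Python sketch. It is not the Dataverse implementation; it only shows the idea of keeping elements in the Dublin Core namespace and skipping everything else. The `ext` namespace, the sample record, and the function name `keep_dc_elements` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI conventionally bound to the "dc" prefix in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def keep_dc_elements(record_xml: str) -> list[tuple[str, str]]:
    """Return (local name, text) pairs for direct children in the dc namespace."""
    root = ET.fromstring(record_xml)
    kept = []
    for child in root:
        # ElementTree spells a qualified tag as "{namespace-uri}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            local_name = child.tag.split("}", 1)[1]
            kept.append((local_name, (child.text or "").strip()))
    return kept


# Hypothetical record that extends oai_dc with an extra "ext" namespace.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ext="http://example.org/ext/">
  <dc:title>Example dataset</dc:title>
  <ext:funding>Hypothetical extension value</ext:funding>
  <dc:identifier>doi:10.1234/EXAMPLE</dc:identifier>
</oai_dc:dc>"""

for name, text in keep_dc_elements(SAMPLE):
    print(name, text)
```

In this sketch the `ext:funding` element is silently skipped instead of causing a parsing error, which mirrors the behavior the fix describes.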

Harvested Dataset PID from Record Header

When harvesting, Dataverse can now use the identifier from the OAI-PMH record header as the persistent id for the harvested dataset.

This will allow harvesting from sources that do not include a persistent id in their oai_dc metadata records, but use valid DOIs or handles as the OAI-PMH record header identifiers.

It is also possible to optionally configure a harvesting client to use this OAI-PMH identifier as the preferred choice for the persistent id. See the Harvesting Clients API section of the Guides, #11049 and #10982 for more information.
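As a rough illustration of the selection logic described above, the following Python sketch picks a persistent id from two candidates. It assumes the default is to fall back to the record header identifier only when the metadata has no usable PID, and that a client option can reverse that order. The function names, the `prefer_header` parameter, and the simplified validity pattern are all hypothetical and do not reflect the actual Dataverse code or its validation rules.

```python
import re
from typing import Optional

# Deliberately simplified check for "doi:" and "hdl:" style identifiers.
# Real PID validation is stricter; this pattern is only an assumption.
PID_PATTERN = re.compile(r"^(doi:10\.\d{4,9}/\S+|hdl:\d[\d.]*/\S+)$", re.IGNORECASE)


def looks_like_pid(value: Optional[str]) -> bool:
    return value is not None and PID_PATTERN.match(value) is not None


def choose_pid(metadata_pid: Optional[str],
               header_identifier: Optional[str],
               prefer_header: bool = False) -> Optional[str]:
    """Pick the PID for a harvested dataset from the two possible sources."""
    if prefer_header:
        candidates = [header_identifier, metadata_pid]
    else:
        candidates = [metadata_pid, header_identifier]
    for candidate in candidates:
        if looks_like_pid(candidate):
            return candidate
    return None


# No PID in the oai_dc metadata, but the header identifier is a DOI.
print(choose_pid(None, "doi:10.1234/ABC"))
```

A header identifier such as `oai:example.org:123` does not match the pattern, so in this sketch it would not be used as a persistent id.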

Harvested Datasets Can Have Multiple "otherId" Values

When harvesting using the DDI format, datasets can now have multiple "otherId" values. See #10772.

Multiple Languages in Docker

Documentation has been added to explain how to set up multiple languages (e.g. English and French) in the tutorial for setting up Dataverse in Docker.

See the tutorial, #10939, and #10940.

GlobusBatchLookupSize

An optimization has been added for the Globus upload workflow, with a corresponding new database setting: :GlobusBatchLookupSize

See the Database Settings section of the guides, #10977, and #11040 for more information.

Bugs Fixed

Relation Type (Related Publication) and DataCite

The subfield "Relation Type" was added to the field "Related Publication" in Dataverse 6.4 (#10632) but couldn't be used without workarounds described in an announcement about the problem. The bug has been fixed and workarounds are no longer required. See #10926 and the announcement above.

Sort Order for Files

"Newest" and "Oldest" were reversed when sorting files on the dataset landing page. This has been fixed. See #10742 and #11000.

Guestbook Email Validation

In the Guestbook UI form, the email address is now checked for validity. See #10661 and #11022.

Updating Files Now Possible When Latest and Only Dataset Version is Deaccessioned

When a dataset was deaccessioned, and was the only previous version, it would cause an error when trying to update the files. This has been fixed. See #9351 and #10901.

My Data Filter by Username Feature Restored

The superuser-only feature of filtering by a username on the My Data page was not working. Entering a username in the "Results for Username" field now returns data for the desired user. See also #7239 and #10980.

Better Handling of Parallel Edit/Publish Errors

Improvements have been made in handling the errors when a dataset has been edited in one browser window and an attempt is made to edit or publish it in another. (This practice is discouraged, by the way.) See #10793 and #10794.

Facets Filter Labels Now Translated Above Search Results

On the main page, it's possible to filter results using search facets. If internationalization (i18n) has been enabled in the Dataverse installation, allowing pages to be displayed in several languages, the facets were correctly translated in the filter column at the left. However, they were not being translated above the search results, remaining in the default language, English. This has been fixed. See #9408 and #10158.

Unpublished File Bug Fix Related to Deaccessioning

A bug fix was made related to retrieval of the major version of a Dataset when all major versions were deaccessioned. This fixes the incorrect showing of the files as "Unpublished" in the search list even when they are published. In the upgrade instructions below, there is a step to reindex Solr. See also #10947 and #10974.

Minor DataCiteXML Fix (Useless Null)

A minor bug fix was made to avoid sending a useless ", null" in the DataCiteXML sent to DataCite and in the DataCite export when a dataset has a metadata entry for "Software Name" and no entry for "Software Version". The bug fix will update datasets upon publication. Existing published datasets with this problem can be fixed by pushing updated metadata to DataCite for the affected datasets and re-exporting the dataset metadata. See "Pushing updated metadata to DataCite" in the upgrade instructions below. See also #10919.

PIDs and Make Data Count Citation Retrieval

Make Data Count (MDC) citation retrieval with the PID settings has been fixed. PID parsing in Dataverse is now case insensitive, improving interaction with services that may change the case of PIDs. Warnings rel...


v6.4

30 Sep 16:30
906f874

Dataverse 6.4

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.4 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

New features in Dataverse 6.4:

  • Enhanced DataCite Metadata, including "Relation Type"
  • All ISO 639-3 languages are now supported
  • There is now a button for "Unlink Dataset"
  • Users will have DOIs/PIDs reserved for their files as part of file upload instead of at publication time
  • Datasets can now have types such as "software" or "workflow"
  • Croissant support
  • RO-Crate support
  • and more! Please see below.

New client library:

  • Rust

This release also fixes two important bugs described below and in a post on the mailing list:

  • "Update Current Version" can cause metadata loss
  • Publishing breaks designated dataset thumbnail, messes up collection page

Additional details on the above as well as many more features and bug fixes included in the release are described below. Read on!

Features Added

Enhanced DataCite Metadata, Including "Relation Type"

Within the "Related Publication" field, a new subfield has been added called "Relation Type" that allows for the most common values recommended by DataCite: IsCitedBy, Cites, IsSupplementTo, IsSupplementedBy, IsReferencedBy, and References. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed.

Dataverse now supports the DataCite v4.5 schema. Additional metadata is now being sent to DataCite including metadata about related publications and files in the dataset. Improved metadata is being sent including how PIDs (ORCID, ROR, DOIs, etc.), license/terms, geospatial, and other metadata are represented. The enhanced metadata will automatically be sent to DataCite when datasets are created and published. Additionally, after publication, you can inspect what was sent by looking at the DataCite XML export.

The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema. For details see #10632, #10615 and the design document referenced there.

Multiple backward incompatible changes and bug fixes have been made to API calls (three of four of which were not documented) related to updating PID target URLs and metadata at the provider service:

Full List of ISO 639-3 Languages Now Supported

The controlled vocabulary values list for the metadata field "Language" in the citation block has now been extended to include roughly 7920 ISO 639-3 values.

Some of the language entries in the pre-6.4 list correspond to "macro languages" in ISO-639-3 and admins/users may wish to update to use the corresponding individual language entries from ISO-639-3. As these cases are expected to be rare (they do not involve major world languages), finding them is not covered in the release notes. Anyone who desires help in this area is encouraged to reach out to the Dataverse community via any of the standard communication channels.

ISO 639-3 codes were downloaded from sil.org and the file used for merging with the existing citation.tsv was "iso-639-3.tab". See also #8578 and #10762.

Unlink Dataset Button

A new "Unlink Dataset" button has been added to the dataset page to allow a user to unlink a dataset from a collection. To unlink a dataset the user must have permission to link the dataset. Additionally, the existing API for unlinking datasets has been updated to no longer require superuser access as the "Publish Dataset" permission is now enough. See also #10583 and #10689.

Pre-Publish File DOI Reservation

Dataverse installations using DataCite as a persistent identifier (PID) provider (or other providers that support reserving PIDs) will be able to reserve PIDs for files when they are uploaded (rather than at publication time). Note that reserving file DOIs can slow uploads with large numbers of files so administrators may need to adjust timeouts (specifically any Apache "ProxyPass / ajp://localhost:8009/ timeout=" setting in the recommended Dataverse configuration). See also #7334.

Initial Support for Dataset Types

Out of the box, all datasets now have the type "dataset" but superusers can add additional types. At this time the type of a dataset can only be set at creation time via API. The types "dataset", "software", and "workflow" (just those three, for now) will be sent to DataCite (as resourceTypeGeneral) when the dataset is published.

For details see the guides, #10517 and #10694. Please note that this feature is highly experimental and is expected to evolve.

Croissant Support (Metadata Export)

A new metadata export format called Croissant is now available as an external metadata exporter. It is oriented toward making datasets consumable by machine learning.

For more about the Croissant exporter, including installation instructions, see https://github.com/gdcc/exporter-croissant. See also #10341, #10533, and discussion on the mailing list.

Please note: the Croissant exporter works best with Dataverse 6.2 and higher (where it updates the content of <head> as described in the guides) but can be used with 6.0 and higher (to get the export functionality).

RO-Crate Support (Metadata Export)

Dataverse now supports RO-Crate as a metadata export format. This functionality is not available out of the box, but you can enable one or more RO-Crate exporters from the list of external exporters. See also #10744 and #10796.

Rust API Client Library

A Dataverse API client library for the Rust programming language is now available at https://github.com/gdcc/rust-dataverse and has been added to the list of client libraries in the API Guide. See also #10758.

Collection Thumbnail Logo for Featured Collections

Collections can now have a thumbnail logo that is displayed when the collection is configured as a featured collection. If present, this thumbnail logo is shown. Otherwise, the collection logo is shown. Configuration is done under the "Theme" for a collection as explained in the guides. See also #10291 and #10433.

Saved Searches Can Be Deleted

Saved searches can now be deleted via API. See the Saved Search section of the API Guide, #9317 and #10198.

Notification Email Improvement

When notification emails are sent the part of the closing that says "contact us for support at" will now show the support email address (dataverse.mail.support-email), when configured, instead of the default system email address. Using the system email address here was particularly problematic when it was a "noreply" address. See also #10287 and #10504.

Ability to Disable Automatic Thumbnail Selection

It is now possible to turn off the feature that automatically selects one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by enabling the feature flag dataverse.feature.disable-dataset-thumbnail-autoselect. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image. See also #10820.

More Flexible PermaLinks

The configuration setting dataverse.pid.*.permalink.base-url, which is used for PermaLinks, has been updated to support greater flexibility. Previously, the string /citation?persistentId= was automatically appended to the configured base URL. With this update, the base URL will now be used exactly as configured, without any automatic additions. See also #10775.
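The difference between the old and new behavior can be sketched in Python as follows. This is only an illustration: the two function names are hypothetical, and the assumption that the identifier is concatenated directly onto the configured base URL is ours, not something stated above.

```python
# Suffix that was appended automatically before this change.
LEGACY_SUFFIX = "/citation?persistentId="


def permalink_before_6_4(base_url: str, pid: str) -> str:
    """Old behavior: the suffix is added whether or not it is wanted."""
    return base_url + LEGACY_SUFFIX + pid


def permalink_from_6_4(base_url: str, pid: str) -> str:
    """New behavior (as assumed here): the base URL is used exactly as configured."""
    return base_url + pid


# Hypothetical installation URL and identifier.
old_style = permalink_before_6_4("https://demo.example.org", "perma:ABC123")
new_style = permalink_from_6_4("https://demo.example.org/citation?persistentId=", "perma:ABC123")
print(old_style == new_style)
```

Under these assumptions, an installation that wants to keep its existing link format would include the former suffix in the configured base URL itself.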

Globus Asy...


v6.3

03 Jul 14:54
8c99a74

Dataverse 6.3

Summary

  • New Contributor Guide. The UX Working Group released a new Dataverse Contributor Guide.
  • Search Performance Improvements. Solr indexing and searching were improved, speeding up performance. Larger installations take note.
  • Dataverse Now Supports File-level Retention Periods. See the Retention Periods section of the guide for details.
  • API Optimizations for Large Datasets. Search API and permission checking have been improved for datasets with thousands of files.
  • Improved Controlled Vocabulary Support. Improvements include updates to the citation metadata block's Language field and multiple extensions added to the external vocabulary mechanism.
  • Improved Detection of RO-Crate Files. Dataverse now detects mime-types based on filename extensions and detects RO-Crate metadata files.
  • Sitemap Now Supports More Than 50K Items. Dataverse can now handle more than 50,000 items when generating sitemap files. For details, see the sitemap section of the Installation Guide.
  • Infrastructure Updates. Payara and Solr have been updated.

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.3 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.


Release Highlights

Solr Search and Indexing Improvements

Multiple improvements have been made to the way Solr indexing and searching is done. Response times should be significantly improved.

  • Two experimental feature flags called "add-publicobject-solr-field" and "avoid-expensive-solr-join" have been added to change how Solr documents are indexed for public objects and how Solr queries are constructed to accommodate access to restricted content (drafts, etc.). It is hoped that they will help with performance, especially on large instances and under load.

  • Before the search feature flag ("avoid-expensive...") can be turned on, the indexing flag must be enabled, and a full reindex performed. Otherwise publicly available objects are NOT going to be shown in search results.

  • A feature flag called "reduce-solr-deletes" has been added to improve how datafiles are indexed. When the flag is enabled, Dataverse will avoid pre-emptively deleting existing Solr documents for the files prior to sending updated information. This should improve performance and will allow additional optimizations going forward.

  • The /api/admin/index/status and /api/admin/index/clear-orphans calls
    (see https://guides.dataverse.org/en/latest/admin/solr-search-index.html#index-and-database-consistency)
    will now find and remove (respectively) additional permissions related Solr documents that were not being detected before.
    Reducing the overall number of documents will improve Solr performance and large sites may wish to periodically call the "clear-orphans" API.

  • Dataverse now relies on the autoCommit and autoSoftCommit settings in the Solr configuration instead of explicitly committing documents to the Solr index. This improves indexing speed.

See also #10554, #10654, and #10579.
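The ordering constraint between the two experimental flags can be expressed as a small pre-flight check. The Python sketch below is illustrative only: the function name and the idea of passing the enabled flags as a set are hypothetical, and it does not query a real installation.

```python
# Flag names as given in the release notes above.
INDEXING_FLAG = "add-publicobject-solr-field"
SEARCH_FLAG = "avoid-expensive-solr-join"


def search_flag_is_safe(enabled_flags: set[str], full_reindex_done: bool) -> bool:
    """Return True if the current flag combination will not hide public objects.

    The search flag may only be on when the indexing flag is on AND a full
    reindex has been performed since the indexing flag was enabled.
    """
    if SEARCH_FLAG not in enabled_flags:
        return True  # nothing to check; the search flag is off
    return INDEXING_FLAG in enabled_flags and full_reindex_done


# Turning on the search flag without the indexing flag is unsafe.
print(search_flag_is_safe({SEARCH_FLAG}, full_reindex_done=False))
```

An administrator might run a check of this kind before enabling "avoid-expensive-solr-join" to confirm that the indexing flag and the full reindex are already in place.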

File Retention Period

Dataverse now supports file-level retention periods. The ability to set retention periods, with a minimum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the Retention Periods section of the User Guide.

  • Users can configure a specific retention period, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the "Retention Period" menu item and entering information in a popup dialog. Retention periods can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.

  • After the retention period expires, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). The file (landing) page and all the metadata remain available.
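The rules in the two bullets above can be summarized in a short Python sketch. It is a simplified model, not Dataverse code: the `Retention` class and both function names are hypothetical, and treating the end date itself as still inside the retention period is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Retention:
    end_date: date  # last day of the retention period (assumed inclusive)
    reason: str     # short reason entered by the user


def is_downloadable(retention: Optional[Retention], today: date) -> bool:
    """Files without a retention period, or still within it, can be downloaded."""
    if retention is None:
        return True
    return today <= retention.end_date


def can_change_retention(file_published: bool, is_installation_admin: bool) -> bool:
    """Before publication the period can be edited; afterwards only admins (via API)."""
    return (not file_published) or is_installation_admin


r = Retention(end_date=date(2030, 1, 1), reason="Hypothetical legal requirement")
print(is_downloadable(r, date(2029, 12, 31)))
print(is_downloadable(r, date(2030, 1, 2)))
```

Note that the sketch only covers preview and download access; the file page and metadata stay available regardless of the retention state.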


Features

Large Datasets Improvements

For scenarios involving API calls related to large datasets (numerous files, for example: ~10k) the following have been optimized:

  • The Search API endpoint.
  • The permission checking logic present in PermissionServiceBean.

See also #10415.

Improved Controlled Vocabulary for Citation Block

The Controlled Vocabulary Values list for the "Language" metadata field in the citation block has been improved, with some missing two- and three-letter ISO 639 codes added, as well as more alternative names for some of the languages, making all these extra language identifiers importable. See also #8243.

Updates on Support for External Vocabulary Services

Multiple extensions of the external vocabulary mechanism have been added. These extensions allow interaction with services based on the Ontoportal software and are expected to be generally useful for other service types.

These changes include:

  • Improved Indexing with Compound Fields: When using an external vocabulary service with compound fields, you can now specify which field(s) will include additional indexed information, such as translations of an entry into other languages. This is done by adding the indexIn in retrieval-filtering. See also #10505 and GDCC/dataverse-external-vocab-support documentation.

  • Broader Support for Indexing Service Responses: Indexing of the results from retrieval-filtering responses can now handle additional formats including JSON arrays of strings and values from arbitrary keys within a JSON Object. See #10505.

  • HTTP Headers: You are now able to add HTTP request headers required by the service you are implementing. See #10331.

  • Flexible params in retrievalUri: You can now use managed-fields field names as well as the term-uri-field field name as parameters in the retrieval-uri when configuring an external vocabulary service. {0} as an alternative to using the term-uri-field name is still supported for backward compatibility. Also you can specify if the value must be url encoded with encodeUrl:. See #10404.

    For example: "retrieval-uri": "https://data.agroportal.lirmm.fr/ontologies/{keywordVocabulary}/classes/{encodeUrl:keywordermURL}"

  • Hidden HTML Fields: External controlled vocabulary scripts, configured via :CVocConf, can now access the values of managed fields as well as the term-uri-field for use in constructing the metadata view for a dataset. These values are now added as hidden elements in the HTML and can be found with the HTML attribute data-cvoc-metadata-name. See also #10503.

A Contributor Guide is now available

A new Contributor Guide has been added by the UX Working Group (#10531 and #10532).

URL Validation Is More Permissive

URL validation now allows two slashes in the path component of the URL.
Among other things, this allows metadata fields of url type to be filled with more complex url such as https://archive.softwareheritage.org/browse/directory/561bfe6698ca9e58b552b4eb4e56132cac41c6f9/?origin_url=https://github.com/gem-pasteur/macsyfinder&revision=868637fce184865d8e0436338af66a2648e8f6e1&snapshot=1bde3cb370766b10132c4e004c7cb377979928d1

See also #9750 and #9739

Improved Detection of RO-Crate Files

Dataverse can now detect mime-types based on a full filename (including its extension), and it now detects RO-Crate metadata files.

From now on, filenames with extensions can be added into MimeTypeDetectionByFileName.properties file. Filenames added there will take precedence over simply recognizing files by extensions. For example, two new filenames are added into that file:

ro-crate-metadata.json=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"
ro-crate-metadata.jsonld=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"

Therefore, files named ro-crate-metadata.json will now be detected as RO-Crate metadata files instead of as generic JSON files.
For more information on the RO-Crate specifications, see https://www.researchobject.org/ro-crate
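The precedence rule described above (a full filename match wins over an extension match) can be sketched in Python. This is a simplified stand-in for the properties-file lookup, not the actual Dataverse code: the two dictionaries, the fallback type, and the assumption that filename matching is exact and case-sensitive are all illustrative.

```python
import os

RO_CRATE_TYPE = (
    'application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened '
    'http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"'
)

# Stand-in for MimeTypeDetectionByFileName.properties (the two entries quoted above).
BY_FILENAME = {
    "ro-crate-metadata.json": RO_CRATE_TYPE,
    "ro-crate-metadata.jsonld": RO_CRATE_TYPE,
}

# Hypothetical extension table used only for this example.
BY_EXTENSION = {
    "json": "application/json",
    "jsonld": "application/ld+json",
}


def detect_mime_type(path: str) -> str:
    """Check the full filename first, then fall back to the extension."""
    name = os.path.basename(path)
    if name in BY_FILENAME:
        return BY_FILENAME[name]
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return BY_EXTENSION.get(extension, "application/octet-stream")


print(detect_mime_type("data/ro-crate-metadata.json"))
print(detect_mime_type("data/results.json"))
```

With this ordering, an ordinary `.json` file is still treated as generic JSON, while the two RO-Crate filenames get the more specific type.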

See also #10015....


v6.2

02 Apr 15:02
a218417

Dataverse 6.2

Please note: As of 2024-05-16, the "dvinstall.zip" file below has been updated (PR #10563) to fix a bug (#10557) in the as-install.sh script as detailed in an announcement.

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.2 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to the Dataverse software.
Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.


💡Release Highlights

Search and Facet by License

Licenses have been added to the search facets in the search side panel to filter datasets by license (e.g. CC0).

Datasets with Custom Terms are aggregated under the "Custom Terms" value of this facet. See the Licensing section of the guide for more details on configured Licenses and Custom Terms.

For more information, see #9060.

Licenses can also be used to filter the Search API results using the fq parameter, for example: /api/search?q=*&fq=license%3A%22CC0+1.0%22 for CC0 1.0. See the Search API guide for more examples.

For more information, see #10204.
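The URL-encoded fq value in the example above can be produced programmatically. The following Python sketch uses only the standard library to build the same query string; the helper name `license_search_path` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def license_search_path(license_name: str, query: str = "*") -> str:
    """Build a Search API path that filters results by license name."""
    params = {
        "q": query,
        # The filter query wraps the license name in double quotes.
        "fq": f'license:"{license_name}"',
    }
    # safe="*" keeps the wildcard readable instead of encoding it as %2A.
    return "/api/search?" + urlencode(params, safe="*")


print(license_search_path("CC0 1.0"))
```

The resulting path would be appended to the base URL of an installation; quoting the license name matters because names such as "CC0 1.0" contain spaces.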

When Returning Datasets to Authors, Reviewers Can Add a Note to the Author

The popup for returning a dataset to its author now lets the reviewer type in a message explaining the reasons for the return and any edits needed. The message is sent by email to the author.

Please note that this note is mandatory, but that you can still type a creative and meaningful comment such as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation.

For more information, see #10137.

Support for Using Multiple PID Providers

This release adds support for using multiple PID (DOI, Handle, PermaLink) providers, multiple PID provider accounts
(managing a given protocol, authority, separator, shoulder combination), assigning PID provider accounts to specific collections,
and supporting transferred PIDs (where a PID is managed by an account when its authority, separator, and/or shoulder don't match
the combination where the account can mint new PIDs). It also adds the ability for additional provider services beyond the existing
DataCite, EZId, Handle, and PermaLink providers to be dynamically added as separate jar files.

These changes require per-provider settings rather than the global PID settings previously supported. While backward compatibility
for installations using a single PID Provider account is provided, updating to use the new microprofile settings is highly recommended
and will be required in a future version.

For more information check the PID settings on this link.

New microprofile settings

Rate Limiting

The option to rate limit has been added to prevent users from overtaxing the system either deliberately or by runaway automated processes.
Rate limiting can be configured on a tier level with tier 0 being reserved for guest users and tiers 1-any for authenticated users.
Superuser accounts are exempt from rate limiting.

Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database.
Two database settings configure rate limiting: :RateLimitingDefaultCapacityTiers and :RateLimitingCapacityByTierAndAction. If either of these settings exists in the database, rate limiting is enabled; if neither setting exists, rate limiting is disabled.

For more details check the detailed guide on this link.

Simplified SMTP Configuration

With this release, we deprecate the usage of asadmin create-javamail-resource to configure Dataverse to send mail using your SMTP server and provide a simplified, standard alternative using JVM options or MicroProfile Config.

At this point, no action is required if you want to keep your current configuration.
Warnings will show in your server logs to inform and remind you about the deprecation.
A future major release of Dataverse may remove this way of configuration.

Please do take the opportunity to update your SMTP configuration. Details can be found in the Installation Guide, starting with the SMTP/Email Configuration section.

Once reconfiguration is complete, you should remove legacy, unused config. First, run asadmin delete-javamail-resource mail/notifyMailSession as described in the 6.2 guides. Then run curl -X DELETE http://localhost:8080/api/admin/settings/:SystemEmail as this database setting has been replaced with dataverse.mail.system-email as described below.

Please note: as there have been problems with email delivered to SPAM folders when the "From" within mail envelope and the mail session configuration didn't match (#4210), as of this version the sole source for the "From" address is the setting dataverse.mail.system-email once you migrate to the new way of configuration.

New SMTP settings:

Binder Redirect

If your installation is configured to use Binder, you should remove the old "girder_ythub" tool and replace it with the tool described at https://github.com/IQSS/dataverse-binder-redirect

For more information, see #10360.

Optional Croissant 🥐 Exporter Support

When a Dataverse installation is configured to use a metadata exporter for the Croissant format, the content of the JSON-LD in the <head> of dataset landing pages will be replaced with that format. However, both JSON-LD and Croissant will still be available for download from the dataset page and API.

For more information, see #10382.

Harvesting Handle Missing Controlled Values

Datasets can now be harvested with Controlled Vocabulary Values that existed in the originating Dataverse installation but are not in the harvesting Dataverse installation. For more information, view the changes to the endpoint here.

Add .QPJ and .QMD Extensions to Shapefile Handling

Support for .qpj and .qmd files in shapefile uploads has been introduced, ensuring that these files are properly recognized and handled as part of geospatial datasets in Dataverse.

For more information, see #10305.

Ingested Tabular Data Files Can Be Stored With the Variable Name Header

Tabular Data Ingest can now save the generated archival files with the list of variable names added as the first tab-delimited line.

Access API will be able to take advantage of Direct Download for .tab files saved with these headers on S3 - since they no longer have to be generated and added to the streamed content on the fly.

This behavior is controlled by the new setting :StoreIngestedTabularFilesWithVarHeaders. It is false by default, preserving the legacy behavior. When enabled, Dataverse will be able to handle both the newly ingested files, and any already-existing legacy files stored without these headers transparently to the user. E.g. the access API will continue delivering tab-delimited files with this header line, whether it needs to add it dynamically for the legacy files, or reading complete files directly from storage for the ones stored with it.

We are planning to add an API for converting existing legacy tabular files in a future release.

For more information, see #10282.

Uningest/Reingest Options Available in the File Page Edit Menu

New Uningest/Reingest options are available in the File Page Edit menu. Ingest errors can be cleared by users who can publish the associated dataset and by superusers, allowing for a successful ingest to be undone or retried (e.g. after a Dataverse version update or if ingest size limits are changed).

The /api/files/{id}/uningest API also now allows users who can publish the dataset to undo an ingest failure.
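A call to the uningest API might look like the sketch below. The server URL, API token, and file ID are placeholders, and `uningest_url` and `DRY_RUN` are hypothetical names used only for this example. It assumes the endpoint is called with POST and the usual `X-Dataverse-key` header, so check the API Guide before relying on it.

```shell
#!/usr/bin/env bash
# Placeholders: replace with values from your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"
FILE_ID="${FILE_ID:-42}"

# Hypothetical helper: builds the uningest URL for a file database ID.
uningest_url() {
  echo "$SERVER_URL/api/files/$1/uningest"
}

if [ "${DRY_RUN:-1}" = "0" ]; then
  # The token must belong to a superuser or a user who can publish the dataset.
  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$(uningest_url "$FILE_ID")"
else
  echo "POST $(uningest_url "$FILE_ID")"
fi
```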

For more information, see #10319.

Sphinx Guides Now Support Markdown Format and Tabs

Our guides now support the Markdown format with the extension .md. Additionally, an option to create tabs in the guides using Sphinx Tabs has been added. (You can see the tabs in action in the "dev usage" page of the Container Guide.) To continue building the guides, you will need to install this new dependency by re-running:

pip install -r requirements.txt

For more information, see #10111.

Number of Concurrent Indexing Operations Now Configurable

A new MicroProfile setting called `dataverse.so...


v6.1

12 Dec 23:27
1f9e10c

Dataverse 6.1

Please see Dataverse 6.1 deployment challenges for information about a patch that fixes some issues in this release.

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.1 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to the Dataverse software.
Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release highlights

Guestbook at request

Dataverse can now be configured (via the dataverse.files.guestbook-at-request option) to display any configured guestbook to users when they request restricted files (new functionality) or when they download files (previous behavior).

The global default defined by this setting can be overridden at the collection level on the collection page and at the individual dataset level by a superuser using the API. The default, showing guestbooks when files are downloaded, remains as it was in prior Dataverse versions.
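The global default could be set as a JVM option, sketched below. This assumes the option takes `true` (show the guestbook at request time) or `false` (show it at download time), and that Payara 6 lives in /usr/local/payara6 and runs as the `dataverse` user. The `jvm_option` helper and `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: default Payara 6 location; adjust as needed.
PAYARA="${PAYARA:-/usr/local/payara6}"

# Hypothetical helper: formats a name/value pair as a -D JVM option.
jvm_option() {
  echo "-D$1=$2"
}

opt="$(jvm_option dataverse.files.guestbook-at-request true)"
if [ "${DRY_RUN:-1}" = "0" ]; then
  sudo -u dataverse "$PAYARA/bin/asadmin" create-jvm-options "$opt"
else
  echo "$PAYARA/bin/asadmin create-jvm-options $opt"
fi
```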

For details, see dataverse.files.guestbook-at-request and PR #9599.

Collection-level storage quotas

This release adds support for defining storage size quotas for collections. Please see the API guide for details. This is an experimental feature that has not yet been used in production on any real life Dataverse instance, but we are planning to try it out at Harvard/IQSS.

Please note that this release includes a database update (via a Flyway script) that will calculate the storage sizes of all the existing datasets and collections on the first deployment. On a large production database with tens of thousands of datasets this may add a couple of extra minutes to the first, initial deployment of Dataverse 6.1.

For details, see Storage Quotas for Collections in the Admin Guide.

Globus support (experimental), continued

Globus support in Dataverse has been expanded to include support for using file-based Globus endpoints, including the case where files are stored on tape and are not immediately accessible, and the case of referencing files stored on remote Globus endpoints. Support for using the Globus S3 Connector with an S3 store has been retained but requires changes to the Dataverse configuration. Please note:

  • Globus functionality remains experimental/advanced in that it requires significant setup, differs in multiple ways from other file storage mechanisms, and may continue to evolve with the potential for backward incompatibilities.
  • The functionality is configured per store and replaces the previous single-S3-Connector-per-Dataverse-instance model.
  • Adding files to a dataset and accessing files are supported via the Dataverse user interface through a separate dataverse-globus app.
  • The functionality is also accessible via APIs (combining calls to the Dataverse and Globus APIs).

Backward incompatibilities:

  • The configuration for use of a Globus S3 Connector has changed and is aligned with the standard store configuration mechanism.
  • The new functionality is incompatible with older versions of the dataverse-globus app, and the Globus-related functionality in the UI will only function correctly if a Dataverse 6.1 compatible version of the dataverse-globus app is configured.

New JVM options:

  • A new "globus" store type and associated store-related options have been added. These are described in the File Storage section of the Installation Guide.
  • dataverse.files.globus-cache-maxage - specifies the number of minutes Dataverse will wait between an initial request for a file transfer and when that transfer must begin.

Obsolete Settings: the :GlobusBasicToken, :GlobusEndpoint, and :GlobusStores settings are no longer used.

Further details can be found in the Big Data Support section of the Developer Guide.

Alternative Title now allows multiple values

Alternative Title now allows multiple values. Note that JSON used to create a dataset with an Alternative Title must be changed. See "Backward incompatibilities" below and PR #9440 for details.

External tools: configure tools now available at the dataset level

Read/write "configure" tools (a type of external tool) are now available at the dataset level. They appear under the "Edit Dataset" menu. See External Tools in the Admin Guide and PR #9925.

S3 out-of-band upload

In some situations, direct upload might not work from the UI, e.g., when S3 storage is not accessible from the internet. This release adds an option to allow direct uploads via API only. This way, a third-party application can use direct upload from within the internal network, while there is no direct download available to the users via UI.
By default, Dataverse supports uploading files via the add a file to a dataset API. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).
With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the Adding the Uploaded file to the Dataset API call (described in the Direct DataFile Upload/Replace API page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.

JSON Schema for datasets

Functionality has been added to help validate dataset JSON prior to dataset creation. There are two new API endpoints in this release. The first takes in a collection alias and returns a custom dataset schema based on the required fields of the collection. The second takes in a collection alias and a dataset JSON file and does an automated validation of the JSON file against the custom schema for the collection. In this release functionality is limited to JSON format validation and validating required elements. Future releases will address field types, controlled vocabulary, etc. See Retrieve a Dataset JSON Schema for a Collection in the API Guide and PR #10109.
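The two calls might be sketched as follows. The endpoint paths (`datasetSchema` and `validateDatasetJson` under `/api/dataverses/{alias}`) reflect my reading of the API Guide and should be verified there. The collection alias, token, file names, helper functions, and `DRY_RUN` switch are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Placeholders: replace with values from your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"
ALIAS="${ALIAS:-root}"

# Hypothetical helpers: build the two endpoint URLs for a collection alias.
schema_url()   { echo "$SERVER_URL/api/dataverses/$1/datasetSchema"; }
validate_url() { echo "$SERVER_URL/api/dataverses/$1/validateDatasetJson"; }

if [ "${DRY_RUN:-1}" = "0" ]; then
  # 1. Retrieve the collection's custom dataset schema.
  curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$(schema_url "$ALIAS")" -o dataset-schema.json
  # 2. Validate a local dataset JSON file against that schema.
  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$(validate_url "$ALIAS")" \
       -H 'Content-type:application/json' --upload-file dataset.json
else
  echo "GET  $(schema_url "$ALIAS")"
  echo "POST $(validate_url "$ALIAS")"
fi
```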

OpenID Connect (OIDC) improvements

Using MicroProfile Config for provisioning

With this release it is possible to provision a single OIDC-based authentication provider by using MicroProfile Config instead of or in addition to the classic Admin API provisioning.

If you are using an external OIDC provider component as an identity management system and/or broker to other authentication providers such as Google, eduGain SAML and so on, this might make your life easier during instance setups and reconfiguration. You no longer need to generate the necessary JSON file.

Adding PKCE Support

Some OIDC providers require using PKCE as additional security layer. As of this version, you can enable support for this on any OIDC provider you configure. (Note that OAuth2 providers have not been upgraded.)

For both features, see the OIDC section of the Installation Guide and PR #9273.

Solr improvements

As of this release, application-side support has been added for the "circuit breaker" mechanism in Solr that makes it drop requests more gracefully when the search engine is experiencing load issues.

Please see the Installing Solr section of the Installation Guide.

New release of Dataverse Previewers (including a Markdown previewer)

Version 1.4 of the standard Dataverse Previewers from https://github.com/gdcc/dataverse-previewers is available. The new version supports the use of signedUrls rather than API keys when previewing restricted files (including files in draft dataset versions). Upgrading is highly recommended. Please note:

  • SignedUrls can now be used with PrivateUrl access tokens, which allows PrivateUrl users to view previewers that are configured to use SignedUrls. See #10093.
  • Launching a dataset-level configuration tool will automatically generate an API token when needed. This is consistent with how other types of tools work. See #10045.
  • There is now a Markdown (.md) previewer.

New or improved APIs

The development of a new UI for Dataverse is driving the addition or improvement of many APIs.

New API endpoints

  • deaccessionDataset (/api/datasets/{id}/versions/{versionId}/deaccession): version deaccessioning through API (Given a dataset and a version).
  • /api/files/{id}/downloadCount
  • /api/files/{id}/dataTables
  • /api/files/{id}/metadata/tabularTags New endpoint to set tabular file tags.
  • canManageFilePermissions (/access/datafile/{id}/userPermissions) Added for getting user permissions on a file.
  • getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Giv...

v6.0

08 Sep 17:47
5f2413b

Dataverse 6.0

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.0 rather than the list of releases, which will cut them off.

This is a platform upgrade release. Payara, Solr, and Java have been upgraded. No features have been added to the Dataverse software itself. Only a handful of bugs were fixed.

Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights (Major Upgrades, Breaking Changes)

This release contains major upgrades to core components. Detailed upgrade instructions can be found below.

Runtime

  • The required Java version has been increased from version 11 to 17.
    • See PR #9764 for details.
  • Payara application server has been upgraded to version 6.2023.8.
    • This is a required update.
    • Please note that Payara Community 5 has reached end of life.
    • See PR #9685 and PR #9795 for details.
  • Solr has been upgraded to version 9.3.0.
    • See PR #9787 for details.
  • PostgreSQL 13 remains the tested and supported version.
    • See the PostgreSQL section of the Installation Guide for details.

Development

  • Removal of Vagrant and Docker All In One (docker-aio), deprecated in Dataverse v5.14. See PR #9838 and PR #9685 for details.
  • All tests have been migrated to use JUnit 5 exclusively from now on. See PR #9796 for details.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at [email protected] to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 5.14.

Upgrade from Java 11 to Java 17

Java 17 is now required for Dataverse. Solr can run under Java 11 or Java 17 but the latter is recommended. In preparation for the Java upgrade, stop both Dataverse/Payara and Solr.

  1. Undeploy Dataverse, if deployed, using the unprivileged service account.

    sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications

    sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14

  2. Stop Payara 5.

    sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain

  3. Stop Solr 8.

    sudo systemctl stop solr.service

  4. Install Java 17.

    Assuming you are using RHEL or a derivative such as Rocky Linux:

    sudo yum install java-17-openjdk

  5. Set Java 17 as the default.

    Assuming you are using RHEL or a derivative such as Rocky Linux:

    sudo alternatives --config java

  6. Test that Java 17 is the default.

    java -version
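The final check can also be scripted. The sketch below parses the first line of `java -version` output and reports whether the default is Java 17. The `java_major` helper is a hypothetical name, and the parsing assumes the usual `version "x.y.z"` format printed by OpenJDK builds.

```shell
#!/usr/bin/env bash
# Extract the major version from `java -version` output, e.g.
# 'openjdk version "17.0.8" 2023-07-18' gives 17, 'java version "1.8.0_382"' gives 8.
java_major() {
  local v
  v="$(printf '%s\n' "$1" | sed -n '1s/.*version "\([^"]*\)".*/\1/p')"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy 1.x numbering: 1.8.0_382 becomes 8.0_382
  esac
  echo "${v%%[._-]*}"     # keep only the leading major number
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, so redirect it.
  major="$(java_major "$(java -version 2>&1)")"
  if [ "$major" = "17" ]; then
    echo "Java 17 is the default"
  else
    echo "Default Java is '${major:-unknown}', expected 17" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```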

Upgrade from Payara 5 to Payara 6

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

  1. Download Payara 6.2023.8.

    curl -L -O https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2023.8/payara-6.2023.8.zip

  2. Unzip it to /usr/local (or your preferred location).

    sudo unzip payara-6.2023.8.zip -d /usr/local/

  3. Change ownership of the unzipped Payara to your "service" user ("dataverse" by default).

    sudo chown -R dataverse /usr/local/payara6

  4. Undeploy Dataverse, if deployed, using the unprivileged service account.

    sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications

    sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14

  5. Stop Payara 5, if running.

    sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain

  6. Copy Dataverse-related lines from Payara 5 to Payara 6 domain.xml.

    sudo -u dataverse cp /usr/local/payara6/glassfish/domains/domain1/config/domain.xml /usr/local/payara6/glassfish/domains/domain1/config/domain.xml.orig

    sudo egrep 'dataverse|doi' /usr/local/payara5/glassfish/domains/domain1/config/domain.xml > lines.txt

    sudo vi /usr/local/payara6/glassfish/domains/domain1/config/domain.xml

    If any JVM options reference the old payara5 path (/usr/local/payara5) be sure to change it to payara6.

    The lines will appear in two sections, examples shown below (but your content will vary).

    Section 1: system properties (under <server name="server" config-ref="server-config">)

    <system-property name="dataverse.db.user" value="dvnuser"></system-property>
    <system-property name="dataverse.db.host" value="localhost"></system-property>
    <system-property name="dataverse.db.port" value="5432"></system-property>
    <system-property name="dataverse.db.name" value="dvndb"></system-property>
    <system-property name="dataverse.db.password" value="dvnsecret"></system-property>
    

    Note: if you used the Dataverse installer, you won't have a dataverse.db.password property. See "Create password aliases" below.

    Section 2: JVM options (under <java-config classpath-suffix="" debug-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9009" system-classpath="">, the one under <config name="server-config">, not under <config name="default-config">)

    <jvm-options>-Ddataverse.files.directory=/usr/local/dvn/data</jvm-options>
    <jvm-options>-Ddataverse.files.file.type=file</jvm-options>
    <jvm-options>-Ddataverse.files.file.label=file</jvm-options>
    <jvm-options>-Ddataverse.files.file.directory=/usr/local/dvn/data</jvm-options>
    <jvm-options>-Ddataverse.rserve.host=localhost</jvm-options>
    <jvm-options>-Ddataverse.rserve.port=6311</jvm-options>
    <jvm-options>-Ddataverse.rserve.user=rserve</jvm-options>
    <jvm-options>-Ddataverse.rserve.password=rserve</jvm-options>
    <jvm-options>-Ddataverse.auth.password-reset-timeout-in-minutes=60</jvm-options>
    <jvm-options>-Ddataverse.timerServer=true</jvm-options>
    <jvm-options>-Ddataverse.fqdn=dev1.dataverse.org</jvm-options>
    <jvm-options>-Ddataverse.siteUrl=https://dev1.dataverse.org</jvm-options>
    <jvm-options>-Ddataverse.files.storage-driver-id=file</jvm-options>
    <jvm-options>-Ddoi.username=testaccount</jvm-options>
    <jvm-options>-Ddoi.password=notmypassword</jvm-options>
    <jvm-options>-Ddoi.baseurlstring=https://mds.test.datacite.org/</jvm-options>
    <jvm-options>-Ddoi.dataciterestapiurlstring=https://api.test.datacite.org</jvm-options>
    
  7. Check the Xmx setting in domain.xml.

    Under /usr/local/payara6/glassfish/domains/domain1/config/domain.xml, check the Xmx setting under <config name="server-config">, where you put the JVM options, not the one under <config name="default-config">. Note that there are two such settings, and you want to adjust the one in the stanza with Dataverse options. This sets the JVM heap size; a good rule of thumb is half of your system's total RAM. You may specify the value in MB (8192m) or GB (8g).

  8. Copy jhove.conf and jhoveConfig.xsd from Payara 5, edit and change payara5 to payara6.

    sudo cp /usr/local/payara5/glassfish/domains/domain1/config/jhove* /usr/local/payara6/glassfish/domains/domain1/config/

    sudo chown dataverse /usr/local/payara6/glassfish/domains/domain1/config/jhove*

    sudo -u dataverse vi /usr/local/payara6/glassfish/domains/domain1/config/jhove.conf

  9. Copy logos from Payara 5 to Payara 6.

    These logos are for collections (dataverses).

    sudo -u dataverse cp -r /usr/local/payara5/glassfish/domains/domain1/docroot/logos /usr/local/payara6/glassfish/domains/domain1/docroot

  10. If you are using Make Data Count (MDC), edit :MDCLogPath.

    Your :MDCLogPath database setting might be pointing to a Payara 5 directory such as /usr/local/payara5/glassfish/domains/domain1/logs. If so, edit this to be Payara 6. You'll probably want to copy your logs over as well.

  11. If you've enabled access logging or any other site-specific configuration, be sure to preserve them. For instance, the default domain.xml includes

         <http-service>
         <access-log></access-log>
    

    but you may wish to include

         <http-service access-logging-enabled="true">
         <access-log format="%client.name% %datetime% %request% %status% %response.length% %header.user-agent% %header.referer% %cookie.JSESSIONID% %header.x-forwarded-for%"></access-log>
    

    Be sure to keep a previous copy of your domain.xml for reference.

  12. Update systemd unit file (or other init system) from /usr/local/payara5 to /usr/local/payara6, if applicable.

  13. Start Payara.

    sudo -u dataverse /usr/local/payara6/bin/asadmin start-domain

  14. Create a Java mail resource, replacing "localhost" for mailhost with your mail relay server, and replacing "localhost" for fromaddress with the FQDN of your Dataverse server.

    `sudo -u dataverse /usr/local/payara6/bin/asadmin create-javamail-resource --mailhost "localhost" --mailuser "dataversenotify" --fromaddress "do-not-reply@l...


v5.14

04 Aug 20:35
9f4ddbb

Dataverse Software 5.14

(If this note appears truncated on the GitHub Releases page, you can view it in full in the source tree: https://github.com/IQSS/dataverse/blob/master/doc/release-notes/5.14-release-notes.md)

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that, as an experiment, the sections of this release note are organized in a different order. The Upgrade and Installation sections are at the top, with the detailed sections highlighting new features and fixes further down.

Installation

If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!

After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.

Upgrade Instructions

0. These instructions assume that you are upgrading from 5.13. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 5.14.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

  • $PAYARA/bin/asadmin undeploy dataverse-5.13

2. Stop Payara and remove the generated directory

  • service payara stop
  • rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

  • service payara start

4. Deploy this version.

  • $PAYARA/bin/asadmin deploy dataverse-5.14.war

5. Restart Payara

  • service payara stop
  • service payara start

6. Update the Citation metadata block: (the update makes the field Series repeatable)

  • wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.tsv
  • curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory; /home/dataverse/langBundles used in the example below.

  • wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.properties
  • cp citation.properties /home/dataverse/langBundles

7. Update Solr schema.xml to allow multiple series to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).

7a. For installations without custom or experimental metadata blocks:

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)

  • Replace schema.xml

    • cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
  • Start Solr instance (usually service solr start, depending on Solr/OS)

7b. For installations with custom or experimental metadata blocks:

  • Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)

  • There are 2 ways to regenerate the schema: Either by collecting the output of the Dataverse schema API and feeding it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):

	wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/8.11.1/update-fields.sh
	chmod +x update-fields.sh
	curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml

OR, alternatively, you can edit the following lines in your schema.xml by hand as follows (to indicate that series and its components are now multiValued="true"):

     <field name="series" type="string" stored="true" indexed="true" multiValued="true"/>
     <field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
     <field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
  • Restart Solr instance (usually service solr restart depending on solr/OS)
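Whichever route you take in step 7, a quick check like the sketch below can confirm that the three series fields ended up multiValued. The function name and the default schema path are assumptions for illustration; point `SCHEMA` at your own schema.xml.

```shell
#!/usr/bin/env bash
# Print the series-related fields that are not declared multiValued="true".
# A field that is missing from the file entirely is reported as well.
non_multivalued_series_fields() {
  local schema="$1" name
  for name in series seriesInformation seriesName; do
    if ! grep "<field name=\"$name\"" "$schema" | grep -q 'multiValued="true"'; then
      echo "$name"
    fi
  done
}

# Assumption: Solr 8.11.1 installed under /usr/local/solr.
SCHEMA="${SCHEMA:-/usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml}"
if [ -r "$SCHEMA" ]; then
  missing="$(non_multivalued_series_fields "$SCHEMA")"
  if [ -z "$missing" ]; then
    echo "All series fields are multiValued"
  else
    echo "Not multiValued (or missing): $missing" >&2
  fi
else
  echo "Cannot read $SCHEMA" >&2
fi
```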

8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.

9. If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:

  curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled

10. If your installation uses Handles as persistent identifiers (instead of DOIs): remember to upgrade your Handles service installation to a currently supported version.

Generally, Handles is known to work reliably even when running older versions that haven't been officially supported in years. We still recommend checking on your service and upgrading to a supported version (the latest version is 9.3.1, https://www.handle.net/hnr-source/handle-9.3.1-distribution.tar.gz, as of this writing). An older version may seem to be running just fine, but keep in mind that it may stop working unexpectedly at any moment because of some incompatibility introduced in a Java rpm upgrade, or anything similarly unpredictable.

Handles is also very good about backward compatibility. Meaning, in most cases you can simply stop the old version, unpack the new version from the distribution and start it on the existing config and database files, and it'll just keep working. However, it is a good idea to keep up with the recommended format upgrades, for the sake of efficiency and to avoid surprises, should they finally decide to drop the old database format, for example. The two specific things we recommend: 1) Make sure your service is using a json version of the siteinfo bundle (i.e., if you are still using siteinfo.bin, convert it to siteinfo.json and remove the binary file from the service directory) and 2) Make sure you are using the newer bdbje database format for your handles catalog (i.e., if you still have the files handles.jdb and nas.jdb in your server directory, convert them to the new format). Follow the simple conversion instructions in the file README.txt in the Handles software distribution. Make sure to stop the service before converting the files and make sure to have a full backup of the existing server directory, just in case. Do not hesitate to contact the Handles support with any questions you may have, as they are very responsive and helpful.
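To see whether either recommendation applies to you, a small check such as the sketch below lists the legacy files named above if they are still present. The function name and the default directory (/hs/svr_1) are assumptions; use your actual Handles server directory.

```shell
#!/usr/bin/env bash
# List legacy Handles files (old siteinfo bundle and old database format)
# that still exist in the given server directory.
legacy_handle_files() {
  local dir="$1" f
  for f in siteinfo.bin handles.jdb nas.jdb; do
    if [ -e "$dir/$f" ]; then
      echo "$f"
    fi
  done
}

# Assumption: hypothetical default location of the Handles server directory.
HANDLE_DIR="${HANDLE_DIR:-/hs/svr_1}"
if [ -d "$HANDLE_DIR" ]; then
  found="$(legacy_handle_files "$HANDLE_DIR")"
  if [ -n "$found" ]; then
    echo "Legacy files to convert (see README.txt in the Handles distribution):"
    echo "$found"
  else
    echo "No legacy Handles files found in $HANDLE_DIR"
  fi
else
  echo "Directory $HANDLE_DIR not found" >&2
fi
```

This only reports files. The conversion itself should follow the README.txt instructions, with the service stopped and a backup in place.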

New JVM Options and MicroProfile Config Options

The following PID provider options are now available. See the section "Changes to PID Provider JVM Settings" below for more information.

  • dataverse.pid.datacite.mds-api-url
  • dataverse.pid.datacite.rest-api-url
  • dataverse.pid.datacite.username
  • dataverse.pid.datacite.password
  • dataverse.pid.handlenet.key.path
  • dataverse.pid.handlenet.key.passphrase
  • dataverse.pid.handlenet.index
  • dataverse.pid.permalink.base-url
  • dataverse.pid.ezid.api-url
  • dataverse.pid.ezid.username
  • dataverse.pid.ezid.password

The following MicroProfile Config options have been added as part of Signposting support. See the section "Signposting for Dataverse" below for details.

  • dataverse.signposting.level1-author-limit
  • dataverse.signposting.level1-item-limit

The following JVM options are described in the "Creating datasets with incomplete metadata through API" section below.

  • dataverse.api.allow-incomplete-metadata
  • dataverse.ui.show-validity-filter
  • dataverse.ui.allow-review-for-incomplete

The following JVM/MicroProfile setting is for External Exporters. See "Mechanism Added for Adding External Exporters" below.

  • dataverse.spi.export.directory

The following JVM/MicroProfile settings are for handling of support emails. See "Contact Email Improvements" below.

  • dataverse.mail.support-email
  • dataverse.mail.cc-support-on-contact-emails

The following JVM/MicroProfile setting is for extracting a geospatial bounding box even if S3 direct upload is enabled.

  • dataverse.netcdf.geo-extract-s3-direct-upload

Backward Incompatibilities

The following list of potential backward incompatibilities references the sections of the "Detailed Release Highlights..." portion of the document further below where the corresponding changes are explained in detail.

Using the new External Exporters framework

Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the r...


v5.13

14 Feb 15:52
79d6e57

Dataverse Software 5.13

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Schema.org Improvements (Some Backward Incompatibility)

The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.

Please be advised that these improvements may break integrations that rely on the old, less compliant structure. For details see the "backward incompatibility" section below. (Issue #7349)

Folder Uploads via Web UI (dvwebloader, S3 only)

For installations using S3 for storage and with direct upload enabled, a new tool called DVWebloader can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similarly to how the DVUploader tool does it on the command line, but with the convenience of using the browser UI). See Folder Upload in the User Guide for details. (PR #9096)

Long Descriptions of Collections (Dataverses) are Now Truncated

Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)

License Sorting

Licenses shown in the dropdown in the UI can now be sorted by superusers. See the Sorting Licenses section of the Installation Guide for details. (PR #8697)

Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search

Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)

Support for NetCDF and HDF5 Files

NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and more modern NetCDF 4 files are detected based on content. Detection for older HDF4 files is only done through the file extension ".hdf", as before.

For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the dataverse-previewers repo.

An extractNcml API endpoint has been added, especially for installations with existing NetCDF and HDF5 files. After upgrading, they can iterate through these files and try to extract an NcML file.
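Iterating through existing files could look like the sketch below. It assumes the endpoint is `POST /api/files/{id}/extractNcml` with the usual `X-Dataverse-key` header, which should be confirmed in the API Guide. The file IDs, helper names, and `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholders: replace with values from your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"

# Hypothetical helper: builds the extractNcml URL for a file database ID.
extract_ncml_url() {
  echo "$SERVER_URL/api/files/$1/extractNcml"
}

# Request NcML extraction for each file ID passed as an argument.
extract_all() {
  local id
  for id in "$@"; do
    if [ "${DRY_RUN:-1}" = "0" ]; then
      curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$(extract_ncml_url "$id")"
    else
      echo "POST $(extract_ncml_url "$id")"
    fi
  done
}

# Hypothetical IDs of existing NetCDF/HDF5 files.
extract_all 101 102
```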

See the NetCDF and HDF5 section of the User Guide for details. (PR #9239)

Support for .eln Files (Electronic Laboratory Notebooks)

The .eln file format is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, etc.

Improved Security for External Tools

External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See Authorization Options in the External Tools documentation for details. (PR #9001)

Geospatial Search (API Only)

Geospatial search is supported via the Search API using two new parameters: geo_point and geo_radius.

The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
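
As a rough illustration, a Search API request using the two new parameters might be assembled as follows. The helper name and the demo host are hypothetical, and the latitude,longitude ordering of geo_point and the kilometer unit of geo_radius are assumptions that should be checked against the Search API section of the API Guide.

```python
from urllib.parse import urlencode


def geo_search_url(base_url, query, lat, lon, radius_km):
    """Build a Search API URL with geo_point and geo_radius (hypothetical helper)."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    params = {
        "q": query,
        "geo_point": f"{lat},{lon}",  # assumed order: latitude,longitude
        "geo_radius": radius_km,      # assumed unit: kilometers
    }
    # Keep "," and "*" readable instead of percent-encoding them.
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params, safe=',*')}"


print(geo_search_url("https://demo.dataverse.org", "*", 42.3, -71.1, 1.5))
```

The function only builds the URL; sending the request and handling the JSON response are left to whichever HTTP client the caller prefers.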

Reproducibility and Code Execution with Binder

Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)

CodeMeta (Software) Metadata Support (Experimental)

Experimental support for research software metadata deposits has been added.

By adding a metadata block for CodeMeta, we take another step toward adding first class support of diverse FAIR objects, such as research software and computational workflows.

There is more work underway to make Dataverse installations around the world "research software ready."

Note: Like the metadata block for computational workflows before, CodeMeta is listed under Experimental Metadata in the guides. Experimental means it's brand new, opt-in, and might need future tweaking based on experience of usage in the field. We hope for feedback from installations on the new metadata block to optimize and lift it from the experimental stage. (PR #7877)

Mechanism Added for Stopping a Harvest in Progress

It is now possible for a sysadmin to stop a long-running harvesting job. See Harvesting Clients in the Admin Guide for more information. (PR #9187)

API Endpoint Listing Metadata Block Details has been Extended

The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:

  • controlledVocabularyValues - All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
  • isControlledVocabulary: Whether or not this field has a controlled vocabulary.
  • multiple: Whether or not the field supports multiple values.

See Metadata Blocks in the API Guide for details. (PR #9213)
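
A client could use the new fields to find out which fields accept only predefined values. The sketch below works on an already parsed response body. The sample dictionary is a hypothetical, heavily trimmed stand-in, and the assumption that fields are delivered as an object keyed by field name should be verified against a real response.

```python
def controlled_vocabulary_fields(block):
    """Map field name -> allowed values for fields with a controlled vocabulary."""
    result = {}
    for name, field in block.get("fields", {}).items():
        if field.get("isControlledVocabulary"):
            result[name] = list(field.get("controlledVocabularyValues", []))
    return result


# Hypothetical, trimmed stand-in for the metadata block object in a response.
sample_block = {
    "name": "citation",
    "fields": {
        "title": {"isControlledVocabulary": False, "multiple": False},
        "subject": {
            "isControlledVocabulary": True,
            "multiple": True,
            "controlledVocabularyValues": [
                "Agricultural Sciences",
                "Arts and Humanities",
            ],
        },
    },
}

print(controlled_vocabulary_fields(sample_block))
```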

Advanced Database Settings

You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be sslmode=require, though installations already setting this parameter in the Postgres connection string will need to move it to dataverse.db.parameters. See the new Database Persistence section of the Installation Guide for details. (PR #8915)
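
For installations that currently append sslmode=require to the connection string, the move amounts to separating the query part from the rest of the URL. The following sketch shows that split. The helper name, host, and database name are hypothetical, and the exact value format expected by dataverse.db.parameters is an assumption to confirm in the Installation Guide.

```python
def split_jdbc_parameters(jdbc_url):
    """Separate a JDBC URL into its base and its query-string parameters."""
    base, _, parameters = jdbc_url.partition("?")
    return base, parameters


# Hypothetical connection string that still carries the SSL parameter.
base, parameters = split_jdbc_parameters(
    "jdbc:postgresql://localhost:5432/dvndb?sslmode=require"
)
print(base)
print(f"dataverse.db.parameters={parameters}")
```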

Support for Cleaning up Leftover Files in Dataset Storage

Experimental feature: the leftover files stored in the Dataset storage location that are not in the file list of that Dataset, but are named following the Dataverse technical convention for dataset files, can be removed with the new Cleanup Storage of a Dataset API endpoint.

OAI Server Bug Fixed

A bug introduced in 5.12 was preventing the Dataverse OAI server from serving incremental harvesting requests from clients. It was fixed in this release (PR #9316).

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release not already mentioned above include:

  • Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983, see also the Configuration Guide)
  • To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
  • Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
  • A persistent identifier, CSTR, is added to the Related Publication field's ID Type child field. For datasets published with CSTR IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
  • Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.

New JVM Options and MicroProfile Config Options

The following JVM option is now available:

  • dataverse.personOrOrg.assumeCommaInPersonName - the default is false

The following MicroProfile Config options are now available (these can be treated as JVM options):

  • dataverse.files.uploads - alternative storage location of generated temporary files for UI file uploads
  • dataverse.api.signing-secret - used by signed URLs
  • dataverse.solr.host
  • dataverse.solr.port
  • dataverse.solr.protocol
  • dataverse.solr.core
  • dataverse.solr.path
  • dataverse.rserve.host

The following existing JVM options are now available via MicroProfile Config:

  • dataverse.siteUrl
  • dataverse.fqdn
  • dataverse.files.directory
  • dataverse.rserve.host
  • dataverse.rserve.port
  • dataverse.rserve.user
  • dataverse.rserve.password
  • dataverse.rserve.tempdir
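
Because these options are read through MicroProfile Config, they can in principle also come from environment variables. The sketch below applies the name-mapping rule from the MicroProfile Config specification (exact name, then non-alphanumeric characters replaced by underscores, then upper case). The helper name is hypothetical, and whether a given option is actually picked up this way in a particular installation should be verified.

```python
import re


def env_var_candidates(property_name):
    """Return the environment variable names tried for a property, in order."""
    replaced = re.sub(r"[^A-Za-z0-9_]", "_", property_name)
    candidates = [property_name, replaced, replaced.upper()]
    # Drop duplicates while keeping the lookup order.
    return list(dict.fromkeys(candidates))


for name in ("dataverse.solr.host", "dataverse.api.signing-secret"):
    print(name, "->", env_var_candidates(name)[-1])
```

The last candidate is the form most shells accept, which makes it the practical choice for container deployments.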

Notes for Developers and Integrato...

v5.12.1

07 Nov 13:51
cf90431

Dataverse Software 5.12.1

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Bug Fix for "Internal Server Error" When Creating a New Remote Account

Unfortunately, as of 5.11 new remote users have seen "Internal Server Error" when creating an account (or checking notifications just after creating an account). Remote users are those who log in with institutional (Shibboleth), OAuth (ORCID, GitHub, or Google) or OIDC providers.

This is a transient error that can be worked around by reloading the browser (or logging out and back in again) but it's obviously a very poor user experience and a bad first impression. This bug is the primary reason we are putting out this patch release. Other features and bug fixes are coming along for the ride.

Ability to Disable OAuth Sign Up While Allowing Existing Accounts to Log In

A new option called :AllowRemoteAuthSignUp has been added, providing a mechanism for disabling new account signups for specific OAuth2 authentication providers (ORCID, GitHub, Google, etc.) while still allowing logins for already-existing accounts using this authentication method.

See the Installation Guide for more information on the setting.

Production Date Now Used for Harvested Datasets in Addition to Distribution Date (oai_dc format)

This release fixes the year displayed in the citation for harvested datasets, especially for the oai_dc format.

For normal datasets, the date used is the "citation date" which is by default the publication date (the first release date) unless you change it.

However, for a harvested dataset, the distribution date was used instead and this date is not always present in the harvested metadata.

Now, the production date is used for harvested datasets in addition to the distribution date when harvesting with the oai_dc format.

Publication Date Now Used for Harvested Dataset if Production Date is Not Set (oai_dc format)

For exports and harvesting in oai_dc format, if "Production Date" is not set, "Publication Date" is now used instead. This change is reflected in the Dataverse 4+ Metadata Crosswalk linked from the Appendix of the User Guide.
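
The two date changes both follow a "first available value" pattern, sketched below. This is an illustration only, not the actual implementation: the function names are hypothetical, and it assumes the distribution date is still preferred over the production date when both are present.

```python
def first_available_date(*candidates):
    """Return the first non-empty date string, or None if all are missing."""
    for value in candidates:
        if value:
            return value
    return None


def harvested_citation_date(distribution_date=None, production_date=None):
    # Assumed order: distribution date first, production date as the fallback.
    return first_available_date(distribution_date, production_date)


def oai_dc_export_date(production_date=None, publication_date=None):
    # Production date first, publication date when it is not set.
    return first_available_date(production_date, publication_date)


print(harvested_citation_date(None, "2019-05-01"))
print(oai_dc_export_date(None, "2022-11-07"))
```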

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

  • Users creating an account by logging in with Shibboleth, OAuth, or OIDC should not see errors. (Issue #9029, PR #9030)
  • When harvesting datasets, I want the Production Date if I can't get the Distribution Date (PR #8732)
  • When harvesting datasets, I want the Publication Date if I can't get the Production Date (PR #8733)
  • As a sysadmin I'd like to disable (temporarily or permanently) sign ups from OAuth providers while allowing existing users to continue to log in from that provider (PR #9112)
  • As a C/C++ developer I want to use Dataverse APIs (PR #9070)

New DB Settings

The following DB settings have been added:

  • :AllowRemoteAuthSignUp

See the Database Settings section of the Guides for more information.

Complete List of Changes

For the complete list of code changes in this release, see the 5.12.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email [email protected].

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.12.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version

    $PAYARA/bin/asadmin list-applications
    $PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

    service payara stop
    rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

    service payara start

4. Deploy this version

    $PAYARA/bin/asadmin deploy dataverse-5.12.1.war

5. Restart Payara

    service payara stop
    service payara start

Upcoming Versions of Payara

With the recent release of Payara 6 (Payara 6.2022.1 being the first version), the days of free-to-use Payara 5.x Platform Community versions are numbered. Specifically, Payara's blog post says, "Payara Platform Community 5.2022.4 has been released today as the penultimate Payara 5 Community release."

Given the end of free-to-use Payara 5 versions, we plan to get the Dataverse software working on Payara 6 (#8305), which will require substantial efforts from the IQSS team and community members, as this also means shifting our app to be a Jakarta EE 10 application (upgrading from EE 8). We are currently working out the details and will share news as soon as we can. Rest assured we will do our best to provide you with a smooth transition. You can follow along in Issue #8305 and related pull requests and you are, of course, very welcome to participate by testing and otherwise contributing, as always.

v5.12

05 Oct 14:18
71341c0

Dataverse Software 5.12

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Support for Globus

Globus can be used to transfer large files. Part of "Harvard Data Commons Additions" below.

Support for Remote File Storage

Dataset files can be stored at remote URLs. Part of "Harvard Data Commons Additions" below.

New Computational Workflow Metadata Block

The new Computational Workflow metadata block will allow depositors to effectively tag datasets as computational workflows.

To add the new metadata block, follow the instructions in the Admin Guide: https://guides.dataverse.org/en/5.12/admin/metadatacustomization.html

The location of the new metadata block tsv file is scripts/api/data/metadatablocks/computational_workflow.tsv. Part of "Harvard Data Commons Additions" below.

Support for Linked Data Notifications (LDN)

Linked Data Notifications (LDN) is a standard from the W3C. Part of "Harvard Data Commons Additions" below.

Harvard Data Commons Additions

As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, TDL, and others. Highlights from this work include:

  • Initial support for Globus file transfer to upload to and download from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores.
  • Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. via restricted and embargoed status) and/or by the remote store.
  • Initial support for computational workflows, including a new metadata block and detected filetypes.
  • Support for archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack).
  • Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version.
  • Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimetype/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs.
  • Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status.
  • Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually.
  • Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other datasets) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset.
  • A new capability to provide custom per-field instructions in dataset templates.
  • The following file extensions are now detected:
    • wdl=text/x-workflow-description-language
    • cwl=text/x-computational-workflow-language
    • nf=text/x-nextflow
    • Rmd=text/x-r-notebook
    • rb=text/x-ruby-script
    • dag=text/x-dagman

Improvements to Fields that Appear in the Citation Metadata Block

Grammar, style and consistency improvements have been made to the titles, tooltip description text, and watermarks of metadata fields that appear in the Citation metadata block.

This includes fields that dataset depositors can edit in the Citation Metadata accordion (i.e. fields controlled by the citation.tsv and citation.properties files) and fields whose values are system-generated, such as the Dataset Persistent ID, Previous Dataset Persistent ID, and Publication Date fields whose titles and tooltips are configured in the bundles.properties file.

The changes should provide clearer information to curators, depositors, and people looking for data about what the fields are for.

A new page in the Style Guides called "Text" has also been added. The new page includes a section called "Metadata Text Guidelines" with a link to a Google Doc where the guidelines are being maintained for now since we expect them to be revised frequently.

New Static Search Facet: Metadata Types

A new static search facet has been added to the search side panel. This new facet is called "Metadata Types" and is driven from metadata blocks. When a metadata field value is inserted into a dataset, an entry for the metadata block it belongs to is added to this new facet.

This new facet needs to be configured for it to appear on the search side panel. The configuration assigns to a dataverse what metadata blocks to show. The configuration is inherited by child dataverses.

To configure the new facet, use the Metadata Block Facet API: https://guides.dataverse.org/en/5.12/api/native-api.html#set-metadata-block-facet-for-a-dataverse-collection

Broader MicroProfile Config Support for Developers

As of this release, many JVM options can be set using any MicroProfile Config Source.

Currently this change is only relevant to developers but as settings are migrated to the new "lookup" pattern documented in the Consuming Configuration section of the Developer Guide, anyone installing the Dataverse software will have much greater flexibility when configuring those settings, especially within containers. These changes will be announced in future releases.

Please note that an upgrade to Payara 5.2021.8 or higher is required to make use of this. Payara 5.2021.5 threw exceptions, as explained in PR #8823.

HTTP Range Requests: New HTTP Status Codes and Headers for Datafile Access API

The Basic File Access resource for datafiles (/api/access/datafile/$id) was slightly modified in order to comply better with the HTTP specification for range requests.

If the request contains a "Range" header:

  • The returned HTTP status is now 206 (Partial Content) instead of 200
  • A "Content-Range" header is returned containing information about the returned bytes
  • An "Accept-Ranges" header with value "bytes" is returned

CORS rules/headers were modified accordingly:

  • The "Range" header is added to "Access-Control-Allow-Headers"
  • The "Content-Range" and "Accept-Ranges" headers are added to "Access-Control-Expose-Headers"

This new functionality has enabled a Zip Previewer and file extractor for zip files, an external tool.
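
From a client's point of view, the change can be handled with a few small helpers like the ones sketched below. The function names are hypothetical; the header formats follow the HTTP specification for byte ranges. No request is sent here, so the helpers can be combined with any HTTP client.

```python
import re

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def range_header(first_byte, last_byte):
    """Build a Range request header for an inclusive byte range."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return {"Range": f"bytes={first_byte}-{last_byte}"}


def parse_content_range(value):
    """Parse 'bytes first-last/total'; total is None when sent as '*'."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


def is_partial_response(status_code, headers):
    """True when the response is 206 and carries a Content-Range header."""
    has_range = any(key.lower() == "content-range" for key in headers)
    return status_code == 206 and has_range


print(range_header(0, 99))
print(parse_content_range("bytes 0-99/1234"))
```

Checking for status 206 before trusting the body matters because a server that ignores the Range header may answer 200 with the whole file.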

File Type Detection When File Has No Extension

File types are now detected based on the filename when the file has no extension.

The following filenames are now detected:

  • Makefile=text/x-makefile
  • Snakemake=text/x-snakemake
  • Dockerfile=application/x-docker-file
  • Vagrantfile=application/x-vagrant-file

These are defined in MimeTypeDetectionByFileName.properties.
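
The lookup can be pictured as a simple table keyed by filename. The sketch below is an illustration in Python rather than the actual implementation; the function name is hypothetical, and exact, case-sensitive matching is an assumption.

```python
from pathlib import PurePosixPath

# Filenames and MIME types as listed above.
FILENAME_TYPES = {
    "Makefile": "text/x-makefile",
    "Snakemake": "text/x-snakemake",
    "Dockerfile": "application/x-docker-file",
    "Vagrantfile": "application/x-vagrant-file",
}


def detect_by_filename(path):
    """Return a MIME type for extensionless files with a known name, else None."""
    file_path = PurePosixPath(path)
    if file_path.suffix:
        return None  # has an extension, so name-based detection does not apply
    return FILENAME_TYPES.get(file_path.name)


print(detect_by_filename("project/Dockerfile"))
print(detect_by_filename("notes.txt"))
```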

Upgrade to Payara 5.2022.3 Highly Recommended

With lots of bug and security fixes included, we encourage everyone to upgrade to Payara 5.2022.3 as soon as possible. See below for details.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

  • Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer. (PR #8891)
  • Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325)
  • Administrators can configure the Dataverse software to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751)
  • Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770)
  • Users and administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the original file (for ingested tabular files) via the included checksum. (PR #8901)
  • Archiving via RDA-conformant Bags is more robust and is more configurable. (PR #8773, #8747, #8699, #8609, #8606, #8610)
  • Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table. (PR #8748, #8696)
  • Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation. (PR #8775)
  • Collection managers can create templates that include custom instructions on how to fill out specific metadata fields.
  • Dataset update API users are given more information when the dataset they are updating is out of compliance with Terms of Access requirements (Issue #8859)
  • Adds...