Blog

Last modified by Julia Gilmore on 2024-05-24, 10:10

Feb 06 2020

Introducing the Data Curation Tool

The Scholars Portal Dataverse team has been hard at work on the new Dataverse Data Curation Tool as part of our CANARIE RDM grant project. Development on this project is being led by Victoria Lubitch, Programmer/Analyst at Scholars Portal.

The Data Curation Tool (DCT) allows data owners and curators to create and edit variable-level metadata for any tabular file in a dataset. Users can access this tool as a modular application once they’ve uploaded a tabular file (e.g., SPSS, R, Excel, CSV) to a dataset in Dataverse.


The Data Curation Tool


Similar to tools like SPSS, the DCT allows users to view summary statistics about their data, add variable information such as 'Interviewer Instructions' or 'Notes', create variable groups, and indicate weighting variables.
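To make the summary statistics and weighting ideas concrete, here is a minimal Python sketch of the kind of figures a variable view might report: valid and missing counts, minimum, maximum, and a mean that uses a weighting variable when one is indicated. It is an illustration only; the `summarize` helper and its treatment of missing values (`None` or NaN) are assumptions, not how the DCT or Dataverse actually computes statistics.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SummaryStats:
    valid: int               # count of non-missing values
    invalid: int             # count of missing values (None or NaN)
    mean: Optional[float]    # weighted mean if weights are given, else plain mean
    minimum: Optional[float]
    maximum: Optional[float]


def _is_missing(x: Optional[float]) -> bool:
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def summarize(values: Sequence[Optional[float]],
              weights: Optional[Sequence[float]] = None) -> SummaryStats:
    """Hypothetical helper: summary statistics for one variable.

    If `weights` (a weighting variable) is supplied, the mean is weighted.
    Rows with a missing value are excluded from every statistic except `invalid`.
    """
    if weights is not None and len(weights) != len(values):
        raise ValueError("values and weights must have the same length")

    # Keep (row index, value) pairs for non-missing values only.
    present = [(i, x) for i, x in enumerate(values) if not _is_missing(x)]
    valid = len(present)
    invalid = len(values) - valid
    if valid == 0:
        return SummaryStats(valid, invalid, None, None, None)

    xs = [x for _, x in present]
    # Unweighted statistics behave as if every row had weight 1.
    ws = [1.0 if weights is None else weights[i] for i, _ in present]
    weight_sum = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / weight_sum if weight_sum else None
    return SummaryStats(valid, invalid, mean, min(xs), max(xs))


if __name__ == "__main__":
    print(summarize([1.0, 2.0, None, 3.0]))
    print(summarize([1.0, 3.0], weights=[3.0, 1.0]))
```

Marking a variable as a weight changes only how the mean is combined; the counts and range are the same either way.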


Summary statistics in the DCT


Variable editor in the DCT


Once edits are saved back to Dataverse, the updated metadata can be downloaded as an XML file or exported as a codebook.
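Because the edited metadata can be downloaded as XML, it can also be read with ordinary XML tooling. The Python sketch below assumes the file follows the DDI Codebook layout, in which each variable is a `var` element with a `name` attribute, a `labl` label, and `sumStat` entries keyed by a `type` attribute. The `list_variables` helper and the sample document are invented for illustration. Dataverse's Data Access API also documents an endpoint of the form `/api/access/datafile/{id}/metadata/ddi` for variable-level metadata of tabular files; check the API guide for the version you run.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def _local(tag: str) -> str:
    """Strip an XML namespace: '{ns}var' -> 'var'."""
    return tag.rsplit("}", 1)[-1]


def list_variables(ddi_xml: str) -> List[Dict[str, object]]:
    """Collect name, label and summary statistics for each DDI <var> element.

    Namespaces are ignored, so the same code works whichever DDI namespace
    (if any) the document declares.
    """
    root = ET.fromstring(ddi_xml)
    variables = []
    for elem in root.iter():
        if _local(elem.tag) != "var":
            continue
        label: Optional[str] = None
        stats: Dict[str, str] = {}
        for child in elem:
            tag = _local(child.tag)
            if tag == "labl" and label is None:
                label = (child.text or "").strip()
            elif tag == "sumStat" and child.get("type"):
                stats[child.get("type")] = (child.text or "").strip()
        variables.append({"name": elem.get("name"), "label": label, "stats": stats})
    return variables


if __name__ == "__main__":
    # Invented sample document, for illustration only.
    sample = """<codeBook xmlns="ddi:codebook:2_5"><dataDscr>
      <var ID="v1" name="age"><labl level="var">Age of respondent</labl>
        <sumStat type="mean">42.5</sumStat></var>
    </dataDscr></codeBook>"""
    for var in list_variables(sample):
        print(var["name"], "|", var["label"], "|", var["stats"])
```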


Example of a codebook in Dataverse


Usability testing sessions were recently completed with five participants, who worked through a series of tasks and helped us identify areas where the tool's user experience could be improved. We’re now working on translating the tool into French, with translations provided by the University of Ottawa.

A demo of this tool is available online, and the code can be accessed on GitHub. The Data Curation Tool will be available for community testing soon and will be launched with the next Scholars Portal Dataverse upgrade, currently scheduled for the end of October.

If you have any comments or suggestions, contact us at dataverse@scholarsportal.info. If you would like to see all the updates and have a SpotDocs account, click the "Watch this blog" button in the top right corner of the page to receive notifications.

Jul 09 2019

CANARIE project update: Dataverse for the Canadian Research Community

Background

Welcome to our Scholars Portal Dataverse blog, where we will be sharing news and updates about the Dataverse platform and service, including development work. Our first blog post provides an update on the development project "Dataverse for the Canadian Research Community"! This project is funded by CANARIE's RDM grant program and led by Scholars Portal and the University of Toronto Libraries, with support from CARL and Portage.

We're currently about halfway through our 18 months of development work (October 2018 to March 2020).

The aim of the grant is to enhance Dataverse to address the needs of a broad range of researchers in Canada through improved scalability, improved integrations with Canadian cloud storage and authentication providers, and better support for data curation workflows. These three areas of development are described further below and will be discussed in more detail in future blog posts. 


Scalability

The goals of the first leg of the project include:

  • Optimize system architecture for scalable use
  • Connect to existing Canadian cloud data storage environments
  • Support large files in upload/download contexts


Planned deliverables:

  • Develop and test connections to Swift object storage, such as the Ontario Library Research Cloud (OLRC)
  • Support Globus endpoints with file access mediated outside of the Dataverse application
  • Develop a large-file upload utility to support deposit of larger files (2 GB+) into Dataverse
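As an illustration of the general technique only (not the planned utility's actual design), a common way to move files of this size is to split them into fixed-size byte ranges that can be sent and retried independently; object stores support this pattern, for example Swift's large objects stored as segments plus a manifest. The sketch below shows just the splitting and reading steps, with hypothetical helper names and no network code.

```python
from typing import List, Tuple


def plan_chunks(file_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split a file of `file_size` bytes into (start, end) byte ranges.

    `end` is exclusive, so each part is file[start:end]. The last range may be short.
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    return [(start, min(start + chunk_size, file_size))
            for start in range(0, file_size, chunk_size)]


def read_range(path: str, start: int, end: int) -> bytes:
    """Read one part from disk without loading the whole file into memory."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


if __name__ == "__main__":
    GiB = 1024 ** 3
    parts = plan_chunks(5 * GiB + 1, 1 * GiB)  # a file just over 5 GiB
    print(len(parts), parts[-1])               # six parts; the last is a single byte
```

Because each part is independent, a failed transfer only needs that one range re-sent rather than the whole file.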

Authentication

The goals of the second leg of the project include:

  • Integrate with Canadian authentication infrastructure
  • Streamline login workflows

Planned deliverables:

  • Integrate Dataverse with CAF Shibboleth Login for single sign-on
  • Investigate further integration with ORCID to support linking research outputs
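Linking outputs through ORCID starts from a researcher's ORCID iD, whose final character is a check character computed with the ISO 7064 MOD 11-2 algorithm described in ORCID's documentation. As a small, hypothetical illustration (not part of the planned deliverable), a validator like this could reject mistyped iDs before any linking is attempted; the helper names are invented for this sketch.

```python
import re

# Four hyphenated groups of four; the last character may be X (check value 10).
_ORCID_FORMAT = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` looks like 0000-0002-1825-0097 and its check character matches."""
    if not _ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True (ORCID's documented example iD)
    print(is_valid_orcid("0000-0002-1825-0098"))  # False (check character altered)
```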


Shibb2.PNG

Data Curation

The goals of the third leg of the project include:

  • Enhance multi-disciplinary support for data curation
  • Enable users to adopt metadata standards and best practices

Planned deliverables:

  • Data Curation Tool, a modular application integrated within Dataverse that would allow users to create and edit variable-level metadata for tabular data files to aid data reusability

Status update

Our Project Timeline & Deliverables roadmap is included below. We have completed our first two deliverables and are currently working on the third.

For our first deliverable, connecting Dataverse to Swift as the primary storage service, we stood up a test instance of Dataverse connected to the OLRC. The Scholars Portal team tested upload and download functionality and the integrity of stored files across a variety of file types and sizes, along with other core Dataverse functionality. This type of configuration should allow us to scale the system more easily, add storage resources, and run the platform more efficiently.
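The file-integrity part of such testing can be illustrated with checksums. For an ordinary (non-segmented) object, Swift returns the MD5 digest of the object's content in the `ETag` header of a `GET` or `HEAD` response, so a stored file can be compared against a local copy. The sketch below builds an object URL, computes the MD5 in streaming fashion, and compares it with an ETag value; the helper names and example URL are hypothetical, the HTTP request itself is omitted, and segmented large objects report a different kind of ETag, so this check would not apply to them as written.

```python
import hashlib
from urllib.parse import quote


def swift_object_url(storage_url: str, container: str, obj: str) -> str:
    """Build a Swift object URL: {storage_url}/{container}/{object}.

    `storage_url` is assumed to be the account endpoint, i.e. it already
    ends in /v1/{account}. Slashes in the object name are kept as-is.
    """
    return f"{storage_url.rstrip('/')}/{quote(container, safe='')}/{quote(obj, safe='/')}"


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB blocks so memory use stays bounded."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_etag(path: str, etag: str) -> bool:
    """Compare a local file with an ETag header value (quotes and case ignored)."""
    return md5_hex(path) == etag.strip('"').lower()


if __name__ == "__main__":
    print(swift_object_url("https://swift.example.org/v1/AUTH_demo", "dataverse", "study1/data 1.csv"))
    # https://swift.example.org/v1/AUTH_demo/dataverse/study1/data%201.csv
```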

We have also successfully configured Dataverse to work with Shibboleth for single sign-on using the University of Toronto as the test case. We are now initiating a pilot project with interested institutions to test out new sign-up and login workflows. More details to come in another blog post.

Currently, we are working on completing our third deliverable: developing the Data Curation Tool. We presented the DCT prototype at NADDI (link to slides) and at the Dataverse Community Meeting (link to slides). Feel free to test out the Data Curation Tool - Prototype, and stay tuned for a future blog post describing the development of this tool.

In the fall, we will start to focus on the large-file support and storage connection pieces of the project.

We will be sharing more details about these deliverables and the development work in upcoming blog posts! If you have any comments or suggestions, please feel free to contact us at dataverse@scholarsportal.info. If you would like to see all the updates and have a SpotDocs account, click the "Watch this blog" button in the top right corner of the page to receive notifications.


Project Timeline & Deliverables

[Image: Project Timeline & Deliverables roadmap]