Manage your team’s data, attach metadata, and publish to ICPSR using SEAD


Thank you, Linda. And thank you all for joining me for this ICPSR Data Fair webinar. My name is Anna Ovchinnikova and I’m the User Education and Support Specialist with SEAD. Today’s webinar will help you learn about data services offered by the SEAD project and demonstrate how you can easily publish data directly from your SEAD project space to openICPSR, ICPSR’s research data sharing service. As always, we are interested in your feedback and strongly encourage you to ask questions. So please type those into the question field and we’ll leave time at the end of the presentation for questions. So let’s begin. For those who are unfamiliar with SEAD, here is a brief overview of the project. The SEAD project is supported through NSF’s DataNet program and is led by the University of Michigan in collaboration with Indiana University and the University of Illinois. SEAD provides researchers working in a broad range of physical and social science disciplines with easy-to-use tools for managing, sharing, and publishing data, designed to support researchers through each step of the data life cycle. SEAD’s secure, access-controlled project spaces make managing, organizing, describing, and collaborating over data simple. When you decide to publish and preserve your data, SEAD’s streamlined publication workflow will guide you through a publication process that doesn’t require a lot of extra effort to submit your data to a long-term repository. The SEAD project was launched in 2011. And most recently we successfully released our first production version of SEAD 2.0 at the end of July. Building off the previous platform, SEAD 1.5, and the feedback gathered during our beta release of the new platform, SEAD 2.0 represents the next generation of data services for managing, sharing, curating, and publishing data. Among SEAD 2.0’s many exciting new features is on-demand creation of project spaces.
SEAD users can now easily create project spaces with a click of a button and publish their datasets directly from their active project spaces to SEAD’s long-term partner repositories like openICPSR. So you may ask, “Who can use SEAD?” Researchers working in the physical or social sciences, people working collaboratively in a team environment, anybody who needs a secure central workplace for their data. If you’re looking for an easier way to share, publish, or make your data public, SEAD is a place for you. Researchers increasingly need access to reliable tools for managing, sharing, and publishing data, and many research teams out there, right now, are challenged to find a centralized way to manage their team’s data. During the course of their project they use services like shared server space or Dropbox, but those tools do not allow the kind of annotation and metadata addition that scientists need in order to be able to understand and find their own data. Additionally, when you want to publish your data you effectively have to engage in a separate set of steps to prepare and submit your data for publication. And SEAD is an answer to this problem. Using SEAD project spaces you can incrementally add, organize, and describe data with metadata over the course of the project, and when you’re ready, submit your datasets from your project space in SEAD for publication in a long-term repository, saving you the time and effort of having to organize and describe your data again at the end of your project at the time of publication. So SEAD project spaces are like Dropbox, but better for scientists. SEAD project spaces are secure, team-controlled work areas and each project space has a public-facing homepage, which you can add your own branding to. You can invite your team and others who you’d like to share your data with and assign them appropriate roles and permissions.
And if your goal is to publish your data at ICPSR, you can potentially invite ICPSR’s professional curator to your project space to help guide you on how to best prepare your data for archiving at ICPSR, which metadata are required, and which metadata are recommended to ensure that your data are discoverable and reusable. As long as SEAD recognizes the file types, it will extract whatever metadata it can from within the files, things like file size, creation date, and location. SEAD uses a plug-in infrastructure that allows you to easily connect existing extractors for video, geospatial, Office, and other file formats. SEAD also allows you to add your own custom metadata, which you can add incrementally as you collect and upload your data. Also in SEAD, you can make individual datasets in your project space public, and I will explain this option a few minutes later. And finally, when you’re ready to publish your data and you need a DOI for citation, you can easily submit your data for publication directly from your project space in SEAD to a long-term repository, and all it takes is a click of a button. SEAD allows you to set your own standards in your project space. You can add your branding, as I mentioned, to your space, customize metadata options in your space, and even point to external controlled vocabularies that then define which values are allowed for a specific term. And you can attach metadata at different levels of the hierarchy, at the dataset level or the file level. To upload data to SEAD, you start with creating a dataset. You can either browse for files or simply drag and drop your files anywhere on the page. Once your files are listed on the page you can upload them individually or all at once. You can also remove a file from the list so that it is not uploaded at that particular moment. And each dataset and file you upload to SEAD is given a unique and persistent URL that can be used to link to it directly. A dataset can contain any number of files in any format.
And datasets are the primary organizing structure for files in SEAD and they are also the publishable objects in SEAD. So when you are ready to publish, you can submit your dataset for publication and have SEAD’s Matchmaker match you with a repository that can preserve your data for the long term. Datasets can be further organized with folders and subfolders. There’s no real limit on how many folders and subfolders your dataset can have. One thing to remember, though, is that repositories often do have a limit on how deep the hierarchy of your dataset is. openICPSR has no such limit. In terms of sharing your datasets across project spaces, by default datasets are only accessible to the users who created them. You can choose to share your dataset with a project space that you created or have access to. Another option for sharing your datasets is to copy them to project spaces where you have access. And you can copy your dataset an unlimited number of times to project spaces where you have access. So I mentioned making your data public earlier. You have the option of making individual datasets public, and it’s a great option to easily share your data if you don’t have to publish it. Project space administrators can set the entire project space public in SEAD, if necessary. There are simple toggle mechanisms, one example of which you can see on the screen, that allow you to switch the access setting from public to private and back. The option of making data public is quite different from publishing data. It allows anybody to view and download the live data that lives inside your active project space. Your public datasets will change over time when you update and further develop them. And obviously those public datasets are not assigned DOIs, which you would need to properly cite your data. So this is not a solution for preserving your data and making it available for the long term. SEAD makes it easy to manage data.
You don’t have to worry up front whether data in your project space is good data or bad data, correct or incorrect. When you’re ready to publish, then you can decide which subset of data in your project space you want to publish and easily construct the dataset which you will be submitting for publication by moving files from one folder to another, from one dataset to another. SEAD allows you to add free-form tags and custom metadata. You can use them to search and navigate around your data, helping you to be more effective and efficient in managing and working with your data in your project space. SEAD also makes working with the data easier. For supported file types, which are basically any commonly used file types out there, SEAD provides thumbnails and in-browser previews. Thumbnails help you quickly navigate through your datasets and files. And with previews you do not need to download your files to see what’s inside. You can play back videos and audio, and navigate between spreadsheets in an Excel workbook. For files like PDFs, Word documents, and PowerPoint presentations, you can flip from page to page. SEAD offers the ability to see a map overlay of your geospatial datasets and if your dataset contains multiple geospatial files, you can actually see them all on a single map. You can also turn those data layers on and off and adjust the opacity of individual layers. Furthermore, each dataset and file you upload will have its own information page that will include the automatically extracted metadata, as well as the custom metadata that you and your team can add. To facilitate better organization of datasets in your project space, SEAD also provides the option to create collections. You can group datasets into collections and sub-collections by topic or by any other meaningful attribute that will help you in your work. As you can see in this simple example on your screen, one dataset can belong to an unlimited number of collections.
So I hope I was able to help you see that SEAD provides many useful options for you and your team to effectively collaborate around your data in SEAD project spaces: from sharing datasets with your team’s project space and copying your datasets to project spaces, to describing the data with custom metadata and free-form tags and using them to search and navigate around your data, to being able to have conversations around your data and work with in-browser previews of your files. You can also follow people and data in SEAD and see related updates on your dashboard every time you log in to SEAD. SEAD’s publication process can literally be initiated with a push of a button and, depending on the repository chosen, be completed in an entirely automated workflow. Publication in SEAD involves automated data and metadata review, submission to a long-term repository working with SEAD, which has documented policies and practices for preserving data, and assignment of a persistent global identifier, in most cases a DOI (digital object identifier), that you and others can use to cite your data. SEAD also automatically registers all published data with the DataONE federated catalog, which provides a faceted search over data from a broad range of projects. The important difference about SEAD is that it allows you to incrementally upload and describe data over time. So SEAD is a place for you and your team to work and collaborate over your data during your project, and when you’re ready you can decide which subsets of data in your project space you want to publish. The unique thing about SEAD is that it works with multiple repositories and can help match your data with an appropriate repository, so you can decide, for example, to publish data that are directly tied to a publication one way and then separately publish the larger set of raw data to make them available to those who may want to really dig into your research.
And you may even decide to publish data that are not perfect, for historical purposes. The publication process in SEAD’s new platform, SEAD 2.0, is easy and streamlined, and allows you to quickly address any basic issues that may prevent your dataset from being accepted for publication. For example, SEAD will inform you if a required metadata field is missing or if your dataset exceeds the total size that can be accepted. And SEAD will take care of sending the data out to the repository and will provide you with feedback and status updates. So you may ask, what exactly happens when you push the publish button in your project space in SEAD? When you decide to publish your dataset, a frozen copy of your live dataset, called a Curation Object, is created in the staging area of your project space, where you can make final adjustments to further prepare it for publication. Any changes you make to the Curation Object in the staging area will not affect the live version of that dataset in your active project space, where you can continue working on it and submit new versions for publication over time. There are three easy steps that will guide you through the workflow in the staging area. In step 1, you can choose to remove some files or add additional metadata. And when you’re satisfied you move to step 2, where you review the feedback from SEAD’s Matchmaker and select your repository. SEAD’s Matchmaker recommends repositories for publishing your dataset and provides instant feedback on why the dataset would or would not meet the requirements of a given repository. At any time you can go back to step 1 to address a particular issue pointed out by the Matchmaker. How does SEAD’s Matchmaker know which repository to recommend? Each repository working with SEAD creates a profile [unintelligible] system where it specifies its requirements.
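As a rough illustration of the Matchmaker idea described here, the following Python sketch compares a staged dataset against repository profiles and reports why it would or would not be accepted. It is not SEAD’s code: the class names, field names, and the `review` and `match` functions are hypothetical, and only the two checks mentioned in the talk (required metadata and total size) are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class RepositoryProfile:
    """Hypothetical repository profile; field names are illustrative only."""
    name: str
    required_metadata: Set[str] = field(default_factory=set)
    max_total_bytes: Optional[int] = None  # None means no size limit


@dataclass
class CurationObject:
    """Hypothetical stand-in for a dataset staged for publication."""
    metadata: Dict[str, str]
    file_sizes: List[int]  # size of each file, in bytes


def review(obj: CurationObject, profile: RepositoryProfile) -> List[str]:
    """Return a list of problems; an empty list means the profile is satisfied."""
    problems = []
    # Report every required metadata term that the dataset does not carry.
    for term in sorted(profile.required_metadata - set(obj.metadata)):
        problems.append(f"missing required metadata: {term}")
    # Report a size problem only when the repository declares a limit.
    total = sum(obj.file_sizes)
    if profile.max_total_bytes is not None and total > profile.max_total_bytes:
        problems.append(
            f"total size {total} exceeds limit {profile.max_total_bytes} bytes"
        )
    return problems


def match(obj: CurationObject, profiles: List[RepositoryProfile]) -> Dict[str, List[str]]:
    """Map each repository name to its list of problems for this dataset."""
    return {p.name: review(obj, p) for p in profiles}


if __name__ == "__main__":
    # Example values are made up for the demonstration.
    profiles = [
        RepositoryProfile("repo-a", {"Title", "Creator"}, max_total_bytes=1_000),
        RepositoryProfile("repo-b", {"Title"}),
    ]
    staged = CurationObject({"Title": "Survey study"}, [400, 700])
    for repo, problems in match(staged, profiles).items():
        print(repo, "OK" if not problems else problems)
```

Returning the list of reasons, rather than a bare yes or no, mirrors the feedback loop in the talk: the user can go back to step 1, fix what is reported, and review again.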
In its profile a repository can list, for example, [unintelligible] of interest, any required metadata, and accepted data formats; it may also indicate options related to embargoes, retention periods, license terms, and any other aspects of its curation and preservation processes. SEAD’s Matchmaker performs automated data and metadata review and uses this information from the repository profiles to recommend the most appropriate repository for publishing your dataset. And finally, in step 3, you click Submit to send your data on the path to publication. At any time you can go back to the staging area to check the status of your publication or see a DOI for your previously published dataset. Often, once the repository receives your request for publication, it will initiate the publication process, which may include additional steps for you to take. The repository can send you an e-mail and invite you to log into their system to complete the process, but the most important thing is that you’ve done the work in SEAD and those final steps can be very quick and easy for you to complete. In a few minutes you will see how a request for publication from SEAD to openICPSR is submitted and processed. Once the repository publishes your dataset, it will assign your publication a DOI, which you will be able to see in the staging area of your project space. Your publication will also have a landing page, like the ones you see on your screen, which others can use to access your publication. The way the landing page will look and how it will present information about your dataset depends on the repository that generates your publication’s landing page. When packaging your dataset, SEAD utilizes standards-based archival formats and assembles the data using a parallel SEAD archiving library with an internal index, which helps to minimize demands on the repository system with lower memory and disk space requirements.
This is important for larger data packages, allowing repositories to store them in a way that does not require retrieval of the whole archive to retrieve metadata or individual files. So you will be able to download individual files without downloading the entire publication. SEAD can be utilized in data management plans. Through a SEAD-enabled data management plan, researchers can increase their research efficiency, promote its visibility, and meet institutional grant requirements. If you’re a new user planning to include SEAD in your data management plan, we would love to know about your project and whether there’s anything we can do to help you get started. So please reach out to us if you plan to include SEAD in your data management plan. We will now transition to the demo portion of the presentation. So I’m going to switch my screen here to go to the website. On your screen you see SEAD’s main website, SEAD-DATA.NET, where you can find lots of useful information about the project. Here I’d briefly like to draw attention to the publications and presentations page, where I’m going to be posting the recording of this webinar once it becomes available. You can also subscribe to SEAD updates on the website to be notified about upcoming events and new features being added to SEAD. So I’m going to access SEAD 2.0. Well, I’m going to log out so you can see exactly how that would look for you when you access 2.0 for the first time. I have an account and I’m going to log in. I’m going to use a local login, and you can see that there are other options that SEAD supports through Google, Facebook, Twitter, and ORCID. This takes me to my Dashboard, where I can see the recent events. Using these tabs I can see the project spaces that I created, datasets that I created, collections that I created, and any followers. I can also quickly access my profile and update it, if I need to. I can quickly create a new project space, a new dataset, or a new collection.
There’s also a top main navigation. By the way, clicking on the SEAD logo will always take you back to your dashboard, but also under this new tab there are the same links to take you to your dashboard and to the project spaces. Well, actually this particular link works slightly differently: it shows not only the project spaces I created, but also the project spaces that I have access to. Same thing with the datasets and collections. And I can also explore SEAD in general. I can see any public data out there and all the project spaces that are currently in SEAD. And this Create tab is a quick way for me to create a new project space, a new dataset, or a new collection from anywhere on the site. So I would like to start by showing you a project space that I was planning to use for this demo. I already have some data loaded, some datasets as you can see, and some collections. I’d like to draw your attention to this set of links on the right side of my screen. I can use these links to manage users in my project space. If it’s a user that already has an account in SEAD, I can use the autofill function here to quickly find the person, invite them, and assign them a role. As you can see, there are three user roles that project spaces provide you by default. A user can be an admin, an editor, or a viewer. So I can simply add someone to a particular role and click Submit, and this person will be sent an email notifying them that I just added them to my project space. I can also invite someone from outside, someone who doesn’t have a SEAD account at this particular time, by including the right email address here. I can also include a message and at the same time assign them a particular role. Also, if my project space is private and someone noticed it and requested access, I will see any requests for access under this tab.
I recently logged in with my other email address to show you how this works and requested access, so I can accept or reject and also assign a role. So I’m going to make me, or rather this other Anna, a viewer, and I’m going to accept, and you will see that I’ve been added. I can just as easily remove someone from having access to my project space. Other things that I can do: I can edit my project space, I can change its name, modify the description, and add additional external links. This is kind of part of the branding, along with the logo and the custom banner that I can add to my project space. And here’s that cool toggle mechanism that I can use to instantaneously change the access settings for the entire project space. It’s currently set to private; I can very easily make it a public project space. So if I log out, I will be able to see the data in my project space if it hasn’t been specifically set to be private. So yeah, actually I wanted to show you how that would look, so I’m going to log out and explore project spaces. Actually this is a great example. This is a public project space, so this is a public-facing page that has some public datasets on it. I’m going to log back in. Okay. I’m going to go back to my demo space here. Another cool thing you can do in a project space, as I mentioned during the first part of the presentation, is that you can fully customize metadata options in your project space. And this is the link for you to do it: we click on Manage Metadata, Terms and Definitions. I can easily delete any of these metadata options, just like that, or I can add a new one by giving it a label and adding it to the metadata options in my space. And in terms of adding metadata, as I said earlier, you can add metadata at a dataset level or at an individual file level. So for example, in this particular dataset I have some files loaded and I can add metadata to the entire dataset by using this feature here.
I can also leave comments for the entire dataset, and I can add tags. I can also go into a specific file inside my dataset. By the way, this is an example of a video, courtesy of one of the projects in SEAD. Some sand, so thank you. Oh yeah, but the thing I want to show you is that these are the options for me to add metadata on the file level. And I can also add a comment on the file level too. In terms of working previews, for example, for this PDF file I downloaded a codebook from a public ICPSR study, so we can scroll through the document, actually go from page to page, as I said. Another interesting thing is the geospatial files; I wanted to show you that. So for example, I have this geospatial file that has a data layer. I can change the opacity, I can zoom in, move the map around. And because there are two geospatial files here, I can go into the visualizations tab. Unfortunately, in this example, the layers are not in the same area. But I can point you to that other file, and if I turn the map off, you can do that too. Here’s one layer and here’s another one, and you can zoom in and go to, you know, a specific area to examine it and change the individual opacity of this particular data layer. So we’re going to go back to my project space. One thing that I’d like to do is to show you how to publish a dataset to openICPSR. So I have this example of a study that we loaded to the demo project space for this demonstration. It has some study documentation materials: a codebook, a survey instrument, an observation guide, and survey data. We loaded some metadata here already, as you can see. There’s a comment, we have some tags, and I can also see some previously published versions. These are links that can take me to the staging area, which I can show you as well, just very quickly before we publish.
So there’s a link here to go to the staging area, where I can see all my curation objects, their status, and the dates when I submitted them for publication. And I can go into each one of them and see the status and the updates. But we can look more closely when we actually publish this dataset. So let’s say my team and I worked on this dataset over the course of the project, we’ve done the work, we assembled it the way we wanted, and there’s metadata on the dataset level (oops, sorry), so this is the dataset-level metadata, and we also included some metadata on the file level. So for example, for this codebook we have some file-level metadata. So yes, now we’re ready to publish. All I have to do is click on this publish button, and I can give my Curation Object a name, another name, so it will be easy for me to find it in the future. I’m going to add my name as a creator and create my Curation Object. Now I’m in the staging area and these are those three easy steps that I was describing before. So here I can decide to add more metadata. For example, I can add another contact. I can maybe go in and delete a particular file for some reason. I’m just going to do it to show you how easy it is. And let’s say, yes, that’s all I need to have here. I can quickly move to step 2, and this is where I interact with SEAD’s Matchmaker. I can review the live feedback, and currently in SEAD you can also do a test publication, if you want to do just a test run and see how things are going to look when you actually publish your data for real. We’re going to do a real publication to openICPSR. So in terms of the Matchmaker’s feedback, this is where you see the different messages.
And because I was so good at managing my data over the course of the project, all the metadata is there, and openICPSR likes all the other aspects of my dataset, like the file types, the size, and the affiliation. openICPSR actually doesn’t require a particular affiliation, but here’s a simple example of what the warning that I’m breaking some rules would look like in SEAD. And by the way, we’re improving how this review looks, so in the next update coming to 2.0, this information will have a better presentation. So I do know that I want to publish to openICPSR, so I can choose this repository, go to step 3 to review the summary, and click the submit button. Here I am. So at this point I can go to the staging area to see the status, see the update, and see that the publication process has been started. And here I am seeing that the processing will begin. I can click the Update button to see if there’s a new update, and great, it says that ICPSR received this deposit successfully and tells me to use my profile, with my ORCID iD, to log into their system, the openICPSR system, to complete the process. Wonderful, so I can grab the link and quickly move there. I’m already logged in because I played with this before the demo today, and you can see that my files are here and the metadata that I’ve been working on in my SEAD project space is nicely packaged into this oremap.json file. So again, the beauty of this is that I’ve done my work and I don’t have to do anything else. My files are here, my metadata is here, and all I need to do is click the Publish This Project button. Again, this is just for verification purposes, for final review, and I can proceed to publish my dataset. openICPSR asks me some additional questions here. Again, this is an example of how SEAD works with the repository: at this point, the repository can have its own standard process, the steps that it wants users to take.
So I’m going to answer these questions about whether I have any identifiable information or any sensitive information in my dataset. I’m going to say no, no, and I’m not interested in any embargo period, and I’m good with the deposit agreement. And I can just click yes and publish my dataset. Usually we would wait a few seconds for things to go through, but in my staging area I can actually quickly see that, hey, it’s already been published and here’s my DOI. Here we go. Okay, so I’m going to go back to my presentation here to tell you about some new features that are also coming to 2.0 very soon. Those are bulk operation capabilities for you to be able to move, delete, apply metadata to, and geotag several files at the same time, and also to set relationships between datasets, so that you would be able to say that this dataset is a newer version of that one, or that this one has a description in that other dataset. And there will also be some display improvements, as I mentioned. Some of them are related to the Matchmaker review, the display of metadata, and comment alerts, so that you can see notifications when a comment has been added to data that you created or data you have access to. So we’re going to pause here for questions and I’m going to invite Dharma Akmon, who is the Associate Director of SEAD, to join the discussion. Let’s see, we’re going to pull up that questions window. So the first question we have here is, “Does SEAD have precautions to stop restricted data from being shared and uploaded to openICPSR?” Well as [unintelligible] openICPSR does ask you during the final steps; they do raise a flag and ask you to tell them if there is any identifiable or sensitive information that you’re uploading. And if you say “yes,” I think at that point they will ask you to stop the process of uploading the data to openICPSR and contact them for additional steps. Dharma, do you have anything else to say?
[Dharma] Yeah, I would like to add that we really want to leave this up to the repositories themselves to manage. So SEAD has made it flexible enough that repositories can implement their own rules and safety checks on this matter. So it’s up to openICPSR how to manage and ensure that restricted data doesn’t become shared or uploaded to their system. For example, if they wanted to, they could implement their own system to check things over before they get put in the system, but SEAD doesn’t do that. We just enforce what the repositories would like us to do.
[Anna] Okay, so the next question here is, “Do you know of any curricular uses of SEAD project spaces for teaching methods, class learning, the principles of survey design by creating a survey?” SEAD is such a flexible system that I can very easily see it being used in a learning, classroom environment. As far as I know it hasn’t been.
[Dharma] We have one example, a couple of years ago, of a class that used it. It was an Ecology class in the School of Natural Resources here at U of M. And they were doing a learning-oriented project that connected students who were studying ecology with local land managers in the Ann Arbor area, to connect the data that they were creating in the class as part of the learning process with the land management people. So that’s the only curricular use that I can think of, but I think, like Anna said, our system is flexible enough to allow for that. And we would welcome something like that in the system.
[Anna] “Does SEAD cost anything to use?”
Currently there’s no fee to use SEAD and currently the repositories that SEAD is working with don’t require a fee to publish data.
[Dharma] Yeah, right now I think we say that if you’re going to upload more than 10 gigabytes of data, please let us know. Because at a certain point we have to keep buying more space to accommodate that, but we don’t have any fees right now for using SEAD services.
[Anna] Another question here, “How would you distinguish SEAD from the Open Science Framework?” I’m not familiar with all the aspects of the Open Science Framework, but I can say that the SEAD difference comes from its philosophy and its unique model of allowing you to use your project space over the course of the project. You don’t have to worry about having correct or incorrect data. You can make decisions about publishing any time you want. You can decide what data you want to publish any time you want. And SEAD works with multiple repositories and can help you match your data with an appropriate repository at the time of publication. So I think this creates the unique difference.
[Dharma] Yeah, and I have two things to add to that, because actually we get asked this question quite frequently about comparison to other resources. So we’ve begun looking at them and picking them apart so that we can highlight the differences. And I think there are two main ways that we differ from Open Science Framework. One is the connection to institutional repositories, so that when you’re done you can easily transition your data to a long-term preservation environment without a lot of extra work. I don’t think Open Science Framework has that concept at all right now. And I think another way is that we have pretty sophisticated metadata management.
So you can customize what fields show up in your space so that you can manage how people on your team should be putting in metadata to annotate the things in your space. Whereas Open Science Framework kind of just has a set of what they have and you can’t really customize what things show up as optional fields or metadata.
[Anna] Okay, well, the next question here is, “For what length of time will SEAD be funded by the NSF?”
[Dharma] Right now we are entering into a no-cost extension. So the DataNet program, which is the vehicle that we’re funded under at the NSF, the period ends at the end of this month, September. We have remaining funds that we plan to spend down on a no-cost extension over the next year. And during that time we’re going to be working on a sustainability plan and a transition plan for SEAD going forward after that. So the current NSF funding ends sometime in 2017.
[Anna] Thanks, Dharma. I think our time is up. So I want to show you SEAD’s contact information here as my last slide. And I want to thank everybody for joining us for today’s webinar. And again, if you have any questions, or if we went through any topic too quickly, please reach out to us. Again, you can see SEAD’s contact information on your screen. We always welcome user feedback. And I want to thank SEAD users, who are always supporting our project. And especially at the SEN group, who I’m sure are attending today’s webinar. So thank you and please enjoy the rest of your day. Thanks.

About the Author: Oren Garnes
