International Data Panel: Sources, Research Opportunities, and Challenges


[Background conversation before presentation starts] [Linda] Good morning. Alright, I’ll try again.
Good morning! Alright! So this is going to seem late because tomorrow starts at
8:30 so hopefully you got some sleep this morning. Thank you for getting
yourselves here for the 2017 OR Meeting. We’re happy to have such a full
room here. A couple of administrative announcements, well actually there’s a
little bit of, I have some bad news and some good news. The bad news is that
Larisa is unable to join us today from Russia. What happened was with
the forced reduction at the Embassy in Moscow, even though we
were working with our Congressman Dingell here in the offices here at Ann
Arbor, she was not able to get an interview for her visa. So she’s unable
to, was unable to get out of Moscow, which is really too bad for that reason,
but also for another: she was actually the one who inspired this
session this morning, because she contacted me and said, “Hey, I’d like to
come and talk to people about JESDA.” So that’s the bad news. The good
news is that I’m not going to necessarily take you through the slides.
I’m just going to point out a couple things. I’m not going to try and do a Russian
accent or anything like that, but I do want to make sure that you realize
that what she did send us, and everyone should have one of these in
your packet, is a flyer that basically covers a lot of what is on the slides that she has
here. I have extra ones that I’ll put over when I’m walking over. I’ll actually
just put them here so you can pick them up once we’re finished. So if you want
extra, if you find that somehow the master stuffing committee missed it or
something and it’s not in your packet, you can feel free to get some extras
here. The other way to get the information is that we have already
posted these on the online program under Presentation Materials. And effectively
it’s the same information that’s on the flyer and it has very importantly the
contact information on the last slide. So this is the International Data Panel. We have Shuming from the China Data Center here at the University of
Michigan. We have Anya, who is with us from GESIS. Did I say that correctly? Very
good. And we have Hersh here from the UK Data Archive. The way we’re going to
handle it today is that each presenter will talk about their data, their
archive, how to access it, then they’re going to answer a few questions from the
audience. Do note that we are live-streaming, so hi to anyone that is
out there. So if you have a question, we’re going to have to run this
old-fashioned microphone out to you before you ask the question, and we’ll have to make
sure that you are speaking through the microphone; otherwise
all they can see is our mouths moving on the other end of the live stream. And
then the presenters will also spend some time up here for other questions that come up
towards the end or at the end of the session. So with that I just wanted to
take you through a couple of the slides here from the archive, the Joint Economic
and Sociological Data Archive at the Higher School of Economics in Moscow.
Just a couple of pointers on this particular slide, at present JESDA
is storing more than 3,500 datasets and they have a few thousand users who
visit the site every month. So it looks like they launched it in 2000. So we’re at
about, what are we now 2017? Seventeen years now and already 3,500 data sets.
For your information, I actually talked to Bobray Bordelon, who is here,
because I know him as my super international data person, and he said
that he hasn’t actually downloaded the microdata but he has downloaded
some of the statistical tables. Was that right? And then we had a staff member
who applied for access and downloaded some data and worked around with it,
so some microdata. So this is absolutely accessible to you. So there is an
application process, I think you have to wait… it took a couple days, of course we
have time differences going on, a couple days to get that access but indeed she
was able to download a few selected datasets. The depositor list, again, is
in the flyer, but you can see the type of data producers that are putting
data in JESDA. And again, she said about 1,500 surveys are available with free
access for teaching and research. Users in Russia, and all this, again, is in the
slides or in the flyer that she provided us. But very importantly, here’s the site and
her email address for any questions, and please do explore this database. I
think it looks like a great resource for you, with some great data
in Moscow and about Russia. So with that I’m going to transition over to Shuming,
and give us a moment to get through the highly technical pieces of exchanging the
slides here. I’d ask for questions regarding JESDA, but I would have to
refer you to the email address. All right, Shuming. [Shuming] Thank you. Well, good
morning. This panel is called the International Data Panel. I’m the
director of the China Data Center, but maybe you also noticed there’s a [inaudible]
called the Spatial Data Center. We started from the China Data Center, and this year
is our 20th anniversary: we started the China Data Center in 1997. All right, thank you. When we started, there were not many data available for international
studies, especially for China. So for 20 years we have been building
international partnerships with China and with various other
agencies to make China data available and accessible, and to promote China data for different China studies. But when we worked
on the China data, we realized it’s not only about China data. If you want to
understand China, you cannot know only China and have only
China data. So we started to expand our data collections and
surveys from China data to other regions. That’s why we now
also have the new name, the Spatial Data Center, expanding toward an
international data center. When we work on China, we
find that you have to have an international view. These are some pictures from our comparative studies. So that is the motivation for why the China Data Center
needs to expand to international data studies. If you
look at the data from different countries, we find it actually doesn’t
matter which country. Many people say China is special, but if you look at the data, China is not
special. If you look at [inaudible] patterns, China and the US have many similar
patterns and trends. This is the manufacturing trend in the US. We see that
in manufacturing, the number of firms and the number of
employees have been declining since the 1970s. But the GDP of manufacturing keeps growing. So this is what happened in the
US, and there are probably many stories behind it. It might be
globalization, or improvements in efficiency and productivity, or
new technology. So this is what happened in the US. If you look at
data for China, China actually has a similar trend, but probably at a
different development stage. China’s employment and number of
firms in manufacturing also started to decline recently, especially after
2010. The percentage of GDP also keeps declining, very clearly, even though total manufacturing GDP keeps growing. So China and the US have a
similar trend and a similar pattern. The only difference is that they are at different development stages. This is for manufacturing, but if you also look at
the spatial pattern, not only how the
structure changed with time, the spatial pattern
has also changed significantly in the US and in China. This is [inaudible] for manufacturing in the US. You can see that
medical equipment manufacturing has become concentrated in some regions, so finally they have the [inaudible] for medical equipment. This happened not only to medical equipment but also to many other industries. We see a similar trend in China.
If you look at China’s medical equipment, you can see it was very dispersed initially, but finally the [inaudible] is in some
region. It is not only industries but also population. If you look at the population
in the US, you can see this is based on a tract map, and we work on [inaudible]
the US data. Our US [inaudible] allows us to track how the population
changed at the tract level. If you look at the county level or
the state level, you probably cannot see what changed there, but when you
look down at the population at the tract level, you can see significant change.
The Central US and the North US lose population, right? This is from 1970 to 2010, the last 50
years. The population fell in many places in the Central and North US, so
this is enormous, even though the total population in the US has increased
[inaudible]. If you look at China, something similar happened. For China, this is based
on county data, because we don’t have a smaller unit for which
the population is consistent and comparable. This is population data at the
county level from 2000 to 2010. In just 10 years, you can see the population
loss in Central China and North China. That’s a similar pattern to
the US, even though the total population keeps growing. These pictures help us
see the dynamics in the US and China. You can see
there are significant changes in the landscape in terms of population and in terms of
industry. So we need to understand… this is a global issue. I didn’t do
anything on the European Union and other regions, but I believe there is probably a similar pattern. So we need to understand the global dynamics of
population and business. We need to understand which industries are growing or
shrinking in terms of the number of businesses, employees, and sales. And also, where are
industries growing or shrinking in terms of the number of businesses, employees, and sales? Even when some industries grow and others
decline, the change is not consistent over space: some places are
growing and some places are declining. And also, what is the link between
population and business in terms of the changes in space over time?
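
[Editor’s illustration, not part of the talk] The questions above (which industries are growing or shrinking, and where) come down to comparing counts between two years for each industry and region. The following is only a minimal sketch under stated assumptions: the region names, industry labels, figures, and the function name `percent_change` are all hypothetical and are not China Data Center data.

```python
from collections import defaultdict

# Hypothetical records: (region, industry, year, number_of_businesses).
# These figures are invented for illustration only.
RECORDS = [
    ("Region A", "manufacturing", 2000, 200),
    ("Region A", "manufacturing", 2010, 150),
    ("Region B", "manufacturing", 2000, 100),
    ("Region B", "manufacturing", 2010, 130),
    ("Region A", "retail", 2000, 80),
    ("Region A", "retail", 2010, 100),
]


def percent_change(records, start_year, end_year):
    """Return {(region, industry): percent change} between two years."""
    counts = defaultdict(dict)
    for region, industry, year, n in records:
        counts[(region, industry)][year] = n
    changes = {}
    for key, by_year in counts.items():
        # Skip series that lack either year or start from zero.
        if start_year in by_year and end_year in by_year and by_year[start_year] > 0:
            start, end = by_year[start_year], by_year[end_year]
            changes[key] = 100.0 * (end - start) / start
    return changes


changes = percent_change(RECORDS, 2000, 2010)
for (region, industry), pct in sorted(changes.items()):
    trend = "growing" if pct > 0 else "shrinking" if pct < 0 else "flat"
    print(f"{industry:13s} {region}: {pct:+.1f}% ({trend})")
```

In this made-up table, manufacturing shrinks in one region while growing in another, which is the kind of spatial inconsistency the speaker describes.
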
So those are the kinds of dynamics in our study, our research, at the macro and micro levels. For those kinds of studies, we can see that they are interlinked,
so we need a system. Mostly our research is often
[inaudible] or specialized, right? Now, if you want to understand the
dynamics of this kind of complex system, you need data on many different
aspects. So this is the China data and the US data we have been working on
over the last 20 years. For China, we started by building up a
series of statistical data, population data, economic census data, business
data, and also environmental data. Those data are comparable at
different administrative levels: one square kilometer, ZIP code, township,
county, city, and province. For US data we also have comparable
data, starting from the block and tract up to CCD, county, metropolitan area,
and state data. All these data include population from different years and
also business data from 50 years. Those data now allow us to look inside
at how the structure of population and business changes, and
also how the spatial pattern changes [inaudible]. That information
provides very important comparable data for understanding these
dynamics, the global dynamics. If we can understand what happened in the
US and China, it helps us understand what’s going on in any other
country or region, in the past and in the future. These bring challenges:
how to access these data. Currently, most
data services are similar to a data archive: you put the datasets on
a server and people download the data, or maybe you deliver
data on CD/DVD by mail. But now the data can be very huge.
For example, for China data we have [inaudible] variables across
the country at many different levels: township, county,
city, and province. For US data we have more than 45,000 variables at many
levels, from the block and tract to the county, metropolitan area, and state. You can’t
say, “I will download everything to my desktop and then start to use software
to process this data.” Now that is almost impossible.
So the only solution is that we need a data system to have these data
integrated, so the data can be integrated for easy access and also for
comparative studies. Then we can find out where the US and
China are more similar, where they are similar and where the differences are, and
you can see the kind of footprint there. So this is the Geo-Explorer that
we have been working on since 2008. When we started,
we started from a very simple GIS system, and then we gradually built this
Geo-Explorer to integrate our existing data into an online spatial
system. Now the Geo-Explorer allows you to retrieve data, generate
online maps, and identify where the businesses are. We can also
integrate data from different sources: [inaudible] data and
historical religion data. We also provide many online functions for online
analysis, such as statistics, charts, graphs, and visualizations. So this
is the interface of the China Geo-Explorer. The China Geo-Explorer has
census data from 2010 [inaudible], census data from the
previous censuses, and also [inaudible] data and economic data from different
years and different scales. It is similar for the US Geo-Explorer
that we built. For China there are many data.
We have many data for China that are very unique because we work with
the National Bureau of Statistics, so many data we actually get
firsthand, not from official publications. Many of the data you
probably cannot find in official publications. For US
data, many of the data you can probably find in census publications; you
can get them from the US Census Bureau. But the big challenge is that the boundaries
change with every census: they redefine or make
some changes to the tracts, blocks, metro areas, or anything. So even though the data
are free, working with them is impossible for most
researchers, for most [inaudible], if they don’t have advanced skills and also if they
don’t have extensive data sources. That brings some challenges. Also, the US
government doesn’t release the detailed data from the economic census and business data,
so those bring some challenges for US studies. So now we
have been working with some US companies to make these data
available and comparable across the US, and now we have also built the US
Geo-Explorer so they can be compared with China. These are some features of the US and
China Geo-Explorers. We provide [inaudible] data selection, so you
can select data by circle, by rectangle, or by polygon, or you can define your own polygons.
So if there is a region, for example, where I want to find [inaudible], I don’t have
this in any publication, but I can define my region and then retrieve
the data based on the population census data and economic data. We have
business data, and you can also generate new datasets.
For example, you can generate them by XY coordinates and also by circle. Say I
would like to find out how many people there are within
five miles of here, and how many stores, food stores, and restaurants
there are within five miles or three miles. It also provides many functions for
time-saving reports, charts, and maps, and it provides multiple formats for
data export. It can also integrate user data: you can upload your data
for online mapping, online reports, or analysis. We also allow you to do
the China-US comparison. So [inaudible]
you can pick a state, any state or any province, and find out what kind of
structure it has in terms of age structure, population, housing structure, and education, and
where the similarity is between one province and other provinces, and
between one province of China and states in the US. It is similar for
the US: you can pick any state and find which province in China is similar to
your state. [inaudible] because China and the USA can each be considered a kind of
union, right? If we compare with the European Union, every state is
similar to a country. China is similar: every province in China is at a kind of country
scale. They are different, so it’s important to
understand which states and provinces are comparable. So now, even though we
have a huge, big volume of data for the US and China, this is
still not enough to understand the dynamics of global change,
because the classification, the official classification, is not good
enough. It’s not specific enough to define this complex system. There are many kinds of
different industries. For example, for high schools we have
different kinds of high schools, private and public. For universities we have
different levels of university: there are probably top ones, tier
one, tier two, different tiers, right? So for those data, you know, there are
many different classifications that you
probably cannot find in official statistics. So now
big data provide a new opportunity for those global studies,
for those complex-system studies [inaudible]. But when we have
big data, the big data do not replace the baseline data. So we can
define the baseline data, such as the census. Many people say,
well, maybe survey data, maybe census data, maybe people will not need this
kind of data anymore. That’s not correct. Those
traditional data [inaudible];
they provide a very important baseline. We call them baseline data.
Baseline data have very good representativeness, you know,
[inaudible], and they follow standard data
collection procedures, and they also have very good quality control.
That’s the ICPSR data and [inaudible] and the many other survey
data that the public can download from different places or individual
authorities. The issue, the problem with these baseline data, is that they
can be outdated. For example, for the census data we have, the
newest one is 2010; for the economic census the newest one is 2013. So several
years have already passed. They also have limited [inaudible]: they probably only cover
what is in the questionnaires they collected, right? There are many
questions that are not collected by the census or by standard surveys.
So that’s why [inaudible]. And also
it’s very expensive, right, to collect this kind of nationwide data; it
can only be supported by the government. So big data provide a kind of
complementary [inaudible] for our baseline data. So what are the good
things? They provide very extensive coverage. Basically you can cover
everything, right, any topic. They can also provide high
spatial-temporal resolution. For example, if you use social media or your cell phone, you
can find actual XY locations, but most government data don’t
provide precise locations. And the data can be kept updated: you can
get up-to-the-minute data collected from the internet. But the issue is,
any big data you find from any source are a kind of sample data, and the
problem is you don’t know the composition of this sample. That is one kind
of issue. They are also unstructured, and many kinds of data come separately from
different sources, and there are also many missing data and a lot of
noise there. So those are the issues with big data. Those kinds of problems
with big data can actually be made up for by our baseline
data from the government or other sources. So we have been working to try
to integrate big data with the Geo-Explorer. So, basically,
to integrate big data, number one, we need text
normalization, to convert the unstructured data into structured data and
convert text data into numerical data. Then spatialization: many
data probably don’t have a specific location, so we have to identify
where the specific location is and then integrate it with our
spatial data and other data. And also visualization: with the increasing
size of big data, how to visualize the data, considering not
only the size but also the different structures. So this is
our exploratory work on how to integrate big data for China and
also global studies. So this is one: we added
web search into our Geo-Explorer. Basically you can enter any keywords,
any keywords. Say I would like to find outdoor activities, or I’d like to find
Buddhists, or I’d like to find indoor activities or indoor sports.
For those, you probably cannot find any statistics, but based on
the web search we can find where each region has more
activity, more attention, for those keywords, and we can break that down
by different provinces, different states, and different counties. Then
we can generate a spatial table and analysis, and generate the map. In
this way we can convert the data from unstructured data to structured
data, from non-spatial data to spatial data, and from text data to
numeric data. Another advantage is that, because there are unlimited keywords, we
can generate unlimited data, and those data can provide complementary data
to the baseline data for any analysis. So we built [inaudible] an
online correlation matrix, available immediately once we get the results
from the search by the different keywords. We did some pilot studies
for [inaudible] in China, and the results are very consistent with our
analysis using data from the economic census. So this is big data number [inaudible]: how
to integrate data from web crawling. We can crawl data
from different papers, different [inaudible] papers, and then find out the
locations, say, the spatial distribution of the authors and
the spatial distribution of different topics, like crimes,
and then we can generate a map and [inaudible]. So these data are converted from
text data. And this is big data from social media. We can use
QQ and Twitter data and find where the topics are,
where the people are, and where the topics on Twitter come from in different regions, where
people use QQ, from different places, and
at what time and where. So they can provide up-to-the-minute, updated data.
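
[Editor’s illustration, not part of the talk] The keyword workflow described above (unstructured text in, a table by region out, then a correlation against baseline data) can be sketched roughly as follows. This is not the Geo-Explorer’s implementation; it assumes documents have already been tagged with a region, and the documents, region names, baseline figures, and function names are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical documents already tagged with a region (e.g. by geocoding).
DOCS = [
    ("Province A", "Weekend hiking and other outdoor activities"),
    ("Province A", "New park opens for outdoor activities"),
    ("Province A", "Outdoor activities festival announced"),
    ("Province B", "Indoor sports hall opens downtown"),
    ("Province B", "Outdoor activities club founded"),
    ("Province C", "Local news roundup"),
]

# Hypothetical baseline figures per region (e.g. from a census table).
BASELINE = {"Province A": 30.0, "Province B": 12.0, "Province C": 2.0}


def keyword_counts(docs, keyword):
    """Count, per region, the documents whose normalized text contains the keyword."""
    needle = keyword.lower()
    counts = Counter()
    for region, text in docs:
        # Text normalization: lowercase and collapse whitespace.
        normalized = " ".join(text.lower().split())
        counts[region] += int(needle in normalized)
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


counts = keyword_counts(DOCS, "outdoor activities")
regions = sorted(BASELINE)
# Structured, spatial table: one row per region.
table = [(r, counts[r], BASELINE[r]) for r in regions]
r_value = pearson([row[1] for row in table], [row[2] for row in table])
for row in table:
    print(row)
print(f"correlation with baseline: {r_value:.3f}")
```

With more keywords, the same per-region counts could be arranged as columns and correlated pairwise, which is one plausible reading of the online correlation matrix mentioned in the talk.
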
So, the challenges here. There are many opportunities now in how to
integrate big data, baseline data, and survey data for China-US and
global studies. The challenge is that we have more diverse data
sources and increasing data size: the data size is getting
bigger every day. So it’s [inaudible], how to improve our
working efficiency, because for these kinds of procedures you probably need, you
know, a series: you need to understand what kind of software processes the
data, the GIS. Also, for interdisciplinary studies the entry cost is
pretty high; you need to understand different software. So lastly, our
future directions: we are working on how to develop scalable spatial data
services for the increasing data size, and also how to build
workflow-based spatial data services, so that different procedures can be
finished on just one web platform that can be configured. That can improve
research efficiency [inaudible]. And also knowledge-based
spatial data discovery. Most people work in
probably one field, but when you work on, for example, a complex system, you
need to work across different fields. Knowledge-based spatial data discovery
can help you find data, find topics, and find relationships across
boundaries, across different fields. So those are some of the projects
we are working on. Thank you. [Linda] I guess you want to record there, so
Kenny. [Audience member] All right, shall I look toward the camera? Thank you very much. This is very
interesting material. I went to the website, and I have two questions for you. One
is about the 2010 data and the other is about access. I went to your website and
I don’t see tables for 2010 in the area that I was looking at, on population census
data, so my question is: are they available from your website? [Shuming] Yes.
[Audience member] Excellent. All right, well, maybe I went to the wrong
place. [Shuming] There are two different places.
For the census data, under China Census Maps you can find the census
data for 2010, 2000, and the economic census. [Audience member] But if you want tables,
can you get them there also? [Shuming] Yes. [Audience member] Okay, all right, great. I will explore
further. [Shuming] There’s a map, and you can also download the data as an Excel file. You
can switch between, say, city and county, and turn the data into
an Excel file. [Audience member] All right. Now, the second was about
access. Is this particular one free? Because one of the problems when I
go to access it is that I don’t know if my library has a subscription or not. [Shuming] [inaudible] have full access to our datasets,
and we also provide, if you didn’t see it, the free [inaudible], the free mapping.
With those we try to make the data accessible to everyone, on different topics
and at different levels. So for our data, if you don’t have a subscription,
you can always use this free mapping. It is the same data,
but you get access to the map; you don’t have access to the full
dataset. [Audience member] Okay. And so from this I could get, say, a table
of age by sex? [Shuming] Yes. [Audience member] By county? [Shuming] By county. [Audience member] All right, great. Thank you very much. [Shuming] Okay. So we are trying to make the data
accessible to everyone, but, you know, it depends on different levels, because we
also need to survive, right? [Alicia] Hi, I am actually from Minnesota. My name is
Alicia and I have a question related to that. We have a
subscription to China Data Online and the China Geo-Explorer, and I always
struggle when I’m on your website figuring out what we actually have
access to, what we don’t, and what the
difference is. So if you could maybe
speak a little bit to that; I just get really confused when I’m trying to
figure out, you know, what we are paying for and what is freely accessible. [Shuming] So if
you see the arrow here, that means you have access.
Also, up here you can see My Account; My Account will show you what
you have access to. [Alicia] Okay. So what is the difference, I guess? When we have
that subscription, what are we getting that people who don’t have it don’t get?
[Shuming] You get access to all the data behind it, so you can, you know, search
[inaudible] and retrieve the table. If
people don’t have access, then
they don’t have this option; they will not get access to the full data. [Alicia] So is
it more the geographic units that are different? [Shuming] Okay. If you don’t
have access to the China Geo-Explorer, you can only get access to the map, not the data
behind it. [Alicia] Okay, I see. [Shuming] You can use
the map [inaudible], but not the original data. [Alicia] Okay, great,
thank you. [Shuming] [inaudible] access to the original data, but there are also many different
functions for you to query the data, extract the data, and [inaudible] the data. [Alicia] Okay, great,
thank you. [Shuming] Welcome. [Anya] My name is Anya. I’m from GESIS,
the GESIS Data Archive in Cologne, Germany, and I’m very happy to be here
and to be able to present GESIS and the data archive to you. Okay, so first I must
say I actually chose a bit of a different focus. I’m not going to be
talking so much about our data; I’m actually going to talk more
about our services in general, ranging from research data management to archiving,
preservation, and data access. First of all, I would like to introduce GESIS
to you. GESIS is a research institute, but it’s also a service and infrastructure
provider for the social sciences in Germany. We offer research-based
services, which means that almost all of our employees are researchers. They have
a research topic that they work on for a third of their time, and for two thirds
of their time they provide services to the research community. We
provide services all around the research data lifecycle that you can see
here. Researchers can search for information and for data at GESIS; we
help them with study planning; we offer a pretest lab where they can test their
survey questions; we help them with sampling methods; we also provide
workshops on data analysis; and, which is why I’m here today, we offer data
archiving and data registration. I’m working at the Data Archive for the
Social Sciences, which is actually the oldest department of GESIS. It was
established in 1960, which makes us two years older than ICPSR, as I learned
yesterday. Ten years ago, when GESIS was established,
we became a department of GESIS. Also, as I learned yesterday, if you want to
know about an institution you should read its mission, and our mission is to
advance social science research by promoting wide data sharing and
providing a rich data resource. We are also part of CESSDA, the
Consortium of European Social Science Data Archives. There are currently 16 members
in CESSDA, and each member is represented by a single data archive. One of our
partners is also the UK Data Archive. Recently, this year,
CESSDA became an ERIC, which probably doesn’t mean so much to you, or you’re
not very familiar with this term, but it is actually a long-term
establishment of a research infrastructure, so this year CESSDA
was put on very stable financial ground. CESSDA provides large-scale,
integrated, sustainable data services, and it’s working on improving the
possibilities for researchers across Europe to engage in research data
management and make their data available to other researchers. One
important part of CESSDA is training in research data management, and the CESSDA
trainings are actually also hosted by the GESIS Data Archive. We have very
comprehensive expertise in a number of areas: data discovery, data
usage, data processing, preservation, and support. I’m going to talk about
these five areas in more detail in a minute. I just want to point out to you
that in recent years we went through some very profound structural
changes at the GESIS Data Archive. In the past we provided all of these
services in one package, and it was always a very cumbersome process to
tailor our services to each of the customers that came to us
and wanted to use our services. Over the last few years we broke down our
products into different modules, which are presented here in these boxes. They’re
not exactly the same modules that we are going to roll out soon;
those are going to be much more detailed. However, with the modules that we
created we will be able to offer much better tailoring to our customers,
that is, to the needs of the researchers who come to us either to use our data or
to preserve their data. First of all, I’m going to talk about data
discovery. Here, of course, we have our data catalogues, in which you can search
for studies at the study level, and for some very central and very important
studies you can also search at the variable level. Every dataset
that comes into our archive is registered, so it receives a DOI. da|ra,
which is the German DOI registration service for the social sciences, is also
hosted at GESIS, and we collaborate with DataCite. Regarding data usage, we have
our data service, where we offer about 5,700 national and international studies.
We also host three research data centers; in total GESIS has five
research data centers. These data centers, as well as GESIS itself, are part
of a very large German research data infrastructure. A research data center
usually clusters around one large study that is typically very complex, and it
provides its users with very comprehensive service. They advise users on how to
analyze the data, and they also provide workshops on data and analysis, and
very often workshops where people using the same study can connect with
each other and network. This is also where ICPSR comes in: through our
data service, German researchers can access the ICPSR data. This is my
role at GESIS as well: I’m the OR for ICPSR in Germany, and for now all the
data requests to ICPSR go over my desk, so to say. Also, when we have highly
restricted data, data that cannot be anonymized, for example expert interviews,
or data to which we add additional information at the municipality or street level,
this can be accessed in our Secure Data Center. It’s a physical enclave in
Cologne. We are working on collaborating with other enclaves so that not everyone
has to travel to Cologne, and we are also working on remote access, but
these are plans for the future, for now. On processing: here we will
distinguish between a standard and a premium level of data archiving in the future.
Before, it was more like one service that we offered to everyone, and we always
negotiated what was best for a given project. From now on we will offer our
standard level of data archiving, where we perform ingest control, of course, and
documentation at the study level to make the study findable in our data catalogue.
However, for larger studies that are very important to the field and reach a
lot of researchers, we will offer a premium level of data archiving, with added
value. Here we will do intensive checks, and we offer cumulations
when studies cover different countries, for example, or different
years, and we provide very comprehensive documentation at the variable
level: our so-called variable reports, which include the original question and
also frequencies and crosstabs. The standard level of data archiving
will remain cost-free, as it has always been, but for the premium level we will
charge, because we will put a lot of manpower into it in the future.
On preservation: we offer long-term preservation using bitstream preservation, and we
document the data according to international standards. We are also
certified as a trusted digital repository. This is another
point where ICPSR comes in: all the studies that get into our archive
are also listed at ICPSR, so all the GESIS studies that we have are
accessible through ICPSR, or at least findable through ICPSR. Then we also have
self-deposit available. That’s a very low-threshold platform for smaller research
projects. We do some curation for the self-deposit, but it’s not as extensive
as in the standard archiving process that we offer. Here we check, for example,
for anonymization, to make sure that no person can be identified
through a dataset that comes in through datorium, our self-deposit
service. datorium can also be used, for example, to store your syntax: if
a journal requires you to upload the syntax that you used for the analysis
in your paper, you can store it with datorium. The last column is support. Of
course, we advise survey researchers on every step in the research process, and
we always welcome researchers to come to us very early.
For example, it is unfortunately often the case that researchers want to store
their data with us but didn’t obtain informed consent from the people
they surveyed, and without this consent they
cannot store the data at an archive. That’s why we always reach out to
researchers very early in the process to talk to us and discuss informed
consent with us. We also advise on data anonymization and data sharing. Of
course we also provide training, and this is partly done through CESSDA, as I said:
we offer CESSDA training on research data management, data processing, and
data sharing. But we also have our own classes, which we run in-house in
our workshops, and we also visit our clients: if it’s a research institute or
a company that wants our service, then we travel to them
and provide training. And we don’t only have researchers as our customers; we
have also, for example, provided training to Telekom, the company behind
T-Mobile. We also develop tools. A lot of effort is put into search tools to
make the information that we gather at GESIS accessible to everyone.
For example, we also have a tool for data harmonization: if you use multiple
surveys, you can smooth out the differences between the variables with
this tool. We also advocate open science. I guess data sharing is open
science, and this is our mission. Here I have just listed some projects, products of
ours, and communities we are part of that all work towards open science. I’m
sure there are many more, and if you look at GESIS there are even more, but this
is our mission: to make data open and to enable open science. And this is
already my last slide, and I’m happy to take questions. [Audience member] Hi Anja, could
you clarify a statement you
made? You said that the GESIS studies were findable at a high level through
ICPSR. Other than the German national election studies and Eurobarometer
and the candidate barometers, I’m not aware of any study showing up in the
ICPSR interface, so what did you mean by that? [Anja] I think that all the studies are
reported, that they’re listed at ICPSR. That’s what I was
told. I can check into that if it’s not working. [Audience member] They’re not, and that would
be, you know, really amazing. I would love to see the same with UKDA and the
same with others because, you know, in your country you’re
wonderful, you provide the data for free, but of course we have to remember to
look in the different services, so having that cross-listing would really open up
research a lot. [Anja] Mm-hmm, yeah, I will check that. [Audience member] So you talked about the restricted
data, you know, the Secure Data Center, so I’m interested in who is eligible to
access the data. Is it only for German citizens, or EU people? Or if
a researcher in the United States wants to see the restricted data, if they
go to your center, can they access it? [Anja] So, we
tailor, or we target, our services to German researchers, because they
pay us, but our data is generally accessible to everyone. I don’t think
we have ever had a foreign researcher come to the Secure Data Center yet, because
it’s also a bit far to travel for most of them, but I’m pretty sure it’s
accessible for foreigners as well. [Audience member] Thanks, Anja, a very interesting talk, and nice
to hear what European colleagues are doing as well. I have a follow-up
question on restricted-access and secure-access data collections. You said that
at the moment you physically have to visit GESIS to access those collections,
if you’re eligible to do so, but you mentioned the possibility of having some
kind of remote access system in the future. Do you know where GESIS is with
plans in that regard? [Anja] They have been working on it for a while already. I think
the problem is still that you don’t know what a person is doing in front of
their computer if they’re at home. In the Secure Data Center you really cannot
take your phone, your USB stick, anything with you, no camera, and you
can’t have this control with remote access. And maybe we
don’t really have this differentiation between levels yet. The Secure
Data Center data is very, very sensitive, and maybe this data will always remain
in the physical enclave, just like here at ICPSR. So far I can’t think of
any data we have that is suitable for remote access, because we just don’t
have this distinction yet between the different access routes. But yeah, Debbie
Bishop is working on it, so maybe she can tell you more, or I can get
back to you and tell you more about it. [Audience member] Yeah, my question is about, you know, the
contents that you have in your system. Do you actively seek out materials, or are
you primarily just sort of waiting for researchers or institutions to deposit
with you? [Anja] In the past it was more of a waiting position, but that changed through
the restructuring; it changed quite a bit. We still have a lot of incoming data
just by itself, but we have also started looking at what’s out there, what
could be premium data that we can acquire, and offering premium services
to make it available. Fortunately, there are a lot of researchers who come
to us anyway to offer their data, and they are aware of data sharing and
how important it is. When we find other very important data, those researchers
often don’t really appreciate the data sharing philosophy; they’re very
hesitant and very worried about data protection, so we
actually have to work a lot on convincing them, unfortunately. [Moderator] Okay, anyone
else? [UK Data Service presenter] Well, good morning everyone, and thanks
for having us here at this ICPSR meeting. It’s always nice to be in Ann Arbor. Now,
I always like to start these things with a bit of interaction with the audience,
but because it’s early and it’s the first session of the day, I’m not going
to make it difficult for you. It’s going to be really, really easy. Who in the
audience today is registered with the UK Data Service and is already using the UK
Data Service? Okay, there are a few hands there. That’s good, that’s positive. Who
has at least heard of us but isn’t using us? Okay, there are a few more. Okay, and
where are the people who are completely clueless and have no idea what’s going
on? Yeah, me as well. Okay, so this is a quick overview of what I’m going to be
covering today. I’ll introduce what the UK Data Service is, and I do mean the
UK Data Service, not the UK Data Archive. I will talk about the UK Data Archive as well,
because that’s where I work: I work at the UK Data Archive, but I work for the
UK Data Service. I’ll explain how you can register and access material from us
and go through some of the additional supporting resources that we
have for you. If you do download any material from us, then, you know, you do
have access to a vast wealth of resources, so that we can help you
understand what you’ve got. Then I want to highlight some of our
most important, most popular data collections. Hopefully, even if
the particular titles I mention aren’t familiar, you’re at least familiar with
the kinds of data that I’m referring to, so that, you know, you can
think of what a US equivalent or a North American equivalent would be. And
I’ll explain how you can keep in touch with us and get further help.
Well, the UK Data Service: we think of it as a one-stop
shop. It’s a comprehensive resource funded by the Economic and Social
Research Council. I suppose the closest equivalent in the
United States would be the National Science Foundation. That’s where our
money comes from, and we are here to provide you with access to data for
teaching, research, and secondary analysis, and then to support you with
whatever work you’re doing, or your students are doing, or the academics
you’re supporting are doing. And we’re here, of course,
to train and provide guidance on a very wide range of research data issues.
Those of you who work with ICPSR are familiar with ICPSR’s work; well, you
know, we do broadly the same things. Anja was talking about a lot of the
activity at GESIS, and those are all really familiar things to us too.
We also want to be developing best practice. Anja mentioned some of the
problems of getting people to do the work early on during their academic
projects: to get data in good shape and make sure they have a good research data
management plan. That’s really, really key, because if you’re funded by the ESRC in
the UK, if you’ve got money from the ESRC for some
academic project, you are obligated to share whatever data you collect at the
end of that project and deposit it with us at the UK Data Archive. You’d
be amazed how many people get fundamental things wrong because they
don’t come to us early on in the process and talk to us and get some advice. You
know, we are here to help with those things, but you’d be amazed how many
people do simple things wrong. I’ll come back to some of these
things during the course of this talk. Well, the UK Data Service has a number
of different institutions involved in it, and depending on what kinds of data
you’re interested in using, you will get support and help from one of these
institutions. If you’re using UK survey microdata, then we have
specialists at the Cathie Marsh Institute at the University of Manchester who
support that. If you’re using census resources, we have colleagues at
University College London, EDINA at the University of Edinburgh, the University
of Southampton, and Jisc in Manchester as well. So depending on what kinds of
census information or what kinds of aggregate data you’re using, you can rely
on help from one of these different services. When I first started using survey
microdata, I registered with the UK Data Archive. This was quite a long time ago,
before the UK Data Service came into being in 2012 and before the
previous service, the Economic and Social Data Service, came into being in 2003.
What that meant was that I had access to the UK Data Archive, but I didn’t have access
to all of the expertise that exists in these other partner institutions. The UK
Data Archive is the largest service provider for the UK Data Service, and
I do want to take a bit of time talking about the archive, because I always get
asked lots of questions about what we do and who we are and where we are. So,
as it says on the slide, we are the curators of the largest collection of
digital social science and economic research data in the UK. We are a
department of the University of Essex. The University of Essex is very much a
social sciences institution, and we benefit a lot from working with the
academic departments there, and we provide a lot of expertise to those
departments. We’re celebrating our fiftieth anniversary this year, so we’re a
little more youthful than ICPSR and GESIS. We are involved in all kinds of major
projects. Anja was already talking about CESSDA; we’re CESSDA members as well. The
UK Data Service is of course our biggest project, but we also coordinate the
Administrative Data Service from the UK Data Archive. So I’m just
going to tell you a little story about how we came into being, because this
helps to explain who we are and how we came about. It was
in the early to mid 1960s that the Economic and Social Research
Council, or at least whatever the ESRC was called back then, started to notice that
research data were being lost in the UK. You know, they were funding these
projects and data weren’t being shared, work was being replicated, and some of the
data was being sold to institutions elsewhere as well. So they set up a
committee to look into the possibility of setting up some kind of archive, and
they invited proposals, and some bids came in. There were three that made it to
the final shortlist. The first one came from a department called Political and
Economic Planning, PEP, and they were based at the LSE. There was a second one, which
came from the research funders themselves. They had an interesting idea,
which was: well, we’re funding the research, so it makes sense that all of the
data comes back to us and we look after it. And then the third bid came from
these promising up-and-comers, these upstarts from the University of Essex, a
very young institution; it had only been around for a couple of years back then.
Now, it became quite clear after a while that locating the data archive in
London was going to be really quite expensive. The PEP bid was basically the
LSE, and it kind of relied on the fact that, well, you know, we’re the LSE, we’re in
London, we’re important, we should host this thing. And yes,
the Social Science Research Council bid was more interesting. It relied on the
fact that, you know, well, we’re funding this, so we should hold it. But they
didn’t really have the accommodation, it was going to be expensive, and they didn’t
have the computing infrastructure there either. So it became apparent
quite soon that the University of Essex was the leading institution in this
particular process. So PEP and the research funders actually got together
to try and defeat this bid from Essex, and, well, it didn’t work out: it was going to
be too expensive, and the computing infrastructure didn’t exist at the time
at those institutions. So the data archive came to the University of Essex.
An unkind person would say that’s because LSE stands for Loser School of
Economics. I wouldn’t do that, you know, I wouldn’t do that,
even though I’m a three-time graduate of a political science department at Essex,
a rival department. I wouldn’t say anything
so disparaging about our colleagues at the LSE. [name unclear] isn’t here, so I
can get away with it; those of you who are involved in IASSIST
will know him. So, the University of Essex, then: where on earth is that? Well, if
you’re in central London and you drive northeast for 50 or 55 miles, you come to a
Roman town called Colchester. It’s Britain’s oldest recorded town. Some
people around here have been there, I know. The University of Essex sits on
the edge of Colchester in Wivenhoe Park, and this is Wivenhoe Park. Looks idyllic,
doesn’t it? Well, I mean, this is how it looked to John Constable in 1816. There
aren’t cows anymore, but there are geese and ducks and swans and lots of rabbits
and squirrels. And this is the more modern-day view of the University of
Essex. So the data archive is there, and we are on the edge
here. This photo up here is actually about three or four years old,
and there are at least four new buildings that have gone up since it was taken;
it gets bigger all the time. To the north you can see here the town of
Colchester, not very interesting. [unclear] landowners of Wivenhoe Park at the
time, and you can actually see it, 250 years on, and it has an interesting history.
During the Second World War it was requisitioned by the British Army, and it
actually became the headquarters of the SAS. The SAS aren’t there anymore, and
that particular building is now a hotel. It’s actually a teaching hotel, and
it’s a really, really nice place to stay if you ever come and visit us at the
University of Essex. So these are the kinds of data that you can access from
the UK Data Service. There are lots of big microdata surveys there. We also have
various international macrodata databanks, based on things like data from the OECD, UN,
World Bank, International Energy Agency, things like that. There are all kinds of
census resources: aggregate statistics, flow data, census microdata.
We’ve got census microdata at the moment from 1991 to 2011, and we’re expecting
more for earlier periods to come along. We also have a lot of qualitative
and mixed-methods data resources as well. For those of you who were here for the
OR boot camp yesterday, [name unclear] talked about the fact that we have
quite a lot of qualitative collections as well. Now, the different collections
that we have have different levels of access, and those three icons that you
can see at the top indicate which level of access they sit under. Obviously
there are lots of open collections, freely available to all. Beyond that, we
have what we call safeguarded collections, which at the very least
require you to register with the UK Data Service and accept our end-user license.
That simply means that, you know, you promise that you won’t pass
these data on to unauthorized parties, you won’t try to identify any individual
from the microdata, things like that. Beyond that, you might have some
additional agreement. Some of them are just, you know, click licenses; some of
them might require that you fill out an additional form and get permission from
the original depositor. Some of those are restricted only to
researchers in the UK, and that kind of links to some of the restrictions that
we have on controlled data. These are the most highly sensitive, most highly
restricted data collections that we have, and they sit in our Secure Lab.
Anja was talking about the research data enclave that they have physically
at GESIS. With our most restricted data collections, we do provide remote access
to researchers based in UK institutions, UK institutions that
receive money from the ESRC in particular. The reason for that is
not that we wouldn’t want to give any of you, or any researchers in other
countries, access to these data collections. The reason for the
restriction is that there has to be some sanction that can be applied against you,
and if you’re from an institution that doesn’t receive money from the
ESRC, then, you know, if you’ve misused the data, there’s no penalty that can be
applied against you. So if you are from a UK academic department and you misuse
what we have in the Secure Lab, and, you know, we do spot things that go
wrong occasionally, nothing very serious, then we
do occasionally ban individual departments. I don’t think any department
has yet had ESRC funding withheld completely, but you don’t want to be that
academic who, you know, takes something out of the Secure Lab and reports it
without us having checked it, and then for all of your colleagues in your
department suddenly to have all of their funding cut off because of one stupid
mistake that you made. So I am going to talk just a little bit about that,
even though it’s not accessible to anyone outside of the UK,
because it is a really, really important resource. We think we’re doing quite well
in relation to supplying access to these materials, but it is incredibly
resource-intensive, the way that we have set up our Secure Lab. First of all, you
have to go through an expensive half-day of face-to-face training that we hold in
London a couple of times a month, where you’re taught, you know, this is what you
should do, this is what you shouldn’t do. We teach you how to understand
statistical disclosure control: if you’re reporting statistics, these are the
parameters that you have to adhere to, or if you’re doing any kind of complex
econometric modeling, these are the kinds of coefficients that you should withhold
in order to ensure that your output is safe. And [unclear]
we physically check twice, you know, with real people, every single
thing that’s produced that a researcher wants to have released from the
Secure Lab. It’s incredibly resource-intensive work, but I do have to mention
it because it’s my team, you know, my staff, who actually perform that work, and
I want to take my hat off to them because they do an amazing job. So when I
talk about the UK Data Service, a lot of people say, yeah, but I’m not based in the
UK, how can I get access to it? Well, it doesn’t matter. You know, anyone can
register with the UK Data Service. If you’re outside of the UK higher
education system, it just means there’s an extra step that you have to go
through, and that step is to apply for a
username from the UK Data Archive. There’s the URL to do that. If
you apply for credentials that way, you’ll get an automated message back
that says we’ll deal with this in three days. Usually it’s the next day that
someone will supply your UK Data Archive credentials, and then you can use
them to log in and complete a registration form, and then within minutes you can be
downloading to your own desktop any one of seven thousand data collections.
Within minutes, literally; it’s that easy. You just need to complete the
registration form and accept that end-user license. And then once you’ve
got access to your material, we are here to help you understand what you’ve got.
You know, we receive thousands and thousands of data queries
every single year. Typically people are, you know, asking questions like: do you
have data on this, where can I find, you know, whatever it is. And then those
people who are using data have questions about how to analyze what
they’ve got and how to understand things: you know, why is this documentation
file missing, what on earth does this variable mean, things like that. We deal
with these questions all the time. Obviously there are lots of questions
about research data management as well: trying to explain to people what their
legal obligations are, this is what you should and shouldn’t do. And then of
course Secure Lab support takes up a lot of time. We’re creating resources
online all the time. There are lots of video tutorials and written guides, and then
we produce data for the classroom as well. Something that we try to
encourage instructors and teachers to do is to come to us and use materials that they
find in our data collections. We have lots of specific data collections
that they can use in the classroom, but we also encourage people to create their
own instructional materials and then send them back and deposit them
with us. Some people might find that there’s a particular survey that they
like, but, you know, it’s got thousands of variables in there, there are
millions of cases, and you don’t want to start with that. You know, if you’re
teaching undergraduates who are new to this sort of thing, you don’t necessarily
want to start off with a data collection that contains millions of cells. You
maybe want to cut it down, and if you don’t find anything that
meets your requirements, why don’t you create your own one? Send
us the syntax, we’ll recreate it, and then maybe you can deposit it with us.
We also do lots of face-to-face training events across the UK as well. We
talk to academics and students and librarians and other researchers,
explaining to them what we have, just as we’re doing today. So
here are some of the most widely used, most popular data collections that we
have. There might be some names up there which are familiar to you. In the
session that’s following, I know there’s going to be a talk from ISR, and they’re
going to be talking about some of their projects. I looked at some of the surveys
that are mentioned there, and we have UK equivalents of those. I mean, the American
National Election Study is one, where we have the British Election Study at the
UK Data Archive. It’s not listed up there, but that’s one that’s very popular;
it’s one that I know very well. Another one that was mentioned was the
Health and Retirement Study. The English equivalent of that is the English
Longitudinal Study of Ageing, ELSA. Lots of you will be familiar with the Panel
Study of Income Dynamics. The UK equivalent of that would be the British
Household Panel Survey and now Understanding Society. The BHPS started
in 1991, following five and a half thousand households, twelve thousand
individuals, and ran for 18 years. That survey has now morphed into, has now
been combined into, the UK Household Longitudinal Study. They’re
aiming for a hundred thousand individuals in that, 40,000 households.
It’s one of the biggest household panel surveys in the world, and it covers a
really, really broad range of topics. The first three surveys up there are birth
cohorts. The National Child Development Study started in 1958,
following 17,000 children born in one week in 1958. They are approaching their
60th birthdays now. There have been nine sweeps of data from that so far.
The 1970 cohort is a follow-up study, and then the Millennium Cohort is another
one: children born in 2000 and 2001. The bottom two there are
cohort studies of young people. We’re hopefully going to get access to a new
study of ageing in Scotland. It’s called Healthy
Ageing in Scotland, which gives you the rather amusing acronym of HAGIS. We don’t
have data from it yet, but we’re expecting it quite soon.
I think haggis is banned in the United States, isn’t it? You can get it in Canada,
can’t you, but I don’t think you can get it in the
United States. Go over the border if you want to sample something that’s
missing. And then we have lots of international macrodata. These are the
databanks which are based on IMF, OECD, World Bank, and UN sources. Most of those are
open. They used to be restricted to people in UK higher education, but
most of these are now open. There are some restrictions relating to
International Energy Agency data and UN data, but a lot more of these materials
are now freely accessible to anyone, and you just look at them in your own web
browser. That’s the .Stat interface that you can see there. You don’t need
any special software to actually start building up tables and things like that.
And then we have data from the UK census. We’re going to get more microdata, so
that timeframe will expand eventually. For aggregate statistics, you
can use an online tool called InFuse and build up your tables using
InFuse. The boundary data sources are supplied and supported by EDINA
at the University of Edinburgh. A lot of those are open; there are lots of
shapefiles and things like that. Some of them are restricted, so some of
them are behind a login, and that login will identify whether you’re in the UK higher
education system or not. The flow data, supported by UCL, are mostly
restricted to researchers in the UK, but then there are some microdata
sources that are available more widely. If you do want to start exploring some
of these collections of data, I always tell
people that a good place to start is our Key Data page. If you go to the UK
Data Service home page, there are three steps to get there. First start on the
Get Data item right at the top of the page, then select Key Data on the left
there, and then use those tabs in the middle to search through whichever
sort of topic it is. You’ve got UK surveys, cross-national surveys, [unclear],
international macrodata, census, [unclear],
and so forth, and then they’re all listed underneath. I think that’s a
good place to start. You can of course use the search tool; it’s on
the front page there. Just type in a term and then use the facets on the left-hand
side to start targeting your search more precisely, according to spatial
identifier or time frame or whatever it happens to be. I don’t need
to explain how to do that to a bunch of librarians, I’m sure. And then we also
have the Variable and Question Bank as well, so you can search according
to the text of questions as they’ve been asked in those surveys.
we’re very interested to know what people are doing with with data so if
any of you are inspired to go away after this and actually start using our
resources, get in touch with us. Let us know what you've been up to,
whether you've been supporting data for teaching or research or you've done
some work of your own, and let us know if you've created any publications. I
mean, it's part of our end user license that if you do create a publication
you let us know so that we can add it to the online catalog. You know, it's a
win-win situation for everybody: it promotes your work, it makes it more
visible to everybody, and we'd love to feature any of you in a case study if
you've been doing something interesting. So do please let us know; we'll tweet
about it and share it with the world. I just want to take a little bit of time here
just to explain the kinds of things that we're up to at the moment.
UK Data Service phase two actually began this month. UK Data Service
phase one ran from October 2012 up until September 2017; we're now in phase
two, and the biggest change in activity for phase two will be that we
will now integrate census support more fully than we have done before, and
the other activity that we've been working on quite recently is our user
experience program. This is being headed up by Katherine McNeil, who some
of you will know, who previously was at MIT and is now my manager at the UK
Data Service, and she's been looking extensively at what people are
doing with our materials. As part of this user experience program,
we've had a survey that's been running on our web pages for the last few weeks.
I think it closed a couple of days ago; when I last checked we had had
something like eight or nine hundred responses, so there should be some good
materials in there. Do please stay in touch. Here are some ways in which you
can do that. Like I said, we're very interested to know what people are doing
with our material. Sign up to a mailing list to stay in touch with new data
collections and new activities. I'm very happy to take any questions if you
have any. Thank you. No? Okay, that's easy. [Audience member] Always one to
ask a question. Forgive me for monopolizing the Q&A, but you have a brilliant
website. I enjoy searching on your website just to see what's up
there, and I just really applaud what you've done. My question has to do with
the labor force surveys. I just went up there and looked, and
you have both the Eurostat version and the UK version, and you caution that
the two are not comparable, and I'm just wondering what the issue is
there. [Presenter] This is really a question for my colleagues at the University of
Manchester, but the Eurostat one is part of a wider research program that
they have, where they've tried to harmonize quite a lot of the material.
We only have the UK element of that particular survey, the
cross-European one if you like. If you do want access to the European labor
force surveys, you actually have to go directly to Eurostat and negotiate
access with them, so our collections only include the UK element of the
Eurostat survey. As far as comparability goes, I'm not sure what the issues
would be there, but you can get the UK quarterly labor force survey from the
UK Data Service, so that's not a problem. I can't give you a more explicit
answer than that at the moment, but thank you for the comments about the
website; I will pass that on to the comms team. I'm around
for the rest of the meeting; if anyone has any questions, do please feel free to
come up and ask whatever you like. Thank you. [Audience member] Okay, so for
your aggregate data, the census data, do you have the GIS maps to match the
[inaudible], and what kind of scale do you have? [Presenter] We do have
shapefiles and mapping tools to use with the census microdata, and those are
supported by our colleagues at EDINA at the University of Edinburgh. I think it
should go back to 1981, but I'm not 100% sure about that. [Audience member]
What's the [inaudible] boundary of 2011? Is it kind of similar to a tract,
[inaudible], kind of neighborhood? [inaudible] Okay, okay, we'll continue this
later. Okay, thank you.

About the Author: Oren Garnes
