W4A2016 Session 6 & 9 Doctoral Consortium Presentations

– Ok, well as we wait for Tiago to run and get our tech support people back I’ll just give you guys
a quick introduction. So I’m Erin Brady, I’m
one of the co-chairs of the Google Doctoral Consortium for W4A along with Volker Sorge. So we had our doctoral
consortium on Sunday the 10th. We had– Oh, great, thank you. Perfect. Ok, so this is a picture
of our doctoral consortium. We had six students with us and four panelists
for a half-day session. Each of the students was
able to give a 20 minute presentation about their
ongoing PhD research and then get 20 minutes of feedback from experts in the field. So first of all we’d
like to thank our panel. It was chaired by Rahman from Google, and then our panelists were Chris Bailey, Hiro Takagi from IBM Tokyo and Yu Xiang from Google as well. We also wanna thank our review panel, Shadhi, Chris, Denal, Simon, Ahbi, Phillip and Elise and the University of Quebec at Montreal for hosting us, and then also thank you to Google for their generous funding for student travel and accommodations. So with that, we’re gonna have three of our doctoral
consortium presenters give lightning talks in this session. The other three will
present in the afternoon. And so we’ll get started with Michael from the University of Waterloo. – Hello everyone, I guess
I’m Michael Cormier. My supervisors are Richard Mann and Robin Cohen, and we also work very closely with Karyn Moffatt of McGill University. I’m going to be talking about the use of computer vision-based analysis of webpage structure for assistive interfaces. So we define page structure analysis as the process of determining the semantic structure of the contents of a page. The typical approach to this uses analysis of the DOM tree of the page for evidence about the semantic structure. Now this is convenient to access, but the implementation structure may not match the semantic structure, and it is implementation-dependent. Our proposed approach is to analyse an image of the rendered webpage for evidence about semantic structure. Now this allows us to
use the representation created by the page designer to convey the semantic structure to users. So what can vision do
that the DOM tree can’t? Well, it gives us
implementation-independence. We can see inside
infographics and Flash objects, and we’re not dependent on any particular coding conventions or techniques. What interfaces can a backend based on this system support? Well, we’ve been looking at decluttering, screen readers and region magnification, although I’m sure there are many others. Now we start with page segmentation, the process of recursively dividing a page into a hierarchical segmentation tree of semantically significant regions, and you can see an example there. This is useful on its own, but it’s also useful for further stages in page structure analysis, such as region classification. Here’s an example of a page segmented with our algorithm. It picks out the three main columns, and within the columns it picks out, for example, news article blurbs and then segments the text of the blurb from the thumbnail or thumbnail placeholder. And here I show an example of a mockup decluttering interface which can gray out non-focus regions, and there are two different sizes displayed. This is just a mockup of the interface, but it’s based on a real segmentation produced by our algorithm.
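To make the idea of a hierarchical segmentation tree concrete, here is a rough sketch in Python; the recursive X-Y cut on blank bands below is only a stand-in for the Bayesian, vision-based evidence model the talk describes, and the tiny binarised page at the end is made up for illustration.

```python
# A rough sketch of building a hierarchical segmentation tree from a rendered
# page image. The real system uses Bayesian, vision-based evidence; this
# stand-in just performs a recursive X-Y cut on blank bands of a binarised
# image (0 = background, 1 = "ink").
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), exclusive bounds

@dataclass
class SegmentNode:
    box: Box
    children: List["SegmentNode"] = field(default_factory=list)

def segment(image: List[List[int]], box: Box, horizontal: bool = True) -> SegmentNode:
    """Recursively divide a region along blank rows, then blank columns, and so on."""
    top, left, bottom, right = box
    node = SegmentNode(box)
    lines = range(top, bottom) if horizontal else range(left, right)

    def has_ink(i: int) -> bool:
        cells = image[i][left:right] if horizontal else [image[r][i] for r in range(top, bottom)]
        return any(v != 0 for v in cells)

    # Maximal runs of consecutive rows (or columns) that contain content.
    runs, start = [], None
    for i in lines:
        if has_ink(i) and start is None:
            start = i
        elif not has_ink(i) and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, lines.stop))

    child_boxes = [(a, left, b, right) if horizontal else (top, a, bottom, b) for a, b in runs]
    if len(child_boxes) == 1 and child_boxes[0] == box:
        # No blank band on this axis: try the other axis once, otherwise it is a leaf.
        return segment(image, box, False) if horizontal else node
    for child_box in child_boxes:
        node.children.append(segment(image, child_box, not horizontal))
    return node

# Tiny made-up "page": two text blocks on the first band, one wide block below.
page = [[1, 1, 1, 0, 1, 1],
        [1, 1, 1, 0, 1, 1],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1]]
tree = segment(page, (0, 0, 5, 6))
```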
Region classification is the process of labeling regions in the segmentation tree according to their semantic role in the page. We use a graphical model with a structure based on the segmentation tree, and we can do efficient inference on this because of the model structure.
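As a sketch of what efficient inference on such a tree-structured model can look like, the dynamic program below computes a best-scoring labeling over the SegmentNode tree from the previous sketch; the label set and the node_score/pair_score functions are hypothetical stand-ins for the learned potentials, not the system’s actual model.

```python
# A sketch of efficient labeling on a tree-structured model whose structure
# mirrors the SegmentNode tree above. LABELS, node_score and pair_score are
# hypothetical stand-ins for the real label vocabulary and learned potentials.
from typing import Callable, Dict

LABELS = ["navigation", "main", "banner", "contentinfo", "unlabeled"]

def map_labels(root, node_score: Callable, pair_score: Callable) -> Dict[int, str]:
    """Best-scoring labeling of the whole tree via bottom-up dynamic programming."""
    table, back = {}, {}  # table[(id(node), label)] = best subtree score; back = chosen child labels

    def up(n):
        for c in n.children:
            up(c)
        for lab in LABELS:
            total, picks = node_score(n, lab), {}
            for c in n.children:
                best_cl = max(LABELS, key=lambda cl: pair_score(lab, cl) + table[(id(c), cl)])
                total += pair_score(lab, best_cl) + table[(id(c), best_cl)]
                picks[id(c)] = best_cl
            table[(id(n), lab)] = total
            back[(id(n), lab)] = picks

    def down(n, lab, out):
        out[id(n)] = lab
        for c in n.children:
            down(c, back[(id(n), lab)][id(c)], out)

    up(root)
    root_label = max(LABELS, key=lambda lab: table[(id(root), lab)])
    assignment: Dict[int, str] = {}
    down(root, root_label, assignment)
    return assignment

# Usage sketch with trivial placeholder potentials:
# labels = map_labels(tree, node_score=lambda n, l: 0.0, pair_score=lambda a, b: 0.0)
```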
We’ve performed a series of experiments in region classification, using ARIA labels to create a ground-truth dataset; we attempt to predict the ARIA labels. Now we have encountered a few problems with this data. We have a lot of unlabeled regions, and the classes tend to be highly imbalanced, which is a huge challenge for any machine learning algorithm. Our results from tests on a 35-page dataset with well over 4,000 regions in total were promising, although they do show room for improvement. Common classes were recognized much more accurately than rare ones, as one might expect. One common error was misclassification of a labelled region as unlabelled or vice versa, and we suspect that this may be due to the fact that a region may be unlabelled either because no label applies to it or simply because the page developer didn’t assign a label to it. This can cause noise in the ground truth data, and that’s always a problem for machine learning algorithms. Now we hope that region classification could be useful for screen readers because ARIA labels are already used by screen readers, but they can’t be applied to component parts of things like Flash objects and they’re not always present at all. If we can infer them from the visual structure of the page, then we’re not dependent on them being included and we’re not dependent on the implementation of the page. So in conclusion, the proposed project is to develop computer vision techniques for the analysis of the semantic structure of webpages. We think that vision-based technology has significant advantages for this area, and that the analysis of man-made images such as webpages, which are designed to appeal to human perceptual cues, also has great potential for
computer vision research. Thank you. (clapping) – [Erin] Are there any
questions for Michael? Ok, I actually have a question. I’m gonna scoot up just so I can speak into the microphone. Yeah, Simon, go ahead. – [Voiceover] Can you tell me, why, you’re using computer vision, but why do you not use the synthesis of the cascading stylesheets, the DOM, the rendering itself so you can kind of pretend you’re a browser… – Right, we… I think in a practical system, we would do exactly that. For the current research project, we’re trying to push the limits of what can be done using purely the rendered page. Our system is Bayesian, top to bottom, so it would be relatively
easy to integrate evidence from, say, the DOM tree as well. Which is an advantage over a lot of, say, rule-based systems; for example, VIPS has a lot of heuristics, and that’s a well-known limitation. It’s actually based on the DOM tree, but on visual properties defined there. So yes, I think in a practical system we would do that, but for experimental control, and for our hopes for this
as a computer vision project as well, we’re trying to push the pure vision as far as we can. – Alright, well I’d like to present you with a certificate to commemorate your participation, thank you. (clapping) Our next presenter will
be Maria Rauschenberger, I hope I pronounced it ok, from the… From Pompeu Fabra, so… – Hi, I’m Maria Rauschenberger from Universitat Pompeu Fabra in Barcelona. My supervisors are Ricardo
Baeza-Yates and Luz Rello and today I’m really
happy to talk about my PhD topic which I started
in January this year. It’s about detecting
dyslexia by a web-based game with music elements. So first of all I would
like to give you some basic information about dyslexia. Dyslexia is a learning disability which comes with visual
and hearing difficulties, and I have a small cartoon here which shows a child wishing for something for Christmas. I know Christmas is far away, but it comes every year, so he’s wishing for a beer, a bike and a dog and he’s writing it down, and as you can probably see, there are some mistakes in it. It’s a common mistake for dyslexic people to exchange a letter, so a pike means a bike, but he uses a p instead of a b. (mumbles) I mean, even if it had been typed, a spellchecker probably wouldn’t have caught it. And then he sees what the child is wishing for, and this happens a lot in regular life. So just imagine you’re a person with dyslexia and you’re writing letters and you’re making errors, real-world errors. And how many are affected? 10% of the population have dyslexia, so this is quite a lot. I’m normally talking about children, but it’s not only children; it sticks with you your whole life. You normally get diagnosed because of bad grades. It doesn’t have to be that way, but it’s still common. So people have bad grades in school because of spelling mistakes, even though they’re of average intelligence, just like everybody else, so this is really frustrating, and not just for the child; the parents and everyone are really frustrated too. So it does affect your whole life and you have to practice a lot to overcome it, and there has been
already some work on it, so exercises where people can learn how to spell, how to write,
and improve themselves, also with music, to overcome it. Detection has also been investigated already, so how to detect whether a person has dyslexia or not, and they normally relate it to words, to letters. The idea is, or the thing is: if you want to give people more time to practice and not go through the frustration of school, like being detected because of bad grades, and start earlier to give them more time, how can we do this? How can we detect a child with dyslexia when they don’t even know any letters? I mean, this is the main point. So how are we going to approach that if they don’t know any letters but we still want to give them more time? And as I said, there are also accompanying difficulties with it, so the idea is to find the indicators for pre-readers with dyslexia and relate them to music to see if we can find indicators. And if we find them, as I’m pretty sure we will because I already looked into that, we can integrate that into a tool to detect and support people with dyslexia. So the approach is “What do you hear?”, and the idea is that
you use the indicators we’ve found already, like
short-term memory difficulties, phonological memory and working memory, and transform them into tasks. So, like, finding the same sound, distinguishing between sounds, short time interval perception, and these can be turned into tasks to give to children, like the age of three, and see if this is working. And I know what you’re thinking: at the age of three, they’re really small and, if you give them a task, they probably don’t want to do it. But there is a game called memory, you probably know it; it’s normally with pictures and you try to find the same pairs of pictures, and it’s a game. So if we transform this to music elements and say find the same sound instead of find the same picture, then you have something to work with, and if we go along and use it on a digital basis, then we can get the metadata of it, go for machine learning and try to find the distinguishing parameters between the groups.
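To illustrate what the per-trial metadata from such a find-the-same-sound game might look like for later machine learning, here is a minimal sketch; the field names, the summary features and the example sounds are assumptions for illustration, not the project’s actual design.

```python
# Illustrative sketch of logging per-trial metadata from a "find the same
# sound" memory game so it can later feed a classifier comparing groups.
import json
from dataclasses import dataclass, asdict

@dataclass
class Trial:
    sound_a: str        # id of the first sound the child tapped
    sound_b: str        # id of the second sound the child tapped
    is_match: bool      # were the two sounds actually the same?
    correct: bool       # was the child's response correct?
    response_ms: int    # time between the two taps

def summarize(trials):
    """Aggregate simple candidate indicators (accuracy, speed) for one session."""
    n = len(trials)
    return {
        "n_trials": n,
        "accuracy": sum(t.correct for t in trials) / n if n else None,
        "mean_response_ms": sum(t.response_ms for t in trials) / n if n else None,
    }

# A tiny example session log.
session = [Trial("owl", "owl", True, True, 1840),
           Trial("bell", "drum", False, True, 2310)]
print(json.dumps({"trials": [asdict(t) for t in session],
                  "summary": summarize(session)}, indent=2))
```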
So this is the main goal: to see if the indicators are working out, finding them while playing a serious game. So the next steps will be to find independent measurements, design a longitudinal study, and explore other populations. To sum up: early and easy detection with music while playing a serious game. Thank you very much. (clapping) – Are there questions for Maria? – [Blonde woman] Thank you very much for that, it was really interesting. I was wondering, do you have any ideas about what the risks
of false positives might be, so is there the risk
that children might be, using this kind of approach,
diagnosed very early on and then parents assume they have dyslexia and then it turns out they don’t, and it might cause a bit
of unnecessary worry? – I haven’t looked into
that, but it’s a really nice idea to do, actually, yeah. I think that also the
current default methods, they are not like… they’re not 100%, so this
is exactly the problem, and we are using it as a ground truth, so this could happen and we should look into that as well, yeah. – Well with that, we’re gonna present you with a certificate, and thank you very much. (clapping) Our last presenter will be Neil Rogers. – Thank you for the introduction, Erin. As she already said,
my name is Neil Rogers. I’m a second-year PhD candidate at the University of Southampton in the United Kingdom. My research is funded by an EPSRC CASE award and an industrial
partner called Microlink. So I’m here today to
give a very brief talk about my research into evaluating the mobile web accessibility of electronic text for print-impaired people in higher education. What I want to do to start out is to introduce you to what I’m currently referring to as five layers in an accessibility evaluation framework. I just want to focus very specifically on that; there are other areas of my research, but because of the time limits I just want to focus on this. The first layer is the discovery layer. As you can see, I’m sure you’ve all probably used the Google search box, and this relates directly to perhaps an institutional search, say, for example, the ACM Digital Library, and in the public domain perhaps the Kindle e-book store, or perhaps Project Gutenberg, where they make available many thousands of e-books. The second layer is what I’m referring to as the metadata layer. I just want you to take a moment to imagine the great Library of Alexandria: scrolls, manuscripts, floor to ceiling. Now to each of those scrolls was attached a piece of string or cord, and on each of those pieces of string was a tag that had information written on it. That information might have included, for example, the title of the manuscript. It might also have included information about the content of the manuscript. Now that was simply there to help people to find and locate the information quickly, and that’s what we refer to as metadata. So essentially it’s data about data, or information about information. The third layer I’m referring to is the e-content format layer. I’m essentially using a web analogy here, as you’ve probably already gathered, so we’ve currently got web-based formats such as HTML and CSS, which I’m sure you all know are used to create websites. Now those relate directly to the electronic formats that are used for digital reading, so, for example, the EPUB format, PDF, or perhaps even Amazon’s AZW format.
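As a concrete illustration of the metadata and e-content format layers, the sketch below reads the Dublin Core metadata that an EPUB carries in its OPF package file; the file name in the example is hypothetical and error handling is omitted.

```python
# A rough sketch of inspecting the metadata layer of an EPUB e-book: an EPUB is
# a ZIP archive whose META-INF/container.xml points to an OPF package file
# containing Dublin Core metadata (title, creator, language, and so on).
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cnt": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def epub_metadata(path: str) -> dict:
    with zipfile.ZipFile(path) as z:
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//cnt:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(z.read(opf_path))
        meta = package.find("opf:metadata", NS)
        return {
            "title": getattr(meta.find("dc:title", NS), "text", None),
            "creator": getattr(meta.find("dc:creator", NS), "text", None),
            "language": getattr(meta.find("dc:language", NS), "text", None),
        }

# Example (hypothetical file name):
# print(epub_metadata("accessible_textbook.epub"))
```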
Now the fourth layer I’m referring to as the e-reader layer. As you can see here, we’ve got five mainstream browsers, and they relate directly to the applications and devices that are out there. Now it’s a challenge to test the accessibility of five main browsers; it’s a completely different story to do the same for well in excess of 200 applications and devices. The final layer is what I’m referring to as the e-content layer, and that simply refers to the way a webpage relates directly to a page from an e-book. Now this might be where a person with a disability might need to work their way down through different levels within the document, so they might go from chapters down to headings, maybe down to paragraphs, and then perhaps to, say, sentence level. They may even need to
isolate single characters. I want to bring it to a close in terms of how do these different layers integrate or work with a device, and they integrate into this accessibility evaluation framework. Now this is actually based on
a review of the literature. It was also based on taking two mainstream devices, so for example the Android platform and the iPhone and actually sitting down with the devices and going through it systematically and working out exactly
what a user needs to do in order to actually establish and understand these separate layers. Now I just want to say that this is a work in progress and some of the terminology might change, so for example, I refer to them as layers. It might be that I just… I might change it to component instead. Now the other thing to point out is that this evaluation
framework is currently focused on higher
education, and the reason this is the case is that a user might need to search for, or require, an academic e-text. So for example they might look for a journal paper, or they might look for a previous exam paper for revision purposes, and they’d use their device, in conjunction with user agents (the applications or browsers on the mobile device), to search, through the discovery layer, for the required academic e-text that they need. Once they’ve found it, they will then be given the option to perhaps upload it to the cloud so they could use it across many different devices. Now, just drawing to a close, the take-home message, as it were, is that navigation in this context is fundamental to accessibility, to this entire process. But more specifically, so is the exchange and use of information between each of these different layers, just here, where the arrows are: if the exchange and use of information is impeded or prevented between any one of those layers, then the
accessibility is affected. But I’d just like to say thank you for giving me the opportunity to present my research to you today, it’s been a real pleasure, and thank
you for your attention. (clapping) – Are there any questions for Neil? Ok, I have a quick question if it’s ok. So going back one slide, there’s, at least to me, it seems like there’s a pretty big split between the discovery layer and then the layers that come after it, like the metadata, the e-content format, the reader and the content itself. So do you think that the gap between those, between the discovery layer and then the four layers that follow is maybe more significant, or there’s bigger opportunities there for information to be lost
or harder to access? – Yes, I totally agree with you. That’s certainly something that we are beginning to (mumbles) I think, in terms of drilling down into that, we’re looking at the potential of different categories, and also the possibility of looking at the accessibility requirements between each layer. And exactly that point, there is a bit of a disconnect, and so we are exploring that. So it’s a very good question. – Well great, thank you so much Neil. – Thank you very much. (clapping) – Alright, thank you all, and then we’ll have the other three participants this afternoon in the later session. – Right, so I’m going to chair the second half of the doctoral
consortium presentations which is again three students presenting their work who were in
the doctoral consortium, and there’s five minutes per presentation and five minutes for questions and answers, and we have scheduled 20 minutes for this. If anybody thinks there’s a
mathematical problem there, then you’re probably right. Anyway, the first speaker today is Julio Vega from the
University of Manchester in the UK and he’s going to talk about using web interaction to
monitor Parkinson’s disease through behavioral inferences on the web. – Yep. Hello everyone, I’m Julio Vega. This is a quick overview of my work that I’ve been doing for the past year and a half back in Manchester. The very basic idea behind my work is to use behavioral
inferences on smartphones to measure the progression of the disease. A bit of context on the problem. So Parkinson’s disease is a
neurodegenerative disorder, that means it’s gonna
get worse no matter what, so patients can only use interventions and medication to try to lessen the symptoms. The critical point with Parkinson’s is that it has a wide variety of symptoms. So you can have tremor, which is probably the most famous one, but you can also have gait
issues, slowed movement, depression, apathy, and
other functional problems. And also it’s very
unlikely that you’re gonna have two patients with the same set of symptoms develop at the same rate. So it’s difficult to tailor medication to each patient. Taking into account previous works using wearables and smartphones and also taking advantage
of their ubiquity, we came up with this approach, which is to use smartphone data plus external data sources to infer proxies about their activities or habits and then link that to the progression of the disease. We have collected, for a
pilot study of three months, 27 different data sources. We are including two more for a year-long study which includes
web and interaction data. This includes apps and websites used, touch interaction data, and also keyboard typing. And we are aiming to monitor patients, as I said, for a long period of time, and also without imposing any evaluation tasks or asking them to use the phone in any particular way, because if this is gonna run for many months throughout their lives, then you start having adherence problems with the approach. So so far we have carried
out a literature review and also laid out a methodology to complete this approach, and we can talk about the data analysis. So we have two tasks, proxy identification and profile of living generation. So for the first one,
proxy identification, we have identified six
hypothetical proxies that we think might link to the
progression of the disease. Three that we consider more promising are typing patterns, phone usage patterns including touch interactions, and also episodes of going up and down the stairs. And once we have identified these proxies, then we want to measure how
they fluctuate throughout time. To do that, we came up with a metric called profile of living,
which has two parts: an individual baseline and then fluctuations measured over that baseline. To get into a bit more
of detail with this, and this is only for illustrative purposes: we can see a graph with time on the x-axis on a monthly basis from January to December, and then on the y-axis we have a scale to measure the course of the metrics that compose each proxy. So for this example we have three metrics for the typing patterns proxy: typing speed, typing errors, and typing halts. Those are plotted in three gray lines. Then we have, in the red line, a clinical score of Parkinson’s disease that we are gonna use as ground truth and that we are collecting at periodic points in time during the monitoring period. So from the first four months we are gonna generate the baseline, this is just an example, and then once we have an individual baseline for each participant, we are gonna measure fluctuations from that baseline. And once we have the course of these fluctuations we can run a correlation analysis to see if we are measuring Parkinson’s disease at the same rate as the clinical scores.
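A minimal sketch of the profile-of-living idea as described, assuming a four-month baseline window and made-up monthly values: build an individual baseline, express later months as fluctuations from it, and correlate those fluctuations with the clinical scores.

```python
# Rough sketch: individual baseline, fluctuations from it, and a plain Pearson
# correlation against clinical scores. All numbers are illustrative only.
from statistics import mean, stdev

def fluctuations(monthly_values, baseline_months=4):
    """Return z-score-like deviations from an individual baseline."""
    base = monthly_values[:baseline_months]
    mu, sigma = mean(base), stdev(base)
    return [(v - mu) / sigma for v in monthly_values[baseline_months:]]

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical monthly typing-speed values and clinical scores for one participant.
typing_speed = [42, 41, 43, 42, 40, 38, 37, 36, 35, 33, 32, 31]
clinical     = [20, 21, 22, 24, 26, 27, 29, 31]   # collected from month 5 onward
print(pearson(fluctuations(typing_speed), clinical))
```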
Future work includes recruiting more participants for our study, also collecting more data during the following 11 months, and finding and focusing on only one proxy, because this is a very broad problem: each proxy can have very simple metrics but also very complicated ones. We also want to evaluate our proxy using the clinical scales that I already mentioned. This includes cognitive scales, motor scales, activities of daily living scales, and so on. And that’s it, thank you. (clapping) – Any questions, please? – [Voiceover] Hi, I’m just wondering, (mumbles) – Can you get a bit closer to the mic? – [Voiceover] Oh, sorry. Using a phone, is it possible to accurately measure the gait when going
up and down the stairs? The reason I ask is that (mumbles) determining movement for individuals with slow or impaired gait
and it’s been a nightmare for it, you know, using
generic technologies to pick up, just like… – Yeah, so one of the characteristics of this work is to focus on a macro scale of behavior. So we don’t want to look at a specific gait trait or very fine motor movements, because I’m aware that can be very difficult with noisy data and without controlled conditions. So instead of measuring gait details when going up or down the stairs, we want to focus, for example, on episodes: how often do you go up and down, how long do you take, or the time of the day that you’re doing that, to give it more context with the whole dataset that we have from the different data sources. – [Voiceover] Just wondering if you were considering using something beyond a mobile device, so watches, (mumbles) but like activity bracelets or even other external devices that can measure other facets. – Yeah, so it would be very interesting. We considered it for some time. So we considered Android Wear watches or activity trackers like the (mumbles) Because of time constraints, budget constraints, and also because we are trying to be as unintrusive as possible with a patient’s body, we are probably gonna leave that out just for this project. But I mean, I also forgot to mention this: if you have any suggestions or ideas on the methodology, the proxies, any issues that you think we might run into, I’m quite happy to discuss that later in the (mumbles) – I have one question. Have you considered using
a control group as well? Or only with patients? – Yes, it’s only with patients. So this is a bit different from diagnosis: because of the longitudinal approach, we are trying to tailor that to each patient, and thus we can evaluate our inferences only among patients and not
with a control group. I hope that kind of makes sense. – Alright, thank you. Well, thank you very much and of course we’ve got a nice certificate for you, we can trade. (clapping) Right, so our next award winner is Flynn Wolf from the
University of Maryland and he’s going to talk about the development of a wearable tactile prototype to support situational awareness. – So hello, my name is Flynn Wolf. I’m working with Dr. Kuber at the University of Maryland, Baltimore County. My research has been supported by the NSF. I’m working on assistive tactile cues, which obviously have helpful purposes for people with disabilities, and also on trying to help overcome situational impairments, so situations where
visual/auditory channels are blocked or unavailable
because of noise or glare or darkness
or something like that. There’s promise using these cues with devices because the technology that supports them can be quite small and low-power, can be
inconspicuous, and quite durable. And I’m also working
on continuing projects that have been ongoing at UMBC in the HCC department working
with head-mounted devices. So the head is an interesting place. It brings its own issues in
sort of doing tactile support, but it could be quite promising because, for one thing, it supports all sorts of realistic manual hands-free tasks. I also should say it might work in conjunction with augmented reality and augmented vision hardware which is just starting to
sort of reach the market and catch people’s interest, and it’s sort of an underresearched area. So what I’m interested in is can these tactile cues help with effective attention redirection and allocation,
so can people use these cues and interpret them precisely and without getting frustrated. And if that is the case,
how best can we do that, how can those cues be formatted so that people get the most out of them, how much is too much, how much can be presented, for instance, through a head-mounted device realistically? And in the real world, how do things like distraction, exertion,
interact with these things, are those problems that have to be accounted for in terms of
perception and comprehension? And last and certainly not least, what are the unique requirements for using these for assistive purposes? What I’m hoping to produce, ultimately, is to contribute to guidelines, and obviously we talked about these, there are tactile guidelines for all sorts of things, but to contribute hopefully to the discussion on multiparameter tactile cues, so cues can come in different types. You can vary things like pattern, amplitude, frequency and waveform; I’m particularly interested in pattern.
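As a rough sketch of what a multiparameter tactile cue description could look like, following the parameters named here (pattern, amplitude, frequency, waveform); the value ranges, the two-tactor layout and the example "approaching hazard" pattern are assumptions, not the study’s actual cue set.

```python
# Sketch of a multiparameter tactile cue; parameter names follow the talk,
# everything else (ranges, two tactors, the example pattern) is assumed.
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class TactileCue:
    tactor: Literal["left", "right"]      # which of the two head-mounted tactors fires
    waveform: Literal["sine", "square"]   # drive signal shape
    frequency_hz: int                     # vibration frequency
    amplitude: float                      # 0.0 to 1.0 of maximum intensity
    pattern_ms: List[int]                 # on/off durations, e.g. pulse-gap-pulse

# An increasing "approaching hazard" pattern of the kind the focus groups proposed.
approaching = [
    TactileCue("left", "sine", 250, 0.3, [100, 400, 100]),
    TactileCue("left", "sine", 250, 0.6, [100, 200, 100]),
    TactileCue("left", "sine", 250, 0.9, [100, 100, 100]),
]
```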
I’m also interested in how research can be done on the subject, for example how things like participatory design can be used with tactile cues, so everyone has these
experiences constantly but trying to find common
frames of reference so that these discussions can proceed, and also, since I’m interested in the fine details of how
these signals are used, being able to look at what is happening with interpretation in process, so instead of looking
maybe simply at outcomes, try to understand how they’re being interpreted in real time. So to that end I’ve been looking at using situational awareness assessment methods like SAGAT, which are designed for that purpose. So I’d like to talk about three studies that were conducted along these lines. The first was an exploratory participatory design study. It started with questionnaires and interviews to gather experiences that people had with situational impairment
and derived from that a set of seven use case scenarios which I posed to focus groups, so we’ll try to get these people to think about real problems and how
tactile cues could help. I did try using a vocabulary exercise to establish that kind of common frame of reference for language, so I gave them real cues and asked them to describe them, so people could talk about these things in a consistent way, and they came up with some interesting tactile solutions and they, in true
participatory design fashion, did have interesting
and kind of surprising things to say. They talked about keeping
designs fairly simple, the need for the form factor, the wearable to be something they wouldn’t
get beat up for wearing, inconspicuous and
fashionable, and that there might be some advantage in thinking about exertion and things like
that, so if I’m biking and my pulse is going in my temples, does the mode of it need
to account for that, and additionally, if you’re
giving directional cues and the head is moving,
how do you account for that in terms of the azimuth
that you are trying to cue the person towards. And they also gave a basic
set of tactile designs to respond to those use cases which I used in the second study, where I presented those cues through a head-mounted device using just two tactors and working with waveform pattern and those two positions. I also used a distraction condition, so to see how that would
interact with people’s ability to identify these cues accurately, so I gave them a whack-a-mole game to try to realistically represent visual distraction and multitasking. So it was a game that they have to sort of look at and play at the same time, and used a NASA-TLX measure to see cognitive loading, how frustrating it is to try to do all these
things at the same time. And I found some interesting results: there was a fairly high error rate in all of the work conditions, but there were some interesting findings in terms of what types of signals seem to work better than others in terms of error rate and cognitive workload, and also these ideas of patterns. So for instance the focus group had talked about using increasing and decreasing patterns to represent spatial hazards that might be coming toward or going away from the user, so sort of almost like a simulated
echolocation thing. And people struggled with using those. So I took those findings
into a third study and presented people with
a tactile storyline. So basically, I set up cues, presented again through a head-mounted device, that represented a series of spatial hazards sort of coming and going, an additional hazard coming and going from their proximity, and I asked people to describe what they were experiencing in terms of the SAGAT situational awareness method, so stopping them at various points through
the storyline and asking them what they’re experiencing
in terms of the situational, the standard situational awareness models, so what they were perceiving,
comprehending and predicting, so essentially, what did you just feel, what do you think it means, and what do you think will happen next, and I also included some
exertion conditions, so sitting, walking, standing. Again, some interesting findings, I think there’s more work to do
in applying those methods, I’ll hurry up. Yeah, so I’d like to understand more about how those change-in-proximity pattern signals can work better, and also, again, I started with use cases that drove tactile storylines that were based on people without disabilities, since that’s who was available for that initial set of findings, so I’d like to apply them specifically to assistive purposes. So, my thanks again to my department and the organizers. (clapping) – Time for a quick question. (mumbles) – [Voiceover] I was just wondering, did you bring any sort of the prototype… – I did not bring a
prototype with me, no, sorry. Go right ahead, I’ll just
point to some photos. – Simon, quick question. – [Voiceover] So my
question is, does it matter particularly that you’re directing feedback based on the location of the (mumbles) I suppose or whatever it is, or is it more important
to just have a single…
