Energy Data Analytics PhD Fellows – Tianyu Wang

Hello, everybody. First of all, it has been a great journey to participate in this fellowship program, and I'd like to thank my mentors, my advisor, and my fellow students. The topic of my project is Power Plant Pollution Monitoring via Bandit Algorithms. I am Tianyu Wang, a Ph.D. student in the CS Department. This work is partially based on joint work with Weicheng Ye, Dawei Geng, and Cynthia Rudin.
As I learned from this program, in particular from my friend Edgar, pollution monitoring at power plants in the US relies on a nice system called the Continuous Emission Monitoring System. In many less developed countries, people have to periodically and manually spot check the power plants, meaning that the government sends a person there to take a reading down on site. The challenge is that it is very costly to always spot check all of the power plants. Very naturally, a question to ask is, "Which one should we check at each time point?" A very natural solution is: at each time, check only one of the power plants and, hopefully, that one is the most polluting one. By knowing the most polluting one, we know that overall the other power plants are not polluting too much.
So at this point, we abstract this problem into a contextual bandit problem, which is a kind of shortsighted reinforcement learning problem. In this problem, at each time t we observe some context information, for example time, weather, etc., denoted x_t. Based on that information, as well as past experience, we pick an action; in our case, that would be a power plant, denoted a_t. Our goal is to minimize the overall regret, meaning that we spot check the more polluting power plants more frequently. To quantify that, we define the reward, which in our case is pollution, so a "bad" reward, as a function from the joint space of contexts and actions to the unit interval. The regret at each time is the best reward given the current context minus the reward of your choice, and summing these up gives the total regret. So that's the contextual bandit problem, and that's how we formulate the manual spot check as a shortsighted reinforcement learning problem.
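To make this precise, here is one way to write the regret down formally, consistent with the verbal definition above (the symbols f, x_t, a_t, and R(T) are my notation, not quoted from the talk):

```latex
% Reward function on the joint context-action space:
%   f : \mathcal{X} \times \mathcal{A} \to [0, 1]
% At time t the learner sees context x_t and picks action (power plant) a_t.
% Total regret after T rounds: the best reward for the observed context,
% minus the reward of the chosen action, summed over time.
R(T) = \sum_{t=1}^{T} \left( \max_{a \in \mathcal{A}} f(x_t, a) - f(x_t, a_t) \right)
```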
Our algorithm works as follows. At each timestamp t, based on the past data we have (the past contexts, past actions, and past rewards, where y denotes the reward), we build a regression tree on those past observations. We then choose the action that maximizes the fitted function value of our model plus an uncertainty term. That uncertainty term is composed of a concentration term, based on the number of past data points in the same leaf as the action you're considering, plus the diameter of the leaf that contains the action you're considering. After we make that choice, we observe the corresponding reward of the choice.
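To make the selection rule concrete, here is a minimal sketch of this tree-based choice. It is an illustration under assumptions, not the talk's exact implementation: scikit-learn's DecisionTreeRegressor stands in for the regression tree, the concentration term uses a standard sqrt-log form, the diameter is a coordinate-wise range, and the weights c1 and c2 are illustrative.

```python
# Sketch of the tree-based UCB choice: fitted value + uncertainty bonus.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def choose_action(history_X, history_y, context, actions, c1=1.0, c2=1.0):
    """Pick the action maximizing fitted reward plus an uncertainty term.

    history_X : array (n, d) of past (context, action) feature vectors
    history_y : array (n,) of past rewards
    context   : array (d_ctx,), the current context
    actions   : iterable of candidate action encodings, each of shape (d_act,)
    """
    tree = DecisionTreeRegressor(min_samples_leaf=5).fit(history_X, history_y)
    past_leaves = tree.apply(history_X)        # leaf id of every past point
    best_score, best_action = -np.inf, None
    for a in actions:
        x = np.concatenate([context, a]).reshape(1, -1)
        in_leaf = past_leaves == tree.apply(x)[0]
        n_leaf = max(int(in_leaf.sum()), 1)
        # Concentration term: shrinks as the leaf accumulates observations.
        concentration = c1 * np.sqrt(np.log(len(history_y) + 1) / n_leaf)
        # Diameter term: how large the leaf containing this point is
        # (defaults to 1.0 when the leaf holds fewer than two past points).
        pts = history_X[in_leaf]
        diameter = c2 * (float(np.ptp(pts, axis=0).max()) if len(pts) > 1 else 1.0)
        score = tree.predict(x)[0] + concentration + diameter
        if score > best_score:
            best_score, best_action = score, a
    return best_action
```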
For this algorithm, under some assumptions, if the environment doesn't change, meaning the reward function doesn't change, then the expected average regret is asymptotically zero, which means that in the long run we almost always spot check the most polluting power plant.
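In symbols, using the total regret R(T) defined earlier, this guarantee is a no-regret property (this formal statement is my paraphrase of the claim, not quoted from the talk):

```latex
% No-regret property: the average regret vanishes as T grows, so
% asymptotically the algorithm picks the best (most polluting) power
% plant for the observed context almost all of the time.
\lim_{T \to \infty} \frac{\mathbb{E}[R(T)]}{T} = 0
```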
Now let's test this method on a real dataset. We validate it on the problem of tracking the highest temperature in a building; by the way, the data we use is from the Intel Berkeley Research Lab. We formulate this highest-temperature tracking in our framework as follows. At each time t, we know the following from the past: the wall-clock times, which are the contexts; the monitors selected, which are the actions, analogous to the power plants selected; and the temperatures y observed from the selected monitors, which are the rewards. On the right is a snapshot of the data table. Further, we fit a model from the time and monitor id, which form the context-action space, to the temperature, and we select the monitor that maximizes our fitted function value plus uncertainty. As we discussed before, the uncertainty is a concentration term plus the size of the leaf.
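As a usage illustration, here is a hypothetical driver loop for this experiment, wired to the choose_action sketch above. The sensor count, context encoding, and synthetic temperature generator are stand-ins of my own, not the actual Intel Berkeley Research Lab data or schema:

```python
# Hypothetical online loop for the highest-temperature experiment.
# Synthetic readings stand in for the real sensor data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
K, T = 8, 200                                # sensors (actions), time steps

def read_temperature(sensor, t):
    # Stand-in for a real sensor reading: per-sensor daily profile + noise.
    return 20 + 3 * np.sin(2 * np.pi * t / 48 + sensor) + rng.normal(0, 0.1)

history_X, history_y = [], []
for t in range(T):
    context = np.array([t % 48])             # wall-clock time as the context
    if t < K:
        a = t                                # warm start: try each sensor once
    else:
        actions = list(np.eye(K))            # one-hot encodings of the sensors
        a = int(np.argmax(choose_action(np.asarray(history_X),
                                        np.asarray(history_y),
                                        context, actions)))
    y = read_temperature(a, t)               # observed reward: the temperature
    history_X.append(np.concatenate([context, np.eye(K)[a]]))
    history_y.append(y)
```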
This is the map of the sensors in the building. Each numbered black dot is a sensor; this is the kitchen, and this is the storage room. And this is our result. The dark blue dots represent the error of the proposed method, and the red dots represent the error of the random method. Here the error is measured with respect to the true highest temperature at that time. As we can see from this plot, using only 1 of the 32 sensors, our method gives a fairly good estimate of the highest temperature, as opposed to a randomly selected sensor, or, in the power plant setting, a randomly selected power plant to inspect. And that's all of my presentation. Thank you.
