"Identifying Latent Attributes from Video Scenes Using Knowledge Acquired From Large Collections of Text Documents."
University of Arizona.
[ abstract ]
Peter Drucker, a well-known and influential writer and philosopher in the field of management theory
and practice, once claimed that "the most important thing in communication is hearing what isn't
said." It is not difficult to see that a similar concept also holds in the context of video scene
understanding. In almost every non-trivial video scene, the most important elements, such as the motives
and intentions of the actors, cannot be directly observed, yet the identification of
these latent attributes is crucial to our full understanding of the scene. That is to say,
latent attributes matter.
In this work, we explore the task of identifying latent attributes in video scenes, focusing on
the mental states of participant actors. We propose a novel approach to the problem based on the use of
large text collections as background knowledge and minimal information about the videos, such as
activity and actor types, as query context. We formalize the task and a measure of merit that
accounts for the semantic relatedness of mental state terms, as well as their distribution weights.
We develop and test several largely unsupervised information extraction models that identify the
mental state labels of human participants in video scenes given some contextual information about
the scenes. We show that these models produce complementary information and their combination
significantly outperforms the individual models, and improves performance over several baseline
methods on two different datasets. We present an extensive analysis of our models and close with
a discussion of our findings, along with a roadmap for future research.
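The relatedness-weighted scoring idea can be illustrated with a small sketch. All names here are hypothetical, and an exact-match function stands in for a real semantic-relatedness measure; the actual measure of merit is the one defined in the thesis:

```python
def relatedness_score(predicted, gold, sim, weights):
    # Credit each gold mental-state term by its best semantic relatedness
    # to any predicted term, weighted by the gold term's distribution weight.
    total = sum(weights[g] for g in gold)
    if total == 0:
        return 0.0
    score = sum(weights[g] * max((sim(p, g) for p in predicted), default=0.0)
                for g in gold)
    return score / total

# Toy run: exact-match similarity stands in for semantic relatedness.
exact = lambda a, b: 1.0 if a == b else 0.0
print(relatedness_score(["happy", "tired"], ["happy", "anxious"],
                        exact, {"happy": 2.0, "anxious": 1.0}))  # prints 0.6666666666666666
```

A graded similarity (e.g. one derived from word co-occurrence in the text collections) would give partial credit to near-miss terms such as "glad" for "happy", which is the point of folding relatedness into the measure rather than requiring exact label matches.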
Anh Tran, Mihai Surdeanu, and Paul Cohen.
"Extracting Latent Attributes from Video Scenes Using Text as Background Knowledge."
In Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014).
[ abstract ]
We explore the novel task of identifying latent attributes in video scenes, such as the mental
states of actors, using only large text collections as background knowledge and minimal information
about the videos, such as activity and actor types. We formalize the task and a measure of merit that
accounts for the semantic relatedness of mental state terms. We develop and test several largely
unsupervised information extraction models that identify the mental states of human participants in
video scenes. We show that these models produce complementary information and their combination
significantly outperforms the individual models as well as other baseline methods.
Anh Tran, Jinyan Guan, Thanima Pilantanakitti, and Paul Cohen.
"Action Recognition in the Frequency Domain."
[ abstract ]
In this paper, we describe a simple strategy for mitigating variability in temporal data series by
shifting focus onto long-term, frequency domain features that are less susceptible to variability.
We apply this method to the human action recognition task and demonstrate how working in the frequency
domain can yield good recognition features for commonly used optical flow and articulated pose features,
which are highly sensitive to small differences in motion, viewpoint, dynamic backgrounds, occlusion
and other sources of variability. We show how these frequency-based features can be used in
combination with a simple forest classifier to achieve good and robust results on the popular KTH dataset.
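The shift-insensitivity argument can be sketched in a few lines (numpy assumed; `frequency_features` is a hypothetical name, not the paper's exact feature set): the magnitude spectrum of a motion signal captures how the motion repeats rather than when it happens, so a temporal shift leaves the features essentially unchanged.

```python
import numpy as np

def frequency_features(signal, k=8):
    # Magnitudes of the first k non-DC Fourier coefficients: long-term
    # periodic structure that is insensitive to small temporal shifts.
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    return spectrum[1:k + 1]

# A time-shifted copy of a periodic motion signal yields nearly identical
# frequency features, while its raw samples differ everywhere.
t = np.linspace(0, 4 * np.pi, 128, endpoint=False)
a = np.sin(2 * t)                    # e.g. one optical-flow component
b = np.sin(2 * (t + 0.3))            # the same motion, shifted in time
print(np.allclose(frequency_features(a), frequency_features(b), atol=1e-6))  # prints True
```

Feature vectors of this kind, computed per optical-flow or pose channel, are what a forest classifier like the one in the abstract would consume.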
Wesley Kerr, Anh Tran, and Paul Cohen.
"Activity Recognition with Finite State Machines."
In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI'11).
[ abstract ]
This paper shows how to learn general, Finite State Machine representations of activities that
function as recognizers of previously unseen instances of activities. The central problem is to tell
which differences between instances of activities are unimportant and may be safely ignored for the
purpose of learning generalized representations of activities. We develop a novel way to find the
"essential parts" of activities by a greedy kind of multiple sequence alignment, and a method to
transform the resulting alignments into Finite State Machines that accept novel instances of
activities with high accuracy.
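A minimal sketch of the recognizer idea, under simplifying assumptions (all names hypothetical; the paper's machines are learned from greedy multiple sequence alignment, not hand-specified): once the "essential parts" of an activity are known, a linear finite state machine can accept any new instance that contains them in order, skipping the unimportant extra symbols.

```python
def make_recognizer(essential):
    # Build a linear FSM over the "essential parts" of an activity.
    # Extra, unimportant symbols in a new instance are skipped rather
    # than causing rejection.
    def accepts(instance):
        state = 0                          # index of the next essential part
        for symbol in instance:
            if state < len(essential) and symbol == essential[state]:
                state += 1                 # advance to the next state
        return state == len(essential)     # did we reach the accepting state?
    return accepts

recognize = make_recognizer(["approach", "slow", "stop"])
print(recognize(["approach", "turn", "slow", "wave", "stop"]))  # prints True
print(recognize(["approach", "stop"]))                          # prints False
```

The generalization step in the paper amounts to choosing `essential` well: alignment across many instances reveals which symbols recur in every instance and which are instance-specific noise.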
Tasneem Kaochar, Raquel Torres Peralta, Clayton T Morrison, Thomas J Walsh, Ian R Fasel, Sumin Beyon,
Anh Tran, Jeremy Wright, and Paul R Cohen.
"Human Natural Instruction of a Simulated Electronic Student."
AAAI Spring Symposium: Help Me Help You: Bridging the Gaps in Human-Agent Collaboration.
[ abstract ]
Humans naturally use multiple modes of instruction while teaching one another. We would like our robots
and artificial agents to be instructed in the same way, rather than programmed. In this paper, we review
prior work on human instruction of autonomous agents and present observations from two exploratory
pilot studies and the results of a full study investigating how multiple instruction modes are used
by humans. We describe our Bootstrapped Learning User Interface, a prototype multi-instruction
interface informed by our human-user studies.
Jianqiang Shen, Jed Irvine, Xinlong Bao, Michael Goodman, Stephen Kolibaba, Anh Tran, Fredric Carl,
Brenton Kirschner, Simone Stumpf, and Thomas G Dietterich.
"Detecting and Correcting User Activity Switches: Algorithms and Interfaces."
In Proceedings of the International Conference on Intelligent User Interfaces (IUI'09).
[ abstract ]
The TaskTracer system allows knowledge workers to define a set of activities that characterize their
desktop work. It then associates with each user-defined activity the set of resources that the user
accesses when performing that activity. In order to correctly associate resources with activities
and provide useful activity-related services to the user, the system needs to know the current
activity of the user at all times. It is often convenient for the user to explicitly declare which
activity he/she is working on, but the user frequently forgets to do so. TaskTracer applies
machine learning methods to detect undeclared activity switches and predict the correct activity
of the user. This paper presents TaskPredictor2, a complete redesign of the activity predictor in
TaskTracer and its notification user interface. TaskPredictor2 applies a novel online learning
algorithm that is able to incorporate a richer set of features than our previous predictors. We
prove an error bound for the algorithm and present experimental results that show improved accuracy
and a 180-fold speedup on real user data. The user interface supports negotiated interruption and
makes it easy for the user to correct both the predicted time of the task switch and the predicted activity.
* preprint version
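The flavor of online activity prediction can be sketched as follows. This is a hypothetical illustration, not TaskPredictor2's actual algorithm (whose richer feature handling and error bound are in the paper): one weight vector per activity, with multiclass perceptron-style updates whenever the user corrects a prediction.

```python
from collections import defaultdict

class OnlineActivityPredictor:
    # Hypothetical sketch: per-activity linear scores over sparse features,
    # updated online from user corrections.
    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))

    def predict(self, features):
        # features: {feature name: value}, e.g. words from a window title
        def score(activity):
            w = self.weights[activity]
            return sum(w[f] * v for f, v in features.items())
        return max(self.weights, key=score, default=None)

    def update(self, features, correct):
        predicted = self.predict(features)
        _ = self.weights[correct]          # register the activity if new
        if predicted != correct:           # a missed or undeclared switch
            for f, v in features.items():
                self.weights[correct][f] += v      # promote the true activity
                if predicted is not None:
                    self.weights[predicted][f] -= v  # demote the wrong one

predictor = OnlineActivityPredictor()
predictor.update({"word:budget": 1.0}, "Quarterly Report")
predictor.update({"word:photos": 1.0}, "Vacation Planning")
print(predictor.predict({"word:budget": 1.0}))  # prints Quarterly Report
```

An activity-switch detector in this style would flag a switch whenever the predicted activity differs from the user's declared one, then let the notification interface negotiate the interruption.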