Research Publications

My Google Scholar profile can be found here.

2014

Anh Tran. "Identifying Latent Attributes from Video Scenes Using Knowledge Acquired From Large Collections of Text Documents." Doctoral Dissertation. University of Arizona. 2014.
[ abstract | *pdf | bib | code | slides ]

Abstract: Peter Drucker, a well-known influential writer and philosopher in the field of management theory and practice, once claimed that "the most important thing in communication is hearing what isn't said." It is not difficult to see that a similar concept also holds in the context of video scene understanding. In almost every non-trivial video scene, most important elements, such as the motives and intentions of the actors, can never be seen or directly observed, yet the identification of these latent attributes is crucial to our full understanding of the scene. That is to say, latent attributes matter.

In this work, we explore the task of identifying latent attributes in video scenes, focusing on the mental states of participant actors. We propose a novel approach to the problem based on the use of large text collections as background knowledge and minimal information about the videos, such as activity and actor types, as query context. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms, as well as their distribution weights. We develop and test several largely unsupervised information extraction models that identify the mental state labels of human participants in video scenes given some contextual information about the scenes. We show that these models produce complementary information and their combination significantly outperforms the individual models, and improves performance over several baseline methods on two different datasets. We present an extensive analysis of our models and close with a discussion of our findings, along with a roadmap for future research.

Anh Tran, Mihai Surdeanu, and Paul Cohen. "Extracting Latent Attributes from Video Scenes Using Text as Background Knowledge." In Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014). 2014.
[ abstract | pdf | bib | code | slides ]

Abstract: We explore the novel task of identifying latent attributes in video scenes, such as the mental states of actors, using only large text collections as background knowledge and minimal information about the videos, such as activity and actor types. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms. We develop and test several largely unsupervised information extraction models that identify the mental states of human participants in video scenes. We show that these models produce complementary information and their combination significantly outperforms the individual models as well as other baseline methods.

Anh Tran, Jinyan Guan, Thanima Pilantanakitti, and Paul Cohen. "Action Recognition in the Frequency Domain." arXiv.org. 2014.
[ abstract | pdf | bib | code ]

Abstract: In this paper, we describe a simple strategy for mitigating variability in temporal data series by shifting focus onto long-term, frequency domain features that are less susceptible to variability. We apply this method to the human action recognition task and demonstrate how working in the frequency domain can yield good recognition features for commonly used optical flow and articulated pose features, which are highly sensitive to small differences in motion, viewpoint, dynamic backgrounds, occlusion and other sources of variability. We show how these frequency-based features can be used in combination with a simple forest classifier to achieve good and robust results on the popular KTH Actions dataset.
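The key property exploited above is that frequency-magnitude features discard exactly the kind of temporal variability (e.g., when within a clip a motion occurs) that hurts raw time-series features. A minimal sketch of that property, using a plain discrete Fourier transform in pure Python (this is an illustration of the general idea, not the paper's actual feature pipeline; the signal and shift values are invented):

```python
import math

def dft_magnitudes(x):
    """Magnitude spectrum of a real-valued series via a direct DFT."""
    N = len(x)
    mags = []
    for k in range(N):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        mags.append(math.hypot(re, im))
    return mags

# A toy periodic "motion" signal and a circularly time-shifted copy of it.
signal = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
shifted = signal[5:] + signal[:5]

m1 = dft_magnitudes(signal)
m2 = dft_magnitudes(shifted)

# The raw series differ, but their magnitude spectra are identical:
# DFT magnitudes are invariant to circular time shifts.
assert signal != shifted
assert all(abs(a - b) < 1e-6 for a, b in zip(m1, m2))
```

Magnitude spectra computed this way can then be fed to any standard classifier (the paper uses a forest classifier) in place of the raw, shift-sensitive time series.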

2011

Wesley Kerr, Anh Tran, and Paul Cohen. "Activity Recognition with Finite State Machines." In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI'11). 2011.
[ abstract | pdf | bib | code ]

Abstract: This paper shows how to learn general, Finite State Machine representations of activities that function as recognizers of previously unseen instances of activities. The central problem is to tell which differences between instances of activities are unimportant and may be safely ignored for the purpose of learning generalized representations of activities. We develop a novel way to find the "essential parts" of activities by a greedy kind of multiple sequence alignment, and a method to transform the resulting alignments into Finite State Machines that will accept novel instances of activities with high accuracy.
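Once learned, such an FSM is just a transition table walked over a symbol sequence. A minimal sketch of the recognition step only (the learning-by-alignment procedure is the paper's contribution and is not shown; the states and activity symbols below are invented for illustration):

```python
def accepts(transitions, start, accepting, sequence):
    """Run a deterministic FSM over a symbol sequence.

    transitions: dict mapping (state, symbol) -> next state
    Rejects as soon as no transition is defined for the current symbol.
    """
    state = start
    for symbol in sequence:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting

# A toy recognizer for a "fetch an object" activity:
# approach, pickup, any number of carry steps, then putdown.
fsm = {
    ("s0", "approach"): "s1",
    ("s1", "pickup"): "s2",
    ("s2", "carry"): "s2",    # self-loop absorbs variable-length segments
    ("s2", "putdown"): "s3",
}

assert accepts(fsm, "s0", {"s3"}, ["approach", "pickup", "carry", "carry", "putdown"])
assert not accepts(fsm, "s0", {"s3"}, ["approach", "putdown"])
```

The self-loop on the carry state is what lets one generalized machine accept instances of different lengths, which is the kind of unimportant variation the alignment step is meant to identify.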

Tasneem Kaochar, Raquel Torres Peralta, Clayton T Morrison, Thomas J Walsh, Ian R Fasel, Sumin Beyon, Anh Tran, Jeremy Wright, and Paul R Cohen. "Human Natural Instruction of a Simulated Electronic Student." AAAI Spring Symposium: Help Me Help You: Bridging the Gaps in Human-Agent Collaboration. 2011.
[ abstract | pdf | bib ]

Abstract: Humans naturally use multiple modes of instruction while teaching one another. We would like our robots and artificial agents to be instructed in the same way, rather than programmed. In this paper, we review prior work on human instruction of autonomous agents and present observations from two exploratory pilot studies and the results of a full study investigating how multiple instruction modes are used by humans. We describe our Bootstrapped Learning User Interface, a prototype multi-instruction interface informed by our human-user studies.

2009

Jianqiang Shen, Jed Irvine, Xinlong Bao, Michael Goodman, Stephen Kolibaba, Anh Tran, Fredric Carl, Brenton Kirschner, Simone Stumpf, and Thomas G Dietterich. "Detecting and Correcting User Activity Switches: Algorithms and Interfaces." In Proceedings of the International Conference on Intelligent User Interfaces (IUI'09). 2009.
[ abstract | pdf | bib | code ]

Abstract: The TaskTracer system allows knowledge workers to define a set of activities that characterize their desktop work. It then associates with each user-defined activity the set of resources that the user accesses when performing that activity. In order to correctly associate resources with activities and provide useful activity-related services to the user, the system needs to know the current activity of the user at all times. It is often convenient for the user to explicitly declare which activity he/she is working on. But frequently the user forgets to do this. TaskTracer applies machine learning methods to detect undeclared activity switches and predict the correct activity of the user. This paper presents TaskPredictor2, a complete redesign of the activity predictor in TaskTracer and its notification user interface. TaskPredictor2 applies a novel online learning algorithm that is able to incorporate a richer set of features than our previous predictors. We prove an error bound for the algorithm and present experimental results that show improved accuracy and a 180-fold speedup on real user data. The user interface supports negotiated interruption and makes it easy for the user to correct both the predicted time of the task switch and the predicted activity.
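The general pattern here is an online, error-driven multiclass learner over sparse desktop features (window titles, file names, and so on). A generic sketch of that pattern using a simple multiclass perceptron (this is a stand-in for illustration, not TaskPredictor2's actual algorithm or feature set; the activities and feature names are invented):

```python
class OnlinePerceptron:
    """Online multiclass perceptron over sparse binary features."""

    def __init__(self):
        self.weights = {}  # activity -> {feature: weight}

    def score(self, activity, features):
        w = self.weights.get(activity, {})
        return sum(w.get(f, 0.0) for f in features)

    def predict(self, features):
        if not self.weights:
            return None
        return max(self.weights, key=lambda a: self.score(a, features))

    def update(self, features, true_activity):
        """Observe one labeled example; adjust weights only on mistakes."""
        self.weights.setdefault(true_activity, {})
        pred = self.predict(features)
        if pred != true_activity:
            for f in features:
                w = self.weights[true_activity]
                w[f] = w.get(f, 0.0) + 1.0
                if pred is not None:
                    wp = self.weights[pred]
                    wp[f] = wp.get(f, 0.0) - 1.0

model = OnlinePerceptron()
stream = [
    ({"draft.doc", "word"}, "writing"),
    ({"budget.xls", "excel"}, "finance"),
    ({"budget.xls", "email"}, "finance"),
    ({"notes.doc", "word"}, "writing"),
]
for feats, label in stream * 2:   # replay the stream twice
    model.update(feats, label)

assert model.predict({"budget.xls", "excel"}) == "finance"
```

Processing each example exactly once, in arrival order, is what makes this kind of learner suitable for detecting activity switches as the user works, rather than retraining in batch.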

* preprint version