Evandro Gouvêa’s Publication List
Ph.D. Thesis, Master's Dissertation
- Gouvêa, E. B., “Acoustic-Feature-Based Frequency Warping for Speaker Normalization”,
Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, 1999.
[pdf]
- Abstract: Speaker-dependent
automatic speech recognition systems are known to outperform
speaker-independent systems when enough data are available for
training to overcome the variability of acoustical properties among
speakers. Speaker normalization techniques modify the spectral
representation of incoming speech waveforms in an attempt to reduce
variability between speakers.
While a number of recent
successful speaker normalization algorithms have incorporated
speaker-specific frequency warping to the initial signal
processing, these algorithms do not make extensive use of
acoustic features contained in the incoming speech.
In this work we study the
possible benefits of the use of acoustic features that are
believed to be key to speech perception in speaker
normalization algorithms using frequency warping. We study
the extent to which the use of such features, including
specifically the first three formant frequencies, can improve
recognition accuracy and reduce computational complexity for
speaker normalization compared to conventional techniques. We
examine the characteristics and limitations of several types
of feature sets and warping functions as we compare their
performance relative to that of existing algorithms.
We have found that the
specific shape of the warping function appears to be
irrelevant in terms of improvement in recognition
accuracy. The use of a linear function, the simplest choice,
allowed us to employ linear regression to define which
features to use and how to weigh them. We present a method
that finds the optimal set of weights for a set of speakers
given the slope of the best warping function. Selection of a
limited subset of features for use is a special case of this
method where the weights are restricted to one or zero.
The application of our
speaker normalization algorithm on the ARPA Resource
Management task resulted in sizable improvements compared to
previous techniques. Speaker normalization applied to the
ARPA Wall Street Journal (WSJ) and Broadcast News (Hub 4)
tasks resulted in more modest improvements. We have
investigated the possible causes of this. Our experiments
indicate that normalization is less effective with a
larger number of speakers presumably because in this case the
output probability densities of HMMs tend to be broader and
hence representative of a large class of speakers. In
addition to this, increasing the vocabulary size tends to
increase the search space, causing correct hypotheses to be
replaced by errorful ones. The benefit brought about by
normalization is thus diluted.
The amount of improvement
provided by normalization also increases with increasing
sentence duration in Hub 4. Since the actual Hub 4 corpus contains a
large number of short segments, normalization provides a
more limited improvement in performance.
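As a toy illustration of the linear warping discussed above, the sketch below fits the slope of a linear warping function that maps a speaker's average formant frequencies onto those of a reference speaker. The function name, the through-the-origin least-squares fit, and the formant values are illustrative assumptions, not the thesis's actual procedure.

```python
import numpy as np

def warp_slope(speaker_formants, reference_formants):
    """Least-squares slope alpha of a linear warping function
    f_ref ~ alpha * f_spk, fitted through the origin."""
    f = np.asarray(speaker_formants, dtype=float)
    r = np.asarray(reference_formants, dtype=float)
    return float(f @ r / (f @ f))

# Hypothetical average F1-F3 values (Hz) for a speaker and a reference.
alpha = warp_slope([730.0, 1090.0, 2440.0], [700.0, 1220.0, 2600.0])
print(f"warp slope: {alpha:.3f}")  # frequencies would be scaled by alpha
```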
- Gouvêa, E. B., “Speech Synthesis in Portuguese”, Master’s Thesis, Universidade de São
Paulo, São Paulo, 1993.
- Abstract: This
dissertation presents the implementation of a speech synthesis system
based on phoneme concatenation. The basic idea entails building a
library of basic units, which can be phonemes, diphones, triphones,
etc. We then concatenate these basic units to build the spoken
stream. We used a linear prediction model to represent the basic
units. This model is made up of a filter modeling the vocal tract and
an excitation function. We used line spectrum pairs (LSP) to model the
filter. The excitation was modeled by code excited linear prediction
(CELP), a technique based on an analysis-by-synthesis approach.
Besides building the phoneme library, we proposed
algorithms for smoothing the transitions between concatenated
diphones, interpolating the appropriate parameters. This smoothing was
targeted at improving the quality of the synthesized speech.
Published Research
- Radeck-Arneth, S., Milde, B., Lange, A., Gouvêa, E., Radomski, S., Mühlhäuser, M., Biemann, C., “Open-Source German Distant Speech Recognition: Corpus and Acoustic Model”, Proceedings of the 18th International Conference on Text, Speech and Dialogue, Plzeň, 2015. [pdf]
- Abstract: We present a new freely available corpus for German distant speech recognition and report speaker-independent word error rate (WER) results for two open source speech recognizers trained on this corpus. The corpus has been recorded in a controlled environment with three different microphones at a distance of one meter. It comprises 180 different speakers with a total of 36 hours of audio recordings. We show recognition results with the open source toolkit Kaldi (20.5% WER) and PocketSphinx (39.6% WER) and make a complete open source solution for German distant speech recognition possible.
- Gouvêa, E., Moreno-Daniel, A., Reddy, A., Chengalvarayan, R., Thomson, D., Ljolje, A., “The AT&T Speech
API: A Study on Practical Challenges for Customized Speech to Text Service”, Proceedings of the 14th Annual Conference of the International Speech Communication Association, Lyon, 2013. [pdf]
- Abstract: AT&T has recently opened its extensive portfolio of state-of-the-art speech technology to external developers as a platform called the AT&T Speech API. This study discusses a series of practical challenges found in an industrial deployment of speech-to-text services. In particular, we examine different strategies for customizing the speech-to-text process by considering intrinsic factors, inherent to the audio signal, or extrinsic factors, available from other sources, in an industry-grade implementation.
- Gouvêa, E., “Hybrid Speech Recognition for Voice Search: a Comparative Study”, Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, 2011. [pdf]
- Abstract: We compare different units for use in information retrieval of items by voice. We compare a word-based system with a subword-based one, a combination of these into a hybrid system, and a phonetic one. The subword set is derived by splitting words using a Minimum Description Length (MDL) criterion. In general, we convert an index written in terms of words into an index written in terms of these different units. A speech recognition engine that uses a language model and pronunciation dictionary built from each such inventory of units is completely independent from the information retrieval task, and can, therefore, remain fixed, making this approach ideal for resource-constrained systems. We demonstrate that recognition accuracy and recall results at higher OOV rates are much superior for the hybrid system than for the alternatives. On a music lyrics task at 80% OOV, the hybrid system has a recall of 82.9%, compared to 75.2% for the subword-based one and 47.4% for a word system.
- Gouvêa, E., Davel, M. H., “Kullback-Leibler divergence-based ASR training data selection”, Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, 2011. [pdf]
- Abstract: Data preparation and selection affect systems across a wide range of complexities. A system built for a resource-rich language may be so large as to include borrowed languages. A system built for a resource-scarce language may be affected by how carefully the training data are selected and produced. Accuracy is affected by the presence of enough samples of qualitatively relevant information. We propose a method using the Kullback-Leibler divergence to solve two problems related to data preparation: the ordering of alternate pronunciations in a lexicon, and the selection of transcription data. In both cases, we want to guarantee that a particular distribution of n-grams is achieved. In the case of lexicon design, we want to ascertain that phones will be present often enough. In the case of training data selection for scarcely resourced languages, we want to make sure that some n-grams are better represented than others. We show that our proposed technique yields encouraging results.
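As a rough illustration of the idea in this abstract, the sketch below greedily selects utterances so that the n-gram distribution of the selected pool moves toward a target distribution in Kullback-Leibler divergence. The function names, the greedy strategy, and the toy data are assumptions for illustration, not the paper's exact algorithm.

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) between two distributions given as {ngram: prob} dicts."""
    return sum(pi * math.log(pi / max(q.get(g, 0.0), eps))
               for g, pi in p.items() if pi > 0)

def select_utterances(candidates, target, budget):
    """Greedily add the candidate (a list of n-grams) whose inclusion
    brings the selected pool's n-gram distribution closest to `target`."""
    selected, pool = [], Counter()
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        best_i, best_kl = 0, float("inf")
        for i, ngrams in enumerate(remaining):
            trial = pool + Counter(ngrams)
            total = sum(trial.values())
            kl = kl_divergence(target, {g: c / total for g, c in trial.items()})
            if kl < best_kl:
                best_i, best_kl = i, kl
        pool += Counter(remaining[best_i])
        selected.append(remaining.pop(best_i))
    return selected

# Toy usage: a target phone-bigram distribution and three candidate utterances.
target = {"ab": 0.5, "ba": 0.3, "ac": 0.2}
candidates = [["ab", "ab", "ba"], ["ac", "ab"], ["ba", "ba"]]
print(select_utterances(candidates, target, budget=2))
```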
- Reddy, S., Gouvêa, E., “Learning from Mistakes: Expanding Pronunciation Lexicons using Word Recognition Errors”, Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, 2011. [pdf]
- Abstract: We introduce the problem of learning pronunciations of out-of-vocabulary words from word recognition mistakes made by an ASR system. This question is especially relevant in cases where the ASR engine is a black box, meaning that the only acoustic cues about the speech data come from word recognition output. This paper presents an EM approach to inferring pronunciations from n-best word recognition hypotheses, which outperforms pronunciation estimates of a grapheme-to-phoneme system.
- Gouvêa, E.B., Ezzat, T., “Vocabulary Independent Spoken Query: a Case for Subword
Units”, Proceedings of the 11th Annual Conference of the International Speech
Communication Association, Tokyo, 2010. [pdf]
- Abstract: In this work, we describe a subword unit approach for information retrieval
of items by voice. An algorithm based on the minimum description length (MDL)
principle converts an index written in terms of words into an index written in terms
of phonetic subword units. A speech recognition engine that uses a language model
and pronunciation dictionary built from such an inventory of subword units is
completely independent from the information retrieval task. The recognition engine
can remain fixed, making this approach ideal for resource constrained systems. In
addition, we demonstrate that recall results at higher out-of-vocabulary (OOV)
rates are much superior for the subword unit system. On a music lyrics task at 80%
OOV, the subword-based recall is 75.2%, compared to 47.4% for a word system.
- Gouvêa, E.B., Ezzat, T., Raj, B., “Subword Unit Approaches For Retrieval By Voice”,
SpokenQuery 2010 Workshop on Voice Search, Dallas, 2010. [pdf]
- Abstract: In this work, we describe a subword unit approach for information retrieval
of items by voice. An algorithm based on the minimum description length (MDL)
principle converts an index written in terms of words with vocabulary size V into an
index written in terms of phonetic subword units of size M << V . We demonstrate
that, with this highly reduced vocabulary of subword units, improvements in ASR
decode speed and memory footprint can be achieved, at the expense of a small drop
in recall performance. Results on a music lyrics retrieval task are demonstrated.
- Gouvêa, E.B., Raj, B., “Word Particles Applied to Information Retrieval”, European
Conference on Information Retrieval, Toulouse, 2009. [pdf]
- Abstract: Document retrieval systems conventionally use words as the basic unit of
representation, a natural choice since words are primary carriers of semantic
information. In this paper we propose the use of a different, phonetically defined
unit of representation that we call “particles”. Particles are phonetic sequences
that do not possess meaning. Both documents and queries are converted from
their standard word-based form into sequences of particles. Indexing and retrieval
are performed with particles. Experiments show that this scheme is capable of
achieving retrieval performance that is comparable to that from words when the
text in the documents and queries is clean, and can result in significantly improved
retrieval when it is noisy.
- Stern, R.M., Gouvêa, E.B., Kim, C., Kumar, K., Park, H.-M., “Binaural and
Multiple-Microphone Signal Processing Motivated by Auditory Perception”, HSCMA Joint
Workshop on Hands-free Speech Communication and Microphone Arrays, Trento,
2008. [pdf]
- Abstract: It is well known that binaural processing is very useful for separating
incoming sound sources as well as for improving the intelligibility of speech
in reverberant environments. This paper describes and compares a number of
ways in which the classic model of interaural cross-correlation proposed by
Jeffress, quantified by Colburn, and further elaborated by Blauert, Lindemann,
and others, can be applied to improving the accuracy of automatic speech
recognition systems operating in cluttered, noisy, and reverberant environments.
Typical implementations begin with an abstraction of cross-correlation of the
incoming signals after nonlinear monaural bandpass processing, but there are many
alternative implementation choices that can be considered. These implementations
differ in the ways in which an enhanced version of the desired signal is developed
using binaural principles, in the extent to which specific processing mechanisms are
used to impose suppression motivated by the precedence effect, and in the precise
mechanism used to extract interaural time differences.
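As background for the cross-correlation abstraction mentioned in this abstract, here is a minimal, self-contained sketch (not taken from the paper) of estimating an interaural time difference by peak-picking the cross-correlation of two microphone signals within a plausible lag window. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def estimate_itd(left, right, fs, max_lag_ms=1.0):
    """Estimate the interaural time difference (seconds) between two
    signals by locating the peak of their cross-correlation within a
    physiologically plausible lag window."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    corr = np.correlate(left, right, mode="full")
    zero = len(right) - 1  # index of zero lag in the 'full' output
    window = corr[zero - max_lag : zero + max_lag + 1]
    lag = int(np.argmax(window)) - max_lag
    return lag / fs  # positive: left channel lags the right

# Toy usage: the right channel is the left delayed by 5 samples,
# so the estimated ITD is negative (-5/fs).
fs = 16000
t = np.arange(1024) / fs
left = np.sin(2 * np.pi * 500 * t)
right = np.roll(left, 5)
print(estimate_itd(left, right, fs))
```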
- Singh, R., Gouvêa, E.B., Raj, B., “Probabilistic Deduction of Symbol Mappings for
Extension of Lexicons”, Proceedings of the 8th Annual Conference of the International Speech
Communication Association, Antwerp, 2007.
- Abstract: This paper proposes a statistical mapping-based technique for guessing
pronunciations of novel words from their spellings. The technique is based on
the automatic determination and utilization of unidirectional mappings between
n-tuples of characters and n-tuples of phonemes, and may be viewed as a statistical
extension of analogy-based pronunciation guessing algorithms.
- Stern, R.M., Gouvêa, E.B., Thattai, G., “Polyaural Array Processing for Automatic
Speech Recognition in Degraded Environments”, Proceedings of the 8th Annual
Conference of the International Speech Communication Association, Antwerp,
2007. [pdf]
- Abstract: In this paper we present a new method of signal processing for robust speech
recognition using multiple microphones. The method, loosely based on the human
binaural hearing system, consists of passing the speech signals detected by multiple
microphones through bandpass filtering and nonlinear halfwave rectification
operations, and then cross-correlating the outputs from each channel within
each frequency band. These operations provide rejection of off-axis interfering
signals. These operations are repeated (in a non-physiological fashion) for the
negative of the signal, and an estimate of the desired signal is obtained by
combining the positive and negative outputs. We demonstrate that the use of this
approach provides substantially better recognition accuracy than delay-and-sum
beamforming using the same sensors for target signals in the presence of additive
broadband and speech maskers. Improvements in reverberant environments are
tangible but more modest.
- Mostow, J., Beck, J., Cen, H., Cuneo, A., Gouvêa, E., Heiner, C., “An Educational Data
Mining Tool to Browse Tutor-Student Interactions: Time Will Tell!”, Proceedings of the
Workshop on Educational Data Mining, Pittsburgh, 2005. [pdf]
- Abstract: A basic question in mining data from an intelligent tutoring system is,
“What happened when...?” We identify requirements for a tool to help answer
such questions by finding occurrences of specified phenomena and browsing
them in human-understandable form. We describe an implemented tool and
how it meets the requirements. The tool applies to MySQL databases whose
representation of tutorial events includes student, computer, start time, and end
time. It automatically computes and displays the temporal hierarchy implicit in
this representation. We illustrate the use of this tool to mine data from Project
LISTEN's automated Reading Tutor.
- Mostow, J., Beck, J., Cen, H., Gouvêa, E., Heiner, C., “Interactive Demonstration of a
Generic Tool to Browse Tutor-Student Interactions”, Interactive Events Proceedings of the
12th International Conference on Artificial Intelligence in Education (AIED 2005),
Amsterdam, 2005.
- Abstract: Project LISTEN's Session Browser is a generic tool to browse a database of
students' interactions with an automated tutor. Using databases logged by Project
LISTEN's Reading Tutor, we illustrate how to specify phenomena to investigate,
explore events and the context where they occurred, dynamically drill down and
adjust which details to display, and summarize events in human-understandable
form. The tool should apply to MySQL databases from other tutors as well.
- Mostow, J., Beck, J., Cuneo, A., Gouvêa, E., Heiner, C., “A Generic Tool to Browse
Tutor-Student Interactions: Time Will Tell!”, Proceedings of the 12th International Conference
on Artificial Intelligence in Education (AIED 2005), Amsterdam, 2005. [pdf]
- Abstract: A basic question in mining data from an intelligent tutoring system is, “What
happened when...?” A generic tool to answer such questions should let the user
specify which phenomenon to explore; explore selected events and the context in
which they occurred; and require minimal effort to adapt the tool to new versions,
to new users, or to other tutors. We describe an implemented tool and how it meets
these requirements. The tool applies to MySQL databases whose representation of
tutorial events includes student, computer, start time, and end time. It infers the
implicit hierarchical structure of tutorial interaction so humans can browse it. A
companion paper [1] illustrates the use of this tool to explore data from Project
LISTEN's automated Reading Tutor.
- Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvêa, E., Wolf, P., Woelfel, J., “Sphinx-4: A Flexible Open Source Framework for Speech Recognition”, Sun Microsystems Technical Report, Menlo Park, 2004. [pdf]
- Abstract: Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a “research-ready” system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.
- Lamere, P., Kwok, P., Walker, W., Gouvêa, E., Singh, R., Raj, B., Wolf, P., “Design of the
CMU Sphinx-4 Decoder”, Proceedings of the 8th European Conference on Speech
Communication and Technology, Geneva, 2003. [pdf]
- Abstract: Sphinx-4 is an open source HMM-based speech recognition system written
in the Java™ programming language. The design of the Sphinx-4 decoder
incorporates several new features in response to current demands on HMM-based
large vocabulary systems. Some new design aspects include graph construction
for multilevel parallel decoding with multiple feature streams without the use
of compound HMMs, the incorporation of a generalized search algorithm that
subsumes Viterbi decoding as a special case, token stack decoding for efficient
maintenance of multiple paths during search, design of a generalized language
HMM graph from grammars and language models of multiple standard formats,
which can potentially toggle between a flat search structure, a tree search structure, etc.
This paper describes a few of these design aspects, and reports some preliminary
performance measures for speed and accuracy.
- Seymore, K., Chen, S., Doh, S., Eskenazi, M., Gouvêa, E., Raj, B., Ravishankar, M., Rosenfeld, R., Siegler, M., Stern, R., Thayer, E., “The 1997 CMU Sphinx-3 English Broadcast News Transcription System”, Proceedings of the DARPA Speech Recognition Workshop, Chantilly, 1998. [pdf]
- Abstract: This paper describes the 1997 Hub-4 Broadcast News Sphinx-3 speech recognition system. This year’s system includes full-bandwidth acoustic models trained on Broadcast News and Wall Street Journal acoustic training data, an expanded vocabulary, and a 4-gram language model for N-best list rescoring. The system structure, acoustic and language models, and adaptation components are described in detail, and results are presented to establish the contributions of multiple recognition passes. Additionally, experimental results are presented for several different acoustic and language model configurations.
- Gouvêa, E.B., Stern, R.M., “Speaker Normalization Through Formant-Based Warping of
the Frequency Scale”, Proceedings of the 5th European Conference on Speech Communication and Technology, Rhodes, 1997. [pdf]
- Abstract: Speaker-dependent automatic speech recognition systems are known to
outperform speaker-independent systems when enough training data are available
to model acoustical variability among speakers. Speaker normalization techniques
modify the spectral representation of incoming speech waveforms in an attempt
to reduce variability between speakers. Recent successful speaker normalization
algorithms have incorporated a speaker-specific frequency warping to the initial
signal processing stages. These algorithms, however, do not make extensive use of
acoustic features contained in the incoming speech.
In this paper we study the possible benefits of the use of acoustic features in speaker
normalization algorithms using frequency warping. We study the extent to which
the use of such features, including specifically the use of formant frequencies, can
improve recognition accuracy and reduce computational complexity for speaker
normalization. We examine the characteristics and limitations of several types of
feature sets and warping functions as we compare their performance relative to
existing algorithms.
- Raj, B., Gouvêa, E.B., Stern, R.M., “Cepstral Compensation Using Statistical
Linearization”, Proceedings of the ESCA (European Speech Communication Association)
Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication
Channels, Pont-à-Mousson, 1997. [pdf]
- Abstract: Speech recognition systems perform poorly on speech degraded by even
simple effects such as linear filtering and additive noise. One solution to
this problem is to modify the probability density function (PDF) of clean
speech to account for the effects of the degradation. However, even for the
case of linear filtering and additive noise, it is extremely difficult to do this
analytically. Previously-attempted analytical solutions for the problem of noisy
speech recognition have either used an overly-simplified mathematical description
of the effects of noise on the statistics of speech, or they have relied on the
availability of large environment-specific adaptation sets. In this paper we present
the Vector Polynomial approximations (VPS) method to compensate for the effects
of linear filtering and additive noise on the PDF of clean speech. VPS also
estimates the parameters of the environment, namely the noise and the channel,
by using statistically linearized approximations of these effects. We evaluate the
performance of this method (VPS) using the CMU SPHINX-II system on the
alphanumeric CENSUS database corrupted with artificial white Gaussian noise.
VPS provides improvements of up to 15 percent in relative recognition accuracy
over our previous best algorithm, VTS, while being up to 20 percent more
computationally efficient.
- Campos, G.L., Gouvêa, E.B., “Speech Synthesis using the CELP Algorithm”, Proceedings of
the 4th International Conference on Spoken Language Processing, Philadelphia,
1996. [pdf]
- Abstract: This paper presents a phoneme/diphone based speech synthesis system for
the (Brazilian) Portuguese language. The basic idea behind this system is the
construction of a library of phonetic units, and the processing of those basic units to
build an utterance. The system is complemented by a text-to-phoneme translator
described in [Cam95].
The phoneme representation in the library is based on a linear prediction model;
the filter which models the vocal tract is represented by line spectrum pairs, and
the excitation by Code Excited Linear Prediction (CELP) parameters.
This paper is organized as follows. After a brief introduction, CELP coding is
briefly presented in Part 2. Part 3 presents the relevant points to be applied
in speech synthesis. Parts 4 and 5 constitute the main contribution of this
paper, detailing the process of building the phoneme library and the interpolation
techniques used. Part 6 presents some concluding remarks.
- Raj, B., Gouvêa, E.B., Stern, R.M., “Cepstral Compensation by Polynomial
Approximation for Environment-Independent Speech Recognition”, Proceedings of
the 4th International Conference on Spoken Language Processing, Philadelphia,
1996. [pdf]
- Abstract: Speech recognition systems perform poorly on speech degraded by even
simple effects such as linear filtering and additive noise. One possible solution
to this problem is to modify the probability density function (PDF) of clean
speech to account for the effects of the degradation. However, even for the case of
linear filtering and additive noise, it is extremely difficult to do this analytically.
Previously attempted analytical solutions to the problem of noisy speech
recognition have either used an overly-simplified mathematical description of the
effects of noise on the statistics of speech, or they have relied on the availability
of large environment-specific adaptation sets. Some of the previous methods
required the use of adaptation data that consists of simultaneously-recorded or
stereo recordings of clean and degraded speech. In this paper we introduce an
approximation-based method to compute the effects of the environment on the
parameters of the PDF of clean speech.
In this work, we perform compensation by Vector Polynomial approximations
(VPS) for the effects of linear filtering and additive noise on the clean speech.
We also estimate the parameters of the environment, namely the noise and the
channel, by using piecewise-linear approximations of these effects.
We evaluate the performance of this method (VPS) using the CMU SPHINX-II
system and the 100-word alphanumeric CENSUS database. Performance is
evaluated at several SNRs, with artificial white Gaussian noise added to the
database. VPS provides improvements of up to 15 percent in relative recognition
accuracy.
- Gouvêa, E.B., Moreno, P.J., Raj, B., Sullivan, T.M., Stern, R.M., “Adaptation and
Compensation: Approaches to Microphone and Speaker Independence in Automatic Speech
Recognition”, Proceedings of the DARPA Speech Recognition Workshop, Harriman,
1996. [pdf]
- Abstract: This paper describes recent efforts by the CMU speech group to address the
important problems of robustness to changes in environment and speaker. Results
are presented in the context of the 1995 ARPA common Hub 3 evaluation of speech
recorded through different microphones at different signal-to-noise ratios (SNRs).
For speech that is considered to be of high quality we addressed the problem of
speaker variability through a speaker normalization technique. For speech recorded
at lower SNRs, we used a combination of environmental compensation techniques
previously developed in our group. Speaker normalization reduced the relative
error rate for clean speech by 3.5 percent, and the combination of environmental
compensation with the use of noise-corrupted speech in the training process
reduced the relative error rate for noisy speech by 54.9 percent.
- Jain, U., Siegler, M.A., Doh, S.-J., Gouvêa, E.B., Moreno, P.J., Raj, B., Stern, R.M.,
“Recognition of Continuous Broadcast News With Multiple Unknown Speakers And
Environments”, Proceedings of the DARPA Speech Recognition Workshop, Harriman,
1996. [pdf]
- Abstract: Practical applications of continuous speech recognition in realistic
environments place increasing demands for speaker and environment independence.
Until recently, this robustness has been measured using evaluation procedures
where speaker and environment boundaries are known, with utterances containing
complete or nearly complete sentences. This paper describes recent efforts by the
CMU speech group to improve the recognition of speech found in long sections
of the broadcast news show Marketplace. Most of our effort was concentrated in
two areas: the automatic segmentation and classification of environments, and the
construction of a suitable lexicon and language model. We review the extensions to
SPHINX-II that were necessary to enable it to process continuous broadcast news
and we compare the recognition accuracy of the SPHINX-II system for different
environmental and speaker conditions.
- Moreno, P.J., Raj, B., Gouvêa, E.B., Stern, R.M., “Multivariate-Gaussian-Based Cepstral
Normalization for Robust Speech Recognition”, Proceedings of the International Conference
on Acoustics, Speech, and Signal Processing, Detroit, 1995. [pdf]
- Abstract: In this paper we introduce a new family of environmental compensation
algorithms called Multivariate Gaussian Based Cepstral Normalization (RATZ).
RATZ assumes that the effects of unknown noise and filtering on speech features
can be compensated by corrections to the mean and variance of components of
Gaussian mixtures, and an efficient procedure for estimating the correction factors
is provided. The RATZ algorithm can be implemented to work with or without
the use of stereo development data that had been simultaneously recorded in the
training and testing environments. Blind RATZ partially overcomes the loss of
information that would have been provided by stereo training through the use of
a more accurate description of how noisy environments affect clean speech. We
evaluate the performance of the two RATZ algorithms using the CMU SPHINX-II
system on the alphanumeric census database and compare their performance with
that of previous environmental-robustness algorithms developed at CMU.
Book Chapter
- Mostow, J., Beck, J., Cuneo, A., Gouvêa, E.B., Heiner, C., Juarez, O., “Lessons from
Project LISTEN’s Session Browser” in Handbook of Educational Data Mining, Chapman &
Hall/CRC Data Mining and Knowledge Discovery Series, 2010. [pdf]
- Abstract: A basic question in mining data from an intelligent tutoring system is,
“What happened when...?” A tool to answer such questions
should let the user specify which phenomena to explore; find instances
of them; summarize them in human-understandable form; explore the
context where they occurred; dynamically drill down and adjust which
details to display; support manual annotation; and require minimal
effort to adapt to new tutor versions, new users, new phenomena, or
other tutors.
This chapter describes the
Session Browser, an educational data mining tool that
supports such case analysis by exploiting three simple but
powerful ideas. First, logging tutorial interaction directly
to a suitably designed and indexed database instead of to log
files eliminates the need to parse them and supports
immediate efficient access. Second, a student, computer, and
time interval together suffice to identify a tutorial
event. Third, a containment relation between time intervals
defines a hierarchical structure of tutorial
interactions. Together, these ideas make it possible to
implement a flexible, efficient tool to browse tutor data in
understandable form yet with minimal dependency on
tutor-specific details.
We illustrate how we have
used the Session Browser with MySQL databases of millions of
events logged by successive versions of Project LISTEN's
Reading Tutor. We describe tasks we have used it for,
improvements made, and lessons learned in the years since the
first version of the Session Browser [1-3].
Patents
- Method for Retrieving Items Represented by Particles from an Information Database, U.S. Pat. 8,055,693, Granted November 2011.
- Abstract: A set of words is converted to a corresponding set of particles, wherein the
words and the particles are unique within each set. For each word, all possible
partitionings of the word into particles are determined, and a cost is determined
for each possible partitioning. The particles of the possible partitioning associated
with a minimal cost are added to the set of particles.
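The abstract above implies a search over all partitionings of a word; a standard way to realize that without explicit enumeration is dynamic programming over split points. The sketch below illustrates this under assumed names and a toy cost table; it is not the patented method itself.

```python
def best_partition(word, particle_cost):
    """Find a minimal-cost partitioning of `word` into particles,
    using dynamic programming over split points."""
    n = len(word)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest cost of word[:i]
    back = [0] * (n + 1)             # back[i]: start of the last particle
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):
            cost = particle_cost.get(word[j:i])
            if cost is not None and best[j] + cost < best[i]:
                best[i], back[i] = best[j] + cost, j
    if best[n] == float("inf"):
        return None  # word cannot be covered by the known particles
    parts, i = [], n
    while i > 0:
        parts.append(word[back[i]:i])
        i = back[i]
    return parts[::-1]

# Toy particle inventory with per-particle costs (hypothetical values).
costs = {"re": 1.0, "trie": 1.5, "val": 1.2, "trieval": 3.5}
print(best_partition("retrieval", costs))  # -> ['re', 'trie', 'val']
```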
- Method for Determining Distributions of Unobserved Classes of a Classifier, U.S. Pat. 8,219,510, Granted July 2012.
- Abstract: In pattern recognition, a classifier is normally trained by selecting classes
whose parameters are estimated from observed data. This method allows for the
training of classes for which all observations are assigned to other classes. The
method builds on discriminative training methods, estimating the unobserved
classes’ parameters from how much the classes’ centroids are repelled by all the
observed data.
- Method for Indexing for Retrieving Documents Using Particles, U.S. Pat. 8,229,921, Granted July 2012.
- Abstract: An information retrieval system stores and retrieves documents using
particles and a particle-based language model. A set of particles for a collection of
documents in a particular language is constructed from training documents such
that a perplexity of the particle-based language model is substantially lower than
the perplexity of a word-based language model constructed from the same training
documents. The documents can then be converted to document particle graphs
from which particle-based keys are extracted to form an index to the documents.
Users can then retrieve relevant documents using queries also in the form of particle
graphs.
Contact
Email: evandro dot gouvea at alumni dot cmu dot edu