Evandro Gouvêa
Speech Scientist, Consultant

Evandro Gouvêa’s Publication List


Ph.D. Thesis, Master's Dissertation

  • Gouvêa, E. B., “Acoustic-Feature-Based Frequency Warping for Speaker Normalization”, Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, 1999. [pdf]
    Abstract
    Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough data are available for training to overcome the variability of acoustical properties among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce variability between speakers.

    While a number of recent successful speaker normalization algorithms have incorporated speaker-specific frequency warping into the initial signal processing, these algorithms do not make extensive use of acoustic features contained in the incoming speech.

    In this work we study the possible benefits of using acoustic features that are believed to be key to speech perception in speaker normalization algorithms based on frequency warping. We study the extent to which the use of such features, including specifically the first three formant frequencies, can improve recognition accuracy and reduce computational complexity for speaker normalization compared to conventional techniques. We examine the characteristics and limitations of several types of feature sets and warping functions as we compare their performance to that of existing algorithms.

    We have found that the specific shape of the warping function appears to be irrelevant in terms of improvement in recognition accuracy. The use of a linear function, the simplest choice, allowed us to employ linear regression to define which features to use and how to weigh them. We present a method that finds the optimal set of weights for a set of speakers given the slope of the best warping function. Selection of a limited subset of features for use is a special case of this method where the weights are restricted to one or zero.

    The application of our speaker normalization algorithm to the ARPA Resource Management task resulted in sizable improvements compared to previous techniques. Speaker normalization applied to the ARPA Wall Street Journal (WSJ) and Broadcast News (Hub 4) tasks resulted in more modest improvements. We have investigated the possible causes of this. Our experiments indicate that normalization is less effective with a larger number of speakers, presumably because in this case the output probability densities of the HMMs tend to be broader and hence representative of a large class of speakers. In addition, increasing the vocabulary size tends to increase the search space, causing correct hypotheses to be replaced by errorful ones. The benefit brought about by normalization is thus diluted.

    The amount of improvement provided by normalization also increases with increasing sentence duration in Hub 4. Since the actual Hub 4 data contain a large number of short segments, normalization provides a more limited improvement in performance.

  • Gouvêa, E. B., “Speech Synthesis in Portuguese”, Master’s Thesis, Universidade de São Paulo, São Paulo, 1993.
    Abstract
    This dissertation presents the implementation of a speech synthesis system based on phoneme concatenation. The basic idea entails building a library of basic units, which can be phonemes, diphones, triphones, etc. We then concatenate these basic units to build the spoken stream. We used a linear prediction model to represent the basic units. This model is made up of a filter modeling the vocal tract and an excitation function. We used line spectrum pairs (LSP) to model the filter. The excitation was modeled by code excited linear prediction (CELP), a technique based on an analysis-by-synthesis approach.

    Besides building the phoneme library, we proposed algorithms for smoothing the transitions between concatenated diphones by interpolating the appropriate parameters. This smoothing was targeted at improving the quality of the synthesized speech.
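
A note on the Ph.D. thesis above: in its simplest form, the approach it describes amounts to predicting a per-speaker slope for a linear frequency warp from formant-based features via linear regression. The following is a minimal sketch of that idea; it is illustrative only, and the choice of features (mean F1-F3), the toy numbers, and the application of the warp to filterbank center frequencies are assumptions, not the thesis implementation.

import numpy as np

def fit_warp_weights(formant_features, best_slopes):
    """Least-squares fit of weights (plus an intercept) so that each speaker's
    formant features predict that speaker's previously found best warp slope."""
    X = np.column_stack([formant_features, np.ones(len(formant_features))])
    weights, *_ = np.linalg.lstsq(X, best_slopes, rcond=None)
    return weights  # last entry is the intercept

def predict_warp_slope(speaker_features, weights):
    """Predict a linear warp slope for a new speaker from its formant features."""
    return float(np.append(speaker_features, 1.0) @ weights)

def warp_frequencies(freqs_hz, slope):
    """Apply the simple linear warp f' = slope * f, e.g. to mel filterbank
    center frequencies before feature extraction (illustrative only)."""
    return slope * np.asarray(freqs_hz)

# Toy example: three speakers described by mean F1, F2, F3 in Hz, each with a
# best warp slope found beforehand; then warp a new speaker's filterbank.
feats = np.array([[510.0, 1490.0, 2480.0],
                  [560.0, 1650.0, 2750.0],
                  [610.0, 1800.0, 2950.0]])
slopes = np.array([0.96, 1.00, 1.05])
w = fit_warp_weights(feats, slopes)
alpha = predict_warp_slope(np.array([590.0, 1700.0, 2850.0]), w)
print(alpha, warp_frequencies([300.0, 1000.0, 3400.0], alpha))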

Published Research

  • Radeck-Arneth, S., Milde, B., Lange, A., Gouvêa, E., Radomski, S., Mühlhäuser, M., Biemann, C., “Open-Source German Distant Speech Recognition: Corpus and Acoustic Model”, Proceedings of the 18th. International Conference on Text, Speech and Dialogue, Plzen, 2015. [pdf]
    Abstract
    We present a new freely available corpus for German distant speech recognition and report speaker-independent word error rate (WER) results for two open source speech recognizers trained on this corpus. The corpus has been recorded in a controlled environment with three different microphones at a distance of one meter. It comprises 180 different speakers with a total of 36 hours of audio recordings. We show recognition results with the open source toolkits Kaldi (20.5% WER) and PocketSphinx (39.6% WER) and make a complete open source solution for German distant speech recognition possible.
  • Gouvêa, E., Moreno-Daniel, A., Reddy, A., Chengalvarayan, R., Thomson, D., Ljolje, A. “The AT&T Speech API: A Study on Practical Challenges for Customized Speech to Text Service”, Proceedings of the 14th. Annual Conference of the International Speech Communication Association, Lyon, 2013. [pdf]
    Abstract
    AT&T has recently opened its extensive portfolio of state-of-the-art Speech Technology to external end-developers as a platform called The AT&T Speech API. This study discusses a series of practical challenges found in an industrial deployment of speech to text services; in particular, we examine different strategies for customizing the speech to text process by considering intrinsic factors, inherent to the audio signal, or extrinsic factors, available from other sources, in an industry-grade implementation.
  • Gouvêa, E., “Hybrid Speech Recognition for Voice Search: a Comparative Study”, Proceedings of the 12th. Annual Conference of the International Speech Communication Association, Florence, 2011. [pdf]
    Abstract
    We compare different units for use in information retrieval of items by voice. We compare a word based system with a subword based one, a combination of these into a hybrid system, and a phonetic one. The subword set is derived by splitting words using a Minimum Description Length (MDL) criterion. In general, we convert an index written in terms of words into an index written in terms of these different units. A speech recognition engine that uses a language model and pronunciation dictionary built from each such inventory of units is completely independent from the information retrieval task, and can, therefore, remain fixed, making this approach ideal for resource constrained systems. We demonstrate that recognition accuracy and recall results at higher OOV rates are much superior for the hybrid system than for the alternatives. On a music lyrics task at 80% OOV, the hybrid system has a recall of 82.9%, compared to 75.2% for the subword-based one and 47.4% for a word system.
  • Gouvêa, E., Davel, M. H., “Kullback-Leibler divergence-based ASR training data selection”, Proceedings of the 12th. Annual Conference of the International Speech Communication Association, Florence, 2011. [pdf]
    Abstract
    Data preparation and selection affect systems in a wide range of complexities. A system built for a resource-rich language may be so large as to include borrowed languages. A system built for a resource-scarce language may be affected by how carefully the training data is selected and produced. Accuracy is affected by the presence of enough samples of qualitatively relevant information. We propose a method using the Kullback-Leibler divergence to solve two problems related to data preparation: the ordering of alternate pronunciations in a lexicon, and the selection of transcription data. In both cases, we want to guarantee that a particular distribution of n-grams is achieved. In the case of lexicon design, we want to ascertain that phones will be present often enough. In the case of training data selection for scarcely resourced languages, we want to make sure that some n-grams are better represented than others. We show that our proposed technique yields encouraging results.
  • Reddy, S., Gouvêa, E., “Learning from Mistakes: Expanding Pronunciation Lexicons using Word Recognition Errors”, Proceedings of the 12th. Annual Conference of the International Speech Communication Association, Florence, 2011. [pdf]
    Abstract
    We introduce the problem of learning pronunciations of out of vocabulary words from word recognition mistakes made by an ASR system. This question is especially relevant in cases where the ASR engine is a black box, meaning that the only acoustic cues about the speech data come from word recognition output. This paper presents an EM approach to inferring pronunciations from n-best word recognition hypotheses, which outperforms pronunciation estimates of a grapheme-to-phoneme system.
  • Gouvêa, E.B., Ezzat, T., “Vocabulary Independent Spoken Query: a Case for Subword Units”, Proceedings of the 11th. Annual Conference of the International Speech Communication Association, Tokyo, 2010. [pdf]
    Abstract
    In this work, we describe a subword unit approach for information retrieval of items by voice. An algorithm based on the minimum description length (MDL) principle converts an index written in terms of words into an index written in terms of phonetic subword units. A speech recognition engine that uses a language model and pronunciation dictionary built from such an inventory of subword units is completely independent from the information retrieval task. The recognition engine can remain fixed, making this approach ideal for resource constrained systems. In addition, we demonstrate that recall results at higher out of vocabulary (OOV) rates are much superior for the subword unit system. On a music lyrics task at 80% OOV, the subword-based recall is 75.2%, compared to 47.4% for a word system.
  • Gouvêa, E.B., Ezzat, T., Raj, B., “Subword Unit Approaches For Retrieval By Voice”, SpokenQuery 2010 Workshop on Voice Search, Dallas, 2010. [pdf]
    Abstract
    In this work, we describe a subword unit approach for information retrieval of items by voice. An algorithm based on the minimum description length (MDL) principle converts an index written in terms of words with vocabulary size V into an index written in terms of phonetic subword units of size M << V. We demonstrate that, with this highly reduced vocabulary of subword units, improvements in ASR decode speed and memory footprint can be achieved, at the expense of a small drop in recall performance. Results on a music lyrics retrieval task are demonstrated.
  • Gouvêa, E.B., Raj, B., “Word Particles Applied to Information Retrieval”, European Conference on Information Retrieval, Toulouse, 2009. [pdf]
    Abstract
    Document retrieval systems conventionally use words as the basic unit of representation, a natural choice since words are primary carriers of semantic information. In this paper we propose the use of a different, phonetically defined unit of representation that we call “particles”. Particles are phonetic sequences that do not possess meaning. Both documents and queries are converted from their standard word-based form into sequences of particles. Indexing and retrieval are performed with particles. Experiments show that this scheme is capable of achieving retrieval performance that is comparable to that from words when the text in the documents and queries is clean, and can result in significantly improved retrieval when it is noisy.
  • Stern, R.M., Gouvêa, E.B., Kim, C., Kumar, K., Park, H.-M., “Binaural and Multiple-Microphone Signal Processing Motivated by Auditory Perception”, HSCMA Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Trento, 2008. [pdf]
    Abstract
    It is well known that binaural processing is very useful for separating incoming sound sources as well as for improving the intelligibility of speech in reverberant environments. This paper describes and compares a number of ways in which the classic model of interaural cross-correlation proposed by Jeffress, quantified by Colburn, and further elaborated by Blauert, Lindemann, and others, can be applied to improving the accuracy of automatic speech recognition systems operating in cluttered, noisy, and reverberant environments. Typical implementations begin with an abstraction of cross-correlation of the incoming signals after nonlinear monaural bandpass processing, but there are many alternative implementation choices that can be considered. These implementations differ in the ways in which an enhanced version of the desired signal is developed using binaural principles, in the extent to which specific processing mechanisms are used to impose suppression motivated by the precedence effect, and in the precise mechanism used to extract interaural time differences.
  • Singh, R., Gouvêa, E.B., Raj, B., “Probabilistic Deduction of Symbol Mappings for Extension of Lexicons”, Proceedings of the 8th. Annual Conference of the International Speech Communication Association, Antwerp, 2007.
    Abstract
    This paper proposes a statistical mapping-based technique for guessing pronunciations of novel words from their spellings. The technique is based on the automatic determination and utilization of unidirectional mappings between n-tuples of characters and n-tuples of phonemes, and may be viewed as a statistical extension of analogy-based pronunciation guessing algorithms.
  • Stern, R.M., Gouvêa, E.B., Thattai, G., “Polyaural Array Processing for Automatic Speech Recognition in Degraded Environments”, Proceedings of the 8th. Annual Conference of the International Speech Communication Association, Antwerp, 2007. [pdf]
    Abstract
    In this paper we present a new method of signal processing for robust speech recognition using multiple microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by multiple microphones through bandpass filtering and nonlinear halfwave rectification operations, and then cross-correlating the outputs from each channel within each frequency band. These operations provide rejection of off-axis interfering signals. These operations are repeated (in a non-physiological fashion) for the negative of the signal, and an estimate of the desired signal is obtained by combining the positive and negative outputs. We demonstrate that the use of this approach provides substantially better recognition accuracy than delay-and-sum beamforming using the same sensors for target signals in the presence of additive broadband and speech maskers. Improvements in reverberant environments are tangible but more modest.
  • Mostow, J., Beck, J., Cen, H., Cuneo, A., Gouvêa, E., Heiner, C., “An Educational Data Mining Tool to Browse Tutor-Student Interactions: Time Will Tell!”, Proceedings of the Workshop on Educational Data Mining, Pittsburgh, 2005. [pdf]
    Abstract
    A basic question in mining data from an intelligent tutoring system is, “What happened when...?” We identify requirements for a tool to help answer such questions by finding occurrences of specified phenomena and browsing them in human-understandable form. We describe an implemented tool and how it meets the requirements. The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time. It automatically computes and displays the temporal hierarchy implicit in this representation. We illustrate the use of this tool to mine data from Project LISTEN's automated Reading Tutor.
  • Mostow, J., Beck, J., Cen, H., Gouvêa, E., Heiner, C., “Interactive Demonstration of a Generic Tool to Browse Tutor-Student Interactions.”, Interactive Events Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED 2005), Amsterdam.
    Abstract
    Project LISTEN's Session Browser is a generic tool to browse a database of students' interactions with an automated tutor. Using databases logged by Project LISTEN's Reading Tutor, we illustrate how to specify phenomena to investigate, explore events and the context where they occurred, dynamically drill down and adjust which details to display, and summarize events in human-understandable form. The tool should apply to MySQL databases from other tutors as well.
  • Mostow, J., Beck, J., Cuneo, A., Gouvêa, E., Heiner, C., “A Generic Tool to Browse Tutor-Student Interactions: Time Will Tell!”, Proceedings of the 12th International Conference on Artificial Intelligence in Education (AIED 2005), Amsterdam, 2005. [pdf]
    Abstract
    A basic question in mining data from an intelligent tutoring system is, “What happened when...?” A generic tool to answer such questions should let the user specify which phenomenon to explore; explore selected events and the context in which they occurred; and require minimal effort to adapt the tool to new versions, to new users, or to other tutors. We describe an implemented tool and how it meets these requirements. The tool applies to MySQL databases whose representation of tutorial events includes student, computer, start time, and end time. It infers the implicit hierarchical structure of tutorial interaction so humans can browse it. A companion paper [1] illustrates the use of this tool to explore data from Project LISTEN's automated Reading Tutor.
  • Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvêa, E., Wolf, P., Woelfel, J., “Sphinx-4: A Flexible Open Source Framework for Speech Recognition”, Sun Microsystems Technical Report, Menlo Park, 2004. [pdf]
    Abstract
    Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a “research-ready” system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.
  • Lamere, P., Kwok, P., Walker, W., Gouvêa, E., Singh, R., Raj, B., Wolf, P., “Design of the CMU Sphinx-4 Decoder”, Proceedings of the 8th. European Conference on Speech Communication and Technology, Geneva, 2003. [pdf]
    Abstract
    Sphinx-4 is an open source HMM-based speech recognition system written in the Java™ programming language. The design of the Sphinx-4 decoder incorporates several new features in response to current demands on HMM-based large vocabulary systems. New design aspects include graph construction for multilevel parallel decoding with multiple feature streams without the use of compound HMMs; the incorporation of a generalized search algorithm that subsumes Viterbi decoding as a special case; token stack decoding for efficient maintenance of multiple paths during search; and the design of a generalized language HMM graph, built from grammars and language models of multiple standard formats, that can potentially toggle between flat and tree search structures. This paper describes a few of these design aspects, and reports some preliminary performance measures for speed and accuracy.
  • Seymore, K., Chen, S., Doh, S., Eskenazi, M., Gouvêa, E., Raj, B., Ravishankar, M., Rosenfeld, R., Siegler, M., Stern, R., Thayer, E., “The 1997 CMU Sphinx-3 English Broadcast News Transcription System”, Proc. DARPA Speech Recognition Workshop, Chantilly, 1998. [pdf]
    Abstract
    This paper describes the 1997 Hub-4 Broadcast News Sphinx-3 speech recognition system. This year’s system includes full-bandwidth acoustic models trained on Broadcast News and Wall Street Journal acoustic training data, an expanded vocabulary, and a 4-gram language model for N-best list rescoring. The system structure, acoustic and language models, and adaptation components are described in detail, and results are presented to establish the contributions of multiple recognition passes. Additionally, experimental results are presented for several different acoustic and language model configurations.
  • Gouvêa, E.B., Stern, R.M., “Speaker Normalization Through Formant-Based Warping of the Frequency Scale”, Proceedings of the 5th. European Conference on Speech Communication and Technology, Rhodes, 1997. [pdf]
    Abstract
    Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough training data are available to model acoustical variability among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce variability between speakers. Recent successful speaker normalization algorithms have incorporated a speaker-specific frequency warping to the initial signal processing stages. These algorithms, however, do not make extensive use of acoustic features contained in the incoming speech.

    In this paper we study the possible benefits of the use of acoustic features in speaker normalization algorithms using frequency warping. We study the extent to which the use of such features, including specifically the use of formant frequencies, can improve recognition accuracy and reduce computational complexity for speaker normalization. We examine the characteristics and limitations of several types of feature sets and warping functions as we compare their performance to that of existing algorithms.

  • Raj, B., Gouvêa, E.B., Stern, R.M., “Cepstral Compensation Using Statistical Linearization”, Proceedings of the ESCA (European Speech Communication Association) Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, Pont-a-Mousson, 1997. [pdf]
    Abstract
    Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, it is extremely difficult to do this analytically. Previously-attempted analytical solutions for the problem of noisy speech recognition have either used an overly-simplified mathematical description of the effects of noise on the statistics of speech, or they have relied on the availability of large environment-specific adaptation sets. In this paper we present the Vector Polynomial approximations (VPS) method to compensate for the effects of linear filtering and additive noise on the PDF of clean speech. VPS also estimates the parameters of the environment, namely the noise and the channel, by using statistically linearized approximations of these effects. We evaluate the performance of this method (VPS) using the CMU SPHINX-II system on the alphanumeric CENSUS database corrupted with artificial white Gaussian noise. VPS provides improvements of up to 15 percent in relative recognition accuracy over our previous best algorithm, VTS, while being up to 20 percent more computationally efficient.
  • Campos, G.L., Gouvêa, E.B., “Speech Synthesis using the CELP Algorithm”, Proceedings of the 4th. International Conference on Spoken Language Processing, Philadelphia, 1996. [pdf]
    Abstract
    This paper presents a phoneme/diphone based speech synthesis system for the (Brazilian) Portuguese language. The basic idea behind this system is the construction of a library of phonetic units, and the processing of those basic units to build an utterance. The system is complemented by a text to phoneme translator described in [Cam95].

    The phoneme representation in the library is based on a linear prediction model; the filter, which models the vocal tract, is represented by line spectrum pairs, and the excitation by Code Excited Linear Prediction (CELP) parameters.

    This paper is organized as follows. After a brief introduction, CELP coding is presented in Part 2. Part 3 presents the relevant points to be applied in speech synthesis. Parts 4 and 5 constitute the main contribution of this paper, detailing the process of building the phoneme library and the interpolation techniques used. Part 6 presents some concluding remarks.

  • Raj, B., Gouvêa, E.B., Stern, R.M., “Cepstral Compensation by Polynomial Approximation for Environment-Independent Speech Recognition”, Proceedings of the 4th. International Conference on Spoken Language Processing, Philadelphia, 1996. [pdf]
    Abstract
    Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One possible solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, it is extremely difficult to do this analytically. Previously attempted analytical solutions to the problem of noisy speech recognition have either used an overly-simplified mathematical description of the effects of noise on the statistics of speech, or they have relied on the availability of large environment-specific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously-recorded or stereo recordings of clean and degraded speech. In this paper we introduce an approximation-based method to compute the effects of the environment on the parameters of the PDF of clean speech.

    In this work, we perform compensation using vector polynomial approximations (VPS) for the effects of linear filtering and additive noise on the clean speech. We also estimate the parameters of the environment, namely the noise and the channel, by using piecewise-linear approximations of these effects.

    We evaluate the performance of this method (VPS) using the CMU SPHINX-II system and the 100-word alphanumeric CENSUS database. Performance is evaluated at several SNRs, with artificial white Gaussian noise added to the database. VPS provides improvements of up to 15 percent in relative recognition accuracy.

  • Gouvêa, E.B., Moreno, P.J., Raj, B., Sullivan, T.M., Stern, R.M., “Adaptation and Compensation: Approaches to Microphone and Speaker Independence in Automatic Speech Recognition”, Proceedings of the DARPA Speech Recognition Workshop, Harriman, 1996. [pdf]
    Abstract
    This paper describes recent efforts by the CMU speech group to address the important problems of robustness to changes in environment and speaker. Results are presented in the context of the 1995 ARPA common Hub 3 evaluation of speech recorded through different microphones at different signal-to-noise ratios (SNRs). For speech that is considered to be of high quality we addressed the problem of speaker variability through a speaker normalization technique. For speech recorded at lower SNRs, we used a combination of environmental compensation techniques previously developed in our group. Speaker normalization reduced the relative error rate for clean speech by 3.5 percent, and the combination of environmental compensation with the use of noise-corrupted speech in the training process reduced the relative error rate for noisy speech by 54.9 percent.
  • Jain, U., Siegler, M.A., Doh, S.-J., Gouvêa, E.B., Moreno, P.J., Raj, B., Stern, R.M., “Recognition of Continuous Broadcast News With Multiple Unknown Speakers And Environments”, Proceedings of the DARPA Speech Recognition Workshop, Harriman, 1996. [pdf]
    Abstract
    Practical applications of continuous speech recognition in realistic environments place increasing demands for speaker and environment independence. Until recently, this robustness has been measured using evaluation procedures where speaker and environment boundaries are known, with utterances containing complete or nearly complete sentences. This paper describes recent efforts by the CMU speech group to improve the recognition of speech found in long sections of the broadcast news show Marketplace. Most of our effort was concentrated in two areas: the automatic segmentation and classification of environments, and the construction of a suitable lexicon and language model. We review the extensions to SPHINX-II that were necessary to enable it to process continuous broadcast news and we compare the recognition accuracy of the SPHINX-II system for different environmental and speaker conditions.
  • Moreno, P.J., Raj, B., Gouvêa, E.B., Stern, R.M., “Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition”, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Detroit, 1995. [pdf]
    Abstract
    In this paper we introduce a new family of environmental compensation algorithms called Multivariate Gaussian Based Cepstral Normalization (RATZ). RATZ assumes that the effects of unknown noise and filtering on speech features can be compensated by corrections to the mean and variance of components of Gaussian mixtures, and an efficient procedure for estimating the correction factors is provided. The RATZ algorithm can be implemented to work with or without the use of stereo development data that had been simultaneously recorded in the training and testing environments. Blind RATZ partially overcomes the loss of information that would have been provided by stereo training through the use of a more accurate description of how noisy environments affect clean speech. We evaluate the performance of the two RATZ algorithms using the CMU SPHINX-II system on the alphanumeric CENSUS database and compare their performance with that of previous environmental-robustness algorithms developed at CMU.
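
A note on the data-selection work above (“Kullback-Leibler divergence-based ASR training data selection”): its core idea, selecting data so that the resulting n-gram distribution approaches a target, can be sketched as a greedy loop that picks the sentence whose addition brings the selected set's distribution closest to the target in KL divergence. The sketch below is a toy illustration only; the unigram phone units, the greedy strategy, and the smoothing constant are assumptions rather than the paper's exact procedure.

from collections import Counter
import math

def kl_divergence(p, q, vocab, eps=1e-9):
    """KL(p || q) over a shared vocabulary, with flooring to avoid log(0)."""
    return sum(p.get(t, eps) * math.log(p.get(t, eps) / q.get(t, eps)) for t in vocab)

def normalize(counts, vocab, eps=1e-9):
    """Turn raw counts into a smoothed probability distribution over vocab."""
    total = sum(counts.values()) + eps * len(vocab)
    return {t: (counts.get(t, 0) + eps) / total for t in vocab}

def greedy_select(sentences, target_counts, k):
    """Greedily select k sentences (lists of units, e.g. phones) so that the
    selected set's unigram distribution stays close to the target distribution."""
    vocab = set(target_counts)
    for s in sentences:
        vocab.update(s)
    target = normalize(Counter(target_counts), vocab)
    selected, sel_counts = [], Counter()
    remaining = list(sentences)
    for _ in range(min(k, len(remaining))):
        best = min(remaining,
                   key=lambda s: kl_divergence(target,
                                               normalize(sel_counts + Counter(s), vocab),
                                               vocab))
        selected.append(best)
        sel_counts += Counter(best)
        remaining.remove(best)
    return selected

# Toy usage: pick 2 "sentences" of phones so coverage approaches the target counts.
target = {"a": 10, "b": 10, "c": 5}
pool = [["a", "a", "a"], ["a", "b", "c"], ["b", "b", "c"], ["c", "c", "c"]]
print(greedy_select(pool, target, k=2))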

Book Chapter

  • Mostow, J., Beck, J., Cuneo, A., Gouvêa, E.B., Heiner, C., Juarez, O., “Lessons from Project LISTEN’s Session Browser” in Handbook of Educational Data Mining, Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, 2010. [pdf]
    Abstract
    A basic question in mining data from an intelligent tutoring system is, “What happened when?” A tool to answer such questions should let the user specify which phenomena to explore; find instances of them; summarize them in human-understandable form; explore the context where they occurred; dynamically drill down and adjust which details to display; support manual annotation; and require minimal effort to adapt to new tutor versions, new users, new phenomena, or other tutors.

    This chapter describes the Session Browser, an educational data mining tool that supports such case analysis by exploiting three simple but powerful ideas. First, logging tutorial interaction directly to a suitably designed and indexed database instead of to log files eliminates the need to parse them and supports immediate efficient access. Second, a student, computer, and time interval together suffice to identify a tutorial event. Third, a containment relation between time intervals defines a hierarchical structure of tutorial interactions. Together, these ideas make it possible to implement a flexible, efficient tool to browse tutor data in understandable form yet with minimal dependency on tutor-specific details.

    We illustrate how we have used the Session Browser with MySQL databases of millions of events logged by successive versions of Project LISTEN's Reading Tutor. We describe tasks we have used it for, improvements made, and lessons learned in the years since the first version of the Session Browser [1-3].
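
The chapter above identifies tutorial events by student, computer, and time interval, and derives the browsing hierarchy from interval containment. A hedged sketch of that containment idea follows; the Event fields, labels, and the "tightest container" rule are illustrative assumptions, not the Session Browser's actual schema or code.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:
    student: str
    computer: str
    start: float        # e.g. seconds since the start of logging
    end: float
    label: str
    children: List["Event"] = field(default_factory=list)

def contains(outer: Event, inner: Event) -> bool:
    """Interval containment between events from the same student and computer."""
    return (outer.student == inner.student and outer.computer == inner.computer
            and outer.start <= inner.start and inner.end <= outer.end
            and outer is not inner)

def build_hierarchy(events: List[Event]) -> List[Event]:
    """Attach each event to its tightest container; events with no container are roots."""
    roots = []
    for e in events:
        containers = [c for c in events if contains(c, e)]
        if containers:
            tightest = min(containers, key=lambda c: c.end - c.start)
            tightest.children.append(e)
        else:
            roots.append(e)
    return roots

# Toy usage: a session contains a story, which contains a single utterance.
evs = [Event("s1", "pc7", 0, 600, "session"),
       Event("s1", "pc7", 30, 240, "story"),
       Event("s1", "pc7", 45, 52, "utterance")]
for root in build_hierarchy(evs):
    print(root.label, [c.label for c in root.children])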

Patent Applications

  • Method for Retrieving Items Represented by Particles from an Information Database, U.S. Pat. 8,055,693, Granted November 2011.
    Abstract
    A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
  • Method for Determining Distributions of Unobserved Classes of a Classifier, U.S. Pat. 8,219,510, Granted July 2012.
    Abstract
    In pattern recognition, a classifier is normally trained by selecting classes whose parameters are estimated from observed data. This method allows for the training of classes for which all observations are assigned to other classes. The method builds on discriminative training methods by estimating the unobserved classes’ parameters according to how strongly their centroids are repelled by all of the observed data.
  • Method for Indexing for Retrieving Documents Using Particles, U.S. Pat. 8,229,921, Granted July 2012.
    Abstract
    An information retrieval system stores and retrieves documents using particles and a particle-based language model. A set of particles for a collection of documents in a particular language is constructed from training documents such that the perplexity of the particle-based language model is substantially lower than that of a word-based language model constructed from the same training documents. The documents can then be converted to document particle graphs from which particle-based keys are extracted to form an index to the documents. Users can then retrieve relevant documents using queries also in the form of particle graphs.
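
As an illustration of the partitioning step described in the first patent above (U.S. Pat. 8,055,693), the sketch below enumerates partitionings of a word into particles via dynamic programming and keeps the minimal-cost one. The cost model here (a fixed per-particle overhead plus a per-character cost) is a placeholder assumption; the patent abstract only states that a cost is computed for each partitioning and the minimal-cost partitioning is retained.

from functools import lru_cache

def best_partition(word, particle_cost, max_len=4):
    """Return the minimal-cost partitioning of `word` into particles, where
    `particle_cost(p)` assigns a cost to each candidate particle (placeholder
    cost model; only the minimal-cost-partition rule comes from the patent)."""

    @lru_cache(maxsize=None)
    def solve(i):
        # Best (cost, particles) for the suffix word[i:].
        if i == len(word):
            return 0.0, ()
        best = (float("inf"), ())
        for j in range(i + 1, min(len(word), i + max_len) + 1):
            piece = word[i:j]
            tail_cost, tail = solve(j)
            cand = (particle_cost(piece) + tail_cost, (piece,) + tail)
            if cand[0] < best[0]:
                best = cand
        return best

    cost, particles = solve(0)
    return list(particles), cost

# Toy cost: a fixed per-particle overhead plus a per-character cost, which
# favors fewer, longer particles.
cost_fn = lambda p: 1.0 + 0.2 * len(p)
print(best_partition("boston", cost_fn))  # a minimal-cost split, e.g. ['bo', 'ston']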

Contact


Email: evandro dot gouvea at alumni dot cmu dot edu